Project Description:
This project involved building a data lake in Hadoop with real-time data transformation in Kafka. The sources of data were Oracle, MySQL, and MongoDB. The source data was highly transactional and carried a large amount of history.
Project Details:
Number of databases: 50+ (Oracle, MySQL and MongoDB included)
Data Size: 600+ TB
Team size: 40+
Methodology:
Data was streamed in real time into Kafka from the different sources, and the staging data and final output were landed in Hadoop. The data was then processed in real time and moved to the cloud for final consumption.
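The per-record transformation applied in the streaming layer could be sketched as below. This is a minimal illustration only: the record shape, field names, and normalization rules are assumptions for the sketch, not the project's actual logic, and the Kafka consume/produce plumbing is omitted.

```python
import json
from datetime import datetime, timezone

def transform_record(raw: bytes) -> dict:
    """Normalize one source record (hypothetical shape) for the data lake.

    Assumes each Kafka message is a JSON object with at least an 'id'
    and a 'source' field; everything else here is illustrative.
    """
    record = json.loads(raw.decode("utf-8"))
    return {
        "id": str(record["id"]),
        "source": record.get("source", "unknown").lower(),
        # Stamp ingestion time so downstream jobs can measure lag.
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        # Keep the remaining payload intact for the staging zone.
        "payload": {k: v for k, v in record.items() if k not in ("id", "source")},
    }

# Example: a message as it might arrive on a topic fed from Oracle.
msg = json.dumps({"id": 42, "source": "ORACLE", "amount": 19.99}).encode("utf-8")
out = transform_record(msg)
print(out["source"], out["payload"])  # oracle {'amount': 19.99}
```

In a real pipeline this function would sit inside a consumer loop (or a stream-processing job), with the result written to the Hadoop staging zone and to an output topic.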
Challenges:
Multiple heterogeneous sources
High volume of data (600+ TB)
Speed/lag issues in the streaming pipeline
Data quality issues
Divergent data types across sources
Security requirements
Differing character sets at the database level
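To illustrate the character-set challenge above: databases configured with legacy encodings (for example a Latin-1 Oracle instance) can emit bytes that are invalid UTF-8 and break a downstream pipeline. A hedged Python sketch of normalizing such bytes to UTF-8; the specific encodings are assumptions for illustration:

```python
def to_utf8(raw: bytes, source_encoding: str = "latin-1") -> str:
    """Decode bytes using the source database's character set,
    falling back to replacement characters rather than failing,
    so one bad row does not stall the stream."""
    try:
        return raw.decode("utf-8")  # many rows are already valid UTF-8
    except UnicodeDecodeError:
        return raw.decode(source_encoding, errors="replace")

# 'é' encoded as Latin-1 is a single byte 0xE9, which is invalid UTF-8.
legacy = "café".encode("latin-1")
print(to_utf8(legacy))  # café
```

The design choice here is to decode as close to the source as possible and standardize on UTF-8 everywhere downstream, so Kafka, Hadoop, and the cloud consumers never have to guess the original database character set.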
Comments: