Hadoop as a substitute for ETL tools like SSIS, Informatica?

+1 vote

Can we use Hadoop as a substitute for ETL tools like Informatica for ETL processes?

posted Jun 12, 2014 by Amit Sharma

2 Answers

0 votes

The power of Hadoop comes from the synergy of MapReduce (MR) and HDFS: moving the compute close to the data. When you're talking about using Hadoop for ETL from OLTP relational tables to a data warehouse (DW), Hadoop has to connect to the database, extract the data, and then do the upload. Having a cluster of workers pounding the OLTP database to extract data will do little to help your ETL process. Even when your T phase is complex, it is seldom even a blip on the radar compared with the E phase of extracting from a relational DB.
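To make the extraction bottleneck concrete, here is a rough back-of-the-envelope sketch. All numbers (data size, per-worker read rate, the database's throughput ceiling) and the function name are hypothetical assumptions, not measurements; the only point is that the aggregate read rate is capped by what the source database can serve.

```python
def extraction_hours(data_gb, workers, per_worker_mb_s, db_limit_mb_s):
    """Estimate hours needed to extract data_gb from a single source database."""
    # The combined read rate cannot exceed what the source database can serve.
    effective_mb_s = min(workers * per_worker_mb_s, db_limit_mb_s)
    return data_gb * 1024 / effective_mb_s / 3600


# Hypothetical scenario: 500 GB to extract, 50 MB/s per worker,
# and a source database that tops out at 200 MB/s in total.
for workers in (1, 4, 16, 64):
    hours = extraction_hours(data_gb=500, workers=workers,
                             per_worker_mb_s=50, db_limit_mb_s=200)
    print(f"{workers:>3} workers: {hours:.2f} h")
```

Under these assumed numbers the estimate stops improving once four workers saturate the database; going to 16 or 64 workers only adds load on the OLTP system.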

The more complex, I/O-intensive, and independent of relational tables your transformation is, the better the case for Hadoop.

Hadoop would be an obvious choice if the data were already in HDFS. With the data located in a central RDBMS, you'll need to make the case for why Hadoop would or could help.

answer Jun 12, 2014 by Shatark Bajpai

0 votes

Consider the approach in which data is extracted from the sources, loaded into the target database, and only then transformed and integrated into the desired format (commonly called ELT). All the heavy data processing takes place inside the target database. Hadoop is the right choice as the target here: it is good at processing large volumes of data, as long as we have a good reader for each file format.
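The extract, load, then transform order can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the target store, and the table names and sample rows are hypothetical. On Hadoop the load step would put files into HDFS and the transform step would run on the cluster (for example as a Hive query or a MapReduce job).

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
raw_rows = [
    ("2014-06-12", "north", "120.50"),
    ("2014-06-12", "south", "80.00"),
    ("2014-06-13", "north", "99.50"),
]

conn = sqlite3.connect(":memory:")  # stand-in for the target store

# Extract + Load: copy the data untouched into a staging table.
conn.execute(
    "CREATE TABLE staging_sales (sale_date TEXT, region TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", raw_rows)

# Transform: the target engine does the type conversion and aggregation.
conn.execute("""
    CREATE TABLE sales_by_region AS
    SELECT region, SUM(CAST(amount AS REAL)) AS total_amount
    FROM staging_sales
    GROUP BY region
""")

totals = dict(conn.execute("SELECT region, total_amount FROM sales_by_region"))
print(totals)
```

The source system is read only once, in raw form; every later reshaping runs where the data already lives, which is the property that makes Hadoop attractive as the target.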

answer Jun 13, 2014 by Shweta Singh
