Hadoop as a substitute for ETL tools like SSIS, Informatica?

+1 vote
470 views

Can we use Hadoop as a substitute for ETL tools like Informatica for ETL processes?

posted Jun 12, 2014 by Amit Sharma


2 Answers

0 votes

The power of Hadoop comes from the synergy of MapReduce and HDFS, moving the compute close to the data. When you're talking about using Hadoop for ETL from OLTP relational tables to a data warehouse, Hadoop will have to connect to the database, extract the data, and do the upload. Having a cluster of workers pounding the OLTP database to extract data will do little to help your ETL process. Even when your T (transform) phase is complex, it is seldom more than a blip on the radar compared with the E of extracting from a relational DB.

The more complex and IO-intensive your transformation is, and the less it depends on relational tables, the better the case for Hadoop.

Hadoop would be an obvious choice if the data were already in HDFS. With the data located in a central RDBMS, you'll need to make the case for why Hadoop would help.
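
If you do decide to pull data out of an RDBMS into HDFS, Apache Sqoop is the usual bridge for the E step. A minimal sketch, assuming a MySQL source and hypothetical database/table names (sales, orders):

# Pull the orders table into a staging directory in HDFS; note that each
# extra mapper is another connection pounding the source OLTP database.
$ sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /data/staging/orders \
    --num-mappers 4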

answer Jun 12, 2014 by Shatark Bajpai
0 votes

This is the ELT approach: data is extracted from the sources, loaded into the target database, and then transformed and integrated into the desired format. All the heavy data processing takes place inside the target database. Hadoop is a good choice as that target, since it handles heavy processing well, as long as there is a suitable reader for each of the different file formats.
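
As a sketch of that ELT pattern (assuming Hive is available on the cluster, and using hypothetical table and column names, including an existing dw_orders target table): expose the raw extract as an external table over its staging directory, then run the transform entirely inside Hadoop.

$ hive -e "
  -- external table: just a schema over the files already sitting in HDFS
  CREATE EXTERNAL TABLE IF NOT EXISTS raw_orders (
    order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/staging/orders';

  -- the T step runs inside the cluster, not in the source system
  INSERT OVERWRITE TABLE dw_orders
  SELECT order_id, customer_id, amount, to_date(order_ts)
  FROM raw_orders
  WHERE amount IS NOT NULL;
"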

answer Jun 13, 2014 by Shweta Singh
Similar Questions
+1 vote

What are the areas to be considered for moving traditional BI that involves ETL loads (tool used: Informatica) to the Hadoop ecosystem?

+1 vote

I'm trying to implement security on my Hadoop data. I'm using Cloudera Hadoop and am looking for the following:

  1. Role-based authorization and authentication

  2. Encryption of data residing in HDFS

I have looked into Kerberos, but it doesn't provide encryption for data already residing in HDFS. Are there any other security tools I can go for? Has anyone implemented the above two security features in Cloudera Hadoop?
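
For the encryption part, newer Hadoop/CDH releases describe HDFS transparent encryption zones (with Apache Sentry covering role-based authorization on Cloudera). A rough sketch, assuming the Hadoop KMS is configured and using example key and path names:

$ hadoop key create etl_key
$ hdfs dfs -mkdir /secure
$ hdfs crypto -createZone -keyName etl_key -path /secure
# an encryption zone must be created over an empty directory; data already
# in HDFS is not encrypted in place and has to be copied into the zone
$ hadoop distcp /data/legacy /secure/legacy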

+3 votes

I am trying to access a Hadoop 1 installation via the Hadoop 2.2.0 command-line tools and am wondering whether this is possible at all.

From Hadoop 1 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
Found 2 items
drwxr-xr-x - cs supergroup 0 2014-02-01 08:18 /tmp
drwxr-xr-x - cs supergroup 0 2014-02-01 08:19 /user

From Hadoop 2.2.0 I get:

$ hadoop fs -ls hdfs://127.0.0.1:9000/
ls: Failed on local exception: java.io.EOFException; Host Details : 
local host is: "i7/127.0.1.1"; destination host is: "localhost":9000;

I have tried to find this information via web search, but so far without success.

...