
What skills to Learn to become Hadoop Admin

+2 votes
189 views

I would like to enter the Big Data world as a Hadoop admin. I have set up a 7-node cluster using Ambari, Cloudera Manager, and Apache Hadoop, and I have installed services such as Hive, Oozie, and ZooKeeper.

I have done a web log integration using Flume and a Twitter sentiment analysis. What other skills should I learn?

posted Mar 7, 2015 by Sridharan

Similar Questions
+1 vote

I would like to know if you have any examples or tutorials where I can learn Hadoop MapReduce on MongoDB in Java.

+1 vote

Assume I have a machine on the same network as a Hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the config file or local XML files to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from the cluster's Hadoop machine.

Does this work? Which parameters do I need to set? Where does the jar file go? What issues would I see if the machine is running Windows with Cygwin installed?

0 votes

What is the best way to implement a job that imports files into HDFS?

I have an external system offering data through a REST API. My goal is to have a job running in Hadoop that periodically (maybe started by cron?) checks the REST API for new data.

It would be nice if this job could also run on multiple data nodes. But unlike all the MapReduce examples I found, my job looks for new or changed data from an external interface and compares it with the existing data.

This is a conceptual example of the job:

  1. The job asks the REST API whether there are new files.
  2. If so, the job picks up the first file in the list.
  3. It checks whether the file already exists:
     • if not, the job imports the file
     • if yes, the job compares the data with the data already stored
     • if the data has changed, the job updates the file
  4. If more files exist, the job continues with step 2;
  5. otherwise it ends.

Can anybody give me some help on how to start? (It's the first job I am writing...)
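
To make the conceptual example above more concrete, here is a rough Python sketch of one polling pass. It is only an illustration of the decision logic, not a Hadoop job: the names `sync_files`, `list_new_files`, `fetch`, and `store` are hypothetical, a plain dict stands in for HDFS, and a second dict stands in for the REST API. Scheduling (cron or similar) and distribution across data nodes are left out.

```python
import hashlib
from typing import Callable, Dict, Iterable, List


def digest(data: bytes) -> str:
    """Return a content hash used to detect changed files."""
    return hashlib.sha256(data).hexdigest()


def sync_files(
    list_new_files: Callable[[], Iterable[str]],  # hypothetical: asks the REST API
    fetch: Callable[[str], bytes],                # hypothetical: downloads one file
    store: Dict[str, bytes],                      # hypothetical stand-in for HDFS
) -> Dict[str, List[str]]:
    """Run one polling pass and report what happened to each file."""
    report: Dict[str, List[str]] = {"imported": [], "updated": [], "unchanged": []}
    for name in list_new_files():        # walk the files the API reports
        data = fetch(name)
        if name not in store:            # file does not exist yet: import it
            store[name] = data
            report["imported"].append(name)
        elif digest(store[name]) != digest(data):
            store[name] = data           # file exists but differs: update it
            report["updated"].append(name)
        else:                            # file exists and is identical: skip
            report["unchanged"].append(name)
    return report                        # no more files: the pass ends


if __name__ == "__main__":
    remote = {"a.csv": b"1,2,3", "b.csv": b"4,5,6", "c.csv": b"7,8,9"}
    hdfs = {"a.csv": b"1,2,3", "b.csv": b"old"}
    print(sync_files(lambda: sorted(remote), remote.__getitem__, hdfs))
```

In the demo at the bottom, `a.csv` is left alone, `b.csv` is updated, and `c.csv` is imported. Against a real cluster, `store` would have to be replaced by calls to an HDFS client, and comparing stored checksums would likely be cheaper than reading the existing file back.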

...