What configuration parameters cause a Hadoop 2.x job to run on the cluster?


Assume I have a machine on the same network as a Hadoop 2 cluster but separate from it.

My understanding is that by setting certain elements of the config file or local XML files to point to the cluster, I can launch a job without having to log into the cluster, move my jar to HDFS, and start the job from one of the cluster's Hadoop machines.

Does this work? Which parameters do I need to set? Where is the jar file? What issues would I see if the machine is running Windows with Cygwin installed?

closed with the note: Problem Solved

posted Apr 25, 2014 by Luv Kumar


Which version of Hadoop are you using (YARN or no YARN)?

To answer your question: yes, it's possible and simple. All you need to do is put the Hadoop JARs on the classpath, together with the relevant configuration files pointing to the Hadoop cluster. Most often people simply copy core-site.xml, yarn-site.xml, etc. from the actual cluster to the application classpath, and then you can run the job straight from an IDE.
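As a quick sanity check before submitting, something like the following could confirm that the copied files are actually visible on the classpath. This is only a sketch using the plain JDK: `ClusterConfigCheck` and `missing` are hypothetical names, not part of Hadoop, and the file list is just the one mentioned in this thread.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): reports which cluster
// configuration files the submitting application can see on its classpath.
class ClusterConfigCheck {

    // File names taken from this thread; mapred-site.xml only matters for MapReduce jobs.
    static final String[] EXPECTED = {"core-site.xml", "yarn-site.xml", "mapred-site.xml"};

    // Returns the entries of `names` that the given class loader cannot locate.
    static List<String> missing(ClassLoader loader, String... names) {
        List<String> absent = new ArrayList<>();
        for (String name : names) {
            URL location = loader.getResource(name);
            if (location == null) {
                absent.add(name);
            } else {
                System.out.println(name + " found at " + location);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        List<String> absent = missing(ClusterConfigCheck.class.getClassLoader(), EXPECTED);
        if (absent.isEmpty()) {
            System.out.println("All expected config files are on the classpath.");
        } else {
            System.out.println("Missing from classpath: " + absent);
        }
    }
}
```

If a file is reported missing, the client has no copy of that file's cluster settings to read, so the job will probably not reach the cluster.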

I'm not a Windows user, so I'm not sure about the Windows part of the question.

commented Apr 25, 2014 by anonymous

Thank you for your answer.
1) I am using YARN.
2) So presumably dropping core-site.xml and yarn-site.xml into user.dir works. Do I need mapred-site.xml as well?

commented Apr 25, 2014 by anonymous

Yes, if you are running MapReduce (MR).
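For MapReduce, the property in mapred-site.xml that usually decides this is `mapreduce.framework.name`: it defaults to `local`, and `yarn` sends the job to the cluster. Below is a rough sketch, using only the JDK's XML parser, of checking that value in a copied file. `SiteXmlReader` and the sample XML are illustrative assumptions, not taken from a real cluster.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not a Hadoop class): reads one property from XML in
// the <configuration><property><name/><value/></property></configuration> layout.
class SiteXmlReader {

    // Illustrative content only; a real mapred-site.xml would be copied from the cluster.
    static final String SAMPLE =
        "<configuration>"
        + "<property><name>mapreduce.framework.name</name><value>yarn</value></property>"
        + "</configuration>";

    // Returns the value for `key`, or null if the property is not present.
    static String property(String xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                if (key.equals(childText(prop, "name"))) {
                    return childText(prop, "value");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // Trimmed text of the first descendant element with the given tag, or null if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        System.out.println("mapreduce.framework.name = "
            + property(SAMPLE, "mapreduce.framework.name"));
    }
}
```

If the lookup returns null or `local` for the file on your classpath, the job would likely run in the local runner rather than on the cluster.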

commented Apr 25, 2014 by anonymous
