How a job works in YARN/Map Reduce? like navigation path...


How does a job work in YARN/MapReduce? What is its navigation path?

Is my understanding below correct?

When the application (job) starts, the client communicates with the NameNode, and the application manager is started on a node (a DataNode). The application manager communicates with the ResourceManager (on the NameNode) to get resources. The resources are assigned to a container. The job runs in the container, which is a JVM.

posted Apr 6, 2016 by Bob Wise


Similar Questions

+1 vote

How to set mapreduce.input.fileinputformat.split.maxsize for a specific job?

In the XML configuration files of Hadoop 2.x, "mapreduce.input.fileinputformat.split.minsize" is listed and can be set, but how do I set "mapreduce.input.fileinputformat.split.maxsize" in the XML file? I need to set it in my MapReduce code.
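
For context, here is a minimal sketch of how the max split size influences the split that FileInputFormat ends up using. It reproduces the max(minSize, min(maxSize, blockSize)) arithmetic with no Hadoop dependency; the class name, method name and sizes are made up for illustration. In real job code the per-job value is commonly set with FileInputFormat.setMaxInputSplitSize(job, size), but check that against your Hadoop version.

```java
class SplitSizeSketch {
    // Effective split size: the block size, capped by maxSize and floored by minSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // assumed HDFS block size, for illustration only

        // A max size below the block size produces smaller splits (more map tasks).
        System.out.println(computeSplitSize(blockSize, 1L, 64 * mb) / mb + " MB");

        // An effectively unlimited max size leaves the split at the block size.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE) / mb + " MB");
    }
}
```

So lowering the max size below the block size is what actually changes the number of splits; raising it above the block size has no effect unless the min size is raised too.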

+3 votes

How to find execution time of a MapReduce job?

long start, end; // for recording the start and end time of the job
start = System.currentTimeMillis(); // start the timer

job.waitForCompletion(true);

end = System.currentTimeMillis(); // stop the timer
log.info("Total Time (in milliseconds) = " + (end - start));
log.info("Total Time (in seconds) = " + (end - start) * 0.001F);

I am not sure this is the correct way to measure it. Is there any other method or API to find the execution time of a MapReduce job?
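
One possible variation on the client-side approach, as a sketch only: wrap the blocking call in a small helper that uses System.nanoTime(), which is monotonic and not affected by wall-clock adjustments. The JobTimerSketch class and the sleep that stands in for job.waitForCompletion(true) are hypothetical, so that the snippet runs without a cluster. Note that this measures time as seen by the client, including submission and polling overhead.

```java
import java.util.concurrent.TimeUnit;

class JobTimerSketch {
    // Runs the task and returns the elapsed time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Stand-in for job.waitForCompletion(true); replace with the real call.
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Total Time (in milliseconds) = " + elapsed);
        System.out.println("Total Time (in seconds) = " + elapsed / 1000.0);
    }
}
```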

+1 vote

How to stop a mapreduce job from terminal running on Hadoop Cluster?

To run a job, we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it midway, which command should be used? Or is there any other way to do that?

+1 vote

Timeout after 600 secs for YARN MapReduce application

I keep encountering an error when running Nutch on Hadoop YARN:

AttemptID:attempt_1423062241884_9970_m_000009_0 Timed out after 600 secs

Some info on my setup: I'm running a 64-node cluster with Hadoop 2.4.1. Each node has 4 cores, 1 disk and 24 GB of RAM, and the namenode/resourcemanager has the same specs, only with 8 cores.

I am pretty sure one of these parameters is the threshold I'm hitting:

yarn.am.liveness-monitor.expiry-interval-ms 
yarn.nm.liveness-monitor.expiry-interval-ms 
yarn.resourcemanager.nm.liveness-monitor.interval-ms

but I would like to understand why.

The issue usually appears under heavier load, and most of the time the next attempt is successful. Also, if I restart the Hadoop cluster, the error goes away for some time.

+3 votes

Hadoop: How do reduce tasks know which partition they should read?

I am looking into the YARN MapReduce internals to try to understand how reduce tasks know which partition of the map output they should read, even when they are re-executed after a crash.

I am also looking at the MapReduce source code. Is there any class I should look at to understand this?
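
As a starting point, here is a minimal sketch of the map-side half of the story, assuming the job uses the default HashPartitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner). It reproduces that class's arithmetic with no Hadoop dependency; the PartitionSketch class and the sample keys are made up for illustration. Each map output record is assigned a partition index in the range 0 to numReduceTasks - 1, and the reduce task with the same index fetches that partition from every map output. The index belongs to the task rather than to a particular attempt, so a re-executed attempt should fetch the same partition.

```java
class PartitionSketch {
    // Same arithmetic as the default HashPartitioner: clear the sign bit,
    // then take the remainder by the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // illustrative value
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> partition " + partitionFor(key, numReduceTasks));
        }
    }
}
```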

...
