Timeout after 600 secs for YARN MapReduce application

I keep encountering an error when running Nutch on Hadoop YARN:

AttemptID:attempt_1423062241884_9970_m_000009_0 Timed out after 600 secs

Some info on my setup: I'm running a 64-node cluster with Hadoop 2.4.1. Each node has 4 cores, 1 disk and 24 GB of RAM; the namenode/resourcemanager has the same specs but with 8 cores.

I am pretty sure one of these parameters is the threshold I'm hitting:

yarn.am.liveness-monitor.expiry-interval-ms 
yarn.nm.liveness-monitor.expiry-interval-ms 
yarn.resourcemanager.nm.liveness-monitor.interval-ms

but I would like to understand why.

The issue usually appears under heavier load, and most of the time the next attempt succeeds. Also, if I restart the Hadoop cluster, the error goes away for some time.

1 Answer

Looking at the attempt ID, this is a mapper task timing out in a MapReduce job. The configuration property that controls this timeout is 'mapreduce.task.timeout'.

The task times out because there was no heartbeat from the mapper task (YarnChild) to the MRAppMaster for 10 minutes. Is the MR job a custom job? If so, are you doing any work in the Mapper's cleanup()? If cleanup() takes longer than the configured timeout, the task can time out.
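To make the timeout rule above concrete, here is a minimal sketch in plain Java. It is a simplified, hypothetical model (the class `TaskTimeoutModel` and its methods are made up for illustration) and not Hadoop's actual implementation. It assumes the documented behaviour of 'mapreduce.task.timeout': the value is in milliseconds, the default is 600000 (the 600 secs in the error), and 0 disables the check.

```java
/**
 * Simplified, hypothetical model of a task liveness check.
 * This is NOT Hadoop code; it only illustrates the rule
 * "no progress report for longer than the timeout => task is killed".
 */
final class TaskTimeoutModel {
    private final long timeoutMs;   // analogue of mapreduce.task.timeout
    private long lastProgressMs;    // time of the last progress report

    TaskTimeoutModel(long timeoutMs, long startMs) {
        this.timeoutMs = timeoutMs;
        this.lastProgressMs = startMs;
    }

    /** Called whenever the task reports progress (the analogue of a heartbeat). */
    void reportProgress(long nowMs) {
        lastProgressMs = nowMs;
    }

    /** True when no progress was reported for longer than the timeout (0 disables the check). */
    boolean isTimedOut(long nowMs) {
        return timeoutMs > 0 && nowMs - lastProgressMs > timeoutMs;
    }
}
```

In a real Mapper, the usual way to report liveness during long-running work (for example inside cleanup()) is to call context.progress() periodically. Whether that helps here depends on what the Nutch jobs are actually doing while they stall.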

answer Feb 11, 2015 by Sumit Pokharna

Thank you for the quick reply.

I will modify the value to check whether this is the threshold I'm hitting, but I was thinking of decreasing it, because my jobs take too long if they hit this timeout. I would rather fail fast than keep the cluster busy with jobs stuck in timeouts. Ideally I would like to troubleshoot the issue and not fail at all :)

My MR job is not a custom one; it is a job from Nutch 1.8. Actually, several Nutch jobs fail (e.g. Generator, Indexer).

Also, since this is related to Nutch 1.8, should I move the question to the Nutch mailing list?

commented Feb 11, 2015 by anonymous

Similar Questions

+3 votes

How to find execution time of a MapReduce job?

long start = System.currentTimeMillis(); // record the start time of the job

job.waitForCompletion(true); // blocks until the job finishes

long end = System.currentTimeMillis(); // record the end time of the job
log.info("Total Time (in milliseconds) = " + (end - start));
log.info("Total Time (in seconds) = " + (end - start) / 1000.0);

I am not sure this is the correct way to measure it. Is there another method or API for finding the execution time of a MapReduce job?
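One possible variation on the client-side timing above, as a minimal sketch: wrap the blocking call in a small helper that uses System.nanoTime(), which is monotonic and so unaffected by wall-clock adjustments. The helper name `ElapsedTimer` is hypothetical. Note that this measures the time seen by the submitting client, including submission and queueing; the Hadoop `Job` class also exposes getStartTime() and getFinishTime(), which may be closer to what is wanted.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: measures how long a blocking call takes on the client side. */
final class ElapsedTimer {
    private ElapsedTimer() {}

    /** Runs the task and returns the elapsed time in milliseconds (monotonic clock). */
    static <T> long timeMillis(Callable<T> task) throws Exception {
        long start = System.nanoTime();
        task.call();
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

With a Hadoop job this could be used as `long ms = ElapsedTimer.timeMillis(() -> job.waitForCompletion(true));`.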

+1 vote

How to set mapreduce.input.fileinputformat.split.maxsize for a specific job?

In the XML configuration files of Hadoop 2.x, "mapreduce.input.fileinputformat.split.minsize" is listed and can be set, but how do I set "mapreduce.input.fileinputformat.split.maxsize" in the XML file? I need to set it in my MapReduce code.
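As a rough illustration of what the maxsize setting does, the sketch below reproduces in plain Java the split-size rule that FileInputFormat applies, to the best of my knowledge: max(minSize, min(maxSize, blockSize)). The class name `SplitSizeSketch` is hypothetical. For a single job, the property can be set in code on the job's Configuration (for example with setLong on the property name) or through FileInputFormat.setMaxInputSplitSize(job, bytes).

```java
/** Hypothetical sketch of the split-size rule; not taken from the Hadoop sources. */
final class SplitSizeSketch {
    /** Property name from the question, as it would be passed to a Configuration. */
    static final String SPLIT_MAXSIZE = "mapreduce.input.fileinputformat.split.maxsize";

    private SplitSizeSketch() {}

    /** Split size = max(minSize, min(maxSize, blockSize)), all values in bytes. */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
}
```

Under this rule, a maxsize below the HDFS block size yields more, smaller splits (and therefore more map tasks), while a maxsize above the block size has no effect unless minsize is also raised.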

+1 vote

How does a job work in YARN/MapReduce? Like a navigation path...

Please check whether my understanding is right:

When the application (job/client) starts, the client communicates with the name node, and the application manager is started on a node (data node). The application manager communicates with the resource manager (on the name node) to get resources. The resources are assigned to a container. The job runs in the container, which is a JVM.

+2 votes

How to find min, max and mean of wordcount from text file in hadoop mapreduce?

public class MaxMinReducer extends Reducer {
    int max_sum = 0;
    int mean = 0;
    int count = 0;
    Text max_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");
    int min_sum = Integer.MAX_VALUE;
    Text min_occured_key = new Text();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }

        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }

        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }

        mean = max_sum + min_sum / count;
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}

Here I am writing the minimum, maximum and mean of the word counts.

My input file:

high low medium high low high low large small medium

Expected output is:

high - 3------maximum

low - 3--------maximum

large - 1------minimum

small - 1------minimum

But I am not getting the above output. Can anyone please help me?
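For comparison, here is a minimal plain-Java sketch (no Hadoop, hypothetical class name `WordCountStats`) of the statistics the question is after. Two things in the reducer above stand out: it keeps only one key each for the minimum and the maximum, so ties such as high/low cannot both be reported, and the expression max_sum + min_sum / count is evaluated as max_sum + (min_sum / count), which is not a mean. The sketch assumes "mean" means the average count per distinct word; that interpretation is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: min, max and mean of per-word counts, keeping all tied words. */
final class WordCountStats {
    final int min;
    final int max;
    final double mean;            // average count per distinct word (assumed meaning)
    final List<String> minWords;  // every word whose count equals min
    final List<String> maxWords;  // every word whose count equals max

    private WordCountStats(int min, int max, double mean, List<String> minWords, List<String> maxWords) {
        this.min = min;
        this.max = max;
        this.mean = mean;
        this.minWords = minWords;
        this.maxWords = maxWords;
    }

    static WordCountStats of(String text) {
        // Count occurrences; TreeMap keeps the words in alphabetical order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        if (counts.isEmpty()) {
            throw new IllegalArgumentException("input contains no words");
        }

        // First pass: find the extreme counts and the total.
        int min = Integer.MAX_VALUE;
        int max = 0;
        long total = 0;
        for (int c : counts.values()) {
            min = Math.min(min, c);
            max = Math.max(max, c);
            total += c;
        }

        // Second pass: collect every word that ties for min or max.
        List<String> minWords = new ArrayList<>();
        List<String> maxWords = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == min) minWords.add(e.getKey());
            if (e.getValue() == max) maxWords.add(e.getKey());
        }
        return new WordCountStats(min, max, (double) total / counts.size(), minWords, maxWords);
    }
}
```

In a reducer, the same idea would mean keeping lists of tied keys rather than a single Text for each extreme, and computing the mean once in cleanup() from a running total.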

+1 vote

How to stop a MapReduce job running on a Hadoop cluster from the terminal?

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job takes too long and we want to stop it midway, which command should we use? Or is there another way to do that?
