top button
Flag Notify
    Connect to us
      Site Registration

Site Registration

Getting times for all the jobs run on a YARN cluster

+1 vote
254 views

I'm trying to get all the start and finish times for all the run jobs on a yarn cluster.

yarn application -list -appStates ALL

Will get me most of the details of the jobs, but not the times. However, I can parse this for the application ids and then run

yarn application -status $ID

on each application id to get an output that I can parse for the time.

However this involves making lots of connections to yarn, so is relatively slow. Is there a single command I can use to get all this information?

posted Mar 21, 2014 by Kiran Kumar

Looking for an answer?  Promote on:
Facebook Share Button Twitter Share Button LinkedIn Share Button

Similar Questions
+2 votes

First of all, I'm using Hadoop-2.6.0. I want to launch my own app master on a specific node in a YARN cluster in order to open a server on a predetermined IP address and port. To that end, I wrote a driver program in which I created a ResourceRequest object and called setResourceName method to set a hostname, and attached it to a ApplicationSubmissionContext object by callingsetAMContainerResourceRequest method.

I tried several times but couldn't launch the app master on a specific node. After searching code, I found that RMAppAttemptImpl invalidates what I've set in ResourceRequest as follows:

 // Currently, following fields are all hard code,
 // TODO: change these fields when we want to support
 // priority/resource-name/relax-locality specification for AM containers
 // allocation.
 appAttempt.amReq.setNumContainers(1);
 appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
 appAttempt.amReq.setResourceName(ResourceRequest.ANY);
 appAttempt.amReq.setRelaxLocality(true);

Is there another way to launch a container for an application master on a specific node in Hadoop-2.6.0?

+2 votes

I happened to run into this interesting scenario:

I had some mahout seq2sparse jobs, originally I run them in parallel using the distributed mode. But because the input files are so small, running them locally actually is much faster. so I turned them to local mode. But I run 10 of these jobs in parallel, so when 10 mahout jobs are run together, everyone became very slow. Is there an existing code that takes a desired shell script, and possibly some archive files (could contain the jar file, or C++ --generated executable code). I understand that I could use yarn API to code such a thing, but it would be nice if I could just take it and run in shell..

+1 vote

A mapreduce job can be run as jar file from terminal or directly from eclipse IDE. When a job run as jar file from terminal it uses multiple jvm and all resources of cluster. Does the same thing happen when we run from IDE. I have run a job on both and it takes less time on IDE than jar file on terminal.

+1 vote

How can I track a job failure on node or list of nodes, using YARN apis. I could get the list of long running jobs, using yarn client API, but need to go further to AM, NM, task attempts for map or reduce.
Say, I have a job running for long,(about 4hours), might be caused of some task failures.

Please provide the sequence of APIs, or any reference.

+2 votes

I am using containerLaunchContext.setCommands() to add different commands that I wanted to run on container. But only first command is getting execute.Is there is something else I need to do?

List commands = new ArrayList();commands.add(cmd1);commands.add(cmd2);

I can see only cmd1 is getting executed.

...