Run an arbitrary (non-MR) job on YARN?

I happened to run into this interesting scenario:

I had some Mahout seq2sparse jobs that I originally ran in parallel in distributed mode. Because the input files are so small, running them locally is actually much faster, so I switched them to local mode. But I run 10 of these jobs in parallel, and when 10 Mahout jobs run together, all of them become very slow. Is there existing code that takes a given shell script, and possibly some archive files (which could contain a jar file or a compiled C++ executable), and runs it on YARN? I understand that I could use the YARN API to code such a thing, but it would be nice if I could just take something ready-made and run it from the shell.
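For reference, Hadoop 2.x ships a DistributedShell example application whose client accepts a shell script and a container count. The sketch below only assembles such a command line and prints it, it does not submit anything. The jar path and script name are hypothetical placeholders, and shipping extra archive files is not covered here.

```python
import shlex

# Main class of the DistributedShell client that ships with Hadoop 2.x.
DS_CLIENT = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def build_distributed_shell_cmd(ds_jar, script, num_containers=1,
                                container_memory_mb=None):
    """Return the argv list for a DistributedShell submission.

    ds_jar and script are paths supplied by the caller (assumptions here);
    the jar is passed twice: once to `yarn jar`, once as the AM jar (-jar).
    """
    cmd = [
        "yarn", "jar", ds_jar, DS_CLIENT,
        "-jar", ds_jar,
        "-shell_script", script,
        "-num_containers", str(num_containers),
    ]
    if container_memory_mb is not None:
        cmd += ["-container_memory", str(container_memory_mb)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_distributed_shell_cmd(
        "/path/to/hadoop-yarn-applications-distributedshell.jar",
        "run_seq2sparse.sh",
        num_containers=10,
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

Each container runs the same script, so running 10 different jobs this way would mean either 10 submissions or a script that picks its own input.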

posted Oct 27, 2014 by Parveen

Similar Questions

+1 vote

How can I track a job failure on node or list of nodes, using YARN APIs?

How can I track a job failure on a node or list of nodes using the YARN APIs? I can get the list of long-running jobs using the YARN client API, but I need to go further, to the AM, NM, and task attempts for map or reduce.
Say I have a job that has been running for a long time (about 4 hours), possibly because of some task failures.

Please provide the sequence of APIs, or any reference.

+2 votes

Run my own application master on a specific node in a YARN cluster

First of all, I'm using Hadoop-2.6.0. I want to launch my own app master on a specific node in a YARN cluster in order to open a server on a predetermined IP address and port. To that end, I wrote a driver program in which I created a ResourceRequest object, called its setResourceName method to set a hostname, and attached it to an ApplicationSubmissionContext object by calling the setAMContainerResourceRequest method.

I tried several times but couldn't launch the app master on a specific node. After searching the code, I found that RMAppAttemptImpl overrides what I've set in ResourceRequest as follows:

 // Currently, following fields are all hard code,
 // TODO: change these fields when we want to support
 // priority/resource-name/relax-locality specification for AM containers
 // allocation.
 appAttempt.amReq.setNumContainers(1);
 appAttempt.amReq.setPriority(AM_CONTAINER_PRIORITY);
 appAttempt.amReq.setResourceName(ResourceRequest.ANY);
 appAttempt.amReq.setRelaxLocality(true);

Is there another way to launch a container for an application master on a specific node in Hadoop-2.6.0?

+1 vote

Getting times for all the jobs run on a YARN cluster

I'm trying to get the start and finish times for all the jobs that have run on a YARN cluster.

yarn application -list -appStates ALL

This gets me most of the details of the jobs, but not the times. However, I can parse the output for the application IDs and then run

yarn application -status $ID

on each application ID to get output that I can parse for the times.

However, this involves making lots of connections to YARN, so it is relatively slow. Is there a single command I can use to get all this information?
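One way to get everything in a single request is the ResourceManager REST endpoint `/ws/v1/cluster/apps`, which returns each application's `startedTime` and `finishedTime` in epoch milliseconds. A minimal sketch, assuming the RM web address is reachable (the URL in the usage note is a placeholder, and `fetch_apps`, `app_times`, and `to_utc` are names made up for this example):

```python
import json
from datetime import datetime, timezone
from urllib.request import urlopen


def fetch_apps(rm_url):
    """Fetch the application list from the RM REST API in one request."""
    with urlopen(rm_url.rstrip("/") + "/ws/v1/cluster/apps") as resp:
        return json.load(resp)


def to_utc(ms):
    """Convert epoch milliseconds to a UTC datetime; 0 or missing -> None."""
    if not ms:
        return None
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def app_times(payload):
    """Return (id, started, finished) tuples from a cluster/apps response.

    The RM returns {"apps": null} when there are no applications, and a
    finishedTime of 0 for applications that are still running.
    """
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        (app["id"], to_utc(app.get("startedTime")), to_utc(app.get("finishedTime")))
        for app in apps
    ]
```

Usage would look like `app_times(fetch_apps("http://rm-host:8088"))`, where the host and port are whatever the cluster's RM web UI uses.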

+1 vote

How does a job work in YARN/MapReduce? Like a navigation path...

How does a job work in YARN/MapReduce? Like a navigation path.

Please check whether my understanding is right.

When the application, job, or client starts, the client communicates with the NameNode, and the application manager is started on a node (a DataNode). The application manager communicates with the ResourceManager (on the NameNode) to get resources. The resources are assigned to a container. The job runs in the container, which is a JVM.

+1 vote

Can we run a MapReduce job from the Eclipse IDE on a fully distributed Hadoop cluster?

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time from the IDE than as a jar file from the terminal.

...
