Can we run a MapReduce job from the Eclipse IDE on a fully distributed Hadoop cluster?

1 Answer

I could be wrong here, but as I understand it, I do not think it is possible to simply run the JAR file from your PC. There are two things you need to consider:

1) How is the JAR file going to connect to the cluster?

2) How is the JAR file going to be distributed to the cluster?

Again, I could be wrong in my response, so anyone else on the list should feel free to correct me. I am still a novice with Hadoop and have only worked with it on Amazon EMR.

answer Apr 11, 2015 by anonymous

I have installed and configured my own Hadoop cluster with one master node and 7 slave nodes. Now I just want to make sure that a job run through Eclipse is internally the same as one run through a JAR file. Also, the job history server and the web UI on port 8088 list only those jobs that were submitted as a JAR from the terminal.

commented Apr 11, 2015 by Sudhakar Singh

I think you can have a try at http://hdt.incubator.apache.org/

commented Apr 12, 2015 by anonymous
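
On the two questions raised in the answer and the port 8088 observation: as far as I know, mapreduce.framework.name defaults to "local", so a driver started from an IDE without the cluster's configuration runs in-process and never reaches the ResourceManager, which would explain why it is missing from the web UI. The following is only a sketch of the client-side properties that are usually involved. The host name, port and JAR path are hypothetical placeholders, and the class and method names are mine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RemoteSubmitSettings {

    // Collects the client-side properties a driver launched from an IDE
    // would typically need. masterHost, nameNodePort and jobJarPath are
    // hypothetical values; use the ones from your own cluster.
    public static Map<String, String> clientSettings(String masterHost,
                                                     int nameNodePort,
                                                     String jobJarPath) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Question 1: how the client finds the cluster (HDFS and YARN).
        settings.put("fs.defaultFS", "hdfs://" + masterHost + ":" + nameNodePort);
        settings.put("yarn.resourcemanager.hostname", masterHost);
        // Submit to YARN instead of the in-process local runner.
        settings.put("mapreduce.framework.name", "yarn");
        // Question 2: which JAR gets shipped to the worker nodes.
        settings.put("mapreduce.job.jar", jobJarPath);
        return settings;
    }

    public static void main(String[] args) {
        clientSettings("master.example.com", 8020, "target/wordcount.jar")
                .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

In a real driver each entry would be copied into the Hadoop Configuration with conf.set(key, value) before calling Job.getInstance(conf). Alternatively, putting the cluster's core-site.xml, mapred-site.xml and yarn-site.xml on the Eclipse classpath should have the same effect.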

Similar Questions

+1 vote

How to stop a running MapReduce job on a Hadoop cluster from the terminal?

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job takes too long and we want to stop it midway, which command should we use? Or is there any other way to do that?

+3 votes

Can we control data distribution and load balancing in a Hadoop cluster?

As I understand it, data distribution, load balancing and fault tolerance are implicit in Hadoop. But I need to customize them. Can we do that?

+2 votes

How to find the min, max and mean of word counts from a text file in Hadoop MapReduce?

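import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxMinReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    // Running statistics over all keys seen by this reducer instance.
    int max_sum = 0;
    int min_sum = Integer.MAX_VALUE;
    int total_sum = 0;   // sum of all word counts
    int count = 0;       // number of distinct words
    Text max_occured_key = new Text();
    Text min_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        total_sum += sum;
        count++;   // one per word, not one per value

        // Strict comparisons keep only the first key seen when counts tie.
        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }
        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        if (count == 0) {
            return;   // no input reached this reducer
        }
        // Mean word count; integer division truncates the fraction.
        int mean = total_sum / count;
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}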

Here I am writing the minimum, maximum and mean of the word counts.

My input file:

high low medium high low high low large small medium

Expected output:

high - 3 (maximum)

low - 3 (maximum)

large - 1 (minimum)

small - 1 (minimum)

But I am not getting the above output. Can anyone please help me?
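
One possible reason the tied words go missing is that a single Text field can hold only one key per extreme. Below is a minimal plain-Java sketch (no Hadoop dependency, so it can be run directly) of the statistics the question asks for, keeping every word that ties for the minimum or maximum. The class and method names are hypothetical, and the mean is taken as total occurrences divided by the number of distinct words, which is my assumption about what is wanted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountStats {

    // Counts whitespace-separated words; stands in for the per-key sums
    // that a reducer would compute.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns every word whose count equals target, so ties are preserved.
    public static List<String> wordsWithCount(Map<String, Integer> counts, int target) {
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() == target) {
                words.add(entry.getKey());
            }
        }
        return words;
    }

    public static int maxCount(Map<String, Integer> counts) {
        return Collections.max(counts.values());
    }

    public static int minCount(Map<String, Integer> counts) {
        return Collections.min(counts.values());
    }

    // Mean count per distinct word; 0.0 for an empty map.
    public static double meanCount(Map<String, Integer> counts) {
        return counts.values().stream().mapToInt(Integer::intValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords("high low medium high low high low large small medium");
        int max = maxCount(counts);
        int min = minCount(counts);
        System.out.println("maximum " + max + ": " + wordsWithCount(counts, max));
        System.out.println("minimum " + min + ": " + wordsWithCount(counts, min));
        System.out.println("mean: " + meanCount(counts));
    }
}
```

For the sample input this reports high and low as the maximum (3), large and small as the minimum (1), and a mean of 2.0. In a reducer, the same idea would mean keeping a list of keys per extreme and writing them all in cleanup, and it gives global results only when the job uses a single reducer.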

+2 votes

How do I customize data placement on the DataNodes (DNs) of a Hadoop cluster?

Suppose we change the default block size to 32 MB and the replication factor to 1, the Hadoop cluster consists of 4 DNs, and the input data size is 192 MB (6 blocks). I want to place the data on the DNs as follows: DN1 and DN2 hold 2 blocks (32 + 32 = 64 MB) each, and DN3 and DN4 hold 1 block (32 MB) each. Is this possible? How can it be accomplished?

+1 vote

How to set mapreduce.input.fileinputformat.split.maxsize for a specific job?

In the XML configuration files of Hadoop 2.x, "mapreduce.input.fileinputformat.split.minsize" is listed and can be set, but how do I set "mapreduce.input.fileinputformat.split.maxsize" in the XML file? I need to set it in my MapReduce code.
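
For a single job, the value can, as far as I know, be set in the driver with FileInputFormat.setMaxInputSplitSize(job, bytes) from org.apache.hadoop.mapreduce.lib.input, or with job.getConfiguration().setLong("mapreduce.input.fileinputformat.split.maxsize", bytes). The plain-Java sketch below does not call Hadoop. It only illustrates how the maximum size feeds into the split size, using the max(minSize, min(maxSize, blockSize)) rule that FileInputFormat applies as I understand it. The class name, the helper approxSplits and the sizes in main are assumptions for illustration.

```java
public class SplitSizeSketch {

    // Split size rule: the block size, capped by maxSize and floored by minSize.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough number of splits for one file. Hadoop lets the last split run
    // slightly over, which this simple ceiling division ignores.
    public static long approxSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;  // assumed HDFS block size
        long minSize = 1;           // assumed split.minsize
        long maxSize = 32 * mb;     // the split.maxsize you would set for the job
        long fileSize = 192 * mb;   // assumed input file size

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("split size (MB): " + splitSize / mb);
        System.out.println("approx. splits: " + approxSplits(fileSize, splitSize));
    }
}
```

With these assumed numbers, lowering the maximum to 32 MB turns a 192 MB file into about 6 splits instead of 2, which is the usual reason to set this property per job.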
