Does Hadoop/MapReduce have a localization feature?

+2 votes
632 views

There is a scenario wherein we have to process files containing special characters of another language. When we process such files, the special characters get replaced by something else in the output.

Is there any possible workaround for this?

posted Jan 24, 2014 by Amit Mishra


1 Answer

0 votes

You need to be clearer about how you process the files.

I think the important question is what kind of InputFormat and OutputFormat you are using in your case.
If you are using the defaults on Linux, I believe TextInputFormat and TextOutputFormat will both convert byte arrays to text using UTF-8 encoding. So if your source data is UTF-8, then your output should be fine.

To help you in this case, you need to figure out the following:
1) What kind of InputFormat/OutputFormat are you using?
2) How do you write the output data? Do you use Reducer Context.write, or do you write to HDFS directly in your code?
3) What encoding is your source data in?
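
For example, if it turns out the source data is not UTF-8, one common workaround is to decode the raw line bytes yourself instead of relying on Text.toString(). The sketch below only illustrates the idea; the class name and the ISO-8859-1 source encoding are assumptions, not something stated in the question:

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: re-decode each input line with the real source charset (assumed ISO-8859-1 here).
public class Latin1LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final Charset SOURCE_CHARSET = Charset.forName("ISO-8859-1"); // assumed encoding
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // TextInputFormat copies the raw line bytes into 'value' without converting them,
        // so decode them with the actual source encoding instead of value.toString(),
        // which always assumes the bytes are valid UTF-8.
        String line = new String(value.getBytes(), 0, value.getLength(), SOURCE_CHARSET);
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);   // Text re-encodes the token as UTF-8 for the output
                context.write(word, ONE);
            }
        }
    }
}

On the output side the data is then regular UTF-8 text, which is what the default TextOutputFormat writes.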

answer Jan 24, 2014 by Jai Prakash
Similar Questions
+1 vote

I would like to know if you have any examples or tutorials from which I can learn Hadoop MapReduce on MongoDB in Java?

+2 votes
public class MaxMinReducer extends Reducer {
    int max_sum = 0;
    int mean = 0;
    int count = 0;
    Text max_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");
    int min_sum = Integer.MAX_VALUE;
    Text min_occured_key = new Text();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }

        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }

        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }

        mean = max_sum + min_sum / count;
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}

Here I am writing the minimum, maximum, and mean of the word counts.

My input file:

high low medium high low high low large small medium

The output I expect is:

high - 3 ------ maximum
low - 3 ------ maximum
large - 1 ------ minimum
small - 1 ------ minimum

But I am not getting the above output... can anyone please help me?
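
One possible restructuring, shown only as a minimal sketch and not the original poster's code, is to buffer every key's total in the reducer and resolve ties in cleanup(), so that all keys sharing the maximum or minimum count are emitted. Like the original, it assumes a single reducer task; the class name and the interpretation of "mean" as total words divided by distinct words are assumptions:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: remember each key's total, then emit every key tying for max or min in cleanup().
public class MaxMinTieReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final Map<String, Integer> totals = new HashMap<>();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        totals.put(key.toString(), sum);   // one total per word
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        if (totals.isEmpty()) {
            return;
        }
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        long totalWords = 0;
        for (int sum : totals.values()) {
            max = Math.max(max, sum);
            min = Math.min(min, sum);
            totalWords += sum;
        }
        // Emit every key whose total ties for the maximum or the minimum.
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            if (entry.getValue() == max || entry.getValue() == min) {
                context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
            }
        }
        context.write(new Text("Mean : "), new IntWritable((int) (totalWords / totals.size())));
        context.write(new Text("Count : "), new IntWritable((int) totalWords));
    }
}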

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it in the middle, which command should be used? Or is there any other way to do that?

+1 vote

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time from the IDE than as a jar file on the terminal.

...