Does Hadoop/MapReduce have a localization feature?

+2 votes
632 views

There is a scenario wherein we have to process files containing special characters of another language. When we process such files, the special characters get replaced by something else in the output.

Is there any possible workaround for this?

posted Jan 24, 2014 by Amit Mishra


1 Answer

0 votes

You need to be clearer about how you process the files.

I think the important question is what kind of InputFormat and OutputFormat you are using in your case.
If you are using the defaults on Linux, I believe TextInputFormat and TextOutputFormat will both convert byte arrays to text using UTF-8 encoding. So if your source data is UTF-8, then your output should be fine.

To help you in this case, you need to figure out the following:
1) What kind of InputFormat/OutputFormat are you using?
2) How do you write the output data? Do you use Reducer Context.write, or do you write to HDFS directly in your code?
3) What encoding is your source data in?
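
For example, if it turns out the source data is not UTF-8, one common workaround is to decode the raw line bytes yourself instead of relying on Text.toString(). The sketch below only illustrates the idea; the class name and the ISO-8859-1 source encoding are assumptions, not something stated in the question:

import java.io.IOException;
import java.nio.charset.Charset;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: re-decode each input line with the real source charset (assumed ISO-8859-1 here).
public class Latin1LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final Charset SOURCE_CHARSET = Charset.forName("ISO-8859-1"); // assumed encoding
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // TextInputFormat copies the raw line bytes into 'value' without converting them,
        // so decode them with the actual source encoding instead of value.toString(),
        // which always assumes the bytes are valid UTF-8.
        String line = new String(value.getBytes(), 0, value.getLength(), SOURCE_CHARSET);
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);   // Text re-encodes the token as UTF-8 for the output
                context.write(word, ONE);
            }
        }
    }
}

On the output side the data is then regular UTF-8 text, which is what the default TextOutputFormat writes.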

answer Jan 24, 2014 by Jai Prakash
Similar Questions
+1 vote

I would like to know if you have any examples or tutorials from which I can learn Hadoop MapReduce on MongoDB in Java?

+2 votes
public class MaxMinReducer extends Reducer {
    int max_sum = 0;
    int mean = 0;
    int count = 0;
    Text max_occured_key = new Text();
    Text mean_key = new Text("Mean : ");
    Text count_key = new Text("Count : ");
    int min_sum = Integer.MAX_VALUE;
    Text min_occured_key = new Text();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;

        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }

        if (sum < min_sum) {
            min_sum = sum;
            min_occured_key.set(key);
        }

        if (sum > max_sum) {
            max_sum = sum;
            max_occured_key.set(key);
        }

        mean = max_sum + min_sum / count;
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(max_occured_key, new IntWritable(max_sum));
        context.write(min_occured_key, new IntWritable(min_sum));
        context.write(mean_key, new IntWritable(mean));
        context.write(count_key, new IntWritable(count));
    }
}

Here I am writing the minimum, maximum, and mean of the word counts.

My input file:

high low medium high low high low large small medium

The output I expect is:

high - 3 ------ maximum
low - 3 ------ maximum
large - 1 ------ minimum
small - 1 ------ minimum

But I am not getting the above output... can anyone please help me?
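
One possible restructuring, shown only as a minimal sketch and not the original poster's code, is to buffer every key's total in the reducer and resolve ties in cleanup(), so that all keys sharing the maximum or minimum count are emitted. Like the original, it assumes a single reducer task; the class name and the interpretation of "mean" as total words divided by distinct words are assumptions:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: remember each key's total, then emit every key tying for max or min in cleanup().
public class MaxMinTieReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final Map<String, Integer> totals = new HashMap<>();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        totals.put(key.toString(), sum);   // one total per word
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        if (totals.isEmpty()) {
            return;
        }
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        long totalWords = 0;
        for (int sum : totals.values()) {
            max = Math.max(max, sum);
            min = Math.min(min, sum);
            totalWords += sum;
        }
        // Emit every key whose total ties for the maximum or the minimum.
        for (Map.Entry<String, Integer> entry : totals.entrySet()) {
            if (entry.getValue() == max || entry.getValue() == min) {
                context.write(new Text(entry.getKey()), new IntWritable(entry.getValue()));
            }
        }
        context.write(new Text("Mean : "), new IntWritable((int) (totalWords / totals.size())));
        context.write(new Text("Count : "), new IntWritable((int) totalWords));
    }
}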

+1 vote

To run a job we use the command
$ hadoop jar example.jar inputpath outputpath
If the job is taking too long and we want to stop it in the middle, which command should be used? Or is there any other way to do that?

+1 vote

A MapReduce job can be run as a jar file from the terminal or directly from the Eclipse IDE. When a job is run as a jar file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same thing happen when we run it from the IDE? I have run a job both ways, and it takes less time from the IDE than as a jar file on the terminal.

...