Unit tests on Hadoop Cluster

+2 votes
63 views

I am a beginner at writing unit tests in Hadoop.

According to https://wiki.apache.org/hadoop/HowToDevelopUnitTests, the Hadoop unit tests are all designed to work on a local machine rather than on a full-scale Hadoop cluster.

However, I see that Hadoop-QA (https://issues.apache.org/jira/secure/ViewProfile.jspa?name=hadoopqa) also runs unit test cases when it validates a patch for an issue, for example:

https://issues.apache.org/jira/browse/YARN-2459?focusedCommentId=14115586&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14115586

Does this mean it runs unit tests only in a single-node/local setup when validating patches?

I want to write some unit tests for an HA (high availability) environment, so I need to test in a cluster setup.

posted Jan 5, 2015 by anonymous

Similar Questions
+2 votes

Suppose we change the default block size to 32 MB and the replication factor to 1, the Hadoop cluster consists of 4 DataNodes (DNs), and the input data is 192 MB. I want to place the data on the DNs as follows: DN1 and DN2 each hold 2 blocks (32 + 32 = 64 MB), and DN3 and DN4 each hold 1 block (32 MB). Is this possible? How can it be accomplished?
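The arithmetic behind the requested layout can be checked with a short sketch. It assumes the block size is set through `dfs.blocksize` (32 MB) and replication through `dfs.replication` (1). It only verifies that the per-DataNode counts account for every block; it does not influence where HDFS actually puts blocks, which is decided by the NameNode's block placement policy. The names `num_blocks`, `desired_layout`, and `layout_is_complete` are illustrative, not part of any Hadoop API.

```python
# Sketch: check that a hand-picked block layout accounts for every block of a file.
# Assumptions: 32 MB block size (dfs.blocksize), replication factor 1
# (dfs.replication), 192 MB input. All names here are illustrative only.

BLOCK_SIZE_MB = 32
INPUT_SIZE_MB = 192
REPLICATION = 1

def num_blocks(input_mb: int, block_mb: int) -> int:
    """Blocks needed for a file of input_mb; the last block may be partial."""
    return (input_mb + block_mb - 1) // block_mb

# Requested placement from the question: DataNode -> number of blocks.
desired_layout = {"DN1": 2, "DN2": 2, "DN3": 1, "DN4": 1}

def layout_is_complete(layout: dict, input_mb: int, block_mb: int, replication: int) -> bool:
    """True if the per-DataNode counts add up to the total number of block replicas.

    Counts only: it does not check which block lands on which DataNode.
    """
    return sum(layout.values()) == num_blocks(input_mb, block_mb) * replication

if __name__ == "__main__":
    print("blocks in file:", num_blocks(INPUT_SIZE_MB, BLOCK_SIZE_MB))
    for dn, blocks in desired_layout.items():
        print(f"{dn}: {blocks} block(s) = {blocks * BLOCK_SIZE_MB} MB")
    print("layout complete:",
          layout_is_complete(desired_layout, INPUT_SIZE_MB, BLOCK_SIZE_MB, REPLICATION))
```

With these numbers the file splits into 6 blocks, and the requested layout (2 + 2 + 1 + 1) accounts for all 6.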

+1 vote

I have a test cluster of two machines, and Hadoop is installed on both. I have configured the Hadoop cluster, but in the admin UI (as shown in the picture below) I see two nodes running on the same master machine, while the other machine has no Hadoop node.

The following services are running on the master machine:

    ~$ jps
    26310 ResourceManager
    27593 Jps
    26216 DataNode
    26135 NameNode
    26557 NodeManager
    26701 JobHistoryServer

On the slave machine:

    ~$ jps
    2614 DataNode
    2920 Jps
    2707 NodeManager

I don't know why the slave is not joining the cluster (it did before). I tried shutting down all services on both machines, formatting HDFS, and restarting everything, but that did not help. Any help figuring out what is causing this behavior is appreciated.
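A quick way to compare the two `jps` listings is to parse them and check for the daemons each machine is expected to run. This is a sketch: the expected daemon sets are assumptions that mirror the layout described in the question, and the helper names `parse_jps` and `missing_daemons` are illustrative.

```python
# Sketch: parse `jps` output and report which expected Hadoop daemons are absent.
# The expected daemon sets are assumptions matching the layout in the question.

MASTER_JPS = """\
26310 ResourceManager
27593 Jps
26216 DataNode
26135 NameNode
26557 NodeManager
26701 JobHistoryServer
"""

SLAVE_JPS = """\
2614 DataNode
2920 Jps
2707 NodeManager
"""

WORKER_DAEMONS = {"DataNode", "NodeManager"}
MASTER_DAEMONS = {"NameNode", "ResourceManager", "JobHistoryServer"} | WORKER_DAEMONS

def parse_jps(output: str) -> dict:
    """Map PID -> main-class name for each line of jps output, skipping Jps itself."""
    procs = {}
    for line in output.splitlines():
        pid, _, name = line.strip().partition(" ")
        if pid.isdigit() and name and name != "Jps":
            procs[int(pid)] = name
    return procs

def missing_daemons(output: str, expected: set) -> set:
    """Expected daemon names that do not appear in the jps output."""
    return expected - set(parse_jps(output).values())

if __name__ == "__main__":
    print("master missing:", missing_daemons(MASTER_JPS, MASTER_DAEMONS))
    print("slave missing:", missing_daemons(SLAVE_JPS, WORKER_DAEMONS))
```

For the output in the question both sets come back empty, so every expected process is running locally. `jps` only lists JVMs on the machine where it runs, so it cannot show whether the slave's DataNode and NodeManager registered with the master; `hdfs dfsadmin -report` and `yarn node -list` report that from the master's side.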

+1 vote

To run a job, we use the command:

    $ hadoop jar example.jar inputpath outputpath

If a job takes too long and we want to stop it partway through, which command should we use? Or is there another way to do it?
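On a YARN-based cluster (Hadoop 2.x), a running application can be stopped with `yarn application -kill <application ID>`, and running application IDs are listed by `yarn application -list`. The sketch below only builds, and optionally runs, that command from Python. It assumes the `yarn` client is on the PATH; the helper names and the sample ID are hypothetical.

```python
import re
import subprocess

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"application_\d+_\d+")

def kill_command(app_id: str) -> list:
    """Build the `yarn application -kill` command line for one application."""
    if not APP_ID_PATTERN.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "application", "-kill", app_id]

def kill_application(app_id: str, dry_run: bool = True) -> list:
    """Return the command; when dry_run is False, also execute it (needs `yarn` on PATH)."""
    cmd = kill_command(app_id)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

if __name__ == "__main__":
    # Hypothetical ID; take the real one from `yarn application -list`.
    print(" ".join(kill_application("application_1420070400000_0001")))
```

Keeping `dry_run=True` as the default means the function prints what it would do unless the caller explicitly asks it to kill something.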

+1 vote

A MapReduce job can be run as a JAR file from the terminal or directly from the Eclipse IDE. When a job runs as a JAR file from the terminal, it uses multiple JVMs and all the resources of the cluster. Does the same happen when we run it from the IDE? I have run a job both ways, and it takes less time in the IDE than as a JAR file from the terminal.
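One factor worth checking, assuming Hadoop 2.x: where a job executes is controlled by `mapreduce.framework.name`, which defaults to `local` in `mapred-default.xml`. A `local` job runs in a single JVM through the LocalJobRunner instead of being submitted to YARN, so an IDE launch without the cluster's `mapred-site.xml` on its classpath may not use the cluster at all. The sketch reads that property from a `mapred-site.xml` string; the function name and sample XML are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Default from mapred-default.xml in Hadoop 2.x: run jobs in-process (LocalJobRunner).
DEFAULT_FRAMEWORK = "local"

def framework_name(mapred_site_xml: Optional[str]) -> str:
    """Return mapreduce.framework.name from a mapred-site.xml string, or the default."""
    if mapred_site_xml is None:  # no mapred-site.xml on the classpath
        return DEFAULT_FRAMEWORK
    root = ET.fromstring(mapred_site_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "mapreduce.framework.name":
            return (prop.findtext("value") or DEFAULT_FRAMEWORK).strip()
    return DEFAULT_FRAMEWORK

# Hypothetical cluster-side configuration that submits jobs to YARN.
CLUSTER_SITE = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print("with cluster config:", framework_name(CLUSTER_SITE))
    print("without config:", framework_name(None))
```

If the IDE run resolves to `local`, its timings measure a single-JVM run and are not directly comparable with a job submitted to the cluster.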

...