
Issue enabling the UseNUMA flag in the Hadoop framework

+1 vote
268 views

We have a problem enabling the UseNUMA flag for our Hadoop framework.

We've tried specifying JVM flags when the Hadoop daemons start,

e.g. export HADOOP_NAMENODE_OPTS="-XX:+UseNUMA -Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS",
export HADOOP_SECONDARYNAMENODE_OPTS="-XX:+UseNUMA -Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS", etc.
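
One way to confirm whether the flag actually reached a running daemon JVM is the JDK's jps and jinfo tools (a sketch; the PID below is a placeholder, and jinfo must be run as the same user as the daemon):

# find the NameNode PID, then print the effective value of the UseNUMA flag
jps | grep NameNode
jinfo -flag UseNUMA 12345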

But the ratio between local and remote memory accesses remains 2:1, the same as before.
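
One way to observe this local/remote split at the OS level is numastat from the numactl package (a sketch, assuming the package is installed; the counters are the standard per-node kernel statistics):

# local_node vs other_node gives a rough local/remote allocation split per NUMA node
numastat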

Then we found that Hadoop MapReduce starts child JVM processes to run tasks in containers, so we passed -XX:+UseNUMA to those JVMs by setting the child.java.opts configuration parameter. But Hadoop started throwing ExitCodeException (exitCode=1), which seems to indicate that Hadoop does not support this JVM parameter.
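
For reference, in MRv2 the per-task JVM options are normally carried by mapreduce.map.java.opts and mapreduce.reduce.java.opts in mapred-site.xml; a minimal sketch (the -Xmx values are placeholders, not a recommendation):

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024m -XX:+UseNUMA</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx1024m -XX:+UseNUMA</value>
</property>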

What should we do to enable the UseNUMA flag for Hadoop? Or what should we do to decrease remote memory accesses relative to local ones on a NUMA machine? Should we just change the Hadoop scripts, or do we need to modify the source code? And how?

The Hadoop version is 2.6.0.
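
For context, an OS-level alternative to JVM flags is pinning a process to a single NUMA node with numactl (a sketch of the general tool, not a built-in Hadoop 2.6.0 feature):

# run a command with CPU and memory allocation bound to NUMA node 0
numactl --cpunodebind=0 --membind=0 java -version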

posted Sep 19, 2015 by anonymous


Similar Questions
+2 votes

I submitted an MR job through Hive, but when it ran stage-2 it failed. Why? It seems to be a permission problem, but I do not know which directory caused it.

Application application_1388730279827_0035 failed 1 times due to AM Container for appattempt_1388730279827_0035_000001 exited with exitCode: -1000 due to: EPERM: 
Operation not permitted at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method) at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581) at 
org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:388) at 
org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1041) at 
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150) at 
org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:190) at 
org.apache.hadoop.fs.FileContext$4.next(FileContext.java:698) at 
org.apache.hadoop.fs.FileContext$4.next(FileContext.java:695) at 
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325) at 
org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:695) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:385) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:130) at 
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103) at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:861) .
Failing this attempt.. 
Failing the application.
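
The chmod in this trace happens while the NodeManager initializes its local directories, so a hedged first check is the ownership and mode of whatever yarn.nodemanager.local-dirs points to (the path below is only a placeholder):

# the dirs must be owned and writable by the user running the NodeManager
ls -ld /data/yarn/local /data/yarn/local/usercache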
+1 vote

I have a job running very slowly. When I examine the cluster, I find my hdfs user using 170m of swap via the top command; that user runs the datanode daemon. The ps output shows the following info. There are two -Xmx values, and I do not know which one is the real value, 1000m or 10240m.

# ps -ef|grep 2853
root      2095  1937  0 15:06 pts/4    00:00:00 grep 2853
hdfs      2853     1  5 Nov07 ?        1-22:34:22 /usr/java/jdk1.7.0_45/bin/java -Dproc_datanode -Xmx1000m -Dhadoop.log.dir=/var/log/hadoop-hdfs -Dhadoop.log.file=hadoop-hdfs-datanode-ch14.log -Dhadoop.home.dir=/usr/lib/hadoop -Dhadoop.id.str=hdfs -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/usr/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -server -Xmx10240m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hadoop-hdfs/gc-ch14-datanode.log -Dhadoop.security.logger=INFO,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
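
In HotSpot, when the same option appears twice the right-most one normally wins, so the effective limit here should be 10240m; a quick sketch to confirm on the same JDK (flags copied from the ps output above):

# print the final MaxHeapSize the JVM settles on when both -Xmx values are passed
/usr/java/jdk1.7.0_45/bin/java -Xmx1000m -Xmx10240m -XX:+PrintFlagsFinal -version | grep MaxHeapSize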
+1 vote

The original local file had execute permission, and it was then distributed to multiple NodeManager nodes with the Distributed Cache feature of Hadoop-2.2.0, but the distributed file lost the execute permission.

However, I did not encounter this issue in Hadoop-1.1.1.

Why did this happen? Was there some change to the dfs.umask option or related settings?
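
A common hedged workaround, independent of the umask question, is to restore the bit inside the container before invoking the file (a sketch assuming the localized file is a shell script named script.sh):

# the localized copy may arrive without the execute bit; restore it before use
chmod +x ./script.sh
./script.sh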

...