Hadoop and Hive Performance Tuning

I am running Hive queries on structured RCFile data. Can someone please tell me the key performance parameters I should tune for better query performance (Hadoop 2.3/YARN and Hive 0.13)?

posted Jul 31, 2014 by Sonu Jindal

Similar Questions

+2 votes

Hive install under hadoop

I want to use Hive on Hadoop 2.2.0, so I executed the following steps:

$ tar -xzf hive-0.11.0.tar.gz
$ export HIVE_HOME=/home/software/hive 
$ export PATH=${HIVE_HOME}/bin:${PATH} 
$ hadoop fs -mkdir /tmp
$ hadoop fs -mkdir /user/hive/warehouse 
$ hadoop fs -chmod g+w /tmp
$ hadoop fs -chmod g+w /user/hive/warehouse 
$ hive

Error creating temp dir in hadoop.tmp.dir file:/home/software/temp due to Permission denied

How can I get the Hive install to succeed?
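The error above reports a permission failure on a local directory, so one first diagnostic is to check whether the user launching hive can write there. A minimal sketch, assuming a POSIX shell; check_writable is a hypothetical helper name, not part of Hadoop or Hive:

```shell
# Hypothetical helper: report whether the current user can create
# files in the given local directory.
check_writable() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    # Creating entries in a directory needs both write and search (x) permission.
    if [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
        return 0
    fi
    echo "not writable by $(id -un): $dir"
    return 1
}

# Example call against the system temp directory.
check_writable "${TMPDIR:-/tmp}"
```

Calling check_writable /home/software/temp (the path from the error message) as the user who starts hive would show whether that directory is the one being refused.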

+2 votes

Issue about running hive MR job in hadoop

I submitted an MR job through Hive, but it failed when it reached stage-2. Why? It looks like a permission problem, but I do not know which directory is causing it.

Application application_1388730279827_0035 failed 1 times due to AM Container for appattempt_1388730279827_0035_000001 exited with exitCode: -1000 due to: EPERM: Operation not permitted
    at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:388)
    at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1041)
    at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
    at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:190)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:698)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:695)
    at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
    at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:695)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.initDirs(ContainerLocalizer.java:385)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:130)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:103)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:861)
.Failing this attempt.. Failing the application.

+2 votes

Hive LOAD DATA INPATH imports all records twice?

I am trying to load JSON data into Hive using the HCatalog JsonSerDe. I have created the table and used the LOAD DATA INPATH command to load 8 records into it. However, SELECT * shows 16 records in the table, with each record duplicated. Why is this happening?

+2 votes

How to run hive on windows 7?

I'm a newcomer to the Hadoop world. After some struggling, I've successfully gotten Hadoop 2.6 running on my Windows 7 laptop.

However, when I tried to run Hive 1.0.0 on my Windows 7 system, I found that there is no cmd-line script like the one provided for Linux. It's also hard to find any useful information on Google.

Can anyone give me a clue on how to run Hive on Windows 7?

+1 vote

high ulimit for file descriptors in Hadoop?

I've been reading a lot of posts about needing to set a high ulimit for file descriptors in Hadoop and I think it's probably the cause of a lot of the errors I've been having when trying to run queries on larger data sets in Hive. However, I'm really confused about how and where to set the limit, so I have a number of questions:

1. How high is it recommended to set the ulimit?
2. What is the difference between soft and hard limits? Which one needs to be set to the value from question 1?
3. For which user(s) do I set the ulimit? If I am running the Hive query with my login, do I set my own ulimit to the high value?
4. Do I need to set this limit for these users on all the machines in the cluster? (we have one master node and 6 slave nodes)
5. Do I need to restart anything after configuring the ulimit?
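As background to the soft/hard distinction asked about above, the limits in effect for the current shell can be inspected with the shell's built-in ulimit. A minimal sketch, assuming a POSIX-style shell; the /proc lookup at the end is Linux-specific and is skipped elsewhere:

```shell
# Soft limit: the value currently enforced for this shell and the
# processes it starts.
soft=$(ulimit -S -n)

# Hard limit: the ceiling up to which an unprivileged user may raise
# the soft limit.
hard=$(ulimit -H -n)

echo "open files: soft=$soft hard=$hard"

# Limits are per process and inherited at start time, so an already
# running process can be checked through /proc on Linux.
# $$ is this shell's own PID; substitute the PID of interest.
if [ -r "/proc/$$/limits" ]; then
    grep 'Max open files' "/proc/$$/limits"
fi
```

The values reported apply only to the user and session that ran the commands; daemons started under other accounts carry their own limits.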

...
