Tips for optimizing HDFS writes with replication=1?

493 views

I am writing temp files to HDFS with replication=1, so I expect the blocks to be stored on the writing node. Are there any tips, in general, for optimizing write performance to HDFS? I use 128K buffers in the write() calls. Are there any parameters that can be set on the connection or in HDFS configuration to optimize this use pattern?

posted Jan 22, 2014 by Kiran Kumar

Looking for an answer? Promote on:

Similar Questions

+2 votes

HDFS - Consolidate 2 small volumes into 1 large volume

Is it possible to consolidate two small data volumes (500GB each) into a larger data volume (3TB)?

I'm thinking that as long as the block file names and metadata are unique, then I should be able to shut down the datanode and use something like tar or rsync to copy the contents of each small volume to the large volume.

Will this work?

0 votes

Unable to use ./hdfs dfsadmin -report with HDFS Federation

Our hadoop cluster is using HDFS Federation, but when use the following command to report the HDFS status

$ ./hdfs dfsadmin -reportreport: FileSystem viewfs://nsX/ is not an HDFS file system
Usage: hdfs dfsadmin [-report] [-live] [-dead] [-decommissioning]

It gives me the following message that viewfs is NOT HDFS filesystem. Then how can I proceed to report the hdfs status

+1 vote

Hadoop: How Client communicates with HDFS?

+2 votes

NFS HDFS gateway with kerberos

Does anyone know if the NFS HDFS gateway is currently supported on secure clusters using Kerberos for Hadoop 2.2.0? We are using HDP 2.0 and looking to use NFS gateway

+3 votes

Support multiple block placement policies for HDFS?

According to the code, the current implement of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default.The default policy is enough for most of the circumstances, but under some special circumstances, it works not so well.

For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy.But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for the HDFS.

One plain thought is that, the default placement policy is still configured as the default. On the other hand, HDFS can let user specify customized placement policy through the extended attributes(xattr). When the HDFS choose the replica targets, it firstly check the customized placement policy, if not specified, it fallbacks to the default one. Any thoughts?

...

Tips for optimizing HDFS writes with replication=1?

Your comment on this post:

Your answer

Preview