Support multiple block placement policies for HDFS?

437 views

According to the code, the current implement of HDFS only supports one specific type of block placement policy, which is BlockPlacementPolicyDefault by default.The default policy is enough for most of the circumstances, but under some special circumstances, it works not so well.

For example, on a shared cluster, we want to erasure encode all the files under some specified directories. So the files under these directories need to use a new placement policy.But at the same time, other files still use the default placement policy. Here we need to support multiple placement policies for the HDFS.

One plain thought is that, the default placement policy is still configured as the default. On the other hand, HDFS can let user specify customized placement policy through the extended attributes(xattr). When the HDFS choose the replica targets, it firstly check the customized placement policy, if not specified, it fallbacks to the default one. Any thoughts?

posted Sep 15, 2014 by anonymous

Looking for an answer? Promote on:

Facebook Share Button

Twitter Share Button

LinkedIn Share Button

Similar Questions

+1 vote

What is a block and block scanner in HDFS?

+1 vote

Can I open multiple files on hdfs and write data to them in parallel and then close them at the end?

+2 votes

Tips for optimizing HDFS writes with replication=1?

I am writing temp files to HDFS with replication=1, so I expect the blocks to be stored on the writing node. Are there any tips, in general, for optimizing write performance to HDFS? I use 128K buffers in the write() calls. Are there any parameters that can be set on the connection or in HDFS configuration to optimize this use pattern?

+2 votes

How do I customize data placement on DataNodes (DN) of Hadoop cluster?

Let we change the default block size to 32 MB and replication factor to 1. Let Hadoop cluster consists of 4 DNs. Let input data size is 192 MB. Now I want to place data on DNs as following. DN1 and DN2 contain 2 blocks (32+32 = 64 MB) each and DN3 and DN4 contain 1 block (32 MB) each. Can it be possible? How to accomplish it?

0 votes

How to get info about which data in hdfs or file system that a MapReduce job visits?

I was trying to implement a Hadoop/Spark audit tool, but l met a problem that I can't get the input file location and file name. I can get username, IP address, time, user command, all of these info from hdfs-audit.log. But When I submit a MapReduce job, I can't see input file location neither in Hadoop logs or Hadoop ResourceManager.

Does hadoop have API or log that contains these info through some configuration ?If it have, what should I configure?

...