
Hadoop: What's the best way to check the compression codec that an HDFS file was written with?

+2 votes

We use both Gzip and Snappy compression, so I want a way to determine how a specific file is compressed. The closest I found is CompressionCodecFactory.getCodec(), but that relies on the file name suffix, which doesn't exist since reducers typically don't add a suffix to the filenames they create.
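For illustration, here is a minimal sketch of that suffix-based lookup and where it falls short (the paths are made up):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class SuffixCodecCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        // Works: the ".gz" suffix maps to GzipCodec
        CompressionCodec byName = factory.getCodec(new Path("/data/logs.gz"));
        System.out.println(byName); // GzipCodec instance

        // Fails: typical reducer output has no suffix, so getCodec() returns null
        CompressionCodec noSuffix = factory.getCodec(new Path("/data/part-r-00000"));
        System.out.println(noSuffix); // null
    }
}
```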

posted Dec 4, 2013 by Luv Kumar


1 Answer

+1 vote

If you're looking for header/contents-based inspection, you could download the file and run the Linux utility 'file' on it, which should tell you the format.

I don't know about Snappy, but Gzip files can be identified simply by the magic sequence in their header bytes.
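If you'd rather not pull the file down first, the same check can be done directly against HDFS. A minimal sketch (gzip streams always begin with the magic bytes 0x1f 0x8b; as noted, this says nothing about Snappy):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GzipMagicCheck {
    // Returns true if the first two bytes match gzip's magic sequence 0x1f 0x8b.
    public static boolean looksLikeGzip(FileSystem fs, Path file) throws IOException {
        try (FSDataInputStream in = fs.open(file)) {
            int b1 = in.read();
            int b2 = in.read();
            return b1 == 0x1f && b2 == 0x8b;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: HDFS path of the file to inspect
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path(args[0]);
        System.out.println(file + (looksLikeGzip(fs, file) ? ": looks like gzip" : ": not gzip"));
    }
}
```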

If it's sequence files you are looking to analyse, a simple way is to read the first few hundred bytes, which should have the codec class name in them. Programmatically, you can use the SequenceFile.Reader API for sequence files.
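A sketch of that programmatic route, reading the codec recorded in a sequence file's header via SequenceFile.Reader (treating "null means uncompressed" as my reading of the API, so an assumption):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;

public class SeqFileCodec {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // args[0]: path to the sequence file to inspect
        try (SequenceFile.Reader reader =
                 new SequenceFile.Reader(conf, SequenceFile.Reader.file(new Path(args[0])))) {
            // The codec is recorded in the file header; null suggests no compression.
            CompressionCodec codec = reader.getCompressionCodec();
            System.out.println(codec == null
                ? "uncompressed"
                : codec.getClass().getName());
        }
    }
}
```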

answer Dec 5, 2013 by Majula Joshi
Similar Questions
+3 votes

I am trying to access a Hadoop 1 installation via the Hadoop 2.2.0 command-line tools. Is this possible at all?

From hadoop 1 I get:

$ hadoop fs -ls hdfs://
Found 2 items
drwxr-xr-x - cs supergroup 0 2014-02-01 08:18 /tmp
drwxr-xr-x - cs supergroup 0 2014-02-01 08:19 /user

From hadoop 2.2.0 I get:

$ hadoop fs -ls hdfs://
ls: Failed on local exception:; Host Details : 
local host is: "i7/"; destination host is: "localhost":9000;

I have tried to find this information via a web search, but so far without success.

0 votes

The reason behind this is that I want a custom user who can create anything on the entire HDFS file system (/).
I tried a couple of links; however, none of them were useful. Is there any way I can do this by adding/modifying some property tags?
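One property-based sketch that may be what's being asked for (an assumption on my part, not a verified fix): HDFS treats members of the group named by dfs.permissions.superusergroup as superusers, so pointing that property in hdfs-site.xml at a group the custom user belongs to would give that user rights over the whole tree:

```xml
<!-- hdfs-site.xml (sketch): members of this group act as HDFS superusers -->
<property>
  <name>dfs.permissions.superusergroup</name>
  <!-- "hdfsadmins" is a hypothetical group containing the custom user -->
  <value>hdfsadmins</value>
</property>
```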

0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log. But when I submit a MapReduce job, I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this information, perhaps through some configuration? If it does, what should I configure?
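One avenue that might help (an assumption about the setup, not a confirmed audit feature): for MapReduce jobs that use FileInputFormat, the input paths end up in the submitted job configuration under the key mapreduce.input.fileinputformat.inputdir, so a tool with access to a job's job.xml (for example, as preserved by the JobHistory server) can recover them. A minimal sketch:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class InputDirFromJobConf {
    public static void main(String[] args) {
        // args[0]: path to a finished job's job.xml (hypothetical usage)
        Configuration jobConf = new Configuration(false); // skip cluster defaults
        jobConf.addResource(new Path(args[0]));

        // FileInputFormat records the comma-separated input paths under this key.
        String inputDirs = jobConf.get("mapreduce.input.fileinputformat.inputdir");
        System.out.println("Input paths: " + inputDirs);
    }
}
```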
