
Add a few records to a Hive table or an HDFS file on a daily basis

+4 votes
410 views

My requirement is a typical data warehouse and ETL requirement. I need to accomplish the following:

1) Insert transaction records daily into a Hive table or an HDFS file. This table/file is not big (approximately 10 records per day), and I don't want to partition it.

A few articles mention that we need to load into a staging table in Hive first, and then insert like the below:

INSERT OVERWRITE TABLE finaltable SELECT * FROM staging;

I don't follow this logic. How should I populate the staging table daily?

posted Feb 10, 2014 by Tarun Singhal


2 Answers

+2 votes

The staging table is typically defined as an external Hive table: the data is loaded directly onto HDFS, so the staging table can read it from there and transfer it into the Hive-managed table with your current statement. Of course, there are variations on this as well.
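
A minimal sketch of that pattern, assuming a comma-delimited daily file; the table names, columns, and HDFS paths below are hypothetical:

-- Hypothetical external staging table over an HDFS directory;
-- dropping the table later would not delete the files under it.
CREATE EXTERNAL TABLE staging (
  txn_id   INT,
  amount   DOUBLE,
  txn_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/etl/staging/transactions';

-- Each day: copy the day's file into that directory, e.g.
--   hdfs dfs -put transactions.csv /user/etl/staging/transactions/
-- and then rewrite the managed table from staging:
INSERT OVERWRITE TABLE finaltable SELECT * FROM staging;

Note that INSERT OVERWRITE replaces everything in finaltable with whatever is currently in staging, so this only accumulates history if the staging directory keeps all the daily files; for a pure append, see the next answer.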

answer Feb 10, 2014 by Naveena Garg
+1 vote

Why not use INSERT INTO for appending the new data?
a) Load the new data into the staging table.
b) INSERT INTO the final table.
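
A hedged sketch of those two steps, reusing the hypothetical staging and finaltable tables from the answer above; the input path is also hypothetical:

-- a) Load today's file into the staging table. OVERWRITE clears
--    yesterday's batch first, so staging holds only the new rows.
LOAD DATA INPATH '/user/etl/incoming/transactions.csv'
OVERWRITE INTO TABLE staging;

-- b) Append the staged rows to the final table; unlike
--    INSERT OVERWRITE, this keeps the existing data.
INSERT INTO TABLE finaltable SELECT * FROM staging;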

answer Feb 10, 2014 by Amit Parthsarthi
Similar Questions
0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log, but when I submit a MapReduce job, I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this info through some configuration? If it does, what should I configure?

0 votes

I have a basic question regarding HDFS file reads. I want to know what happens when the following steps are followed:

  1. A client opens the file for reading and starts reading it.
  2. In the meantime, someone deletes the file, and it moves to the trash folder.

Will Step 1 succeed? I feel that since the client has already opened the file, and the file still exists in .Trash, the client should continue to read it.

0 votes

If there are 10 HDFS blocks to be copied from one machine to another, but the other machine has room for only 7.5 blocks, is there a possibility of the blocks being broken down during replication?

+1 vote

When a user is uploading a file from the local disk to HDFS, can I make it partition the file into blocks based on its content?

Meaning, if I have a file with one integer column, can I say that I want an HDFS block to contain only even numbers?

...