
Add a few records to a Hive table or an HDFS file on a daily basis

+4 votes
410 views

My requirement is a typical data warehouse and ETL requirement. I need to accomplish the following:

1) Insert transaction records daily into a Hive table or an HDFS file. This table/file is not big (approximately 10 records per day), and I don't want to partition it.

A few articles mention that we need to load into a staging table in Hive first, and then insert like the below:

INSERT OVERWRITE TABLE finaltable SELECT * FROM staging;

I don't follow this logic. How should I populate the staging table daily?

posted Feb 10, 2014 by Tarun Singhal


2 Answers

+2 votes

The staging table is typically defined as an external Hive table: the data is loaded directly onto HDFS, so the staging table can read it from there and transfer it into the Hive-managed table with your current statement. Of course, there are variations on this as well.
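
A minimal sketch of that pattern, assuming a comma-delimited daily file; the table names, columns, and HDFS paths below are hypothetical:

-- Hypothetical external staging table over an HDFS directory;
-- dropping the table later would not delete the files under it.
CREATE EXTERNAL TABLE staging (
  txn_id   INT,
  amount   DOUBLE,
  txn_date STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/etl/staging/transactions';

-- Each day: copy the day's file into that directory, e.g.
--   hdfs dfs -put transactions.csv /user/etl/staging/transactions/
-- and then rewrite the managed table from staging:
INSERT OVERWRITE TABLE finaltable SELECT * FROM staging;

Note that INSERT OVERWRITE replaces everything in finaltable with whatever is currently in staging, so this only accumulates history if the staging directory keeps all the daily files; for a pure append, see the next answer.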

answer Feb 10, 2014 by Naveena Garg
+1 vote

Why not use INSERT INTO for appending the new data?
a) Load the new data into the staging table.
b) INSERT INTO the final table.
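
A hedged sketch of those two steps, reusing the hypothetical staging and finaltable tables from the answer above; the input path is also hypothetical:

-- a) Load today's file into the staging table. OVERWRITE clears
--    yesterday's batch first, so staging holds only the new rows.
LOAD DATA INPATH '/user/etl/incoming/transactions.csv'
OVERWRITE INTO TABLE staging;

-- b) Append the staged rows to the final table; unlike
--    INSERT OVERWRITE, this keeps the existing data.
INSERT INTO TABLE finaltable SELECT * FROM staging;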

answer Feb 10, 2014 by Amit Parthsarthi
Similar Questions
0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log, but when I submit a MapReduce job, I can't see the input file location in either the Hadoop logs or the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this info through some configuration? If it does, what should I configure?

0 votes

I have a basic question regarding HDFS file reads. I want to know what happens when the following steps are followed:

  1. A client opens the file for reading and starts reading it.
  2. In the meantime, someone deletes the file, and it moves to the trash folder.

Will Step 1 succeed? I feel that since the client has already opened the file, and the file still exists in .Trash, the client should continue to read it.

0 votes

If there are 10 HDFS blocks to be copied from one machine to another, but the other machine has room for only 7.5 blocks, is there a possibility of the blocks being broken down during replication?

+1 vote

When a user is uploading a file from the local disk to HDFS, can I make it partition the file into blocks based on its content?

Meaning, if I have a file with one integer column, can I say that I want an HDFS block to contain only even numbers?

...