When I submit a MapReduce job, would it only work on the files present at that point?


I have a system where files arrive in HDFS at regular intervals, and I perform an operation every time the directory size goes above a certain threshold.

My question is: when I submit a MapReduce job, will it only work on the files present at that point?

posted Aug 27, 2014 by Vijay Shukla

1 Answer


MapReduce is normally used for batch processing, so I don't think this is a good use case for MR on its own. A job's input splits are computed when the job is submitted, so files that arrive afterward are not picked up by that job. Since you need to run the program periodically, you cannot handle this with a single MapReduce job. One possible approach is to create a cron job that checks the folder size and submits an MR job when necessary.
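A minimal sketch of what one cron tick could look like, under loud assumptions: it uses the local filesystem as a stand-in for HDFS, and `run_once`, `submit_job` and the staging layout are hypothetical names, not part of Hadoop. On a real cluster the size check, the move and the submission would go through `hdfs dfs -du -s`, `hdfs dfs -mv` and `hadoop jar` instead. The design choice here is to move the files that triggered the threshold into their own batch directory before submitting, so each job gets a fixed input set and the next tick does not reprocess them.

```python
import os
import shutil
import tempfile
from typing import Callable, List

# Hypothetical trigger size; choose whatever "particular point" fits your data.
THRESHOLD_BYTES = 128 * 1024 * 1024


def directory_size(path: str) -> int:
    """Total size in bytes of the regular files directly under `path`.
    (Local-filesystem stand-in for `hdfs dfs -du -s`.)"""
    return sum(
        os.path.getsize(os.path.join(path, name))
        for name in os.listdir(path)
        if os.path.isfile(os.path.join(path, name))
    )


def run_once(incoming: str, staging_root: str,
             submit_job: Callable[[str], None],
             threshold: int = THRESHOLD_BYTES) -> List[str]:
    """One cron tick: if `incoming` has grown above `threshold`, move its
    current files into a fresh batch directory under `staging_root` and call
    the hypothetical `submit_job` on that directory.
    Returns the names of the files handed to the job (empty if nothing ran)."""
    if directory_size(incoming) <= threshold:
        return []
    # A new directory per batch, so the job's input cannot change under it.
    batch_dir = tempfile.mkdtemp(prefix="batch-", dir=staging_root)
    moved = []
    for name in sorted(os.listdir(incoming)):
        src = os.path.join(incoming, name)
        if os.path.isfile(src):
            shutil.move(src, os.path.join(batch_dir, name))
            moved.append(name)
    # Placeholder for the real submission, e.g. running `hadoop jar ...`.
    submit_job(batch_dir)
    return moved
```

Scheduled from cron (say, every few minutes), each run either does nothing or hands one closed batch to a job. This assumes producers write files under a temporary name and rename them when complete; otherwise a half-written file could be moved into a batch.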

answer Aug 28, 2014 by Amit Parthsarthi
Or, maybe have a look at Apache Falcon, a data management and processing platform.
