How to limit the number of containers requested by a pig script?

I would like to know how I can limit the number of concurrent containers requested (and, of course, used) by my Pig script. I do not want to do this through YARN queue configuration or anything similar; I want to limit it from outside, on a per-job basis. Ideally, I would set the number in the Pig script itself. Can I do this?

posted Oct 21, 2014

1 Answer

As far as I understand, you cannot directly control the number of mappers. You can control the number of reducers via the PARALLEL keyword. The number of containers on a node is given by the following combination of settings: yarn.nodemanager.resource.memory-mb, which is set on the cluster.

And the following properties can be "modified" from your script by setting them to a different number, and

answer Oct 21, 2014 by Vijay Shukla
Are you saying that we can't change the number of mappers per job through the script? Because otherwise, when invoking through the command line or code, I think we can. We do have the property mapreduce.job.maps.
What I understand so far is that in Pig you cannot decide how many mappers will run. That is determined by some optimization, based on the number of files, the size of blocks, etc. What you can control is the number of reducers, via the PARALLEL directive. You can certainly SET mapreduce.job.maps, but I am not sure what the effect will be. That is what I remember from the docs.
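To make the reducer side concrete, here is a minimal sketch of what this could look like in a script. The input path, output path, field names, and the numbers are all made up for illustration. Note that this only caps the number of reducers; it does not by itself cap the total number of concurrent containers, and mapreduce.job.maps is generally treated as a hint because the mapper count follows from the input splits.

```pig
-- Hypothetical paths, schema, and values; adjust to your own job.

-- Script-wide default number of reducers for operators that start a reduce phase.
SET default_parallel 10;

-- Passed into the job configuration; usually only a hint, since the
-- number of mappers is derived from the input splits.
SET mapreduce.job.maps 20;

logs    = LOAD 'input/logs' USING PigStorage('\t') AS (user:chararray, bytes:long);

-- PARALLEL overrides default_parallel for this one operator.
grouped = GROUP logs BY user PARALLEL 5;

totals  = FOREACH grouped GENERATE group AS user, SUM(logs.bytes) AS total_bytes;

STORE totals INTO 'output/totals';
```

With this, the GROUP step should run with 5 reducers, while any other reduce-side operator added later would fall back to 10.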
Hope this helps
