Ways to manage user accounts on a Hadoop cluster when using Kerberos security

+3 votes

From the documentation and code: "When Kerberos is enabled, all tasks are run as the end user (e.g. as user "joe" and not as the Hadoop user "mapred") using the task-controller (which is setuid root and, when it runs, does a setuid/setgid etc. to joe and his groups). For this to work, the Linux account for user "joe" has to be present on all nodes of the cluster."

In an environment with a large and dynamic user population, it is not practical to add every end user to every node of the cluster (and to remove the account again when the end user is deactivated, etc.).

What other options are there to get this working? I am assuming that if the users are in LDAP, using PAM for LDAP could solve the issue. Any other suggestions?

posted Jan 7, 2014 by Ahmed Patel

1 Answer

+1 vote

LDAP/AD is pretty much it. You can also have Kerberos authenticate directly against AD, or set up a one-way trust between AD and MIT Kerberos. There are other identity management systems that basically implement the same thing. At the end of the day, you need to have (1) users in the KDC, (2) users on the nodes, and (3) a user-to-group mapping. And it makes sense for all three to come from the same system.
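
As a rough way to check points (2) and (3) on a given node, a small Python sketch like the one below asks the operating system whether an account resolves and which groups it belongs to. The lookup goes through the OS name service, so it should behave the same whether the account is local or served from LDAP/AD. The function name resolve_account and the default user "joe" are only placeholders for illustration, not part of Hadoop.

```python
import grp
import os
import pwd
import sys
from typing import Optional


def resolve_account(username: str) -> Optional[dict]:
    """Return uid, gid and group names for a user, or None if the OS cannot resolve it."""
    try:
        entry = pwd.getpwnam(username)
    except KeyError:
        # The node has no local or directory-backed account for this user.
        return None

    groups = []
    for gid in os.getgrouplist(username, entry.pw_gid):
        try:
            groups.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # A gid without a group entry: keep the number so it stays visible.
            groups.append(str(gid))

    return {"uid": entry.pw_uid, "gid": entry.pw_gid, "groups": groups}


if __name__ == "__main__":
    # "joe" is a placeholder; pass real user names on the command line.
    for name in sys.argv[1:] or ["joe"]:
        info = resolve_account(name)
        if info is None:
            print(f"{name}: NOT resolvable on this node")
        else:
            print(f"{name}: uid={info['uid']} gid={info['gid']} groups={info['groups']}")
```

Running this on each node (for example over ssh) for a handful of representative users is a quick sanity check that the directory integration is in place before submitting jobs as those users.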

answer Jan 8, 2014 by Tarun Singhal
