How can Hadoop be used for machine learning, language processing, fraud detection, etc.?

I would like to understand how Hadoop is used in more real-time scenarios. Are there examples of machine learning, language processing and fraud detection? What other practical use cases are there?

posted Apr 29, 2014 by Seema Siddique

Check out the Mahout framework for machine-learning applications built on Hadoop.
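Mahout covers the machine-learning side. For the simpler, rule-based end of fraud detection, a plain MapReduce job is often enough. The following is a minimal sketch for Hadoop Streaming, which lets the mapper and reducer be any program that reads lines from stdin and writes tab-separated key/value lines to stdout. It assumes a hypothetical input of "card_id,amount" CSV lines and a hypothetical threshold MAX_TXNS_PER_CARD, counts transactions per card, and flags unusually active cards. It is an aggregation rule, not a machine-learning model.

```python
import sys
from itertools import groupby

# Hypothetical rule: more transactions than this per card in one input is suspicious.
MAX_TXNS_PER_CARD = 3


def mapper(lines):
    """Map step: 'card_id,amount' CSV lines -> 'card_id<TAB>amount' pairs."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        card_id, amount = line.split(",", 1)
        yield f"{card_id}\t{float(amount)}"


def reducer(sorted_lines):
    """Reduce step: lines arrive sorted by key, as Hadoop's shuffle delivers them."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for card_id, group in groupby(pairs, key=lambda kv: kv[0]):
        amounts = [float(value) for _, value in group]
        status = "SUSPECT" if len(amounts) > MAX_TXNS_PER_CARD else "OK"
        yield f"{card_id}\t{len(amounts)}\t{sum(amounts):.2f}\t{status}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked by Hadoop Streaming as 'python fraud.py map' or 'python fraud.py reduce'.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, the shuffle can be imitated by sorting: list(reducer(sorted(mapper(lines)))). On a cluster the job might be launched with something like hadoop jar /path/to/hadoop-streaming.jar -files fraud.py -mapper "python fraud.py map" -reducer "python fraud.py reduce" -input IN -output OUT, where the jar path, fraud.py, IN and OUT are placeholders that depend on the installation.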

Similar Questions

I have a use case in which I need to process a huge set of files stored in HDFS. These files are non-splittable and each must be processed as a whole. I have the following questions that I need answered before I can proceed.

  1. I want each map task to be scheduled on the TaskTracker where its data is already available. How can I do that? Currently, I have a file that contains a list of filenames. Each map task gets one line of it via NLineInputFormat, then opens the named file through an FSDataInputStream and works with it. Is there a way to ensure that this map task runs on the node where the file is stored?

  2. The files are not large; by Hadoop standards they count as small files. I came across CombineFileInputFormat, which can process more than one file in a single map task. What I need is a format that groups several files into a single map task but does not read the files itself, and instead passes the filenames as either the key or the value. In the map task I can then loop over these files and process them. Any help?

  3. Any other alternatives?
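A simplified sketch of the grouping idea behind questions 1 and 2, written as a pure-Python planning step rather than with Hadoop's own InputFormat API. Given each small file's size and the hosts holding its single block (hypothetical input here; on a real cluster this would come from the file's block locations), it assigns each file to its least-loaded replica holder and packs the files on each host into batches capped at max_batch_bytes. Each batch is a list of filenames for one map task to loop over, and its host is the natural preferred location for that task. This only approximates what CombineFileInputFormat does (which also considers racks), and plan_batches is a hypothetical name.

```python
from collections import defaultdict


def plan_batches(files, max_batch_bytes):
    """Group small, non-splittable files into per-host batches of filenames.

    files: iterable of (path, size_bytes, hosts), where hosts are the nodes
    storing a replica of the file's single block (hypothetical input).
    Returns [(host, [paths])]; host is None for files with no location info.
    """
    load = defaultdict(int)       # bytes assigned to each host so far
    per_host = defaultdict(list)  # host -> [(path, size)] in input order
    for path, size, hosts in files:
        # Prefer the least-loaded replica holder so work spreads across nodes.
        host = min(hosts, key=lambda h: (load[h], h)) if hosts else None
        load[host] += size
        per_host[host].append((path, size))

    batches = []
    # Deterministic order: named hosts alphabetically, location-less files last.
    for host in sorted(per_host, key=lambda h: (h is None, h or "")):
        current, current_size = [], 0
        for path, size in per_host[host]:
            # Start a new batch when adding this file would exceed the cap.
            if current and current_size + size > max_batch_bytes:
                batches.append((host, current))
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            batches.append((host, current))
    return batches
```

The greedy assignment in input order keeps the sketch short. A file larger than the cap still ends up alone in its own batch, since a non-splittable file cannot be divided.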

How can I store images in Hadoop/Hive and perform some processing on them? Is there any built-in library available for this? How does Hadoop store images in HDFS?
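HDFS stores an image like any other file: as raw bytes split into blocks, with no image-specific handling. Because many images are small, they are often packed into larger container files, such as SequenceFiles keyed by filename, instead of being stored one file each. The sketch below takes a text-only route suited to Hadoop Streaming. It assumes a hypothetical record format of one image per line, "filename<TAB>base64 bytes", and its mapper decodes each image and emits the format (recognised from the PNG or JPEG magic bytes), the name and the size. encode_record, sniff_format and image_mapper are illustrative names, not a library API.

```python
import base64

# Leading "magic" bytes of two common image formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def encode_record(name, data):
    """Pack one image into a single text line: name<TAB>base64(data)."""
    return f"{name}\t{base64.b64encode(data).decode('ascii')}"


def sniff_format(data):
    """Guess the image format from its leading bytes."""
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    return "unknown"


def image_mapper(lines):
    """Map step: decode each record and emit format<TAB>name<TAB>size."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, encoded = line.split("\t", 1)
        data = base64.b64decode(encoded)
        yield f"{sniff_format(data)}\t{name}\t{len(data)}"
```

Keying the output by format lets a reducer count or process images per type; real image decoding would need an imaging library inside the mapper.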

I have a test cluster of two machines, both with Hadoop installed. I have configured the Hadoop cluster, but on the admin UI (as in the picture below) I see two nodes running on the same master machine, and the other machine has no Hadoop node.

On the master machine the following services are running:

    ~$ jps
    26310 ResourceManager
    27593 Jps
    26216 DataNode
    26135 NameNode
    26557 NodeManager
    26701 JobHistoryServer

On the slave machine:

    ~$ jps
    2614 DataNode
    2920 Jps
    2707 NodeManager

I don't know why the slave is not joining the cluster (it did before). I tried shutting down all services on both machines, formatting HDFS and then restarting everything, but that did not help. Any help figuring out what is causing this behavior is appreciated.
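One thing worth ruling out on the slave (an assumption about a common misconfiguration, not a diagnosis of this cluster): if its core-site.xml or yarn-site.xml leaves fs.defaultFS or yarn.resourcemanager.hostname pointing at localhost, a 127.x address or 0.0.0.0, its DataNode and NodeManager look for the master daemons on the slave itself instead of on the master. The sketch parses a Hadoop *-site.xml file (the standard configuration/property/name/value layout) and reports such entries. read_properties, host_of and loopback_warnings are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hosts that make a worker talk to itself instead of the master.
LOCAL_HOSTS = {"localhost", "0.0.0.0"}

CHECKED_KEYS = ("fs.defaultFS", "yarn.resourcemanager.hostname")


def read_properties(xml_text):
    """Return {name: value} from a Hadoop *-site.xml document (str or bytes)."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def host_of(value):
    """Extract the lower-cased host from 'hdfs://host:port', 'host:port' or 'host'."""
    if "://" in value:
        return (urlparse(value).hostname or "").lower()
    return value.split(":", 1)[0].lower()


def loopback_warnings(xml_text, keys=CHECKED_KEYS):
    """List checked keys whose host is local to the machine reading the file."""
    props = read_properties(xml_text)
    warnings = []
    for key in keys:
        if key not in props:
            continue
        host = host_of(props[key])
        if host in LOCAL_HOSTS or host.startswith("127."):
            warnings.append(f"{key}={props[key]} does not point at the master")
    return warnings
```

On the slave it might be run as loopback_warnings(open(os.path.join(os.environ["HADOOP_CONF_DIR"], "core-site.xml"), "rb").read()), then repeated for yarn-site.xml. A key that is absent entirely is not reported, although an unset yarn.resourcemanager.hostname falls back to the default 0.0.0.0.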

...