Joins in Hadoop

I want to use Hadoop to perform operations on graph data. I have two files:

1) Edge list file
This file contains one line for each edge in the graph.
Sample:

1  2 (here 1 is the source node and 2 is the sink node of the edge)
1  5
2  3
4  2
4  3
5  6
5  4
5  7
7  8
8  9
8  10

2) Partition file
This file contains one line for each vertex. Each line has two values: the first number is the vertex id and the second number is the partition id.
Sample:

2  1
3  1
4  1
5  2
6  2
7  2
8  1
9  1
10  1

The edge list file is 32 GB, while the partition file is 10 GB. (The data is so large that a map/reduce task can read only the partition file. I have a 20-node cluster with 24 GB of memory per node.)

My aim is to get all vertices that have the same partition id (along with their adjacency lists) into one reducer, so that I can perform further analytics on a given partition in that reducer.

Is there any way in Hadoop to join these two files in the mapper, so that I can then map based on the partition id?
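
To make the desired result concrete, here is a minimal in-memory sketch in plain Python (not Hadoop code) of one possible approach: a reduce-side join done in two passes. Pass 1 keys both files by vertex id and attaches the partition id to each adjacency list; pass 2 re-keys by partition id. All function names, the "E"/"P" tags, and the choice to drop vertices that have no partition entry (such as vertex 1 in the sample) are assumptions made for illustration. On a real cluster each pass would be a separate MapReduce job, and nothing is held in memory the way it is here.

```python
from collections import defaultdict

# Sample data from the question (the explanatory note on the first edge is omitted).
EDGE_LINES = ["1  2", "1  5", "2  3", "4  2", "4  3", "5  6",
              "5  4", "5  7", "7  8", "8  9", "8  10"]
PARTITION_LINES = ["2  1", "3  1", "4  1", "5  2", "6  2",
                   "7  2", "8  1", "9  1", "10  1"]

def map_edge(line):
    """Pass 1 mapper for the edge file: key by source vertex, tag value as an edge."""
    src, dst = line.split()
    return int(src), ("E", int(dst))

def map_partition(line):
    """Pass 1 mapper for the partition file: key by vertex, tag value as a partition id."""
    vertex, part = line.split()
    return int(vertex), ("P", int(part))

def shuffle(pairs):
    """Stand-in for the shuffle phase: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def join_reduce(vertex, values):
    """Pass 1 reducer: combine one vertex's partition id with its adjacency list."""
    partition = None
    neighbours = []
    for tag, val in values:
        if tag == "P":
            partition = val
        else:
            neighbours.append(val)
    if partition is None:
        return []  # assumption: skip vertices missing from the partition file
    return [(partition, (vertex, sorted(neighbours)))]

def run(edge_lines, partition_lines):
    """Run both passes and return {partition id: [(vertex, adjacency list), ...]}."""
    tagged = [map_edge(l) for l in edge_lines]
    tagged += [map_partition(l) for l in partition_lines]
    joined = []
    for vertex, values in shuffle(tagged).items():
        joined.extend(join_reduce(vertex, values))
    # Pass 2: the key is now the partition id, so each "reducer" sees one partition.
    return {part: sorted(members) for part, members in shuffle(joined).items()}

if __name__ == "__main__":
    for part, members in sorted(run(EDGE_LINES, PARTITION_LINES).items()):
        print(part, members)
```

Because the join happens in the reducer, neither file has to fit in a single task's memory, which is the constraint described above. The cost is an extra job and a full shuffle of both files.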

posted Jun 24, 2015 by anonymous
