combining files and no dupes

2 Answers

Give

cat *.list >> output.joined | sort -u | uniq --all-repeated

a try. If the output is empty, there are no dupes!

answer May 14, 2013 by anonymous

1) You probably want '>' rather than '>>' if you're only running this once. Not that it makes a difference here but it's superfluous.

2) Since you're sending the output of 'cat' to a file, the pipe won't get any input, so you're sorting nothing. If you actually want to capture the output in a file, you can use 'tee':

cat *.list | tee output | sort -u

or just run the two commands separately:

cat *.list > output
sort -u < output

If you don't need the intermediate file, then "cat *.list | sort -u" is enough.
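
For what it's worth, here is a minimal sketch of both steps, merging without duplicates and then checking which lines were repeated. The file names one.list and two.list and the scratch directory are made up for illustration, and it assumes a system where mktemp -d is available:

```shell
# Scratch directory so the glob only matches the sample files (hypothetical setup).
workdir=$(mktemp -d)

# Hypothetical sample inputs; the line "b" appears in both files.
printf 'a\nb\n' > "$workdir/one.list"
printf 'b\nc\n' > "$workdir/two.list"

# Merge all lists and drop duplicate lines in one step; -o names the output file.
sort -u -o "$workdir/output.joined" "$workdir"/*.list

# Lines that occur more than once across the inputs (empty means no dupes).
dupes=$(sort "$workdir"/*.list | uniq -d)

cat "$workdir/output.joined"
echo "repeated lines: ${dupes:-none}"
```

Note that 'uniq' only compares adjacent lines, so it needs sorted input, and it has to see the plain 'sort' output rather than 'sort -u', which has already removed the repeats.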

answer May 14, 2013 by anonymous

Similar Questions

+1 vote

Advantage/disadvantage of dbm vs join vs HBase

I have a roughly 5 GB file where each row is a key, value pair. I would like to use this as a "hashmap" against another large set of files. From searching around, one way to do it would be to turn it into a dbm like DBD and put it into a distributed cache. Another is by joining the data. A third one is putting it into HBase and using it for lookup.

I'm more familiar with the first approach, so it seems simpler to me. However, I have read that using a distributed cache for files beyond a few megabytes is not recommended because the file is replicated across all the data nodes. This doesn't seem that bad to me because I just pay this overhead once at the beginning of the job, and then each node gets a copy locally, right? If I were to go with join, would it not increase the workload (more entries) and create the same network congestion issue? And wouldn't going with HBase mean making it a bottleneck?

What are the advantages and disadvantages of going for one solution over the others? What if, for example, that "hashmap" needs to come from, say, a 40 GB file? How would my options change? At which point would each option make sense?

+1 vote

using join in perl

I am having some difficulty using join to display the array elements. Here is the code snippet:

[code]
use strict;
use warnings;
my @fruits = qw/apple mango orange banana guava/;
#print '[', join '][', @fruits;
#print ']';
print '[', join '][', @fruits, ']';
[/code]

[output]
      [apple][mango][orange][banana][guava][]
[/output]

How can I eliminate the last empty square brackets [] from the output using a single print statement? I can get the right output with two print statements, as shown in the commented-out lines of the code snippet above. Any help is greatly appreciated.

+1 vote

opreport error : no sample files found.try using opcontrol --dump

I am trying to profile an application on the PowerPC architecture.
While profiling, the following error occurred when running the opreport command:

#opcontrol --start
#./exec
#opcontrol --stop
#opcontrol --dump
#opreport

opreport error : no sample files found.try using opcontrol --dump
