How do I exclude matching rows in two flat files using Informatica?

+1 vote
1,076 views

I have a flat file (source) with data that looks like this:
1,2,3
4,5,6

And a second flat file (source) that looks like this:
1,2,3

I want the result (target) to be a flat file that looks like this:
4,5,6

I know in SQL there is EXCEPT (MINUS in Oracle, the opposite of INTERSECT) that would do the job. But I am doing this all with flat files and Informatica.

If anyone can tell me which transformations I should use, that would be a great help.

posted Mar 4, 2014 by Pooja Bhanout


2 Answers

+1 vote

You can use a Union transformation to union the rows from both files. This would give you the following result set:

col1, col2, col3

1,2,3
4,5,6
1,2,3

After this, use an Aggregator transformation, group by these ports (whichever ports should be compared to consider rows duplicates), and get a count in the Aggregator.

After Aggregator (new column count => total_count)

col1, col2, col3,total_count

1,2,3,2
4,5,6,1

The next step is simple: pass the rows through a Filter transformation and use the following filter condition to drop the rows that have duplicates.

total_count = 1
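The union -> aggregate -> filter steps above can be sketched in plain Python (the file paths are placeholders, and `Counter` stands in for the Aggregator's count):

```python
# Sketch of the union -> aggregate -> filter logic described above,
# expressed in plain Python rather than Informatica transformations.
from collections import Counter

def exclude_matching(file_a, file_b):
    """Return rows that appear exactly once across both flat files."""
    rows = []
    for path in (file_a, file_b):          # Union step: combine both sources
        with open(path) as f:
            rows.extend(line.strip() for line in f if line.strip())
    counts = Counter(rows)                 # Aggregator step: count per row
    return [row for row in rows if counts[row] == 1]  # Filter: total_count = 1
```

With the sample data from the question ("1,2,3" and "4,5,6" in the first file, "1,2,3" in the second), this returns only "4,5,6". Note that, like the Aggregator approach, this also drops a row that is duplicated within a single file.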

answer Mar 4, 2014 by Madhavi Kumari
0 votes

As I understand your question, you want to pass through only the unmatched rows from your first source file.
There are various ways to do this -
One method is already answered.
You can also use a Joiner transformation to join the two files, then use Expression and Filter transformations to identify the unmatched records.
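The joiner-based approach can be sketched in plain Python as well, using a set as the lookup (detail) side of the join; file paths are placeholders, and this assumes you only want rows from the first file:

```python
# Sketch of the Joiner + Filter approach: index the second file's rows,
# then pass rows from the first file that found no match.
def unmatched_rows(file_a, file_b):
    """Return rows of file_a that do not appear in file_b."""
    with open(file_b) as f:               # detail side of the join
        lookup = {line.strip() for line in f if line.strip()}
    with open(file_a) as f:               # master side; filter out matches
        return [line.strip() for line in f
                if line.strip() and line.strip() not in lookup]
```

Unlike the aggregator method, this keeps a row from the first file even if that row is duplicated within the first file itself.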

answer Apr 25, 2014 by Shatark Bajpai
Similar Questions
0 votes

How do I calculate the number of rows in a flat file in Informatica,
the way we do in Teradata with SEL COUNT(*) FROM ABC.ACCT?
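(Within Informatica this is typically an Aggregator with a COUNT port; purely for cross-checking a flat file's row count outside Informatica, a short script will do. The path and the header assumption are placeholders.)

```python
# Count non-empty rows in a flat file, optionally skipping a header line,
# for reconciling against a COUNT(*) from the database side.
def count_rows(path, skip_header=False):
    with open(path) as f:
        total = sum(1 for line in f if line.strip())
    return total - 1 if skip_header and total else total
```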

0 votes

I have some files that I would like to consolidate into a single database table. The files have similar but different formats. The files look something like this:

FileOne:
•ColA : string
•ColB : string
•ColC : string

FileTwo:
•ColAA : string
•ColBB : string
•ColCC : string

FileThree:
•Col01 : string
•Col02 : string
•Col03 : string

The destination table looks like this:

TableDestination:
•ColFirst : string
•ColSecond : string
•ColThird : string

I want to develop a mapping that ETLs these three files into this one table, but because the column names are different, it looks like I'll have to develop three different mappings, or three different sources, or three different somethings. The problem is that my example is contrived: I actually have many different files that all have different formats and column names, but the data is all very similar.

I would like to develop a single mapping or workflow that can handle all of this by only adding a table that holds the column mappings. Such a table would look like this based on the sample files and sample table above:

TableMappings:
(the mapping table was shown as an image in the original post; its contents are not available here)

In this way, to change a column mapping I only have to edit this TableMappings table. I wouldn't have to make any changes at all to the mapping or workflow. Nor would I have to redeploy an application.

What would a mapping or workflow look like that could take advantage of something like this? I assume there'd be a flat file source that takes files from a folder. There would be something in the middle that uses this TableMappings table to map column names. Finally there would be a relational data object that represents my destination database table "TableDestination". I don't know how to put this together though.
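As a rough illustration of the idea outside Informatica: the mapping table can be modeled as a lookup that renames each source file's columns onto the destination columns. The mappings below are taken from the sample files and table above; the `load_rows` helper and file-key convention are hypothetical.

```python
# Model TableMappings as a dict keyed by source-file type; adding support
# for a new file format only requires a new mapping entry, not new code.
import csv

COLUMN_MAPPINGS = {
    "FileOne":   {"ColA": "ColFirst", "ColB": "ColSecond", "ColC": "ColThird"},
    "FileTwo":   {"ColAA": "ColFirst", "ColBB": "ColSecond", "ColCC": "ColThird"},
    "FileThree": {"Col01": "ColFirst", "Col02": "ColSecond", "Col03": "ColThird"},
}

def load_rows(path, file_key):
    """Read one source file and emit rows keyed by destination column names."""
    mapping = COLUMN_MAPPINGS[file_key]
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {mapping[src]: value for src, value in row.items()}
```

Every file then loads into TableDestination through the same code path, which mirrors the single-mapping-plus-lookup design the question is after.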

...