Tuesday, July 14, 2009

Duplicate Filtering in SSIS

Having being frustrated on lack of duplicate filtering in SSIS, I decided to take a work around, first dump the data in a temporary table and pick it up with "select distinct … “ in the next task. This work around works though but fails in instances that the filter had to be applied on some of the columns selected unlike “select distinct ..” that applies across all the selected columns.

Alas!!! SSIS provides this in the Sort Task and can be applied just as I want it in my project. Simply select the columns you want to filter on as the sort keys and check “Remove rows with duplicate soft values”.  Voila! problem solved.

image 

IMHO, I consider the sort task an overkill for this purpose but no other alternative unless the developer will not mind building a dedicated filtering task for SSIS. SQL Central provides samples on how to build SSIS task.

No comments: