Monday, January 11, 2010

SSIS vs Informatica

In a recent outing, I met with a product vendor wanting to do ETL for a Siebel CRM product from Oracle database.

Trust me, I was quick to recommend SSIS which I am very familiar and very recently just had a laudable success extracting and Migration of Billing Information from a Legacy ENDAN (SQL Server based) application to Singl.eView from Intec.

On further discussion with the vendor, he would prefer the use of Informatica (first time getting to know this). I was able to show him lots of cool features SSIS have to offer.

Afterwards, I decided to know more about Informatica and found this on the net

Informatica does not have an equivalent of SSIS’s UNION component . This is a big problem for what I’m trying to do because a lot of the logs that I’m trying to load are in different folders (to represent the different web servers). Informatica requires 2 pipelines (see #3 below) to extract this data whereas SSIS can just have 2 source adapters in the same data-flow and UNION the data together.

SSIS’s method of loading multiple files (i.e. the "Multiple Flat Files Source Adapter") is a lot better than Informatica's. With Informatica you have to, externally to Informatica, build a list of files to process and then pass that list back into Informatica. To make this dynamic at runtime you would have to shell out to an external process to produce the file list. This is not pleasant – especially compared to SSIS’s very simple method of just specifying “*.log” in the source adapter.

I have a pipeline (built in both SSIS & INFA) that filters out all comment lines from the web logs, extracts all the individual fields (e.g. Timestamp, ClientIP address, URI, Referrer etc...), and inserts into a SQL Server table. The SSIS pipeline is working on ~150000 rows and completes in ~23 seconds. The Informatica method (which uses 2 pipelines cos of #1 above) takes ~45 seconds. Even 1 Informatica pipeline on its own (working on about half the records) took ~27 seconds. Bear in mind also that the SSIS pipeline was run from BIDS and as I have previously mentioned, BIDS places a large overhead on the execution of a package. I would suppose that Informatica does not have the same restriction. In short, SSIS seems a lot quicker!

SSIS’s method of dynamically setting the destination at runtime (i.e. configurations) is a lot better than the Informatica equivalent. With Informatica you have to configure each task to use the dynamic value, and setting up the value itself is a manual process because you have to manually handcraft what is termed a parameter list. SSIS does this using the configuration wizard. Let me say again, EVERY Informatica task that uses an external connection has to have a dynamic configuration set up using this method; with SSIS you do it in one place, on the Connection Manager.




This even has more indept comparison
http://download.microsoft.com/download/1/0/3/103fd39e-3ca4-4db7-a087-1263dc6ed0b1/CompIntTools.pdf

10 comments:

Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...

Nice dispatch and this mail helped me alot in my college assignement. Gratefulness you for your information.

Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.
Anonymous said...
This comment has been removed by a blog administrator.