How SSIS can be used for Data Profiling.

Last Post 25 Jul 2008 01:31 AM by SQLUSA. 7 Replies.
AddThis - Bookmarking and Sharing Button
Author Messages
guga2006
New Member
New Member

--
14 Feb 2006 09:47 PM
Hi,

I am new to SSIS and trying to find out how SSIS can be used for Data profiling. Please provide some suggestions apartment from Fuzzy components.



Regards,

Guga
hyeager
New Member
New Member

--
28 Feb 2006 06:01 PM
This is my definition of data profiling. Do you know which of theses things SQL Server 2005 Integration Services can do?

1. Attribute analysis
a. Range
i. Minimum
ii. Maximum
iii. Mean
iv. Median
b. Summation
c. Completeness
i. Portion of the records populated with data
ii. Portion of records that are blank
iii. Portion of records that are null
d. Uniqueness
i. Cardinality – counts the number of unique values for a given attribute
ii. Frequency distribution – provides a count for each unique value in a given attribute
iii. Duplicate data identification
e. Format
i. Inferred pattern identification – identifies the different unique data formats in a given attribute
ii. Inferred pattern frequency – provides a count of each of the different formats identified within an attribute
f. Inferred type – alpha, numeric, date, binary, etc.
2. Relationship analysis
a. Structural Integrity
i. Unique primary keys
ii. Foreign keys
iii. Foreign key parents
iv. Normalized or denormalized table structure
b. Discovery
i. Functional dependencies
ii. Potential primary keys
iii. Potential foreign keys
iv. % agreement
v. Duplicate data columns
vi. Orphan analysis
3. Business Rule Compliance or Data Auditing – custom tests across columns and tables for adherence against known business rules
4. Reporting
a. Current state vs. documented state
b. Trending of data quality over time


Thank you,

Hans
hyeager
New Member
New Member

--
01 Mar 2006 08:02 PM
Yes. Many things can be done in the database. (As a salesperson would say, "It's a simple matter of programming.") However, the data sheet for SQL Server 2005 Integration Services specificaly states that it has data profiling. A robust data profling solution automatically performs all of the functions I cited without coding. I'm trying to learn how robust the data profiling solution is that comes with IS.
hyeager
New Member
New Member

--
11 Apr 2006 07:36 AM
I contacted Donald Farmer, the product manager at Microsoft. I quote from his response to my inquiry.

"No we don’t have a data profiling toolset as such in SQL Server 2005. The explorer features are indeed very simple, but quite useful in their way.

You could do this more complete profiling with queries, but typically data profiling not just about the queries but about how you cross-reference the different results to build up a picture of your data model.

We do have partners with this functionality."
iwhiteside
New Member
New Member

--
14 Nov 2007 12:18 PM

Please see my video on SSIS Data Profiling


http://www.jumpstarttv.com/Media.aspx?vid=72


Ira Whiteside
iwhiteside@iwhiteside.com
iwhiteside
New Member
New Member

--
24 Jul 2008 12:09 PM
I have published another article on how to accomplish Data Profiling in SSIS 2005 on www.sqlservercentral.com entitled Kimball ETL (subsystem 1) – Data Profiling via SSIS Data Flow – Pipeline Column Data Profiling. http://www.sqlservercentral.com/art...ETL/63792/" target="_blank" rel="nofollow">http://www.sqlservercentral.com/art...ETL/63792/ which specifically addresses Column Statistics, Value Distribution and Pattern analysis using the SSIS Data Flow Script component.

http://www.sqlservercentral.com/art...ETL/63792/" target="_blank" rel="nofollow">http://www.sqlservercentral.com/art...ETL/63792/"> Kimball ETL(subsystem 1) - Data Profiling via SSIS Data Flow - Pipeline Column Data Profiling

Ira Whiteside
SQLUSA
New Member
New Member

--
25 Jul 2008 01:31 AM
For truly exciting data profiling you have to go SQL Server 2008.

Here are some links: http://weblogs.sqlteam.com/derekc/a...60603.aspx

http://pcquest.ciol.com/content/Dev...060304.asp


I am afraid there is no good solution for your issue in SQL Server 2005.


Kalman Toth - Database, Data Warehouse & Business Intelligence Architect
SQLUSA: http://www.sqlusa.com/highperformance2005/ The Best SQL Server 2005 Training in the World!
iwhiteside
New Member
New Member

--
14 Mar 2009 09:39 PM
I would like to share the new Melissa Data SSIS Total Data Quality Toolkit TDQ-IT, including SSIS Data Profiling.

Melissa Data SSIS Total Data Quality Toolkit TDQ-IT offers a wide range of data transformation and cleansing functionality including data profiling, parsing, cleansing, matching and monitoring functionality built right in to SSIS. And, TDQ-IT leverages SSIS to provide a flexible, effective solution for your organization’s data quality and master data management (MDM) initiatives. Request a free trial today.


http://www.melissadata.com/dqt/tota...ration.htm


Acceptable Use Policy
---