SQL Server 2005 Integration Services (SSIS) is a new Extract, Transform, and Load (ETL) tool that ships with SQL Server 2005. As with many of Microsoft's development tools, you can extend SSIS to perform other operations not already bundled with the product. SSIS provides the SSIS Class Library for just this purpose. The SSIS Class Library Object Model includes all the classes and interfaces you need to extend or augment SSIS.
The best way to learn the capabilities of a new object model is to study the key points-of-interest in the object model. SSIS lends itself well to the approach. The sample program included with this article (ExploreIS) navigates the key points-of-interest in the SSIS Class Library. The following sections walk you through ExploreIS to introduce key classes in the SSIS Object model.
One caveat before delving into the details: The samples were developed using the June 2005 CTP of SQL Server 2005 and Visual Studio 2005.
A Primer for New SSIS Developers
Although this article assumes you're experienced with SSIS, an introduction to SSIS development is beyond its scope. If you are new to SSIS, read the Developer.com article "SQL Server Integration Services: SQL Server 2005's New ETL Platform" to gain an understanding of SSIS development.
Now, you're ready to begin learning about the SSIS object model. The first stop is an overview of ExploreIS.
ExploreIS Under the Hood
ExploreIS is a Visual Studio 2005 Form Desktop application that operates in a straightforward manner. When you pass the path of the Integration Services file (.dtsx) to the program, it navigates the DTSX file and prints information to a listbox on the main Form.
For demonstration, the application has been hard-coded to explore the Data Lineage sample thath ships with the SQL Server 2005 SDK code samples. The Data Lineage sample reads data from a text file, adds some lineage information to the data, and writes the data to the SQL Server 2005 AdventureWorks sample database. Figure 1 shows a view of the package as seen from the Package Explorer tab.
Figure 1. A View of the Data Lineage Package from the Package Explorer Tab
ExploreIS renders some of the contents of the Data Lineage package in a hierarchical fashion (see Figure 2). Thus, items at the lower levels are indented below the containing object.
Figure 2. Results Returned by ExploreIS
Now that you understand the functionality of the sample code and have been introduced to the sample Data Lineage package, you can move on to the SSIS Object Model.
TaskHosts, Tasks, and Packages
When you develop a SSIS package, you drag, drop, and configure separate items from the Task toolbar into your project. Each Task component you drop into you project really consists of two primary classes, a TaskHost class and a Task class, along with other support classes.
The role each class plays in SSIS is partitioned as follows:
- TaskHost handles the behavior of the component in the SSIS environment. TaskHost includes event-handling functions, package variables, and logging.
- Task handles the execution of the component and may include other specialized classes called Data Flow Components.
The Data Lineage Sample contains two tasks, a Data Flow Task and an Execute SQL Task. The Package class maintains all parts of the SSIS package, including events, variables, and tasks. The Package class maintains TaskHosts (along with other classes) in a collection called Executables.
As you can see, the TaskHosts and Tasks have very specialized functions. The functionality of the Data Flow Task in SSIS is further subdivided into classes inheriting from the class PipelineComponent.
Pipeline Components are the muscles of the SSIS Data Flow Task. Pipeline Components work with ConnectionManager objects to move and transform all types of data. (A discussion of the ConnectionManager object will come later.)
In the Data Lineage Sample, you configure PipelineComponents on the Data Flow tab in the SSIS development environment. The Data Lineage Sample uses three PipelineComponents:
- A source component called Flat File Source
- A transformation component called Logged Lineage
- A destination component called OLE DB Destination
Note that the Data Flow Task appears to be implemented with unmanaged code. As you will see later in the article, a number of wrapper classes control access to this underlying unmanaged code. Because the underlying COM is hidden, you don't need to be a COM expert. Being aware of the underlying infrastructure will be useful should you encounter any difficulties using the SSIS Data Flow Task.
Now that you've been introduced to TaskHosts, Tasks, and PipelineComponents, you're ready to move on to the ConnectionManager.
ConnectionManagers are the classes underlying the Connections you see in the SSIS development environment. The Data Lineage sample package contains two Connection Managers: a SQL Server connection to the AdventureWorks database and a connection to the source data files.
ConnectionManagers are maintained in the Package class in a collection called Connections.
Now, you're finally ready to delve into the details of the sample program.