A High-Performance Data Pump -- and a Whole Lot More
Data Transformation Services in Microsoft SQL Server 2000 is a high-performance data pump. It is a tool for copying, moving, consolidating, cleansing, and validating data. The data pump loads each row from the data source, manipulates the values in that row, and inserts the row into the data destination.
DTS is also a Rapid Application Development tool for data-oriented programming. It provides a comprehensive set of data manipulation tools organized in a development environment that is both convenient and powerful.
Speed is important when working with data:
Organizations have to be able to move large amounts of data between different systems within specified processing periods.
Complex data modifications, including row-by-row processing, are often needed as the data is being moved. If you are manipulating a lot of data, the time required to modify each record must be minimized.
The design, development, and maintenance of data cleansing applications can be very time-consuming. Tools are needed to speed the development process.
Note - If you receive a 1GB file of clickstream data at 8:00 a.m. for the previous day's Web site activity, can you load that data into a SQL Server data mart and process the OLAP cubes so that they're ready for a 10:00 a.m. management meeting?
The need for rapid data transformation is increasing along with the increasing opportunities to gather and analyze data.
There are four basic strategies for developing data manipulation packages with DTS:
Use the DTS Designer
Use the DTS Object Model in code
Use DTS Templates
Use the DTS Wizards
I encourage you to become familiar with all of these. Each one is best for a certain type of situation. These strategies are not exclusive -- you will often use them with one another to achieve the fastest possible development speed.
Using the DTS Designer to Create Complex Data Transformations
The primary tool for working with Data Transformation Services is the DTS Designer, as shown in Figure 1.1. The Designer is a friendly graphical interface that you can use to create complex data transformations with the tasks of DTS.
Figure 1.1. You can create and edit DTS packages in the DTS Designer.
You access the DTS Designer through SQL Server's Enterprise Manager. Data Transformation Services is in the console tree structure under any of the SQL Servers that have been registered (see Figure 1.2). You can open an existing package that is listed under Local Packages or Meta Data Services Packages. You can right-click on Data Transformation Services and open a DTS package that has been saved as a file. Or, you can open the DTS Designer with a blank Design Sheet to begin the creation of a new package.
Figure 1.2. You can open existing packages in the DTS Designer from the Enterprise Manager.
A DTS package contains one or more complex programmatic units called tasks. These tasks can transform data, execute a SQL statement, copy a database, FTP a file, send a message with the Message Queuing Services, or do any of a number of other things. Each task is represented by an icon, which can be seen on both the task palette and the Design Sheet. All the tasks that Microsoft provides with SQL Server 2000 are shown in Figure 1.3.
Figure 1.3. DTS has a variety of tasks that you can use in a package.
DTS Connections are used to specify the source of the data and the destination where it's being moved. You can use DTS with a wide variety of data stores, including relational databases such as Microsoft SQL Server and Oracle, multidimensional databases such as Microsoft OLAP Services, text files, and spreadsheets. DTS Connections use OLE DB providers but can also connect through ODBC drivers and Data Link files. Some of the data sources that are normally installed with SQL Server 2000 are shown in Figure 1.4.
Figure 1.4. You can connect to a variety of data sources with DTS.
Using the DTS Object Model for Programmatic Control
DTS is implemented with the Component Object Model (COM). You can manipulate DTS packages and their components through the appropriate collections, objects, properties, and methods that are exposed in the DTS object model.
The Package object is at the top of the DTS object hierarchy. All the objects that do the work of a DTS package stem from this object.
The Package object contains four collections of objects:
Connections -- Defined links to particular sources of data.
Tasks -- The actions that take place in a package.
Steps -- An object associated with a task that controls how the task fits into the workflow of the package as a whole.
Global Variables -- Variables that allow for sharing of information between the various tasks. Values of global variables can also be sent into a package.
Each task has an associated object called a custom task that contains the properties specific to that particular kind of DTS task.
There are several other DTS objects that do not fit under the Package object. These other objects primarily provide information:
Application -- System properties and information.
TransformationInfo -- DTS transformations that are registered.
ScriptingLanguageInfo -- Scripting languages that are registered.
SavedPackageInfo -- Packages that are stored as files.
Using the DTS Object Model with the Dynamic Properties Task
You can use the new Dynamic Properties task to change the properties of DTS objects while a package is being executed. You create the workflow for this task so that the values are updated at the correct point in the package execution, as shown in Figure 1.13.
Figure 1.13. The Dynamic Properties task allows you to change properties of an object as a package is executing.
SQL Server 2000 has added DTS templates to speed the development of DTS packages. If you are creating a number of packages that are similar, you can create the package once as a template and then use that template as the basis for all your individual packages.
A DTS template is always saved as a file with a .dtt extension. You can open a template in one of two ways:
Right-click on the Data Transformation Services node in the Enterprise Manager and select All Tasks and Open Template from the popup menu.
Highlight the Data Transformation Services node and select All Tasks and Open Template from the Action menu.
The template has a set of tasks and precedence constraints that have already been created. You use these objects as the starting point for a new DTS package.
Using Wizards for Rapid Application Development
SQL Server 2000 provides two wizards that automate the process of creating a DTS package. One of these wizards, the Import/Export Wizard, is almost unchanged from SQL Server 7.0. The other one, the Copy Database Wizard, is new.
Both wizards create DTS packages. You can modify these packages with the DTS Designer.
Data Transformation Services is a very powerful, versatile tool for moving and manipulating data. It was a great tool in SQL Server 7.0, and it's greatly improved in SQL Server 2000.
About the Author
Timothy Peterson is the chief consultant at SDG Computing, Inc., a company that specializes in data warehousing awith Microsfot's SQL Server tools. He teaches the official Microsoft Data Warehousing course.
This article is brought to you by Sams Publishing publisher of Timothy Peterson's Microsoft SQL Server 2000 DTS book.
© Copyright Pearson Education. All rights reserved.