The primary focus of my book is on the details of DTS implementation -- how to use Data Transformation Services. This article discusses some of the opportunities for data transformation provided by DTS.
Upgrading a Database from Microsoft Access to SQL Server
Microsoft has created an upsizing tool for moving a Microsoft Access database to SQL Server. This tool creates the links from the Access application to the SQL database.
However, sometimes people will move a database from Microsoft Access (or some other database) to SQL Server, where it's not important to maintain the user interface that was being used previously. In that case, the DTS Import/Export Wizard is a very convenient tool for creating the new tables in SQL Server and importing the data.
Consolidating Data from Multiple Sources
Companies typically have data in a variety of different OLTP and business analysis systems. DTS is a tool that can connect to a variety of different types of data sources and load that information into one central location. In some cases, companies will use this consolidated data to replace the data stored in diverse places. Other companies will continue to use their data in separate locations, but will use their consolidated information for enterprise-wide business analysis.
Data consolidation often also includes merging data obtained by the company from outside sources. This outside data can present significant processing challenges because it might not be in the same format that is used internally by the company.
The consolidation of data from multiple sources often involves much more than just copying data. There can be
Homogenization of data that uses different codes.
Verification of data that is contradictory or incorrect.
The removal of duplicate data where there is a partial overlap between data sets.
Aggregation or summarization of data.
Recalculation of values that are calculated in different ways by different source systems.
Connecting data that is related, but in which the relationship is missing in the source data.
The DTS transformation tasks, with individual row processing and the ability to look up values, can accomplish all of these tasks.
Initial and Periodic Loading of a Data Mart or a Data Warehouse
For some companies, consolidated data will be used to build a data warehouse and/or one or more data marts for the purpose of business analysis. In addition to all the issues involved in consolidating data, there are these additional data transformation issues:
Creation and initial loading of fact and dimension tables.
Periodic loading of additional facts into the fact table.
Periodic loading of new records into the dimension tables.
Updating dimension table records.
Creating and maintaining meta data for the business analysts.
Moving data from the data warehouse to the data marts.
Processing data mining models.
Processing server cubes and local cube files.
DTS has two tasks for processing business analysis information -- the Analysis Services Processing task and the Data Mining Prediction task.
Reporting on Data from Transaction Processing Systems
Existing On Line Transaction Processing (OLTP) systems usually have some reporting capability, but the flexibility of these reports can be quite limited:
Some of the significant information might not be available in the reports.
Some of the information might be aggregated already so that the detailed information is not available.
The reports might not be tied together very well with information from other parts of the OLTP system or outside the system.
The reports might not provide information for all significant time periods.
It is often very difficult to modify existing systems to obtain the desired reports. Fortunately, it may be possible to extract the information from the OLTP system and load it into another database, such as SQL Server or Microsoft Access, for reporting purposes. You can use DTS to access the data in one of two ways:
If there is an OLE DB provider or an ODBC driver to the OLTP database, you can connect using the provider or the driver and use that connection as the source for a Transform Data task.
You can export the data from the OLTP system to a text file, and use the Bulk Insert task or the Transform Data task to load the data into another database.
Building an Interface to the Web for a Legacy System
Existing OLTP systems may not have the capability to present their data in a way that can be accessed by a Web server. The data can be moved to SQL Server or another database that has a built-in Web interface.
Archiving a Database
DTS can be used for extracting data from a database and building an archive with that data. An archive is needed for two fundamental reasons:
The store of data in the OLTP system is getting too large for efficient processing. After the older data is copied to an archive, that data can be deleted from the OLTP system.
The OLTP system may automatically summarize older data and delete the details underlying that data. If the detailed data is not archived periodically, it will be unavailable for business analysis purposes.
Analyzing Internet Clickstream Data
One of the most important uses of data transformation today is the processing of Internet clickstream data. Companies want to analyze the use of their Web sites. They often want to know the following:
How are people getting to the site?
What pages are they looking at?
What ads are they seeing?
How long are they spending on each page and on the site as a whole?
What path do people follow when moving through the site?
How many people are viewing the site?
Who's viewing the site?
What is the demographic information for the people viewing the site?
Web servers create a log that records all the requests received for files. These logs contain some or all of the following information:
The identity of the requesting computer.
The time the request was made.
The text of the requesting line.
The status code returned to the client.
The number of bytes sent to the client.
The URL of the referring server.
The name and version of the browser being used by the client.
The time taken to service the request.
The URL or the resource requested.
The cookie on the client machine.
Web hosting companies sometimes process these logs and give the client company a summary of the significant data. At other times, the raw log is the only format in which the information is available.
A very significant amount of data transformation is needed to extract valuable information out of the clickstream data. You could use the DTS tasks in the following ways:
The FTP task moves the logs to a local server.
The Execute Process task starts a process that could do one or more of the following -- unzip the files, convert the files from binary to textual format, and/or pre-aggregate the data.
The Bulk Insert task loads the clickstream data from the text files into SQL Server. The logs are often very large (1GB or larger), and the Bulk Insert task can achieve the fastest speed for loading.
The Execute SQL task does set-oriented processing of the data.
The Transform Data task does row-level processing of the data.
The Analysis Services task processes OLAP cubes that are based on the data.
The Send Mail task reports on the results of the import when it is completed.
ActiveX Script tasks, Message Queue tasks, Dynamic Property tasks, and Execute Package tasks manage the data transformation process.
Importing and Exporting with XML
The Extensible Markup Language (XML) is a data-oriented Internet language that will be a key technology for the exchange of data in the future. There are many new features in SQL Server 2000 for working with XML, including the following:
The Transact-SQL OpenXML function allows you to decompose XML documents. The process of XML decomposition allows you to move XML data into a relational database.
The FOR XML clause in the SELECT statement allows you to create XML documents from the recordset created by the SELECT statement.
You can query a SQL Server database from a web page using an XML query. SQL Server will return an XML document with the data.
You can use XPath queries to map SQL Server tables to XML structures.
There are some new DTS features that make it easier to work with XML documents:
The ReadFile and WriteFile transformations give you a way to import and export XML files as a part of your transformations.
The Parallel Data Pump task allows you to process the hierarchical recordsets that are used in XML.
An XML OLE DB provider is not provided with SQL Server 2000. However, they are available from third-party sources. With an OLE DB provider you can use an XML document as the source for a transformation task.
Data Transformation Services is a very powerful, versatile tool for moving and manipulating data. It was a great tool in SQL Server 7.0, and it's greatly improved in SQL Server 2000.
About the Author
Timothy Peterson is the chief consultant at SDG Computing, Inc., a company that specializes in data warehousing awith Microsfot's SQL Server tools. He teaches the official Microsoft Data Warehousing course.This article is brought to you by Sams Publishing publisher of Timothy Peterson's Microsoft SQL Server 2000 DTS book.
) Copyright Pearson Education. All rights reserved.