By Laura Jones, Mark Spenik, and Orryn Sledge
Data Replication is a powerful feature of SQL Server. SQL Server 7.0 provided many improvements from previous versions, and SQL Server 2000 builds on those features and adds more to increase the reliability, ease of use, and management of replication.
Replication ships with the standard product. Several other RDBMS vendors treat replication as a separate product for which you pay extra.
In a nutshell, replication is the capability to reliably duplicate data from a source database to one or more destination databases. Using Microsoft SQL Server replication, you can automatically distribute data from one SQL Server to many different SQL Servers through ODBC (Open Database Connectivity) or OLE DB. SQL Server 2000 enables you to replicate to non-SQL Server subscribers (heterogeneous subscribers), such as Microsoft Access or Oracle, using ODBC or OLE DB and enables support for Internet anonymous subscribers. SQL Server replication provides update replication capabilities such as Immediate Updating Subscribers and merge replication. With all the new enhancements to SQL Server replication, the number of possible applications and business scenarios is mind-boggling. In Chapters 28, "Transactional Replication," and 29, "Merge Replication," of our book, Microsoft SQL Server 2000 DBA Survival Guide , you can examine the different types of replication in detail as well as appropriate and inappropriate applications for each type. However, for starters here are some examples of applications or scenarios in which SQL Server replication can be used:
- To distribute the workload across servers (such as moving an ad-hoc query and reporting capability from a source server).
- To move specific subsets of data (such as a company department or one month's worth of data) from a main central server.
- When you have a central database that is updated and the updates must be moved out to other databases (such as a department store changing prices for an item).
- Account management/tracking applications used by salesmen or field reps using laptops in a disconnected mode that later replicate changes to a central internal server.
- A Web-based user group or subscription application that can periodically pull down database changes via the Web.
- Environments in which servers are importing flat-file information. Use a central database to import the flat file and replicate the information to the other sites.
Replication and Microsoft SQL Server 2000
Microsoft SQL Server 2000 has added many new enhancements to data replication. This article focuses on introducing you to SQL Server replication terminology and the different types of replication. To become familiar with replication terminology, consider the replication scenario shown in Figure 1. With transactional-based replication, the publication is modified at the publisher's site and the changes are replicated to the subscribers of the publication.
Figure 1 - Overview of replication (Transactional).
Publish and Subscribe
SQL Server replication uses a publish and subscribe metaphor. Servers publish publications to which other servers can subscribe. A good analogy of this process would be a magazine. The publishers provide subscriptions of their magazine publication. The subscribers of the magazine acquire their copies of the publication and read the articles that interest them. That is how SQL Server replication works as well. A SQL Server that makes data available to other servers for subscribing purposes is called a publisher. For example, Server A in Figure 1 is a publisher. A SQL Server that subscribes to a publication published by another SQL Server is said to be a subscriber. (An example of a subscription server is Server B in Figure 1.) A SQL Server that contains the distribution database is said to be a distributor. (The distribution server is Server A in Figure 1.)
Publication and Articles
A publisher publishes a collection of one or more articles called a publication. The publication shown in Figure 1 contains the authors, titles, and publishers table found in the SQL Server example database, pubs. An article is the basic unit of replication and can be a table, a subset of the table or stored procedures. Do you see the similarities between SQL Server replication and the magazine analogy?
Articles are always associated with a publication and cannot be published by themselves. SQL Server 7.0 allowed subscriptions to a single article. SQL Server 2000 still supports this feature for compatibility reasons via Transact-SQL. However, it is strongly recommended that you only subscribe to publications, which is much easier to administer than a subscription to a single article.
Publications can contain one or more of the following:
- Vertically partitioned tables
- Stored procedures (New for SQL Server 2000)
- Horizontally partitioned tables
- Horizontally and vertically partitioned tables
A vertical partitioned table (refer to Figure 1) is an article that uses a filter to select only certain columns of a table. A horizontal partitioned table (refer Figure 1) is an article that uses a filter to select only specific rows in the table.The following cannot be published:
- The model, tempdb, and msdb databases
- The system tables in the master database
Subscriptions Types (Push and Pull)
Changes made at the publisher can be replicated to the subscribers via a push subscription or a pull subscription. With a push subscription, the publication server is responsible for replicating all changes to the subscribers without the subscribers asking for the changes. A push subscription is typically used when the subscribing databases want the changes as soon as they are made or a high level of security is required. With a pull subscription, the subscriber initiates the replication instead of the publisher. Pull subscriptions require lower overhead than push subscriptions and are better suited for situations with a large number of subscribers or lower security requirements.
SQL Server can play one or more of the following roles during the replication process:
|Publisher||A publisher server is responsible for maintaining its source databases, making the data available for replication, and sending the data to the distribution database to be replicated to subscribing servers.|
|Subscriber||A subscriber server receives and maintains published data. Subscribers can also make changes to publications. In cases where a subscriber changes the publication, the subscriber is still a subscriber and not a publisher. (The information still has only a single publisher.)|
|Distributor||The distribution server maintains the distribution database, which is responsible for the store and forward capabilities of SQL Server snapshot and transactional replication. The job of the distribution server is to replicate data from the distribution database to the appropriate subscribing servers.|
SQL Server can participate in one or more replication roles. For example, in many cases, a publication server also serves as a distribution server and can also subscribe to other publications from other publishers (in which case, the same server that was acting as a publisher and distributor is also acting as a subscriber). It is not uncommon for a subscriber also to be a publisher. However, in all SQL Server replication cases, for a publication there exists only a single master copy of the database, which is maintained by the publisher (regardless of how many different subscribers are allowed to update the publication). For example, in a merge replication publication scenario, server A publishes the pubs database. Server B and server C are subscribers and make modifications to the database. The master copy of the database that is receiving the changes is the publisher's database, server A. Server B receives server C's changes when server B replicates with server A.
Out of the box, nonSQL Server systems, such as Oracle and Microsoft Access, can be subscribers of all types of replication (except Immediate Updating Subscribers). However, Microsoft has created an open interface to SQL Server's transaction-based replication services so that third parties can create products that allow nonSQL Server systems (that is, heterogeneous data sources) to become publishers.
SQL Server 2000 has several different types of replication that can be used for a variety of business applications. The next few chapters look at each different type of replication in detail and when and how to use it. The SQL Server 2000-supported replication types are as follows:
- Immediate Updating Subscribers
In Transactional-based replication, the publication is modified at the publisher's site and the changes are replicated to the subscribers of the publication. Transactional-based replication was part of SQL Server 6.x replication. In essence, subscribers of the publication do not modify the publication but treat the information as read-only. This does not mean that all updates to an underlying table must happen at one site. Using vertical and horizontal partitioning, you can create solutions that allow for multiple sites editing information in the same table.
The key, however, to partitioning is that each site owns a particular partition of data that can only be modified by the publishing site. SQL Server 2000 makes it possible to use bi-directional transactional replication without partitioning by using custom stored procedures and loopback detection.
However, because the majority of transaction replication essentially is used by subscribers who treat the publications as read-only or use partitioning, there is no need for conflict resolution or lost updates. Such problems are avoided by the transactional-replication model and data partitioning. Examples of good transactional-based applications/scenarios are a database of rollup information, data marts, databases with regional or divisional information, and a central sales or inventory database that is updated and replicated to different sites.
Snapshot replication takes a snapshot of the schema and data at a certain point in time and replicates this information to subscribing databases. Snapshot replication was also included in SQL Server 6.x and is the simplest type of replication to implement. Because the data is provided as a snapshot of information at some point and time, there is no need to worry about conflicts or loss of transactions. Examples of good Snapshot-based applications/scenarios are lookup tables that do not change frequently, anonymous subscribers, static information, or information that is infrequently updated.
Merge replication enables users to subscribe to a publication and edit the same articles (tables) in the publication without any partitioning or custom procedures. When a subscriber edits the publication, the change is replicated back to the publisher. If a conflict occurs (for example, different users modify the same row at different sites after the databases are synched), the conflict is resolved either by priority-based rules and/or the first one to change the row wins. A good merge application or scenario is a sales tracking/call application in which a salesman uses a laptop computer in a disconnected mode to add a new customer or record a sales call. Later when the salesman is back in the office and connects the laptop to the network, the SQL Server database on the laptop merges the changes made to the central SQL Server database.
Immediate Updating Subscribers
Immediate Updating Subscribers is another form of SQL Server 2000 update replication. The Immediate Updating Subscriber is transactional-based replication (available with Snapshot and Transaction replication)that enables the subscriber to modify articles in the publication. The modification is then made at the publisher using the two phase-commit protocol and replicated to the other subscribers using the standard transaction-based replication mode. The two-phase commit protocol requires that the change occur immediately on all servers participating in the transaction or the transaction is rolled back. Therefore, all servers participating in the transaction must have a reliable connection to one another.
Immediate Updating Subscribers removes the complexity of requiring all sites to participate in a two-phase commit (which requires that all sites are connected) and yet still maintain transaction integrity. Good applications/scenarios for Immediate Updating Subscribers are applications with good network connections, applications that do not have high levels of OLTP (on-line transaction processing), and applications that need the same number at the remote site as well as a central site.
Transactional consistency in the context of replication means that the data will be identical across all sites with a result that could have been achieved had all transactions been performed at a single site. Replication also adds the caveat of eventuality or at some point in time because there might be a time delay from the time the change is made to the time the data is replicated to the subscribers. SQL Server 2000 replication falls into two modes of transaction consistency: guaranteed loose consistency and guaranteed no consistency.
Guaranteed loose consistency means that the data synchronization between the source and destination server does not occur simultaneously. Before going into more detail, look at another distributed data model: guaranteed tight consistency. A Guaranteed tight consistency distributed data model can be accomplished with SQL Server using two phase commits. In a tight consistency model, all transactions are committed or rolled back on all the servers so that the data is in synch 100% of the time. In a loose consistency model, transactions are committed or rolled back on a source server. The transactions on the source server are then replicated asynchronously to subscribing servers.
The big difference between the guaranteed tight consistency model and the guaranteed loose consistency model is that, with the guaranteed loose consistency model, there is some lag time between when changes are made to the source server and when they are replicated to the destination servers (that is, the databases are temporarily out of synch). Transactional based replication and snapshot replication are all examples of the guaranteed loose consistency model. The transactional based Immediate Updating Subscriber model lies somewhere between guaranteed loose consistency and guaranteed tight consistency. With an Immediate Updating Subscriber, a two-phase commit (tight consistency) is used between two sites (the publisher and a subscriber) and then the standard transactional based replication (loose consistency) is used to replicate the change to all other subscribers.
Merge replication falls into the guaranteed no consistency model. With guaranteed no consistency, the data will be identical across all sites with a result that might not have been achieved had all transactions been performed at a single site (see Chapter 29, "Snapshot and Merge Replication," of our book, Microsoft SQL Server 2000 DBA Survival Guide, for an actual example). Because Merge replication is for sites that are regularly disconnected, site autonomy is more important than transaction consistency. All the sites end up with the same value; however it might be a value that would not have been achieved had all the changes been made at a single site.
The Problem that SQL Server 2000 Replication Cannot Solve
There is one situation which SQL Server replication cannot address and (nor can any of the competition despite their update anywhere claims). It is having multiple sites without reliable connections update data at any time with any frequency and maintain transactional integrity. The problem is that as sites become disconnected (through for example, unreliable connections), transaction integrity is rapidly lost. A disconnected site could make a change to a record that is deleted by another site. When the site reconnects, a problem occurs as the update fails because the record no longer exists on the master database. If you read the details about the different types of replication in our book, you will see further reasons why the problem can't be solved.
The distribution database stores all the transactions to be replicated to subscribing servers (transactional replication) and acts as the store-and-forward database for replicated transactions. Transactions stay in the distribution database until all subscribers have successfully received the transaction. The distribution database is used to store publication and subscriber synchronization information as well. The following are some of the system tables that make up the distribution database:
- MSmerge_history - Contains historical information about previous subscriber updates.
- MSmerge_agents - Contains information about merge agents.
- MSdistribution_agents - Contains information about distribution agents.
- MSdistribution_history - Contains historical information for distribution agents.
- MSlogreader_agents - Contains information about log reader agents at the local distributor.
- MSlogreader_history - Contains historical information for log reader agents.
- MSrepl_commands - Contains replicated commands.
- MSrepl_errors - Contains information about failed replication.
- MSrepl_transactions - Contains a row for each replicated transaction.
- MSrepl_version - Contains a single row with the current version of replication installed.
An Overview of the SQL Server Replication Agents
To effectively administer SQL Server 2000 replication, you need to be familiar with the different agents used for replication. Our book, Microsoft SQL Server 2000 DBA Survival Guide covers each of the agent's roles in detail in chapters 28 and 29. The agents are as follows:
|Log reader agent||The log reader agent searches the transaction log of published databases for transaction log entries marked for replication. The log reader agent moves the marked transactions to the distribution database. All transactional-based publications have a log reader agent.|
|Merge Agent||The merge agent is responsible for merging incremental changes as well as applying the initial snapshot created by the snapshot agent. Each merge publication has a merge agent.|
|Snapshot Agent||The snapshot agent creates the snapshot files on the distributor and tracks the synchronization status in the distribution database between the published database and the subscribing databases. All publications have a snapshot agent.|
|Distribution Agent||The distribution agent distributes the transactions stored in the distribution database to the subscribing servers. Transactional and snapshot publications have a distribution agent for each subscriber.|
|Queue Reader Agent||The queue reader agent can be used with snapshot or transactional replication. It is a multithreaded process that runs on the distributor, taking messages from a queue and applying them to the appropriate publication.|
SQL Server 2000 allows the merge agent and distribution agent to be executed from applications other than SQL Server using the Active-X controls provided. Additional functionality can be added using SQL-DMO and third-party agents. Figure 2 shows an overview of the different interfaces and agents possible in SQL Server 2000.
Figure 2 - Overview of SQL Server replication interfaces and agents.
The process of Synchronization transfers data between the publisher and the subscribers after the initial snapshot has been applied to the subscriber. SQL Server uses the distribution agent and the merge agent to move data when changes occur at the publisher or the subscriber. The different types of synchronization used in replication are
- Transactional Synchronization
- Merge Synchronization
- Synchronizing Schema Changes
- On Demand Script Execution
With the No Synchronization option, SQL Server assumes that the articles in the source are already in synch with articles in the destination. SQL Server does nothing to verify that the databases are synchronized; that task is up to you.
Walking Through Automatic Synchronization
For this example, assume that server B selects Automatic synchronization during the subscription process. The distribution server creates two files referred to as a synchronization set in the replication working directory (the default is REPLDATA off the SQL Server home directory). The synchronization set consists of a BCP data file with the actual data of the subscribed articles and the article's table schema file.
After the synchronization set is created, a synchronization job is added to the distribution database. The distribution process reads the distribution database and applies the synchronization file set to the subscribing server (in this example, server B). First the schema file is applied to create the table schema. The table information is then copied to the subscribing server using BCP. The distribution server is notified that synchronization has completed, and server A can begin to replicate the publication MyPubs to server B.
Any transactions that occurred to the published articles after the subscribing server first subscribed but before the synchronization process occurred are then replicated to the subscriber.
Replication is an exciting technology that will play an important part in many real-world solutions. To be an effective DBA, you should fully understand how to correctly set up and administer replication and how to correctly use replication to benefit your company or organization.
Following are some of the important things to remember for SQL Server replication:
- SQL Server replication is transaction based and follows a loose consistency data distribution model to a no consistency data model.
- Replication uses a publisher/subscriber metaphor.
- Subscribing databases can perform updates when using merge replication or Immediate Updating Subscribers.
- SQL Server uses ODBC or OLE DB for replication.
- A server can play multiple roles and be a subscriber, publisher, or distributor.
- To set up replication to a nonSQL Server ODBC source, add the ODBC DSN using the ODBC administrator on the distribution server.
- NonSQL Server ODBC sources can only be subscribers (without third-party software).
- The merge and distribution agents do not have to be executed via SQL Server. You can write applications that run these agents using ActiveX controls that ship with SQL Server.
- You can have multiple distribution databases on a single server.
- Different versions of SQL Server can participate in replication.
About the Authors
Mark Spenik and Orryn Sledge are authors of Microsoft SQL Server 2000 DBA Survival Guide (Click to buy) a book published by Sams Publishing. This article is based on information submitted by Laura Jones in their book.
Laura Jones is a consultant with Trilogy Consulting, which has recently been recognized as one of the top 500 fastest growing IT consulting firms in the United States. She is currently consulting full time as a SQL Server DBA for a Fortune 5 company.
© Copyright Sams Publishing, All Rights Reserved