A Four-phase Approach to Procedural Multi-master Replication
Introduction
Do you support customers whose databases are updated by users in multiple locations and across multiple time zones? If so, the challenge for the DBA is how best to manage replicated systems that allow for fast database access over Wide Area Networks.
In many shops, popular failover solutions include Real Application Clusters (RAC) and Oracle9i Dataguard. An alternate solution, however, is growing in popularity: Oracle advanced replication — specifically, procedural multi-master replication.
With Oracle multi-master replication, you can implement peer-to-peer replication of all master tables, anywhere in the world. You can update any master site by propagating changes, either synchronously or asynchronously, and apply those changes directly to all other master tables. In addition to providing fast database access across your WAN, multi-master replication also provides solutions for failover and load-balancing issues.
What’s the catch? Multi-master replication is an extremely sophisticated and complex process. You can configure an almost infinite array of multi-master replication models, each adhering to its own set of conflict resolution and refresh rules. You may have heard that advanced replication implementations are notoriously difficult to configure, and they are. Large Oracle shops may spend hundreds of hours configuring and testing a worldwide multi-master replication solution, and many have a dedicated DBA whose sole job is to monitor and maintain the multi-master replication. In the long run, however, the investment in time and resources is worth the extra effort.
In this article, I’ll present several reasons why multi-master replication is popular for geographically distributed systems. For those of you who are new to the basic concepts of multi-master replication, I’ll present a high-level explanation of how it works, including code samples. Then we’ll look at a four-phase plan for implementing procedural multi-master replication. Finally, I’ll tell you where to find three pre-defined PL/SQL packages from Oracle that help define multi-master replication.
Why Consider Oracle Multi-master Replication?
There are a couple of reasons why Oracle multi-master replication is so popular for geographically distributed systems. Perhaps the most important reason is that it provides multiple-node replication capabilities. This may seem obvious, but you must remember that one-way, read-only snapshots are far easier to create and maintain than a multi-master scheme.
The other benefit of multi-master replication is the ability to replicate stored procedures. In a system where all code is encapsulated inside Oracle stored procedures, you can replicate the stored procedures to remote sites, just like data. This capability allows the DBA to coordinate code changes with database changes. Once the Oracle stored procedures are written, you can easily replicate and distribute them to work groups and branch offices throughout the entire replicated network of systems.
Oracle Multi-master Replication
I’ll start with a high-level view of multi-master replication and introduce some basic concepts. Multi-master replication is such a complex topic that I can’t fully address every issue about it in this space. However, I hope you’ll be happy with a conceptual explanation of the mechanisms.
In a nutshell, multi-master replication is nothing more than a coordinated set of updatable snapshots. By “updatable,” I mean that the snapshot allows the FOR UPDATE clause in the snapshot definition. To illustrate this concept, refer to the example below, where you’ll see that the snapshot is allowed to propagate updates back to the master table.
create snapshot
customer_updatable_snap
refresh fast start with sysdate
next sysdate + 1/24
for update
query rewrite
as
select * from customer@master_site;
Multi-master Conflicts and Resolutions
At first blush, multi-master replication may appear straightforward. However, there is a dark side to the process. Whenever a snapshot has the ability to send updates to other “master” tables, you always run the risk of update conflicts. So what’s the best way to avoid and/or resolve those conflicts? Let’s start the lesson by reviewing multi-master conflict avoidance. Then we’ll dive head-first into the details of procedural replication, so we can see how it all fits together.
An update conflict occurs when one remote user overlays the updates made by a user on another database. Your multi-master replication model should detect and resolve conflicts. Unfortunately, detecting and resolving those conflicts can get extremely complex. Let’s start by looking at what conflicts can occur, and then we’ll look at mechanisms for resolving them.
Conflict Types
Here are the most common types of conflicts you’ll encounter with multi-master replication:
- Uniqueness conflict — This conflict results from an attempt from two different sites to insert records with the same primary key. To avoid uniqueness conflicts, you can choose from three available options. Those three pre-built methods are called Append Site Name To Duplicate Value, Append Sequence To Duplicate Value, and Discard Duplicate Value.
- Update conflict — This conflict is caused by simultaneous update operations on the same record.
- Delete conflict — This type of conflict occurs when one transaction deletes a row that another transaction updates (before the delete is propagated).
Oracle provides several pre-written scripts to help in resolving conflicts. In the case of update conflicts, your only option is to write conflict-resolution routines, and deal with each conflict on a case-by-case basis. Fortunately, Oracle provides several pre-built methods for creating the routines. Click here for the details about Oracle conflict-resolution techniques.
Conflict Resolution Mechanisms
Here are the most common mechanisms at your disposal for resolving conflicts:
- Latest Timestamp Value. With this simple technique, you apply updates as they are received. Based on timestamp value, the most recent updates overlays prior updates. This approach can result in situations where one user’s update gets overlaid by a more recent update.
- Earliest Timestamp Value. This mechanism is the opposite of the latest timestamp value, in that the first update overlays subsequent updates. As you’d expect, not many shops use this method, but it is an option.
- Minimum and Maximum Value. This mechanism may be used when the advanced replication facility detects a conflict with a column group. The advanced replication facility calls the minimum value conflict resolution method and then compares the new value from the originating site with the current value from the destination site for a designated column in the column group. You must designate that column when you select the minimum value conflict resolution method.
- Additive and Average Value. When you’re dealing with replicated numeric values, this additive method adds a new value to the existing value using the following formula: (current value = current value + (new value - old value)). The average method averages the conflicting values into the existing value using the formula (current value = (current value + new value)/2).
- Groups Priority Value. Using this method, some groups have priority (a higher rank) over other groups. Therefore, the update associated with the highest-ranked group gets the update.
- Site Priority Value. In this method, all master sites are NOT created equal. Some remote sites will have priority over other sites.
To illustrate how conflict resolution is defined, consider the example below. In this code, we execute dbms_repcat.add_update_resolution to direct Oracle to use the “latest timestamp” method for conflict resolution for updates to the EMP table.
execute dbms_repcat.add_update_resolution( -
sname => 'SCOTT', -
oname => 'EMP', -
column_group => 'EMP_COLGRP', -
sequence_no => 1, -
method => 'LATEST TIMESTAMP', -
parameter_column_name => 'EMPNO');
At this point, you should be starting to appreciate the complexity of conflict resolution in multi-master replication. Now let’s take a quick look at the techniques you can use to define procedural multi-master replication.
Implementing Procedural Multi-master Replication
Although Procedural multi-master replication is an extremely complex process, you can break down the basic steps for defining procedural replication into four phases:
- Phase I: Pre-configuration. (Set-up Oracle parameters and catalog scripts.)
- Phase II: Define the repadmin user and database links.
- Phase III: Create master database and refresh groups.
- Phase IV: Monitor the replication environment.
Let’s take a close look at each phase in turn.
Phase I: Pre-configuration Steps for Multi-master Replication
Before you’re ready to define a multi-master replication environment, there’s a short checklist you need to deal with up front. For every site that will be participating in the replication, you must check the values of these parameters:
1. Oracle parameters minimum settings
- shared_pool_size=10m
- global_names=true
- job_queue_processes=4
To check those values, run this script on your database:
select
name,
value
from
v_$parameter
where
name in (
'job_queue_processes',
'global_names',
‘shared_pool_size’);
2. You also must be sure that the following dictionary scripts have been run from ORACLE_HOME/rdbms/admin. The catalog.sql was run when you created your instance, and the catproc.sql script is for the procedural option in Oracle.
- catalog.sql
- catproc.sql
Phase II: Set-up REPADMIN User and Database Links
The following illustrates some of the main steps you’ll follow in pre-creating the REPADMIN users and the required database links for multi-master replication. You should review these steps with great care.
REM Assign global name to the current DB alter database rename global_name to PUBS.world; REM Create public db link to the other master databases create public database link NEWPUBS using 'newpubs'; REM Create replication administrator / propagator / receiver create user repadmin identified by repadmin default tablespace USER_DATA temporary tablespace TEMP quota unlimited on USER_DATA; REM Grant privileges to the propagator, to propagate changes to remote sites execute dbms_defer_sys.register_propagator(username=>'REPADMIN'); REM Grant privileges to the receiver to apply deferred transactions grant execute any procedure to repadmin; REM Authorize the administrator to administer replication groups execute dbms_repcat_admin.grant_admin_any_repgroup('REPADMIN'); REM Authorize the administrator to lock and comment tables grant lock any table to repadmin; grant comment any table to repadmin; connect repadmin/repadmin REM Create private db links for repadmin create database link newpubs connect to repadmin identified by repadmin; REM Schedule job to push transactions to master sites REM This will replicate every minute execute dbms_defer_sys.schedule_push( - destination => 'newpubs', - interval => 'sysdate+1/24/60', - next_date => sysdate+1/24/60, - stop_on_error => FALSE, - delay_seconds => 0, - parallelism => 1); REM Schedule job to delete successfully replicated transactions execute dbms_defer_sys.schedule_purge( - next_date => sysdate+1/24, - interval => 'sysdate+1/24'); REM Test the database link select global_name from global_name@newpubs;
Phase III: Create the Master Database and Refresh Groups
Once the repadmin user and the links are in place, you’re ready to define the replication. Again, this is an extremely complex process. However, the following script will provide you with the general steps to get the work done.
connect repadmin/repadmin REM Create replication group for MASTERDEF site execute dbms_repcat.create_master_repgroup('MYREPGRP'); REM Register objects within the group execute dbms_repcat.create_master_repobject('SCOTT', - 'EMP', 'TABLE', gname=>'MYREPGRP'); execute dbms_repcat.make_column_group( - sname => 'SCOTT', - oname => 'EMP', - column_group => 'EMP_COLGRP', - list_of_column_names => 'EMPNO'); execute dbms_repcat.add_update_resolution( - sname => 'SCOTT', - oname => 'EMP', - column_group => 'EMP_COLGRP', - sequence_no => 1, - method => 'LATEST TIMESTAMP', - parameter_column_name => 'EMPNO'); REM Add master destination sites execute dbms_repcat.add_master_database( - 'MYREPGRP', - 'TD2.world'); REM Generate replication support for objects within the group execute dbms_repcat.generate_replication_support( - 'SCOTT', - 'EMP', - 'table');
Dropping Multi-master Replication
As you’d expect, there will be instances when you may need to turn off multi-master replication. Some of the obvious cases include database maintenance activities such as upgrades and reorganizations. You can use this sample script to disable multi-master replication.
connect repadmin/repadmin REM Stop replication execute dbms_repcat.suspend_master_activity(gname=>'MYREPGRP'); REM Delete replication groups -- execute dbms_repcat.drop_master_repobject('SCOTT', 'EMP', 'TABLE'); execute dbms_repcat.drop_master_repgroup('MYREPGRP'); execute dbms_repcat.remove_master_databases('MYREPGRP', 'newpubs.world'); REM Remove private database links to other master databases drop database link newpubs.world; connect sys REM Remove the REPADMIN user execute dbms_defer_sys.unregister_propagator(username=>'REPADMIN'); execute dbms_repcat_admin.revoke_admin_any_schema(username=>'REPADMIN') ; drop user repadmin cascade; REM Drop public database links to other master databases drop public database link newpubs.world;
Phase IV: Monitoring Multi-master Replication
The final phase of implementing multi-master replication involves monitoring. A variety of dictionary views provide the key to monitoring complex multi-replication processes. I cannot stress enough the importance of checking these views on every database in the multi-master network.
- DBA_REPSCHEMA. This view contains details for the replication schema
- DBA_REPCATLOG. This view provides a log of all replication activities.
- DBA_JOBS. Use this view to monitor all scheduled job in the database.
- DBA_REPCAT. This view shows the replication catalog.
- ALL_REPCONFLICT. This view provides a list of all replication conflicts.
- ALL_REPRESOLUTION. For systems defined with pre-defined conflict resolution, this view lists the resolution of every conflict.
- DBA_REPOBJECT. This view gives you a list of al replicated objects.
- DBA_REPSITES. This view provides is a list of replicated sites.
At this point, you’ll want to closely review the following script, which is the one most commonly used to monitor procedural replication. Of course, you must run this script on each remote database.
connect repadmin/repadmin set pages 50000 col sname format a20 head "SchemaName" col masterdef format a10 head "MasterDef?" col oname format a20 head "ObjectName" col gname format a20 head "GroupName" col object format a35 trunc col dblink format a35 head "DBLink" col message format a25 col broken format a6 head "Broken?" prompt Replication schemas/ sites select sname, masterdef, dblink from sys.dba_repschema; prompt RepCat Log (after a while you should see no entries): select request, status, message, errnum from sys.dba_repcatlog; prompt Entries in the job queue select job, last_date, last_sec, next_date, next_sec, broken, failures, what from sys.dba_jobs where schema_user = 'REPADMIN'; prompt Replication Status: select sname, master, status from sys.dba_repcat; prompt Returns all conflict resolution methods select * from all_repconflict; prompt Returns all resolution methods in use select * from all_represolution; prompt Objects registered for replication select gname, type||' '||sname||'.'||oname object, status from sys.dba_repobject; select * from dba_repsites;
Resources for Defining Multi-master Replication
When it comes to defining multi-master replication for your shop, you don’t have to start from scratch. Oracle offers the following pre-defined PL/SQL packages that can assist you:
- dbms_repcat package — This complex package provides over 50 stored procedures. Follow this link for a listing of the procedures in dbms_repcat.
- dcbms_reputil package — This package contains several stored procedures. Here is a list of the procedures in dbms_reputil.
- dbms_defer_sys package — This collection contains 19 replication procedures. Here is a list of the procedures in dbms_defer_sys.
Conclusion
In this brief introduction it is impossible to provide a comprehensive overview of this powerful utility. Rather, the intent of this article was to provide a simple overview of the important concepts and illustrate how multi-master replication is used within a distributed Oracle environment.
References
Constraints on updatable snapshots
Oracle advanced replication setup & design tips
Oracle FAQ - Oracle advanced replication scripts
Oracle9i documentation - conflict resolution techniques
Oracle9i Replication API documentation
OracleNotes.com – Oracle8i advanced replication
Donald K. Burleson is one of the world's top Oracle Database experts with more than 20 years of full-time DBA experience. He specializes in creating database architectures for very large online databases and he has worked with some of the world's most powerful and complex systems. A former Adjunct Professor, Don Burleson has written 15 books, published more than 100 articles in national magazines, serves as Editor-in-Chief of Oracle Internals and edits for Rampant TechPress. Don is a popular lecturer and teacher and is a frequent speaker at Oracle Openworld and other international database conferences. Don's Web sites include DBA-Oracle, Remote-DBA, Oracle-training, remote support and remote DBA.
Contributors : Donald K. Burleson
Last modified 2005-06-22 12:10 AM