A Four-phase Approach to Procedural Multi-master Replication

by Donald K. Burleson

Introduction

Do you support customers whose databases are updated by users in multiple locations and across multiple time zones? If so, the challenge for the DBA is how best to manage replicated systems that allow for fast database access over Wide Area Networks.

In many shops, popular failover solutions include Real Application Clusters (RAC) and Oracle9i Dataguard. An alternate solution, however, is growing in popularity: Oracle advanced replication — specifically, procedural multi-master replication.

With Oracle multi-master replication, you can implement peer-to-peer replication of all master tables, anywhere in the world. You can update any master site by propagating changes, either synchronously or asynchronously, and apply those changes directly to all other master tables. In addition to providing fast database access across your WAN, multi-master replication also provides solutions for failover and load-balancing issues.

What’s the catch? Multi-master replication is an extremely sophisticated and complex process. You can configure an almost infinite array of multi-master replication models, each adhering to its own set of conflict resolution and refresh rules. You may have heard that advanced replication implementations are notoriously difficult to configure, and they are. Large Oracle shops may spend hundreds of hours configuring and testing a worldwide multi-master replication solution, and many have a dedicated DBA whose sole job is to monitor and maintain the multi-master replication. In the long run, however, the investment in time and resources is worth the extra effort.

In this article, I’ll present several reasons why multi-master replication is popular for geographically distributed systems. For those of you who are new to the basic concepts of multi-master replication, I’ll present a high-level explanation of how it works, including code samples. Then we’ll look at a four-phase plan for implementing procedural multi-master replication. Finally, I’ll tell you where to find three pre-defined PL/SQL packages from Oracle that help define multi-master replication.

Why Consider Oracle Multi-master Replication?

There are a couple of reasons why Oracle multi-master replication is so popular for geographically distributed systems. Perhaps the most important reason is that it provides multiple-node replication capabilities. This may seem obvious, but you must remember that one-way, read-only snapshots are far easier to create and maintain than a multi-master scheme.

The other benefit of multi-master replication is the ability to replicate stored procedures. In a system where all code is encapsulated inside Oracle stored procedures, you can replicate the stored procedures to remote sites, just like data. This capability allows the DBA to coordinate code changes with database changes. Once the Oracle stored procedures are written, you can easily replicate and distribute them to work groups and branch offices throughout the entire replicated network of systems.

Oracle Multi-master Replication

I’ll start with a high-level view of multi-master replication and introduce some basic concepts. Multi-master replication is such a complex topic that I can’t fully address every issue about it in this space. However, I hope you’ll be happy with a conceptual explanation of the mechanisms.

In a nutshell, multi-master replication is nothing more than a coordinated set of updatable snapshots. By “updatable,” I mean that the snapshot allows the FOR UPDATE clause in the snapshot definition. To illustrate this concept, refer to the example below, where you’ll see that the snapshot is allowed to propagate updates back to the master table.

create snapshot 
   customer_updatable_snap 
refresh fast start with sysdate
next sysdate + 1/24
for update 
query rewrite
  as 
   select * from customer@master_site;

Multi-master Conflicts and Resolutions

At first blush, multi-master replication may appear straightforward. However, there is a dark side to the process. Whenever a snapshot has the ability to send updates to other “master” tables, you always run the risk of update conflicts. So what’s the best way to avoid and/or resolve those conflicts? Let’s start the lesson by reviewing multi-master conflict avoidance. Then we’ll dive head-first into the details of procedural replication, so we can see how it all fits together.

An update conflict occurs when one remote user overlays the updates made by a user on another database. Your multi-master replication model should detect and resolve conflicts. Unfortunately, detecting and resolving those conflicts can get extremely complex. Let’s start by looking at what conflicts can occur, and then we’ll look at mechanisms for resolving them.

Conflict Types

Here are the most common types of conflicts you’ll encounter with multi-master replication:

Uniqueness conflict — This conflict results from an attempt from two different sites to insert records with the same primary key. To avoid uniqueness conflicts, you can choose from three available options. Those three pre-built methods are called Append Site Name To Duplicate Value, Append Sequence To Duplicate Value, and Discard Duplicate Value.
Update conflict — This conflict is caused by simultaneous update operations on the same record.
Delete conflict — This type of conflict occurs when one transaction deletes a row that another transaction updates (before the delete is propagated).

Oracle provides several pre-written scripts to help in resolving conflicts. In the case of update conflicts, your only option is to write conflict-resolution routines, and deal with each conflict on a case-by-case basis. Fortunately, Oracle provides several pre-built methods for creating the routines. Click here for the details about Oracle conflict-resolution techniques.

Conflict Resolution Mechanisms

Here are the most common mechanisms at your disposal for resolving conflicts:

Latest Timestamp Value. With this simple technique, you apply updates as they are received. Based on timestamp value, the most recent updates overlays prior updates. This approach can result in situations where one user’s update gets overlaid by a more recent update.
Earliest Timestamp Value. This mechanism is the opposite of the latest timestamp value, in that the first update overlays subsequent updates. As you’d expect, not many shops use this method, but it is an option.
Minimum and Maximum Value. This mechanism may be used when the advanced replication facility detects a conflict with a column group. The advanced replication facility calls the minimum value conflict resolution method and then compares the new value from the originating site with the current value from the destination site for a designated column in the column group. You must designate that column when you select the minimum value conflict resolution method.
Additive and Average Value. When you’re dealing with replicated numeric values, this additive method adds a new value to the existing value using the following formula: (current value = current value + (new value - old value)). The average method averages the conflicting values into the existing value using the formula (current value = (current value + new value)/2).
Groups Priority Value. Using this method, some groups have priority (a higher rank) over other groups. Therefore, the update associated with the highest-ranked group gets the update.
Site Priority Value. In this method, all master sites are NOT created equal. Some remote sites will have priority over other sites.

To illustrate how conflict resolution is defined, consider the example below. In this code, we execute dbms_repcat.add_update_resolution to direct Oracle to use the “latest timestamp” method for conflict resolution for updates to the EMP table.

execute dbms_repcat.add_update_resolution( -
        sname => 'SCOTT',               -
        oname => 'EMP',                 -
        column_group => 'EMP_COLGRP',   -
        sequence_no => 1,               -
        method => 'LATEST TIMESTAMP',   -
        parameter_column_name => 'EMPNO');

At this point, you should be starting to appreciate the complexity of conflict resolution in multi-master replication. Now let’s take a quick look at the techniques you can use to define procedural multi-master replication.

Implementing Procedural Multi-master Replication

Although Procedural multi-master replication is an extremely complex process, you can break down the basic steps for defining procedural replication into four phases:

Phase I: Pre-configuration. (Set-up Oracle parameters and catalog scripts.)
Phase II: Define the repadmin user and database links.
Phase III: Create master database and refresh groups.
Phase IV: Monitor the replication environment.

Let’s take a close look at each phase in turn.

Phase I: Pre-configuration Steps for Multi-master Replication

Before you’re ready to define a multi-master replication environment, there’s a short checklist you need to deal with up front. For every site that will be participating in the replication, you must check the values of these parameters:

1. Oracle parameters minimum settings

shared_pool_size=10m
global_names=true
job_queue_processes=4

To check those values, run this script on your database:

select 
   name, 
   value 
from 
   v_$parameter
where
  name in (
   'job_queue_processes', 
   'global_names',
   ‘shared_pool_size’);

2. You also must be sure that the following dictionary scripts have been run from ORACLE_HOME/rdbms/admin. The catalog.sql was run when you created your instance, and the catproc.sql script is for the procedural option in Oracle.

catalog.sql
catproc.sql

Phase II: Set-up REPADMIN User and Database Links

The following illustrates some of the main steps you’ll follow in pre-creating the REPADMIN users and the required database links for multi-master replication. You should review these steps with great care.

REM Assign global name to the current DB
alter database rename global_name to PUBS.world;

REM Create public db link to the other master databases
create public database link NEWPUBS using 'newpubs';

REM Create replication administrator / propagator / receiver
create user 
   repadmin 
identified by 
   repadmin
default tablespace 
   USER_DATA
temporary tablespace 
   TEMP
quota unlimited on 
   USER_DATA;

REM Grant privileges to the propagator, to propagate changes to remote
sites
execute dbms_defer_sys.register_propagator(username=>'REPADMIN');

REM Grant privileges to the receiver to apply deferred transactions
grant execute any procedure to repadmin;

REM Authorize the administrator to administer replication groups
execute dbms_repcat_admin.grant_admin_any_repgroup('REPADMIN');

REM Authorize the administrator to lock and comment tables
grant lock any table to repadmin;
grant comment any table to repadmin;

connect repadmin/repadmin

REM Create private db links for repadmin
create database link newpubs
        connect to repadmin identified by repadmin;

REM Schedule job to push transactions to master sites
REM This will replicate every minute
execute dbms_defer_sys.schedule_push(        -
           destination => 'newpubs',         -
           interval => 'sysdate+1/24/60',    -
           next_date => sysdate+1/24/60,     -
           stop_on_error => FALSE, -
           delay_seconds => 0, -
           parallelism => 1);

REM Schedule job to delete successfully replicated transactions
execute dbms_defer_sys.schedule_purge( -
        next_date     => sysdate+1/24, -
        interval      => 'sysdate+1/24');

REM Test the database link
select global_name from global_name@newpubs;

Phase III: Create the Master Database and Refresh Groups

Once the repadmin user and the links are in place, you’re ready to define the replication. Again, this is an extremely complex process. However, the following script will provide you with the general steps to get the work done.

connect repadmin/repadmin



REM Create replication group for MASTERDEF site
execute dbms_repcat.create_master_repgroup('MYREPGRP');

REM Register objects within the group
execute dbms_repcat.create_master_repobject('SCOTT', -
        'EMP', 'TABLE', gname=>'MYREPGRP');

execute dbms_repcat.make_column_group(  -
        sname => 'SCOTT',               -
        oname => 'EMP',                 -
        column_group => 'EMP_COLGRP',   -
        list_of_column_names => 'EMPNO');

execute dbms_repcat.add_update_resolution( -
        sname => 'SCOTT',               -
        oname => 'EMP',                 -
        column_group => 'EMP_COLGRP',   -
        sequence_no => 1,               -
        method => 'LATEST TIMESTAMP',   -
        parameter_column_name => 'EMPNO');

REM Add master destination sites
execute
         dbms_repcat.add_master_database( -
         'MYREPGRP', -
        'TD2.world');

REM Generate replication support for objects within the group
execute
         dbms_repcat.generate_replication_support( -
         'SCOTT', -
        'EMP', -
        'table');

Dropping Multi-master Replication

As you’d expect, there will be instances when you may need to turn off multi-master replication. Some of the obvious cases include database maintenance activities such as upgrades and reorganizations. You can use this sample script to disable multi-master replication.

connect repadmin/repadmin


REM Stop replication
execute dbms_repcat.suspend_master_activity(gname=>'MYREPGRP');

REM Delete replication groups
-- execute dbms_repcat.drop_master_repobject('SCOTT',    'EMP', 
'TABLE');
execute dbms_repcat.drop_master_repgroup('MYREPGRP');
execute          dbms_repcat.remove_master_databases('MYREPGRP', 
'newpubs.world');

REM Remove private database links to other master databases
drop database link newpubs.world;


connect sys

REM Remove the REPADMIN user
execute
    dbms_defer_sys.unregister_propagator(username=>'REPADMIN');

execute    

dbms_repcat_admin.revoke_admin_any_schema(username=>'REPADMIN')
;

drop user repadmin cascade;

REM Drop public database links to other master databases
drop public database link newpubs.world;

Phase IV: Monitoring Multi-master Replication

The final phase of implementing multi-master replication involves monitoring. A variety of dictionary views provide the key to monitoring complex multi-replication processes. I cannot stress enough the importance of checking these views on every database in the multi-master network.

DBA_REPSCHEMA. This view contains details for the replication schema
DBA_REPCATLOG. This view provides a log of all replication activities.
DBA_JOBS. Use this view to monitor all scheduled job in the database.
DBA_REPCAT. This view shows the replication catalog.
ALL_REPCONFLICT. This view provides a list of all replication conflicts.
ALL_REPRESOLUTION. For systems defined with pre-defined conflict resolution, this view lists the resolution of every conflict.
DBA_REPOBJECT. This view gives you a list of al replicated objects.
DBA_REPSITES. This view provides is a list of replicated sites.

At this point, you’ll want to closely review the following script, which is the one most commonly used to monitor procedural replication. Of course, you must run this script on each remote database.

connect repadmin/repadmin

set pages 50000

col sname      format a20 head "SchemaName"
col masterdef  format a10 head "MasterDef?"
col oname      format a20 head "ObjectName"
col gname      format a20 head "GroupName"
col object     format a35 trunc
col dblink     format a35 head "DBLink"
col message    format a25
col broken     format a6 head "Broken?"

prompt Replication schemas/ sites
select 
  sname, 
  masterdef, 
  dblink
from
  sys.dba_repschema;

prompt RepCat Log (after a while you should see no entries):
select 
   request, 
   status, 
   message, 
   errnum
from
   sys.dba_repcatlog;

prompt Entries in the job queue
select 
   job, 
   last_date, 
   last_sec, 
   next_date, 
   next_sec, 
   broken, 
   failures,
   what
from
   sys.dba_jobs
where
   schema_user = 'REPADMIN';

prompt Replication Status:
select 
   sname, 
   master, 
   status
from
   sys.dba_repcat;

prompt Returns all conflict resolution methods
select * from all_repconflict;

prompt Returns all resolution methods in use
select * from all_represolution;
prompt Objects registered for replication
select 
   gname, 
   type||' '||sname||'.'||oname object, 
   status
from
   sys.dba_repobject;

select * from dba_repsites;

Resources for Defining Multi-master Replication

When it comes to defining multi-master replication for your shop, you don’t have to start from scratch. Oracle offers the following pre-defined PL/SQL packages that can assist you:

dbms_repcat package — This complex package provides over 50 stored procedures. Follow this link for a listing of the procedures in dbms_repcat.
dcbms_reputil package — This package contains several stored procedures. Here is a list of the procedures in dbms_reputil.
dbms_defer_sys package — This collection contains 19 replication procedures. Here is a list of the procedures in dbms_defer_sys.

Conclusion

In this brief introduction it is impossible to provide a comprehensive overview of this powerful utility. Rather, the intent of this article was to provide a simple overview of the important concepts and illustrate how multi-master replication is used within a distributed Oracle environment.

References

Constraints on updatable snapshots

Oracle advanced replication setup & design tips

Oracle FAQ - Oracle advanced replication scripts

Oracle replication solutions

Oracle8i advanced replication

Oracle9i documentation - conflict resolution techniques

Oracle9i Replication API documentation

OracleNotes.com – Oracle8i advanced replication

Using updatable snapshots

Donald K. Burleson is one of the world's top Oracle Database experts with more than 20 years of full-time DBA experience. He specializes in creating database architectures for very large online databases and he has worked with some of the world's most powerful and complex systems. A former Adjunct Professor, Don Burleson has written 15 books, published more than 100 articles in national magazines, serves as Editor-in-Chief of Oracle Internals and edits for Rampant TechPress. Don is a popular lecturer and teacher and is a frequent speaker at Oracle Openworld and other international database conferences. Don's Web sites include DBA-Oracle, Remote-DBA, Oracle-training, remote support and remote DBA.

Contributors : Donald K. Burleson
Last modified 2005-06-22 12:10 AM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?