An Oracle Instructor's Guide To Oracle Data Guard

by Christopher Foot

Database administrators, by the very essence of their job descriptions, are the protectors of their organization's core data assets. They are tasked with ensuring key data stores are continuously available. However, ensuring that data is available on a 24 x 7 basis is a wonderfully complex task. Hardware failures, software failures, user errors and disasters all combine to make many administrators lie awake nights thinking about whether their databases will be continuously available.

When a mission-critical application becomes unavailable, it can threaten the survivability of the organization. The financial impact of downtime is not the only issue facing companies whose critical applications are offline. Loss of customer goodwill, bad press, idle employees and legal penalties (lawsuits, fines, and so on) must also be considered. It is up to the database administrator to recommend and implement technical solutions that deal with these unforeseen "data disruptions."

Hardware and software vendors (including Oracle) offer varying degrees of system availability. These degrees of availability range from mirrored disks to double (or triple) redundant clustering architectures that can be repaired or replaced without bringing the entire system offline. But, as the reliability of a system increases, so does its cost. The bottom line is that the more highly available a database environment is, the more expensive it is. The challenge that database administrators face is balancing systems availability with total systems cost.

The database administrator should meet with business units to determine the cost of downtime and use it to set a budget for building the highly available database environment. Business units must ask themselves the following questions: How critical is my application to the business? If the system is offline, how much money and customer goodwill will I lose? How much availability is management willing to pay for? The least expensive option can mean downtime of possibly hours (perhaps many) under certain circumstances. At the other extreme, fault-tolerant systems can provide applications with true 24 x 7 availability, but with costs that can range from six to seven figures.
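
As a rough illustration of the budgeting conversation above, the following Python sketch converts an availability target into expected hours of downtime per year and an annual downtime cost. The function names and the 50,000-per-hour figure are hypothetical, chosen only to show the arithmetic; real figures have to come from the business units.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def annual_downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year for a given availability percentage."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


def annual_downtime_cost(availability_pct: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a (hypothetical) hourly cost."""
    return annual_downtime_hours(availability_pct) * cost_per_hour


if __name__ == "__main__":
    # 50_000 is an illustrative hourly cost, not a figure from any real system.
    for pct in (99.0, 99.9, 99.99):
        hours = annual_downtime_hours(pct)
        cost = annual_downtime_cost(pct, 50_000)
        print(f"{pct}% available: {hours:.2f} hours down per year, about ${cost:,.0f}")
```

Comparing the yearly cost at each availability level with the price of the architecture needed to reach it gives the business a concrete basis for deciding how much availability to pay for.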

Introducing Oracle Data Guard

Oracle's Data Guard is becoming a popular solution to the problem of providing highly available architectures at a reasonably low cost. Oracle Data Guard is a passive failover environment that uses a single system to run the user applications until a failure occurs. Then the backup system is engaged and takes over for the primary system. The primary system can then be repaired or replaced. The most basic failover environment consists of two hardware servers: one being the primary, or live system, through which users access data and services; the other server contains the secondary or backup database, which closely monitors the operations of the primary and automatically takes over the role as primary in the event of a failure.

Passive failover systems are designed to recover from faults, not to compute through them. This means that there will be an outage if a problem occurs on the primary server. The length of the outage depends on the length of time it takes for the problem to be identified (either by the administrator or the software) and the time it takes for the failover system to be brought online.

It is important to understand that Data Guard is not a clustered architecture where individual systems are mirror images or duplicates of the other systems in the cluster (like Oracle Real Application Clusters). Because the systems are not mirror images of each other, data loss is also a concern with failover architectures. How much data is lost as a result of the failure depends upon how the failover environment is designed and configured. Oracle Data Guard can be configured to provide different levels of protection that range from minimal to zero data loss. But as it is with everything in life, there is a trade-off between zero data loss configurations and production system performance.

But Data Guard is more than just failover software. It is a software architecture that creates, supports and monitors a failover environment that protects data from hardware failures, human errors and corruptions that might otherwise cause a critical application failure to occur.

Oracle Data Guard Architecture

Let's continue our discussion by breaking the Data Guard architecture down into its main components:

The primary database is the live production system. Every standby database is associated with one (and only one) primary database. In Oracle9i Release 2, up to 9 physical and logical standby databases can be associated with a single primary database. As changes are being made to the primary database, LGWR or ARCH transfers a copy of those changes (in the form of redo log entries) to the standby databases.
A physical standby database is identical to the primary database on a block-by-block basis. A physical standby database is updated by applying redo log entries that are received from the primary database. A delay can be put in place to prevent user errors from being propagated from the primary database to the physical standby database. A physical standby database must be in recovery mode while applying the redo. It cannot be used for reporting while it is recovering data, but the recovery process can be temporarily suspended to provide reporting capabilities to end-users.
A logical standby database is an independent database that contains the same data as the primary database. As with its physical counterpart, a delay can be put in place to prevent user errors from being propagated from the primary database to the logical standby database. The logical standby database uses LogMiner technology to convert the log information received from the primary database into SQL statements. The SQL statements are then applied to the logical standby database. The tables in a logical standby database can be simultaneously used for end-user reporting. Additional indexes and materialized views can be created in the database to increase query performance. All tables in the standby database that are protecting primary database tables are read-only. Tables that are not protecting primary database tables are read-write.

OK, now that we have a firm understanding of the high-level architecture, let's take one step deeper into the components and processes:

As you can see, the graphic above provides more detail and introduces a few new terms. Let's walk through the diagram starting with the primary database in the upper left hand corner:

The primary database's LGWR process collects transaction redo data and updates the online redo logs.
If the environment is configured for maximum protection, log writer (LGWR) will ship transaction redo data directly to the standby's Remote File Server Process (RFS) via Oracle Net. LGWR will transmit the redo information to the destination concurrently as the online redo log is populated. Administrators are able to specify synchronous or asynchronous network transmission of redo data to the remote destinations.
The environment can also be configured to have archiver (ARCH) ship full archived redo logs to the standby server's Remote File Server Process via Oracle Net. Administrators configure ARCH to ship archived redo logs to the standby server by placing additional entries in the parameter file. The full archived logs can only be sent to the Remote File Server Process using synchronous network transmission. Since only completed archived redo logs are sent to the standby server, data changes on the standby will lag behind the primary.
The standby server's Remote File Server Process (RFS) is responsible for receiving the archived or online redo log data from the primary server.
Depending on how the redo log data was shipped from the primary server (LGWR or ARCH), administrators are able to store the shipped redo data as standby online redo logs or standby archived redo logs. The standby database will still use conventional online redo logs (required for normal database operations) but can be configured to use both online redo logs and standby online redo logs. The following conditions must occur before standby online redo logs can be used as the repository for shipped redo log data:
The primary database must be configured to use LGWR to ship redo log data from the primary server to the standby.
The size of the standby redo log must match the size of at least one of the primary online redo logs.
The standby redo log must be archived on the standby server before its contents can be applied to the standby database.
The standby database server will use the Managed Recovery Process (MRP) to apply the redo information if the standby database is a physical standby and will use the Logical Standby Process (LSP) to apply redo information if the standby database is a logical standby.
The Fetch Archive Log Process (FAL) is a background Oracle process that runs on the primary database server. If ARCH is used to ship archived redo logs to the standby server there is a possibility of log gaps occurring during network failures. The standby environment can be configured to detect network failures and initiate requests to the FAL server process to send the missing archived redo logs.
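
The gap detection described in the last item can be sketched in a few lines of Python. This is only a model of the bookkeeping involved, not how Oracle implements FAL: the function name and the use of plain integers for archived log sequence numbers are assumptions made for illustration.

```python
from typing import Iterable, List


def find_log_gaps(received_sequences: Iterable[int]) -> List[int]:
    """Return the log sequence numbers missing between the lowest and
    highest sequence received so far on the standby."""
    received = set(received_sequences)
    if not received:
        return []
    return [
        seq
        for seq in range(min(received), max(received) + 1)
        if seq not in received
    ]


if __name__ == "__main__":
    # Hypothetical case: logs 103 and 104 never arrived because of a network outage.
    missing = find_log_gaps([100, 101, 102, 105, 106])
    print("Sequences to request from the FAL server:", missing)
```

In this model, the standby would ask the primary to resend each sequence in the returned list before continuing to apply redo past the gap.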

Data Protection Modes

Oracle Data Guard offers three modes of data protection. The ultimate goal of any failover system is to keep the primary and standby databases as identical as possible. But the key to success is to balance the needs of transaction protection with transaction performance. Administrators use the ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE {PROTECTION | AVAILABILITY | PERFORMANCE}; statement to configure the Data Guard environment for maximum data protection, availability, or performance.

Maximum Protection

Maximum protection ensures the highest level of data availability for the primary database. In maximum protection mode, redo log records are synchronously sent by LGWR to the standby database. Primary database changes are not committed until it has been confirmed that the data is available on at least one standby database. The key word in the last sentence is "available." The redo log data does not have to be committed on the standby database; it must only be acknowledged that the data has been received on the standby server.

If Oracle determines that the redo data can't be transferred from the primary server to the standby servers, it will automatically stop the primary database instance. This ensures that no transaction data is lost when the primary and standby databases are unable to communicate. In order to prevent unwanted primary database shutdowns from occurring, administrators should configure more than one standby database when creating an Oracle Data Guard environment that will be configured for maximum protection.

Standby servers that participate in a maximum protection environment must use standby online redo logs. Because logical standby databases cannot be configured to use standby online redo logs, they are unable to participate in maximum protection configurations.

Maximum protection configurations have the greatest impact on transaction performance. Ensuring there is a high-speed connection between the primary and standby servers can lessen this impact.

Maximum Availability

Maximum availability provides the second highest level of data availability. As with its maximum protection counterpart, redo data is synchronously transmitted from the primary database to the standby database by LGWR. Primary database changes are not committed until it has been confirmed that the data is available on at least one standby database.

The standby database may temporarily lag behind, or diverge from, the primary database without negatively impacting the production environment. If the standby database becomes unavailable for any reason, the protection mode is temporarily lowered to maximum performance until the problem has been corrected. Once connectivity is reestablished, the standby database will automatically synchronize with the primary database and no data will be lost. If the primary database fails during a primary/standby communication outage, all transactions that occurred on the primary server after the communication outage could be lost.

The use of standby online redo logs is optional for maximum availability mode. This means that logical standby databases can participate in maximum availability configurations. Oracle does recommend that physical standby servers be configured to use standby online redo logs in maximum availability configurations.

Maximum Performance

Maximum performance is the default protection mode. It offers lower data availability and higher performance than its counterparts. Redo log data is asynchronously shipped to the standby database by either LGWR or ARCH. The commit operation on the primary database is not contingent upon the data being received by the standby server. If all of the standby servers become unavailable, processing will continue on the primary database.

The use of standby online redo logs is also optional for this mode. As a result, logical standby databases are able to participate in maximum performance configurations. Physical standby databases can use standby redo logs if redo log data is shipped from the primary database by LGWR.
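
To summarize how the three modes differ, here is a small Python model of what happens to a commit on the primary, depending on the protection mode and on whether any standby can acknowledge the redo. It is a simplification of the behavior described above, not Oracle code; the enum, function and outcome names are invented for this sketch.

```python
from enum import Enum


class Mode(Enum):
    MAX_PROTECTION = "maximum protection"
    MAX_AVAILABILITY = "maximum availability"
    MAX_PERFORMANCE = "maximum performance"


class Outcome(Enum):
    COMMIT_AFTER_STANDBY_ACK = "commit completes after a standby confirms receipt"
    COMMIT_WITHOUT_WAITING = "commit completes without waiting for a standby"
    PRIMARY_SHUTS_DOWN = "primary instance is stopped to avoid data loss"


def commit_outcome(mode: Mode, standby_reachable: bool) -> Outcome:
    """Model the primary's behavior at commit time for each protection mode."""
    if mode is Mode.MAX_PERFORMANCE:
        # Redo is shipped asynchronously; the commit never waits on a standby.
        return Outcome.COMMIT_WITHOUT_WAITING
    if standby_reachable:
        # Both synchronous modes wait for at least one standby to confirm receipt.
        return Outcome.COMMIT_AFTER_STANDBY_ACK
    if mode is Mode.MAX_PROTECTION:
        # No standby can receive the redo, so the primary instance is stopped.
        return Outcome.PRIMARY_SHUTS_DOWN
    # Maximum availability temporarily drops to maximum performance behavior.
    return Outcome.COMMIT_WITHOUT_WAITING


if __name__ == "__main__":
    for mode in Mode:
        for reachable in (True, False):
            print(f"{mode.value}, standby reachable={reachable}: "
                  f"{commit_outcome(mode, reachable).value}")
```

The model makes the trade-off visible: only maximum protection ever stops the primary, and only maximum performance never waits on the network.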

Data Guard Broker

Oracle's Data Guard Broker is the management framework that is used to create, configure, administer and monitor a Data Guard environment. The Data Guard Broker provides the following benefits:

Simplifies the creation of Data Guard environments by providing wizards to create and configure physical or logical standby databases. Data Guard is able to generate all of the files necessary (parameter, tnsnames.ora, etc.) to establish the connectivity between the standby and primary database servers.
Allows administrators to invoke a failover or switchover operation with a single command and control complex role changes across all systems in the configuration. A switchover is a planned transfer of control from the primary to the standby while a failover is an unplanned transfer of control due to some unforeseen event. By automating activities such as failover and switchover, the possibility of errors is reduced.
Provides performance-monitoring tools to monitor log transport and log apply times.
Provides a GUI tool (Data Guard Manager) that allows DBAs to administer a primary/multiple standby configuration with a simple point-and-click interface.
Administrators are able to manage all components of the configuration, including primary and standby servers and databases, log transport services, and log apply services.
Is highly integrated with Oracle Enterprise Manager to provide e-mail and paging capabilities.

An Oracle background server process called DMON is started on every site that is managed by the broker. The DMON process is created when the Data Guard Broker monitor is started on the primary or standby database servers. The DMON process is responsible for interacting with the local instance and the DMON processes running on the other servers to perform the functions requested by the Data Guard Manager or command line interface. The DMON process is also responsible for monitoring the health of the broker configuration.

DMON maintains a persistent configuration file on all of the servers managed by the Data Guard Broker framework. The configuration file contains entries that provide details on all objects in the configuration and their statuses. The broker uses this information to send information back to the Data Guard Manager, configure and start the site and database resource objects and control each object's behavior.

Conclusions

I hope you enjoyed learning about Oracle's Data Guard environment. It really isn't complicated once you break it down into its individual components. Implementing a Data Guard environment requires no application changes, and it can be configured to protect every transaction or to provide a balance between maximum protection and maximum performance. It will help protect your data from user errors, hardware failures and corruptions that would otherwise destroy the databases you are tasked with protecting.

Thanks and I'll see you in class!

Christopher Foot has been involved in database management for over 18 years, serving as a database administrator, database architect, trainer, speaker, and writer. Currently, Chris is employed as a Senior Database Architect at RemoteDBA Experts, a remote database services provider. Chris is the author of over forty articles for a variety of magazines and is a frequent lecturer on the database circuit, having given over a dozen speeches to local, national and international Oracle User Groups. His book, OCP Instructors Guide for DBA Certification, can be found at http://www.dba-oracle.com/bp/bp_book14_OCP.htm.

Contributors : Christopher Foot
Last modified 2006-03-21 09:55 AM

DBAzine.com