The Future of Backup and Recovery

by Brian Anderson

Backup and recovery are more important today than ever before. Take, for example, a recent announcement from a major database vendor that it now has the ability to process nearly half a million transactions per minute. That's three times the speed of the next-fastest database.

What is significant about that kind of increase? It clearly shows that businesses today are demanding higher throughput as they continue to store and process more data. It wasn't long ago that people were measuring data volumes in terms of gigabytes. Today, however, it is commonplace to manage a full terabyte of data on a single server -- that's roughly equivalent to the data contained in ten thousand 300-page novels.

These huge volumes demonstrate our extreme reliance on data in today's business environment. But even as our reliance increases and volumes swell, so grows the importance of each individual piece of data making up those volumes. Few organizations can succeed without the assurance that each of their orders, payments, requests and inquiries is being properly fulfilled and processed.

Recovery: The Key to a Successful Backup

These two extremes -- the huge volumes of data and the importance of each individual piece of data -- lead to significant challenges for database administrators (DBAs) and IT managers, particularly in the area of backup and recovery. Unfortunately, many organizations often lose sight of the fact that recovery is the whole point of backup and recovery.

The daily challenge of successfully completing backups on time often leads people to believe that once the backup has wrapped up, the job is finished. In fact, that's not the case. A backup is only as good as the recovery it can provide. And more often than not, you'll discover that a backup is unusable when you most need to perform recovery. This is especially true with today's sophisticated and complex database management systems, which are the backbone of most modern business applications.

For these reasons, many organizations are rethinking their backup and recovery plans with an eye toward database availability and ensuring that all of their data can be recovered as quickly as possible. Call it right-size recovery, an approach to data recovery that includes all the tools and procedures necessary to perform the smallest, most complete and most accurate recovery possible, regardless of the type of database outage.

One Size Doesn't Fit All

The first step in right-size recovery is to identify the various types of situations from which you may have to recover. Recovery is not a "one size fits all" proposition. In fact, a full-scale disaster wiping out all of your data tends to be the exception rather than the rule.

Redundant hardware configurations and improved hardware and operating system reliability have significantly reduced the number of "disk crash" horror stories in today's IT environments. Emerging technologies, such as storage area networks (SANs) and data mirroring, promise to further reduce the need to completely recreate your data from scratch. Not surprisingly, organizations today are far more likely to encounter situations in which identifiable data elements or transactions were entered incorrectly either by a user or an application error.

Generally, a spectrum of database problems can -- and should -- be identified in order to ensure that you have complete recoverability. Some of these database problems are listed below:

Invalid Data	This is the smallest, but most common database problem. It occurs when a finite number of invalid entries find their way into the data.
Corrupted Database Object	The next level of database problems includes situations in which a single or limited number of database objects have become corrupted or invalid.
Full Database Corruption	At this level, the scope of the problem is so significant that the database is no longer operational and a full database recovery must be performed.
Multiple Database Corruption	The largest level of database problems occurs when multiple databases within the enterprise have been corrupted and must be recovered as a set.

Successfully recovering from each of these situations with the least possible disruption to the business means matching the best possible recovery approach to the situation at hand -- in other words, selecting the "right-size" recovery for each unique database problem. Let's look at the specific recovery needs of each of these individual situations.

Transaction Recovery

Invalid data entering a business application is the most common recovery situation. There are a number of ways this can happen. A user might inadvertently post sales orders to the wrong account or incorrectly post duplicate sales orders to the same account. In another scenario, a user might specify the wrong currency conversion rate or table, or maintenance to a conversion table may have been performed incorrectly.

While most business applications today have manual procedures for correcting user errors like these, the number of corrections and time required to make them often force users to turn to DBAs for help.

Despite thorough testing and careful administration, there are also times when bugs in an application get into production and generate invalid data. This is especially true after maintenance or "patches" have been posted to the application. Under the traditional backup and recovery approach, your options are limited. Organizations must decide if it makes good business sense to close an application from users for hours at a time while the entire database is recovering from tape. Often the answer is no, and users are forced to try to re-key the information by hand.

With right-size recovery, organizations have additional recovery options. Transaction recovery, also known as data-level recovery, allows DBAs to precisely identify and correct the invalid data. The DBA can select and examine each of the changes that were applied to the database by using powerful selection and filtering capabilities.

Once the incorrect transactions have been identified, the DBA can reverse or "undo" the changes, and re-apply correct data. Right-size recovery tools even have functionality to maximize the speed and control with which the corrected transactions are applied. Moreover, because the corrections are applied through standard database management system (DBMS) statements, the database remains open and available to business users.
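The select-then-undo idea can be sketched in a few lines of Python. This is an illustration only, not how any particular product works: the Change record, the select_changes and undo_statements helpers, and the "id" key column are all hypothetical, and a real tool would read the DBMS's own log format and bind values safely rather than formatting them into text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One logged row change (hypothetical, heavily simplified)."""
    txn_id: int
    user: str
    table: str
    key: int                 # value of the assumed "id" key column
    before: Optional[dict]   # None means the row was inserted
    after: Optional[dict]    # None means the row was deleted


def select_changes(log, user=None, table=None):
    """Filter the log by user and/or table, preserving log order."""
    return [c for c in log
            if (user is None or c.user == user)
            and (table is None or c.table == table)]


def undo_statements(changes):
    """Build SQL text that reverses the changes, newest change first."""
    stmts = []
    for c in reversed(changes):
        if c.before is None:
            # The row was inserted, so the undo removes it.
            stmts.append(f"DELETE FROM {c.table} WHERE id = {c.key}")
        elif c.after is None:
            # The row was deleted, so the undo puts the old row back.
            cols = ", ".join(c.before)
            vals = ", ".join(repr(v) for v in c.before.values())
            stmts.append(f"INSERT INTO {c.table} ({cols}) VALUES ({vals})")
        else:
            # The row was updated, so the undo restores the old values.
            sets = ", ".join(f"{k} = {v!r}" for k, v in c.before.items())
            stmts.append(f"UPDATE {c.table} SET {sets} WHERE id = {c.key}")
    return stmts
```

Because the output is ordinary DELETE, INSERT and UPDATE text, it could in principle be run against an open database, which is the property the paragraph above relies on.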

Database Object Recovery

There are times when one or more entire objects making up the database must be recovered. This could be because the scope of the data errors is so widespread that transaction recovery is not an option, or because the object has been accidentally dropped or corrupted.

Database tables, for example, are objects that typically need to be recovered. Because tables are logical objects, however, they do not correspond to data files in a physical backup. Again, closing and restoring the entire database is usually not the best solution. Right-size recovery allows DBAs to identify and recover only the missing or damaged objects.

Right-size recovery products contain built-in database intelligence to automatically identify all of the objects making up the database from information captured when the backup was taken. This information is then matched against the existing database environment. Automated, database-intelligent analysis replaces the labor-intensive manual approach normally required, which further minimizes total time to recovery (TTR). Missing or invalid objects are then automatically recovered from the physical backup of the database, while valid objects remain unaffected. Right-size recovery products even guide the DBA through the process, ensuring that all the appropriate steps are executed in the proper sequence.
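The matching step amounts to comparing two inventories. The Python sketch below assumes, purely for illustration, that each inventory is a dictionary mapping an object name to some fingerprint of its definition (a checksum, for instance); the function name and the data shape are hypothetical.

```python
def find_objects_to_recover(recorded, current):
    """Compare the structure recorded at backup time with the live database.

    Both arguments map object name -> definition fingerprint. Returns
    (missing, suspect): names absent from the live database, and names
    whose fingerprint no longer matches what was recorded.
    """
    missing = sorted(set(recorded) - set(current))
    suspect = sorted(name for name in set(recorded) & set(current)
                     if recorded[name] != current[name])
    return missing, suspect
```

A mismatch is only a candidate for recovery, since an object may have been legitimately altered after the backup was taken, so a DBA would still review the suspect list before restoring anything.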

Full Database Recovery

If numerous objects within the database are invalid or a system object of the database, such as the control file, has been corrupted or lost, the entire database may need to be recovered. By definition, this requires the database to be closed. During this time, users are not able to access important business-critical applications. As a result, profitability for the entire organization may be impacted.

Not surprisingly, database crashes rarely happen when it's most convenient. They typically happen during times of unusual or high-stress activity, such as month-end processing, or when new applications are being brought online. They also usually occur when database and application experts are not available.

This is where the automation, ease of use and consistency of a right-size recovery product really pay off. Recovering a database manually can require as many as 26 discrete decision points. Right-size recovery minimizes the total time to recover by reducing one of the most time-consuming parts of a traditional recovery: human "think time". Because the product automatically walks the DBA through each recovery step, the impact of the database outage is minimized and costly human errors are avoided during this critical and pressure-packed process. Some of the tasks that a right-size recovery product helps automate include:

· Analysis - Determining what went wrong and what is missing. Right-size recovery's automated discovery capabilities compare the current database structure to previously recorded structure information to automatically determine what is missing.
· Recovery Source - This involves determining the best and most current set of backup data from which to recover, where that data is stored, and which pieces are needed. Right-size recovery products maintain a history of all backup activity, including full backups and various levels of incremental backups. These tools automatically identify the backup assets needed to recover all valid data.
· Recovery Preparation - Before starting the restore, you will first need to prepare the database. This may mean simply shutting it down completely. With right-size recovery, built-in database intelligence provides the appropriate commands needed to prepare the database for the recovery process.
· Restore - Once the database is ready, right-size recovery ensures that all of the backup data is copied to the correct destination. Depending on the amount of data being restored, this can be a time-consuming process. Right-size recovery has restart capabilities that eliminate the need to start over from the beginning after a restore problem.
· Recover - After the backup data has been successfully restored, commands must be executed to turn the collection of data files back into a database. This involves such tasks as applying log records to bring the database back to the current state. Again, right-size recovery ensures that the appropriate recovery commands are issued in the correct sequence.
· Post Recovery - The final step is to perform post-recovery tasks to ensure that the database and application are ready for execution. This can include starting the database and suggesting that a fresh backup be taken. Right-size recovery performs or advises DBAs on all appropriate post-recovery cleanup tasks.
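To make the Recovery Source task concrete, here is a minimal Python sketch of choosing backups from a recorded history. It rests on a simplifying assumption that is not part of the description above: there is only one level of incremental backup, and each incremental holds the changes made since the previous backup of any kind. The function name and the tuple layout are hypothetical.

```python
from datetime import datetime


def backups_needed(history, target):
    """Return the backups to restore, in order, to reach `target`.

    `history` is a list of (taken_at, kind) tuples, where kind is
    "full" or "incremental". The result starts with the most recent
    full backup taken at or before `target` and continues with every
    incremental taken after it, up to `target`.
    """
    usable = sorted(b for b in history if b[0] <= target)
    full_positions = [i for i, b in enumerate(usable) if b[1] == "full"]
    if not full_positions:
        raise ValueError("no full backup exists at or before the target time")
    return usable[full_positions[-1]:]
```

Anything newer than the last backup in the returned list would still have to come from the database logs during the Recover step.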

Point-in-Time Recovery

Another important feature is the ability to perform point-in-time (PIT) recovery. Point-in-time recovery means that all of the data prior to the problem is reapplied to the recovered database. This can be difficult since several hours, or even days, may have passed since the last backup.

Assume, for example, that a full backup of the database supporting an important business application was taken on Sunday at midnight. Incremental backups of the database were also taken at midnight on Monday, Tuesday and Wednesday. A problem with the database is then discovered at 6 p.m. on Thursday evening.

Applying the full backup and each of the incremental backups would recover the data up to Wednesday at midnight. That leaves 18 hours of updates that still need to be re-applied from the database logs. But how does the DBA know which updates to apply without re-introducing the errors? When did the problem first start occurring? Which transactions are valid? With right-size recovery, the power of transaction recovery can be applied on top of the full and incremental recovery to bring the database current to the point in time just before the error occurred.

The DBA would first use the selection, filtering and analysis capabilities of transaction recovery to identify the time or transaction number of the last valid update. This information would then be automatically fed to the recovery process, which would apply the full and incremental backups and reapply all of the valid transactions occurring up to the point of the problem. This unique capability takes the guesswork out of point-in-time recovery and ensures recovery of all the valid data.

But right-size recovery doesn't stop there. Let's say that a careful analysis revealed that the transaction problem first cropped up at 4 p.m. on Thursday. Point-in-time recovery would have recovered the data to just prior to that point. Any valid transactions that occurred after 4 p.m. would still be lost. With right-size recovery, the two hours' worth of transactions occurring between the time the problem first began and when it was first noticed could be analyzed and all valid transactions reapplied. This is the best possible recovery. It brings the database back to the current state with minimized data loss and no manual re-keying of the data.
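The arithmetic in this example can be checked with a short Python sketch. The calendar dates are arbitrary and chosen only for illustration; what matters is the intervals. "Wednesday at midnight" is taken to mean the midnight that ends Wednesday, which is the reading that yields the 18 hours mentioned earlier. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def recovery_windows(last_backup, error_began, error_noticed):
    """Split the time since the last backup into the parts discussed above."""
    return {
        # Everything that must come from the logs rather than from backups.
        "log_total": error_noticed - last_backup,
        # Updates made before the error began: reapplied by point-in-time recovery.
        "replay_as_is": error_began - last_backup,
        # Updates made after the error began: analyzed, valid ones reapplied.
        "needs_analysis": error_noticed - error_began,
    }


windows = recovery_windows(
    last_backup=datetime(2005, 8, 4, 0, 0),     # last incremental backup
    error_began=datetime(2005, 8, 4, 16, 0),    # 4 p.m. Thursday
    error_noticed=datetime(2005, 8, 4, 18, 0),  # 6 p.m. Thursday
)
```

With these inputs the 18 hours of log activity divide into 16 hours that can be replayed unchanged and 2 hours that need transaction-level analysis.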

Multiple Database Recovery

Clearly the capabilities and flexibility of right-size recovery go a long way toward minimizing TTR and maximizing database availability. But up to now, we have focused on the recovery of a single database on a single server. In the real world, most organizations must manage (and recover) tens, or even hundreds, of database servers. These servers may contain different database types on different operating systems, and may be spread across multiple geographic locations.

As the number of database servers increases, so does the complexity of ensuring that the data on each of those servers is recoverable. Inconsistent and disjointed backup and recovery procedures mean that the ability of an enterprise to recover in the event of an outage is only as strong as the weakest link.

Right-size recovery combines an enterprise-wide view of the organization with maximum database recovery capabilities. This enterprise-wide recovery management console allows consistent, reliable backup and recovery plans to be established and automated. Once "best practices" policies for backup and recovery are developed, they can be automatically propagated throughout the enterprise, helping to ensure that nothing falls through the cracks, regardless of the number of servers.

An important component of enterprise-wide recovery management is the ability to identify and group database objects based on the organization's business needs. These groups can then be backed up and, more importantly, recovered as a single logical unit. Rules or policies as to how and when these logical groups should be backed up can be defined and automated through sophisticated scheduling capabilities. This can eliminate the nightmare of trying to ensure the referential integrity within the databases.

Right-size recovery automatically notifies the appropriate personnel of important backup and recovery events or problems and can even post these events to system monitoring management tools. Extensive reporting capabilities provide IT professionals with the information they need to ensure that all of the enterprise's data assets are recoverable.

Adapting to Today's Environment

An explosion in data growth rates, combined with the extreme reliance organizations have on the accuracy and reliability of that data, is forcing IT professionals to re-evaluate their backup and recovery strategies. Paramount in this re-evaluation is a change from the emphasis on backup to an emphasis on recovery. Lightning-fast backups won't do you any good if the recovery takes days.

With right-size recovery, not all recovery situations are created equal. Today's business environment demands the flexibility, tools and procedures to perform the smallest, smartest, most efficient recovery possible. By identifying beforehand the spectrum of recovery situations and employing right-size recovery products, organizations can minimize total time to recovery and maximize database availability.

---

Brian Anderson is a senior manager of product management in BMC Software's Storage and Recovery Management group. He has over 15 years' experience in information technology and has worked in development, consulting and product management. He can be reached at brian_anderson@bmc.com.

Contributors : Brian Anderson
Last modified 2005-08-04 08:09 AM

DBAzine.com
