Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » DB2 » DB2 Distributed (UDB) Articles Archive » Transaction Logs
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 1984
 

Transaction Logs

by Peter Gulutzan

Write-Ahead Logging (WAL) has won the popularity contest. Early DBMSs (System R and SQL/DS) used a different protocol, some current DBMSs (PostgreSQL and MySQL) are recent converts, and one DBMS (InterBase) would rather fight than switch. But among the Big Three it's WAL all the way:

Who Uses WALs?

DBMS      Uses Write-Ahead Logs Claims To Use ARIES?
IBM Yes                                        Yes
Microsoft          Yes Yes
Oracle Yes No

Microsoft SQL Server 2000 and IBM DB2 7.2 specifically use the ARIES write-ahead log algorithms; Oracle9i uses something quite similar. So I can give an account of all three DBMSs -- Microsoft, IBM, Oracle -- at once. I've shoved many technicalities and terminology issues into footnotes that look like this: [1]. I recommend that you skip all the footnotes when reading this article initially; you can then browse the Footnotes file later, like a FAQ.

A Simplified Summary

A log is a group of log records that are written serially, that is, all new log records are appended to the end. There is one, and only one, log per database, so log records for different transactions are intermixed. Log records have information that a DBMS can use to recover after a crash or a disk failure. Recovery involves "redo" (processing logged operations again) or "undo" (reversing logged operations).

Logs grow rapidly, so it's often impossible to store a complete log online. Portions of a log are thus moved to offline storage when they no longer contain active transactions. This movement is called archiving.

Log Records

A log consists of log records [2]. Log records always contain some control information:

      • The Log Sequence Number, which for ARIES logs is a vital unique identifier of the log record [3].
      • A way to find other log records for the same transaction, such as the Transaction Identifier or a backward pointer.
      • A type indicator, such as "Committed."

The rest of a log record depends on what's being logged. Table 1 shows the major types of log records and their contents.

TYPE CONTENT
UPDATE For each changed column:
old value (the "Before Image")
new value (the "After Image") [4]
CREATE INDEX A "CREATE INDEX" statement [5]
COMMIT Marks transaction end, more on this later
CHECKPOINT            More on this one later, too!
ROLLBACK Compensation Log Records (yup, more later)
Miscellaneous For DBMS-specific features [6]

Table 1: Major Types of Log Records

In a few situations a particular DBMS avoids logging [7]. But mostly the DBMS plays safe and logs everything, even changes to indexes [8] and temporary tables. But if you work on temporary tables you'll reduce log overhead [9].

A Pool Of Log Buffers

The DBMS won't write log records to a file immediately after making the records. Instead, it buffers writes. The log buffers are managed separately from the data buffers. Table 2 shows details of the two buffer pools.

  LOG BUFFER POOL [10] DATA BUFFER POOL
CONTENTS       Log Records                              Copies of data and index pages
MANAGED BY      "Log Writer" process [11] "Buffer Manager" process [12]

Table 2: The Two Buffer Pools

As long as a log record is in a buffer, its vital "redo" information could be lost in a crash, so it must get safely onto a non-volatile medium. In other words, something must "flush" the buffer [13].

Scenarios that cause log-buffer flushing are:

      • When the log buffer pool fills up [14].
      • When the data buffer pool fills up [15]. It sounds strange that log buffers must be flushed when it's the data buffer pool that's full. Yet that's the point of WAL: the buffer manager waits for the log writer. Under ARIES, the buffer manager won't flush a dirty page until the log records are flushed for all changed rows on the page. This is checkable by comparing Log Sequence Numbers.
      • During SHUTDOWN.
      • At some regular long interval [16].
      • During COMMIT.

More on COMMIT

During COMMIT, the log writer flushes all log buffers and adds a final log record to mark COMMIT [17]. This event guarantees atomicity. All changes in the transaction are deemed nonexistent unless the commit log record exists. An application can trust that a successful COMMIT return means the log is current [18].

The log writer doesn't just flush the current transaction's log records. It flushes every log record that was made before the COMMIT -- including log records from other transactions that haven't been committed yet! It might seem strange to flush log records from uncommitted transactions but that's a necessary rule for the system -- everything gets written in order. Writing really is serial [19].

You might have heard that COMMIT causes flushing of dirty data pages too [20]. No! False! As soon as the log record is flushed, the database is safe. Therefore the buffer manager can delay flushing dirty data pages until some convenient time.

A Ring Of Log Files

Every database has one log [21]. There is a DBMS process called the log writer that adds new log records at the end of the log [22]. Writes to the log may be mirrored [23]. Logs can contain more than one log file [24].

If there are two files, when the log writer reaches the end of log one, it starts writing to log two. When the log writer reaches the end of log two, it starts writing to log one. This is called "circular logging" [25]. It is much like ring buffering and can be extended to a large ring of log files [26].

With circular logging, it's possible to have one process (the log writer) writing new log records at the end of the log, while another process (the archiver) is reading older log records. [27] The archiver saves the contents to offline storage [28].

Logically there are three types of log records in distinct sections of the log.

      • There are active log records that the log writer has written but the archiver can't touch because they begin with a log record from an unended transaction.
      • There are recoverable log records that the log writer has written and the archiver can touch, but the archiver hasn't reached them yet so they can't be overwritten.
      • There are reusable log records that belong to ended transactions that the archiver has finished with, so they can be overwritten or released.

Sections Of The Logical Log [29]

      ---------------  <-
     - RECOVERABLE -   |    /* archiver reads old log records here */
     ---------------   |
        |              |
        |              ^
     ---------------   |
     - ACTIVE      -   |    /* log writer writes new log records here */
     ---------------   |
        |              ^
        |              |
     ---------------   |
     - REUSABLE    -   |
     ---------------  >-

TIP: Log files should be on a different drive than the data files. The active and recoverable log sections should also be on separate drives. That way, the log writer is almost always writing sequentially to the next segment of a log file, while the archiver is almost always reading sequentially from the next segment of a different log file. Result: minimal seek time [30].

Circular logging's advantage is that files can be added, moved, or deleted when necessary. And the log doesn't have to grow forever, because the log writer circles back. There are two complications:

      • If the log writer catches up with the archiver, it will overwrite log records that the archiver hasn't read yet. That makes "Recovery From Backup" impossible [31]. Overwriting can be prevented by expanding the log temporarily [32], or by stalling the log writer until the archiver gets ahead [33]. But most simply, one can just allow the overwrite and gamble that the media won't fail [34]. Even if you gamble, ROLLBACK is still possible.
      • If the archiver catches up with the log writer, it will dispose of log records belonging to transactions that haven't ended yet. That makes ROLLBACK impossible. This can be prevented by stalling the archiver when it reaches a log record for a transaction that hasn't yet committed or rolled back.

Checkpoints

The DBMS flushes only log records at COMMIT time; it delays flushing data records until later. When will it flush data records?

There are processes that will flush the data buffer pool when it's full, or at intervals [35]. But if those were the only processes, then it would be impossible to ensure, for any given data page, that the page could be redone from online active/recoverable log records. Recoveries would have to start with an early archived log file, and redo the whole history.

Users need recoveries to work faster, so there are times when the DBMS will write absolutely everything. This enables it to flush all dirty data pages to the database too. Then the DBMS writes a log record marking that a "checkpoint" has been done. If a crash occurs later, recovery is easy because the DBMS only needs to roll forward with log records since the last checkpoint. This easiness only applies for recovering from a crash ("instance recovery"). If the database disk drive melts, then you have to restore the last backup and rerun for all archived and all online log files ("recovery from backups" or "media recovery"). Checkpoints won't help for media recovery, but instance recovery is much more common, so checkpoints are worthwhile.

As the Oracle documentation felicitously puts it: "Because all database changes up to the checkpoint have been recorded in the data files, [log records] before the checkpoint no longer need to be applied to the datafiles if instance recovery is required."

Here is what happens during a checkpoint:

      • The DBMS flushes all dirty pages to the database [36]. I really do mean all dirty pages; even those pages changed by uncommitted transactions are flushed.
      • The DBMS writes and flushes a CHECKPOINT log record to the log [37]. The CHECKPOINT log record contains all information the DBMS will need to restart as of the checkpoint time [38].
      • A checkpoint is a blocking process; that is, other DBMS processes must wait while checkpointing goes on [39].

Checkpoints occur with a frequency that a DBA can control [40]. They also happen during major database upheavals, such as shutdowns [41]. Checkpoints are not synchronized with COMMITs.

ROLLBACK [42]

As well as being useful for recovery, logs are useful for ROLLBACKs -- except in Oracle.

Oracle supports ROLLBACKs with a separate structure called the "undo tablespace" (formerly "rollback segments") [43]. This is different from IBM/Microsoft practice: these DBMSs store both redo and undo portions of the log in one place, while Oracle stores redo log records in the main log (hence Oracle's preferred name "redo log") and undo log records in the undo tablespace. To summarize:

      • List Of {undo plus redo} records: kept by IBM, Microsoft.
      • List of {undo} records plus list of {redo} records: kept by Oracle.

Table 3 shows the advantages of each method.

Oracle Advantages

The undo tablespace can be used to support MVCC (multi-version concurrency control). Applications can save obsolete data in the undo tablespace, preserving read consistency [45]                                

Long uncommitted transactions won't stall other running transactions 

IBM/Microsoft Advantages

Log records can be written with a single write [44]

ROLLBACK is accomplished by writing "compensation log records," which are robust [46]

Table 3: Advantages of Log Storage Methods

Recovery

Suppose you execute an SQL statement:

UPDATE Table1 
  SET unique_column = 2
  WHERE unique_column = 1

Assume COMMIT is automatic and all goes well. The log contains these log records:

------------------------------------
- TRANSACTION START Log Record     -
------------------------------------

...

------------------------------------
- UPDATE Log Record:               -
-       Page #x, Row #y, Column #z -
-       Before-Image: 1            -
-       After-Image: 2             -
------------------------------------

...

------------------------------------
- COMMIT Log Record                -
------------------------------------

...

------------------------------------
- CHECKPOINT Log Record            -
------------------------------------

Now suppose all doesn't go well -- instead, the system crashes. That's fine: when you restart, the DBMS will read the log and start a recovery procedure. The recovery will depend on what log records are missing.

If the CHECKPOINT log record is missing, then it's possible that the crash happened before page #x was written. No problem: the recovery manager can "redo" the write using the after-image in the UPDATE log record [47].

If the COMMIT log record is missing too, then it's possible that the crash happened after page #x was written. Again, no problem: the recovery manager can "undo" the write using the before-image in the UPDATE log record.

If the UPDATE log record is also missing, then it's possible that the crash happened after page #x was written ... Wait a minute, it's not possible because this is a write-ahead log. So again, no problem: the recovery manager doesn't encounter this situation.

So recovery is no problem regardless when the crash occurred. But like Wellington said after Waterloo: "'twas a near-run thing." If the log writer hadn't flushed during COMMIT, or if the checkpoint process hadn't caused flushing of dirty data pages, or if the log had been flushed after the data pages, one of the recovery scenarios would have failed. Thank goodness for write-ahead logging, then.

Tips

      • Use TEMPORARY objects, since logging them requires less space.
      • Plan to end transactions with COMMIT rather than ROLLBACK, since ROLLBACK causes more records to be written to the log.
      • Start transactions late (for example with Microsoft don't BEGIN TRANSACTION until you're ready to execute an SQL statement). Stop transactions early (for example avoid holding cursors open after transaction end).
      • Put most frequently changed columns at the end of the row, since the DBMS might log from the first changed byte to the end of the row.

Footnotes

Note: The following footnotes to "Transaction Logs" get to the details; may be read as a FAQ.

When you see a product name inside square brackets, for example [Oracle], the discussion that follows refers to that product alone. [IBM] refers to IBM DB2 7.2, [Microsoft] refers to Microsoft SQL Server 2000, and [Oracle] refers to Oracle9i.

Log Records

[2] Are the records in a log called log records?

      • Usually.
      • [Microsoft] Occasionally they are called "transaction log entries."
      • [Oracle] Since Oracle uses the term "redo log" rather than "log" the common names are "redo record" or "redo entry" rather than "log record."

[3] Do log records have Log Sequence Numbers?

      • Yes, but "Log Sequence Number" means something different to Oracle.
      • [Oracle] Log Sequence Numbers are not unique identifiers of log records -- they are nearly-unique identifiers of log files. Since the discussion is of ARIES, the text statements are true.
      • [ARIES] The Log Sequence Number (LSN) is particularly important when it's recorded on a page (then it's called the page_LSN). After making a log record, the DBMS puts the log record's LSN in the buffer copy of the page that's being updated. During recovery, it's evident that a log record only needs to be reapplied if the log record LSN <= page_LSN.
      • [Microsoft] The Log Sequence Number is a 32-bit address containing the log file number and the offset of the log record within the file.

[4] Does a log record have a redo portion and an undo portion?

      • Yes. You might run into mention of "redo-only log records" and/or "undo-only log records" that have only part of the information, but the logs I'm talking about have all the data, somewhere.
      • [Oracle] Undo-only log records are particularly important in the undo tablespace, which is outside what Oracle programmers think of as the log. Although Oracle calls the log "the redo log," the records actually contain both redo and undo portions. The undo portions are not vital for ROLLBACK, but are needed for recovering transactions that didn't finish. Oracle's log contains log records for changes to all tablespaces -- including the records in the undo tablespace.

Can before- and after- images be copies of pages?

      • Yes.
      • [IBM] Copying can be of items as small as columns.
      • [Oracle] The usual term is "block." Whenever you see the word "page" henceforward, substitute the word "block" in your mind.
      • It is possible to record data changes with a single delta, representing the change, rather than both a before- and after- image. For example on a numeric field a change from 66 to 88 would be logged as a single number that is the result of (66 minus 88). Another example: on a CHAR field a change from 'A' to 'B' would be logged as a single string that is the result of ('A' XOR 'B'). But IBM, Microsoft, and Oracle prefer before-images and after-images rather than deltas.

[5] Does a log contain a CREATE INDEX statement?

      • Well, it could. Let's take Microsoft's situation for an example.
      • [Microsoft] For SQL Server 7, the only information that Microsoft deemed necessary was that an index had been created. The redo process had to re-execute the CREATE INDEX, while the undo process had to do the reverse, which is DROP INDEX. For SQL Server 2000, there is a new option called Full Recovery Mode, which causes logging of the after images of all pages in the index. Remember that CREATE INDEX and other DDL statements usually are auto-committed, so they are not subject to ROLLBACK anyway. But undo information will be present in some form. For example, a CREATE involves a change to a system table, and system tables are loggable.

[6] Are there DBMS-specific log records for supporting special features?

      • Yes. Take Microsoft for example.
      • [Microsoft] SQL Server will have "dummy" log records as filler. These are necessary for Microsoft's shrink support -- because Microsoft has unique and complex ways to truncate and shrink log files. SQL Server will also have "allocation/deallocation" log records, which only record that space was allocated, not what went into the space -- because Microsoft allows "fast" bulk copy. With fast bulk copy, it's possible to get away with incomplete logging, because the source of the copied data is in stable storage.

[7] What changes might not be in the log?

      • Depends on the DBMS.
      • [IBM] BLOB and LONG VARCHAR changes aren't logged. DB2 uses another technique ("shadow pages") to preserve data integrity.
      • [Microsoft] Bulk loads aren't logged if you pick a certain option for recovery mode. Also TRUNCATE TABLE is exceptional.

[8] Are changes to indexes logged too?

      • Yes. I know this is so for IBM, Microsoft, and Oracle. I know it is an ARIES specification. I won't say that it is so for all DBMSs, since it's logically not vital to log index changes. If you know what the row value is, you can figure out the index-key values.
        back

[9] Are changes to temporary tables logged?

      • Yes, but not fully. Since rows in a temporary table are supposed to disappear when a DBMS session stops, log records won't be necessary for recovery.
      • [Microsoft] Items in tempdb are logged so ROLLBACK can happen. The amount that goes in depends on whether the recovery model is simple.
      • [Oracle] Oracle does not use the log for ROLLBACK purposes. Instead, it uses an undo tablespace. A change to the undo tablespace is stored in the log.
      • Summary: if it's a temporary table, there's no redo portion but there is an undo portion.

A Pool of Log Buffers

[10] Is Log Buffer Pool a standard term?

      • No. IBM uses the term "log buffer" instead of "log buffer pool." Microsoft prefers the term "log cache" and the things in it are "log pages." Oracle uses the term "redo log buffer" but there are hints that this is a ring consisting of several fixed-size pages (see, for instance, Oracle's description of the LOG_BUFFER initialization parameter). So I think it's reasonable to describe it as a pool.

[11] Is there really a log writer process?

      • Usually. Microsoft doesn't leave everything to the log manager or the logwriter thread, but for IBM and Oracle the word "process" is accurate.

[12] Is Buffer Manager a standard term?

      • No.
      • [Oracle] The Oracle background processes that manage data buffer writing are named DBW0, DBW1, and so on, or collectively: DBWn processes.

[13] Is flushing the right term, and how does it work?

      • I like the word "flush" because "write" is ambiguous. I want to convey that the result really goes to the disk. In this document, both "write" and "flush" mean "write to disk" directly, but I prefer "flush."
      • [IBM] IBM prefers the phrase "force to disk."
      • [Microsoft] A Microsoft support note indicates the file has WindowsNT's FILE_FLAG_WRITE_THROUGH setting, which ensures that the operating system won't delay the write.
      • [Oracle] The Oracle manual says the writing is synchronous. I avoid the word "synchronous" because it invites the question, synchronous with what? In this context, "synchronous" means that the writer will wait until it is sure the writing is done, instead of relying on a write cache, a lazy writer, and so on.

[14] Does the log buffer get flushed when the log buffer pool fills up?

      • Well, actually it can happen earlier.
      • [IBM] The DB2 manual says that flushing occurs when the "log buffer is full."
      • [Oracle] The Oracle manual says that flushing speeds up when the log buffer is only 1/3 full, and that log buffers are flushed "if more buffer space is needed." Oracle assures us that there's little chance of filling up anyway.

[15] Does the log buffer get flushed when the data buffer pool fills up?

      • Yes. This is a necessary rule for Write-Ahead Log support.
      • [IBM] The DB2 manual says that the logs are flushed before corresponding data pages are flushed, and that the "log Buffer is flushed when ... [a] dirty page is about to be written." The DB2 buffer manager checks, before a dirty page is written, that the page's LSN is <= the earliest LSN of a log record that's been flushed (the "Flushed LSN").
      • [Microsoft] The SQL Server manual says: "the write of the dirty data page is blocked while the log page recording the modification is first written to disk."
      • [Oracle] The Oracle manual says -- translating Oracle terms to my language -- "If the buffer manager finds that some log records have not been flushed, it signals the log writer to flush the log records and waits for the log writer to complete flushing the log buffers before it can flush the data buffers."
      • To be precise, the WAL specification requires only that the undo portions must be written before the dirty data page can be flushed. This is a mere technicality, because the undo and redo portions are in the same log record.

[16] Is the log buffer flushed at regular intervals?

      • Yes.
      • [Oracle] The Oracle manual says that the log buffer is circular, so that the log writer process can read/flush from one end while other processes are writing into the buffer at the other end. But apparently the process is not continuous: another manual says that flushing takes place every three seconds. This of course is a maximum time since COMMITs are likely to happen more frequently than every three seconds.

More On COMMIT

[17] When COMMIT happens, does the log writer really flush all log buffers and add a final log record to mark COMMIT?

      • Well, if you really want to get technical ... not quite.
      • [IBM and Microsoft] The ARIES specification says the sequence of events is: (1) write a COMMIT log record, (2) flush all log records as far as the current transaction's last LSN, (3) return from COMMIT; i.e., signal the application that COMMIT is done but continue asynchronously, and (4) write an "end of COMMIT" log record.
      • [Oracle] The Oracle manual says: "[The log writer] writes a commit record."
      • So it's possible that not all log buffers are flushed, and it's possible that two final log records are added to mark COMMIT.

[18] Is writing truly serial? Do uncommitted log records get written too?

      • Yes.
      • [Oracle] The Oracle manual says -- translating Oracle terms to my language -- "Sometimes, if more buffer space is needed, the log writer writes log records before a transaction is committed. These log records become permanent only if the transaction is later committed." I am suggesting here that lack of buffer space is not Oracle's real motive.

[19] When COMMIT returns, is it guaranteed that the log writer flushed?

      • Yes. Let's clear up a possible misunderstanding about delayed flushing. The DBMS might delay flushing if it anticipates that multiple COMMITs will be nearly simultaneous. There's no point in writing a log buffer, returning, and immediately being told to write the same buffer again.
      • [IBM] Log flushes can be delayed for a specified number of COMMITs, or until one second elapses, whichever comes first.
      • [Oracle] The term for this is "group COMMITs." Oracle automatically switches to group COMMITs when it detects that committing is frequent.
      • However, the manuals do not suggest that there would be a return before the COMMIT is complete. Applications should therefore expect a delay of up to one second due to delayed flushing.

[20] Are data pages flushed at COMMIT time?

      • No, and never mind what your manual says.
      • [IBM] The DB2 manual says: "Note: At the time the transaction completes by using the COMMIT statement, all changed pages are flushed to disk to ensure recoverability." I am saying that the manual is wrong.
      • [Microsoft] The SQL Server manual says: "[The] data page ... is not immediately flushed to stable storage."
      • [Oracle] The Oracle manual says: "The corresponding changes to data blocks are deferred until it is more efficient to write them. This is called a fast commit mechanism ... Oracle returns a success code to the committing transaction, although the data buffers have not yet been written to disk."

A Ring Of Log Files

[21] Does every database have one log? What is the correct term for "log"?

      • Yes.
      • [Microsoft] The preferred term is "log" or "transaction log" but Microsoft also uses the term "logical log" to refer to the collection of log records that contain meaningful data (as opposed to the contents of all log files including obsolete or unused areas).
      • [Oracle] The preferred term is "redo log" not just "log" because, although undo information is logged here, it's not used for ROLLBACK.
      • The term "log journal" seems to be rare.

[22] What is the correct name of the Log Writer process?

      • It depends on the DBMS.
      • [IBM] Logger EDU (Logger Engine Dispatchable Unit).
      • [Microsoft] Log Manager, but the process that does some of the writing is the "logwriter thread." The logwriter thread only writes the pages that the worker threads don't handle, so with respect to Microsoft it is an over-generalization to say that the log writer alone does the writing.
      • [Oracle] LGWR.

[23] Who does the mirroring?

      • Either the DBMS or the operating system.
      • [Microsoft] Microsoft got rid of the SQL Server "mirroring" options after version 6.5. Microsoft just says to configure with RAID.
      • [Oracle] One can specify additional files to write in parallel. If there are multiple files, the Oracle term is "multiplexing." If there are just two files the Oracle term is "duplexing." In earlier versions of Oracle only duplexing was possible. Occasionally Oracle uses the word "mirroring" instead. Since one might be writing to more than one file, Oracle introduces the additional terms "group" (which contains all the files being written at once) and "member" (which is one log file). Two member files in the same group will have identical data, and will have identical Log Sequence Numbers.

[24] Do logs contain more than one log file?

      • Yes.
      • [Microsoft] The log files may be virtual, that is, there may be n virtual files in a single physical file.
      • [Oracle] There must be at least two log files.

[25] What is the correct term for circular logging?

      • There is none.
      • [IBM] Since DB2 has a circular log configuration option, which effectively means "unarchivable log," the DB2 manual uses terms like "round robin log file lists" or "ring of online logs" and "wraparound."
      • [Oracle] Oracle's manual says writes are done in a "circular fashion" but that could be a reference to the ring buffer.
      • Sometimes the words "Head" and "Tail" are used, or "start of logical log" and "end of logical log."

[26] How many log files can be in a log?

      • An indefinite number, but the number of online files is usually restricted by a configuration parameter.

[27] Is there an archiver process? Is "archiver" the correct term?

      • Yes, it's optional, but necessary for recovery from backups.
      • [IBM] There's an ARCHIVE LOG command.
      • [Oracle] In earlier versions the archiver was called ARCH or "the ARCH background process." Now there can be multiple processes so they go by the names ARC0, ARC1, ARC2, ... and so on up till ARC9.

[28] Can the archiver read a log while the log writer writes?

      • Yes, but under Oracle the two processes can't work simultaneously on the same file.
      • [Oracle] The Oracle manual says: "LGWR can't write to a log group until ARCH has done with it." Or, the other way around, ARCH must wait for a "Log Switch," which is the point at which Oracle stops writing log file X and starts writing log file Y. Log switches typically occur when a log file is full, but DBAs can force log switches to happen earlier.

[29] What's the term for redo when archived log files are necessary?

      • Better words would be "Redo From Backup" or "Media Recovery." The point is that you will recopy the whole database backup, then recover using all logs. Since such drastic action might be occasioned by media failure, the term Media Recovery is fairly common.

[30] What means are used to expand the log temporarily?

      • Everybody's different. Microsoft is the most flexible.
      • [IBM] Log files can't grow, but new log files can be added to the circle. One specifies LOGPRIMARY = the number of logfiles that will be generated immediately and permanently. One specifies LOGSECONDARY = the number of logfiles that will be created when necessary, and deleted when no longer necessary.
      • [Microsoft] Files can grow (a Microsoft feature is that files are divided into virtual log files). One specifies FILEGROWTH. The growth can be as great as another specification, MAXSIZE. Once the archiver moves forward so that the log writer can overwrite, file truncation and reuse is possible.

[31] What is "Recovery From Backup"?

      • I'll revisit this later. The essential point is: if the media fails (for example the disk drive melts), then the recovery process needs the last complete backup plus all the logs made since the backup. Recovery from backup is sometimes called media recovery. The less drastic kind of recovery, which involves rerunning the log against the current database, is sometimes called instance recovery.

[32] How does "stalling" work?

      • Stalling (or blocking) is just a matter of doing nothing until an event occurs.
      • [IBM] When all logfiles (primary and secondary) are full, a LOGFULL_CONDITION happens. Sometimes DBAs must enter a command "force application (application-id)" to get rid of an application whose log records are too old. The system is subject to limitations that depend on the operating system and type of disk storage.

[33] Are the terms "active," "recoverable," and "reusable" standard?

      • No, only Microsoft uses all three terms.
      • [IBM] There are "online log" files (corresponding to the ACTIVE section) and "online archive log" files (corresponding to the RECOVERABLE section). There are also "offline archive log" files (not corresponding with the REUSABLE section). (Technically an archived log file is still online if it is on the database log file path.)
      • [Oracle] The file that the log writer is currently writing to is the "online redo log" file (corresponding to the last file in the ACTIVE section), plus there are "active online redo log" files (corresponding to all files in the RECOVERABLE and ACTIVE sections except the last), and the "inactive online redo log" files (corresponding to the REUSABLE section).
      • If archiving is disabled, the only sections are ACTIVE and REUSABLE.

[34] How does one take the gamble and allow overwriting?

      • There are configuration parameters.
      • [IBM] Choose "circular logging" (the default), which doesn't allow archives and is really only for supporting ROLLBACK. To get what IBM calls "Archived Logging" or "Archival Logging" or "Log Retention Logging" you must turn on either the LOGRETAIN or USEREXIT configuration parameter.
      • [Microsoft] Choose Simple Recovery Model.
      • [Oracle] Choose NOARCHIVELOG.

Checkpoints

[35] Are dirty pages flushed when the buffer pool is full?

      • Yes, but flushing happens at other times too. Usually a separate process wakes up occasionally, checks what percentage of pages in the buffer pool are dirty, and tries to flush if too few buffers are free.
      • [IBM] Flushing is done by a "page cleaner," which depends on various parameters, most notably chnpgs_thresh and softmax.
      • [Microsoft] Flushing is done by a "lazywriter" thread.

[36] Are dirty data pages flushed during checkpointing?

      • In theory no, in practice yes.
      • The theory is that there can be Fuzzy Checkpoints or Sharp Checkpoints. With Fuzzy Checkpoint, dirty pages are not flushed - one depends on the fact that an ARIES-style checkpoint log record contains a list of all the dirty pages that exist at checkpointing time. With Sharp Checkpoints, dirty pages are flushed.
      • I have read that Fuzzy Checkpoints are the "usual" choice but vendor documentation says dirty pages are flushed. For example a Microsoft INF says: "The checkpoint process ... writes any dirty log or data pages to disk." So it would be better to say that DBMSs use a combination of Sharps and Fuzzies.
      • Since log buffers are flushed at regular intervals anyway, Sharp Checkpoints usually have little to do.

[37] Will the DBMS write a checkpoint log record?

      • Yes, but not necessarily to the log.
      • [Oracle] Oracle updates the control file (what others would call the master boot records).

[38] What kind of information is in a checkpoint log?

      • There's quite a bit that's necessary. Let's just state an ARIES requirement.
      • [ARIES] First there must be a "start of checkpoint" log record. Then there must be an "end of checkpoint" log record. The records include information such as which transactions are currently running (along with their states and the LSNs of their most-recently-written log records). Records also include a list of which pages in the buffer pool are dirty, but with sharp checkpoints all dirty pages are flushed. As well as writing to the log, the DBMS should also update a control file (sometimes called the master boot record) to record the new checkpoint's LSN.

[39] Are other processes blocked while checkpointing goes on?

      • It depends.
      • [With Fuzzy Checkpoints] no.
      • [Microsoft] I am unable to find official Microsoft documentation on this, but a secondary source says that new transactions will be "refused" as long as the checkpoint is going on.
      • [Oracle] Oracle assures us that "Activity is not halted during a checkpoint" because the database-writing process will just keep cycling until no more entries need recording. But some refusals would still take place. In another place Oracle admits that activity is halted: "Additionally, if [the log writer] cannot access an online redo log file for writing because a checkpoint has not completed, database operation suspends temporarily until the checkpoint completes and an online redo log file becomes available."

[40] Can the DBA control the intervals between checkpoints?

      • Yes, but not precisely.
      • [Microsoft] The DBA can set the RECOVERY INTERVAL, and SQL Server decides the frequency of checkpoints based on the amount of time that it will need to recover. Usually the checkpointer is a thread that sleeps until the log manager wakes it.
      • [Oracle] Configuration parameters are LOG_CHCKPOINT_INTERVAL and LOG_CHECKPOINT_TIMEOUT.

[41] Can checkpointing happen before an interval?

      • Yes, a variety of upheavals can make checkpointing happen immediately.
      • [Microsoft] Checkpoints happen if the user issues a checkpoint statement, an ALTER DATABASE statement that changes a database option, SHUTDOWN, or a signal to the SQL Server Control Manager to "stop service."
      • [Oracle] Checkpoints happen after a recovery interval, at shutdown, when forced by the DBA, when starting a backup, and at other times.

ROLLBACK

[42] What is meant by the term "rollback"?

      • Deliberate rollback caused by user statements, or automatic rollback.
      • Naturally the rollback statement causes a rollback. Other, rarer, situations include statement failure (which causes partial rollback) and loss of communication with the client (which causes automatic rollback).

[43] Is it okay to say "undo tablespace (formerly rollback segment)"?

      • Yes.
      • [Oracle] Rollback segments and undo tablespace are different options -- this is not a mere name change. However, records in rollback segments and records in undo tablespaces have similar structure and it seems safe to talk about them together. (Incidentally Oracle calls the records either "rollback records" or "undo records.") The Oracle9i manual says that Rollback Segments are "deprecated" and you are "urged" to use undo tablespaces.

[44] Does Oracle need to do more writing than IBM/Microsoft?

      • In theory.
      • [Oracle] Oracle has to write undo records to the undo tablespace. It's cached, but caches can overflow. More importantly, any change to a tablespace must be logged. Since the undo tablespace is a tablespace, whatever is written to the undo tablespace gets written to the log too. On the other hand, when logs are on separate disk drives the cost of writing is minimal.

[45] Are there other uses for undo tablespaces?

      • Yes.
      • [Oracle] Doubtless the information in the undo tablespace is used for retrieving old values in a trigger too, though the manual doesn't appear to say that.

[46] What are Compensation Log Records?

      • Records that record "undo" information.
      • Log writing is serial, so the DBMS can't go back and delete log records that have been entered. Instead, it must add new log records that negate the originals.
      • [ARIES] Here's the ARIES specification for Compensation Log Records (CLRs): (a) they are written during ROLLBACK or partial ROLLBACK (such as ROLLBACK TO SAVEPOINT or statement failure); (b) they are added during the "undo" phase of a Recovery; (c) additional information is logged in a field called UndoNxtLSN, which prevents "redo" from processing CLRs, and prevents "redo" from processing non-CLRs multiple times (in case Recovery itself crashes). After a ROLLBACK, the log will contain these records: (a) a ROLLBACK START log record, (b) n CLRs to compensate for n original log records, (c) a ROLLBACK END log record.

Recovery

[47] Does the recovery manager rewrite page #x using the after-image?

      • Yes, if necessary.
      • [ARIES] Under ARIES, all data page headers include a page_LSN that shows the last log record. If a comparison with the UPDATE log record's LSN shows that the page is already up to date, rewrite is unnecessary.

Suggested Further Reading

Mohan C., "Repeating History Beyond ARIES"

Microsoft Corp., INF: SQL Server 7.0 and SQL Server 2000 Logging and Data Storage Algorithms Extend Data Reliability.

--

Peter Gulutzan is co-author of SQL-99 Complete Really (CMP Books 1999) and SQL Performance Tuning, which Addison-Wesley will publish in September 2002.


Contributors : Peter Gulutzan
Last modified 2005-05-02 03:58 PM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone