DBAzine.com
Oracle9i Self-Management Features: The Early Winners

by Dave Ensor
From a paper previously presented by Dave Ensor at Oracle OpenWorld 2001, San Francisco, December 2001.

Introduction

BMC Software sells a number of tools to help manage Oracle dataservers. As each new version of Oracle moves into production we need to form a view on the additional features that may be required in our products to help our customers fully exploit their Oracle instances, and also to identify those product features that will no longer merit continued development investment because Oracle has met within their software a need which had traditionally required an external solution. This paper reports some of the findings from a study of the initial production release of Oracle9i. The sole purpose of the study was to discover the extent to which Oracle had increased the ability of the dataserver to manage itself. The study took the form of structured experiments rather than observation of production use. The conclusions are therefore tentative until confirmed or disproved by some brave pioneer site, giving rise to the words "early winners" in the title of the paper.

The paper does not cover the new Oracle9i features in the areas of backup and recovery, nor does it discuss Oracle's Real Application Clusters (RAC), the replacement for Oracle Parallel Server.

No suitable clustered platform was available to the author to install and run the Real Application Cluster support before the deadline for submission of this paper, and it was not felt useful to report the 'paper study' that had been completed. As more and more sites seek to become at least more failure tolerant, if not completely non-stop, it is expected that managing multiple instance Oracle will become a major growth area and the author hopes to extend this paper to cover the topic.

Backup and recovery are not discussed for a number of reasons, not the least being that the paper is based solely on experimental use of the software and the author believes that testing in a near production environment is essential to valid commentary on data security issues. It is also becoming clear that as the number of features in the Oracle dataserver continues to increase with every release, even server specialists are starting to have some difficulty keeping fully up to date in every area. The author's key focus in undertaking the technical studies on which this paper is based was to look at new manageability features primarily from a performance and availability standpoint.

Test Environment

All of the testing for this paper was performed on the author's laptop, a Compaq Armada M700 with 448 Mb of memory and a single 20 Gb hard disk partitioned into a FAT device for the operating system and an NTFS device for both the Oracle installation and the database files. The single processor is a Pentium III with a reputed clock speed of 833MHz; it certainly executed queries from the Oracle9i cache at impressive speed.

The machine was running Microsoft Windows/2000 Professional with Service Pack 2, and Oracle9i Enterprise Edition release 9.0.1.0.1 with Oracle's pre-packaged "general purpose" database although this was customized in a number of ways. Most importantly the INDX tablespace was removed, the USERS tablespace was expanded to 2 Gb, and limits were set on datafile autoextension in each tablespace. It was noted with some disappointment that the default databases all came with a database block size of 4096 whereas the author would have preferred to use 8192. This presented an opportunity to test operation with multiple block sizes but these tests were not completed in time to be included in this paper. It is hoped to make a brief comment on the results during the conference presentation.

Self-Management

Goals

We're all busy, and Database Administrators (DBAs) are often busier than most. It makes huge sense that if the dataserver is completely capable of taking whatever action is required, then it should do so without waiting for a DBA to come along and give it permission. On the other hand, customers are not impressed when a dumb machine keeps making things worse by applying inadequate management rules, especially if these are enabled by default as always seems to be the case in some personal productivity software.

At its simplest, therefore, we can establish two goals for self-management features:

      1. If the server can successfully manage the situation, then it should do so.
      2. If success is in doubt, then management should be left to "someone you can fire."

Examples

In the world at large, automobiles contain a number of features that are essentially self-managing including anti-lock brakes and automatic transmission. In general, a skilled enough driver can get better performance from the vehicle without these systems, but most of us have better things to do and opt for the increased safety and ease of use of the self-managed systems rather than insisting on exercising the maximum level of control.

Self-management features are not new in Oracle. Version 6 was the first version with a region called the shared pool, but it also had a discrete data dictionary cache divided up into separate sections for tables, objects, indexes, columns, and so on. Each of these regions had to be sized using its own init.ora parameter and the penalty for making any one of them too small was a significant increase in parse times. Pretty soon Oracle worked out that most customers were failing to size these regions accurately and decided that the dataserver should size the cache dynamically from the memory allocated to the shared pool. In the case of the row cache (Oracle's slightly bizarre internal name for the data dictionary cache) the user no longer has to do anything at all. It sizes itself. (If you are interested and want to see how many failure-prone decisions this is saving you, take a look at V$ROWCACHE.)

When it comes to the log checkpoint interval we have a trade off. A longer gap between checkpoints means a higher maximum possible transaction throughput, but also increases the average and maximum instance recovery times. In Version 6, the DBA was expected to adjust checkpointing by reference to the number of redo log blocks that had to be written before a checkpoint took place; not surprisingly, many DBAs found it difficult to relate this integer to the mean or maximum recovery times. Subsequent releases have made the specification progressively easier, and in Oracle9i, the DBA simply specifies the desired mean time to recover (in seconds). The administrator specifies the goal, and the server does whatever it believes needs to be done to meet that goal.

Instance Parameter Management

Each of the recent versions of Oracle has had between 100 and 300 run-time parameters that can be specified at startup and tell the instance "whether it is a prune Danish or a fruit cake". My copy of Oracle9i has 250 of these parameters that I am allowed to set (even if I am discouraged from doing so) and an additional 432 parameters whose names start with an underscore, which I am not supposed to change unless told to do so by Oracle Support. _allow_read_only_corruption = true is just one example.

Over the years it has become less and less acceptable to stop database servers to adjust instance parameters, and Oracle has made many of the parameters 'dynamic', meaning that a suitably privileged user can modify the value of the parameter while the database is running. Until Oracle9i these dynamic changes were lost when the instance was stopped and restarted, because at restart the parameters were read from a text file invariably referred to by its original default name init.ora, although it should really be known as a pfile.

Oracle9i can use an alternative type of parameter file called an spfile, the key difference being that the new file type is maintained from within Oracle. Thus the SQL statement

alter system set db_cache_advice = on scope = BOTH;

enables the cache advice feature within the current instance, and also specifies that it will be enabled on subsequent restarts. The scope argument may have the values MEMORY, SPFILE, or BOTH.

Better still, the manuals tell us that even more of the parameters are now dynamic, including those that set the size of the shared pool and the various buffer pools (in order to support multiple database block sizes, Oracle9i allows one buffer pool per block size plus default, keep, and recycle pools for the default block size). The manuals also state that even the parameters that cannot be changed dynamically can be altered using alter system set commands with scope = spfile. Unfortunately the reality is less encouraging. The total amount of memory allocated to the SGA cannot be changed, so in order to increase one allocation another allocation must first be reduced. Also a number of important memory regions (including the java pool and the redo log buffer) cannot currently be resized.

Worse, there are severe implementation problems. It does not appear to be possible to change static parameters as advertised in the manuals, as can be seen from the following SQL*Plus transcript:

alter system set java_pool_size  = 30M scope = SPFILE
                 *
ERROR at line 1:
ORA-02095: specified initialization parameter cannot be modified

Finally (at least under Windows/2000) any attempt to resize a buffer pool with the db_cache_advice feature enabled causes the instance to crash, leaving an ORA-00600 in the alert log. The cache advice feature, discussed below, is strongly recommended, but however attractive the idea of the spfile seems from the documentation, it clearly needs to be deployed with extreme care (if at all) in the current release.
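For reference, an spfile is created from an existing pfile with a single statement, and after a restart a query against V$PARAMETER reveals whether one is in use; the path shown here is purely illustrative:

create spfile from pfile = 'D:\Oracle\admin\DAE901\pfile\init.ora';

-- after restart: a non-null VALUE confirms the instance started from an spfile
select VALUE from V$PARAMETER where NAME = 'spfile';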

Self-Tuning Memory Management

Oracle9i has two major features that can be used to help overcome the memory conundrum discussed below. The first is completely automatic and simply strives to keep memory usage below a specified level. The second gathers and presents data on the likely effect of changing the size of a buffer cache, allowing the DBA to determine whether or not cache sizes are appropriately set and facilitating tradeoffs between the buffer pools and the shared pool or PGA space.

Memory Made Simple

Effective use of memory is key to Oracle performance. Memory access is several orders of magnitude faster than disk access, and most high-performance Oracle applications are inherently more limited by their I/O capacity than by their CPU capacity. Many technical authors and DBA's conclude from this that the more they keep in memory the faster their applications will run, but unfortunately life is not quite that simple.

Firstly there is little point in keeping something in memory if it is not going to be used again before it eventually gets overwritten. Secondly Oracle is often (usually?) run using 'dedicated servers' in which case each Oracle session has its own operating system thread or process with a private and potentially large memory region called the PGA (Program Global Area). As the number of sessions increases so does the amount of PGA space required and eventually the operating system has to resort to paging in order to meet the demand. This can, of course, happen with only one Oracle session if either the buffer cache or the shared pool is set pathologically large, or the platform simply does not have enough memory to run the Oracle version effectively.

Once demand paging becomes a significant factor, much of the data that appears to be in memory has to be managed on disk by the operating system and this defeats the original purpose of allocating large memory work areas. The goal, therefore, is to keep the memory areas at a size that minimizes I/O by retaining frequently used items without causing excessive paging. The difficulties are that it is not usually clear how a change in cache size will affect I/O, and the number of sessions may vary over time making it difficult to decide how much memory the DBA can afford to allocate per session. The parameter sort_area_size is a particular challenge in this regard because large sort work areas can dramatically improve the performance of certain operations (typically index creation, derivation of object statistics and reporting) but they are normally unnecessary and costly for most users.

PGA Aggregate Target

Traditionally Oracle has supported a series of instance parameters such as bitmap_merge_area_size, open_cursors, open_links, and sort_area_size that together determine the size of the PGA. However, whereas the value of a parameter such as open_cursors makes only a small difference to the overall memory charge, changing sort_area_size from 65536 to 1048576 could under the worst circumstances alter the total memory charge by over 4 Gb in a 5,000 user system.

The new instance parameter pga_aggregate_target does pretty much what its name suggests - it instructs Oracle to try to keep the total space used by PGAs within the instance below the target value. It cannot guarantee to achieve this 100% of the time for reasons that are covered in the Oracle documentation, but it does offer the DBA the opportunity to make the size of bitmap merge and sort areas vary with the number of simultaneous users that need to use such an area. Unfortunately it was not possible to test either the performance or the overhead of this feature in the short time between the Oracle9i production software becoming available to the author and the deadline for the submission of this conference paper. It looks like a potentially valuable and completely automatic feature that will really help sites where the number of sessions is subject to great variation. However, it should be noted that current best practice in the architecture of 3-tier applications does not give rise to such variations, preferring to use a constant size pool of server sessions. To give the final word on this topic to the Oracle9i Performance Guide and Reference, "Oracle Corporation strongly recommends switching to the automatic memory management mode, because it is easier to manage and often outperforms a manually-tuned system."
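For the record, enabling the feature appears to require nothing more than two instance parameters; the target value below is purely illustrative:

workarea_size_policy = AUTO
pga_aggregate_target = 200M

The documentation states that with workarea_size_policy set to AUTO the individual *_area_size parameters are ignored for dedicated server sessions.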

Cache Advice

As already hinted, part of the mythology of Oracle performance tuning is the idea that the larger the buffer pool, the better performance Oracle will give. Before Oracle Version 8.0 large buffer pools dramatically increased the CPU power needed for a given workload, but this is fortunately no longer the case. However as already discussed it is still desirable to keep the buffer pools only just large enough in order to make memory available for other regions, in particular the shared pool and the session work areas (the PGA or, in a shared server environment, the UGA).

For many years Oracle had a pair of linked features, one of which tracked the effectiveness of the buffer pool while the other tracked the predicted effectiveness of proposed cache extensions. Unfortunately, not only were these features awkward to interpret, they also incurred an unacceptable CPU load. These discredited features have been replaced in Oracle9i by a so-called cache advice mechanism; this does not quite live up to the promise of its name, but it is nonetheless worth exploring.

There is a new instance parameter db_cache_advice that may be set to on, ready, or off. The default is off, but the documentation suggests that the trick is to set it to ready, because setting it from off to on requires a memory allocation that may fail. Once the feature is enabled the performance of each buffer pool is tracked to determine how effective it is being at reducing physical reads; the results are available from the virtual view V$DB_CACHE_ADVICE. The Oracle9i Reference contains the warning that "CPU and memory overheads are incurred," but in a quite punitive test the author found that the CPU overhead was consistently less than 10%. This seems a reasonable cost to incur from time to time for the benefit of being able to correctly size the buffer caches. The view V$DB_CACHE_ADVICE shows, for each buffer cache, a series of statistics on the estimated number of physical reads that would have taken place if the buffer cache had been that size over the period since either startup or when the feature was enabled. Sample output from a reporting script is shown below.
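One plausible form for such a reporting script, using the documented V$DB_CACHE_ADVICE column names (the column headings and number formats in the output were produced with additional SQL*Plus formatting commands), is:

select NAME, SIZE_FOR_ESTIMATE, BUFFERS_FOR_ESTIMATE,
       ESTD_PHYSICAL_READ_FACTOR, ESTD_PHYSICAL_READS
  from V$DB_CACHE_ADVICE
 where ADVICE_STATUS = 'ON'
 order by NAME, SIZE_FOR_ESTIMATE;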

Cache        Cache   Total Est Read   Est Phys
Name         in Mb Buffers   Factor      Reads
---------- ------- ------- -------- ----------
DEFAULT       6.14   1,572     1.19     18,560
DEFAULT      12.28   3,144     1.11     17,418
DEFAULT      18.42   4,716     1.01     15,783
DEFAULT      24.56   6,288     1.00     15,658
DEFAULT      30.70   7,860     1.00     15,658
DEFAULT      36.84   9,432     1.00     15,638
DEFAULT      42.98  11,004     1.00     15,596
DEFAULT      49.13  12,576     1.00     15,591
DEFAULT      55.27  14,148     1.00     15,589
DEFAULT      61.41  15,720     1.00     15,587

The default buffer cache was 20 Mb in this example, but when the instance was started it had been 32 Mb and was reduced using alter system before running with cache advice enabled. Although the author has not found documentation to confirm this, from observation the cache advice feature monitors the predicted effect of buffer pools from about 20% to 200% of the size of the pool at startup in steps of 10% of the original size. The query that produced the example intentionally removed every second step in order to present a shorter table.

Although the view does not actually project any advice as such, two conclusions can quickly be drawn from the output in the test case shown. If only 6 Mb of buffer space had been allocated then physical reads would have increased by about 20%, whereas allocating more than 20 Mb of buffer space would have hardly reduced physical reads at all. The estimate is that the addition of the final 12 Mb would have reduced physical reads by just 4, or about 0.025%. Had this sampling interval been reasonably long and taken against the instance under typical load, then we would have proof that 20 Mb was the correct buffer pool size for the service being delivered.

Data that demonstrates the ineffectiveness of enlarging the buffer pools is invariably hotly disputed by part of the Oracle community, but the mathematics has been known for many years. Hopefully this new mechanism will gain both credibility and use, and help Oracle DBA's to reach better compromises in memory allocation.

Automatic Undo Management

Background

Other than the dreaded ORA-00600, the Oracle error that seems to strike the greatest fear into the hearts of DBAs is ORA-01555, the "snapshot too old" error (though to be fair ORA-03113 is another one that you really do not want to be faced with). Snapshot too old means simply that the read consistency mechanism has tried to reach too far back in time.

There are two reasons why a read consistency operation may need to be performed. Firstly a transaction may try to look at data that is subject to another user's uncommitted changes. If Oracle is working properly then this read consistent operation is guaranteed to succeed because Oracle is required to be able to rollback or undo a transaction at any time until it is committed.

Each query has a read consistency point, and all of the data returned by the query must be the committed state of the data as of that point in time. Normally the read consistency point is established by executing the query, but it can also be established for a whole series of queries by starting a "read only transaction". The ability to reconstitute the data as it appeared before a committed change is not guaranteed; in most applications it works without problems for most of the time but from time to time users experience the "snapshot too old" problem.

Rollback Segments

Each time Oracle makes a change to schema data it records the information required to undo that change in a special type of database area called a rollback segment. This information is always kept at least until the transaction making the change has committed, but as soon as the transaction is complete its rollback or undo data can be overwritten. How soon this happens depends on how much undo space is available and how quickly current and future transactions create new undo records. Within a few seconds, or minutes, or hours the undo information will be overwritten or, in some cases, simply discarded.

Since the introduction of Oracle Version 6 in 1988 the allocation of rollback segment space has been a major concern for Oracle DBAs who have had to decide both how many rollback segments an instance should have and how large each one should be. Resolving this issue has typically required a number of compromises that are outside the scope of this paper.

The Oracle9i Solution

Oracle9i supports the traditional rollback segment management features that have evolved over the past 13 years, but also introduces Automatic Undo Management. In this mode the DBA only has to create an "undo tablespace," tell Oracle to use this tablespace, and specify for how many seconds each undo record must be retained. The records will, of course, be kept for longer if the transaction that creates them does not commit within the time interval.

In Oracle9i the following three instance parameters will guarantee that all undo entries will remain available for 15 minutes:

undo_management = AUTO
undo_retention  = 900      # seconds
undo_tablespace = UNDOTBS

However, a potentially unwanted side effect is that the Oracle server will not retain the data for much longer than the time specified even if the instance is running with a relatively light updating load; i.e., even if there is no great demand to write new undo information. This contrasts markedly with traditional rollback segment management, where under light updating loads undo entries could (and would) remain available for several hours to generate read consistent data sometimes required by long running reports. Fortunately the instance parameter undo_retention can be altered dynamically using alter system set, and this may become necessary at sites where long report runs take place and update activity cannot be completely prevented while those reports are running.
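For example, the retention might be raised for the duration of a reporting window and then restored; the values below are purely illustrative:

alter system set undo_retention = 10800;  -- 3 hours, while the reports run
alter system set undo_retention = 900;    -- back to the normal 15 minutes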

Automatic undo management looks like a winner despite the likelihood that many sites will find it necessary, or at the very least desirable, to alter the retention period dynamically. Specifying the retention in terms of time is a clear improvement over the traditional specification in blocks, which required the DBA to assess how much undo his instance generated per second and to hope that this remained approximately constant.

Database Resource Manager

Database resource management is present in Oracle8i, though as far as the author can discover the customer base has not used it extensively. Changes to the way in which Oracle manages user session processes (threads under Windows/NT and Windows/2000) have resulted in a number of changes under the covers but the basic functionality remains the same, and contrasts markedly with the user profiles feature.

Profiles, when enabled, set hard limits on the amount of resource that a session may use and may also be used to limit the number of sessions that a given user ID may start in parallel. Both CPU and logical reads (database block visits) can be rationed at either or both session level and call level. Thus a profile might limit a particular user to never exceeding 60 seconds CPU time in any call to the database. Profiles also have a role in password management, but this is outside the scope of this paper. The major problem with hard limits on block visits and CPU consumption is that such resource usage does little or no harm if other users of the server are getting the resource that they require.

In its simplest usage the Database Resource Manager seeks to control the share of the CPU resource allocated to specific groups of sessions rather than the total amount of resource consumed by those sessions, and to intervene only when some group is in danger of not receiving their allocated percentage. However, there are a number of other features including rationing the number of sessions that a specific group may start and applying execution time limits to database calls. These time limits are significantly different from the limits applied by profiles because the operation is aborted when Oracle predicts that it will overrun its limit rather than waiting for the limit to be exceeded.
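As a sketch of the mechanics (the plan and group names are invented, but the DBMS_RESOURCE_MANAGER calls are those documented for Oracle9i), a simple plan giving a reporting group a 20% CPU share at the first level might be built as follows:

begin
  dbms_resource_manager.create_pending_area;
  dbms_resource_manager.create_consumer_group(
    consumer_group => 'REPORTING', comment => 'long-running reports');
  dbms_resource_manager.create_plan(
    plan => 'DAYTIME', comment => 'simple two-group plan');
  dbms_resource_manager.create_plan_directive(
    plan => 'DAYTIME', group_or_subplan => 'REPORTING',
    comment => 'cap reports at 20% of CPU', cpu_p1 => 20);
  dbms_resource_manager.create_plan_directive(
    plan => 'DAYTIME', group_or_subplan => 'OTHER_GROUPS',
    comment => 'everyone else', cpu_p1 => 80);
  dbms_resource_manager.validate_pending_area;
  dbms_resource_manager.submit_pending_area;
end;
/

The plan is then activated by setting the instance parameter resource_manager_plan, for example with alter system set resource_manager_plan = 'DAYTIME';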

Database Resource plans can quickly become extremely complex, with sessions being migrated from one resource group to another as they consume increasing amounts of resource. However, in limited testing by the author, operating a simple resource plan appeared to use remarkably little resource, and it did prevent long-running queries from monopolizing the CPU. In view of the extreme problems associated with trying to prioritize Oracle users through operating system scheduling features, the database resource manager looks to be a valuable feature for sites that are likely to experience CPU resource conflicts. The other side of the coin is that under heavy load many Oracle instances bottleneck on I/O rather than CPU, and resource plans can only tackle this indirectly.

Unused Index Identification

Indexes consume disk space and lengthen backup and restore operations; in addition index maintenance during insert, update and delete operations is a major CPU and I/O overhead in many Oracle applications. There is therefore a clear incentive for the identification and removal of both unused indexes and those indexes which are either little used or make no contribution to application performance. Having spent much of the past year working in this area under Oracle8i the author was fascinated to discover that Oracle9i contained a feature specifically designed to identify unused indexes. The new syntax alter index <index_name> monitoring usage; creates a row for the index in the view V$OBJECT_USAGE. This view contains a column USED that is set to YES when the parser first includes the index in an execution plan.
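In SQL*Plus the feature might be exercised along the following lines; EMP_DEPT_IX is a hypothetical index name:

alter index EMP_DEPT_IX monitoring usage;

select INDEX_NAME, MONITORING, USED
  from V$OBJECT_USAGE
 where INDEX_NAME = 'EMP_DEPT_IX';

alter index EMP_DEPT_IX nomonitoring usage;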

No indication is given of how often the index is used or whether or not its use was beneficial from a performance viewpoint. More worrying, the author's initial tests indicated that the column could be set to YES even if the index had not been used. This result is so bizarre that the author is seeking independent confirmation of it and intends to report further during his conference presentation. It is as yet unclear whether the feature is intended to report index usage that does not appear in the execution plan.

For sites at which parsed SQL statements remain in the shared pool for a reasonable amount of time even if only executed once (usually the case for applications that make effective use of bind variables) the simple query

select INDEX_NAME
  from USER_INDEXES
 minus
select OBJECT_NAME
  from V$SQL_PLAN
 where OBJECT_OWNER = USER;

will identify indexes that have not been used by any statement currently in the shared pool. This has proved reasonably effective at detecting unused indexes though it is known to miss indexes that are used solely for constraint enforcement. V$SQL_PLAN is another new and extremely welcome feature in Oracle9i. We always knew that the execution plans must be in the shared pool, and now we can report them with almost trivial SQL queries.

Oracle Managed Files

Oracle stores its data, including its log files and control files, in storage domains managed by the underlying operating system and relies on operating system services (albeit at a fairly low level) to handle data transfers between memory and disk. Although raw devices may need to be configured under some circumstances, the data is normally contained within a file system and each file is referenced by both a path and a name. Thus on my laptop the SYSTEM tablespace is stored within the single operating system file D:\ORACLE\ORADATA\DAE901\SYSTEM01.DBF. It should be no surprise on a small "server" such as my laptop that the files used to hold tablespace data for that particular database all share the same path, in this case D:\ORACLE\ORADATA\DAE901 (the database and the database instance are both called DAE901 after my initials and the Oracle version number).

Oracle Managed Files allow an administrator to specify a default path or location for database files, and this means in turn that operations that previously required the user to provide a file path and name can now be carried out with Oracle providing a default file. Thus (with suitable privilege) the two SQL*Plus commands:

SQL> alter system set db_create_file_dest = 'D:\TEMP';
System altered.

SQL> create tablespace XXX;
Tablespace created.

result in the creation of the file D:\TEMP\ORA_XXX_ZY2TFB00.DBF which will be used to hold all of the data stored in that tablespace. This file has been created with the default size of 100 Mb and with autoextend on, but both of these attributes could have been overridden in the create statement without having to specify a file name. If the tablespace were now to be dropped using the SQL*Plus command

SQL> drop tablespace xxx including contents;

Tablespace dropped.

then not only would storage segments (such as tables) in that tablespace be dropped, and the tablespace removed from the data dictionary, but Oracle would also delete the operating system file that it created during the create tablespace operation.

The ability to create the required files by default is also supported in the create database command.
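Indeed, with suitable defaults in the parameter file, the create database statement itself needs no file clauses at all. The following sketch uses the documented db_create_file_dest and db_create_online_log_dest_1 parameters; the paths and database name are illustrative:

db_create_file_dest         = 'D:\ORACLE\ORADATA'   # in the pfile or spfile
db_create_online_log_dest_1 = 'D:\ORACLE\ORADATA'

SQL> create database DAE901;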

Such functionality may not be of immediate use to many Oracle DBA's who are well-used to allocating file space for Oracle's "container files" and who also expect to have to take decisions on where in the file system such data should be located for capacity and load balancing. However for a group trying to write install and maintenance procedures for an Oracle database that is to run on many machines whose device configuration is unknown in advance, the facility to allow Oracle to name the files means one less important error-prone step to be carried out by the installer.

Oracle Managed Files may be a minor misnomer as for all normal operating purposes Oracle "manages" the file in the same way as any other database file, but the functionality will be very helpful to a number of third party software suppliers.

Conclusions

From initial experience of the self-management features discussed in this paper, the early winners are the facility to allow Oracle to determine how much memory to allocate to individual sessions and the ability to set CPU time allocation targets for groups of users. In addition automatic undo management and Oracle managed files both look promising but are perhaps more likely to be adopted for new databases than as part of an upgrade strategy as the problems that they solve should already have been overcome in any pre-existing production application.

The ability to have Oracle report unused indexes looked attractive at first sight but was not found to be useful in practice. The spfile mechanism, which is provided to allow administrators to make persistent changes to instance parameters through the alter system command, should be extremely attractive. Unfortunately in the first production release of Oracle9i the feature has a number of failings and should not be used. The associated ability to dynamically resize Oracle's caches is useful, but somewhat incomplete: it does not support all of the caches, nor does it allow the total amount of memory allocated to the caches to be increased.

---

Dave Ensor is a Product Developer with BMC Software where his mission is to produce software solutions that automate Oracle performance tuning. He has been tuning Oracle for 13 years, and in total he has more than 30 years' active programming and design experience.

As an Oracle design and tuning specialist Dave built a global reputation both for finding cost-effective solutions to Oracle performance problems and for his ability to explain performance issues to technical audiences. He is co-author of the O'Reilly & Associates books Oracle Design and Oracle8 Design Tips.


Contributors : Dave Ensor
Last modified 2005-02-18 11:11 AM