Road Map to Oracle Tuning Techniques

by Dave Ensor

Paper presented by Dave Ensor at Oracle Open World, November 1998, San Francisco. The paper discusses the main types of tuning work feasible under Oracle and suggests the circumstances under which they are most appropriate.

Summary

There is a wealth of information on Oracle Tuning in both Oracle's manuals and in books from a number of publishers. Very few sources address the key issues of which techniques are appropriate when dealing with applications packages, and how the available techniques relate to the stages of the application life cycle. The paper discusses the main types of tuning work and suggests the circumstances under which they are most appropriate. A surprising range of options is available to administrators of servers running application packages and in many cases they will be able to apply techniques that are not normally considered feasible. On the other hand, as in-house applications progress through their life cycle, the levels of intervention that can and should be taken against them decrease in scope.

Background

The idea for this paper arose in the fall of 1997 during a presentation of the author's one-day seminar, "Managing Oracle Performance." The seminar covered design issues in some depth, and a member of the audience made the comment that the material seemed oriented to having complete control over the application. He asked whether, in view of the current trend towards the use of applications packages, the approach described in the seminar could really be applied in production at many sites. As far as I can remember, my initial answer was that the techniques had validity in every case but that it might prove difficult to apply many of them without complete control over the application.

It is clear, however, that the customer in a package environment cannot directly apply the most effective tuning techniques. It struck me at the time that it might be worth drawing up a "road map" to show which techniques are most appropriate under which circumstances. Such a map should also be helpful in saving work by showing the tuner or systems administrator which steps are either not worth taking or likely to be simply not feasible.

Performance Measurement

Before starting to review tuning technique, it is important to note that there are two radically different types of measurement that can be taken from an Oracle instance to determine the state of tune an Oracle application under load. The generally accepted terms for these two techniques are ratio-based tuning and wait-based tuning.

Ratio-based Tuning

This is the traditional method of measuring the general state of tune of the instance. The classic ratio is the buffer cache hit ratio that measures the success of the Oracle buffer cache (db_block_buffers) in avoiding the need for Oracle to issue I/O read requests. By far the most important source for deriving ratios is the virtual view V$SYSSTAT that contains the name and current value for around 170 internal Oracle instance statistics. The exact number of statistics varies with the specific release of Oracle, but running Oracle8 Release 8.0.3 under Windows/NT Server 4.0 there are 173 rows in V$SYSSTAT. On most platforms, these values are implemented as 32-bit signed integers that wrap around at 2,147,483,647 with the next value being -2,147,483,648.

Fortunately this wrap around is very rare and is therefore quite simple to handle. If the values are inspected reasonably frequently, it can always be assumed that only one wrap around has occurred since the previous inspection. If a statistic such as physical reads appears to have reduced during an interval, then the actual change should be reported as 4,294,967,296 - (new value - old value).

One of the problems with the ratios is that their target values are difficult to establish, and it is often unclear which of the ratios has the greatest correlation with performance in any given application environment. Despite this, most instance performance monitoring today is performed using ratios, often derived from the Oracle scripts utlbstat.sql and utlestat.sql. When the ratios are recorded in a repository over time, their history can be used to establish the acceptable ranges for the ratios. With this information, action needs be taken only when one or more of the ratios go outside the normal range. Our experience at BMC Software working with customers running operational applications against the Oracle server is that the thresholds should be based on a minimum of one week's operation.

Wait-based Tuning

Wait-based tuning is both a more recent and a more pragmatic approach. Instead of measuring ratios such as the buffer cache hit ratio, initial observation centers on whether resources are subject to waits. The basic source for this information is the virtual view V$SYSTEM_EVENT. Although the Oracle8 Server Reference Manual Release 8.0 defines 117 possible wait events, a high number of these are internal events that are unlikely to be of general interest. High levels of wait against these internal events are well worth drawing to the attention of Oracle Support, although the DBA is unlikely to be able to implement a solution.

In OLTP working, and without MTS, the event with the greatest total time waited should be "SQL*Net message from client" which means that the Oracle server session is waiting for a work request from the application. From the viewpoint of an Oracle tuner, rather than an application tuner, the ideal state is that the Oracle instance is doing such a good job keeping up with user demand that in general it is simply waiting for a user request.

The wait-based approach makes it trivially easy to spot the existence of certain types of bottleneck such as lock conflicts. If the time spent waiting for the "enqueue" event is continually rising, then users are waiting for locks. Drilling down to V$SESSION_WAIT will uncover the specific sessions that are currently waiting for locks and will allow the tuner to determine the resource (typically a table row) for which the session is waiting.

Taking Action -- From the Outside

Having uncovered an issue that is affecting performance, the next step is to take action designed to overcome the problem. It is also essential to have some means of assessing the effects of any changes made. In general, the means used to identify the problem will also serve to observe the effects of changes made. The manual Oracle8 Server Tuning Release 8.0 presents a 10-stage tuning methodology. To minimize the disruption to an existing application, this methodology can be deployed in reverse order, only making changes at a higher level when changes at a lower level prove ineffective. In this paper we look at the later steps of the methodology since once the application is in production, there is typically little that can be done to adjust the overall design.

Tuning the Underlying Platform

This is the lowest level in Oracle's documented methodology, but more often than not, its effect is minimal. It is always worth scrutinizing the platform for inappropriate network parameters (such as routers giving preference to other request types) and for inappropriate virtual memory configurations. For example, if paging rates are high, it may be necessary to spread the page files across disks. Increasingly, sites are using storage arrays that achieve this striping or spreading using statistical techniques and without the need for intervention by the administrator.

Despite its lack of impact, the first waypoint on the tuning roadmap should always be to check the network and operating system for problems that can be overcome without direct intervention at the Oracle level. This might include moving heavily accessed non-Oracle data onto storage not also referenced by Oracle, and separating Oracle's network traffic from that used by other services.

Instance Tuning

Much has been written about tuning Oracle instances, and indeed Oracle provides a bewildering array of init.ora parameters to help the DBA configure the instance. Oracle8 Release 8.0.3 under NT has 193 rows in V$PARAMETER, with an additional 120 "hidden" parameters. Although many applications vendors suggest values for the key tuning parameters, and may demand specific values for feature parameters such as compatible, local administrators are generally free to make changes as required.

Experience shows, however, that for most instances, only five parameters really need to be investigated and one of these, dB_block_size, can be expensive to change, since a complete database rebuild is required. Despite this, it is worth considering because larger block sizes have many advantages.

Historically, Oracle sites tended to benefit from increasing the size of the buffer pool. However, today that message is so well understood that it is now more common to find buffer pools that are excessively large rather than too small. For every application where the database is larger than available memory, there is a value for dB_block_buffers above which there will be little or no reduction in I/O. There is no point in setting the buffer pool larger than this size.

The shared pool exhibits similar behavior and the goal should always be to keep the parameter "shared_pool" down to the value at which the library cache hit ratio stops improving. Any instance that has more than a few Kb of free memory in its shared pool an hour after startup is a candidate for a smaller value, though this may be regarded as a pathological case.

Oracle has to wait for completion of writes to the redo log in a number of circumstances, and in most, whenever a commit takes place. If there is evidence of waiting for the redo log, then log_buffer may be increased, but this will quickly become ineffective and further tuning will require either application change (less commits) or moving the redo logs to a faster or less busy device.

Finally, consideration should always be given to setting log_checkpoint_interval larger than the redo log size though the effect is rarely dramatic.

Changing Parallelism

When either the ratio-based or the wait-based approaches have identified contention for resources, a very simple gambit may be to reduce parallelism. If the application has been poorly designed, or inappropriate use made of Oracle Parallel Server or the Oracle Parallel Query Option, then simply backing off may solve the problem and should be free from side effects.

Thus, if a batch update causes massive interference to OLTP working, then, at least as a temporary workaround, the batch update should never be performed during periods of high OLTP traffic. If this is initially unacceptable to the users, it may help to ask them to evaluate the alternatives and assess how much useful work they will get done under each scenario.

Introducing further degrees of parallelism to increase throughput by soaking up spare resources is an attractive proposition, but must be carefully tested. In addition to the danger of causing contention, further parallelism may show unexpected and unwanted application behavior (also known as bugs). Even if the site has control over the application code, the required fixes may be complex and expensive. This caution applies especially to deployment of the Oracle Parallel Server against an application not specifically designed to exploit it.

Changing Physical Structure

As discussed above, striping is normally best left to a Logical Volume Manager or Storage Array Controller. If it is necessary to distribute I/O activity across drives by hand then the DBA should adopt a broad brush.

Under Oracle8, table partitioning does not have to be application visible and so it is theoretically possible to partition a table or index originally designed to be held in a single segment. This makes distribution of the object across devices easier to manage and may also allow part of the object to be held in read only tablespace to reduce backup times. At the time of writing, the author is not aware of such a conversion having been achieved in practice.

Changing Index Policy

Although in theory it should be possible to change the way tables are indexed without affecting the correct operation of an application, this is not always true in practice. Also many applications vendors are unwilling to support changes to index structure. This is extremely frustrating to a DBA who has shown that a trivial change to the indexing can generate a 100:1 improvement in response time.

The potential pitfalls in changing index policy center around two issues. First, the application may explicitly reference indexes if they are dropped and recreated during batch operations, and secondly changing the indexes will change the optimization of some of the queries. That is usually the motivation for doing it. However, some code depends on rows being returned in a particular order although there is no ORDER BY clause. In this case, changing the index policy can cause the rows to be returned in a different order and destabilize the application. Although the resulting bug should be quite easy to fix (assuming control of the source code) it may not be found until serious damage has been done to the data.

Taking Action Inside the Application -- Considerations

Once the steps cited previously have been exhausted, it becomes necessary to consider changing the application itself. In many cases, this will be either impossible, because the site either does not have the source code or does not have the right to change it, or it will be expensive for any one of a number of possible reasons.

In-house Applications

If the performance problem is found before the application goes into production, radical action to resolve it should always be considered since the problem may be about to get worse and as we shall see below the organisation's freedom of action will reduce through the life of the application. Realistically, however, many severe performance problems are first observed during early production use, and a high proportion of the remainder are discovered during the transition from pilot running to main scale use.

Once the application is in production, any code change should be subject to a formal change control procedure that will have time and cost implications, and also the cost of the proposed change must be balanced against the expected remaining life of the application.

If the application is run at many sites, but the problems are only reported from one site, then studying the differences between the sites should give strong pointers to the nature of the problem. If you decide to change the application, you need to carefully consider change propagation and you should apply a workaround just to the affected site may start to look like a more attractive short-term solution.

Mature Applications

The older the application, the greater the resistance to making structural changes, and indeed the less chance of having either the tools or the expertise to make the changes successfully. Also, if a mature application has performance problems, you can presume that something has changed quite recently. Otherwise the problem would have been reported earlier. In many cases, it will be possible to reverse out the change even if it is simply an increase in load (number of users, number of transactions, or table size). Table sizes may be controllable by applying an archiving policy for removing old data.

If the load cannot be reduced, it will be necessary to locate and remedy some feature of the application. An effect such as lock contention may exhibit exponential rather than linear growth with load. Such effects often require significant design changes to resolve.

Taking Action Inside the Application -- Finding the Target

Once the decision has been made to make internal application changes, the first step is to identify the areas in which changes are likely to yield benefits. On the assumption that we are trying to decrease server load rather than client load, the identification process will normally be driven by a study of what is being executed. There are many techniques that allow an investigator to quickly identify the calls to the server that are candidates for tuning. One of the simplest is to search the virtual view, V$SQL for statements that have made high numbers of block visits.

SQL Tuning

In addition to changes to indexes discussed earlier, the normal approaches here are to experiment with optimizer hints and try alternate forms of the statement. In the special case where the statement is encapsulated within a view, it will be possible to replace the statement by dropping and recreating the view without having to change the application itself.

Implementing Caches

An effective alternative to speeding up the original SQL statement is to retrieve data less often by providing caches within the application. Some client tools have excellent support for manipulating arrays, whereas others make it a major development task to hold and join to a reference table in client memory.

Changing Table Structures

The design of the tables used to support the application rightly has a central place in Oracle's published methodology. However, in practice, changes are expensive to make once any substantial portion of the code has been written unless high levels of encapsulation have been employed.

One major exception is the introduction of derived fields for attributes such as account balances and date of last transaction. Using triggers to maintain these values, tables containing the new attributes can be handled as before by code that is unaware of the new columns, which can be exploited by code that is aware of them. This in turn implies that the derived values are maintained by database triggers, which can themselves cause performance problems. Adding columns will also invalidate any queries that contain a select * from an affected table. Of course, there should not be any of these but that does not mean that they do not exist.

Despite the fact that their code will almost certainly be unable to reveal that a derived field has been added, applications vendors are unlikely to look favorably on the practice.

Conclusion

When intervention is required to improve performance, the environment will determine, or at least appear to determine, where changes can reasonably be made. The earlier the application is in its life cycle, and the more control the tuner has over the application itself, the more likely it is that it will be possible to take steps that have radical impact on the performance.

Changes to the operating schedule should be feasible at any stage, even if the application is a completely closed package. However, political issues within the user hierarchy may prevent deployment of changes to work flow, and the solutions to this problem are outside the scope of this short paper. To show what can be achieved with care and determination, this paper ends with a short case study.

How I got the idea for this paper is outlined in the introduction. In fact, a few weeks after a visit to Germany, I called on a major retail chain in the UK to advise them on performance issues they were suffering with an application package.

After only a few minutes of discussion, it became apparent that they were having great success sampling V$SGA to find the most resource consumptive statements. They were then matching the users executing these statements with application functions to infer which component was likely to be issuing the statement. They were also using their training instance to experiment with alternative indexes and to execute alternative versions of SQL statements inside a test harness. The results of these experiments were reported back to the application vendor at normal account meetings. The vendor had agreed to "study a number of the recommendations made by the customer and adopt them if they were proved to be feasible." For one glaring coding error in an SQL statement, the vendor took less than 24 hours to ship a patch containing the suggested change.

There are appropriate techniques for different environments, but at the end of the day, there will always be one and only one approach that is going to yield the greatest benefit. If you need that level of benefit, and the approach that will achieve it is not open to you directly, then you have a political problem to overcome in addition to your technical problem. Provide strong evidence for the action required, and use an experienced negotiator to "motivate" the vendor to comply.

---

Dave Ensor is a Product Developer with BMC Software where his mission is to produce software solutions that automate Oracle performance tuning. He has been tuning Oracle for 13 years, and in total he has more than 30 years' active programming and design experience.

As an Oracle design and tuning specialist Dave built a global reputation both for finding cost-effective solutions to Oracle performance problems and for his ability to explain performance issues to technical audiences. He is co-author of the O'Reilly & Associates books Oracle Design and Oracle8 Design Tips.

Contributors : Dave Ensor
Last modified 2005-02-18 11:03 AM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?