Digging at the Segment Level: Performance Diagnosis Reaches a Deeper Level

by Arup Nanda

Oracle 9i Release 2 provides a very useful way to find out performance metrics at the segment level, hitherto impossible, enabling DBAs to pin point problems to a specific segment.

Toolbox: Oracle 9i Release 2 RDBMS. No special tools needed.

User Expertise Required: Advanced DBA Skills.

The biggest problem faced by any Database Administrator (DBA) trying to diagnose a problem is the lack of system statistics at a very granular level. STATSPACK report gives a very detailed performance metrics profile but that is at the system level only. Although that provides enough information on the overall health of the system, it does not provide the DBA with the information on specific objects that experienced or contributed to the problem, especially in areas where the problems are storage- and data access-related. For example, a typical performance problem is caused by heavy buffer access activity that might be the result of a lopsided index or just plain data distribution in a table, producing a wait event called "buffer busy waits." The STATSPACK report or a peek into the V$SYSSTAT view alerts the DBA that such an event occurred, but it does not indicate the specific object on which that event occurred, leaving the analysis in limbo. If that event occurs when the analysis is going on, then the exact segment can be ascertained from the P1 and P2 parameters of V$SESSION_WAIT view. However, as it usually happens, the suffering sessions are either completed or blown out to conserve resources, and, thus, the evidence disappears along with them.

Oracle 9i Release 2 provides a set of very useful performance views that allow drilling down to the segment level, not just system level, even after the event has occurred. For instance when you see a lot of buffer busy wait events in a STATSPACK report, you can then drill down further to find out which segments contributed to this wait event. This enhances the problem solving process immensely as the segments can be identified without realtime monitoring and can then be tuned further.

This article will explore such instrumentation and will present to the reader — specifically the DBA troubleshooting the performance problem — the means necessary to find out the wait events on the specific segments.

Background / Overview

First, I will illustrate the methodology with a case study in tuning. I will start with basic tuning methodology in practice today using STATSPACK report and then accentuate the findings from the data collected from the new views. A typical STATSPACK report has the following lines:

Avg
                                      Total Wait  wait    Waits
Event               Waits   Timeouts   Time (s)   (ms)     /txn
. . . . 
buffer busy waits     3400         0         30    8.8     11.2
. . . .

It shows that the buffer busy waits event occurred 3400 times. In order to tune the buffer busy waits, we could do a lot of things — we could increase the freelist groups and freelists of the segment, or we could rearrange the distribution of the rows in the table in such a way that the blocks are not repeatedly picked up at the same time from two different session. However, to do either of these, it's necessary to know the exact segment that to be tuned. The STATSPACK report does not tell us which objects contributed to the buffer busy waits event, and without the segment level information, the tuning cannot possibly continue. Traditionally, we would place event 10046 for each of the sessions and see all the wait events in the generated trace files, which tend to be extremely large. In a typical system, which may contain several hundred applications, this approach may not be feasible. Additionally if the applications connect through Multi Threaded Server, it becomes difficult to isolate single segment level problems even if trace analysis is possible.

This information is now obtained from the new performance view V$SEGSTAT and V$SEGMENT_STATISTICS.

Setting the Statistics Levels

In order for Oracle to collect those statistics, you must have proper initialization parameters set in the instance. The parameter is STATISTICS_LEVEL and is set in the init.ora. The good news is that this is modifiable via ALTER SYSTEM command and some underlying parameters are even modifiable via ALTER SESSION. This parameter can take three values:

BASIC: At this setting Oracle des not collect any stats. Although this is not recommended, you may decide to set this in a fine-tuned production system to save some overhead.
TYPICAL: This is the default value. In this setting, Oracle collects the following statistics.

Buffer Cache — These statistics advise the DBA how to tune the multiple buffer pools. The statistics can also be collected by setting another parameter DB_CACHE_ADVICE independently using initialization file, stored parameter file, ALTER SYSTEM or ALTER SESSION. If it's independently set, that setting takes preference over the statistics level setting.
Mean Time to Recover — These statistics help the DBA set an acceptable Mean Time to Recover (MTTR) setting, sometimes due to the requirements from Service Level Agreements with the users.
Shared Pool Sizing — Oracle can provide valuable clues to size the shared pool effectively based on the the usage and these statistics provide information on that.
Segment Level Statistics — These statistics are collected at the segment level to help determine the wait events occurring at each segment. We are interested in these statistics.
PGA Target — These statistics help tune the Program Global Area effectively based on the usage.
Timed Statistics — This is an old concept. The timed statistics were enabled in earlier versions with the initialization parameter timed_statistics. However, the statistic was so useful that Oracle made it default with the setting of statistic_level. It can be set independently, too; and if set, overrides the statistics_level setting.

ALL: In this setting, all of the above statistics are collected as well as an additional two.

Row Source Execution Stats —These statistics help tune the sql statements by storing the execution statistics with the parser. This can provide an extremely useful tool in the development stages.
Timed OS Statistics — Along with the timed statistics, if the operating system permits it, Oracle can also collect timed stats from the host. Certain operating systems like Unix allow it. It too can be set independently; and if set, overrides the statistics_level setting.

If you set these via any of the three methods, Initialization File, ALTER SYSTEM or ALTER SESSION, you can find out the current setting by querying the view V$STATISTICS_LEVEL as follows:

The output is placed in Listing 1.

SELECT ACTIVATION_LEVEL, STATISTICS_NAME, SYSTEM_STATUS, SESSION_STATUS
FROM V$STATISTICS_LEVEL
ORDER BY ACTIVATION_LEVEL, STATISTICS_NAME;

So, set the STATISTICS_LEVEL to TYPICAL either by ALTER SYSTEM or by an initialization parameter file. Do not forget to restart the database if you choose the latter.

Segment Level Statistics Collection

Now that we have set up the collection, let's examine what we can get from there. The main dynamic performance view that is populated is called V$SEGSTAT. Here is a description of the view.

Column Explanation

TS#	Tablespace Number, corresponds to the TS# column in SYS.TS$
OBJ#	The Object ID, which corresponds to the OBJECT_ID in SYS.DBA_OBJECTS
DATAOBJ#	It corresponds to the DATA_OBJECT_ID in SYS.DBA_OBJECTS
STATISTIC_NAME	The most important one, the name of the statistics we are interested in STATISTIC# A unique number to denote each statistics above. This is NOT the same as the V$SYSSTAT statistics number.
VALUE	The current value of that statistic. Please note the value is cumulative, just like the statistic values in V$SYSSTAT. If you drop the segment and recreate it, the value is reset.

As you can see, the columns are somewhat cryptic. Oracle provides another view called V$SEGMENT_STATISTICS which is based on the above view. This view has a lot more columns and is more descriptive with respect to the object identification. In addition to columns like the main view, it also references the names of the tablespace, the object, and the owner etc. so that the user can quickly join the view with actual names.

However, this view is a little slow. It's a better idea to get the object_id from the dba_objects and search based on that. Here is the description of the columns of the V$SEGMENT_STATISTICS view that are not present in the V$SEGSTAT view. The other columns are the same as in V$SEGSTAT.

OWNER	The owner of the segment
OBJECT_NAME	The name of the segment
SUBOBJECT_NAME	If the above is a table with partition, each partition has separate statistics. The partition is referred to as sub-object.
TABLESPACE_NAME	Tablespace where the segment resides
OBJECT_TYPE	Type of the segment, TABLE, INDEX, MATERIALIZED VIEW, and so on.

To find out what all statistics are collected, you can check the view V$SEGSTAT_NAME which describes the statistic name and the number.

Examining Detailed Statistics

Now we will dive in to examine the actual statistics that we populate. Since it lets us examine stats for a specific object, we can query like the following:

      SELECT STATISTIC_NAME, VALUE
      FROM V$SEGMENT_STATISTICS
      WHERE OWNER = 'SCOTT'
      And OBJECT_NAME = 'SALES';

This provides an output similar to Listing 2. Most of these wait events are self-descriptive. Once again, these are cumulative; so, the numbers go up as more operations continue on that segment. In addition, like any system level statistics, these statistics are deleted when the database is shutdown.

These segment level statistics break down the mystery surrounding the statistics collected from V$SYSSTAT or from STATSPACK reports. When baffled with a number of wait events that have already happened, the DBA can fall back on these statistics to dig deeper and identify the exact segments that experienced these waits which in turn contributed to the overall system wide wait for that event.

Improvements

With these basics placed in already, let's try to improve the collection and reporting methods to further refine the performance examination. This can be done by creating our own view in the same line as that provided by Oracle but with a little enhancement. Examining the view definition of V$SEGMENT_STATISTICS, we note that the view refers to an internal table called X$KSOLSFTS. This internal table has a very useful column - the time when the statistics were collected. This column, FTS_STMP, can be used to our advantage to provide further information on the wait events. A new view, called SEGSTAT_WITH_TIME is built from the definition of the V$SEGMENT_STATISTIC, identical to it except for the inclusion of a new column called TIME_STAMP. The view creation script is provided in Listing 3. The TIME_STAMP column can let you know if the statistics are stale and help you decide whether you should rely on them completely. The other important column this view adds is the INSTANCE_ID, which identifies the instance in a Real Application Cluster (RAC) environment. This view also takes away all but the most useful columns.

Case Study

The usefulness of the segment-level statistics can be best illustrated by a case study. Here we will create a wait scenario and then diagnose that with the segment level statistics. In the process, we will discover the facilities brought forth by Oracle 9i Release 2 that were missing in earlier releases. Please note that although the case study simulated the problems as expected when tested by the author, it is not guaranteed to produce the same behavior elsewhere. However, it should be able to help the reader understand the methodology.

Our example system is of OLTP nature. We are seeing consistent performance degradation and the objective of the exercise is to identify the problem and eliminate it. We have taken STATSPACK reports and they show high waits for "buffer busy waits" event. However, since the report does not provide information on specific tables or indexes that experience these waits, we can't start the process of segment tuning. Under Oracle 9iR2 this is possible.

For the sake of demonstration, we have a table called SALES. The table is created asper the script in Listing 4. We will initially populate the table using a script in Listing 5. Examining closely the Listing 5, you will notice that the customer_id column values are loaded one bunch at a time, making the records of a particular customer_id concentrated in a few blocks. Therefore, during an update where the records are picked up in the customer id sequence, they will be very much likely to be picked from the same block by two different sessions. The test case transaction is described in Listing 6, named stress.sql. This program, a simple PL/SQL script updates records with either the odd or even numbered sales_trans_id depending upon the parameter passed to it, for each customer_id from 1 to 60. This script is run from two different sessions.

The parameter passed is 1 from one session and 2 from the other, e.g. @stress 1. If the sessions are kicked off at the exact same time, both sessions will operate on the same customer_id but on different records due to the odd and even numbered sales_trans_id values, eliminating locking. However, both sessions will most likely try to update the records in the same block, because the records are arranged in the customer_id order and both the scripts access the records for the same customer_id. This will create a buffer busy waits scenario that we will identify and eliminate.

Once the table is loaded, execute a STATSPACK report collection. Typically in a production scenario, you would have enabled the jobs to run STATSPACK regularly. To collect the statistics, you would have to login as the STATSPACK user, usually PERFSTAT and issue a command EXECUTE STATSPACK.SNAP. This provides your baseline collection stats.

Now run the stress script from two different sessions, with parameter 1 in one session and 2 in other. For attaining the same time execution, kick them from a scheduler like cron in unix or AT command in Windows. After they are run, collect the STATSPACK statistics again by issuing EXECUTE STATSPACK.SNAP. To generate the report, run the script spreport.sql under $ORACLE_HOME/rdbms/admin directory which will ask you the snap_id for the collections. Give the snap_ids just before and after the stress script. An excerpt from the generated report has been provided in Listing 7. Under the Section "Top 5 Timed Events", we note that "buffer busy waits" is one. The system waited 3378 times for 49 seconds, about 2.83% of all the waits times.

Armed with the information we have to unearth the segment that experienced this wait event. Before Oracle 9iR2, it was impossible. In 9iR2, if you have setup the statistics collection by specifying STATISTIC_LEVEL initialization parameter, then it is trivial. You would issue the following query:

SELECT OWNER, OBJECT_TYPE, OBJECT_NAME, VALUE
FROM V$SEGMENT_STATISTICS
WHERE STATISTIC_NAME = 'buffer busy waits'

The result is something like this. Of course, you may see a lot more in your environment.

OWNER  OBJECT_TYPE OBJECT_NAME   VALUE
-----  ----------- ----------  ------	
SCOTT  TABLE       SALES         3302

What we see here is the buffer busy waits were experienced by the table SALES owner by user SCOTT. The figure 3302 also roughly corresponded to the figure we obtained from the STATSPACK report. You immediately know that the problem lies in the table SCOTT.SALES. In Pre-9i Release2 Oracle databases, this information would have been impossible to get. In an actual production system, you would probably see a lot more tables with the buffer busy waits and the sum of all will correspond to the figure obtained from STATSPACK report. This gives the DBA ability to pin down the segment either that is a victim of a wait event or a creator of one and to take corrective action.

Solution

In the above example since we identified the offending segment, we will take corrective steps to fix the problem. If you notice the buffer busy waits were because two sessions were trying to update the same block at the same time. This can be easily solved by making the distribution more even. In addition, by making sure a block is less packed, we can reduce the likelihood that a block will become hot. As a solution, we will recreate the table with smaller PCTUSED and larger INITRANS and MAXTRANS parameters. This will make the table less dense. The table creation script is provided in Listing 8. Next, we will load the table in a different way as listed in Listing 9. Examine the script closely. It loads the customer_id values one after another until the maximum of 60 is reached and the cycle is repeated. This type of loading eliminates the likelihood that a particular block will be chosen at the same time by two sessions if the customer_id is the same.

After this change, execute STATSPACK.SNAP again and note the value of VALUE in V$SEGMENT_STATISTICS for the table SLAES. Since the value is cumulative, you will need a reference value to compare. Now run the stress.sql script from two sessions the same way before, with parameter 1 and 2. Finally, take STATSPACK reports again and see the buffer busy waits statistics. It will be much less. Now examine the V$SEGMENT_STATISTICS view for the table SALES; it should be much less too.

Conclusion

Oracle 9i Release 2 provided one of the best tools a DBA can possibly have, to drill down to the segment level for analysis and diagnosis of wait events, even after the fact that the wait event was experienced. This tool goes a long way in performance related troubleshooting, which was impossible till this time.

Some common wait events like free buffer waits, etc. are not present in the V$SEGSTAT, Hopefully Oracle will provide them in the future releases. This is no doubt an important step in the direction where performance diagnosis becomes a little easier for the DBA community.

For more information here are some links to learn more: Oracle 9i Release 2 Manuals at http://otn.oracle.com/docs/products/oracle9i/doc_library/release2/index.htm. Search on v$segstat or v$segment_statistics for more information.

Arup Nanda is the founder and President of Proligence, a specialized Oracle database services provider in the New York metropolitan area which provides tactical solutions in all aspects of the an Oracle project life cycle. He has been an Oracle DBA for more than nine years has touched almost all types of database performance issues. He specializes in Oracle performance evaluation and high availability solutions.

Contributors : Arup Nanda
Last modified 2005-03-02 11:15 AM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?