10G Optimizer Statistics Gathering
Part 1 of this three part series will provide a high-level overview of optimizer statistics. Part 2 will focus on maintenance windows that control when automatic statistics gathering jobs run. In part 3, we’ll learn how to use 10G EM R2 to ensure that the optimizer has the correct statistical information it needs to generate optimal access paths.
So little is known about the inner-workings of the cost based optimizer that is often called the Oracle database's "little black box." The optimizer's job is to analyze the statement being executed and determine the most efficient access path to the data for that statement. Craig Mullins provides an excellent description of a database optimizer in The DB2 Developer's Guide "the optimizer is equivalent to an expert system. An expert system is a standard set of rules when combined with situational data can return an expert opinion."
The cost based optimizer uses statistics generated by the DBMS_STATS procedure as the situational data when creating its expert opinion on which access path to the data is most optimal. These statistics are stored in the data dictionary and describe the data object's space characteristics, data uniqueness and data distribution.
The cost-based optimizer is only as good as the statistics it uses as input. Statistics collections should be run on a regular basis to ensure that the statistics are current (representative of the data being accessed). The optimizer is then able to create a highly accurate access path that is based on the least cost. If statistics are not available, the optimizer uses an algorithm to calculate the statistics, which often leads to "less than optimal" access paths. In other words, Oracle guesses and it is usually not a very good guesser.
You can start your education on the different access paths that are available to the optimizer by reading Oracle's Database Performance Tuning Guide that is provided in each Oracle release's documentation. Before you buy third-party books on any topic, I highly suggest that you read Oracle's documentation first.
CPU and I/O
Performance Statistics
Before we begin our discussion on object statistics, let's spend a few minutes
discussing the system performance statistics that are used as input to the optimizer.
Don't confuse system performance statistics with database performance statistics
that are accessed using the V$ tables. We are discussing the system's CPU and
I/O performance and utilization.
In Oracle 10G, the optimizer takes CPU and I/O costs into consideration during query optimization. If the optimizer understands what the performance characteristics are for I/O and CPU, it will be more effective at query optimization. Current system statistics allow the optimizer to make more intelligent decisions during the optimization process, which ultimately results in higher quality access paths being generated.
Unlike object statistics, administrators must continue to gather system statistics manually in Oracle 10G. This is done by executing the DBMS_STATS.GATHER_SYSTEM_STATS procedure. The key to gathering good system statistics is to run the procedure when a normal workload is being executed. Running it when the workload is light, or too heavy, can result in poor access paths being generated.
If you would like to learn more about how system statistics affect the Oracle optimizer, please refer to Jonathan Lewis's excellent article titled Understanding System Statistics on Oracle Technet. Please pay special attention to Lewis's information pertaining to each release of the database.
Object Statistics
As we learned previously, object statistics describe the contents of the data and how it is stored in
the physcal database objects. Before we continue, it may be beneficial for me
to provide you with a few examples of how these statistics could affect query performance.
If a table only contains a few hundred rows, queries may run faster if the optimizer chooses to read all of the blocks in the table as opposed to using an index. The I/O generated traversing the index blocks to get to the table row entries would be higher than if Oracle read just the blocks allocated to the table being accessed.
What if we access a table using a column with poor cardinality? Cardinality describes the uniqueness of a particular value stored in a column. Poor cardinality means that the column has many values that are the same. If our statement contains a WHERE clause that searches for a column value that is contained in 90% of the table's rows, it is best that Oracle, once again, read each and every row in that table.
Conversely if a WHERE clause searches for a column value that appears in 1% of the table rows, it would be beneficial for the optimizer to choose an index. The optimizer also uses object statistics to determine which join method should be used and the order in which the joined tables are accessed.
Here are just a few of the object statistcs that the optimizer uses as input during access path generation:
- Number of rows in the tables being accessed
- Number of blocks allocated to the table
- Average table row length
- Number of distinct values for a column
- Number of nulls in a column
- Number of leaf blocks allocated to indexes
- Number of leveles in the B-Tree
- Clustering factor - how the order of the index matches the order of the table rows
As I stated previously, the intent of this blog is to not be an all-inclusive primer on the optimizer, but we should now understand why current object statistics are so important to the generation of efficient query access paths.
Gathering Object
Statistics
Oracle provides the DBMS_STATS package to allow administrators to gather, modify,
view, export, import, and delete statistics. The DBMS_STATS package gathers
statistics on columns, tables, indexes and partitions and stores the results
in Oracle's data dictionary. The optimizer reads the object statistics stored
in the data dictionary durinq query optimization.
After DBMS_STATS is executed, current statements in the shared pool that access the newly analyzed tables are invalidated. This forces the statement to be re-parsed which will allow the optimizer to generate access paths based on the new statistics.
There are many different options available in DBMS_STATS. Administrators are able to select which objects to analyze, how much of each object to analyze (large tables may take a loooonng time to analyze), select the number of parallel processes that will perform the analyze, etc. To learn more about all of the options available, please refer to Oracle's Database Performance Tuning Guide and PL/SQL Packages and Types reference manuals that are provided in each Oracle release's documentation.
10G Automatic
Statistics Generation
In releases prior to Oracle10G, administrators were required to schedule DBMS_STATS
jobs on a regular basis to ensure that valid statistics were available to the
optimizer. In releases prior to Oracle9i, it was necessary to "guess" how much of the
data changed to determine if statistics collection was necessary.
In Oracle9i, the GATHER AUTO option of the DBMS_STATS procedure could be used to help determine if statistics generation was required. If more than 10% of the rows changed in the table since the last analyze was performed, the DBMS_STATS procedure (with the GATHER_AUTO option activated) analyzed the table.
By default, Oracle10G automates these tasks by evaluating the statistics for all of the tables in the database and running analyze when required. Oracle10G's default maintenance window is nightly from 10 PM to 6 AM and all day on weekends. During these time periods, statistics are automatically collected using the GATHER_STATS_JOB procedure. The maintenance window time-periods can be adjusted to tailor them to each individual application's business processing requirements.
In Oracle Database 10g, Automatic Optimizer Statistics Collection reduces the likelihood of poorly performing SQL statements due to stale or invalid statistics and enhances SQL execution performance by providing optimal input to the query optimizer.
To ensure that 10G is automatically gathering statistics for data objects, you'll need to verify that:
-
The statistics
job is running by executing the following SQL statement:
SELECT * FROM DBA_SCHEDULER_JOBS WHERE JOB_NAME = 'GATHER_STATS_JOB';
- The modification monitoring feature that identifies stale statistics is enabled. This feature is enabled when the STATISTICS_LEVEL parameter is set to TYPICAL (default) or ALL.
Gathering Statistics
Manually in Oracle10G
Now that I have described how wonderful 10G's automatic statistics gathering
feature is, why would you ever want to schedule a statistics job manually? Let
me describe one real-life example from a recent project here at Giant Eagle.
We previously learned that automatic object statistics gathering runs during the database's maintenance window. We have a warehouse system that contains an extremely volatile table. Hundreds of thousands of rows are added and removed around the clock. The table is loaded and literally hundreds of SQL statements are run to allow our marketing personnel to make critical business decisions.
The level of activity is based on the particular business process taking place. At one particular time, the table can contain hundreds of thousands of rows and at other times it can contain dozens. OK, so when do you generate statistics? If you run DBMS_STATS at the same time each day, in most cases you would think you would get a predictable set of statistics generated. Not so in this table's case, sometimes you get hundreds of rows and some times you get hundreds of thousands.
If we are unlucky and generate statistics on the table when it has hundreds of rows, access paths suffer when the table grows to hundreds of thousands. To prevent poor access paths from being generated (a big problem in a data warehouse), we run the DBMS_STATS utility immediately after the table is loaded. The optimizer has current statistics to use as input and generates optimal access paths for our critical business queries.
10G EM R2 Manage
Optimizer Statistics Panel
A new link titled 'Manage Optimizer' statistics has been added to the database
administration home page in Enterprise Manager. Clicking on the link displays
the new Manage Optimizer Statistics home page. The new home page provides wizards
and templates that facilitate and automate optimizer statistics management.
In 10GR2 administrators are able to use Enterprise Manager to gather, restore,
delete, lock and unlock statistics. We'll learn how to use this panel in my
next blog.
Thanks for reading,
Chris Foot
_____
tags:


