Oh! I See a Problem

by Jonathan Lewis

There are several assumptions built into the Cost Based Optimizer that can make it generate an unsuitable plan. Oracle Corp. keeps introducing refinements to the optimiser code to work around this problem, and one of the most popular fixes is the parameter optimizer_index_cost_adj. Unfortunately, if you don’t know what it is fixing, and how it fixes it, fiddling with this parameter can cause as many problems as it solves, if not more.

This article is mainly relevant to Oracle 8i, as the feature it describes has been largely (though not completely) superseded in 9i. The examples and discussions are still relevant to 9i, though some minor modifications may be needed.

How Does the CBO Work?

One of the basic building blocks of the Cost Based Optimizer is the formula for calculating the cost of using a B-tree index to access a table. As far as I know, this formula was first published by Wolfgang Breitling [1] at IOUG-A 2002:

cost = 
	blevel +
	ceil(selectivity * leaf_blocks) +
	ceil(selectivity * clustering_factor)

In this formula, the blevel, leaf_blocks, and clustering_factor are the values stored in the user_indexes view for the index under investigation. The selectivity is the fraction of the table that the predicates on that index should return. (There are several variations in the way the formula is used; for example, there are really two possible selectivities, but for simple cases this is a reasonable starting point).

Essentially, this formula tells Oracle the number of block visits (see side bar) that a process will have to make to collect all the data from the table. If the selectivity (the fraction of the table’s data) is X, then it seems reasonable that the number of index leaf blocks to visit will be X times the total number of leaf blocks; and the number of table block visits will then be X times the clustering_factor. Add on the fact that you have to walk from the root to the leaf level just once, and you have the formula.
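The formula is easy to reproduce straight from the data dictionary. The following is just a sketch: the table name T1 anticipates the test case later in this article, and the selectivity has to be supplied by hand (here as the SQL*Plus substitution variable &sel, e.g. define sel = 0.0008):

rem  Sketch only: the index cost formula applied to user_indexes
select
     index_name,
     blevel
        + ceil(&sel * leaf_blocks)
        + ceil(&sel * clustering_factor)     approx_cost
from user_indexes
where table_name = 'T1';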

The other important formula is the cost of a tablescan:

cost = 
	high water mark / (adjusted db_file_multiblock_read_count)
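For example, with db_file_multiblock_read_count set to 8, the optimiser appears to use an internally adjusted value of about 6.59 (an observed figure, not a documented one), so a tablescan of a table with a high water mark of 1,000 blocks would cost roughly:

cost = 1000 / 6.59 ≈ 152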

Note: Bear in mind that the number of block visits is not necessarily the same thing as the number of blocks that need to be visited. I could revisit the same block many times. There are several places where it is necessary to split hairs to this degree to ensure that you understand what Oracle is counting. Consider, for example, the statistic “db block changes.”

There are two critical, but false, assumptions that Oracle makes when trying to decide between an indexed access path and a tablescan (see my article on DBAzine [2] for more details on the relative costs):

      • For an indexed access path, every block visit will be a disk request.
      • A single block disk request has the same cost as a multi-block disk request.

We can see this from the formulae: the cost of the indexed access path counts (single) block visits, the cost of the tablescan counts multi-block read requests — but Oracle compares the costs one-to-one.

These assumptions are likely to cause problems for the optimiser because they are rarely true for normal systems. As a general rule, the assumptions will result in increasing numbers of (unsuitable) tablescans as the db_file_multiblock_read_count is set to higher and higher values. (Since it is common practice to set this parameter to 128 on rule-based systems, it is not too surprising that DBAs attempting to convert to cost based often see poor performance and revert very rapidly to rule based.)
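To put some illustrative numbers on that claim, repeat the tablescan calculation for a 1,000 block table at the two settings (again, the adjusted values are observed rather than documented):

db_file_multiblock_read_count =   8: cost = 1000 / 6.59 ≈ 152
db_file_multiblock_read_count = 128: cost = 1000 / 40.9 ≈  25

The index costings have not changed at all, but the tablescan now looks about six times cheaper, so execution paths that used to favour an index start tipping over to tablescans.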

Enter the Hero

In Oracle 8.1.6, Oracle Corp. introduced a pair of parameters that gave you some chance to modify the optimiser’s calculations — the optimizer_index_cost_adj (OICA), and the optimizer_index_caching (OIC) parameters. The first significant document about these parameters is probably Tim Gorman’s [3], which describes their behaviour, gives a rationale for using them, and a method for deriving realistic values. There is also a well-known note by Garry Robinson [4] that highlights the oddity that can arise in some systems if you adopt an over-simplistic mechanism for trying to work out a value for the OICA.

In this article, I will be addressing only the optimizer_index_cost_adj (OICA), and the side effects of treating it too casually.

The basic idea of the OICA is that you can use it to tell the optimiser that the cost of a single block read is less than the cost of a multi-block read. For example, a value of 25 tells the optimiser that a single block is one quarter (25%) of the price of a multiblock read.

If you set the OICA to a suitable figure (which can, broadly speaking, be balanced against the db_file_multiblock_read_count), then the optimiser will tend to switch from unsuitable paths involving tablescans to more suitable paths involving indexes. Moreover, you will see a better correlation between the cost of a query and the time for that query to complete.
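As a sketch of the arithmetic (the timings are purely illustrative): if your single block reads average 5 ms and your multiblock reads average 20 ms, then

optimizer_index_cost_adj = 100 * (5 / 20) = 25

which tells the optimiser that a single block read costs one quarter of a multiblock read.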

But how can you work out a suitable figure for the parameter?

On a well-written system, it might be possible to compare the multiblock disk read times and single block disk read times (“db file scattered read,” “db file sequential read”) from v$system_event. This could give you some indication of the effective performance of your hardware under a real load.

However, as Garry Robinson [4] shows, a few very badly behaved sessions protected by a file system buffer could give very misleading results at the system level. To work around this problem you might choose to work with the same events, but use the v$session_event view to identify and eliminate the rogue sessions.
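A minimal sketch of the system-level check (assuming timed_statistics is enabled; time_waited is recorded in centiseconds, hence the factor of ten to get milliseconds):

select
     event,
     total_waits,
     round(10 * time_waited / total_waits, 2)     avg_ms
from v$system_event
where event in ('db file sequential read', 'db file scattered read');

The v$session_event version is essentially the same query with a sid column added, which lets you spot, and exclude, the rogue sessions before working out the ratio.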

Another option is simply to measure the hardware performance with the database shut down. Many years ago, I wrote a very simple C program to read and write to a file using the same calls, the same I/O sizes, and the same byte-boundaries as an Oracle executable. More recently I have found a Web site (www.iozone.org) that has source code to do the same (only better) for many different platforms. I haven’t experimented with it yet, but the manual looks very promising.

So What’s the Problem?

Whenever you get more of the truth into the optimiser, it does a more appropriate job of optimising your SQL. It seems that the OICA is a wonderful way to make sure that Oracle has a better understanding of your system. So how can things go wrong? Surely you should be able to set the OICA to a value that gets rid of any tablescans that should not be happening.

Indeed, the idea became so popular that a few months ago it looked as if every performance issue on Metalink’s Server Forum got the response (from some other members of the forum, not Oracle support): “Set the OICA to 10 and the OIC to 90 and all your problems will go away.”

Unfortunately, it’s not that simple.

      • First, there is no “magic number” that is correct for all systems; you have to work out a fairly realistic number for your system.
      • Secondly, the load on your system may vary through the day, and the relative costs of single and multi-block reads may vary significantly depending on the workload; you may have to set up a mechanism to force parameter changes onto sessions during the course of the working day.
      • Thirdly, the approach generally taken for “guesstimating” OICA uses the timing for single and multi-block reads, but doesn’t allow for the fact that a block visit doesn’t necessarily turn into a disk read request (although it is possible to argue that this changes in 9i. There is also the argument that the OIC (optimizer_index_caching) parameter caters for caching to some degree anyway, although it applies only to index blocks when they appear against the inner (second) table in a nested loop join).
      • Finally, most people describe the OICA as the parameter that makes Oracle prefer index range scans to tablescans. That’s not strictly true; that’s only one of the side effects of what it really does, and the other side effect can be a lot nastier.

You may get lucky, and a value for OICA of around 30 is often about right, and reasonably safe, for an OLTP system with the default value for the db_file_multiblock_read_count. But it’s not a guarantee of improved performance.

Test Case

To see what can go wrong, try running the following script on 8.1.7.4, with an 8K block size and a locally managed tablespace with a uniform extent size of 1M. The db_file_multiblock_read_count should be 8, and the OICA and OIC parameters should be left at their default values. (It’s probably not necessary to be so restrictive in defining the set-up, but sometimes a little detail can waste a lot of time — the most significant thing in this case is the value of the db_file_multiblock_read_count.)
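A sketch of the set-up commands follows; the tablespace and datafile names are purely illustrative, and you would need to make this tablespace your default (or add a tablespace clause to the create table statement) for it to take effect:

create tablespace demo_1m
datafile 'demo_1m.dbf' size 100m
extent management local uniform size 1m;

alter session set db_file_multiblock_read_count = 8;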

drop table t1;

create table t1 (
     n1 number(5),
     n2 number(5),
     small_vc varchar2(10),
     padding varchar2(100)
     );

insert into t1 
     select
     trunc((rownum-1)/4), -- use 5 for 9i
     mod(rownum,1250), -- use 1000 for 9i
     lpad(rownum,5),
     rpad('x',100,'x')
     from all_objects
     where rownum <= 5000;

create index t1_good on t1(n1);
create index t1_bad on t1(n2);

execute dbms_stats.gather_table_stats(user,'t1',cascade=>true);

This script generates the test data. You will note particularly that there are two indexed columns: n1 with four rows per value, tightly clustered, and n2 with four rows per value, but widely scattered. I have named the indexes on these columns t1_good and t1_bad to indicate the potential differences in the cost of using them.
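To see the difference as the optimiser sees it, check the statistics after the call to dbms_stats; t1_bad should show a much larger clustering_factor than t1_good (the exact figures will vary slightly with your setup):

select
     index_name, blevel, leaf_blocks, clustering_factor
from user_indexes
where table_name = 'T1'
order by index_name;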

Now enable autotrace and run the following script:
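In SQL*Plus, for example (this assumes a plan_table and the plustrace role are already in place):

set autotrace traceonly explain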

select small_vc
from   t1
where  n1 = 55
and    n2 = 55;

select /*+ index(t1 t1_bad) */
       small_vc
from   t1
where  n1 = 55
and    n2 = 55;

alter session set optimizer_index_cost_adj = 20;

select small_vc
from   t1
where  n1 = 55
and    n2 = 55;

In all three cases, I run the same statement, which could use either index to access the table and acquire the data. In the second case, though, I force Oracle to use the lower quality index, just to show the relative costs. The three execution plans you get should look like this:

Default plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=1 Bytes=14)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=2 Card=1 Bytes=14)
   2    1     INDEX (RANGE SCAN) OF 'T1_GOOD' (NON-UNIQUE) (Cost=1 Card=1)


Hinted plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=1 Bytes=14)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=5 Card=1 Bytes=14)
   2    1     INDEX (RANGE SCAN) OF 'T1_BAD' (NON-UNIQUE) (Cost=1 Card=1)


Default plan with OICA = 20
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=14)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=1 Card=1 Bytes=14)
   2    1     INDEX (RANGE SCAN) OF 'T1_BAD' (NON-UNIQUE) (Cost=1 Card=1)

Note in particular:

      • The default plan is to use the “good” index.
      • Oracle can use the more expensive “bad” index when hinted.
      • With OICA set to 20, Oracle uses the wrong index without being hinted.

What’s gone wrong? The answer is a combination of rounding errors, optimiser internal strategy, and naming conventions.

The default cost of using the good index is two because Oracle expects to visit one index leaf block and one table block to pick up the relevant data. 1 + 1 = 2.

The default cost of using the bad index is five because Oracle expects to visit one index leaf block and four table blocks to pick up the relevant data. 1 + 4 = 5.

The cost of using the good index when OICA = 20 is 0.2 * 2 = 0.4, which gets rounded up to one. (The calculations seem to be slightly different in 9i, and suggest a small, but significant, change in the purpose of OICA.)

The cost of using the bad index when OICA = 20 is 0.2 * 5 = 1.0.

So there’s a tie between t1_good and t1_bad. Because we’ve scaled down the cost of single block reads and the optimiser has rounded our calculations up at a critical point, the two indexes have the same cost of use. So what does the optimiser do for the tie-break? It examines indexes in alphabetical order of name, so t1_bad beats t1_good. (You may recall that in earlier versions of Oracle, the optimiser appeared to use the index with the highest object id; this has changed.)

Bizarre though it may seem, if you are going to play around with OICA, you may find that you need to adopt an index naming convention that makes your indexes appear in order of desirability when you list them alphabetically. And that’s not an easy thing to do, especially when you don’t necessarily use every column in an index for every query. And, of course, you don’t want to start renaming indexes too casually if you’ve been using index hints in your SQL.
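If you want to see the tie-break in action, one simple sketch (in the test schema, of course) is to rename the bad index so that it sorts after the good one, then re-run the unhinted query with OICA still set to 20; the optimiser should now resolve the tie in favour of t1_good:

alter index t1_bad rename to t1_ugly;

select small_vc
from   t1
where  n1 = 55
and    n2 = 55;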

Conclusion

Oracle gives you many features, but often the precise description of the costs and benefits of those features is not available. If you understand what the feature does and how it works, you will probably be able to take advantage of it. If you don’t understand, then you may find that it catches you out most unexpectedly. Be very careful with the optimizer_index_cost_adj parameter; it does much more than simply persuade the optimiser that indexes are nicer than tablescans.

References

      1. Breitling, Wolfgang. “A Look under the Hood of CBO: The 10053 event.”
      2. Lewis, Jonathan. “Why Isn’t Oracle Using my Index?”
      3. Gorman, Tim. “The Search for Intelligent Life in the CBO.”
      4. Robinson, Garry. “The Importance of optimizer_index_cost_adj.”

--

Jonathan Lewis is a freelance consultant with more than 18 years' experience of Oracle. He specialises in physical database design and the strategic use of the Oracle database engine, is author of Practical Oracle 8i - Building Efficient Databases published by Addison-Wesley, and is one of the best-known speakers on the UK Oracle circuit. Further details of his published papers, presentations and seminars and tutorials can be found at http://www.jlcomp.demon.co.uk, which also hosts The Co-operative Oracle Users’ FAQ for the Oracle-related Usenet newsgroups.

