Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » DB2 » DB2 Distributed (UDB) Articles Archive » So You Want an IOT
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 1984
 

So You Want an IOT

by Peter Gulutzan and Trudy Pelzer

An Index-Organized Table (IOT) is not a Heap-Organized Table (HOT). You must treat them differently if you do many bulk loads, range scans, data changes, or lookups.

The Example Table

Here’s a definition of an innocent-looking HOT:

CREATE TABLE Table1 (
  column1 CHAR(50),
  column2 VARCHAR2(50),
  column3 NUMBER NOT NULL,
  CONSTRAINT pk_t1_index PRIMARY KEY (column1)) ...

CREATE UNIQUE INDEX uk_t1_index ON Table1 (column3) ...

CREATE INDEX t1_index ON Table1 (column2, column1) ...

Presume you have good reasons for converting this HOT to an IOT:

      • Because the IOT theory makes sense to you. For the primary key column1, instead of index keys pointing to rows, you’ll have keys plus rows in the same physical place. So queries like: SELECT ... WHERE column1 'x' should require one less I/O. In addition, the column1 values need to be stored only once, which saves disk space and makes data allocation or maintenance easier.
      • Because you’ve heard that IOTs work in practice. Oracle cited examples of successful IOT conversions at Lycos and Eli Lilly, with Oracle8i. The new features of Oracle9i make a wider range of applications possible.
      • Because you know that IOTs might be faster. Other people’s tests show IOTs running 15 percent to 400 percent faster than HOTs.

So you convert the HOT to an IOT by plunking an ORGANIZATION INDEX clause at the end of the CREATE TABLE statement. What could go wrong?

What If: There Are Many Bulk Loads

Oracle starts a bulk load by sorting the index keys for the new rows. The sort speed depends on the defined size of the key, because Oracle wants to load and interchange keys in memory. So a HOT sort, with a primary-key index key size equal to LENGTH(column1), is going to beat an IOT sort, which by definition has a key size equal to LENGTH(column1) + LENGTH(column2) + LENGTH(column3) + LENGTH(column4). Expect the IOT sort to take 1.5 times longer.

There are three ways to make the IOT sort faster:

      • Specify a big in-memory sort area with the SORT_AREA_SIZE parameter.
      • Pre-sort the rows. You only have to presort by column1, which (again by definition) is unique.
      • Redefine column2. Perhaps someone defined it as VARCHAR2(50) when only 25 characters are really needed. Allowing extra space in a VARCHAR column is usually harmless — after all, the definition doesn’t affect the storage — but it does affect sorts.

That last point illustrates a theme you’ll see more than once: Column definitions that are harmless in HOTs might be harmful in IOTs.

What If: There Are Many Range Scans

Here’s a striking result from another IOT-versus-HOT comparison test. For the query:

SELECT ... WHERE <primary key> = 'X'

the IOT was 15 percent faster ... but for:

SELECT ... WHERE <primary key> BETWEEN 'X' AND 'Y' 

the IOT was 300 percent faster. The query type that benefits most from IOT conversions is not an “equals” search, it’s a range scan.

Here is something that might help explain why this is so. Each Oracle IOT block contains a pointer to the previous block and a pointer to the next block. So it’s easy to skip from one block to the next during a multi-block scan. Another boost would come from an ORDER BY column1 clause, because rows in the IOT are already in order by column1.

Although the Oracle manual says IOTs are a “unique” Oracle feature, the fact is that several competitors, including Microsoft, have had IOTs longer than Oracle. So listen to their advice, which I’ve translated to Oracle vocabulary: Base your IOT key on the columns you range-scan with, even if they’re different from the columns of the primary key.

It’s hard to apply this advice to an Oracle IOT — Oracle insists you use primary keys. But notice that our example table has two unique columns — column1 and column3. If your scans use column3 a lot, maybe the definition should have been PRIMARY KEY (column3).

Microsoft even allows non-unique IOT keys. SQL Server adds an arbitrary value, called an uniquifier, to help it distinguish between duplicates. If Oracle in a future version adds uniquifiers and allows non-primary columns for an IOT basis, it will catch up with Microsoft and extend IOT usefulness.

What If: There Are Many Secondary-index Lookups

If you create an index on column3 of a HOT table, the index keys will have two parts: the column value and the Row Identifier (ROWID). ROWIDs have segment, block, and row-within-block numbers — they’re physical pointers. And they’re fine for pointing to rows in heaps, but useless for pointing to rows in B-trees. Any INSERT might cause existing rows to move within a block, any split might cause existing rows to move to a new block. Therefore a ROWID is too volatile to use for an index on an IOT. It would be like knowing the IP address of a Web site that’s assigned dynamically.

Oracle’s solution is like using a domain name rather than an IP address. The keys of an index on an IOT consist of three main parts: the column value, the guess, and the primary-key value.

Suppose you populate your IOT:

INSERT INTO Table1 VALUES ('1', '2', 3)

The result is a row in Table1 (the IOT), shown in figure 1, and a key in uk_t1_index (the index on column3), shown in figure 2.

--------------
- ...         -
- 1,2,3       -
- ...         -
--------------

 

       a picture of File 1, Block 1

Figure 1: The IOT.

 

----------------
- ...          -
- 3            -
- 1,1          -
- 1            -
----------------


<-- the column value of column3
<-- the guess
<-- the primary-key value

Figure 2: The index to the IOT.

The primary-key value is unique and it’s easy to find the row given the primary key, so it’s a reliable — but slow — pointer. Not only must Oracle traverse the B-tree of the secondary index to find the secondary-index key, it must also traverse the B-tree of the IOT to find the row value. This leads to The IOT Tradeoff Law:

In an IOT, compared to a HOT, access by primary key is faster because there is no need to read the data row after finding the key — the data row is where the key is. On the other hand, access by secondary key is slower because the pointer to the data row is not a direct address, and so requires an additional lookup step.

Oracle tries to evade the law with a unique feature: the guess. The guess is a direct pointer to the block number that the row “might” be in. If the guess is right, then lookup-by-primary-key time is saved — but if the guess is wrong, there’s a wasted I/O.

So, what if there are many secondary-key lookups, like:

SELECT ... FROM Table1 WHERE column3 = 10

If the guesses are right, then a HOT is faster than an IOT (because guesses aren’t exact) but the difference is trivial. Once splits have occurred, some guesses will be wrong, so some tests show HOTs to be around 15 percent faster than IOTs.

What about the secondary index made with:

CREATE INDEX index_t1 ON (column2, column1)

The secondary index looks suspiciously like a covering index. The idea of a covering index is, if all columns in the select list are also in the index that Oracle uses for lookup, then Oracle can find the values in the index key — so it won’t need to read the record itself.

Therefore, index_t1 makes sense for queries like:

SELECT column1 FROM Table1 WHERE column2='x' 

in a HOT. But in an IOT, column1’s values are automatically part of every secondary index anyway, because column1 is the primary key. So:

CREATE INDEX index_t1 ON (column2) 

— with no reference to column1 — is all you need.

What If: There Are Many Data Changes

If your situation requires many

UPDATE Table1 SET column2 = ... 

statements, try to prevent rows from growing and causing splits. Consider again what an index key looks like in a secondary index to an I/O:

Secondary Key Value
The Guess Field
Primary Key Value

The Guess Field has a pointer to the block in the IOT where the record should be. Oracle will read that block. If the guess is wrong, as it will be if a split caused the row to move to a new block, then the read is a wasted I/O. After wrong guesses, Oracle must use the Primary Key Value to look up the IOT record by traversing the IOT B-tree. This is another effect that doesn’t exist with HOTs: Splits cause bad guesses, which degrade performance. Therefore IOT rows should more often be fixed (redefine column2 as CHAR), IOT blocks should have more space (add a PCTFREE clause), and IOT rebuilds should be more frequent.

If your situation requires many

DELETE FROM Table1 ... 

statements, then rebuilds will have to be even more frequent. DELETEs won’t free blocks in a B-tree (blocks must be completely empty before Oracle returns them to the free list). The same thing is true when you’re deleting keys from the index of a HOT, but in a HOT the keys are smaller, so the effect is smaller.

If your situation requires many

INSERT INTO Table1 ... 

statements, then primary-key values that go up serially (1001, 1002, 1003, ...) become a problem. As well as risking hot spots, serial primary keys cause a skew because Oracle is always adding rows at the end of the IOT and causing splits. This effect happens with HOTs too, but again the effect is smaller with HOTs because the keys are smaller.

In general, tests comparing IOTs to HOTs have shown that:

      • INSERT and UPDATE performance is about the same.
      • DELETEs are a bit faster with IOTs (because in an IOT, only one block needs changing while in a HOT two blocks need changing).
      • Mixes of INSERTs, UPDATEs, and DELETEs cause quicker file growth.

The Oracle manual calls IOTs the “best table organization for 24x7 operations” because after rebuilds (ALTER TABLE MOVE statements), secondary indexes remain usable.

However, the preceding observations about data changes show that: (a) rebuilds might be easier, but they’re also more frequent and (b) an index can be usable, yet all its guess fields can be wrong.

What If: There Are Many Primary-key Lookups

If most statements have the form

SELECT ... FROM Table1 WHERE column1='x' 

then — remembering that column1 is the primary key — add a PCTTHRESHOLD clause. The reasoning here is:

      • Table1’s defined row size is 100 bytes. An Oracle paper suggests that when row size passes 100 bytes it’s time to consider PCTTHRESHOLD.
      • If there’s a PCTTHRESHOLD clause, there will be more index entries per block, and that might mean fewer levels in the B-tree.

The effect of a PCTTHRESHOLD clause is that rows will have two parts — the early columns will be in the main B-tree record and the late columns will be in an overflow area. This means that if you select a late column, an additional I/O is necessary (Oracle will follow an overflow pointer in the main record and retrieve a block from the overflow area). And that means you should change the table definition so that the most-frequently-accessed columns come first! Column order matters little in a HOT, but with an IOT, if the least-used column is column2, then the order of columns should be column1, column3, column2. This is especially true since column2 is wider than column3, and there’s a general rule that longish columns should be last. Once you’ve changed your table definition, use PCTTHRESHOLD to cut late columns out.

Now here’s a paradox: An overflow area is really just a table, as one can see by querying the system catalog. Table 1 compares the structure of a HOT that has one primary-key index, against the structure of an IOT that has all the non-primary-key columns in an overflow area.

HOT structure IOT structure
heap table with rows that contain primary-key columns and non-primary-key columns overflow table with rows that contain non-primary-key columns
B-tree index with keys that contain B-tree with rows that contain
primary-key values and primary-key values and
ROWID pointer to rows in the heap table ROWID pointers to rows in the overflow table

Table 1: HOT structure versus IOT structure.

The structures look the same, don’t they? The query

SELECT column1 FROM Table1 WHERE column1 = 'x' 

would travel at the same speed with both structures, because in both cases, the query can be satisfied by looking at the B-tree alone.

The IOT will still be faster for range scans provided you execute

ALTER TABLE Table1 MOVE OVERFLOW 

to cluster the overflow area according to primary-key values. But in most other situations, an IOT with a low PCTTHRESHOLD clause is no better than a HOT.

What If: There Are Many Duplicates

Since IOTs are B-trees, compression is possible. For example, if the same column value occurs five times in a row, Oracle stores the value only once. But compression works only for the initial columns in the row, and the initial column in this case (column1) is the primary key. By definition a primary key is unique, so compression is useless, right? Well, right in this case, but let’s look hard at the column definition again.

Suppose column1’s values aren’t truly atomic. If, in fact, the column value contains two parts and the first part is usually the same, then (a) split the column in two, (b) redefine the table with a PRIMARY KEY(column1a, column2a) clause, and (c) add a COMPRESSION 1 clause. The IOT will get smaller. Probably the result won’t be better access time, but this illustrates again that a column definition that was fine for a HOT might need rethinking for an IOT.

The downside of compression is that there’s more disruption when you update the primary key, so this is not something you do with a highly volatile IOT.

References

Basic IOT articles that you can pick up online:

Papers that report on IOT performance tests with large tables:

A book that devotes a full chapter to IOTs:

Lewis, Jonathan. Practical Oracle8i: Building Efficient Databases, Addison-Wesley, 2000.

--

Peter Gulutzan is the co-author of one thick book about the SQL Standard (SQL-99 Complete, Really) and one thin book about optimization (SQL Performance Tuning). He has written about DB2, Oracle, and SQL Server, emphasizing portability and DBMS internals, in previous dbazine.com articles. Now he has a new job: he works for the “Number Four” DBMS vendor, MySQL AB.


Contributors : Peter Gulutzan, Trudy Pelzer
Last modified 2005-04-12 06:21 AM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone