An Interview with Scott Hayes on Optimizing Your DB2 UDB Databases - Part 2

Part 1  |  Part 2

Scott Hayes is a well-known DB2 UDB performance expert. He is a regular speaker at International DB2 User Group conferences and IBM DB2/Data Management conferences, and is frequently sought as a guest speaker for regional DB2 user group meetings. Scott has authored several published articles on DB2 UDB performance, is an IBM DB2 GOLD Consultant (an elite, exclusive group of IBM-recognized top DB2 advocates), holds advanced IBM DB2 certifications, and is widely regarded by the worldwide DB2 community as the top performance expert for DB2 on distributed (UNIX, Linux, and Windows) platforms. Most recently, he co-founded Database-Brothers Incorporated (DBI), a group of database performance and accountability specialists.

Performance Balance

DBAzine: That is a great story, and it highlights several things. There are lots of areas you’re going after when you’re trying to analyze a performance problem, because you really can’t categorically say performance problems all fall into one specific area. In this case, you talked about a 24 x 7 customer-facing Web application. There are all types of applications out there from data warehouses to OLTPs, to batch, to off-the-shelf applications. Are there any fundamental differences in the scope of data that you gather for analysis when you’re trying to go after issues in these different types of applications?

SH: Yes. Just as a point of reference, last summer there was a magazine article that I wrote with a little help from Phil Gunning that was published in DB2 Magazine. The very subject matter of that article was OLTP versus Data Warehouse performance.

There are some common configuration settings that are important to performance in any database. There’s the necessity of providing some infrastructure so DB2 can adequately execute the SQL that it’s presented with (e.g., certain cache sizes, certain other configuration parameters). Probably 20 percent to 30 percent of database performance is attributable to configuration settings and buffer pools.

The balance all comes down to the SQL and the physical design that supports that SQL. Physical design includes indexes, having the right clustering indexes, using automated summary tables effectively, and any other performance tricks (such as generated columns, APPEND ON for heavily inserted tables, and other miscellaneous techniques). In a transaction processing database such as Siebel or WebSphere, or a homegrown Web application, the real-time rapid response requirement is very high, so all the tuning that you do should be aimed at assuring very rapid response times. As a rule of thumb, no single SQL statement, out of the hundreds that might be included in an application, should consume in aggregate more than five percent of the total CPU time across the entire application. No SQL statement should read more than 50 rows, on average, per execution. No statement should have an average elapsed time in excess of five seconds, and most shops should aim faster. These are great objectives that a company should strive to achieve.
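Those rules of thumb can be checked against the dynamic SQL snapshot. The following is a minimal sketch, assuming DB2 V8's SNAPSHOT_DYN_SQL table function and a database named MYDB; the column names follow the V8 snapshot monitor elements, so verify them against your release's documentation:

-- Rank cached statements by CPU consumed; flag heavy readers.
SELECT SUBSTR(STMT_TEXT, 1, 60)              AS STATEMENT,
       NUM_EXECUTIONS,
       ROWS_READ / NULLIF(NUM_EXECUTIONS, 0) AS AVG_ROWS_READ,    -- flag if > 50
       TOTAL_EXEC_TIME                       AS TOTAL_ELAPSED_SEC,
       TOTAL_USR_CPU_TIME                    AS TOTAL_CPU_SEC
FROM TABLE(SNAPSHOT_DYN_SQL('MYDB', -1)) AS T
ORDER BY TOTAL_USR_CPU_TIME DESC;

Comparing each statement's TOTAL_CPU_SEC against five percent of the sum over all statements, and AVG_ROWS_READ against 50, flags the candidates worth tuning first.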

In a Data Warehouse environment, on the other hand, you’re going to have long-running SQL, very intense queries, queries that are going to be I/O intensive. They may perform sorts that are very large. In that environment, proper physical tuning and configuration for I/O parallelism is the key.
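For reference, the main knobs for I/O parallelism in that environment are the DB2_PARALLEL_IO registry variable, the NUM_IOSERVERS database configuration parameter, and each table space's PREFETCHSIZE. A hedged sketch follows; the database name, table space name, and values are illustrative, not recommendations:

-- Enable parallel prefetch across the containers of every table space:
--   db2set DB2_PARALLEL_IO=*
-- Give the prefetchers enough I/O servers to keep the disks busy:
--   db2 update db cfg for MYDB using NUM_IOSERVERS 8
-- Widen the prefetch request for a scan-heavy warehouse table space:
ALTER TABLESPACE DW_DATA PREFETCHSIZE 64;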

Partitioning

DBAzine: Speaking of large data warehouse applications and I/O parallelism, in your experience, are there any specific or general categories of the types of problems or even more common problems that you see when customers are trying to really take advantage of, and exploit, database partitioning for what was or is DB2 EEE?

SH: Yes: DB2 EEE in V6.0 and V7.0, and Enterprise Server Edition (ESE) in V8.0. One way to get into trouble in EEE is when you have multiple nodes (V6.0 and V7.0 terminology) or multiple partitions (V8.0 terminology) of data divided across perhaps several machines. You'll have one partition that's working very hard and is maxed out, its CPU running 100 percent busy, while other partitions aren't carrying their fair share of the load.

I’ve seen some cases in which partitions nine and 11 were 100 percent CPU busy and working very hard, while partitions three and four were bored at only 20 percent busy. There was very little happening with the hardware that housed partitions three and four. So, if your workload is skewed across your partitions, you have to find a way to better balance the workload across the partitions, and that’s a challenge to monitor in an EEE/ESE environment.

What you really need is a tool that can provide a bird’s-eye view summarizing how all of the partitions are performing at a glance. You’d like to see all the partitions ranked from the worst performing and hardest working down to those that aren’t working very hard, so you can focus your time and energy where it matters; well-stated problems are half-solved problems. You want to focus your tuning energy on the partition that’s causing an imbalance of the workload across all the partitions. A chain is only as strong as its weakest link, and a DB2 EEE/ESE database is only as fast as its most heavily loaded partition.

Have we, perhaps, created some tables on these partitions that don’t exist on the other partitions? Has the hashing algorithm, for whatever reason, failed, so that the data for a given table isn’t evenly distributed across all of the partitions? That could happen if the column used for partitioning has skewed distribution values. You want to find out, in a partitioned database environment, which of your partitions are bottlenecks. You really need to take advantage of all the computing hardware resources in that environment.
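Checking for that kind of hash skew is straightforward. Here is a minimal sketch, assuming DB2 V8's DBPARTITIONNUM function (NODENUMBER in V6/V7) and a hypothetical table SALES whose partitioning key is CUST_ID:

-- Count rows per partition; wildly uneven counts mean a skewed
-- partitioning key and, in turn, an unbalanced workload.
SELECT DBPARTITIONNUM(CUST_ID) AS PARTITION_NUM,
       COUNT(*)                AS ROW_COUNT
FROM SALES
GROUP BY DBPARTITIONNUM(CUST_ID)
ORDER BY ROW_COUNT DESC;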

Buffer Pool

DBAzine: You talked earlier, in one of your other stories, about analyzing buffer pool utilization. That may be a topic near and dear to you, in part because of work like your buffer pool tuning guide. But for DBAs who want to diagnose, or just optimize, the way they allocate their buffer pools, are there any key metrics you could direct them to? Any words of advice, or any examples?

SH: Well, first I usually caution people about buffer pools, buffer pool utilization, and monitoring performance there, because you can create a large buffer pool, observe very high hit ratios, and be misled into thinking you’ve got a great buffer pool and memory configuration for the database. Unfortunately, large buffer pools and good buffer pool metrics can be a Band-Aid. They can cover up, or mask, very severe underlying problems.

In classes I’ve taught, I’ve suggested that as you get ready to put a new application into production, you make sure it’s production ready: put it in the QA environment with a very small buffer pool. The default size when you create a database in DB2 is 3.96 megabytes. I suggest that DBAs put the entire application into a two-megabyte buffer pool (512 4K pages), about half the IBM-supplied default size. If you can achieve good performance with a two-megabyte buffer pool, you’re going to have fabulous performance in a production environment where you can have a one-gigabyte or larger buffer pool, and that performance will scale very well when you bring in new customers and new groups or organizations around the country or around the world.
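In DDL terms, that QA test is just a small pool plus a reassignment. A sketch, assuming DB2 V8 syntax; the pool name QA_BP is illustrative, and USERSPACE1 stands in for the application's table spaces:

-- 512 pages x 4K = 2 MB, roughly half the default allocation.
CREATE BUFFERPOOL QA_BP IMMEDIATE SIZE 512 PAGESIZE 4K;
-- Point the application's table space(s) at the small pool for the test.
ALTER TABLESPACE USERSPACE1 BUFFERPOOL QA_BP;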

Large buffer pools will mask underlying problems and get people in a lot of trouble. Once you’ve worked the great majority of SQL performance problems out of a database or an application, there’s some tuning that can be done to the buffer pools. It’s important to measure the performance characteristics of the pools and also the table spaces. People get excited about the database’s overall buffer pool hit ratio, and they compute hit ratios for the buffer pools, but this isn’t enough.

You also need to compute hit ratios and other performance metrics at the table space level to understand what hit ratios are for each table space. Is any given table space being read sequentially with asynchronous I/O, or is it being read with synchronous random I/O? You should assign table spaces that are being read randomly in an OLTP transactional database to a buffer pool with the objective of performing random I/O. Table spaces that are being pre-fetched with heavy asynchronous sequential I/O should be placed in buffer pools with the objective of performing sequential I/O. That’s probably the most important objective for buffer pool tuning — to separate your sequential I/O from your random I/O.
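The snapshot monitor supplies the raw counters for this ("db2 get snapshot for tablespaces on MYDB" at the command line, assuming a database named MYDB), and the separation itself is plain DDL. A sketch with illustrative pool names, table space names, and sizes:

-- Hit ratio per table space, from the snapshot counters:
--   (1 - (data physical reads + index physical reads)
--      / (data logical reads + index logical reads)) * 100
-- A high share of asynchronous (prefetch) reads marks a sequential
-- table space; mostly synchronous reads mark a random one.
CREATE BUFFERPOOL RANDOM_BP IMMEDIATE SIZE 50000 PAGESIZE 4K;
CREATE BUFFERPOOL SEQ_BP    IMMEDIATE SIZE 20000 PAGESIZE 4K;
ALTER TABLESPACE OLTP_DATA BUFFERPOOL RANDOM_BP;
ALTER TABLESPACE HIST_DATA BUFFERPOOL SEQ_BP;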

There’s also good reference material on this. An article published a couple of years ago in the IDUG Solutions Journal (Spring 2000 edition, www.idug.org/journal) covers the nitty-gritty details of buffer pool tuning and which performance metrics should be monitored.

