You Can’t Improve What You Can’t Compare

I have always thought that a graphical display utility that compared performance statistics would be a worthwhile tool. You could compare the statistics captured from two sets of timelines and quickly identify the differences.

10G Grid Control provides a comparison utility that allows administrators to compare two sets of ADDM snapshots. Although it provides a very simple comparison at this time, it is certainly a step in the right direction. This blog will show you how to select the timelines and perform the comparison.

Recording and comparing performance-related information is a pretty common practice in our profession. The performance information we compare ranges the spectrum, from elapsed times to access paths, from I/O performance to hard parse/soft parse ratios. The granularity of performance statistics we compare also wildly differs. We may be comparing a single statement's access path changes in the morning and reviewing an entire job stream's elapsed times in the afternoon.

We record performance information, tune and tweak, run whatever it is we are attempting to tune, record the performance information again and compare the before and after informaton. This iteration of recording/changing/recording/comparing is often done again, and again There are a few third-party tools that provide comparison functionality. Veritas's I3 for Oracle product allows users to to compare performance statistics and access path changes. The performance monitoring features the tool provides are certainly beneficial, but it is the tool's comparison capabilities that I use most. I3 captures access paths and stores them in a repository. Users can determine exactly when the access path change occurred. I3 also tracks object (index creates, index column changes. etc) and statistics changes, both of which influence the Oracle optimizer. But it is a complex tool and you need to spend time learning how to use it before you can maximize the benefits it can provide.

There are also a couple of tools that compare Oracle performance statistics. The first tool I found years ago was Statspack Viewer, which is the brainchild of Alexey B. Danchenkov. The second I recently discovered is a product called Wise. This isn't a plug for either tool because I haven't used them. Just letting you know they are out there. I have used I3 for Oracle and I am a proponent of it.

AWR Snapshot Comparison
10G Grid Control allows two sets of snapshots to be compared to one another. The results of the comparison are displayed graphically using horizontal bars to indicate differences between the two measurements. It is important to note that 10G Grid Control does not compare the hundreds of statistics stored in an AWR snapshot. Oracle has pre-selected key statistics to evaluate which makes the output much easier to review. Since Oracle has selected the statistics for us, it is safe to assume that it views them to be the most critical. The output doesn't provide a lot of information but it is certainly a start. I am hoping that Oracle will continue to build upon this functionality in future releases.

Selecting the Timelines to Compare
Before we learn how to compare snapshot sets, I need to identify two executions of the same workload. I know from experience that our warehouse executes different sets of batch jobs based on the day of the week. I'll use 10G Grid Control's Sessions: Waiting and Working Chart to identify the date and time combinations to compare.

I begin by logging on to 10G Grid Control and navigating to the Database's Home page. My next step is to click on the Performance Tab, which can be found on the top left section of the panel. 10G Grid Control displays the Database Performance Home page. The page displays performance information for the current point-in-time. If you would like to learn more about the information contained on this page, please refer to my blog titled "Database Performance Monitoring Using 10G Enterprise Manager."

Because I am trying to identify workloads that occurred in the past, I need to change the performance page's display from the current point-in-time to a historical one. I do that by clicking on the "View Data" drop down menu on the upper right hand side of the page. The "View Data" drop down allows me to change the screen's refresh time to meet specific monitoring requirements but there is also a selection that switches the display from current to historical mode.

When I choose historical mode from the drop down menu, 10G Grid Control displays the performance page in historical mode. The historical performance panel provides a slider that allows users to select dates in the past. I can drag the slider to the desired date and 10G Grid Control will automatically refresh the screen with the performance data from that time period. I know that the workload is heavy on Tuesdays so I moved the slider to November 22. 10G Grid Control refreshed the screen and displayed the performance statistics from that date.

I scrolled down to the Sessions: Waiting and Working Chart to identify the beginning and ending times of the first workload to use as input to the comparison utility. It looks like the 12 AM to 6 AM timeframe is a good choice. To verify my selection, I went back to the historical performance page and used the slider to display the data from Tuesday, November 29. Hmmm, although the two charts look different, I know that the same job stream was executed on both days. Hopefully, the comparison utility will tell me us what the differences are.

Performing the Snapshot Comparison
To perform the snapshot comparison, I need to navigate to the Automatic Workload Repository Home page. 10G Grid Control displays a set of navigation tabs on the top of each DatabaseTarget's Home page. You click on the administration tab for that database, scroll down and you'll find the AWR Navigation Link at the center of the panel.

Clicking on the link takes me to the Automatic Workload Repository Home page. The number of snapshots at the bottom of the page is also a link. If I click on the number, in this case 363, I am able to view the Snapshot Home Page. The Snapshot Home page displays all of the snapshots currently residing in the repository.

The next step is to select the "Compare Timelines" option from the Action drop down menu to notify 10G to activate the Snapshot Comparison Wizard. 10G Grid Control displays the Compare Timelines: First Timeline Start page. I need to scroll through the various snapshot ID/timestamp combinations until I find the starting time of the snapshot I want. The snapshot ID 5926 shows a timestamp of 12:00:58 AM. Looks pretty close to me. I clicked on the radio button next to Snapshot ID 5926 to select it. When I clicked on "Next" to continue the process, 10G Grid Control displays the Compare Timelines: First Timeline End page that allows me to select the ending time period.

I clicked on the radio button next to Snapshot ID 5932, which has a corresponding timestamp of 6:00:32, to select it as my ending timestamp. When I hit the "Next" navigation button to once again continue the process, 10G Grid Control displays the Compare Timelines: Second Timeline Start page. I duplicated the steps I previously executed to select the beginning and ending snapshots for the 12:00 AM to 6:00 AM timeframes on November 29.

After I selected my ending snapshot for the second timeline, 10G Grid Control displays the Compare Timelines: Review page. This page allows administrators to perform a last minute check before running the actual comparison itself. Clicking on the "Finish" navigation button tells Oracle to begin the comparison. When the comparison is complete, 10G Grid Control displays the Comparison Results page.

Comparison Results Page
The comparison results page tells me that the two workloads are indeed different. Although the screenprint only shows 20 out of 27 comparison points, the ones at the bottom did not provide any earth shattering information so I did not include them. But lets take a look at some of the obvious differences:

Session Logical Reads - BIG difference between the 22nd and the 29th. The 22nd generated a lot more cache reads than the 29th. One possible reason could be that more updates are being done (DB Block Changes and Redo Size on the comparison report). I know from experience that the batch job stream has numerous jobs that run in parallel and they often read data that other jobs are updating. Queries need to retrieve before images from undo segments to maintain read consistency, which can result in higher logical reads.

Another reason could be that we have had some queries on the 22nd using some inefficient indexes. Indexes are often stored in the buffer cache and can get beat up by queries that have low selectivity. I also know that our batch runs do use different SQL selection criteria based on the business's processing requirements for that day. Could be that we may have some inefficient indexes. One way we can find that out is to ask 10G Grid Control to run an ADDM Statspack-Style performance report. Better yet, we can ask Oracle do the grunt work for us by generating a custom ADDM task for that time period. In addition, we can run the SQL Tuning Advisor and the SQL Access Advisor for more information and recomnmendations.
DB Block Changes - Total number of changes made to blocks that were part of a DML statement (Insert, Update, Delete). We did a lot more changes on the 22nd. It is important to note that you can't use the number as a counter for the actual number of changes that occurred. Depending on the operation being performed, Oracle may change the block more than once, incrementing the counter and causing it to be higher than the number of changes actually performed by the DML statements.
Execute Count - Statements (including recursive against the dictionary) that are executed. More statements were executed on the 29th but they did less updates.
Parse Count(Total) - Number of times a SQL statement was parsed both hard and soft. Once again an indicator that we executed more statements on the 29th.
Open Cursors (Cumulative) - Total number of cursors opened during the monitored time period. Another indicator that the number of statements executed were higher on the 29th.
Redo Size - Total amount of redo generated in bytes. Looks like we made more changes on the 22nd than we did the 29th. We can verify that by looking at the numbers in DB Block Changes.

Summary
It seems that we executed more statements on the 29th, but we performed more updates on the 22nd. In addition, it may be informative for us to run some advisors and an ADDM task on both the 22nd and the 29th.

As I stated, I am hoping that this first iteration of the 10G Grid Control snapshot comparison feature is just that - the first iteration. I am a firm believer that Enterprise Manager should have a robust set of comparison utilities that allows us to look at access path and statistics changes historically.

Thanks for reading.

Monday, December 05, 2005 | Permalink | Comments (1)

trackback URL: http://www.dbazine.com/blogs/blog-cf/chrisfoot/blogentry.2005-12-03.3566409131/sbtrackback

Statspack/AWR Performance Analysis

Posted by sarithod at 2006-12-27 04:36 AM

In regards to author comments about UI for statspack analysis

- We use Performance Explorer-I (www.dbainfopower.com) for AWR/Statspack performance root cause analysis and it seem to be much better then anything else we tried before (i3 /OEM)

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

You Can’t Improve What You Can’t Compare

Statspack/AWR Performance Analysis