A DB2 Health Check - Part 1
Part 1 | Part 2
In Part 1 of this article, I concentrate on what DB2 health really means: what it is, and what it’s not. After expanding on each of the categories, I’ll discuss strategies for improvement.
In Part 2, I’ll go into more technical detail on specific DB2 health tuning categories and techniques.
This article includes portions of a case study on implementing best practices.
Health Does Not Equal Performance
Most technicians and managers equate health with performance; specifically, a DB2 subsystem is healthy if some performance measure or metric remains high. There is a concurrent emphasis on CPU and DASD capacity planning, with many tools available for tracking and predicting the need for future hardware upgrades.
A typical myth is that high-performing systems are healthy, and that making systems healthier involves getting more bang for your buck. However, this view ignores several other important categories of health. It concentrates on efficiency, while neglecting other areas.
Organizations that simply equate health with performance spend lots of time on performance tuning and reacting to real or perceived performance issues. Organizations like this tend to be dealing with symptoms rather than problems, and their usual reaction is to attempt to increase performance, not health. Typical issues in this environment are:
- Lots of access path tuning
- Reactive, rather than proactive, environment
- Poor change control procedures
- Poor (or no) disaster recovery plans
For example, would you consider the following healthy? A DB2 subsystem in which:
- No Catalog/Directory backups exist
- There is only one copy of the Active Log (single logging)
- You are running DB2 Version 4
- Maintenance is current to May 2002
- The CPU is running at 100 percent capacity all day, every day
In fact, there are several components of health that should be combined into a single overall measure. Further, the DB2 subsystem does not exist in a vacuum; to function, it requires support from IT professionals (DBAs, systems programmers, and so on). And, as a DBA you should perform a number of regular procedures as part of your support infrastructure, these include regular backups, reorgs, consistency checks, and so forth.
A Working Definition of Health
When we think about software health, we think about efficiency, code path length, speed, lack of bugs, and so on. In a broader sense, we also consider how easy new features are to use, whether new functions perform well, and how generally fault tolerant is the system.
Overall health, therefore, is a combination of the following:
- Efficiency — raw performance
- Flexibility — ability to handle and exploit features and functions
- Fault Tolerance — ability to tolerate failures
Further, there are several components in a DB2 subsystem. They include:
So, an overall health measure for a DB2 subsystem consists of a combination of these three characteristics of health across three categories of items. This is easily visualized as a graph:
The myth that health equals performance comes from an over-emphasis on software efficiency. While efficiency is important, there are eight other areas on the graph that deserve our attention. Since software health seems to be well understood, let’s look briefly at the other two areas.
Few consider process health. What does it mean for processes to be efficient, flexible, or fault tolerant? Here are examples of typical processes related to DB2 software:
- Automated backups (image copy, incremental copy, volume backups)
- Automated reorgs
- Regular collection and reporting of performance and trace data
- Regular quality review meetings
- Documentation for contingencies (e.g., what to do if DB2 crashes)
In this category, we consider the people that support a DB2 subsystem: DBAs, systems programmers, and other support personnel. Typical metrics for personnel include:
- Levels of skill and education
- Time management abilities
- Communications skills
- Prioritization skills
- Analytical abilities
- Problem-solving abilities
While it isn’t typical to consider whether staffing is “fault tolerant,” you can still analyze this area in terms of overall system health. For example, there should be more than one person competent in each essential skills area (e.g., database recovery, DB2 system restart, problem diagnosis and repair).
Typical Measurement Methods
There are many techniques for measuring each health category. A representative sample occur in the following chart.
|Efficiency||Tools (e.g., RMF)||QA Metrics||Performance Appraisal|
|Flexibility||Spare Capacity||QA Process||Review Skills Inventory|
|Fault Tolerance||Maintenance Strategy||Best Practices||Practice, Practice, Practice!|
Typical Health Strategies
The most effective method of developing a strategy for maintaining overall DB2 subsystem health is to extend the definitions given above for each health class: efficiency, flexibility, and fault tolerance. Then create an extended definition for each new area, concentrating on examples of good health.
Here is the strategy matrix for our example company:
|Ability to Recover from a Disaster|
|Availability of Extra Capacity|
|Ability of System to be:|
|Maturity (CMM level)|
The company featured in our case study used this template for several comparative analyses, including:
- Health measure
- Measurement tools
- Current documentation
- Budget allocation
The following is an extended outline of some of these levels of system health:
1. Ability to Recover from a Disaster
a. Coordination of logs and image copies
b. Existence of a D/R plan
2. Availability of Extra Capacity
a. Measurement of percent busy
b. CPU-bound versus I/O-bound
c. MB of traffic through DDF
d. Numbers of SQL statements (Query versus Update)
e. Logging activity
f. Memory activity (DBM1 address space)
g. Determination of current bottlenecks
3. Proactive, Predictive, Self-healing
a. Real-time in-memory statistics
b. Bufferpool thresholds
c. Sequential detection
d. Automatic page recovery
e. Dual logging
For more information on autonomic functions, see “Autonomic Technology in DB2 for z/OS” by James Teng, presented at the IBM Data Management Conference, October 2003, Presentation Z17. Also published on the IBM DB2 for z/OS Version 8 site at:
a. Standardized maintenance process (version, currency)
b. Regular maintenance upgrades
c. Software maintenance strategy
i. Business skills – Time management, meetings, communications skills
ii. Problem solving methodology
iii. Education, training, certification
i. Documentation: update, upgrade, centralize, review
ii. Process measurement
iii. Continuous improvement
Additional information and recommendations on software management may be found in the IBM Red Book, “Parallel Sysplex — Software Management for Availability,” SG24-5451.
5. Maturity (CMM Level)
Use of Capability Maturity Model
The Capability Maturity Model™ (CMM) was originally created for the software development process; however, it is relevant to any set of processes used by human beings. We won’t cover it here; more information on CMM may be found at:
DB2 Subsystem Health — Specifics
To measure overall DB2 subsystem health, we must break it down into characteristics of health (efficiency, stability, and so on) and examine the three components of the system: software, processes, and people. I will leave analyzing the health of the “people” and “processes” areas as an exercise for you to perform. Instead, let’s take a look at DB2 software health measurement.
The following are my sample categories. I derived these from the health strategy matrix from our case study; your categories will be different, depending upon your particular priorities. For example, this company in the case study has not implemented data sharing; consequently, there were no areas involving data sharing, global buffer pool, or coupling facility health.
- Subsystem configuration
- Catalog and Directory
- Access paths
- Process objects
For each category, we determined specific items for measurement and tuning; this resulted in the following expanded outline:
1. Subsystem configuration
a. ZParm settings
b. Logs and archives
c. Work files
d. Data sharing
e. Virtual pool sizing
f. Memory pool sizing (EDM, RID, Sort)
g. Processes for regular reporting
h. Processes for automated changes
2. Catalog and Directory
a. Reorg requirements and frequency
b. Standard consistency check (DSNTESQ)
c. User-defined indexes
3. Access paths
a.Regular saving and reporting of IFCID 0022 and 0316 records
b. Regular capture of Explain information
c. Regular (and automated) analysis
4. Data: Volumetric and configurational
a. Volumetrics on large objects
b. Highly volatile pagesets
c. Message traffic
d. Bulk loads and unloads
e. Image copies
f. Other utilities (RunStats)
g. Pageset extents
h. Data compression
i. Index and column statistics
5. Process Objects
a. Stored procedures with non-standard run options
b. User-defined function activity
d. Any process automation
A note on process objects: In my experience, these objects are implemented with little thought given to error checking or reporting. And, since object invocation can be nested (e.g., a stored procedure can invoke a UDF, which then performs SQL that causes triggers to fire), these objects need to be designed to respond/react to errors or warnings from objects they invoke.
Do you have any stored procedures that issue SQL statements? Do these statements ever generate negative SQLCodes? How would you know? Do all applications (including other stored procedures) react properly to these SQLCodes? How do you know?
It is important to implement a process for gathering and interpreting health data. A standard process should include the following items:
- Set up a snapshop process — Develop SQL queries, REXX procs, utilities, third-party tools, etc., to regularly gather and store measurements
- Develop a time-dependent storage mechanism — Store the measurements in a form where you can associate measurement with time and date gathered
- Begin regular data gathering — Implement automated snapshots
- Develop analysis reports — These include time-series analysis such as snapshot-to-snapshot comparisons and time-dependent graphs and charts
- Design an automated review process — Consider ways to use the analysis reports as inputs to jobs that will address issues
For example, you set up a method of snapshotting pageset sizes and extents on DASD. You store the results in a DB2 table including a timestamp with each measurement. Next, as a team you analyze the results and determine that there is an issue with pagesets that go into multiple extents. Finally, you write a REXX procedure that reads the snapshot data, extracts the dataset names of pagesets in more than (say) 25 extents, and creates jobs to Reorg the pagesets.
What to Look For
After a period of time, review the analyses. You are looking for indicators on three levels:
- What is the system status right now? (snapshot)
- How are things changing? (trend analysis)
- What processes need to be put in place? (continuous improvement)
In our example, it isn’t enough just to know which pagesets have many extents. You also want to know how fast they are growing (trends), and if there is anything you need to do to improve the process.
In this article we reviewed what DB2 health really means, and what is important:
- Health is more than just performance
- Consider software, people, processes and develop a health measure for each
- Gather data; store over time; analyze
- Be proactive
- Implement best practices
In Part 2 of this article, we will go into more technical detail on specific DB2 health tuning categories and techniques.
Lockwood Lyon is a DB2 for z/OS Systems and Database Performance specialist. He has over twenty years of experience in Information Technology as an IMS and DB2 database analyst, systems analyst, manager, and consultant. Most recently he's spent quite a lot of time on DB2 subsystem installation and performance tuning. Lockwood is the author of MIS Manager's Appraisal Guide (McGraw-Hill 1993), Migrating to DB2 (John Wiley & Sons 1991), and The IMS/VS Expert's Guide (Van Nostrand 1990).
Contributors : Lockwood Lyon
Last modified 2006-01-04 02:51 PM