Taking Control: DB2 Query Patroller
How do you deal with the conflicting pressures of opening your data warehouse to more users while guaranteeing response times that fall within your service-level agreement (SLA)? How do you comply with new audit and usage rules to reveal who is accessing what, when, and how often? With a charge-back toolkit and a preemptive query governor, DBAs can take control and sleep a little easier at night.
Those charged with building and maintaining a company’s data asset face many challenges. For example, databases today increasingly combine traditional online access with reporting and decision-support infrastructures. With storage now relatively cheap, some companies save data from every click a customer makes on their Web sites. They then fold this data into click-stream analysis engines for real-time cross-sell, up-sell, and substitute-product-sell approaches. This data is then fed into other parts of the data warehouse so subsequent customer visits can be tailored accordingly. While these operations provide tremendous insight for the business, they also put the database’s ability to perform well at risk. After all, the true measure of a database always comes down to the SLA: if the data operation falls within this business rule, all is well; if not, the phone will ring.
In addition, compliance with disclosure regulations such as the Sarbanes-Oxley (SOX) Act of 2002 has forced publicly held companies to keep copies of voicemail, instant messages, e-mail, and much more, and to report on their data’s initial, transient, and existing states at a moment’s notice.
While businesses use data to help grow the top line, they can also use it to control the bottom line and minimize costs through operational analysis and checks-and-balances controls. For example, telecommunications companies trickle feed active data warehouses with information about every call and use this information to perform real-time fraud detection. And, supply chains become more valuable and responsive with data accelerator products, which help businesses manage and maintain customer data and product information.
Furthermore, the nature of today’s IT budget is all about business justification and cost allocation; departments must “pay for what they eat,” so to speak. In the data-warehousing world, the days of “all-you-can-eat” buffets are over. Confronting this reality, and with new data steward roles emerging, businesses are looking for ways to audit usage and charge back the costs of the data warehouse. Quite simply, if marketing uses 40 percent of the data warehouse’s available query power, and analyst reporting consumes the other 60 percent of the resources, the costs of the data warehouse should be allocated accordingly.
For all of these reasons and more, it’s important to prevent reporting queries from monopolizing all the system resources so online operations can complete quickly, but still allow reports to run successfully. It’s important to have a mechanism whereby DBAs can identify who is querying which tables, and when. And it’s important to identify objects that are never used and move them to tertiary storage (or perhaps even drop them completely). IBM’s DB2® Query Patroller can help the DBA perform all of these tasks with minimal expense and effort.
Query reporting on transactional data has long been considered a requirement for competitive advantage in a rigorous marketplace. Innovative data mining, informational, and trend analysis techniques have enabled radical breakthroughs, which have resulted in a pent-up demand for rapid, innovative query capabilities against increasingly large, mission-critical data warehouses and transactional systems. (These systems are also sometimes called operational data stores (ODS), active data warehouses, or trickle-fed data warehouses.)
To understand how the need for a product such as DB2 Query Patroller has become so critical, a recap of reporting history is helpful. When the query reporting trend started in the 1980s, business analysts would typically ask their information systems (IS) department to generate sales, market, and inventory reports. IS staff would generate static reports for the users days, and often weeks, later.
In the late 1980s and early 1990s, query tools such as Microsoft® Excel started to proliferate on the desktops of business employees. Although these extremely popular tools cut the time it took for analysts to get data reports (since analysts could generate reports themselves), the steady increase of users performing these queries introduced a huge challenge for the database systems and their DBAs: as the number of users performing their own queries increased, system response time worsened because of increased contention and poorly written queries.
These trends presented IS teams with a quandary: How do you ensure that the company’s databases provide quick response times across large amounts of data to an increasing number of users tapping ad hoc queries on their desktops?
One possible solution (to which many vendors today feel they must resort) is to throw more hardware into the mix. Large SMP and MPP systems can be expanded to handle increased workloads, and partitioning can help balance the workload by spreading it across several CPUs or even different servers. But the advancement in query tools means that every business analyst in your company can quickly generate a database query without knowing anything about the database or the SQL being generated. This ability is terrific from a business perspective, but not so terrific from a DBA’s perspective. A couple of users submitting complex queries at the same time can bring a large, multi-terabyte database system to its knees. If query submissions are controlled so that they don’t all run at the same time, response times might not be significantly impacted, which could delay, or even eliminate, hardware upgrades and low-level performance tuning. You can always (or almost always) throw hardware at a problem, but that’s expensive, and it raises the question: are you using what you’ve got today in the most efficient manner possible?
DB2 Query Patroller
DB2 Query Patroller is a preemptive query governor that intercepts and evaluates queries before they’re executed to maintain control of system resources and ensure consistent response times. It provides DBAs with real-time monitoring of the database system and lets them perform historical analysis of the queries and database objects to optimize future resource usage. DB2 Query Patroller also provides profile management features for user and group prioritizations, charge-back accounting, user-assisted query management, and so on. Additionally, DB2 Query Patroller supports nicknames, so controls can be applied to other databases such as Oracle, SQL Server, DB2 UDB for z/OS®, and more. (This is made possible through the Information Integrator technology that’s included in every DB2 UDB data server for the IBM data server family, and through the WebSphere Information Integrator product for non-IBM servers.)
DB2 Query Patroller can help address the needs of both users and DBAs. It maximizes system resources by:
- Intercepting runaway queries before they hit the runtime engine, where they can degrade database performance.
- Running short or pre-determined (a.k.a. “canned”) queries that are approved by the DBA team with locked-in, consistent response times. (Long, complex statements can be automatically queued, held, or scheduled to run during off-peak hours.)
- Prioritizing the most urgent queries and important users or groups.
- Preventing users, groups, or departments from monopolizing the corporate data warehouse’s computing resources.
- Preventing overload of the database’s computing resources during peak workload times.
- Reducing the impact of expensive queries by running them at scheduled off-peak times.
DB2 Query Patroller intercepts and evaluates queries and determines if they can be run immediately, or if they should be queued or held — all based on business policies that govern the use of the data warehouse. DB2 Query Patroller also lets DBAs assign different priorities to queries run by different users or groups and can prioritize database resources based on the classification of the query being executed.
DB2 Query Patroller lets administrators set database thresholds for:
- The number of queries that a user can execute concurrently.
- The maximum cost of a query that a user can run.
- The total cost of all concurrently running queries on the system.
- The total number of queries (of any size) that can execute concurrently.
- The total number of queries of a particular size (defined via business policies) that can run concurrently.
Since DB2 Query Patroller tracks queries, it can also be used to:
- Charge users, groups, or departments based on their accumulated usage of the data server resources.
- Analyze queries to find “problem” queries, which can then be sent to the Design Advisor or explained for optimization procedures.
- Analyze the database usage, so that DBAs can prepare for cyclical spikes in database access.
- Understand who is accessing the database, what tables, indexes, and columns they are accessing, how many queries are being run, what are the longest running queries, and so on.
- Identify objects that may be causing performance issues. For example, an index can have a negative effect on INSERT, UPDATE, and DELETE performance. If an index were never used to facilitate query speedup, it would be beneficial to remove the index from the database schema.
Enabling DB2 Query Patroller
To get DB2 Query Patroller going, install the software on top of an existing DB2 UDB installation, enable the dynamic SQL query management database configuration parameter (DYN_QUERY_MGMT) for the database, and start the DB2 Query Patroller engine. DB2 Query Patroller can also manage more than one database on a server, even if they reside in different instances.
DB2 Query Patroller is managed from a central graphical tool called the Query Patroller Center, which lets you manage and configure the DB2 Query Patroller environment (refer to figure 1).
The Query Patroller Center has the same look and feel as the rest of the graphical toolset that comes with the DB2 UDB database product (for example, the Control Center or the Health Center), so there’s really nothing new to learn. You can launch the Query Patroller Center from the DB2 UDB server where the database resides or from a remote workstation.
Figure 1: Query Patroller Center.
From the Query Patroller Center, you can manage all of your business policies, users, groups, and charge-back repositories from a central point of control.
For example, you can:
- Configure the DB2 Query Patroller system.
- Create query classes for exceptional query handling capabilities.
- Create thresholds and limits that serve as business rules for the users and groups that consume the data warehouse resources.
- Monitor queries as they are processed.
- Analyze queries that have been successfully executed.
- Analyze object usage within the database.
Configuring DB2 Query Patroller
Let’s assume that a particular user (or group) needs to run a number of month-end processing reports for consumer vulnerability analysis. This line of business may require the data warehouse to support a higher number of concurrent queries from its department than from users in other departments, because it is deemed more operationally important to the business. If consumer vulnerability queries are more complex than the SQL statements typically issued by other users (for example, they require recursive SQL to introspect a bill of materials for substitute supplier contracts), you could set the maximum query cost for the user or group running these queries to a higher value than for the rest of the users. Figure 2 shows the Submitter Properties dialog box used to set these kinds of policies.
Figure 2: Setting user properties.
You can see in figure 2 that the maximum number of queries is set to 10, meaning this user (or group) can run up to 10 queries concurrently. If the user (or group) attempts to run an eleventh query, that query will be queued until one of the currently running queries completes, and then it will be automatically released. This number may have been chosen to enforce “anti-monopoly” rules, or perhaps because the database begins to be overwhelmed once this user runs more than 10 such queries at a time. Either way, setting the maximum number of queries for this user to 10 is one way to help ensure the database delivers consistent response times that conform to the SLA. You can also limit the size of an individual user’s or group’s queries by setting the maximum cost of a query. In figure 2, this value is set to 1,000,000 “timerons” (a unit of measurement used to estimate the processor and I/O costs of a query). If a query from this user or group exceeds this business rule, that query will be held for subsequent administrative inspection.
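To make the interplay of these two thresholds concrete, here is a minimal Python sketch of the admission decision just described. It is purely illustrative: the function name, structure, and return values are assumptions, not Query Patroller’s actual implementation.

```python
# Hypothetical sketch of the submitter-policy check described above.
# Query Patroller's real implementation is internal to the product;
# the names and structure here are illustrative assumptions.

def admit(query_cost, running_count, max_queries=10, max_cost=1_000_000):
    """Decide what happens to a newly submitted query under the policy."""
    if query_cost > max_cost:
        return "HELD"      # too expensive: held for DBA inspection
    if running_count >= max_queries:
        return "QUEUED"    # at the concurrency limit: released later
    return "RUN"           # within policy: passed straight to the engine

print(admit(500_000, 3))    # within both limits
print(admit(2_000_000, 3))  # exceeds the 1,000,000-timeron cost cap
print(admit(500_000, 10))   # an eleventh concurrent query for this user
```

Note that the cost check comes first: a query that busts the cost cap is held for inspection rather than quietly queued, mirroring the behavior described above.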
Your environment may consist of users or groups that should have unrestricted access to the data warehouse. You can configure DB2 Query Patroller to allow special users and groups to bypass its governing mechanism. DB2 Query Patroller will still intercept the query (a good thing, because it can still report the charge-back information), but once it determines that the user or group that issued the query shouldn’t be managed, it immediately passes the query to the DB2 UDB runtime query engine.
DB2 Query Patroller delivers a sliding scale of granularity when it comes to control. To take a coarser-grained approach, you could choose to set the maximum cost (in timerons) of all concurrent queries on the database (refer to figure 3). In figure 3, 100 concurrent queries can be run on the database, regardless of who is running them, and the total cost of these queries cannot exceed 10,000,000 timerons.
Figure 3: Setting maximum cost of concurrent queries.
Combining the features outlined in figure 2 and figure 3, you can see the various checks and balances that you can implement on your data server.
DBAs can also choose to prevent specific applications from being managed or held by DB2 Query Patroller (perhaps because the application is particularly important or always issues simple, well-written, or “canned” SQL statements). DB2 Query Patroller will still intercept the query, but will pass the query through to the DB2 UDB server after determining that the application that issued the query is one that shouldn’t be managed. For example, in the previous figure you can see that DB2 Query Patroller will ignore the application db2bp.exe (which happens to be the DB2 CLP) for business rule violations, but will still track the objects accessed by this application for historical analysis. A DBA may choose to implement this control knowing that the data warehouse is consumed by a rich array of query tools, and that the CLP is primarily used for performance analysis by the data warehouse support staff.
DB2 Query Patroller also allows for the definition of query classes. This feature alone has led to the resolution of performance and capacity issues in many customer environments. The easiest way to think of query classes is to think about trying to get the right temperature of water for a night-time bath. To make your bath a soothing experience (for DBAs, this means the phone isn’t ringing with irate users calling), you need to have the right combination of hot and cold water (and perhaps a little bubble bath). Too much of either will spoil the delicate balance required to ensure the bath is relaxing.
Query classes in DB2 Query Patroller operate in the same fashion. They allow you to set “flows” of queries into the data warehouse to ensure that response times remain consistent. Perhaps the data warehouse is able to support 1,000 small-sized queries, 100 medium-sized queries, and only 10 large-sized queries. If too many large-sized queries flood the system, then overall performance suffers.
In fact, some DB2 Query Patroller users found solutions to their performance problems just by managing the mix of queries, without concern for their costs. In essence, they found a perfect bath with the right mix of hot and cold water, and the right amount of bubble bath.
How do you know what kind of classes to create? Remember that DB2 Query Patroller comes with the ability to “see” what queries are being run and their associated costs. For example, investigation into your data warehousing activity may reveal that the results logically map to tiers of query classes, which are then implemented as business rules using the Query Patroller Center, as shown in figure 4.
Figure 4: Creating a query class based on historical cost information.
You can see in figure 4 that all queries that cost less than 10,000 timerons are classified as class 1 (small). The data warehouse is set up to handle 1,000 queries at any time that are less than 10,000 timerons. In this case, if the system were loaded with 1,001 queries, with timeron costs all under 10,000 timerons, the 1,001st query would queue up and run when allowed by the business policy, which states that only 1,000 queries of this class can run at any time. Figure 4 also shows another class (class 2) for all queries that cost between 10,000 and 100,000 timerons, limiting the number of class 2 queries to 100, and so on.
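The class tiers in figure 4 boil down to mapping each query’s estimated timeron cost to a class, where each class carries its own concurrency cap. A hedged Python sketch of that mapping follows; the tier boundaries are taken from the example above, while the names and structure are illustrative assumptions.

```python
# Illustrative sketch of query-class routing by estimated timeron cost.
# The tiers mirror the figure 4 example; Query Patroller's internals differ.

# (upper cost bound in timerons, max concurrent queries) per class,
# listed in ascending cost order
QUERY_CLASSES = [
    (10_000, 1000),      # class 1: small queries
    (100_000, 100),      # class 2: medium queries
    (float("inf"), 10),  # class 3: everything larger
]

def classify(estimated_cost):
    """Return the 1-based class number for a query's estimated cost."""
    for class_number, (upper_bound, _cap) in enumerate(QUERY_CLASSES, start=1):
        if estimated_cost < upper_bound:
            return class_number

print(classify(9_500))    # a small query
print(classify(50_000))   # a medium query
print(classify(250_000))  # a large query
```

The concurrency cap shrinks as cost grows, which is exactly the “flow control” idea: many small queries, few expensive ones.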
Of course, DB2 Query Patroller combines and evaluates all of your policies: the number of queries that a class can run, what class a query is directed to, the per user/group consumption rates, and the total system resource utilization rate. In other words, it controls the flow of various sizes of queries, maintains the right mixture of that flow into the system, avoids the monopolizing-user and runaway-query phenomena, and keeps total resource utilization in check.
When a query exceeds any of these defined threshold policies, DB2 Query Patroller prevents the query from executing by taking one of the following actions:
- Holding the query and notifying the user via e-mail.
- Queuing the query to be run at a later time.
In the first option, the held query can be inspected by the DBA, who can then choose to cancel it, let it run, or take other actions. This occurs when the query is too expensive for the threshold set for the user or group.
In the second option, DB2 Query Patroller automatically submits queries from the queue to the DB2 UDB server as soon as other queries are completed in the DB2 UDB engine. This action is taken when the system is too busy to handle the query at the time of submission based on the query class size, maximum number of concurrent queries a user can submit, or the system thresholds on total cost or number of concurrent queries.
If a query is held or scheduled to run at a later time, you can configure DB2 Query Patroller to store the query’s result set in a result table that the user can query once the query completes. This data cache offers two advantages: 1) it can deliver the result set instantly when the user comes back for it, and 2) it can be reused over and over, eliminating the need to rerun the query while still delivering results. This second advantage is especially beneficial for interval reporting summaries such as a quarter-end report, in which static information is provided to multiple groups for analysis.
Figure 5 shows a sample e-mail that DB2 Query Patroller sends a user with details about a query that was queued and subsequently completed. Of course, you can automate the process to deliver the query results right to the user instead of just an e-mail notification.
Figure 5: An example of an e-mail informing a user about a held and subsequently run query.
The Query Patroller Center lets DBAs monitor their data warehouses in real time by viewing the current workload on the production system as well as the status of submitted queries. Although administrators can monitor the progress of all queries active on the system, users can optionally install the DB2 Query Patroller Client component and monitor the progress and results of their own queries. This provides end users with their own query monitoring tools to get detailed information on the progress of large query runs, for example, thereby preempting many avoidable problems. Without this type of feedback, a user who fails to get a query result in the expected time frame might assume the query was lost, and resubmit the same query. If this query was problematic (for example, involving a massive Cartesian join), then this user would now have submitted two problematic queries, thereby driving the data warehouse deeper into the land of performance troubles.
The combined ability to set thresholds and monitor the system helps DBAs gain control of their databases and ensure consistent response times for users.
Historical Analysis / Usage Analysis
The historical analysis features of DB2 Query Patroller give DBAs a better understanding of their database environment and object usage from a business and service-quality point of view. This information can help in capacity planning as the database grows, and also provides a method for charge-back accounting to properly bill specific lines of business for system usage and identify key characteristics of the data warehouse such as peak usage times, schema hits, unused objects, and more.
Examples of information that DB2 Query Patroller can provide include:
- Reports on database and object usage, query execution, typical response times, and potential capacity upgrade requirements. (DBAs can also create their own reports, because this information is stored in DB2 tables and is accessible via SQL.)
- Which users or lines of business are consuming the greatest amounts of database server resource.
- Which database objects are unused, and thus are candidates for removal (tables, indexes, and columns).
- The impact of configuration changes and new database objects.
- The number of queries that were run each day for the last month.
- The 10 longest-running queries.
- The 10 longest-running queries with the lowest cost.
- Which queries caused problems, and other troubleshooting information.
Figure 6 shows an example of historical query information tracked by DB2 Query Patroller.
Figure 6: An example of historical query information tracked by DB2 Query Patroller.
DB2 Query Patroller does not include all of the reports mentioned above, but the information about all of the managed or intercepted queries is stored in DB2 UDB tables, so you can use the power of SQL to perform some of this analysis. For example, to get the number of queries run each day, you could use the following query:
select date(time_completed) as date, count(*) as count
  from db2qp.track_query_info
 group by date(time_completed)
To get the 10 longest-running queries along with the user ID of the person who ran each query and its cost in timerons, sorted by the query cost, you could use the following query:
select id,
       db2qp.convertToString(statement) as statement,
       estimated_cost,
       (  (decimal((julian_day(sql2.time_completed) - julian_day(sql2.time_started)) * 3600 * 24))
        + (decimal((hour(sql2.time_completed) - hour(sql2.time_started)) * 3600))
        + (decimal((minute(sql2.time_completed) - minute(sql2.time_started)) * 60))
        + (decimal((second(sql2.time_completed) - second(sql2.time_started))))
        + ((decimal((microsecond(sql2.time_completed) - microsecond(sql2.time_started)))) / 1000000)
       ) as exec_time_sec,
       substr(user_id, 1, 20) as user_id,
       substr(application, 1, 30) as applications
  from db2qp.track_query_info sql2
 where id in
       (select id
          from db2qp.track_query_info sql2
         order by (  (decimal((julian_day(sql2.time_completed) - julian_day(sql2.time_started)) * 3600 * 24))
                   + (decimal((hour(sql2.time_completed) - hour(sql2.time_started)) * 3600))
                   + (decimal((minute(sql2.time_completed) - minute(sql2.time_started)) * 60))
                   + (decimal((second(sql2.time_completed) - second(sql2.time_started))))
                   + ((decimal((microsecond(sql2.time_completed) - microsecond(sql2.time_started)))) / 1000000)
                  ) desc
         fetch first 10 rows only);
If you are running DB2 Query Patroller on a non-partitioned database, the CPU time will represent the actual CPU usage for the query. However, in a multi-partitioned database (used for many larger data warehouses), the CPU time is only captured for the coordinator partition. Because of this, you cannot charge based on actual CPU time, but you can still bill users and departments based on the percentage of the time they used the warehouse. To determine the percentage of usage for each user, you could use the following query:
select substr(stmt_auth_id, 1, 30) as User_ID,
       sum(rslt.max_exec) as sum_exec
  from (select stmt_auth_id,
               (  (decimal((julian_day(sql2.time_completed) - julian_day(sql2.time_started)) * 3600 * 24))
                + (decimal((hour(sql2.time_completed) - hour(sql2.time_started)) * 3600))
                + (decimal((minute(sql2.time_completed) - minute(sql2.time_started)) * 60))
                + (decimal((second(sql2.time_completed) - second(sql2.time_started))))
                + ((decimal((microsecond(sql2.time_completed) - microsecond(sql2.time_started)))) / 1000000)
               ) as max_exec
          from db2qp.track_query_info sql2
         where sql2.type = 1
           and sql2.time_completed - sql2.time_started > 1
           and sql2.estimated_cost > 1
           and sql2.completion_status = 'D'
         order by sql2.estimated_cost desc, sql2.time_created, sql2.id
       ) as rslt
 group by rollup (stmt_auth_id)
Notice that this query breaks down the timestamp, since you cannot simply subtract one Julian date/timestamp from another. An example of output from this kind of GROUP BY and ROLLUP query might look like this:
USER_ID                        SUM_EXEC
------------------------------ -------------------------
-                              5092.50417300000000000000
DSNOW                          2256.94900000000000000000
PAULZ                          2835.55517300000000000000
In this example, PAULZ would be charged 55.7 percent (2835/5092) of the cost of the warehouse, and DSNOW would be charged 44.3 percent of the cost. The charge-back accounting can apply to individual users like this, or to groups using submitter profiles, or by building a group table and joining that into the query shown previously.
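The split is simply each user’s share of the total accumulated execution time. A quick Python check of the arithmetic, using the figures from the sample output above:

```python
# Recomputing the charge-back split from the ROLLUP output above.
# The grand-total row ("-") is the sum of the per-user rows.
usage = {"DSNOW": 2256.949, "PAULZ": 2835.555173}
total = sum(usage.values())  # matches the 5092.504173 rollup row

# Each user's percentage share of total execution time
shares = {user: round(100 * secs / total, 1) for user, secs in usage.items()}
print(shares)  # {'DSNOW': 44.3, 'PAULZ': 55.7}
```

The same arithmetic scales to groups or departments: sum the per-user figures into the grouping you bill against, then divide by the grand total.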
Database Object Access Monitoring
DB2 Query Patroller also allows you to monitor which objects (tables, columns, indexes, and so on) users are accessing as well as which objects are not being accessed. In addition to analyzing system usage, this helps to ensure that people are not able to see information they should not see. You can get this information quickly and easily using the built-in report in the Query Patroller Center, as shown in figure 7:
Figure 7: Schema hits and no hits report in the Query Patroller Center
You can quickly and easily see that the user PAULZ has only accessed the ORG, SALES, and STAFF tables, as well as a DB2 UDB system table. (Note also the distribution of access patterns across the data these tables hold.) The Query Patroller Center provides a number of “canned” reports that you can use (such as the Indexes Hit and Indexes Not Hit reports shown in figure 7). For example, figure 8 shows a histogram that reveals the distribution of queries across users of the system:
Figure 8: Submitter Hit report in the Query Patroller Center.
Of course, as previously mentioned, you can generate your own reports as well since all of this data is stored in DB2 UDB tables.
Business intelligence and data warehousing can put an end to business as usual. They are all about making better, more informed, faster decisions based on current data. Businesses collect enormous amounts of data every day about orders, inventory, accounts payable, point-of-sale transactions, and, of course, customers.
All this information is usually guarded by DBAs, who worry about user queries disrupting normal database operations. The problem is that users worry about getting the right information out of the data servers to make intelligent decisions. These two opposing factors create a potential battleground of “job doers,” but DB2 Query Patroller can avert such problems.
DB2 Query Patroller helps to proactively govern and throttle queries, ultimately giving control back to the data warehouse itself. DB2 Query Patroller addresses DBAs’ concerns by empowering them to make more informed decisions about query cost analysis, workload management, and load balancing. DB2 Query Patroller helps users not only by informing them of the status of their running queries, but more importantly, by delivering to them consistent response times (the cornerstone of a successful data warehouse), which provides end-user satisfaction with IT services.
When DBAs can make better decisions about the architecture of the enterprise’s strategically vital asset — its data — better information can be delivered more quickly to the right people. This usually results in both top-line (gross revenues) growth and bottom-line (net earnings) improvements. We’re willing to bet that all of us work for companies where those notions resonate.
Paul C. Zikopoulos, BA, MBA, is an award-winning writer and speaker with the IBM Database Competitive Technologies team. He has more than ten years of experience with DB2 products and has written numerous magazine articles and books about them. Paul has co-authored the books DB2 Version 8: The Official Guide, DB2: The Complete Reference, DB2 Fundamentals Certification for Dummies, DB2 for Dummies, and A DBA’s Guide to Databases on Linux®. Paul is a DB2 Certified Advanced Technical Expert (DRDA and Cluster/EEE) and a DB2 Certified Solutions Expert (Business Intelligence and Database Administration). Currently he is writing a book on the Apache Derby/IBM Derby database.
Dwaine R. Snow is a senior product manager for DB2 UDB and focuses on competitive technologies. Dwaine has worked with DB2 UDB for the past 14 years as part of the development team focusing on the database engine and tools, the development of the DB2 Certification program, and as part of the lab-based consulting team. Dwaine has presented at conferences worldwide, contributed to the DB2 tutorial series, and has written a number of articles and coauthored books on DB2 including The Advanced DBA Certification Guide and Reference for DB2 Universal Database for Linux, UNIX®, and Windows, DB2 UDB for Windows, The DB2 Cluster Certification Guide, and the second edition of the DB2 Certification Guide for Linux, UNIX, and Windows. Dwaine is a DB2 Certified Solutions Expert in both Database Administration and Application Development, as well as a Certified Advanced DBA. He has also worked extensively onsite with customers in planning and implementing both transaction-based and data warehouse systems.
IBM, DB2, DB2 Universal Database, WebSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.
Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, and service names may be trademarks or service marks of others.
The opinions, solutions, and advice in this article are from the author’s experiences and are not intended to represent official communication from IBM or an endorsement of any products listed within. Neither the author nor IBM is liable for any of the contents in this article. The accuracy of the information in this article is based on the author’s knowledge at the time of writing.
Contributors: Paul C. Zikopoulos, Dwaine R. Snow
Last modified 2006-03-24 02:42 PM