Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » DB2 » DB2 Distributed (UDB) Articles Archive » SQL Views Transformed
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 1984
 

SQL Views Transformed

by Peter Gulutzan

"In 1985, Codd published a set of 12 rules to be used as "part of a test to determine whether a product that is claimed to be fully relational is actually so". His Rule No. 6 required that all views that are theoretically updatable also be updatable by the system."

— C. J. Date, Introduction to Database Systems

IBM DB2 v 8.1, Microsoft SQL Server 2000, and Oracle9i all support views (yawn). More interesting is the fact that they support very similar advanced features (extensions to the SQL-99 Standard), in a very similar manner.

Syntax

As a preliminary definition, let’s say that a view is something that you can create with a CREATE VIEW statement, like this:

CREATE VIEW <View name>
[ <view column list> ]
AS <query expression>
[ WITH CHECK OPTION ]

This is a subset of the SQL-99 syntax for a view definition. It’s comforting to know that "The Big Three" DBMSs — DB2, SQL Server, and Oracle — can all handle this syntax without any problem. In this article, I’ll discuss just how these DBMSs "do" views: what surprises exist, what happens internally, and what features The Big Three present, beyond the call of duty.

I’ll start with two Cheerful Little Facts, which I’m sure will surprise most people below the rank of DBA.

Cheerful Little Fact #1:

The CHECK OPTION clause doesn’t work the same way that a CHECK constraint works! Watch this:

CREATE TABLE Table1 (column1 INT)

CREATE VIEW View1 AS
   SELECT column1 FROM Table1 WHERE column1 > 0
   WITH CHECK OPTION

INSERT INTO View1 VALUES (NULL) <-- This fails!

CREATE TABLE Table2 (column1 INT, CHECK (column1 > 0))

INSERT INTO Table2 VALUES (NULL) <-- This succeeds!

The difference, and the reason that the Insert-Into-View statement fails while the Insert-Into-Table statement succeeds, is that a view's CHECK OPTION must be TRUE while a table's CHECK constraint can be either TRUE or UNKNOWN.

Cheerful Little Fact #2:

Dropping the table doesn’t cause dropping of the view! Watch this:

CREATE TABLE Table3 (column1 INT)

CREATE VIEW View3 AS SELECT column1 FROM Table3

DROP TABLE Table3

CREATE TABLE Table3 (column0 CHAR(5), column1 SMALLINT)

INSERT INTO Table3 VALUES ('xxxxx', 1)

SELECT * FROM View3 <-- This succeeds!

This bizarre behavior is exclusive to Oracle8i and Microsoft SQL Server — when you drop a table, the views on the table are still out there, lurking. If you then create a new table with the same name, the view on the old table becomes valid again! Apart from the fact that this is a potential security flaw and a violation of the SQL Standard, it illustrates a vital point: The attributes of view View3 were obviously not fixed in stone at the time the view was created. At first, View3 was a view of the first (INT) column, but by the time the SELECT statement was executed, View3 was a view of the second (SMALLINT) column. This is the proof that views are reparsed and executed when needed, not earlier.

View Merge

What precisely is going on when you use a view? Well, there is a module, usually called the Query Rewriter (QR), which is responsible for, um, rewriting queries. Old QR has many wrinkles — for example, it’s also responsible for changing some subqueries into joins and eliminating redundant conditions. But here we’ll concern ourselves only with what QR does with queries that might contain views.

At CREATE VIEW time, the DBMS makes a view object. The view object contains two things: (a) a column list and (b) the text of the view definition clauses. Each column in the column list has two fields: {column name, base expression}. For example, this statement:

CREATE VIEW View1 AS 
 SELECT column1+1 AS view_column1, column2+2 AS view_column2 
 FROM Table1 
 WHERE column1 = 5

results in a view object that contains this column list:
{'view_column1','(column1+1)'} {'view_column2','(column2+2)'}

The new view object also contains a list of the tables upon which the view directly depends (which is clear from the FROM clause). In this case, the list looks like this:

Table1

When the QR gets a query on the view, it does these steps, in order:

LOOP:

[0] Search within the query’s table references (in a SELECT statement, this is the list of tables after the word FROM). Find the next table reference that refers to a view object instead of a base-table object. If there are none, stop.

[1] In the main query, replace any occurrences of the view name with the name of the table(s) upon which the view directly depends.

Example:

SELECT View1.* FROM View1

becomes

SELECT Table1.* FROM Table1

[2] LOOP: For each column name in the main query, do:

    If (the column name is in the view definition)
     And (the column has not already been replaced in this pass of the outer loop)
     Then:
       Replace the column name with the base expression from the column list

Example:

SELECT view_column1 FROM View1 WHERE view_column2 = 3

becomes

SELECT (column1+1) FROM Table1 WHERE (column2+2) = 3

[3] Append the view’s WHERE clause to the end of the main query.

Example:

SELECT view_column1 FROM View1

becomes

SELECT (column1+1) FROM Table1 WHERE column1 = 5

Detail: If the main query already has a WHERE clause, the view’s WHERE clause becomes an AND sub-clause.

Example:

SELECT view_column1 FROM View1 WHERE view_column1 = 10

becomes

SELECT (column1+1) FROM Table1 WHERE (column1+1) = 10 AND column1 = 5

Detail: If the main query has a later clause (GROUP BY, HAVING, or ORDER BY), the view’s WHERE clause is appended before the later clause, instead of at the end of the main query.

[4] Append the view’s GROUP BY clause to the end of the main query.

Details as in [3].

[5] Append the view’s HAVING clause to the end of the main query.
Details as in [3]

[6] Go back to step [1].

There are two reasons for the loop:

      • The FROM clause may contain more than one table and you may only process for one table at a time.
      • The table used as a replacer might itself be a view. The loop must repeat till there are no more views in the query.

A final detail: Note that the base expression is "(A)" rather than "A." The reason for the extra parentheses is visible in this example:

CREATE VIEW View1 AS 
  SELECT table_column1 + 1 AS view_column1              
  FROM Table1

SELECT view_column1 * 5 FROM View1

When evaluating the SELECT, QR ends up with this query if the extra parentheses are omitted:

SELECT table1_column + 1 * 5 FROM Table1

... which would be wrong, because the * operator has a higher precedence than the + operator. The correct expression is:

SELECT (table1_column + 1) * 5 FROM Table1

And voila. The process above is a completely functional "view merge" procedure, for those who wish to go out and write their own DBMS now. I’ve included all the steps that are sine qua nons.

The Small Problem with View Merge

A sophisticated DBMS performs these additional steps after or during the view merge:

      • Eliminate redundant conditions caused by the replacements.
      • Invoke the optimizer once for each iteration of the loop.

All three of our DBMSs are sophisticated. But here’s an example of a problematic view and query:

CREATE TABLE Table1 (column1 INT PRIMARY KEY, column2 INT

CREATE TABLE Table2 (column1 INT REFERENCES Table1, column2 INT)

CREATE VIEW View1 AS
  SELECT Table1.column1 AS column1, Table2.column2 AS column2
  FROM Table1, Table2
  WHERE Table2.column1 = Table1.column1

SELECT DISTINCT column1 FROM View1 <-- this is slow
SELECT DISTINCT column1 FROM Table2 <-- this is fast

— Source: SQL Performance Tuning, page 209.

The selection from the view will return precisely the same result as the selection from the table, but Trudy Pelzer and I tested the example on seven different DBMSs (for our book SQL Performance Tuning, see the References), and in every case the selection-from-the-table was faster. This indicates that the optimizer isn’t always ready for the inefficient queries that the Query Rewriter can produce.

Ultimately, the small problem is that the "view merge" is a mechanical simpleton that can produce code that humans would immediately see as silly. But the view-merge process itself is so simple that it should be almost instantaneous. (I say "almost" because there are lookups to be done in the system catalog.)

So much for the small problem. Now for the big one.

Temporary Tables

Here’s an example of a view definition:

CREATE VIEW View1 AS 
    SELECT MAX(column1) AS view_column1 
    FROM Table1

Now, apply the rules of view merge to this SELECT statement:

SELECT MAX(view_column1) FROM View1

The view merge result is:

SELECT MAX((MAX(column1)) FROM Table1

... which is illegal. View merge will always fail if the view definition includes MAX, or indeed any of these constructions:

      • GROUP BY, or anything that implies grouping, such as HAVING, AVG, MAX, MIN, SUM, COUNT, or any proprietary aggregate function
      • DISTINCT, or anything that implies distinct, such as UNION, EXCEPT, INTERSECT, or any proprietary set operator

So if a DBMS encounters any of these constructions, it won’t use view merge. Instead it creates a temporary table to resolve the view. This time the method is:

[at the time the view is referenced]

CREATE TEMPORARY TABLE Arbitrary_name
 (view_column1 )

INSERT INTO Arbitrary_name SELECT MAX(column1) FROM Table1

That is, the DBMS has to "materialize" the view by making a temporary table and populating it with the expression results. Then it’s just a matter of replacing the view name with the arbitrary name chosen for the temporary table:

SELECT MAX(view_column1) FROM View1

becomes

SELECT MAX(view_column1) FROM Arbitrary_name

And the result is valid. The user doesn’t actually see the temporary table, but it’s certainly there, and takes up space as long as there is an open cursor for the SELECT.

If a view is materialized, then any data-change (UPDATE, INSERT, or DELETE) statements affect the temporary table, and that is useless — users might want to change Table1, but they don’t want to change Arbitrary_name, they don’t even know it’s there. This is an example of a class of views that is non-updatable. As we’ll see, it’s not the only example.

So ...

      • With view merge alone, it is possible to handle most views.
      • With view merge and temporary tables, it is possible to handle all views.

Permanent Materialized Views

Since the mechanism for materializing views has to be there anyway, an enhancement for efficiency is possible. Namely, why not make the temporary table permanent? In other words, instead of throwing the temporary table out after the SELECT is done, keep it around in case anyone wants to do a similar SELECT later. This enhancement is particularly noticeable for views based on groupings, since groupings take a lot of time.

DB2, Oracle, and SQL Server all have a "Permanent Materialized View" feature, although each vendor uses a different terminology. Here are the terms you are likely to encounter:

Vendor Terms that May Refer to Permanent Materialized Views

DBMS VENDOR TERM
DB2 Automated Summary Table (AST) Materialized Query Table (MQT)
Oracle Materialized View (MV)summarysnapshot
SQL Server Indexed View

The terms are not perfect synonyms because each vendor’s implementation also has some distinguishing features; however, I’d like to emphasize what the three
DBMSs have in common, which happens to be what an advanced DBMS ought to have.

      • First, permanent materialized views are maintainable. Effectively, this means that if you have a permanent materialized view (say, View1) based on table Table1, then any update to Table1 must cause an update to View1. Since View1 is often a grouping of Table1, this is not an easy matter: either the DBMS must figure out what the change is to be as a delta, or it must recompute the entire grouping from scratch. To save some time on this, a DBMS may defer the change until: (a) it’s necessary because someone is doing a select or (b) some arbitrary time interval has gone by. Oracle’s term for the deferral is "refresh interval" and can be set by the user. (Oracle also allows the data to get stale, but let’s concentrate on the stuff that’s less obviously a compromise.)

(By the way, deferrals work only because the DBMS has a "log" of updates, see my earlier DBAzine.com article, Transaction Logs. It’s wonderful how after you make a feature for one purpose, it turns out to be useful for something else.)

      • Second, permanent materialized views can be indexed. This is at least the case with SQL Server, and is probably why Microsoft calls them "indexed views." It is also the case with DB2 and Oracle.
      • Third, permanent materialized views don’t have to be referenced explicitly. For example, if a view definition includes an aggregate function (e.g.: CREATE VIEW View1 AS SELECT MAX(column1) FROM Table1) then the similar query  SELECT MAX(column1) FROM Table1  can just select from the view, even though the SELECT doesn’t ask for the view. A DBMS might sometimes fail to realize that the view is usable, though, so occasionally you’ll have to check what your DBMS’s "explain" facility says. With Oracle you’ll then have to use a hint, as in this example:
SELECT/*+ rewrite(max_salary) */ max(salary)
  FROM Employees WHERE position = 'Programmer'

Permanent materialized views are best for groupings, because for non-grouped calculations (such as one column multiplied by another) you’ll usually find that the DBMS has a feature for "indexing computed columns" (or "indexing generated columns") which is more efficient. Also, there are some restrictions on permanent materialized views (for example, views within views are difficult). But in environments where grouped tables are queried often, permanent materialized views are popular.

UNION ALL Views

In the last few years, The Big Three have worked specifically on enhancing their ability to do UPDATE, DELETE, and INSERT statements on views based on a UNION ALL operator.

Obviously this is good because, as Codd’s Rules (quoted at the start of this article) state: Users should expect that views are like base tables. But why specifically are The Big Three working on UNION ALL?

UNION ALL views are important because they work with range partitioning. That is, with a sophisticated DBMS, you can split one large table into n smaller tables, based on a formula. But what will you do when you want to work on all the tables at once again, treating them as a single table for a query? Use a UNION ALL view:

CREATEVIEW View1 AS 
  SELECT a FROM Partition1 
  UNION ALL 
  SELECT a FROM Partition2

SELECT a FROM View1

UPDATE View1 SET a = 5

DELETE FROM View1 WHERE a = 5

INSERT INTO View1 VALUES (5)

Since View1 brings the partitions together, the SELECT can operate on the conceptual "one big table." And, since the view isn’t using a straight UNION (which would imply a DISTINCT operation), the data-change operations are possible too. But there are some issues:

      • Where should the new INSERT row end up: in Partition1 or Partition2?
      • Where should the changed UPDATE row end up: in Partition1 or Partition2?

The issues arise because a typical partition will be based on some formula, for example: "when a < 5 then use Partition1, when a > 5 use Partition2." So it makes sense for the DBMS to combine UNION ALL view updates with the range partitioning formulas, and position new or changed rows accordingly. Unfortunately, when there are many partitions, this means that each partition’s formula has to be checked to ensure that there is one (and only one) place to put the row.

An old "solution" was to disallow changes, including INSERTs, which affected the partitioning (primary) key. Now each DBMS has a reasonably sophisticated way of dealing with the problem; most notably DB2, which has a patented algorithm that, in theory, should handle the job quite efficiently.

Updatable UNION ALL views are useful for federated data, which (as I tend to think of it) is merely an extension of the range partitioning concept to multiple computers.

Alternatives to Views

Think of the typical hierarchy: person, employee, manager.

Each of these items can easily be handled in individual tables if a UNION ALL view is available when you want to deal with attributes that are held in common by all three tables. But in future it might be better to use subtables and supertables, since subtables and supertables were designed to handle hierarchies. The decision might rest on how well your organization is adjusting to your DBMS’s new Object/Relational features.

You cannot create a view with a definition that contains a parameter, so you might have to make a view for each separate situation:

CREATE VIEW View1 AS 
  SELECT * FROM Table1 
  WHERE column1 = 1 
  WITH CHECK OPTION

CREATE VIEW View2 AS 
  SELECT * FROM Table1 
  WHERE column1 = 2 
  WITH CHECK OPTION

And so on. But in future this too might become obsolete. It is already fairly easy to make stored procedures that handle the job.

If you want to do a materialization but don’t want (or don’t have the authority) to make a new view, you can do the job within one statement. For example, if this is your view:

CREATE VIEW View1 AS 
  SELECT MAX(column1) AS view_column1 
  FROM Table1 
  GROUP BY column2

then instead of this:

SELECT AVG(view_column1) 
  FROM View1

do this:

SELECT AVG(view_column1) 
  FROM (SELECT MAX(column1) AS view_column1 
  FROM Table1 GROUP BY column2) AS View1

In fact, this is so similar to using a view that many people call it a view —”inline view” is the common term — but in standard SQL the correct term for [that thing that looks like a subquery in the FROM clause] is: table reference.

Tips

Over time, users of views have developed various "rules" that might make view use easier. The common ones are:

      • Use default clauses when you create a table, so that views based on the table will more often be updatable.
      • Include the table’s primary key in the view’s select list.
      • Use a naming convention to mark non-updatable columns.
      • Use the same naming convention for view names as you use for base table names. Alternatively, view names should begin with the name of the table upon which the view depends.
      • [DB2] Document the view’s purpose (security, efficiency, complexity hiding, alternate object terminology) in the view’s REMARKS metadata.
      • [SQL Server] Make an ordered view with a construct like this: CREATE VIEW ... SELECT TOP 100 PERCENT WITH TIES ... ORDER BY.”

I would like to end with a recommendation about who has the best implementation of views, but in fact The Big Three are keeping up with each other feature by feature. Besides, I am no longer an unbiased observer.

References

Bello, Randall G., Karl Dias, Alan Downing, James Feenan, Jim Finnerty, William D. Norcott, Harry Sun, Andrew Witkowski, and Mohamed Ziauddin. "Materialized Views In Oracle."

      • Very complete, for Oracle8.

Bobrowski, Steve. "Creating Updatable Views."
http://www.oracle.com/oramag/oracle/01-mar/index.html?o21o8i.html

      • An Oracle Magazine article tip set.

Burleson, Donald. "Dynamically create complex objects with Oracle materialized views."
(Also at http://www.dba-oracle.com/art_9i_mv.htm.)

      • A two-part article on syntax and practical employment.

Gulutzan, Peter and Trudy Pelzer. SQL Performance Tuning. Addison-Wesley 2003

Lewis, Jonathan. "Using in-line view for speed."

      • An idea that COUNT(DISTINCT) in both the SELECT and the GROUP BY can be more efficient with inline views, on an older version of Oracle.

Mullins, Craig. "A View to a Kill."

      • Advice to DBAs.

Rielau, Serge. "INSTEAD OF Triggers: All Views are updatable!"

      • INSTEAD OF triggers are in vogue among all DBMS vendors. This is the DB2 take.

"Migrating Oracle Databases to SQL Server 2000."

      • This article includes a compact description of the differences between Oracle and Microsoft with respect to views.

"US 6,421,658 B1 - Efficient implementation of typed view hierarchies for ORDBMS."

      • An example of an IBM patent relating to views.

"Creating and Optimizing Views in SQL Server."

      • Includes some ideas for using INSTEAD OF triggers

Tip #41: "Restricting query by "ROWNUM" range (Type: SQL)."

      • One of many tip articles about the benefits of ROWNUM for limiting a query after the ORDER BY is over.

--

Peter Gulutzan is the co-author of one thick book about the SQL Standard (SQL-99 Complete, Really) and one thin book about optimization (SQL Performance Tuning). He has written about DB2, Oracle, and SQL Server, emphasizing portability and DBMS internals, in previous dbazine.com articles. Now he has a new job: he works for the "Number Four" DBMS vendor, MySQL AB.


Peter Gulutzan
Last modified 2005-04-18 03:49 PM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone