Assuring Data Integrity in DB2 - Part 2

by Craig S. Mullins

Part 1 | Part 2

In part 1 of this article, you learned some of the mechanisms that DB2 provides to automatically enforce and maintain the integrity of data as it is added to, and modified within, DB2 tables. Referential integrity, including RI referential constraints, referential sets, and referential integrity guidelines were discussed. Part 2 of this article discusses more of these mechanisms, highlighting check constraints and their guidelines, code constraints, and more.

Check Constraints

Check constraints can be used to place specific data value restrictions on the contents of a column through the specification of an expression. The expression is explicitly defined in the table DDL and is formulated in much the same way that SQL WHERE clauses are formulated. Any attempt to modify the column data (e.g., during INSERT or UPDATE processing) will cause the expression to be evaluated. If the modification conforms to the Boolean expression, the modification is permitted to continue. If not, the statement will fail with a constraint violation.

Check constraints consist of two components: a constraint name and a check condition. The same constraint name cannot be specified more than once for the same table. If a constraint name is not explicitly coded, DB2 will automatically create a unique name for the constraint derived from the name of the first column in the check condition.

The check condition defines the actual constraint logic. The check condition can be defined using any of the basic predicates (>, <, =, <>, <=, >=), as well as BETWEEN, IN, LIKE, and NULL. Furthermore, AND and OR can be used to string conditions together. However, please note the following restrictions:

The constraint can only refer to columns in the table in which it is created. Other tables cannot be referenced in the constraint.
Subselects, column functions, host variables, parameter markers, special registers and columns defined with field procedures cannot be specified in a check constraint.
The NOT logical operator cannot be used.
The first operand must be the name of a column contained in the table. The second operand must be either another column name or a constant.
If the second operand is a constant, it must be compatible with the data type of the first operand. If the second operand is a column, it must be the same data type as the first column specified.

The EMP table contains the following check constraint:

    PHONENO  CHAR(4) CONSTRAINT NUMBER CHECK
         (PHONENO >= '0000' AND
          PHONENO <= '9999'),

This constraint defines the valid range of values for the PHONENO column. The following are examples of check constraints which could be added to the EMP table:

    CONSTRAINT CHECK_SALARY
    CHECK (SALARY < 50000.00)

    CONSTRAINT COMM_VS_SALARY
    CHECK (SALARY > COMM)

    CONSTRAINT COMM_BONUS
    CHECK (COMM > 0 OR BONUS > 0)

The first check constraint ensures that no employee can earn a salary greater than $50,000; the second constraint ensures that an employee’s salary will always be greater than his or her commission; and the third constraint ensure that each employee will have either a commission or a bonus set up.

The primary benefit of check constraints is the ability to enforce business rules directly in each database without requiring additional application logic. Once defined, the business rule is physically implemented and cannot be bypassed. Check constraints also provide the following benefits:

Because there is no additional programming required, DBAs can implement check constraints without involving the application programming staff. However, the application programming staff should be consulted on check constraints because they may have more knowledge of the data. Additionally, the application programming staff must be informed when check constraints are implemented to avoid duplication of effort in the programs being developed.
Check constraints provide better data integrity because a check constraint is always executed whenever the data is modified. Without a check constraint critical business rules could be bypassed during ad hoc data modification.
Check constraints promote consistency. Because they are implemented once, in the table DDL, each constraint is always enforced. Constraints written in application logic, on the other hand, must be executed by each program that modifies the data to which the constraint applies. This can cause code duplication and inconsistent maintenance resulting in inaccurate business rule support.
Typically check constraints coded in DDL will outperform the corresponding application code.

Check Constraint Guidelines

When using check constraints the following tips and techniques can be helpful to assure effective constraint implementation.

Beware of Semantics with Check Constraints

DB2 performs no semantic checking on constraints and defaults. It will allow the DBA to define defaults that contradict check constraints. Furthermore, DB2 will allow the DBA to define check constraints that contradict one another. Care must be taken to avoid creating this type of problem. The following are examples of contradictory constraints:

    CHECK (EMPNO > 10 AND EMPNO <9)

In this case, no value is both greater than 10 and less than 9, so nothing could ever be inserted. However, DB2 will allow this constraint to be defined.

    EMP_TYPE    CHAR(8) DEFAULT 'NEW'
    CHECK (EMP_TYPE IN ('TEMP', 'FULLTIME', 'CONTRACT'))

In this case, the default value is not one of the permitted EMP_TYPE values according to the defined constraint. No defaults would ever be inserted.

    CHECK (EMPNO > 10)
    CHECK (EMPNO >= 11)

In this case, the constraints are redundant. No logical harm is done, but both constraints will be checked, thereby impacting the performance of applications that modify the table in which the constraints exist.

Other potential semantic problems could occur:

the parent table indicates ON DELETE SET NULL but a rule is defined on the child table stating CHECK (COL1 IS NOT NULL),
When two constraints are defined on the same column with contradictory conditions
When the constraint requires that the column be NULL, but the column is defined as NOT NULL

Code Constraints at the Table-Level

Although single constraints (primary keys, unique keys, foreign keys, and check constraints) can be specified at the column-level, avoid doing so. In terms of functionality, there is no difference between an integrity constraint defined at the table-level and the same constraint defined at the column-level. All constraints can be coded at the table-level; only single column constraints can be coded at the column-level. By coding all constraints at the table-level maintenance will be easier and clarity will be improved.

Code this (table-level):

CREATE TABLE ORDER_ITEM
 (ORDERNO     CHAR(3)        NOT NULL,
  ITEMNO     CHAR(3)        NOT NULL,
  AMOUNT_ORD DECIMAL(7,2)    NOT NULL,
  PRIMARY KEY (ORDERNO, ITEMNO)
  FOREIGN KEY ORD_ITM (ORDERNO)
   REFERENCES ORDER ON DELETE CASCADE
)

Instead of this (column-level):

CREATE TABLE ORDER_ITEM
 (ORDERNO     CHAR(3)    NOT NULL
   REFERENCES ORDER ON DELETE CASCADE,
  ITEMNO      CHAR(3)   NOT NULL,
  AMOUNT_ORD      DECIMAL(7,2)   NOT NULL,
  PRIMARY KEY (ORDERNO, ITEMNO)
)

Favor Check Constraints Over Triggers

If the same data integrity rule can be achieved using a check constraint or a trigger, favor using the check constraint. Check constraints are easier to maintain and are generally more efficient than triggers.

Using DB2 Triggers for Data Integrity

DB2 triggers can be useful for enforcing complex integrity rules, maintaining redundant data across multiple tables, and ensuring proper data derivation. There are many considerations that must be addressed to properly implement triggers. For additional information on DB2 triggers, check out “An Introduction to Trigger for DB2 OS/390” at http://www.dbazine.com/db2/db2-mfarticles/mullins-triggers.

Using Field, Edit, and Validation Procs for Data Integrity Field procedures are programs that transform data on insertion and convert the data to its original format on subsequent retrieval. You can use a FIELDPROC to transform character columns, as long as the columns are 254 bytes or less in length.

No FIELDPROCs are delivered with DB2, so they must be developed by the DB2 user. They are ideal for altering the sort sequence of values.

An EDITPROC is functionally equivalent to a FIELDPROC, but it acts on an entire row instead of a column. Edit procedures are simply programs that transform data on insertion and convert the data to its original format on subsequent retrieval. Edit procedures are not supplied with DB2, so they must be developed by the user of DB2. They are ideal for implementing data compression routines.

A VALIDPROC receives a row and returns a value indicating whether LOAD, INSERT, UPDATE, or DELETE processing should proceed. A validation procedure is similar to an edit procedure but it cannot perform data transformation; it simply assesses the validity of the data.

A typical use for a VALIDPROC is to ensure valid domain values. For example, to enforce a Boolean domain, you could write a validation procedure to ensure that a certain portion of a row contains only T or F. In this way it is similar to a check constraint, but a VALIDPROC can apply to an entire row, as opposed to a single column.

Summary

DB2 offers a plethora of options for ensuring data integrity. Be sure to take care to use the appropriate options as you design your DB2 databases. Failing to build data integrity constraints into your database design almost certainly will result in invalid data in your tables.

Craig Mullins is an independent consultant and president of Mullins Consulting, Inc. Craig has extensive experience in the field of database management having worked as an application developer, a DBA, and an instructor with multiple database management systems including DB2, Sybase, and SQL Server. Craig is also the author of the DB2 Developer’s Guide, the industry-leading book on DB2 for z/OS, and Database Administration: Practices and Procedures, the industry’s only book on heterogeneous DBA procedures. You can contact Craig via his web site at http://www.craigsmullins.com.

Contributors : Craig S. Mullins
Last modified 2006-03-16 01:51 PM

RI with Dup Parent Index keys

Posted by mmcgady at 2007-03-13 09:17 AM

I have come across several projects where there is a requirement to keep and view historical data. In other words, if data changes (1 or more times) the old values need to be available. This is especially true for systems that process time sensitive transactions. To do this data would never really be changed, but simply a new row would be inserted with the "changed" data. The problem lies with how to define primary keys and how to enforce referential constraints.

As part of a data analysis team, I have proposed using a timestamp as the low order keys of the primary/unique index in all the tables that need to keep historical data. This way, when processing a transaction that needs to use data as it was, say, a month in the past, the queries or programs would use the timestamp in all related tables in order to obtain data relevant to that point in time.

However, there also exists technical requirements to use DB2 enforced RI between the related tables. Since DB2 RI requires a unique index, this is not possible because as data is changed (really repeated and modify pertinent column values) the only attribute that keeps the primary index unique is the timestamp. The substantive composite key values will be duplicated. Since DB2 requires that RI be enforced using a unique index, satisfying both the business reqirement of keeping historical data and the technical requirement of DB2 enforced RI is not possible.

Does anyone know of a reason for DB2 requiring that the parent key be unique.

DBAzine.com

Sections

Personal tools

Menu

Who Are You?