DB2 Data Type Usage Tips

by Craig S. Mullins

Data type and length are the most fundamental integrity constraints applied to data in a database. Simply by specifying the data type for each column when a table is created, DB2 automatically ensures that only the correct type of data is stored in that column. Processes that attempt to insert or update the data to the wrong type will be rejected. Furthermore, a maximum length is assigned to the column to prohibit larger values from being stored in the table.

The DBA must choose the data type and length of each column wisely. It is best to choose the data type that most closely matches the domain of correct values for the column. For example, a numeric column should be defined as one of the numeric data types: integer, decimal, or floating point. Specifying a character data type for a column that will contain numeric data imposes additional work on the application programs to edit check the values. Furthermore, the program will be less efficient at edit checking data than DB2 is. And data that is inserted or modified outside the scope of an application program will not be edit checked at all, thereby potentially introducing data intgretity problems into the database.

DB2 supports a broad array of data types. For numeric data, DB2 support SMALLINT, INTEGER, DECIMAL, and FLOAT data types. For character data, DB2 supports CHAR, VARCHAR, CLOB, and DBCLOB data types. For temporal data, DB2 support DATE, TIME, and TIMESTAMP data types. And for graphic data DB2 supports GRAPHIC, VARGRAPHIC, and BLOB data types. DB2 also provides the ROWID data type which is basically a pointer to be used in conjunction with the large object data types (BLOB, CLOB, and DBCLOB).

But the choice of data type is not always 100% clear. For example, a common data requirement is for numeric data to be displayed with leading zeroes. But none of the numeric data types support leading zeroes. What should be done in this situation? Let's run through an example.

A Numeric Example

Say a four-byte code is required to identify an entity; all of the codes are numeric and there is not intent for them to ever contain non-numeric data. However, for reporting purposes, the users want the codes to print out with leading zeroes. Should the column be defined as CHAR(4) or SMALLINT? The best answer to this question is to use the data type that is closest to the domain for the column - in this case, SMALLINT is probably the best choice. But let's examine the reasoning.

Edit checks: Without proper edit checks, inserts and updates could place invalid alphabetic characters into the column. If ad hoc data modifications are permitted then this type of data integrity problem almost certainly will occur. Although ad hoc modification is rare in production databases, data problems can still occur if the proper edit checks are not coded into every program that can modify the data. If proper edit checks are coded and will never be bypassed, data integrity should not be a problem, but performance may be.

Filter factors: Consider the possible number of values that a CHAR(4) column and a SMALLINT column can assume. Even if edit checks are coded for each, DB2 is not aware of these and assumes that all combinations of characters are permitted. For character column DB2 assumes that 26 alphabetic letters, 10 numeric digits, and a space will be used - adding up to 37 possible characters. For a four-byte character column there are 374 or 1,874,161 possible values.

A SMALLINT column can range from -32,768 to 32,767 producing 65,536 possible small integer values. The drawback here is that negative or 5 digit product codes could be entered. However, if we adhere to our proper edit check assumption, the data integrity problems will be avoided here, as well. Or we could use DECIMAL(4,0) instead of SMALLINT, which is probably a sensible thing to do in this case.

At any rate, DB2 will use the HIGH2KEY and LOW2KEY values to calculate filter factors. For character columns, the range between HIGH2KEY and LOW2KEY is larger than numeric columns because there are more total values. The filter factor will be larger for the numeric data type than for the character data type which may influence DB2 to choose a different access path. For this reason, favor a numeric data type over a character data type.

Display: But if we store data as numeric the leading zeroes problem remains. We can solve this problem using other methods, for example by formatting the data programmatically when it is read for display purposes. When using QMF, you can force leading zeroes to be shown using the "J" edit code. Report programs can be coded to display leading zeroes easily enough by moving the host variables to appropriate display fields. And for ad hoc access using other reporting tools there will usually be a parameter that can be used to display the leading zeroes.

A Date-Time Dilemma

Let's turn our attention to another data type dilemma. What approach should be taken if we need to record a date and time for each row of a table? Well, the obvious answer is to use the TIMESTAMP data type, but it might not be the best answer. An alternate option would be to use two columns: one DATE and one TIME.

The best answer to this question depends on several factors specific to your situation. Consider the following points before making your decision:

With DATE and TIME you must use two columns. TIMESTAMP uses one column, thereby simplifying data access and modification.
The combination of DATE and TIME columns requires 7 bytes of storage, while a TIMESTAMP column always requires 10 bytes of storage. Using the combination of DATE and TIME columns will save space.
TIMESTAMP provides greater time accuracy, down to the microsecond level. TIME provides accuracy only to the second level. If precision is important, use TIMESTAMP. Use of TIME instead would cause less precision.
Date and time arithmetic can be easier to implement using TIMESTAMP data instead of a combination of DATE and TIME. Subtracting one TIMESTAMP from another results in a TIMESTAMP duration. To calculate a duration using DATE and TIME columns, two subtraction operations must occur: one for the DATE column and one for the TIME column.
DB2 provides for the formatting of DATE and TIME columns via local DATE and TIME exits, the CHAR function, and the DATE and TIME precompiler options. These facilities are not available for TIMESTAMP columns. If the date and time information is to be extracted and displayed on a report or by an online application, the availability of these DB2-provided facilities for DATE and TIME columns should be considered when making your decision.

Another Numeric Example

Another common application requirement is to store a limited range of numeric values - say from 1 to 9. The quick answer is likely to be something along these lines: "Well, those are integers, but they are small, so let's use SMALLINT." This is probably not the best choice in this case. Of course, SMALLINT will work, it is just not the optimal solution. Remember, from our previous example that SMALLINT values can range from -32,768 to 32,767 producing 65,536 possible small integer values.

An alternate approach that more closely conforms to the domain is to use the DECIMAL data type. DECIMAL(1,0) will only permit integers because of the zero scale. But it will also limit the values that can be stored to 0 through 9 (positive or negative). You can then attach a check constraint to the column specifying COLUMN > 0 to remove 0 and negative numbers as legitimate values.

Summary

Of course, this has been just a short introduction to DB2 data types; there are many other issues that need to be examined during database design including user-defined distinct types, default values, nulls, and check constraints to name a few. But choosing the proper data type for your DB2 columns is one of the most important database design decisions you will make. Improper data types can result in data integrity problems, performance degradation, disk storage waste, and program development issues. Spend your database design time wisely by analyzing the impact of your DB2 data type decisions.

Craig Mullins is an independent consultant and president of Mullins Consulting, Inc. Craig has extensive experience in the field of database management having worked as an application developer, a DBA, and an instructor with multiple database management systems including DB2, Sybase, and SQL Server. Craig is also the author of the DB2 Developer’s Guide, the industry-leading book on DB2 for z/OS, and Database Administration: Practices and Procedures, the industry’s only book on heterogeneous DBA procedures. You can contact Craig via his web site at http://www.craigsmullins.com.

Contributors : Craig S. Mullins
Last modified 2006-01-16 04:28 AM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

DB2 Data Type Usage Tips

A Numeric Example

A Date-Time Dilemma

Another Numeric Example

Summary