Design and the eDBA
Welcome to another installment in the ongoing saga of the eDBA. So far in this series of articles, we have discussed eDBA issues ranging including availability and database recovery, new technologies such as Java and XML, and even sources of on-line DBA information.
But for this installment we venture back to the very beginnings of a relational database - to the design stage. In this article we will investigate the impact of e-business on the design process and discuss the basics of assuring proper database design.
Living at Web Speed
One of the biggest problems that an eDBA will encounter when moving from traditional development to e-business development is coping with the mad rush to "get it done NOW!" Industry pundits have coined the phrase "Internet time" to describe this phenomenon.
Basically, when a business starts operating on "Internet time" things move faster. One "Web month" is said to be equivalent to about three standard months. The nugget of truth in this load of malarkey is that Web projects move very fast for a number of reasons:
- Because business executives want to conduct more and more business over the Web to save costs and to connect better with their clients.
- Because someone read an article in an airline magazine saying that Web projects should move fast.
- Because everyone else is moving fast so you'd better move fast, too, or risk losing business.
Well, two of these three reasons are quite valid. I'm sure you may have heard other reasons for rapid application development (RAD). And sometimes RAD is required for certain projects. But RAD is bad for database design. Why? Applications are temporary, but the data is permanent. Organizations are forever coding and re-coding their applications - sometimes the next incarnation of an application is being developed before the last one even has been moved to production.
But when did you ever throw away data? Oh, sure, you may redesign a database or move from one DBMS to another. But what did you do? Chances are, you saved the data and migrated it from the old database to the new one. Some changes had to be made, maybe some external data was purchased to combine with the existing data, and most likely some parts of the database were not completely populated. But data lives forever.
To better enable you to glean value from your data it is wise to take care when designing the database. A well-designed database is easy to navigate and therefore, it is easier to retrieve meaningful data from the database.
Database Design Steps
The DBA should create databases by transforming logical data models into physical implementation. It is not wise to dive directly into a physical design without first conducting an in-depth examination of the data needs of the business.
Data modeling is the process of analyzing the things of interest to your organization and how these things are related to each other. The data modeling process results in the discovery and documentation of the data resources of your business. Data modeling asks the question, "What?" instead of the more common data processing question, "How?"
Before implementing databases of any sort, a sound model of the data to be stored in the database should be developed. Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective.
That is, novices do not have experience with databases and data requirements gathering, so they attempt to design databases like the flat files they are accustomed to using. This is a mistake because problems inevitably occur after the databases and applications become operational in a production environment.
At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems may arise rendering the entire application unusable.
A proper database design cannot be thrown together quickly by novices. What is required is a practiced and formal approach to gathering data requirements and modeling data. This modeling effort requires a formal approach to discovering and identifying identities and data elements. Data normalization is a big part of data modeling and database design. A normalized data model reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately.
It is actually quite simple to learn the basics of data modeling, but it can take a lifetime to master all of its nuances.
Once the logical data model has been created, the DBA uses his knowledge of the DBMS that will be be used to transform logical entities and data elements into physical database tables and columns. To successfully create a physical database design, you will need to have a good working knowledge of the features of the DBMS, including:
- In-depth knowledge of the database objects supported by the DBMS and the physical structures and files required to support those objects
- Details regarding the manner in which the DBMS supports indexing, referential integrity, constraints, data types, and other features that augment the functionality of database objects
- Detailed knowledge of new and obsolete features for particular versions or releases of the DBMS to be used
- Knowledge of the DBMS configuration parameters that are in place
- Data definition language (DDL) skills to translate the physical design into actual database objects
Armed with the correct information, the DBA can create an effective and efficient database from a logical data model. The first step in transforming a logical data model into a physical model is to perform a simple translation from logical terms to physical objects.
Of course, this simple transformation will not result in a complete and correct physical database design -- it is simply the first step. The transformation consists of the following:
- Identify and create the physical data structures to be used by the database objects (for example, table spaces, segments, partitions, and files)
- Transform logical entities in the data model to physical tables
- Transform logical attributes in the data model to physical columns
- Transform domains in the data model to physical data types and constraints
- Choose a primary key for each table from the list of logical candidate keys
- Examine column ordering to take advantage of the processing characteristics of the DBMS
- Build referential constraints for relationships in the data model
- Reexamine the physical design for performance
Of course, the above discussion is a very quick introduction to and summary of data modeling and database design. Every DBA should understand these topics and make sure that all projects, even e-business projects operating on "Internet time," follow the tried and true steps to database design.
Database Design Traps
Okay, so what if you do not practice data modeling and database design? Or what if you'd like to, but are forced to operate on "Internet time" for certain databases?
Well, the answer, of course, is "it depends!" The best advice I can give you is to be aware of design failures that can result in a hostile database. A hostile database is difficult to understand, hard to query, and takes an enormous amount of effort to change.
Of course, it is impossible to list every type of database design flaw that could be introduced to create a hostile database. But let's examine some common database design failures.
Assigning inappropriate table and column names is a common design error made by novices. Database names that are used to store data should be as descriptive as possible to allow the tables and columns to self-document themselves, at least to some extent. Application programmers are notorious for creating database naming problems, such as using screen variable names for columns or coded jumbles of letters and numbers for table names.
When pressed for time, some DBAs resort to designing the database with output in mind. This can lead to flaws such as storing numbers in character columns because leading zeroes need to be displayed on reports. This is usually a bad idea with a relational database. It is better to let the database system perform the edit-checking to ensure that only numbers are stored in the column.
If the column is created as a character column, then the developer will need to program edit-checks to validate that only numeric data is stored in the column. It is better in terms of integrity and efficiency to store the data based on its domain. Users and programmers can format the data for display instead of forcing the data into display mode for storage in the database.
Another common database design problem is overstuffing columns. This actually is a normalization issue. Sometimes a single column is used for convenience to store what should be two or three columns. Such design flaws are introduced when the DBA does not analyze the data for patterns and relationships. An example of overstuffing would be storing a person's name in a single column instead of capturing first name, middle initial, and last name as individual columns.
Poorly designed keys can wreck the usability of a database. A primary key should be nonvolatile because changing the value of the primary key can be very expensive. When you change a primary key value you have to ripple through foreign keys to cascade the changes into the child table.
A common design flaw is using Social Security number for the primary key of a personnel or customer table. This is a flaw for several reasons, two of which are: 1) a social security number is not necessarily unique and 2) if your business expands outside the USA, no one will have a social security number to use, so then what do you store as the primary key?
Actually, failing to account for international issues can have greater repercussions. For example, when storing addresses, how do you define zip code? Zip code is USA code but many countries have similar codes, though they are not necessarily numeric. And state is a USA concept, too.
Of course, some other countries have states or similar concepts (Canadian provinces). So just how do you create all of the address columns to assure that you capture all of the information for every person to be stored in the table regardless of country? The answer, of course, is to conduct proper data modeling and database design.
Denormalization of the physical database is a design option but it can only be done if the design was first normalized. How do you denormalize something that was not first normalized? Actually, a more fundamental problem with database design is improper normalization. By focusing on normalization, data modeling and database design, you can avoid creating a hostile database.
Without proper upfront analysis and design, the database is unlikely to be flexible enough to easily support the changing requirements of the user. With sufficient preparation, flexibility can be designed into the database to support the user's anticipated changes. Of course, if you don't take the time during the design phase to ask the users about their anticipated future needs, you cannot create the database with those needs in mind.
Taming the Hostile Database
If data is the heart of today's modern e-business, then the database design is the armor that protects that heart. Data modeling and database design is the most important part of a database application.
If proper design is not a component of the database creation process, you will wind up with a confusing mess of a database that may work fine for the first application, but not for subsequent applications. And heaven help the developer or DBA who has to make changes to the database or application because of changing business requirements. That DBA will have to try to tame the hostile database.
Craig Mullins is an independent consultant and president of Mullins Consulting, Inc. Craig has extensive experience in the field of database management having worked as an application developer, a DBA, and an instructor with multiple database management systems including DB2, Sybase, and SQL Server. Craig is also the author of the DB2 Developer’s Guide, the industry-leading book on DB2 for z/OS, and Database Administration: Practices and Procedures, the industry’s only book on heterogeneous DBA procedures. You can contact Craig via his web site at http://www.craigsmullins.com.
Contributors : Craig S. Mullins
Last modified 2006-01-16 07:03 AM