Irrational Exuberance

by Fabian Pascal

In “The Death of Denormalization,” Craig Mullins predicts: “Indeed, the death of denormalization is fast approaching. And who among us will really miss it when it finally kicks the bucket?”

We wish it were so, but we are not as optimistic. It would take much more than just improvements in current product implementations to do away with one of the major misconceptions prevalent in the industry.

“Normalization is a design approach that minimizes data redundancy and optimizes data structures by systematically and properly placing data elements into the appropriate groupings. A normalized data model can be translated into a physical database that is organized correctly. In simple terms, normalization is the process of identifying the one best place each fact belongs.”

Essentially this is correct, but some clarifications are in order (see chapters 5 and 8 of Practical Issues in Database Management).

The best way to think of this, informally, is that in a fully normalized database every entity-type in the conceptual model maps to exactly one table in the logical database. It is only then that all non-key attributes in every relvar are about the key, the whole key and nothing but the key. In an undernormalized database, multiple entity-types are “bundled” into single tables (as sketched below), which produces at least five practical problems:
      • redundancy
      • database bias
      • update anomalies
      • proneness to misinterpret the meaning of R-tables
      • overhead of extra constraint maintenance
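
To make the “bundling” concrete, here is a minimal SQL sketch; the Customer, Orders and CustomerOrder tables and their columns are hypothetical, ours purely for illustration.

    -- Hypothetical tables, for illustration only.
    -- Undernormalized: two entity-types (customers and orders) bundled
    -- into one table. The customer's city is repeated on every order row
    -- (redundancy), and changing a customer's city means updating many
    -- rows consistently (update anomaly, extra constraint maintenance).
    CREATE TABLE CustomerOrder (
        order_no   INTEGER      NOT NULL PRIMARY KEY,
        cust_no    INTEGER      NOT NULL,
        cust_city  VARCHAR(30)  NOT NULL,
        order_date DATE         NOT NULL
    );

    -- Fully normalized: one table per entity-type, each fact in one place.
    CREATE TABLE Customer (
        cust_no    INTEGER      NOT NULL PRIMARY KEY,
        cust_city  VARCHAR(30)  NOT NULL
    );

    CREATE TABLE Orders (
        order_no   INTEGER      NOT NULL PRIMARY KEY,
        cust_no    INTEGER      NOT NULL REFERENCES Customer (cust_no),
        order_date DATE         NOT NULL
    );

With the normalized pair, a customer’s change of city is a single-row update; with the bundled table it is a multi-row update whose consistency the DBMS must be made to enforce.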

Given a properly formulated conceptual model, correct mapping to the logical model will yield a fully normalized database. The need for a ‘process of normalization’ arises only when a database is poorly designed—undernormalized—and requires fixing.

“E.F. Codd, the creator of the relational model, created normalization in the early 1970s. Like the relational model of data, normalization is based on the mathematical principles of set theory. Although normalization evolved from relational theory, the process of normalizing data is generally applicable to any type of data.”

A better way to put it is that while conceptual modeling is inherently an informal endeavor, logical modeling, which is where normalization applies, is based on formal theory. Otherwise put, while business modeling is all art, logical database design is, at least in part, science. And the theory is there not for its own sake, but to guarantee consistency and avoid the practical problems mentioned above (see Database Foundations paper #4, “Un-Muddling Modeling”). (Incidentally, it is good to see, for a change, the distinction drawn between the relational model and normalization, as they are not the same.)

It is true that normalization principles can be applied to “any type of data,” but it is very important to realize that the maximum benefits accrue to relational databases managed by true RDBMSs (TRDBMSs), by which we do not mean SQL products (see below).

“Normalization is a logical process and does not necessarily dictate physical database design. A normalized data model will ensure that each entity is well formed and that each attribute is assigned to the proper entity. Of course, the best situation is when a normalized logical data model can be physically implemented without major modifications, but DBAs frequently had to divert from implementing a fully normalized physical database due to deficiencies in the DBMS in terms of performance or design.”

There is a technical term for the proper relationship that a DBMS ought to maintain between the logical and physical database representation levels: physical data independence. A major motivation for the relational model (see E. F. Codd’s original publications), it requires that the logical model (what users and applications see) be insulated from, and thus independent of, physical implementation details. The purpose is practical: it permits DBMS vendors to use any and all physical means at their disposal, and to change or improve them at will to maximize performance, without affecting the users’ and applications’ view of the data. Adherence to the principle prevents the huge development and maintenance burdens that arise when physical implementation details contaminate the logical level. (In fact, “fully normalized physical database” is a contradiction in terms: normalization is purely logical.)
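
As a small illustration, using the hypothetical Customer table from the earlier sketch: the application’s query is purely logical, and a later physical change by the DBA leaves it untouched.

    -- Hypothetical example, continuing the earlier sketch.
    -- What users and applications see: a purely logical request.
    SELECT cust_no, cust_city
    FROM   Customer
    WHERE  cust_city = 'Boston';

    -- Physical tuning applied later by the DBA. The query above is
    -- untouched; only its execution plan may change.
    CREATE INDEX ix_customer_city ON Customer (cust_city);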

In fact, one reason SQL DBMSs do not always perform as well as they could is their poor support of this principle. Because SQL implementations tend to maintain a fixed relationship between logical R-table rows and physical records, their optimization options are limited (see “The Dangerous Illusion,” part 1 and part 2). That is why DBAs sometimes have to denormalize logical designs, incurring all the problems mentioned above, in order to get satisfactory performance.

This reinforces “The Logical-physical Confusion” prevalent in the industry. Practitioners criticize the relational model for being purely logical and, thus, “failing to address physical implementation.” But this is backwards: it’s the independence of logical models from physical details that confers on implementations unlimited freedom to maximize performance and, thus, obviate the need for denormalization.

“So a normalized database implementation minimizes integrity problems and optimizes updating; but it may do so at the expense of retrieval. When a fact is stored in only one place, retrieving many different, but related facts usually requires going to many different places. This can slow the retrieval process. Updating is quicker, however, because the fact you're updating exists in only one place.

Many of our most critical applications drive transactions that require rapid data retrieval. Some applications require specific tinkering to optimize performance at all costs. To accomplish this, sometimes the decision is made to denormalize the physical database implementation, thereby deliberately introducing redundancy. This can speed up the data retrieval process, but at the expense of data modification.”

Since performance is determined entirely by physical implementation, not by logical database design, it is misleading to speak of normalization as a tradeoff between integrity (logical) and performance (physical). Although this may sometimes be the outcome with today’s SQL DBMSs, it is due not to normalization per se but, as we have explained, to the poor implementation of those products. A well-implemented TRDBMS would involve no such tradeoff.

It is also important to know that denormalization, even with current SQL products, does not improve retrieval performance for all applications. Applications that access fully normalized tables will perform worse if those tables are merged into a wider denormalized table, because they will have to read more data than they need (this is the database bias we referred to earlier). Furthermore, queries can be more complex and harder to formulate against undernormalized databases.
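
Continuing with the hypothetical tables from the earlier sketch, an application interested only in customer facts illustrates the point:

    -- Hypothetical example, continuing the earlier sketch.
    -- Against the normalized design: reads only the (narrow) customer rows.
    SELECT cust_no, cust_city
    FROM   Customer
    WHERE  cust_city = 'Boston';

    -- Against the bundled CustomerOrder table: reads one (wide) row per
    -- order and must weed out the duplicates it never asked for.
    SELECT DISTINCT cust_no, cust_city
    FROM   CustomerOrder
    WHERE  cust_city = 'Boston';

Note that the second query also has to say more (DISTINCT) in order to ask for less, a small instance of the added formulation complexity.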

Because normalization and denormalization are, by definition, purely logical, we find the term “physical denormalization,” though we understand what is meant by it, conducive to logical-physical confusion.

“Why is denormalization dying? First, the modern DBMS has been improved over the past twenty years. Today's most popular DBMSs (DB2, Oracle, SQL Server) have better internal performance features and characteristics that can more quickly retrieve data. Another factor is better query optimization. With the in-depth, complex cost-based optimizers used by modern DBMSs, access paths are becoming more efficient. Finally, we have materialized query tables (MQTs), also known as automated summary tables (ASTs). These are new database objects supported by some of today's DBMSs that can be thought of as a materialized view. A table is created based on a SQL statement, and the DBMS manages the gathering of the data, which is then physically stored. And the optimizer “knows” about these objects so a query can be written against either the materialized query table or the underlying tables themselves. And the DBMS provides options to control data refresh rates and other use characteristics.

Using these features, the DBA can create a fully normalized physical database implementation - and then create “denormalized” structures using MQTs or ASTs. This brings the benefit of data integrity because the database is fully normalized, along with the speed of retrieval using the materialized query table.”

But the whole point of views—and logical data independence—is to conceal “materialized query tables”!!
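
For concreteness, here is a sketch of the mechanism Mullins describes, using the hypothetical tables from the earlier sketches and DB2-flavored MQT syntax (the exact DDL, and the optimizer settings that enable such routing, vary by product and version). The application keeps querying the normalized base tables and need never mention the summary table; the optimizer may route to it silently, which is precisely the kind of concealment views were meant to provide.

    -- Illustrative only: DB2-flavored MQT DDL over the hypothetical tables.
    -- A “denormalized” summary whose contents the DBMS manages.
    CREATE TABLE cust_order_summary AS
      (SELECT c.cust_no, c.cust_city, COUNT(*) AS order_cnt
       FROM   Customer c, Orders o
       WHERE  c.cust_no = o.cust_no
       GROUP BY c.cust_no, c.cust_city)
      DATA INITIALLY DEFERRED
      REFRESH DEFERRED
      ENABLE QUERY OPTIMIZATION;

    REFRESH TABLE cust_order_summary;

    -- Applications continue to query the normalized base tables; the
    -- optimizer may satisfy the query from the summary table instead.
    SELECT c.cust_no, c.cust_city, COUNT(*)
    FROM   Customer c, Orders o
    WHERE  c.cust_no = o.cust_no
    GROUP BY c.cust_no, c.cust_city;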

Hardware, product and optimization improvements are, of course, always desirable and welcome, but they are only add-ons, not substitutes for the real and fundamental solution: well-implemented TRDBMSs, whose performance is not hindered, but actually enhanced, by fully normalized databases.

At the current state of database science, we believe that a DBMS truly based on the relational model (TRDBMS) and implemented using the TransRelational™ technology is the real solution, not only for performance purposes, but also for the many other practical benefits accruing from that. We note in particular that the TransRelational™ implementation model not only obviates the need for materialized query tables, but actually does away with all materialized tables, including SQL’s base tables: all tables are “virtual,” so to speak. (This would also lend itself rather well to the correct handling of missing data expounded in chapter 10 of Practical Issues in Database Management.)

We do not know whether—and may even have reasons to doubt that—the industry will produce such solutions. That requires a major change in the educational system and the level of foundation knowledge of users, vendors and the trade media, which we do not expect any time soon. In fact, we see further deterioration thereof instead.

In these circumstances poor products and practices, including denormalization, will persist.

(For a more in-depth treatment of the subject see Database Foundations paper #6, “The Costly Illusion: Normalization, Integrity and Performance.”)

--

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and has advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, the Census Bureau, the CIA, Apple, Borland, Cognos, UCS, and the IRS. He is founder, editor and publisher of Database Debunkings, a Web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date he has recently launched the Database Foundations Series of papers. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain and Setting Matters Straight, and writes for The Journal of Conceptual Modeling. His third book, Practical Issues in Database Management, serves as the text for his seminars.


Special Offer: Author Fabian Pascal is offering DBAzine.com readers subscriptions to the Database Foundations Series of papers at a discount. To receive your discount, just let him know you’re a DBAzine reader before you subscribe! Contact information is available on the “About” page of his site.


Craig Mullins Replies to Fabian:

I really have no major qualms with most of Fabian’s dissection of my article on the death of denormalization. Fabian does a great job, as always, of calling out the need for a true relational implementation that could possibly do away with many of the irritations faced by DBAs and technologists who use the actual existing SQL DBMS products in the real world. My article focused on issues impacting denormalization as a practice applied to those products.

And yes, my assertion that “… the death of denormalization is fast approaching” is probably irrationally exuberant. But the goal was to be provocative to get the article published, and then read. I know Fabian has issues with publishers and this sort of thing, but so be it.

The bottom line is this: after reading Fabian’s meticulous analysis of my article, I stand by everything I wrote without changing one word of it. And thank you, Fabian, for augmenting the article with your views on true relational database systems.


Contributors: Fabian Pascal, Craig S. Mullins