Old Approach, True Implementations

by Fabian Pascal

In a Techworld article entitled “A new approach to querying databases?” David Cartwright writes:

Most people who use databases in anger are familiar with the concept of a relational database. The idea of “normalising” data in order to minimise duplication of information, whilst providing a mechanism to “join” the disparate data collections together based on common fields, is well understood even by relative newcomers to database technology.

First, a vast majority of database practitioners are ignorant and dismissive of relational concepts; they are familiar with SQL databases, which they mistake for relational, in part because the trade media induce this misconception, among others. Second, normalization is far from being “well understood”; in fact, even those who think they are familiar with it reveal, on probing, that they are not. A majority erroneously believe that normalization — rather than poor implementation and physical deployment — inhibits performance. They denormalize databases, trading integrity for performance without realizing it (see Irrational Exuberance and “A Costly Illusion that Won’t Go Away”; for a detailed discussion of normalization see Practical Database Foundations papers #1, “What First Normal Form Really Means,” #2, “What First Normal Form Means Not,” and #6, “The Costly Illusion: Normalization, Integrity and Performance”).
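To make the integrity cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The schema, table names and data are hypothetical, invented purely for illustration, and the sketch shows only the integrity side of the trade-off, not performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Denormalized design (hypothetical schema): the customer's city is
# repeated in every order row.
conn.execute(
    "CREATE TABLE orders_denorm "
    "(order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO orders_denorm VALUES (?, ?, ?)",
    [(1, "Smith", "London"), (2, "Smith", "London")],
)
# An update that touches only one of the redundant copies is accepted.
conn.execute("UPDATE orders_denorm SET city = 'Paris' WHERE order_id = 1")
cities_denorm = {
    row[0]
    for row in conn.execute(
        "SELECT city FROM orders_denorm WHERE customer = 'Smith'"
    )
}

# Normalized design: the customer's city is recorded exactly once.
conn.execute("CREATE TABLE customers (customer TEXT PRIMARY KEY, city TEXT)")
conn.execute(
    "CREATE TABLE orders "
    "(order_id INTEGER PRIMARY KEY, customer TEXT REFERENCES customers(customer))"
)
conn.execute("INSERT INTO customers VALUES ('Smith', 'London')")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "Smith"), (2, "Smith")])
conn.execute("UPDATE customers SET city = 'Paris' WHERE customer = 'Smith'")
cities_norm = {
    row[0]
    for row in conn.execute(
        "SELECT c.city FROM orders o JOIN customers c ON c.customer = o.customer"
    )
}
```

In the denormalized table the same customer ends up recorded in two different cities and nothing in the DBMS objects; in the normalized design that contradiction cannot be recorded in the first place.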

Relational databases as we know them today are, however, far from optimal — at more than one level. At the lowest level we have the database implementation itself — not least the fact that as well as containing elements of information, fields are also permitted to be empty, or “null”. (sic) And then there’s the language that we use to insert and extract data — SQL, or the Structured Query Language: it’s relatively simple to learn, but the syntax is often inconsistent and unless you use one of the many vendor-specific supersets of SQL it can be tricky to express complex series of operations in a concise manner.

Cartwright means SQL DBMSs, not relational databases, of course; he seems unaware of the difference.

It should come as no surprise, then, that work has been going on for some years to devise a more correct alternative to the relational database as we know it today. Perhaps the most promising is the work done by a world-renowned pair of database specialists, Hugh Darwen and Chris Date. The former is a retired IBM database specialist, and the latter is the author of the standard text book on the subject used by most good universities and colleges to teach database-oriented subjects. Darwen and Date, in their book Foundation for Future Database Systems: The Third Manifesto (ISBN 0201709287), take a step back at the way databases work and describe a new approach to database architecture.

It should, in fact, be surprising that any effort is still being invested in anything relational; the industry and academia have been ignoring and dismissing the technology, deeming it “old” and, therefore, obsolete, or “just a theory” and, therefore, impractical. So Date and Darwen’s work is not just the “most promising,” but practically the only effort to devise an alternative to SQL’s bastardization of the model (see If You Liked SQL, You’ll Love XQuery).

The Third Manifesto does not describe a “new approach to database architecture.” It is rather an attempt to spell out, refine and clarify what a truly relational DBMS (TRDBMS) and data language ought to be; it is a blueprint, if you will, for genuine relational systems, as distinct from SQL products.

So what’s the big deal? In a nutshell, Tutorial D is intended to be a “proper” implementation of a database query language. The idea is that there should be no arbitrary restrictions on the syntax of the query language (Voorhis cites SQL’s rather arbitrary habit of allowing nested queries in some places but not others, for instance), but at a lower level the database shouldn’t run up against idiotic limitations. The limitation in existing implementations that generates the most comment from the various parties in the debate is the problem with ‘null’ values in relational databases. Put simply, a database field has a type (50 characters, for instance, or a floating point number to two decimal places, or an 8-bit integer), but when you don’t fill the field in (i.e. it’s ‘null’) it loses all its meaning. Even the ANSI standards state that if a field is null it’s said not to exist — so if you ask a database for “all entries where field X is not equal to 47” it won’t return any of those where field X is null because instead of saying “Null doesn’t equal 47” ,(sic) the value “null” is deemed not to be comparable with any non-null field.

A proper implementation is not a new query approach, but rather the correct query approach that was never implemented. A relational database attribute (not field) draws its values from a data type (or domain). A type is not just one possible representation — which is what Cartwright’s enumerated examples are — but also a set of values so represented, and is associated with a set of operators applicable to those values. SQL does not provide proper support of types in general, and of user-defined types of arbitrary complexity in particular. The relational model accommodates them.
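The distinction between a type and its representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Weight type, its non-negativity constraint and its choice of operators are hypothetical, and a general-purpose language is standing in for what a relational data language would provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weight:
    """Hypothetical user-defined type: the non-negative weights in kilograms."""

    kg: float  # one possible representation; the type is not the representation

    def __post_init__(self):
        # The type constraint determines which values belong to the type.
        if not isinstance(self.kg, (int, float)) or self.kg < 0:
            raise ValueError(f"{self.kg!r} is not a value of type Weight")

    # Operators defined for the type: only these apply to Weight values.
    def __add__(self, other):
        if not isinstance(other, Weight):
            return NotImplemented  # e.g. adding a bare number is rejected
        return Weight(self.kg + other.kg)

    def __lt__(self, other):
        if not isinstance(other, Weight):
            return NotImplemented
        return self.kg < other.kg


total = Weight(2.5) + Weight(1.0)
```

A float is merely how a weight happens to be represented here. The type itself is the set of admissible values together with the operators defined on them, so Weight(-1) is rejected and adding a bare number to a Weight raises an error.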

We don’t know what “if you don’t fill the field…it loses all its meaning” means. SQL NULLs implement a faulty version of many-valued logic, rather than the two-valued logic on which the relational model is predicated (pun intended). Consequently, aside from complexity, SQL queries can yield results that are incorrect in the real world (see Practical Database Foundations paper #8, “The Final NULL in the Coffin: A Logically Correct Solution to Missing Data”).
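The behavior is easy to reproduce. The following is a minimal sketch using Python's built-in sqlite3 module; the table t, the column x and the helper ids are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany(
    "INSERT INTO t (id, x) VALUES (?, ?)",
    [(1, 47), (2, 10), (3, None)],  # None is stored as SQL NULL
)


def ids(where_clause):
    """Return the ids of the rows for which the WHERE condition is true."""
    rows = conn.execute(f"SELECT id FROM t WHERE {where_clause} ORDER BY id")
    return [r[0] for r in rows]


# NULL <> 47 evaluates to unknown, not true, so row 3 is not returned.
not_47 = ids("x <> 47")

# In two-valued logic this condition is a tautology; row 3 is still missing.
tautology = ids("x = 47 OR x <> 47")

# Only a special-purpose operator retrieves the row.
is_null = ids("x IS NULL")
```

Here not_47 contains only row 2 and tautology only rows 1 and 2; row 3 appears in neither, even though in the real world its value either is 47 or is not.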

Darwen and Date’s new architecture addresses many of the limitations of today’s relational database structures — not least the ever-present issue of “null” fields. It talks about new techniques for normalising data which eliminate problems, though unsurprisingly the elimination of some gotchas has brought about the need to introduce some new concepts to make the model work successfully. And in many ways, the new structure (and thus the Tutorial D language) is easy to get to grips with and far from rocket science to program.

Again, no new architecture; the good old relational model. The Third Manifesto does not “talk about new techniques for normalizing data.” While doing research for their book Temporal Data and the Relational Model, Date and Darwen identified a new, sixth normal form, pertinent to certain databases, particularly those containing interval data, such as time.
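For readers unfamiliar with the idea, the following rough sketch illustrates the kind of decomposition involved, with relations modeled as Python sets of tuples. The employee relation, its attributes and its data are hypothetical, and the temporal (interval) aspect that motivated the normal form is deliberately left out to keep the example short.

```python
# Hypothetical relation: a set of (emp, name, salary) tuples with key {emp}.
employee = {("E1", "Lopez", 50000), ("E2", "Cheng", 62000)}

# Decompose until no further nontrivial lossless decomposition is possible:
# here, the key plus at most one other attribute per relation.
emp_name = {(emp, name) for emp, name, _ in employee}
emp_salary = {(emp, salary) for emp, _, salary in employee}

# A natural join on emp reconstructs the original relation without loss.
rejoined = {
    (e1, name, salary)
    for e1, name in emp_name
    for e2, salary in emp_salary
    if e1 == e2
}
```

Each resulting relation records a single kind of fact about an employee, which is what allows, in the temporal case, each fact to carry its own interval independently of the others.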

Incidentally, aside from the REL implementation Cartwright refers to, there is a commercial product with a language based on the principles advanced in The Third Manifesto: Dataphor by Alphora. It is interesting to note that the company initially refrained from implementing NULLs, in order to avoid the problems haunting SQL, but recently had to give in to market pressure and added them (so much for the market system leading to the best technologies). At the time, however, there was no logically correct solution to missing data. We (the authors of the above-mentioned paper on missing data) believe that the paper provides a possible solution. In the same paper, we also argue that TRDBMSs that support our solution should be based on the TransRelational™ implementation model.

Special Offer: Author Fabian Pascal is offering DBAzine.com readers discounted subscriptions to the Practical Database Foundations series of papers. To take advantage of this offer, contact him via the About page on http://www.dbdebunk.com/index.html.

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author, and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, and IRS. He is founder, editor and publisher of Database Debunkings, a Web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date, he has recently launched the Practical Database Foundations series of papers that also serve as text for seminars. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and Test Your Foundation Knowledge.

Contributors : Fabian Pascal
Last modified 2005-04-12 06:21 AM

DBAzine.com
