Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » Of Interest » Articles of Interest » Nulls Nullified
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3448
 

Nulls Nullified

by Fabian Pascal

“The idea that you will always know everything is arrogant.”

--Joe Celko

“The idea that anything can be inferred from what you don’t know is ignorant.”

--Fabian Pascal

It is not possible to be a proper database practitioner without knowledge of data fundamentals, yet a vast majority of practitioners possess very little such knowledge. What is more, most of what is published in the industry, or taught in academia, is devoid of fundamentals, and not only centers almost exclusively on tools and products, but is more often than not outright wrong, because authors and teachers are not familiar with fundamentals either and, worse, are dismissive of them.

In an article titled “If You Liked SQL, You’ll Love XQuery” I criticized the principal author of SQL — which, after all, was supposed to be a relational data language — for poor understanding of the relational model, leading to NULLs, a violation of the model which unduly complicates the language, and is prone to produce incorrect query results. I wrote:

“The relational model is predicate logic applied to databases. Predicate logic is the real-world’s two-valued logic (true/false) ... logic guarantees correctness — defined as consistency — of query results. It is to preserve logical correctness; therefore, that Codd’s Information Principle requires that all information in relational databases be represented as values in relations. The term ‘NULL values’ suggests that Chamberlin does not realize that part of the problem with NULLs is that they are not values — indeed, they are supposed to be markers for the absence of values. Whatever a database table with NULLs is, neither is it a relation, nor do NULLs represent anything in the real world and, consequently, correctness and the rest of the relational benefits are lost.

Incidentally, ‘inapplicable values’ are a red herring. They are an artifact of bad database design. There is only one kind of missing value — unknown — and as I demonstrate in the above-mentioned chapter [of my book, PRACTICAL ISSUES IN DATABASE MANAGEMENT], it can be handled relationally, without the huge problems of SQL’s NULLs.”

Consider now some reactions to my article at Slashdot.org.

“If you removed NULLs from relational database design, people would reinvent them (poorly) -- probably by using IDs of -1 or 0, or IDs to a special magic ‘null’ row, which I suspect is what he’s talking about by ‘it can be handled relationally.’ To suggest that missing or inapplicable values are not part of ‘the real world’ is so wrong it’s... well... wrong. Anyone who’s actually done database work (or programming work, for that matter) knows this.”

--Chops

Like so many in the industry, the author of this comment is unaware of the history of the field in which he practices! In fact, the “special values” that he refers to were actually used to mark missing data decades ago, before databases and DBMSs were invented, and they are still in use today, even in SQL databases. Over time it became abundantly clear that this was a complex, unproductive, and prone to error way to handle missing data. And, of course, “Those Who Don’t Know the Past, Are Condemned to Repeat It.”

We do not know what “null row” means (there is such a thing as “NULL row” in SQL, but that’s not what is meant here), but note the confusion between (a) the absence of data (b) how such absence is represented and manipulated in databases. By “nor do NULLs represent anything in the real world” I did not mean to deny (a), but rather that SQL’s NULLs are not the correct way for (b). There is nothing in the real world that a NULL represents; rather, it is an attempt to represent imperfect knowledge of the real world, which is quite distinct; and SQL’s NULL fails to do even that job satisfactorily.

It is, indeed, a fact, that there is no such thing as “inapplicable data.” Consider, for example, data on employees, some of whom do not earn a commission. This is often represented logically in SQL databases by an EMPLOYEES table with some rows containing NULLs in the COMMISSION column. But what exactly does that mean? On the one hand, in the real world those employees do not earn a commission, period. Yet, on the other hand, the place for COMMISSION values is present for the rows representing those employees, as if they did earn a commission. This is, of course, a contradiction: the logical model does not represent the conceptual model accurately, or, in other words, the database design is not right.

Note: SQL has only one kind of NULL. If both inapplicable and unknown values are represented by NULLs, the DBMS won’t be able to distinguish between them and will produce incorrect results. For a simple example consider the query “What is the total compensation of each employee?” when some of the commissions are unknown and some are inapplicable.

Here’s another comment:

“SQL NULLs are the worst thing since unsliceable bread. They break boolean logic. You would think that if (X = Y) is false, then (X != Y) would be true. With SQL, if either X or Y or both are NULL, then any expression evaluating it is false.

I understand the argument (NULL indicates no data--so you can’t claim it’s equal to anything). Academic ##@!&. Anyone who’s maintained code using SQL NULL semantics will agree. If you really want to claim that NULL is so much 'not a value’ that you can’t compare it to anything, then do it the man’s way and throw a ##@!& exception. Of course, anyone can see that doing so would make code which MIGHT encounter a null value even MORE difficult to maintain, so they came up with this ‘any comparison to NULL is false’ ##@!&.

The easiest way to define NULL is that it’s equal to another NULL value, but not equal to anything else. Then I don’t need any special ‘is null’ clause either.

The very definition of b0rken.”

--ceswiedler

It is quite telling that when defenders of NULLs try to rationalize them by dismissing the errors of which SQL and its commercial variants are chockfull, they almost always make those very same mistakes, defeating their own argument.

      • The expression (X!=Y) should not evaluate to false if either X or Y or both are NULL, but to unknown NULL (in SQL NULL does not represent just unknown, but also inapplicable, causing it to fall into yet more logical traps). Comparing something you don’t know to something you either know or don’t cannot possibly evaluate to either true or false!
      • It’s not that “we want to claim that NULL is so much ‘not a value’”; it really is not a value, and therefore not comparable to any value, whether we like it or not. That’s what a NULL means: the absence of a value. Indeed, the term “NULL value” is a contradiction in terms.
      • Is a NULL equal to a NULL? Say, my age and your age are unknown. We represent both by NULLs. When the DBMS compares ages, should it conclude that we have the same age? If the ages are unknown we cannot tell whether this is true or false. It’s unknown! (What ceswiedler probably means is that both ages are unknown, but that is not the same as equality, exactly the kind of confusion that many-valued logic causes).

I have already addressed the exception values approach. What defenders of NULLs fail to appreciate is that the existence of NULLs forces the same kind of complexity in SQL queries that special values produced in application code prior to databases: SQL statements must be heavily qualified with IS NULL or IS NOT NULL.

“It’s not the SQL standard’s fault if your code’s logic can’t handle that case. Nor is it the standard’s fault that you can’t see fit to NOT USE the NULL feature when you don’t want to (and any reasonable database even goes so far as to give you the option of making *sure* you don’t use it -- that’s what the NOT NULL declaration when defining a column is for).

Getting rid of NULL isn’t going to help you when you suddenly discover that you really DO need to be able to represent ‘missing data’ somehow.”

--kcbrown

This comment is internally inconsistent. Either the NULL is unnecessary, in which case it should not be in the language; or it is necessary, in which case not using it is not an option. SQL seems to want to have it both ways, so it’s not application code that is at fault here.

“The simple reason for NULL being an integral part of relational databases comes from the foundations of the concept: relations are sets, and the whole idea is based on set theory.

A table is a set of rows, and a row is a set of attributes. Every set contains the null set as an element. Therefore, without changing the relational model to be based on some other premise, null must exist.

Granted, there are some problems, but it is what it is...”

--DingoBueno

As I wrote in the original article, NULLs are actually a violation of Codd’s own core Information Principle: tables containing them are not relations. A relation/relvar consists of a heading — a set of typed attribute names — and a set of tuples, each of which is a set of attribute values. And NULLs are not values, they are markers for the absence of values.

Some problems, indeed: outside of two-valued logic, all bets are off; integrity of the data and of results derived from it cannot be guaranteed.The reason practitioners are oblivious to the problems posed by NULLs is because (a) they lack proper education and (b) they use simplistic examples and stop at representation, without much thought of the impact on what representation is really for: integrity enforcement and manipulation.

Until recently, there was no logically correct, relational solution to missing data. We offer an outline of a possible such a solution in Practical Database Foundations paper #8, “The Final NULL in the Coffin,” which also summarizes the problems with NULLs.

--

Special Offer: Author Fabian Pascal is offering DBAzine.com readers discounted subscriptions to the Practical Database Foundations series of papers. To take advantage of this offer, contact him via the About page on http://www.dbdebunk.com/index.html.

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, and IRS. He is founder, editor and publisher of Database Debunkings, a Web site dedicated to dispelling persistent fallacies, flaws, myths, and misconceptions prevalent in the IT industry. Together with Chris Date he authors the Practical Database Foundations series of papers that also serve as text for seminars. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and Test Your Foundation Knowledge.


Contributors : Fabian Pascal
Last modified 2005-04-12 06:21 AM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone