No Database Champion

by Fabian Pascal

A meaningful dialog on any topic in a scientific domain (that is, one where arguments are either true or false), as distinct from personal beliefs and preferences, esthetics, ethics, and so on, requires adequate knowledge and sound, clear, and precise reasoning (see "Why Is It Important to Think Precisely" in Relational Database Writings, 19xx-19xx). In the data management field, those who explain, criticize, or propose to extend or replace the relational model (RM), which is nothing but the application of logic and math to database management, ought, at a minimum, to know and understand the model. Alas, as the weekly quotes at Database Debunkings demonstrate, in the IT industry this is not deemed necessary: knowledge and understanding of the field's foundations are abysmally poor. Arguments are riddled with confusion and fuzziness, of which practitioners often even boast. Even though you would not know it from the profusion of claims of technological progress by the industry and media, database technology is actually regressing (see "Skyscrapers with Shack Foundations").

Consider the exchange about my writings at xmldb.org. When somebody referred to them as "interesting insights, given [Pascal] is a strong RM advocate" (implying that it is quite odd for relational proponents to have such insights), Mike Champion, a database specialist at Software AG, replied:

"I've read quite a bit of Mr. Pascal's and C.J. Date's ranting on their www.dbdebunk.com site. I'd agree that there is quite a bit of food for thought there; I certainly come away from it with a fuller appreciation for the deep elegance of the relational model. It is very clear to me that the XML community (especially the XML DB people) needs to come to grips with how the XML data model contrasts with, complements, and/or replaces the relational paradigm."

I note in passing the reference to "elegance" (which will shortly be contrasted with "practicality," hinting at the common, and fallacious, notion that the two are mutually exclusive). The RM is, indeed, elegant, but it is its very elegance that lends it superior practicality, and it is the lack of elegance of the hierarchic data model (HM) underlying XML that doomed hierarchic DBMSs in the past (see below). As to "ranting," I leave it to the reader to decide exactly whose pronouncements deserve to be referred to as such.

The fact is that XML was devised by text publishers, with whom the concept of markup tags originates, not by data management people (see "Tags Do Not a Language Make"). And it was intended mainly for data interchange, which requires an agreed physical format rather than a data model, not for data management (see "The Exchange Tail"). But the industry has a long history of blindly extending technology invented for one purpose to other, unintended purposes. So why not extend markup tags to data management?

Even though they implied a data model (see "Tags Do Not a Language Make"), and I dare anybody to do data management without one, XML proponents were not conscious of the concept, and the fact that they have not "come to grips" with it is highly significant and, in itself, throws their whole endeavor into serious question. Post-fitting a theory to technology, rather than deriving the latter from the former, is absurd and defeats the very point of a sound foundation. This is almost identical to what happened with the hierarchic database technology of decades ago, and doomed it. The industry never learns (see "Those Who Don't Know or Forget the Past, Are Doomed to Repeat It").

As usual, some try to dismiss the criticism of XML as an ad-hoc technology by resorting to the excuse that it does not claim to have a theoretical foundation, as if the deficiencies would go away just because there is no such pretense. But in fact, this excuse is not even valid: the XML specifications refer explicitly to a yet-to-be-formulated "query algebra," which will serve as the "formal" foundation of XML. And as Hugh Darwen points out:

"Now, my eyes light up at the word "algebra" ... Originally, I understood it to mean a set of operations that are closed over some type. That is, every operation in X Algebra operates on zero or more values of type X and returns a value of type X. Hence, set algebra, Boolean algebra, relational algebra and the algebra of numbers that gives us arithmetic. Over what is the XML Query Algebra closed? Nobody has ever given me an answer that makes sense (apart from the occasional, honest 'I don't know')."

It is interesting, Chris Date points out, that in trying to post-fit a data model to XML, the first thing that had to be done was to discard the XML document as the fundamental object. What does this tell you about the whole endeavor?
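Darwen's notion of closure, quoted above, can be made concrete with a toy sketch. It is only an illustration under stated assumptions: a relation is modeled as a Python frozenset of rows, each row a frozenset of (attribute, value) pairs, and the function names restrict, project, and join are hypothetical, not the API of any product. No headings or types are checked. The point is simply that every operation takes relations and returns a relation, so results can be fed to further operations.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple

# Assumption for illustration: a row is a frozenset of (attribute, value)
# pairs, and a relation is a frozenset of rows (so duplicates cannot exist).
Row = FrozenSet[Tuple[str, Any]]
Relation = FrozenSet[Row]


def relation(rows: Iterable[Dict[str, Any]]) -> Relation:
    """Build a relation from an iterable of attribute -> value mappings."""
    return frozenset(frozenset(r.items()) for r in rows)


def restrict(rel: Relation, predicate: Callable[[Dict[str, Any]], bool]) -> Relation:
    """Keep only the rows satisfying the predicate; the result is a relation."""
    return frozenset(row for row in rel if predicate(dict(row)))


def project(rel: Relation, attrs: Iterable[str]) -> Relation:
    """Keep only the named attributes; duplicate rows collapse automatically."""
    keep = set(attrs)
    return frozenset(
        frozenset((a, v) for a, v in row if a in keep) for row in rel
    )


def join(left: Relation, right: Relation) -> Relation:
    """Natural join on the attributes the two relations have in common."""
    result = set()
    for l_row in left:
        l = dict(l_row)
        for r_row in right:
            r = dict(r_row)
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                result.add(frozenset({**l, **r}.items()))
    return frozenset(result)


# Closure in action: the output of one operation is the input of the next.
parts = relation([{"part": "P1", "city": "London"},
                  {"part": "P2", "city": "Paris"}])
suppliers = relation([{"supplier": "S1", "city": "London"},
                      {"supplier": "S2", "city": "London"}])

london_suppliers = project(
    restrict(join(parts, suppliers), lambda t: t["city"] == "London"),
    ["supplier"],
)
```

Because each result is itself a relation, expressions nest to any depth. That is the property Darwen asks about, and this sketch makes no claim about what the XML Query Algebra is or is not closed over.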

Note: In fact, XML specs also refer to graph theory, but say that XML does not fully adhere to it. And for good reason: that would make XML even more complex than it is.

Champion proceeds to state "the most important advantages of the relational model [which] include:

"a) Data independence —the RDBMS folks don't care what order the rows and columns are in; most XML tools make you care about the nesting level of elements, the distinction between elements and attributes, and other crusty things that are implementation details in an RDBMS. Add a few additional columns to your RDBMS table and all your SQL code will still work ... add some elements to your XML schema, and there's a good chance that your Xpath queries or DOM algorithms will break."

Well:

– Data independence (DI) is the principle whereby applications are not dependent on data management functions, which belong in the DBMS; database systems that expose applications to physical row and column order violate only physical data independence, just one kind of DI.
– Champion does not say why we "R folks don't care about order," which we don't for practical reasons: such order is an artifact of data representation that does not represent anything meaningful in the real world and only complicates matters. If the principle of unity of representation underlying Codd's Information Rule (data must be represented explicitly and in only one way, as values in tables) is violated, then information implicit ("hidden") in ordering complicates manipulation and integrity, without adding any value.
– Attributes are part of the conceptual (or business) model; they are certainly not "implementation details."
– The lack of flexibility and the accompanying prohibitive maintenance burden are a main practical reason why hierarchic DBMSs were discarded decades ago. XML inventors should have come to grips with that before they proposed their so-called "revolutionary" technology.

"b) Avoiding "pointers" (although that's a religious war in the RDBMS world) — let's face it, URLs, most XPath expressions, ID/IDREF pairs, etc. are pointers; Date has written a number of articles explaining the problems of using pointers rather than "values" to link related information (and the addition of a value-based Join operation is the main thing I like about XQuery)."

As I have argued many times, those assigning any religiosity to relational proponents have it backwards: it is we who have science behind us, and it is the other approaches to data management, XML included, that lack a scientific foundation, so it is the proponents of XML who have nothing better than some form of religion to rely on. Indeed, the whole XML bandwagon, like most other industry fads, is very religion-like: we at least justify our position. (Incidentally, Chris Date finds the statement about XQuery bizarre: join is the most glaringly obvious omission, he says.)

"c) Redundancy control/referential integrity — we just plain don't have a decent story here! Or maybe our story is that we have a totally different, document-oriented notion of "integrity" that is very different from the relation model ... for example, a digitally signed XML contract/invoice/whatever stored in a database should not change when the underlying data change; that's a real-world advantage for an XML DB in some situations, just like keeping the data in normalized relations offers classical referential integrity in other situations."

What nonsense. It is quite sad that those serving as technical experts for major DBMS vendors have such poor knowledge/understanding of data management in general (see "Unstructured Thinking" and "Comments on an Interview with Jim Gray"), and of RM in particular, and are so confused in their thinking.

Referential integrity (RI) has little to do with normalization or redundancy control.

– RI is the logical representation in the database of one kind of business rule in the real world;

– Normalization avoids redundancy, which is an artifact of representation that imposes an unnecessary integrity control burden having nothing to do with the real world (see "The Dangerous Illusion" parts 1, 2).

Why Champion thinks that a relational representation cannot handle the kind of integrity he refers to (rather imprecisely), and, what is more, what this has to do with redundancy, normalization and RI, escapes me.
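The distinction drawn above between RI and redundancy can be sketched in a few lines. This is an illustration only, using SQL (which, as noted throughout, is not truly relational) through Python's built-in sqlite3 module; the table and column names are hypothetical. The first part declares a referential constraint, that is, a business rule stated to the DBMS. The second part shows a redundant design that, absent an additional constraint, accepts contradictory data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
db.execute("PRAGMA foreign_keys = ON")
db.executescript("""
    CREATE TABLE customer (
        cust_id TEXT PRIMARY KEY,
        city    TEXT NOT NULL
    );
    -- RI: a declared rule that every order is placed by a known customer.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  TEXT NOT NULL REFERENCES customer (cust_id)
    );
    -- Redundant design (hypothetical): the customer's city is repeated
    -- in every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        cust_id  TEXT NOT NULL,
        city     TEXT NOT NULL
    );
""")

db.execute("INSERT INTO customer VALUES ('C1', 'London')")
db.execute("INSERT INTO orders VALUES (1, 'C1')")

# The DBMS rejects an order that references a nonexistent customer.
try:
    db.execute("INSERT INTO orders VALUES (2, 'C9')")
    ri_rejected = False
except sqlite3.IntegrityError:
    ri_rejected = True

# The redundant table accepts two different cities for the same customer;
# preventing that would require an extra constraint that the normalized
# design does not need.
db.execute("INSERT INTO orders_flat VALUES (1, 'C1', 'London')")
db.execute("INSERT INTO orders_flat VALUES (2, 'C1', 'Paris')")
cities_for_c1 = db.execute(
    "SELECT COUNT(DISTINCT city) FROM orders_flat WHERE cust_id = 'C1'"
).fetchone()[0]
```

The foreign key expresses something about the business; the extra burden in the flat table exists only because of how the data happen to be represented.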

"But as a purely practical matter, XML databases have an awful lot going for them ... I certainly wouldn't want the job of normalizing the DocBook schema for storage in an RDBMS! On XML-DEV, Frank Richards of SoftQuad noted that figuring out how to effectively store document-oriented XML in an RDBMS "felt like programming a Turing machine." I love that analogy ... it pays homage to the formal elegance at the heart of the relational model, but recognizes the impracticality of using it for EVERYTHING. (When's the last time YOU took advantage of the Turing's math to prove that your Java program was correct?)"

I don't know what a DocBook is, but the issue is not how difficult the data is to normalize, but rather what kind of questions must be answered from the data, and what is the best way to guarantee correctness (defined as consistency).

It is the type of organization (provided by a data model) that determines that. The problem with practitioners in general, and XML proponents in particular, is that they don't really know what is, or isn't, answerable from relational or nonrelational databases, and with what consequences. The fact is that tree structures can be represented and manipulated better relationally; that SQL and its commercial implementations do a poor job of it is not a relational fault.

It is, therefore, ironic that Champion implies it is we, relational proponents, who push RM for "everything," when it's proponents of XML databases who claim superiority to relational (normalized) databases for any kind of questions, including those answerable by relational databases.

Champion exhibits the all too common logical-physical confusion. A data model and any enterprise-specific business models based on it (not to be confused, as they commonly are, with one another) are purely logical. There is nothing to prevent any DBMS, including relational ones, from storing XML data (although I would advise against it, for some obvious reasons; see "The Data Exchange Tail"). Logical representation (organization), integrity and manipulation are something else altogether, and XML DBMSs must reinvent those of the hierarchic model to do any data management. That is not only unnecessary, but also regressive. As to the "Turing math" argument, that is pure and simple nonsense which, coming from a technical professional, is quite sad.

"Likewise, as Date and Pascal are fond of pointing out, SQL is an incomplete implementation of the relational model. Some things that *theoretically* work fine in relational algebra are an absolute horror in SQL (such as finding all the subcomponents of a component down to an arbitrary depth in a "bill of materials" application). Interestingly, this is trivial with XML."

Whether this is trivial in XML or not is, at best, arguable (to be polite); and it is actually quite trivial in SQL:1999, says Chris Date. He uses an example from IBM's IMS manual to illustrate the "triviality" of hierarchic databases, which also exposed the physical level to users in applications (as XML does):

"Logically deleting a logical child prevents further access to the logical child using its logical parent. Unidirectional logical child segments are assumed to be logically deleted. A logical parent is considered logically deleted when all its logical children are physically deleted. For physically paired logical relationships, the physical child paired to the logical child must also be physically deleted before the logical parent is considered logically deleted."

While it is true that SQL is, relationally and otherwise, a poorly designed data language, the solution is not to regress back to inflexible, complex hierarchic databases, but to implement fully and correctly truly relational DBMSs (TRDBMS) and better data languages, based on a real algebra.
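For readers who want to see what the bill-of-materials query discussed above looks like with the recursive queries that SQL:1999 introduced, here is a minimal sketch. It is not Date's example. It assumes SQLite's WITH RECURSIVE support (reached here through Python's built-in sqlite3 module) as a stand-in for the standard's syntax, and the part_structure table, its contents, and the explode function are hypothetical.

```python
import sqlite3

bom = sqlite3.connect(":memory:")
bom.executescript("""
    -- Hypothetical parts explosion: each row says that an assembly
    -- directly contains a component.
    CREATE TABLE part_structure (
        assembly  TEXT NOT NULL,
        component TEXT NOT NULL,
        PRIMARY KEY (assembly, component)
    );
    INSERT INTO part_structure VALUES
        ('bike',  'frame'),
        ('bike',  'wheel'),
        ('wheel', 'rim'),
        ('wheel', 'spoke'),
        ('rim',   'valve');
""")


def explode(conn: sqlite3.Connection, root: str) -> list:
    """Return every subcomponent of root, to arbitrary depth, sorted by name."""
    rows = conn.execute(
        """
        WITH RECURSIVE explosion (component) AS (
            -- Direct components of the root assembly.
            SELECT component FROM part_structure WHERE assembly = ?
            UNION
            -- Components of anything already found.
            SELECT ps.component
            FROM part_structure AS ps
            JOIN explosion AS e ON ps.assembly = e.component
        )
        SELECT component FROM explosion ORDER BY component
        """,
        (root,),
    )
    return [row[0] for row in rows]


all_bike_parts = explode(bom, "bike")
```

The query never mentions a depth, a nesting level, or a path; it relates values to values, and the same table answers the reverse question (where is this part used?) by swapping the roles of the two columns.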

"My big complaint against Pascal's latest article was the assertion that XML's use of a hierarchical data model is obviously, provably, just plain WRONG: "What is more, due to their horrendous complexity and inflexibility, databases and DBMSs relying on the hierarchical model became obsolete in the 80's, at least technologically. SQL DBMSs based — albeit insufficiently and in many ways incorrectly — on the simpler relational data model, based on predicate logic and set mathematics proved superior." ... "It was not necessary to invent a new technology just for the purpose of "transmitting" structured data. But having invented one, the industry should certainly not have relied on an inferior, discredited data model to base it on."

Well, given what I stated above, what is obviously, provably, just plain wrong is Champion's misrepresentation of my article. I do not just claim that the hierarchic data model "is plain wrong." I demonstrate that RM is a sounder, simpler, more flexible, and more practical data model. Until somebody produces examples of any information that relational (not SQL!) databases cannot handle but hierarchic databases can, or can handle better integrity- and manipulation-wise, arguments for HDBMSs are not defensible (for how TRDBMSs would handle hierarchies, see Chapter 7 in Practical Issues in Database Management).

"I sent the editor of searchdatabase.techtarget.com a long reply to that point (based largely on a discussion on XML-DEV a couple of weeks ago), which was posted at techtarget.searchdatabase's DBA Water Cooler Forum. Mr. Pascal didn't think much of my critique, I'm astonished to say! (Just kidding, as anyone who reads dbdebunk.com knows, he does not suffer fools, i.e. anyone who doesn't have a shrine to E. F. Codd in their living room, gladly). "

Any claim that I equate the lack of a "shrine to Codd" with foolishness is foolish. I will let readers judge for themselves whether much should be thought of Champion's "critique," but you can see my reply there, from which Champion quotes:

"The problem with all these guys is that they do not have any formal education or knowledge. They rely on 'commonsense' and practice and that is simply not enough. That is precisely why the state of IT is so horrendous ... There ought to be a requirement that to be published, one ought to know what he's talking about."

Darn right.

Then he states "Well, I'm going to go off and lick my wounds and beg the world's forgiveness for contributing to the horrible state of IT now ..."

And he expects to be taken seriously?

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder, editor and publisher of Database Debunkings, a web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry (Chris Date is a senior contributor). Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns "Against the Grain," "Setting Matters Straight," and for The Journal of Conceptual Modeling. His third book, Practical Issues in Database Management, serves as text for his seminars.

Contributors : Fabian Pascal
Last modified 2005-04-12 06:21 AM

DBAzine.com
