Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » Of Interest » Articles of Interest » No Integrity: A Systemic Problem
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 2862
 

No Integrity: A Systemic Problem

by Fabian Pascal

“We have 22 universities and colleges with over 200,000 students in Dalian,” the city’s mayor, Xia Deren, told me. More than half graduate with engineering or science degrees, and even those who don’t are directed to spend a year studying Japanese or English and computer science” … The Chinese certainly want to believe it’s inevitable that they will move from basic software outsourcing to design, but even a top Chinese science planner acknowledges that it won’t be easy. Xu Kuangdi, president of the Chinese Academy of Engineering, said to me that for China to advance, “we have to build more products from our own intellectual property … But in software,” he added, that will require “improving the innovative capability of the younger generation,” which will require some big changes in China’s rigid, rote education system. Chinese officials, he said, are thinking about such changes right now. I wouldn’t bet against them.  Have your kids finished their homework?”

— Thomas Friedman, “Doing Our Homework,” New York Times

In response to our long-standing claim that relational technology is the scientific, and therefore proper foundation for data management, we sometimes get,“If this is so, how come nobody has implemented it? Why does no DBMS vendor, even as sophisticated a one as Microsoft, do it?”

The answer, of course, is lack of foundation knowledge. And we provide ample evidence in weekly quotes and content at Database Debunkings; e.g., “Unstructured Thinking,” If You Liked SQL, You’ll Love XQUERY (IBM), “Comments On a Jim Gray Interview,” “On Microsoft’s Nonsense” (Microsoft), “On Larry Ellison” (Oracle), No Database Champion (Software AG). What better proof than the fact that even though relational technology was invented by Codd at IBM, they implemented SQL (which Oracle was the first to commercialize), and  the very people who came up with SQL are now pushing XML. The same knowledge deficiency characterizes not just dominant players, but also vendors without inhibiting large installed user bases; e.g., “Object Objectivity?” (Objectivity), “On What Is a Data Model” (Lazy Software), and others (“On "Multivalue" Technology,” “On Prevayling Ignorance”).

As long as inferior nonrelational products are bought — and, therefore, sold at a profit — there is no incentive to acquire and apply foundation knowledge (“Don’t Bother Me With All That Theoretical Stuff”). This won’t change anytime soon if customers (management and technical personnel) do not demand it (“No Supply Without Demand”), because they don’t know fundamentals either. And they don’t because they are neither required nor rewarded (in fact, they are actually often punished) for such knowledge (“It’s All in the Jobs”, “On Industry’s Hiring Practices”). When tool skills are the only requirement, fundamentals will be ignored and dismissed, only product-specific information will be deemed useful, and will be accepted uncritically (“Silly Seeley”).

Dependent on advertising and sponsorships, the trade media, broadly defined, is only too glad to follow suit and focus exclusively on products. Not only online and print publications (“On Fundamentals and Trade Media,” “On Trade Media Priorities,” “On Trade Media’s Product Bias”) and book publishers (“On Database Books”), but also seminar vendors (Dumbing Down: Et Tu, Europe?), and conference organizers (“On Data Fundamentals at Conferences,” “On Speaker Selection Criteria: Something Rotten in Denmark”) reflect and reinforce this sad state of affairs to the point at which it is practically impossible to publish or teach — and therefore learn — anything on data fundamentals (“On Trade Media’s "Balance": Another One Bites the Dust”).

Any wonder that authors, consultants, experts, and pundits (“On What Writers Should Know But Don’t,” “On ‘Respected Technical Analysts’,” “More On Kimball’s ‘Dimensional Muddling’,” “Silverstone’s UDMs: Neither Universal, Nor Data Models”) and even academics (“Denormalization For Performance: Et Tu Academia?”) don’t know much about them? Indeed, academia is renouncing its educational function and reduces itself to a training platform for vendors (“The Myth of Market-based Education,” “On the Market as Substitute for Knowledge”). The result is the vicious cycle of The Ignorance Mechanism.

Some think that the hope lies with open source/public domain (OS/PD) product designers who, unfettered by commercial considerations, can and will try to “do the right thing.” We’ve been doubtful, though, because such designers operate in the same industry and business culture as commercial vendors, are the product of the same education system, and consume the same information generated by the industry and media. If the education system is corrupted by the market, there is no reason to expect that OS/PD designers will escape the vicious cycle. In fact, they often exhibit poorer foundation knowledge than their commercial counterparts.

Consider, for example, the following:

“This is semantics [sic], but I think Codd and Date in the 70s meant by a relational database something, which has a relational query language. Thus any database which has a query language somewhat similar to the relational algebra or SQL can be called a relational database …… MySQL nowadays supports transactions, logging, and crash recovery. What do you demand of a DBMS? … Referential integrity constraints can be described as a property of an ‘active’ database. The term active here because [sic] you could define them with triggers. Active databases became popular in the 80s when Sybase brought stored procedures … InnoDB type tables will have referential integrity constraints in September. It is in my to-do list for this month.”

— Heikki Tuuri, Innobase Oy

Codd’s definition of a relational DBMS (not database!) is true and full support of the relational data model, consisting of well-defined structure, integrity, and manipulation features. SQL has a query (that is, manipulation) component, but very weak integrity support, and violates several relational  principle (e.g., NULLs, duplicates, and recently, even pointers).

As we explained in Practical Database Foundations paper #4, “Un-muddling Modeling,” the sum total of integrity constraints (schema) — the DBMS-understood meaning of the data — represents formally in the database the sum total of the informal business rules in the conceptual model — the best approximation possible to the user-understood meaning. There are four types of relational constraint, referential being only one type of database constraint (that is, one that spans multiple tables). A DBMS that supports only referential constraints has very little understanding of the database, and cannot possibly manage it (that is, protect the integrity of the data and of the results derived from it). That must be undertaken by users in applications, which was the case prior to DBMS software.

Sybase did not invent some sort of “active database” — a meaningless (pun intended) concept; rather, it added some, albeit not all, of a critical DBMS component that was missing from products. Moreover, triggers are procedural and, therefore, inferior to the declarative constraints required by the relational model (“On Triggers”). Besides, we do not believe in adding such core functions as integrity post-hoc. Integrity support is limited, complex and difficult to extend in SQL precisely because it was implemented this way (a Foundations paper on integrity is forthcoming).

What is particularly troubling is that, according to his bio, Tuuri “holds a PhD degree in mathematical logic.” Since the relational model is logic and math applied to databases, if Tuuri has such a poor understanding of the model, what can we expect of the average practitioner? (Tuuri once complained that this is insulting, but I am just drawing logical conclusions from his pronouncements.)

Here’s another example:

“I have a crude understanding of … [the] concept of ‘what types are and what their function in a data model is’. Suneido does not implement these concepts, nor does it make any claims to … Suneido’s goals are pragmatic. We don’t claim to implement any standard, or to be ‘complete’ or ‘correct’ in any theoretical sense. Our only hope is that Suneido is ‘useful’ for certain applications.”

— Andrew McKinlay, Suneido

Data types are also a core component of a data model, representing property type business rules in business models (again, see paper #4), represented formally by data type constraints and underlie column constraints in the database. They are, therefore, an integral part of the DBMS-understood meaning of the database. What this means in practical terms is that because Suneido does not understand types, it allows users and applications to insert any values in the database, a serious integrity deficiency.

Note: One of the major limitations of SQL-based products is their lack of proper support of user-defined data types of arbitrary complexity. Object proponents have used this deficiency as evidence that the relational model cannot handle “complex types,” but SQL and its implementations have only themselves, not the model, to blame for it. In fact, it’s object DBMSs that are deficient, because while they support object types, they have no equivalent to relations.

McKinlay makes the same mistake prevalent in the industry, that “theory is not practical,” dismissing, in other words, the usefulness of a scientific foundation for database management. He seems to think that it is OK if a DBMS does not guarantee correctness, as long as it does not claim that it does. But it is full integrity support that makes a DBMS useful; the need for it does not go away just because the vendor does not claim such support.

MySQL is a widely used open-source product. Its declared purpose is “to provide the best possible service requires maintaining a massive database that can meet extreme performance demands, with very high reliability” (emphasis added). But is the following consistent with this claim?

“Reasons NOT to Use Foreign Keys constraints: There are so many problems with foreign key constraints that we don’t know where to start:

    • Foreign key constraints make life very complicated, because the foreign key definitions must be stored in a database and implementing them would destroy the whole "nice approach" of using files that can be moved, copied, and removed.
    • The speed impact is terrible for INSERT and UPDATE statements, and in this case almost all FOREIGN KEY constraint checks are useless because you usually insert records in the right tables in the right order, anyway.
    • There is also a need to hold locks on many more tables when updating one table, because the side effects can cascade through the entire database. It’s MUCH faster to delete records from one table first and subsequently delete them from the other tables.”

— MySQL documentation

Isn’t it fair to say that this reveals a complete lack of understanding of what a DBMS is? How can a product that intentionally fails to support integrity constraints (and, therefore, has no understanding of the data) guarantee reliability?

But then, why should MySQL bother to do any better, if practitioners are quite happy to undertake database functions in applications?

“MySQL is not a toy database — it is far superior  to many I have used in my long career. The lack  of constraints is not a weakness. It is eminently possible to create reliable applications without the need for database constraints — I should know because I have designed and built many applications that did not use database constraints (mainly because they were not available). Developers only rely on database constraints to circumvent their sloppy code. Anything that can be done within the database can also be done within application code. I have seen what happens when poor programmers try to shift logic from their code into the database — they get it wrong and then blame the database for their incompetence. I am used to designing and building applications without relying on database “features,” so I write my code accordingly. It also means that the logic is maintained in one place and not in bits and pieces here and there.”

— Tony Marston, http//groups.google.com, comp.lang.php

This is nothing but a regression to the DBMS-less, application-based data management bad old days of decades ago. Marston has it backwards: integrity is a DBMS, not application function, precisely so that constraints reside and are enforced centrally in the database, rather than redundantly and unreliably scattered in applications.


As long as the industry and business culture ignore, dismiss and suppress foundation knowledge, no progress should be expected for either commercial or open source/free products.

--

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCS, and IRS. He is founder, editor and publisher of Database Debunkings, a Web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date he has recently launched the Database Foundations Series of papers. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain, Setting Matters Straight, and for The Journal of Conceptual Modeling. His third book, Practical Issues in Database Management serves as text for his seminars.


Special Offer: Author Fabian Pascal is offering DBAzine.com readers subscriptions to the Database Foundations Series of papers at a discount. To receive your discount, just let him know you’re a DBAzine reader before you subscribe! Contact information is available on the “About” page of his site.


Contributors : Fabian Pascal
Last modified 2005-11-11 01:16 PM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone