DBAzine.com
The Ignorance Mechanism

by Fabian Pascal

Those few of us who deplore ignorance in the industry and strive to combat it are frequently taken to task, or dismissed, for claiming such ignorance in the first place: e.g., “It is not possible that the whole industry is wrong and only you are right,” “the market decides what’s right,” and so on. Furthermore, when I criticize the media for lacking the knowledge to cover technology properly, I am told not to expect it from journalists, who are supposed to just “report the facts” (read: regurgitate whatever they are told) and leave analysis and assessment to “experts” (although not the likes of me, of course, only those who “are with” the industry’s and media’s (fad) program; see “Another One Bites the Dust”).

A recent article, “Use XML Where RDBMSs Fear to Tread” by Lee Thé, offers yet another clear view of the mechanism that sustains and reinforces ignorance and proves our point quite nicely. The article is concerned with record keeping by medical group practices:

“Visit the typical American group medical practice and you’ll see advanced technology in use everywhere — except when it comes to patient record keeping. Here the manila folder still reigns, each one containing a sheaf of paper charts and typically handwritten notes — in doctors’ handwriting. As a consequence, medical practices expend a lot of energy maintaining and accessing patient records, and often have trouble getting the necessary information in a timely manner.” 

It documents the case of a specific medical practice as an example of successful automation due, judging from the title, to the superiority of XML over relational technology.

There is a combination of three ingredients — common in the industry — that can mislead uninformed readers of trade media articles about IT projects.

      • A reporter unable to interact intelligently with the technology under consideration: According to the byline, “Former Visual Studio Magazine executive editor Lee Thé writes occasional technology pieces to pay for scuba diving trips for his wife and himself while he works on a science fiction trilogy.” That may explain why, despite his many years as a technology journalist, his pronouncements on data management issues had a feel of fiction (and no science) about them.
      • A professional user organization with an interest in, but insufficient knowledge of, data management, unable either to judge the expertise of others or to assess the optimality of a solution in all its implications:

“… a medium-sized group cardiology practice in Suffern, New York … is probably ahead of the curve because its resident technophile, Dr. Michael Muschel, is also its managing director. Dr. Muschel set out to implement his solution with the belief that an automated system could increase productivity for both physicians and staff, reduce expenses, increase accessibility to patient records, provide efficiencies to the operational workflows, enhance support for malpractice risk management and regulatory audits, and uncover new sources of revenue.”

      • A consultant/developer chosen on the basis of personal relationship, rather than data management expertise adequate to the task:

“To help him explore the options, Dr. Muschel sought out an acquaintance. He asked Neal Lipschitz, a principle at software development house Woodcrest Solutions, if Woodcrest could provide a cost-effective solution that synergized with Muschel’s group’s varied methods of practicing medicine and its operational workflow.”

Under these circumstances, it is hardly surprising that we had difficulty even understanding the solution described in the article, let alone ascertaining the validity of the claims made for it (when I sent Chris Date a copy of the article, he declined comment, declaring it hopeless). That is only partly because much of it involves application development aspects irrelevant to the data management argument stated in the title; the main problems were lack of clarity and succinctness, confusion, and the use of fuzzy, questionable, or incorrect terminology, all rooted in a lack of the necessary foundation knowledge. What is obvious, though, is that the article not only fails to support its title; there is actually a good chance that it contradicts it!

A major concern of the medical practice was that

“Every doctor likes to keep records her or his own way. However, a software solution that provides doctors with customized recording options likely requires a degree of customization no small business or workgroup can afford. Although electronic medical record (EMR) products exist, they’re old school: They’re costly, rigid, monolithically coded applications that force doctors to abandon their personal record-keeping styles for that of the given EMR. Dr. Muschel knew these limitations would make any such product a nonstarter with his staff.”

Now, it is often the case that a proprietary application is not the solution. But if there is indeed real, as distinct from merely apparent, cross-physician variation in the data used (and not only in how it is used, which is not the same thing!), then whether developing a solution from scratch is more affordable than customizing an existing product is, at best, an empirical question, requiring analysis based on data management expertise that a physician, technophile or not, is not likely to possess. Proper expertise, therefore, is critical. Did Woodcrest possess it?

“Right off the bat, they could see the underlying problem with current canned products: They all relied on relational database management systems (RDMS), and plain RDBMSs weren’t built for this kind of business situation. Relational data models are inherently inflexible. To store information using the relational paradigm, you need to build the entity/relationship model first. Such a model is fixed, and if you need to add new information, you must design and code a new model.

An RDBMS-aholic might propose trying to derive the superset of all the information and processes used by all the doctors, then implement only the fields/tables needed by leaving all the fields not used blank (read “null”). But this method is impracticable and unrealistic. Alternatively, you could buy a number of available products that allow you to enhance the relational model, but they’re costly and impose serious performance hits.”

There are several indications that Woodcrest simply did not possess the necessary foundation knowledge and skills to make sound database decisions.

    • Common failure to distinguish between relational technology and SQL. The fact is that databases and DBMSs in general were invented to address precisely “this kind of business situation,” unlike XML, which was not. By representing data in an application-neutral way in the database, and providing different views of the data to different users/applications, DBMSs — relational ones in particular — achieve the desirable flexibility via logical data independence. An argument that commercial SQL-based products, by violating the relational model, are not as effective as truly relational products would have been more defensible, but note very carefully that even in that case any alternative solution would have to be proven superior. There is no evidence to that effect in the article, quite the contrary (see below).
    • Confusion of levels of representation. There is only one relational data model. What the article refers to in the plural are logical models — business models mapped to the database using the relational model as “paradigm” (see below). Whether Woodcrest realize it or not, whether they like it or not, data management is structure, integrity, and manipulation by definition, and these are exactly the components of a data model. There cannot be data management without a data model and, therefore, any product/technology purporting to do data management, whether relational or not, requires business (ER) modeling (structuring) and mapping to logical models (see Database Foundations paper #4,“Un-muddling Modeling”). It so happens that, for various reasons, the relational structure is the most cost-effective for integrity enforcement and manipulation. The notion that modeling can be avoided by not using a relational DBMS, and that business or logical models are “fixed” and must be “redone” when new information is added, is absurd and betrays lack of understanding of most basic data management concepts.
    • Equally absurd is one reason given for ruling out a SQL DBMS. It is true that SQL NULLs were a big mistake that should never have been implemented. In a true RDBMS they should not, and would not, exist. But it is simply not true that cross-physician data variations would involve tables with NULLs in SQL databases. Having vehemently rejected NULLs in general, and demonstrated that they are problematic, we relational proponents showed that they should and can be avoided via correct design (see chapter 10 in Practical Issues in Database Management). The type of NULL referred to in the article in particular — inapplicable, as distinct from unknown — is decidedly a red herring: such NULLs are an artifact of poor design (see also chapter 6 in my book), suggesting that Woodcrest does not master correct database design principles.
    • The object-oriented perspective pertains to programming, has little to do with data management, and fails to recognize the important distinction between applications and DBMSs. Object-oriented (OO) programmers without database education want to handle everything in applications, a regression to the bad old days preceding DBMSs (see Oh, Oh, Not OO Again, “OO for Application Development, Not Data Management”). Hence the pejorative term, “RDBMS-aholic” and the infamous logical-physical confusion implicit in the misuse of databases as sheer “persistence storage.”

Note: Woodcrest chose MySQL, one of the worst SQL options (see “MySQL and Innobase: Are They DBMSs, Let Alone Relational?” and “On DBMS Builders”).

    • The notion of “products improving the relational model” is baloney. There is neither a need for such improvements nor, therefore, any product that could deliver them. The relational model is simply logic applied to data management, and improving logic is a rather tall order for anybody, let alone product vendors. Anybody involved in data management should know enough to be suspicious of solutions that ignore logic, let alone promote them. What is really needed is a genuine implementation of the relational model along the lines of The Third Manifesto, using the TransRelational™ implementation model.

Ruling out SQL DBMSs just because one doesn’t know proper design, or has no database expertise, is hardly good practice, but neither the technophile physician nor the reporter knew enough to detect it.
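The two design points above — avoiding inapplicable NULLs through decomposition, and achieving per-user flexibility through views rather than schema churn — can be sketched concretely. The following is a minimal illustration using Python’s bundled sqlite3 (chosen only for convenience; SQLite is itself a SQL product, not a true RDBMS, and every table, column, and name below is hypothetical, not taken from the article):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Application-neutral base tables. Data that applies only to some patients,
# which a single wide table would represent as inapplicable NULLs, is instead
# decomposed into a separate table holding only the rows to which it applies.
cur.executescript("""
    CREATE TABLE patients (
        patient_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE cardiac_findings (   -- only patients with such findings appear here
        patient_id        INTEGER PRIMARY KEY REFERENCES patients(patient_id),
        ejection_fraction REAL NOT NULL
    );
""")
cur.execute("INSERT INTO patients VALUES (1, 'A. Jones'), (2, 'B. Smith')")
cur.execute("INSERT INTO cardiac_findings VALUES (1, 0.55)")

# A per-physician view: the base tables stay application-neutral, while each
# user sees only the derived shape needed -- logical data independence.
cur.execute("""
    CREATE VIEW cardiology_chart AS
    SELECT p.name, c.ejection_fraction
    FROM patients p JOIN cardiac_findings c USING (patient_id)
""")
rows = cur.execute("SELECT * FROM cardiology_chart").fetchall()
print(rows)  # [('A. Jones', 0.55)]
```

A physician who records additional, specialty-specific data gets additional tables and views; the base tables, and every other user’s views, are untouched. No NULLs appear anywhere: patients without cardiac findings simply have no row in that table.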

The section, “Surpass RDBMSs With XML” contains more evidence to buttress our suspicions.

“Lipschitz and Crocetti believed they had a better solution: They adapted an XML engine built originally for a stock-trading environment for derivative products. This engine combined XML with the relational paradigm, as many others have done, but went a step further and used XML for data representation and the relational model for persistence support. XML is often employed in the form of simple XML documents. However, in this case, the team extended the XML data representation to include XML/HTTP requests, XML Data Query and Sorting, and more.

The XML engine employs a service-oriented architecture, which transforms each user HTTP request into an XML request. A manager component captures the request and executes the business logic corresponding to the event received. The set of core components and XML services constitute the business logic layer, which the development team implemented with a factory pattern. This way, they can enhance and extend the functionality by plugging in new or modified components without recompiling the system. They don’t even have to shut it down.”

This is a good example of lack of clarity, confusion, and imprecise use of terms that impede understanding. What exactly does “combined XML with the relational paradigm,” or “extended the XML data representation to include XML/HTTP requests,” mean? This is the jargon regularly used in the media and industry for mainly promotional purposes, to impress the uninformed reader, and to obscure the lack of substantive knowledge on the subject. No knowledgeable data management specialist would express matters this way and no knowledgeable reporter would repeat it.

It appears that decisions were made for expediency, rather than based on sound data management considerations. As I argued in “To a Hammer, Everything Looks Like Nails,” information management decisions being made by developers is akin to architecture being done by building contractors. It tends to produce solutions based on what the developer knows, rather than optimal suitability to task. That is why we constantly beat the drums of data fundamentals: those who don’t possess such knowledge will have little to base their decisions on other than the products they already know, whether they were intended for the specific task or not. Apparently, Woodcrest chose XML not because it was the right solution (although they probably presented it, and may even have believed it to be, such), but rather because they already used it in another context and it was expedient to extend it.

Even if we are more generous than is warranted in our interpretation of the solution, we still have problems with it.

      • Programmers with an OO background but little or no database education (as distinct from training) fail to distinguish between DBMSs and applications and use databases, if at all, as a sheer physical data store (“persistence support” in OO parlance), which is, of course, an unproductive, problematic misuse.
      • The correct term would not be “representation” — that is a DBMS function — but presentation, which is in the application domain. It appears that XML is used in some fashion at the application, not the database, level. We have little to say about that, but if that were the case:
          • it certainly says nothing about XML’s superiority over RDBMSs;
          • how, then, does XML Data Query — a database function — enter the picture, and what are we to make of the following?

“Crocetti architected the engine to offer a flexible method for implementing any data structure with XML (instead of a relational model). The result was an XML engine that was able to handle a wide variety of structured data and map it transparently using underlying relational support — a key requirement to be able to reuse this XML engine in different applications, such as Dr. Muschel’s. The engine had to support creating and manipulating any set of information without requiring additional development effort.”

First, it is simply not true that XML supports any data structure. To the extent that XML is used for data management — for which it was not originally intended — the only data structure it understands is the hierarchic one, the tree, discredited decades ago as inferior to, and more complex than, the relational structure for integrity and manipulation (see The Data Exchange Tail, The XML Bug). The “wide variety of structured data” can only be interpreted to mean that text or images can be embedded in XML files by using XML tags. However, that is not structured data in the data management sense: the system does not understand what it means and cannot derive information from such data. What is more, there is nothing to prevent RDBMSs from providing such a capability and, in fact, SQL products do (e.g., BLOBs).
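The asymmetry of the tree structure is easy to demonstrate. The following toy sketch uses Python’s standard xml.etree; the document shape and every name in it are invented for illustration, not taken from the article’s engine:

```python
import xml.etree.ElementTree as ET

# The same patient data, forced into XML's single data structure: a tree.
# The hierarchy hard-wires one access path (practice -> doctor -> patient).
doc = ET.fromstring("""
<practice>
  <doctor name="Muschel">
    <patient name="Jones"><visit date="2005-01-10"/></patient>
  </doctor>
  <doctor name="Lee">
    <patient name="Smith"><visit date="2005-01-12"/></patient>
    <patient name="Jones"><visit date="2005-02-03"/></patient>
  </doctor>
</practice>""")

# Asking "which doctors has patient Jones seen?" runs against the grain of
# the hierarchy and must be answered by navigating the tree procedurally.
doctors_seen = sorted({
    doctor.get("name")
    for doctor in doc.iter("doctor")
    for patient in doctor.iter("patient")
    if patient.get("name") == "Jones"
})
print(doctors_seen)  # ['Lee', 'Muschel']
```

In a relational design the same question is one symmetric, declarative query (e.g., SELECT doctor FROM visits WHERE patient = 'Jones'): the relational structure has no bias toward any particular access path, while the tree privileges one path and makes every other one procedural.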

Second, if mapping to the “underlying relational support” is necessary anyway, how exactly is the “undesirable modeling” required by relational technology avoided? In fact, aside from the complexity of the hierarchic model itself, the problem with employing two data models when only one would do is not only that it makes everything more complex — two sets of integrity and manipulation facilities, two data languages, and so on — but also that it imposes the burden of mapping between the two, which would otherwise be unnecessary (see Database Foundations papers #1, “What First Normal Form Really Means” and #2, “What First Normal Form Means Not”). However transparent the mapping to end-users, somebody’s gotta do it, and for no good reason. It’s precisely by ignoring these aspects that proponents of XML and other nonrelational data management solutions can claim superiority.
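What that mapping burden looks like in practice can be sketched. The following is a hypothetical shredding layer of my own devising (Python with sqlite3); nothing here is taken from Woodcrest’s actual engine, and all names are invented:

```python
import sqlite3
import xml.etree.ElementTree as ET

def shred_visit(xml_text, cur):
    """Map one hierarchic <visit> document onto flat relational rows.

    This is the kind of translation code the 'transparent' mapping implies:
    somebody must write and maintain it for every document shape."""
    visit = ET.fromstring(xml_text)
    cur.execute("INSERT INTO visits (patient, date) VALUES (?, ?)",
                (visit.get("patient"), visit.get("date")))
    visit_id = cur.lastrowid
    for note in visit.iter("note"):
        cur.execute("INSERT INTO notes (visit_id, text) VALUES (?, ?)",
                    (visit_id, note.text))
    return visit_id

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, patient TEXT, date TEXT);
    CREATE TABLE notes  (visit_id INTEGER REFERENCES visits(visit_id), text TEXT);
""")
vid = shred_visit('<visit patient="Jones" date="2005-01-10">'
                  '<note>BP stable</note></visit>', cur)
print(cur.execute("SELECT text FROM notes WHERE visit_id = ?", (vid,)).fetchall())
```

And this is only half of the layer: a second, inverse function is needed to reassemble documents on the way out, and both must be kept in step with every change to either the document shapes or the tables.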

The rest of the article is even more difficult to understand, but still does not provide evidence for the stated claims. We give up. We do, however, have some comments about the media.

Thé refers to “the typical group medical practice,” but offers no evidence that the case chosen is, indeed, typical, nor does he say how and why he chose it. In reality, cases are selected based on various obscured relations/interests/preferences of the publication, reporter, user company, vendor(s) and consultant(s) involved; often it is just a matter of convenience, or access. There is little reason to assume that cases are representative, let alone significant, or worthy of serving as examples.

I often offer evidence in my writings that what editors usually prefer — almost exclusively — are pieces about commercial technologies or products pushed by one or more vendors (preferably large ones with big advertising budgets). Whether salaried or freelance, to be successful — by which I mean being published and getting paid — writers internalize what they sense editors want and focus on that. Thus, while reporters insist that they feel no pressure, there is no need for it (even though it does occur, and I can personally vouch for that). In the case of the article under consideration, .NET, a Microsoft technology, and XML, pushed by everybody, fit the bill and will pay for scuba diving.

The trade media usually rationalizes this practice by claiming, “what big vendors such as Microsoft do is important and, therefore, must be covered.” This is self-serving, but it would be acceptable if the coverage were analytical and informed, rather than superficial, ignorant of the subject matter, and mere regurgitation without scrutiny. What gets published is driven not by efforts to understand and assess technology and products — for which knowledge of the subject matter is required — but rather by the aim of providing visibility, preferably positive.

--

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for 20 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on data management technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCS, and IRS. He is founder, editor and publisher of Database Debunkings, a Web site dedicated to dispelling persistent fallacies, flaws, myths and misconceptions prevalent in the IT industry. Together with Chris Date he has recently launched the Database Foundations Series of papers. Author of three books, he has published extensively in most trade publications, including DM Review, Database Programming and Design, DBMS, Byte, Infoworld and Computerworld. He is author of the contrarian columns Against the Grain and Setting Matters Straight for The Journal of Conceptual Modeling. His third book, Practical Issues in Database Management, serves as text for his seminars.




Contributors : Fabian Pascal
Last modified 2005-04-12 06:21 AM