The "Future of DBMS" - Part 2

by Fabian Pascal

Part 1 | Part 2

This is the second and concluding part of my comments (which are prefixed with FP) on an exchange on The Future of RDBMS in a section so named at Devshed).

Binky: I see your point about OO databases, SQL is easy to learn and can be put together to form very complicated statements in one line that conventional programming would take lots of lines to do. One problem though is that SQL does tease people into doing long lined queries that when you go back to them are difficult to follow - they work, yes, but you can't break them up. Yes you can do transactions, but people don't always do use them when they should.

I'm talking from a PHP point of view, dealing with an SQL database when I'm thinking of blobbing some XML. As everything would be processed afterwards then some extra XML processing wouldn't matter. It would also mean that different entries could have different fields (as it were) so the data setup is not so set in stone. This is essentially an RDB though with an added twist.

I think that with the world looking towards less complicated programming procedures then OODBs will not replace SQL driven RDBs. Fourth generation language superseded by third, that's a backwards step in my book.

FP: Another excellent example of the damage to database practice done by seeing everything solely via the perspective of products and industry technologies such as PHP and XML, without any awareness of the fundamentals (see my editorials at Database Debunkings and Date's "Why Is It Important to Think Clearly?" in Relational Database Writings 1994-1997)).

Andnaess: Of course, Date argues that SQL is an ad-hoc solution and completely useless, and that it should be replaced by something much better.

Have you ever done any relational algebra? If you have, writing long SQL statements is very easy, even if they grow very big. SQL doesn't tease people into doing anything. You query for the data you want, nothing more and nothing less.

Transactions are not meant to make queries easier to read, transactions are something you use to ensure data integrity. You don't really mean to say that people should use transactions to make queries easier to read do you?

FP:

"Completely useless" is perhaps a too strong characterization. Date does better than that: in The Third Manifesto: he and Darwen propose and outline Tutorial D, a truly relational data language that does not suffer from SQL's deficiencies;
SQL can be nastily procedural, complex, and hard to optimize, particularly if databases are undernormalized, a majority of which are (see chapter on normalization in Practical Issues in Database Management). Knowledge of relational algebra helps, but SQL has many flaws that even that cannot solve.

Binky: Well, I must be doing it wrong then ... So if I have a SELECT nested inside a SELECT nested inside another SELECT, I should do it like that instead of taking the information from SELECT one into a variable then using that ... oh, it's not SQL, is it? It's programming, maybe I should be doing OO databases.

Transactions, though ahh, now ... erm let me see ... Right, an example. If I want to update lots of tables then instead of joining them all in a massive long SQL statement, why not use a lot of single updates inside a transaction?

FP: this speaks for itself, no comments are necessary (again, see "Why It Is Important to Think Clearly").

Rycamor: No one ever said that SQL should be used entirely without programming. And yes, while SQL has its problems, it is the best we have at the moment.

If you have lots of SELECTs nested inside other SELECTs, then there is probably another, better way to get at your data. Re-examine your queries, or even your data structure itself. See some further discussion on trees and SQL here.

I personally am not the biggest fan of tree structures. I know that they are sometimes useful, but I don't see why programmers consider them so intuitive. For example, I think it is a pain to constantly navigate through a file heirarchy in Windows Explorer. Yes, that's why shortcuts were invented, but that is just a band-aid solution that leads to a proliferation of unorganized icons on your desktop. So IMHO, nested tree views of information definitely have their place, but when information becomes endlessly nested, I question its usefulness to the user. Maybe the structure of what is being presented to the user needs to be rethought. I rarely see the use for more than three to five levels.

And trees are not always that great for mapping to the "real world." Actual trees have either branches or end nodes (leaves). But the forking of each branch doesn't really constitute a node, with it's own information (leaf). (Not to mention, the tree's "root" is not singular, but composed of many branchings in the other direction). And the analogy often gets worse when trying to deal with more complex real-world issues. In the real world, things have a way of re-connecting back into the structure in many ways, which can't be represented by a tree, and which deftly "skip" many levels of a hierarchical model. (Secretary is pretty low in the company hierarchy, but often can get you in to talk to the boss without going through all the other pointy-heads).

There's nothing wrong with using a programming language to loop through multiple result sets, if it accomplishes your purpose. For example, if you want a multidimensional array, you won't get that from any one SQL query, so there is no reason not to use a programming language to extract that into a multidimensional array and work with it. I left a (very) simple example of this in the code snippets at SourceForge.

But the question the relational data experts would tell you to ask yourself is, "why do I need that multidimensional array?" Maybe there is another way. In the end, people do process information "flatly." When you click on different folders in the Explorer left hand pane, to open up this or that folder, you are still reading the folder names as a flat list, in actuality, even if you read from the root level down into a lower directory. You can't read more than one thing at a time. Queries accomplish the same thing.

FP:

I generally agree that a proliferation of sub-SELECTs may suggest problems elsewhere. However, the subquery feature itself is a SQL problem, which could have been prevented. Codd, while at IBM, warned that the way in which this feature was implemented would make SQL a redundant language -- a query can be expressed in more than one way -- causing both usability and performance/optimization problems. In 1987, I published the results of comparing five SQL PC DBMSs on the performance of one query expressed seven ways: the execution times ranged from 2 seconds to 2500 seconds!
The fact of the matter is that the hierarchic structure does not occur in the real world as frequently as proponents of the hierarchic approach would have you believe. Consequently, hierarchic structures must be forced on data to represent realities that are not inherently hierarchic in character. On the other hand, the relational model can be used to represent both hierarchic and non-hierarchic realities -- that is, it is more general -- and do so with much more simplicity for users and product optimizability for vendors.

Andnaess: Re: "Well I must be doing it wrong then ..." : I guess all you need is a very rudimentary preprocessor. Why not write one?

Re: "Transactions, though ahh, now...erm let me see ..." :Forgive my ignorance, but what DBMS allows you to update several tables by joining them? You have to use a lot of single updates, and if the constraints are such, do them inside a transaction.

MattR: Transact-SQL supports updating JOINed tables (although I think it is limited to updating a single table at once):

UPDATE sometable
   SET is_active = 0
  FROM sometable,
       inactive_parts
 WHERE inactive_parts.id = sometable.id
   AND inactive_parts.type = "brake_pads"

You can also use subqueries and such instead of a join; however, it still is that you can only update a single table. Updating multiple tables would be a nightmare although they are looking at adding it. I think maybe (in relation to transactions) he was thinking something like this:

BEGIN TRAN
  -- Update bob's entries
  UPDATE sometable
     SET owner = "Bob"
   WHERE owner = (SELECT owner_name
                  FROM owner
                  WHERE prev_owner = "Joe")
  -- Now do other updates
COMMIT TRAN

FP: The concept of updating a "JOINed single table at once" is a rather odd expression. In the first example, the JOIN is irrelevant -- only one table is updatable, period. And the second example does not involve any JOINs. The important issue is updatability of multi-table views, whether join-, or otherwise. SQL and its commercial implementations do not allow the updating of such views because SQL, by failing to adhere to relational principles and support constraint inheritance in general, and key inheritance in particular, makes it impossible for a DBMS to figure out how to propagate view updates to the underlying base tables correctly (see the integrity chapter in Practical Issues in Database Management). The vendors have simply chosen to disallow such updates, to prevent corruption. This of, course, robs SQL technology of one of the most important practical benefits from relational technology: logical independence, forcing appliactions to update the base tables directly and, thus, imposing a maintenance burden if those tables change.

(For how a true RDBMS would update multitable views watch the DATABASE DEBUNKINGS space for some forthcoming articles by Chris Date.)

Tgrignon: I wholeheartedly agree with the vein of this posting in regard to OO databases. Let's look at this another way: what are the benefits of doing OO-programming? If you knock out the intellectual or dork prestige factor of being a gOOrOO, you've got the big advantage of modularity. Sure there are other advantages but the big one for me is the Lego block approach to programming (and I've always loved Lego).

SQL, flawed as it is, is relatively standardized. I can take SQL scripts from an Oracle or Postgre or MySQL site and use the logic with any SQL-capable database [sic]. Sure there are a few syntax things (that's how we make the big ... well that's how we make bucks anyway, right?), but by and large SQL99 compliant code is a standard that cuts through a lot of boundaries. I can even give users a basic SQL window to play with, I wonder how much unnecessary coding in an OOdb that would take!

FP: OO programming is exactly right -- see OO for Application Development, Not Database Management in the Against the Grain series. Program modularity is a good thing, but has nothing to dao with data management per se.

Fabian Pascal has a national and international reputation as an independent technology analyst, consultant, author and lecturer specializing in data management. He was affiliated with Codd & Date and for more than 15 years held various analytical and management positions in the private and public sectors, has taught and lectured at the business and academic levels, and advised vendor and user organizations on database technology, strategy and implementation. Clients include IBM, Census Bureau, CIA, Apple, Borland, Cognos, UCSF, IRS. He is founder and editor of Database Debunkings, a web site dedicated to dispelling prevailing fallacies and misconceptions in the database industry, where C.J. Date is a senior contributor. He has contributed extensively to most trade publications, including Database Programming and Design, DBMS, DataBased Advisor, Byte, Infoworld, and Computerworld and is author of the contrarian column Against the Grain. His third book, Practical Issues in Database Management - a Guide for the Thinking Practitioner, serves as text for a seminar bearing the same name.

Contributors : Fabian Pascal
Last modified 2006-01-04 02:03 PM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

The "Future of DBMS" - Part 2