Web Databases

by Joe Celko

An American thinks that 100 years is a long time; a European thinks that 100 miles is a long trip. How you see the world is relative to your environment and your experience. We are starting to see the same thing happen in databases, too.

The first fight has long since been over and SQL won the battle for a standard database language. However, if you look at the actual figures, only 12 percent of the world's data is in SQL databases. If a few weeks is supposed to be an "Internet Year," then why is it taking so long to convert legacy data to SQL? The simple truth is that you could probably pick any legacy system and move its data to SQL in a week or less. The trouble is that it would require years, maybe decades, to convert the legacy applications code to a language that could use the SQL database. This is not a good way to run a business.

The trend over the past several years is to do new work with an SQL product, and try to interface to the legacy systems for any needed data until you can kill the old system. There are any number of products that will make an IMS, IDMS, TOTAL, or flat file system look like a set of SQL tables (note to younger readers: if you do not know what those products are, look around your shop and ask the programmer who is still using a slide rule instead of a calculator).

We were comfortable with this situation. In most business reporting programs, you write a preamble to set up the report, a loop that goes over a cursor, and a post-amble to do the housekeeping. The hard part is getting the query in the cursor just right. What you want is to make the result set from the query look as if it were a very simple sequential file that had all the data required, already sorted in the right order for the report.
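As a rough illustration of that shape, here is a minimal sketch in Python using the standard sqlite3 module. The Sales table, its columns, and the function names are all assumptions made up for the example; nothing here comes from a particular product. The point is only that the query does the joining, grouping, and sorting, so the host-language loop has almost nothing left to do.

```python
import sqlite3


def build_report(conn):
    """Return report lines: a preamble, one cursor loop, and a post-amble."""
    lines = []
    # Preamble: set up the heading and the running total.
    lines.append("REGION     CUSTOMER        AMOUNT")
    grand_total = 0
    # The query carries the difficulty: it groups and sorts, so the loop
    # sees what looks like a simple, pre-sorted sequential file.
    cursor = conn.execute(
        """
        SELECT region, customer, SUM(amount) AS total
          FROM Sales
         GROUP BY region, customer
         ORDER BY region, customer
        """
    )
    for region, customer, total in cursor:
        lines.append(f"{region:<10} {customer:<12} {total:>9}")
        grand_total += total
    # Post-amble: emit the total line and release the cursor.
    lines.append(f"{'TOTAL':<23} {grand_total:>9}")
    cursor.close()
    return lines


def demo_connection():
    """Build a throwaway in-memory database with a hypothetical Sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (region TEXT, customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?, ?)",
        [("West", "Baker", 40), ("East", "Adams", 100),
         ("East", "Adams", 25), ("West", "Cole", 10)],
    )
    return conn


if __name__ == "__main__":
    for line in build_report(demo_connection()):
        print(line)
```

If the report needs a different sort order or another level of grouping, the change goes into the SELECT statement; the loop stays the same.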

Years ago, a co-worker of mine defined the Law of Conservation of Difficulty. Every system has a minimum degree of difficulty, and you cannot put out less effort than is required to overcome that degree of difficulty to solve the problem. You can put out more effort, to be sure, but never less effort. What SQL did was sweep all the difficulty out of the host language and concentrate it in the queries. This situation was fine, and life was good. Then along came the Internet. There are a lot of other trends that are changing the way we look at databases — data warehouses, small machine databases, non-traditional data, and so on — but let's start with the Internet databases first.

Application database builders think that handling 1,000 users at one time is scalability; Web database builders think that a terabyte is a large database.

In a mainframe or client-server database shop, you know in advance the maximum number of terminals or workstations that can be attached to your database. And if you don't like that number, you can disconnect some of them until you are finished doing batch processing jobs.

The short-term fear in a mainframe or client-server database shop is of ad hoc queries that can exclude the rest of the company from the database. The long-term fear is that the database will outgrow the software or the hardware or both before you can do an upgrade.

In a Web database shop, you know in advance what result sets you will be returning to users. If a user is currently on a particular page, then he can only go to the previous page, or one of a (small) set of following pages. It is an old-fashioned tree structure for navigation. When the user does a search, you have control over the complexity of this search. For example, if I get to a Web site that sells antique comic books, I will enter the Web site at the home page 99.98 percent of the time instead of going directly to another page. If I want to look for a particular comic book, I will fill out a search form that forces me to search on certain criteria — I cannot look for "any issue of Donald Duck with a lot of Green on the Cover" on my own if cover colors are not one of the search criteria.
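One way to picture that control over search complexity is a small whitelist that maps form fields to fixed predicates. The sketch below is only an assumption about how such a form handler might look: the Comics table, the field names, and the `search_comics` function are invented for the example. Anything not on the list, such as a cover color, is simply rejected.

```python
import sqlite3

# Hypothetical whitelist: form field name -> SQL predicate with one placeholder.
ALLOWED_CRITERIA = {
    "title": "title = ?",
    "issue": "issue_nbr = ?",
    "max_price": "price <= ?",
}


def search_comics(conn, form):
    """Run a search using only the criteria the form designer allowed."""
    predicates, params = [], []
    for field, value in form.items():
        if field not in ALLOWED_CRITERIA:
            # The user cannot invent a criterion the site did not offer.
            raise ValueError(f"unsupported search criterion: {field}")
        predicates.append(ALLOWED_CRITERIA[field])
        params.append(value)
    where = " AND ".join(predicates) if predicates else "1 = 1"
    # Only whitelisted predicate text reaches the SQL string; the user's
    # values travel separately as bound parameters.
    sql = (
        "SELECT title, issue_nbr, price FROM Comics "
        f"WHERE {where} ORDER BY title, issue_nbr"
    )
    return conn.execute(sql, params).fetchall()


def demo_catalog():
    """Build a throwaway in-memory catalog with a hypothetical Comics table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Comics (title TEXT, issue_nbr INTEGER, price INTEGER)")
    conn.executemany(
        "INSERT INTO Comics VALUES (?, ?, ?)",
        [("Donald Duck", 26, 45), ("Donald Duck", 30, 80), ("Superman", 1, 100000)],
    )
    return conn


if __name__ == "__main__":
    catalog = demo_catalog()
    print(search_comics(catalog, {"title": "Donald Duck", "max_price": 50}))
```

Because the set of predicates is fixed in advance, the set of query shapes the database will ever see is fixed too, which is what lets a Web shop plan its indexes around known result sets.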

What the Web database fears is a burst of users all at once. There is not really a maximum number of PCs that can be attached to your database. In Larry Niven's science fiction novels, there are cheap teleportation booths all over the planet. You step inside one, put in your credit card, dial the number of your destination and suddenly you are in a receiving booth at your destination. The trouble is that when something interesting happens and it appears on the worldwide television system, you get "flash crowds" — all the people in the world who like to look at car wrecks show up in one place all at once.

If you get too many users trying to get to your Web site at once, the Web server crashes. This is exactly what happened to the Encyclopedia Britannica Web site the first day that they offered free access.

I must point out that virtually every public library on Earth has an encyclopedia set. Yet, you have never seen a crowd form around the reference books and bring the library to a complete halt. Much as I like the Encyclopedia Britannica, they never understood the Web. They first tried to ignore it, then they tried to sell a subscription service, then when they finally decided to make a living off of advertising, they underestimated the demand.

Another difference between an application database and a Web database is that an application database is not altered very often. Once you know the workloads, the indexes are seldom changed, and the tables are not altered very much.

In a Web database, you might suddenly find that one part of the database is all that anyone wants to see. If my Web-enabled comic book shop gets a copy of SUPERMAN #1, puts the cover on the Web, and gets listed as the "Hot Spot of the Day" on Yahoo! or another major search engine, then that one page will get a huge increase in hits.

Another major difference is that the Internet has no SQL-style transaction model. Once a user is connected to an SQL database, the system knows who he is, his privileges, and a history of his session.

The Web site has to confirm who you are with every action you take and has no concept of your identity or history. It is like a bank teller with brain damage who has to ask for your account number and identification for each check you deposit, even though you are standing in front of them. Cookies are a partial answer. These are small files with some identification data in them that can be sent to the Web site along with each request. In effect, you have put your identification documents in a plastic holder around your neck for the bank teller to read each time. The bad news is that a cookie can be read by virtually anyone else and copied, so it is not very secure.
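To make the bank teller analogy concrete, here is a minimal sketch using Python's standard http.cookies module. The in-memory `SESSIONS` table and the `log_in` and `identify` functions are assumptions for illustration, not any real framework's API. Every request has to present the cookie again, and the server looks the visitor up from scratch each time.

```python
from http.cookies import SimpleCookie
import secrets

# Hypothetical server-side session table: session id -> user name.
SESSIONS = {}


def log_in(user):
    """Create a session and return the name=value pair to send as a cookie."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = user
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return cookie["session_id"].OutputString()


def identify(cookie_header):
    """Work out who sent a request from its Cookie header, or return None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)


if __name__ == "__main__":
    header = log_in("alice")
    # Each request repeats the same lookup; nothing is remembered in between.
    print(identify(header))
    print(identify(header))
    # A request without the cookie is a stranger again.
    print(identify(""))
```

Note that `identify` has no way to tell who is holding the cookie. Anyone who obtains a copy of the header value is recognized as the same user, which is the security weakness described above.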

Right now, we do not have a single consistent model for Web databases. What we are doing is putting an SQL database on the back end, a Web site tool on the front end, and then doing all kinds of things in the middle to make them work together. I am not sure where we will sweep the Difficulty this time, either.

Joe Celko was a member of the ANSI X3H2 Database Standards Committee and helped write the SQL-92 standards. He is the author of over 450 magazine columns and four books, the best known of which is SQL for Smarties (Morgan Kaufmann Publishers, 1999). He is the Vice President of RDBMS at North Face Learning in Salt Lake City.

Contributors : Joe Celko
Last modified 2005-04-20 10:18 AM

DBAzine.com
