# DBAzine.com

##### Personal tools
You are here: Home » Catching the Next Bus
Seeking new owner for this high-traffic DBAzine.com site.
##### Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]

# Catching the Next Bus

by Joe Celko

This article will discuss a classic logic-and-probability problem. Suppose a man randomly runs to the bus station and catches the next bus leaving the station. Assume there are only two bus routes in this town, and call them “A” and “B.” The schedule for each route spaces the bus trips one hour apart. Your first thought may be that the random traveler would spend approximately equal time on both routes. But the truth is that he spends far more time on route “A” than on route “B.” Why?

The “A” bus leaves on the hour, and the “B” bus leaves at five minutes after the hour. In order to catch the “B” bus, the traveler has to be in the station between the hour and five after the hour. The rest of time, he will sit and wait for the “A” bus to arrive on the hour.

The problem of finding the next bus leaving the station is a fairly easy bit of SQL. Here is a simple table for an imaginary bus line schedule. It gives the route number and the departure and arrival times for one day, without regard to the departure or destination points:

```CREATE TABLE Schedule
(route_nbr INTEGER NOT NULL,
depart_time TIMESTAMP NOT NULL,
arrive_time TIMESTAMP NOT NULL,
CHECK (depart_time < arrive_time),
PRIMARY KEY (route_nbr, depart_time));

INSERT INTO Schedule VALUES
(3, '2006-02-09 10:00', '2006-02-09 14:00'),
(4, '2006-02-09 16:00', '2006-02-09 17:00'),
(5, '2006-02-09 18:00', '2006-02-09 19:00'),
(6, '2006-02-09 20:00', '2006-02-09 21:00'),
(7, '2006-02-09 11:00', '2006-02-09 13:00'),
(8, '2006-02-09 15:00', '2006-02-09 16:00'),
(9, '2006-02-09 18:00', '2006-02-09 20:00');```

If you want to catch the next bus at “2006-02-09 15:30,” you should return (route_nbr = 4). If the time is “2006-02-09 16:30,” you should return route numbers 4 and 9, since both of them will depart at the same time.

Okay, how can we solve the problem? The computational way to do so is to find the departure times that occur on, or after the time you arrived at the station:

```SELECT E1.event_id, E1.depart_time, E1.arrive_time
FROM Events AS E1
WHERE E1.depart_time
= (SELECT MIN(E2.depart_time)
FROM Events AS E2
WHERE :my_time <= E2.depart_time);```

This will work fine because the primary key should give you fast access to the departure time column for computing the MIN().

The next question that comes to mind is, can we find a way to do this without computations?

Let’s try adding a column for the waiting period before the departure time in a given day. Route 3 is the first bus out, so if you get to the station any time between midnight and 10:00 Hrs on February 9, that is the next bus out. If you get to the station between 10:00 and 11:00 Hrs, you will take Route 7 and so forth.

```CREATE TABLE Schedule
(route_nbr INTEGER NOT NULL,
wait_time TIMESTAMP NOT NULL,
depart_time TIMESTAMP NOT NULL,
arrive_time TIMESTAMP NOT NULL,
CHECK (depart_time < arrive_time),
PRIMARY KEY (route_nbr, depart_time));```

Now the table looks like this:

```INSERT INTO Schedule VALUES
(3, '2006-02-09 00:00', '2006-02-09 10:00', '2006-02-09 14:00'),
(7, '2006-02-09 10:00', '2006-02-09 11:00', '2006-02-09 13:00'),
(4, '2006-02-09 15:00', '2006-02-09 16:00', '2006-02-09 17:00'),
(5, '2006-02-09 16:00', '2006-02-09 18:00', '2006-02-09 19:00'),
(6, '2006-02-09 18:00', '2006-02-09 20:00', '2006-02-09 21:00'),
(8, '2006-02-09 11:00', ‘2006-02-09 15:00', '2006-02-09 16:00'),
(9, '2006-02-09 16:00', '2006-02-09 18:00', '2006-02-09 20:00');```

The query can now be done without a subquery:

```SELECT E1.event_id, E1.depart_time, E1.arrive_time
FROM Events AS E1
WHERE :my_time BETWEEN E1.wait_time AND depart_time;```

Will this run better? Well, it will require an extra timestamp for each row in the schedule. That means (365 days per year * (n) routes in the system * size of a timestamp) extra storage from the first version of the table; this is not that bad. In the case of a bus schedule, you can assume that it will remain constant for at least several months. The real question is, how much trouble is computing the wait time? It will be about the same as running the first query once and using the data as part of an UPDATE or INSERT.

It only takes a little adjusting of your own mindset to start seeing table-driven answers for database problems.

--

Joe Celko was a member of the ANSI X3H2 Database Standards Committee and helped write the SQL-92 standards. He is the author of over 750 magazine columns and seven books, the best known of which is SQL for Smarties, now in its third edition. For more articles, puzzles, resources, and more, go to www.celko.com.

Contributors : Joe Celko