Database Changes Required for IMS 7.1 and HALDB
IMS Databases through the Years
For the past few decades, the IMS database arena has been very quiet. The last major enhancements were the addition of VSAM support in the mid-1970s and the introduction of Fast Path databases in the late 1970s. OSAM's size limit was raised to 4 GB in the 1980s and again to 8 GB in the 1990s.
The new century starts with a big bang. IMS 7.1 introduces high availability large databases (HALDB), and the analogy to a computer in a famous Arthur C. Clarke novel is probably purely accidental. When I was confronted with HALDB for the first time, I considered spelling it "HELLDB" because of the amount of work that this would create.
The concept of partitioning databases is not new. Partitioning lets you store a very large amount of data in a single database. Storing large amounts of data has always been a problem, but little was done about it because disk storage was expensive. The price of disk storage has dropped in the last five years, however, and the growth of business, particularly e-business, has been dramatic over the same time frame, so there is more and more pressure to keep ever larger amounts of data online in databases.
Several preliminary solutions to the storage problem were invented. Some customers replicated database definitions and split the data across them by application key ranges. The first vendor solution came from NEON. While it allowed applications to remain unchanged, it had some limitations. HALDB is the first complete solution for partitioning. The whole idea behind partitioning is to store more data without changing the applications.
The question that arises is "why do we need a new database type to store more data?" Fast Path DEDBs have been available for over 20 years, but DEDBs haven't received widespread recognition. When Fast Path was introduced, it did not blend in very well with the existing infrastructure, which gave people the idea that it was not very useful. This impression was reinforced by the fact that Fast Path did not support batch applications. We had plenty of batch applications in the past, and I consider this the major failing of DEDBs. Don't get me wrong, I have nothing against DEDBs; they simply never got enough recognition. Some people did choose DEDBs for their additional features, such as 24x7 support or SDEPs.
HALDB incorporates some of the best parts of DEDBs. HALDB allows you to have portions of the database offline without hurting online database access too much. This is where NEON's solution has one of its weak points: it blocks access to the entire database even when only one partition should be offline. DEDBs and NEON's partitioning also store the information on how the partitions are separated (such as key ranges) in the database description (DBD). This means that a change in the data separation is a DBD change and, thus, requires complex change management.
HALDB removes this obstacle by storing the partition information in the DBRC RECON data sets. Because the partition definition data is stored in the RECON data sets, DBRC is now a required component. Actually, there is no HALDB without DBRC. I consider this to be great, since I am known as a DBRC fanatic. If DBRC were required on all databases, life would be easier and people would finally quit shooting themselves in the foot on purpose.
HALDB Database Structure
So far we have not talked about HALDB at all, so let's do that now. The good news is that if you understand HDAM and HIDAM, you will understand HALDB. If those database types are foreign to you, stop reading and get some education. Rumor has it that IMS is still being taught.
HALDB consists of a database structure definition, which is still done the old-fashioned way in a DBD. There are some small changes, but a segment is still a segment, and the data structures are still hierarchical. What has been removed from the DBD is the description of how those databases are stored. The description has not been removed entirely, though, because we still have the concept of data set groups and which segments go into which data set group. What those data sets actually look like is now defined by the partitions. Since the data set structure is defined in the DBD, all partitions have the same structure: if you define your structure with two data set groups, each partition must have two data set groups. HALDB also has some other data sets, as we will see.
The partitions and their data sets are no longer defined in the DBD. This is where DBRC comes into play: the definitions are stored in the RECON data sets. Because we already trust DBRC as the keeper of the database status - at least I do - this is the right place to keep the information about the data storage. Think about all these wonderful advantages. For example, using DBRC for this kind of data saves us from forgotten ACBGENs, to name just one of the notorious "shoot yourself in the foot" items.
The online IMS system knows about the partitions and the entire database. However, the partitions, not the database as a whole, are authorized. This is why the whole thing works: DBRC authorizes at the partition level, and IMS schedules at the database level. If you take a partition offline, you have not affected the database, and all your transactions can still be scheduled.
We have not touched indexing yet. Unfortunately, there is bad news. If you convert a HIDAM database to HALDB, you will not have a primary index DBD any more. Some people (like me) consider this good news. Because partitioning is done by key ranges, every partition has a primary index.
Secondary indexes are different because they must span the entire database. Even though secondary indexes have their own partitioning (independent of the primary database key ranges), they span the entire primary database. Any time you took the secondary index down for maintenance, you would stop the primary database as well. If this were how HALDB worked, all improvements would be nil. That is why HALDB does not work this way: you never need to take a secondary index down for maintenance, because there is no maintenance to do. This is due to a new pointer type called the EPS (extended pointer set). Classical pointers were 4-byte data elements containing RBAs; an EPS is 28 bytes. But before talking about the EPS, let's look at the segments first.
Don't get scared by the thought of segment changes. Nothing will happen to your data; we IMS nerds are only interested in the prefix. We consider the data a necessary evil. In HALDB, the segment prefix has become slightly larger, and it now always contains a physical parent (PP) pointer. In the past, the PP pointer was needed only if the database had logical relationships or secondary indexes, and even then it was only in those segments that absolutely needed it. Now all segments have a PP pointer, no matter what.
The prefix contains a new field: the ILK (indirect list key). It is 8 bytes and contains a unique number for that segment type in that partition. If you look closely, it contains a 4-byte RBA and two 2-byte fields. The first of the two fields probably looks familiar, as it reflects the partition number you are currently in. The second one is not so familiar; it is called the "reorg number." This is a very important number, and it is worth spending some time on.
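To make the layout concrete, here is a minimal sketch of the 8-byte ILK as described above: a 4-byte RBA plus two 2-byte fields for the partition number and the reorg number. The class name, field names, and byte order are my own illustrative assumptions, not the actual IMS control-block layout.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class ILK:
    """Illustrative model of the 8-byte ILK: RBA + partition + reorg number."""
    rba: int               # 4-byte RBA at the time the segment was created
    partition_number: int  # 2 bytes: the partition the segment was inserted into
    reorg_number: int      # 2 bytes: the partition's reorg number at insert time

    def pack(self) -> bytes:
        """Serialize to the 8-byte form (big-endian, as a mainframe would)."""
        return struct.pack(">IHH", self.rba, self.partition_number, self.reorg_number)

    @classmethod
    def unpack(cls, raw: bytes) -> "ILK":
        return cls(*struct.unpack(">IHH", raw))

ilk = ILK(rba=0x0001F400, partition_number=3, reorg_number=1)
assert len(ilk.pack()) == 8                # 4 + 2 + 2 bytes
assert ILK.unpack(ilk.pack()) == ilk       # round-trips cleanly
```

The point of the sketch is only the arithmetic: 4 + 2 + 2 bytes gives the 8-byte field, and the two small fields explain why the ILK can identify both where a segment was born and when.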
One fact of life for a segment in a database is that it never changes its stored location; IMS nerds call this location the RBA of the segment. The RBA never changes unless you reorganize the database. During a reorganization, the entire database is recreated, which means that all segments are likely to land in new places, and all references to those segments must be changed. If those references are entirely inside the database, that is not a major problem; the reorganization recreates the entire database and all its internal cross references. The trouble is references coming from outside the database, such as logical relationship pointers and secondary index pointers. These pointers point into your freshly reorganized database, where all the segments have changed locations (RBAs). Since those references are also RBAs, they are now wrong, and somehow they must be updated. In the classic reorganization, we created a WF1 data set and used its data to recreate the secondary indexes and to update the logical relationship pointers.
Forget about this process; HALDB does not work this way any more. There is no WF1. If there's no WF1, how do those references get updated? The answer is that they don't need to be. They resolve themselves when needed; this is also known as "self-healing pointers."
At this point, I have to admit that I have not told the entire story yet. So how does this work? Remember the ILK and that funny number called the reorg number. You will also see it in some other places in the database. The first block in a database is the bitmap block (OK, VSAM nerds, it's the second). Some of you may remember that the FSAP in this block was not really an FSAP; it actually contained the DUI, the place where the DBRC token was kept to indicate which RECON is responsible for this database. Unfortunately, this information was never used properly (remember, I am the DBRC fanatic). So, this field was reused to contain the partition number and the reorg number.
Once you run the first reorg, the reorg number changes. What effect does this have on the ILK? All segments that you insert into the reorganized partition will carry the new reorg number; all the old segments keep their old ILK. The bright guys have figured out by now that the ILK must be unloaded and passed on to the reload; how else would the reload process know the ILK of an existing segment? Let's make a simple statement: the ILK is unique for a segment for the lifetime of that segment. No other segment of the same segment type will ever have the same ILK.
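The rule in that statement can be sketched in a few lines: the ILK is assigned once, at insert time, from the partition's state at that moment, and a later reorg only bumps the partition's reorg number without touching existing ILKs. The function and variable names here are illustrative, not real IMS interfaces.

```python
def assign_ilk(rba: int, partition_number: int, reorg_number: int) -> tuple:
    """Build an ILK (rba, partition, reorg) once, at segment insert time."""
    return (rba, partition_number, reorg_number)

partition_reorg_number = 1
old_segment_ilk = assign_ilk(0x1000, 3, partition_reorg_number)

partition_reorg_number += 1    # a reorganization bumps the partition's number
new_segment_ilk = assign_ilk(0x2000, 3, partition_reorg_number)

assert old_segment_ilk == (0x1000, 3, 1)   # the old segment's ILK is untouched
assert new_segment_ilk[2] == 2             # new inserts carry the new reorg number
```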
Now that we have figured out what the ILK is, let's go back to the new pointer type, the EPS. This pointer is used whenever a reference has to be made from outside the database, such as from a secondary index. So the secondary index pointers are now EPS pointers. Logical relationship pointers work the same way. HALDB puts some limitations on logical relationships and supports only physical pairing and unidirectional relationships. That way the only pointer needed is the logical parent (LP) pointer, and - guess what - LPs are now EPS pointers as well.
EPS pointers are 28 bytes. I will not get into the nitty-gritty details, but an EPS contains the ILK of the target segment, the partition number and reorg number of the target partition at the time the pointer was created, and the RBA of the target segment. By looking at the pointer, we can determine which partition to go to. We can also determine whether the partition has been reorganized since the EPS was created: the target partition carries its current reorg number, and our EPS carries the reorg number from the time the pointer was created. If the two match, we can use the RBA in the EPS to go straight to the target segment. Everything is fine, and all participants are happy. But if the target partition was reorganized, that RBA will not work. We have an indicator for that: the reorg number in the EPS and the reorg number in the target partition do not match. So, what now?
It is now time to introduce the last piece of HALDB. There is a new data set called the ILDS (indirect list data set), and there is one for every partition. It is a VSAM KSDS. The ILDS is keyed by the ILK of every segment in the partition that can be a pointer target, and each record holds that segment's current location. The ILDS is updated when the partition is reorganized, so it always contains the most recent information on how to find any segment in the partition by its ILK. Again, luck is on our side: we know the ILK of the target segment because it was stored in the EPS. And remember, the ILK of a segment never changes, and it is unique.
So, we are looking at our EPS knowing that our RBA pointer to the target segment is no longer valid. We go to the target partition's ILDS, look up that segment by its ILK, and retrieve the new segment RBA. To make the next access fast, we plug the new RBA into the EPS and update its reorg number to the current reorg number of that partition. If the search is rerun, we go straight to the stored RBA, because it is valid again.
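The whole self-healing sequence can be summarized in a small sketch: compare reorg numbers; on a mismatch, look the segment up in the ILDS by its ILK; then heal the EPS in place. The classes and the dict standing in for the VSAM KSDS are my own simplifications for illustration, not the real IMS structures.

```python
class Partition:
    def __init__(self, number: int, reorg_number: int):
        self.number = number
        self.reorg_number = reorg_number
        self.ilds = {}       # ILK -> current RBA (models the ILDS, a VSAM KSDS)
        self.segments = {}   # RBA -> segment data

class EPS:
    def __init__(self, ilk, partition_number, reorg_number, rba):
        self.ilk = ilk
        self.partition_number = partition_number
        self.reorg_number = reorg_number   # reorg number when the pointer was built
        self.rba = rba                     # direct RBA, valid until the next reorg

def follow(eps: EPS, partitions: dict):
    """Resolve an EPS, healing it in place if the target was reorganized."""
    part = partitions[eps.partition_number]
    if eps.reorg_number != part.reorg_number:
        eps.rba = part.ilds[eps.ilk]           # ILK never changes, so it is a safe key
        eps.reorg_number = part.reorg_number   # heal: the next access goes direct
    return part.segments[eps.rba]

# Build partition 3 with one segment, point an EPS at it, then "reorganize".
p3 = Partition(number=3, reorg_number=1)
p3.segments[0x1000] = "root segment A"
p3.ilds["ilk-A"] = 0x1000
eps = EPS(ilk="ilk-A", partition_number=3, reorg_number=1, rba=0x1000)

p3.reorg_number = 2                       # the reorg moves the segment...
p3.segments = {0x2000: "root segment A"}
p3.ilds["ilk-A"] = 0x2000                 # ...and the ILDS is updated to match

assert follow(eps, {3: p3}) == "root segment A"
assert eps.rba == 0x2000 and eps.reorg_number == 2   # the pointer healed itself
```

Note the design choice the sketch makes visible: the heal is a side effect of a normal read, so no separate maintenance pass over the index is ever needed.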
If you are with me so far, we can add the next layer of complication: splitting a partition into two. Let's assume you started with four partitions, so your partition numbers are 1-4, and you want to split partition 3. I will not guide you through the process of defining and handling this, but I will tell you the end result: partition 3 no longer exists, and two new partitions are added as partitions 5 and 6. The important thing to remember is that partition 3 is retired for good; its number will never be used again. Since there are about 32,000 possible partition numbers, you will probably not run out. So we have learned one lesson: the highest partition number is not necessarily the number of partitions.
Let's get back to the example of splitting partition 3 into partitions 5 and 6. Once this is done, the segments that came from partition 3 and now live in partitions 5 and 6 do not get a new ILK; if you look at their ILKs, you will still recognize partition 3. Remember, the ILK of a given segment never changes. Any newly inserted segment will have partition 5 or 6 in its ILK, and the initial reorg number of partitions 5 and 6 will be "1" again.
What happens to our EPS now? The partition number in it has become invalid, so how do we know which partition to go to? There is only one way to resolve this. Our partitions are separated by root segment keys; therefore, the only way of finding the right partition is to know the root key. But if you are coming from a secondary index based on some segment far down in the hierarchy, how do you know the root key? The only answer is that the secondary index stores both the EPS and the root segment key in the index record. If you have already looked at some of the new DBD parameters, now you know why you have to specify the root segment key length for HALDB indexes.
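Selecting the partition from the stored root key is just a high-key range lookup. A sketch, with made-up high keys and the retired-partition-3 layout from the running example (the key values and the list-of-tuples representation are assumptions for illustration):

```python
import bisect

# Partitions sorted by their high key. Partition numbers need not be
# sequential: 3 is retired, and 5 and 6 took over its key range.
partitions = [
    (b"D999", 1),
    (b"K999", 2),
    (b"M999", 5),   # first half of old partition 3's range
    (b"P999", 6),   # second half of old partition 3's range
    (b"Z999", 4),
]
HIGH_KEYS = [hk for hk, _ in partitions]

def partition_for(root_key: bytes) -> int:
    """Pick the partition whose high key is the first one >= the root key."""
    i = bisect.bisect_left(HIGH_KEYS, root_key)
    return partitions[i][1]

assert partition_for(b"A123") == 1   # below the first high key
assert partition_for(b"L500") == 5   # lands in old partition 3's range
assert partition_for(b"T000") == 4   # above P999, at or below Z999
```

This is why the stale partition number in the EPS is not fatal: the root key stored alongside it is always enough to re-find the current partition.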
But (I can hear you already) what about logical relationships? This is easier than you think. The LPCK (logical parent's concatenated key) has to be physically stored, and with it your target root segment key is there. For this reason, a virtual LPCK is not available in HALDB.
Should I Convert to HALDB?
Now that we know how all this works, what does it really mean? If you want to go to HALDB, you must convert your database. This means that you must reorganize the database and reload it into the new structure. If the database is involved in logical relationships, you must convert all of the related databases. And, needless to say, all your secondary indexes pointing to HALDB databases must be HALDB indexes as well.
Once you have converted the databases (hopefully with the help of some tools), you need to adapt your maintenance procedures. Remember, there is no WF1, so you can remove all the JCL that uses it (such as the steps that rebuild the indexes). And don't forget to remove the IDCAMS delete statements for the secondary indexes: since the indexes are no longer rebuilt, you must not delete them. And because they are no longer rebuilt with the maintenance of your primary database, you will need to reorganize the secondary indexes separately.
Some other things will need your attention. For example, HALDB doesn't allow you to use the reorganization process to maintain your data (such as deleting old data) if the database has indexes or logical relationships. There is no WF1 anymore and your indexes are not recreated, so there could still be index references to the data you just deleted. In other words, you would have created pointer errors.
This type of process has to change. The only solution available without ISV products is to change from a utility unload-reload to an application unload-reload, and maintain (delete) the data in that application. You will need to write those programs and create PSBs; the load process, using a PSB with PROCOPT=L, will then recreate the whole thing. And there are some other things to consider if logically related databases are involved.
The smart guys have probably figured out what happens to the ILK. This one is easy: since the database is loaded from scratch, all segments are inserted for the first time and get new ILKs. And what happens to the reorg number? If I had retired partition 3 and added 5 and 6, will that layout survive? The answer is simple: since you did not change the partition definitions in DBRC, you will have the same partitions as before. And, yes, the reorg number should go back to "1" - not that this matters very much.
Now, here's the last thing to think about. Remember the case where you went from your initial four partitions to partitions 1, 2, 4, 5, and 6? If you took a backup of all your partitions prior to the split and you have to fall back to that level of data, you must also reverse the DBRC definitions from five partitions back to four.
So what is the verdict? I believe HALDB is a good thing. I can see it becoming the only database type in IMS that survives over time. Remember that shared indexes were also declared a temporary solution, and they are still around.
Will everybody go to HALDB? If you have a database size problem, HALDB is your best bet. If not, stay where you are; the work involved does not justify the changes if there is not much to gain. And consider this: it is a first release. There are some rough edges, which will get smoother over time.
Will IMS be around long enough to make the move? That is basically your call. Consider that each database system has its advantages and disadvantages. It's time to rethink the "one size fits all" behavior: pick the database that fits your data best; don't fit your data to the database. Hierarchical data is best suited for hierarchical databases, and relational data is best suited for relational databases.
And be assured, you won't be there alone; we IMS nerds will be there with you.
Christian Koeppen is an IMS Product Architect with BMC Software.
Last modified 2005-08-04 08:21 AM