Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » DB2 » DB2 Distributed (UDB) Articles Archive » Picture This: BLOBs
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 1984
 

Picture This: BLOBs

by Trudy Pelzer

A Binary Large OBject (BLOB) is nothing new. Any old-timer can recall dBASE’s infamous “memo” fields (circa 1988), the introduction of Oracle’s LONG and LONG RAW data types (c. 1987), the first use of VARGRAPHIC fields in IBM DB2 (c. 1984), and Microsoft’s first use of Sybase SQL Server, which included TEXT and IMAGE data types (c. 1988). But BLOBs, in the 2003 model, are new and improved since the days of yore.

I propose to look at three DBMSs — IBM DB2 v8.1, Microsoft SQL Server 2000, and Oracle9i — and see how their BLOB implementations work these days. From what I’ve seen, it’s been a while since anyone did a general survey like this. So I’ll try to cover all the interesting peculiarities: syntax, storage, and locators.

Scope

A “binary” data type is a data type that contains strings of bytes (so it’s like CHAR), but no associated character set and collation (so it’s not like CHAR). Among DBMSs there are various names for binary data types, such as IBM’s CHAR ... FOR BIT DATA, Informix’s BYTE, Microsoft SQL Server’s and Sybase ASE’s IMAGE, MySQL’s LONGBLOB, Oracle’s LONG RAW, and so on. The SQL Standard defines “binary data” as “a sequence of octets that does not have either a character set or collation associated with it.”

A “large” data type is a data type that contains long strings of bytes (so it’s like either a BINARY or a CHAR data type, but grosser). There is no consensus on how long the string must be to qualify, and sometimes it’s not long at all, as in MySQL’s curiously named TINYBLOB data type. But most people would acknowledge that “large” means at least “larger than the maximum size of a CHAR” -- and in many DBMSs a CHAR’s maximum size is 255 or fewer (see Table 1) or is the size of a page (see Table 2).

DBMS CHAR MAXIMUM
Access 255 characters
DB2 for OS/390 254 characters
DB2 7.2 254 bytes 
MySQL 3.23 255 bytes
Oracle7 254 bytes
SQL Server 6.x 255 bytes
Sybase 12.1 255 characters

Table 1: Small CHAR data type Maximum.

DBMS CHAR MAXIMUM
Ingres II 2.5 2 KB (default page size 2 KB)
Oracle8 2 KB (page size can be 2 KB)
SQL Server 2000 8 KB (default page size 8 KB)
Sybase 12.5 16 KB (page size can be 16 KB)

Table 2: Page-size CHAR data type maximum.

If you combine a “large” and a “binary” data type, you have a BLOB. If you combine a “large” and a “char” data type, you have either a CLOB (Character Large OBject) or an NCLOB (National Character Large OBject). Since the “large char” data types are similar to the “large binary” data types in many ways, I’ll touch on them briefly as well. The generic term that encompasses all variants is LOB (Large OBject).

Table 3 shows the LOB data types required by the SQL Standard, and currently supported by DB2, Oracle, and SQL Server. Table 4 shows the theoretical maximum sizes of LOBs(using the same nomenclature for all implementations, including SQL Server).

ANSI SQL NAME DB2 ORACLE SQL SERVER
BLOB BLOB BLOB(old types: LONG, LONG RAW) IMAGE
CLOB CLOB CLOB TEXT
NCLOB N/A NCLOB NTEXT

Table 3: LOB data types.

Notes about Table 3:

      • DB2 doesn’t support an NCLOB data type, but does support DBCLOB for character strings that use a double-byte character set.
      • The old Oracle LONG and LONG RAW data types are not the same as BLOBs, and will probably be phased out in the next version. NCLOB is for Unicode data, while CLOB is for non-Unicode data.
      • SQL Server’s NTEXT is for Unicode data, while TEXT is for non-Unicode data.

DATA TYPE DB2 ORACLE SQL SERVER
BLOB 2 GB 4 GB 2 GB
CLOB 2 GB 4 GB 2 GB
NCLOB N/A 4 GB 2 GB

Table 4: LOB maxima.

The maximum size of a LOB is something that I’ve called theoretical, since some operating systems have a maximum file size that is just at, or less than, two gigabytes (e.g., a Windows95/98/Me FAT16 file has a maximum size of 2 GB minus 1 byte, an OS/2 HPFS file has a maximum size of 2 GB). However, these are the actual sizes for many people.

Syntax

BLOBs are notorious for their restrictions — there are so many things that you can do with “small” data types that you can’t do with BLOBs. (Note: I’m not talking about inline BLOBs here; assume that a BLOB’s size is greater than the size allowed for an inline BLOB.)

The restrictions begin right when you define a table with a BLOB column, as Table 5 shows.

  DB2 ORACLE SQL SERVER
More than one LOB column per table? Yes Yes Yes
PRIMARY KEY? No No No
LOB can be indexed? No No No
Incremental Backup? Yes No No

Table 5: CREATE TABLE restrictions on LOBs.

But most of the restrictions are on operations that involve comparisons. There’s an understandable reason for this: DBMSs don’t like to load LOBs into memory. Quite apart from the fact that some operating systems have only 2 GB of user-addressable memory, there’s the consideration that DBMSs generally don’t have a buffer pool large enough to fit multiple copies of large objects.

Of course, this restriction on compare implies a restriction on other operations that involve a compare — for example, DISTINCT and GROUP BY both need equality comparisons to work. Table 6 shows the types of operations that are restricted for LOBs.

LOB OPERATION ALLOWED DB2 ORACLE SQL SERVER
= > >= < <=  No No No
IN, BETWEEN No No No
DISTINCT No No No
GROUP BY No No No
ORDER BY No No No

Table 6: Comparison-related restrictions on LOBs.

However, it’s generally possible to deal with “part of a BLOB” if you can’t deal with the whole thing at once. Table 7 shows some operations that you can do with LOBs.

LOB OPERATION ALLOWED DB2 ORACLE SQL SERVER
LIKE Yes Yes Yes
SUBSTRING Yes Yes Yes
POSITION/PATINDEX Yes Yes Yes

Table 7: Operations allowed on LOBs.

In general then, it’s possible to say that you can deal easily with a partial BLOB, but not with a whole BLOB. That leads to an interesting observation. Namely, in that case, yoou want to be able to operate on the data without restrictions.

Storage

There are two places where a LOB can go: “inline” along with the rest of the columns of the same row, or “out-of-line” in a well-separated page, tablespace, or file.

It’s not absolutely necessary for a DBMS to allow inline LOBs, so you don’t always see the option. But the most recent versions of Oracle and SQL Server do support inline BLOBs, as shown in Table 8.

  DB2 ORACLE SQL SERVER
Main clause N/A {ENABLE | DISABLE} STORAGE IN ROW TEXT IN ROW option of sp_tableoption
Other notes N/A N/A Can’t be altered Affects all LOBs in table, as they’re updated

Table 8: Clauses for Defining/Enabling Inline LOBs.

Oracle’s older data types (LONG and LONG RAW) were always inline, and the default behavior is also to store LOBs inline. The advantage of an inline LOB is that access to a specific complete row is faster — when you access the row, the LOB value is right there with it. And it’s easier to maintain, provided that the data page doesn’t overflow.

However, in the grand picture of things, inline LOBs have crippling disadvantages: (1) any table scan will be slower because data pages are full of LOB data; (2) to fit a big LOB you need to have page-overflow space in every extent. Therefore the usual option involves using inline LOB storage for small (especially NULL) values, but out-of-line LOB storage for the main case. For Oracle, the maximum size of an inline LOB is approximately 4000 bytes and for SQL Server the maximum is about 7000 bytes (including overhead); after that, the
BLOB is stored out-of-line, even if you defined it as an inline BLOB. With DB2, defining your columns as LONG VARCHAR or LONG VARGRAPHIC has the same result — characters strings have a maximum length of 4000 and graphic strings have a maximum size of 2000, if you want to be able to operate on the data without restrictions.

Out-of-line LOBs have a pointer in the regular data page (along with all the non-LOB columns) and a storage area in a special set of pages elsewhere. Table 9 shows the terminology used by each DBMS.

  DB2 ORACLE SQL SERVER
Term for Pointer Large Object Descriptor  LOB Locator Pointer

Table 9: Terminology for Out-Of-Line LOB Storage.

Notes on Table 9:

The size of DB2’s large object descriptor varies (from about 60 to 300 bytes) because it depends on the maximum size of the LOB to which it refers.
The crux of the issue of LOB storage is: How is the LOB data itself stored? There are three methods:

      1. Each LOB is a separate file.
      2. LOBs are in a chain.
      3. LOBs are in a tree.

Each LOB is a separate file:

This is Oracle’s BFILE option. The advantage of this method is that the operating system can reclaim space if a LOB shrinks.

LOBs are in a chain:

This was the method used by earlier versions of SQL Server (e.g., version 6.5). All LOBs were split into 2 KB pages, each of which had pointers to the other pages in the LOB’s chain. The advantage of this method was easy setup — at a cost of difficult maintenance.

LOBs are in a tree:

For this method, I’ll quote the SQL Server documentation because Microsoft is particularly proud of this newer way of organizing LOB pages:

“The data is stored in a collection of 8-KB pages that are not necessarily located next to each other. In SQL Server 2000, the pages are organized logically in a B-tree structure, and in SQL Server version 6.5 and earlier they are linked in a page chain. The advantage of the method used by SQL Server 2000 is that operations starting in the middle of the string are more efficient ... SQL Server 2000 can quickly navigate the B-tree, and SQL Server version 6.5 must scan through the page chain.”

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_da2_5f1d.asp

Contrasting chain and tree, one can see that with tree it’s possible to go anywhere in the BLOB without starting at the beginning and going to the end.

It’s also worth noting that, in SQL Server 2000, two different LOBs can share the same page. This exemplifies how progress does march on through the field of LOBs.

At this point, I will quickly note some other aspects of physical/internal DBMS workings. They are not necessarily affected by LOB storage, but this is the appropriate place to mention them. Table 10 shows a list of physical/internal features.

  DB2 ORACLE SQL SERVER
Caching No Yes Yes
Logging Yes, up to 1 GB Yes Yes
Log Option LOGGED | NOT LOGGED LOGGING |
NOLOGGING
SET RECOVERY {FULL | SIMPLE | BULK_LOGGED

Table 10: Physical/internal features.

Notes on Table 10:

      • Caching refers to the ability of the DBMS to store copies of LOBs, or at least parts of LOBs, in memory. The cache is not necessarily the DBMS’s usual buffer pool, and in at least some cases the caching is not automatic. For example, with Oracle9i, your options are CACHE (LOBs are put in the buffer cache for faster access), NOCACHE (LOBs are generally not put in the buffer cache, the default), and CACHE READS (LOBs are put in the buffer cache only for read operations). With DB2, you can, at least, usually take advantage of your operating system’s file system caching by storing LOBs in SMS or DMS tablespaces.
      • Logging refers to the ability of the DBMS to keep track of all data-change activity on the LOB (for an article on logging see P. Gulutzan, Transaction Logs, www.dbazine.com/gulutzan2.html). It is generally considered appropriate that LOB logging be an option because (a) it can take a long time to log a data-change operation, (b) the size of the log might be restricted, and (c) many types of LOBs — for example, images — are available elsewhere in some other form, such as the original JPEG file. So some recovery method might still exist after a crash. Note, by the way, that “no logging” has no effect on COMMIT or ROLLBACK; data consistency is maintained even when a transaction is ROLLBACKed whether LOB logging is enabled or disabled.
      • Log Option refers to the syntax used to enable/disable LOB logging. In DB2, this can be done only with the CREATE TABLE statement; the default is LOGGED. In Oracle, this can be done with both CREATE TABLE and ALTER TABLE; the default depends on the cache attribute. If you specify CACHE, the default is LOGGING; if you specify NOCACHE or CACHE READS, the logging attribute defaults to the same value of the tablespace in which the LOB is stored. NOLOGGING doesn’t apply to inline LOBs, since they are treated the same as any other data type stored in a row. In SQL Server, logging must be changed at the database level. For LOB logging, Microsoft recommends that you do ALTER DATABASE ... SET RECOVERY FULL. To turn logging off, or to adjust logging to allow for bulk operations (including operations on LOBs), SET RECOVERY SIMPLE and SET RECOVERY BULK_LOGGED, respectively.

Locators

DBMS vendors have slightly different ideas of how you should look at locators, as the following quotations show:

“It is important to understand that a LOB locator represents a value, not a row or location in the database ... A LOB locator is only a mechanism used to refer to a LOB value during a transaction; it does not persist beyond the transaction in which it was created.”

— DB2 reference manual

“The LOB locator is stored in the table column, either with or without the actual LOB value.”

— Oracle reference manual

Thus, Oracle encourages you to think of locators as permanent pointers, while DB2 wants you to think of them as anything but pointers. This need not confuse you, though. You just have to know that the key meaning is: a locator is short, a locator references a LOB, a locator is useful for accessing a LOB, a locator is useful as a token that you can use instead of the LOB itself.

To a host application program, the “locator” is a (usually 16 byte) host variable. Typically, you SELECT into a locator host variable. Later, with a separate instruction, you use the locator host variable to retrieve the entire LOB, a piece at a time, into other, larger buffers.

This maintains the “out-of-line” paradigm seen when discussing LOB storage. Not only is the LOB stored separately from other columns, it’s also accessed separately. Regular SQL statements and their parameters needn’t include the LOB. Instead, they include locators as references to the LOB. The specifics will vary, so I’m only going to refer to the “Standard SQL” interface for locators. The standard SQL/CLI (also known as ODBC) function calls are:

      • SQLGetLength returns the actual length of a LOB
      • SQLGetPosition finds the start of a passed string within a LOB
      • SQLGetSubString extracts a portion of a LOB

This may not seem like very much of a function library, but with SQLGetLength, SQLGetPosition, and SQLGetSubString you can do pretty well anything: find things inside the LOB, handle pieces of the LOB at a time, transfer the whole LOB to a file, and so on.

It must not be thought that all interfaces use locators directly. For example, here is a simple JDBC snippet that simply uses the JDBC getClob and setClob methods.

 rs = stmt.executeQuery ("SELECT clob_column FROM Table1");
 rs.next();
 Blob clob_buffer = rs.getClob (1);
 Rs.close();
 stmt.setClob (1, clob_buffer);
 stmt.executeUpdate ("INSERT INTO Table2 VALUES (?)";

Tips

The most popular tip for LOBs is: Don’t use LOBs. It is common to find that, say, web programs will connect to databases that merely contain the names of image files. The idea is that the program will access (read-only) the image file when it sees the file name, and this works well enough for its limited purpose. The proponents of this “tip” will usually cite portability and ease of use as advantages.

IBM has done some studies that compare the performance of LOB access to the performance of CHAR / VARCHAR access, and some of the findings are illuminating. For example, one study found that it is cheaper to fetch a 320 KB LOB than to fetch ten 32 KB non-LOBs. This and other studies are noted in the “References” at the end of this article.

The only sage advice that I have for you is: If you use images or structured documents or other “large” data, then you’ll find that LOBs are useful. Eventually, you’ll also find that they’re easy to use, but there is a learning curve.

References

http://archives.postgresql.org/pgsql-jdbc/2002-08/msg00207.php

Bonner, Chuck. “Understanding LOBs in DB2 for OS/390”.

IBM Corporation. Large Object (LOB)
Considerations When Using the Universal JDBC Driver.

IBM Corporation.
Note: The IBM FAQ article cited in the text about relative data type performance.

Oracle Corp. “Oracle9i Application Developer’s Guide - Large Objects (LOBs).”

Otey, Michael. “SQL Server BLOB Storage.”

Paterson, Chandini. “Oracle LOBs - what, why, and how.”

Soltani, John. “The Long Road to LOBS: Large Objects in Oracle9i.”

--

Trudy Pelzer has worked with SQL since 1986. She has co-authored four books, including one thick book about the SQL Standard (SQL-99 Complete, Really) and one thin book about portability and optimization for DB2, Oracle, SQL Server, and five other DBMSs (SQL Performance Tuning). This is Trudy’s second article for DBAzine.


Contributors : Trudy Pelzer
Last modified 2005-04-12 06:21 AM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone