New Technologies of the eDBA: XML
This is the third installment of my regular eDBA column, in which we explore and investigate the skills required of DBAs to support the data management needs of an e-business. As organizations move from a traditional business model to an e-business model, they will also introduce many new technologies. Some of these technologies, such as connectivity, networking, and basic Web skills, are obvious. But some are brand new and will impact the way in which eDBAs perform their jobs.
In the last eDBA column I discussed one new technology: Java. In this edition we will examine another new technology: XML. The intent here is not to deliver an in-depth tutorial on the subject, but to introduce the subject and describe why an eDBA will need to know XML and how it will impact their job.
What is XML?
XML is getting a lot of publicity these days. If you believe everything you read, then XML is going to solve all of our interoperability problems, completely replace SQL, and possibly even deliver world peace. In reality, none of these assertions is true.
XML stands for eXtensible Markup Language. Like HTML, XML is based upon SGML (Standard Generalized Markup Language). HTML uses tags to describe how data appears on a Web page. But XML uses tags to describe the data itself. XML retains the key SGML advantage of self-description, while avoiding the complexity of full-blown SGML. XML allows tags to be defined by users that describe the data in the document. This capability gives users a means for describing the structure and nature of the data in the document. In essence, the document becomes self-describing.
The simple syntax of XML makes it easy to process by machine while remaining understandable to people. Once again, let's use HTML as a metaphor to help us understand XML. HTML uses tags to describe the appearance of data on a page. For example, the markup "<b>text</b>" specifies that the data "text" should appear in boldface. XML uses tags to describe the data itself, instead of its appearance. For example, consider the following XML describing a customer address:
<company_name>BMC Software, Inc.</company_name>
<street_address>2101 CityWest Blvd.</street_address>
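As a minimal sketch of how a program might read such self-describing data, the following Python fragment parses the two elements above with the standard library's xml.etree.ElementTree module. The enclosing CUSTOMER root element is an assumption added here, because a well-formed XML document needs a single root element.

```python
import xml.etree.ElementTree as ET

# The two address elements from the column, wrapped in an assumed
# CUSTOMER root so the text forms a well-formed XML document.
document = """<CUSTOMER>
  <company_name>BMC Software, Inc.</company_name>
  <street_address>2101 CityWest Blvd.</street_address>
</CUSTOMER>"""

root = ET.fromstring(document)

# Each tag name says what the enclosed text means, so the program can
# build a name -> value mapping without any outside layout description.
address = {child.tag: child.text for child in root}

for name, value in address.items():
    print(f"{name}: {value}")
```

The program never needs a separate record layout; the tag names alone tell it which value is the company name and which is the street address.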
XML is actually a meta language for defining other markup languages. The tags of each such language are declared in a Document Type Definition (DTD). A DTD holds the tag definitions for a specific industry or field of knowledge. So, the tags used in a document must be declared in a DTD, which can be supplied through a document type declaration (DOCTYPE), such as:
<!DOCTYPE CUSTOMER [
<!ELEMENT CUSTOMER (first_name, middle_initial, last_name,
company_name, street_address, city, state,
zip_code, country)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_initial (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT company_name (#PCDATA)>
<!ELEMENT street_address (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip_code (#PCDATA)>
<!ELEMENT country (#PCDATA)>
]>
The DTD for an XML document can either be part of the document or stored in an external file. The XML code samples shown are meant to be examples only. By examining them, you can quickly see how the document itself describes its contents.
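To make concrete what the DTD above constrains, here is a rough Python sketch. It is not a DTD validator (the standard library's ElementTree module does not validate against DTDs); it only hand-checks that a CUSTOMER element holds the nine declared child elements, in order, each containing text only. The helper name matches_customer_dtd and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Child elements of CUSTOMER, in the order the DTD declares them.
EXPECTED_CHILDREN = [
    "first_name", "middle_initial", "last_name", "company_name",
    "street_address", "city", "state", "zip_code", "country",
]

def matches_customer_dtd(xml_text):
    """Hand-rolled structural check; a stand-in for real DTD validation."""
    root = ET.fromstring(xml_text)
    if root.tag != "CUSTOMER":
        return False
    # The sequence content model requires exactly these children, in order.
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return False
    # #PCDATA children hold text only, so none may contain nested elements.
    return all(len(child) == 0 for child in root)

# Placeholder values, for illustration only.
good = "<CUSTOMER>" + "".join(
    f"<{tag}>x</{tag}>" for tag in EXPECTED_CHILDREN
) + "</CUSTOMER>"
bad = "<CUSTOMER><city>x</city></CUSTOMER>"

print(matches_customer_dtd(good))
print(matches_customer_dtd(bad))
```

A validating XML parser would perform this kind of check automatically from the DTD itself, which is the point of declaring the structure once rather than coding it into every application.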
For data management professionals, this is a plus because it eliminates the trouble of tracking down the meaning of data elements. One of the biggest problems associated with database management and processing is finding and maintaining the meaning of stored data. If the data can be stored in documents using XML, the documents themselves will describe their data content. Of course, the DTD is a rudimentary vehicle for defining data semantics. Standards committees are working on the definition of the XML Schema to replace the DTD for defining XML tags. The XML Schema will allow for more precise definition of data, such as data types, lengths and scale.
The important thing to remember about XML is that it solves a different problem than HTML. HTML is a markup language, but XML is a meta language. In other words, XML is a language that generates other kinds of languages. The idea is to use XML to generate a language specifically tailored to each requirement you encounter. It is essential to understand this paradigm shift in order to understand the power of XML. (Note: XSL, or eXtensible Stylesheet Language, can be used with XML to format XML data for display.)
In short, XML allows designers to create their own customized tags, thereby enabling the definition, transmission, validation and interpretation of data between applications and between organizations. So the most important reason to learn XML is that it is quickly becoming the de facto standard for application interfaces.
There are, however, some problems with XML. Support for the language, for example, is only partial in the standard and most popular Web browsers. As more XML capabilities gain support and come to market, this will become less of a problem.
Another problem with XML lies largely in market hype. Throughout the industry, there is plenty of confusion surrounding XML. Some believe that XML will provide metadata where none currently exists, or that XML will replace SQL as a data access method for relational data. Neither of these assertions is true.
There is no way that any technology, XML included, can conjure up information that does not exist. People must create the metadata tags in XML for the data to be described. XML enables self-describing documents; it doesn’t describe your data for you.
Moreover, XML doesn’t perform the same functions as SQL. As a result, XML can’t replace it. As the standard access method for relational data, SQL is used to "tell" a relational DBMS what data is to be retrieved. XML, on the other hand, is a document description language that describes the contents of data. XML may be useful for defining databases, but not for accessing them.
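The division of labor can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a DB2 example: an in-memory SQLite database stands in for the relational DBMS, and the customer table and its columns are hypothetical. SQL selects the data; XML then describes the retrieved data for whoever receives it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory SQLite database standing in for a relational DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (company_name TEXT, street_address TEXT)")
conn.execute(
    "INSERT INTO customer VALUES (?, ?)",
    ("BMC Software, Inc.", "2101 CityWest Blvd."),
)

# SQL tells the DBMS which data to retrieve.
cursor = conn.execute(
    "SELECT company_name, street_address FROM customer"
    " WHERE company_name LIKE ?",
    ("BMC%",),
)
columns = [col[0] for col in cursor.description]
row = cursor.fetchone()

# XML describes the retrieved data so another application can interpret it.
customer = ET.Element("CUSTOMER")
for name, value in zip(columns, row):
    ET.SubElement(customer, name).text = value

xml_text = ET.tostring(customer, encoding="unicode")
print(xml_text)
conn.close()
```

Neither step can substitute for the other: the SELECT statement says nothing about how to label the result for an outside application, and the XML says nothing about how to find the row.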
More and more of the popular DBMS products are providing support for XML. Take, for example, the XML Extender provided with DB2 UDB Version 7. The XML Extender enables XML documents to be integrated with DB2 databases. By integrating XML into DB2, you can more directly and quickly access the XML documents as well as search and store entire XML documents using SQL. You also have the option of combining XML documents with traditional data stored in relational tables.
When you store or compose a document, you can invoke DBMS functions to trigger an event to automate the interchange of data between applications. An XML document can be stored complete in a single text column. Or XML documents can be broken into component pieces and stored as multiple columns across multiple tables.
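The two storage choices can be sketched as follows. This is a simplified illustration that uses Python with an in-memory SQLite database rather than DB2 and the XML Extender; the table names customer_doc and customer are hypothetical, and the pieces land in a single table here for brevity rather than across several.

```python
import sqlite3
import xml.etree.ElementTree as ET

document = (
    "<CUSTOMER>"
    "<company_name>BMC Software, Inc.</company_name>"
    "<street_address>2101 CityWest Blvd.</street_address>"
    "</CUSTOMER>"
)

conn = sqlite3.connect(":memory:")

# Option 1: keep the whole document intact in a single text column.
conn.execute("CREATE TABLE customer_doc (id INTEGER PRIMARY KEY, doc TEXT)")
conn.execute("INSERT INTO customer_doc (doc) VALUES (?)", (document,))

# Option 2: break the document into pieces, one column per element.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY,"
    " company_name TEXT, street_address TEXT)"
)
root = ET.fromstring(document)
conn.execute(
    "INSERT INTO customer (company_name, street_address) VALUES (?, ?)",
    (root.findtext("company_name"), root.findtext("street_address")),
)

# The intact copy comes back unchanged; the decomposed copy can be
# queried column by column with ordinary SQL.
stored_doc = conn.execute("SELECT doc FROM customer_doc").fetchone()[0]
street = conn.execute(
    "SELECT street_address FROM customer WHERE company_name = ?",
    ("BMC Software, Inc.",),
).fetchone()[0]

print(stored_doc == document)
print(street)
conn.close()
```

Storing the document whole preserves it exactly as received, while decomposing it makes individual elements available to standard SQL predicates and joins; which trade-off fits depends on how the data will be used.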
The XML Extender provides user-defined data types (UDTs) and user-defined functions (UDFs) to store and manipulate XML in the DB2 database. UDTs are defined by the XML Extender for XMLVARCHAR, XMLCLOB and XMLFILE. Once the XML is stored in the database, the UDFs can be used to search and retrieve the XML data as a complete document or in pieces. The UDFs supplied by the XML Extender include:
- storage functions to insert XML documents into a DB2 database
- retrieval functions to access XML documents from XML columns
- extraction functions to extract and convert the element content or attribute values from an XML document to the data type that is specified by the function name
- update functions to modify element contents or attribute values (and to return a copy of an XML document with an updated value)
More and more DBMS products are providing capabilities to store and generate XML. The basic functionality enables XML to be passed back and forth between databases in the DBMS. Refer to Figure 1.
Figure 1. XML and Database Integration
Defining the Future Web
Putting all skepticism and hype aside, XML is definitely the wave of the immediate future. The future of the Web will be defined using XML. The benefits of self-describing documents are just too numerous for XML to be ignored. Furthermore, the allure of using XML to generate an application-specific language is powerful. It is this particular capability that will drive XML to the forefront of computing.
More and more organizations are using XML to transfer data, and more capabilities are being added to DBMS products to support XML. Clearly, DBAs will need to understand XML as their companies migrate to the e-business environment. Learning XML today will go a long way toward helping eDBAs be prepared to integrate XML into their data management and application development infrastructure. For more details and specifics regarding XML, refer to the following website: http://www.w3.org/XML
Please feel free to e-mail me with any burning e-business issues you are experiencing in your shop and I'll try to discuss them in a future column. And please share your successes and failures along the way to becoming an eDBA. By sharing our knowledge, we make our jobs easier and our lives simpler.
Craig Mullins is an independent consultant and president of Mullins Consulting, Inc. Craig has extensive experience in the field of database management, having worked as an application developer, a DBA, and an instructor with multiple database management systems including DB2, Sybase, and SQL Server. Craig is also the author of the DB2 Developer’s Guide, the industry-leading book on DB2 for z/OS, and Database Administration: Practices and Procedures, the industry’s only book on heterogeneous DBA procedures. You can contact Craig via his web site at http://www.craigsmullins.com.
Contributors : Craig S. Mullins
Last modified 2006-01-16 07:12 AM