What's New in SQL Server 2005 XML

by Hilary Cotter

This article discusses the new SQL Server 2005 XML features. The SQL Server 2000 XML feature set was mainly designed for relational aspects (you could decompose XML into relational form and store it in tables using OPENXML) and to generate XML (using the FOR XML statements).

But it was limited:

based on relational to XML mapping. Writing FOR XML statements could be very complex
limited support for semi-structured data
no full XML document storage; for example, order was not preserved
limited XML querying functionality – you could write XML views to run XPATH queries, but it was complex

The main four new features are

the XML data type, which allows you to store XML documents natively – which means the Query and Storage Engine takes into account the XML nature of the documents you are storing, and it is no longer necessary to shred XML data to relational form
the ability to query XML data stored in the XML data type using XPath or XQuery statements, which takes advantage of XML schemas
the ability to issue DDL statements against XML documents stored in the XML data type
the ability to index XML documents

SQL SERVER 2005 XML features are, for the most part, backward-compatible with SQL Server 2000 XML features, with the exception of the fact that deprecation of SQL XML ISAPI. SQL XML ISAPI has been dropped due to

security concerns
new features in other products that offer better functionality and performance
Microsoft’s direction to move away from ISAPI technology

What is XML?

XML is a text document that contains mark-up tags that convey the structure of data and is often used for data interchange, especially over the Internet. Frequently, the XML tags will convey semantic information that otherwise would not be obvious or intuitive.

There are six rules for creating well-formed XML documents:

XML elements must have a closing tag

The following is a minimal XML document:

<?xml version="1.0"?>
<root><element></element></root>

Notice how each node (ignoring the XML declaration) closes. This, for instance, is invalid since we do not close the element element.

<?xml version="1.0"?>
<root><element></root>

Empty elements

If an element has no values, we can close it as follows:

<?xml version="1.0"?>
<root><element/></root>

Notice how, in the empty element element, there is a forward slash after the element name.

Tags must be properly nested

If you have one or more sub-elements, you must nest your tags in hierarchical order as illustrated. Here is an example:

<?xml version="1.0"?>
<root>
<orders>
<order id="1">this is order 1</order>
</orders></root>

The previous code sample is valid, the following sample is not. Notice how we close the “orders” tag, before we close the “order” tag:

<?xml version="1.0"?>
<root>
<orders>
<order id="1">this is order 1</orders>
</order></root>

XML tags are case sensitive

XML has its roots in the UNIX operating system, which is case sensitive. XML has retained its case sensitivity, so an element called orders is considered to be different from an element called Orders, or ORDERS.

XML documents must have a root tag

All XML documents must contain a single, unique tag pair to define the root element, and all other elements must be nested within the root element. All elements can have sub- (child) elements. Sub-elements must be in pairs (i.e., there must be matching opening and closing tags) and correctly nested within their parent element.

Attribute values must always be quoted

An element can optionally include one or more attributes value pairs. All attribute values must be enclosed in double quotes. Attributes are used to attach additional secondary information to an element and can accept default values, while elements cannot. Here is an example illustrating attributes:

SELECT * from AUTHORS WHERE AU_ID ='172-32-1176' FOR XML AUTO
--returns
<authors au_id="172-32-1176" au_lname="White" au_fname="Johnson"
phone="408 496-7223" address="10932 Bigge Rd." city="Menlo Park"
state="CA" zip="94025" contract="1"/>

Each column in the results set is an attribute of the element Authors.

Most XML parsers (which are part of applications that consume XML) require well-formed XML; however, some parsers are tolerant of XML documents that do not contain root tags. An XML document missing root tags is termed an “XML fragment.”

Schemas

A schema is a header for your XML document that describes the structure and constrains the contents of XML documents by associating XML data types with XML element types and attributes.

XML schemas help XML parsers by telling them what elements to expect in an XML document. If a schema did not exist, the parser would have to parse through the complete XML document twice, gathering a list of all the elements within the XML document before it could correctly parse these elements.

Here is an example of an XML schema and its related data:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
xmlns:company="http://www.example.com/mycompany"
      targetNamespace="http://www.example.com/mycompany" 
elementFormDefault="qualified">
      <xs:element name="employee">
            <xs:complexType>
                  <xs:all>
                        <xs:element name="EmployeeID" 
type="xs:string"/>
                        <xs:element name="LastName" type="xs:string"/>
                        <xs:element name="FirstName" type="xs:string"/> 
                        <xs:element name="Title" type="xs:string"/>
                        <xs:element name="TitleOfCourtesy" 
type="xs:string" />
                        <xs:element name="BirthDate" type="xs:string"/>
                        <xs:element name="HireDate" type="xs:string"/>
                        <xs:element name="Address" type="xs:string"/>
                        <xs:element name="City" type="xs:string"/>
                        <xs:element name="Region" type="xs:string"/>
                        <xs:element name="PostalCode" 
type="xs:string"/>
                        <xs:element name="Country" type="xs:string"/>
                        <xs:element name="HomePhone" type="xs:string"/>
                        <xs:element name="Extension" type="xs:string"/>
                  </xs:all>
            </xs:complexType>
      </xs:element>
</xs:schema>

Why Use XML Features in SQL at All?

The Visual Studio .Net programming languages ship with powerful features used in working with XML. So why use the SQL XML features at all; why not have the client, which consumes the results set returned by SQL Server, render it as XML? This question is compounded by further product refinement, since SQL Server 2000 in applications like Visual Studio .Net and Visual Studio .Net 2003, SQL Server 2005, and Visual Studio .Net 2005. In general, the XML features of SQL Server offer better performance, returning a result set as XML rather than having the client convert to XML. There are exceptions, however.

For example, if you need to frequently update your XML data, you should shred it to relational form and store it in the database as rows in tables. Sometimes, this is better done using client-side technologies as opposed to OPENXML within SQL Server.

The XML Data Type

The most exciting new XML feature in SQL Server 2005 is the XML datatype. The XML data type allows you to store XML documents or fragments, natively; provides indexing support; and provides XML-aware functions to manipulate them. When you store XML documents in columns of this data type, you will notice that the document does not consume as much space in the table or column as it would outside of the table. This is because SQL Server 2005 represents the element and attribute names with internal identifiers.

So, an XML fragment like this:

<authors au_id="111-11-1111" au_lname="pubs" au_fname="pubs"
phone="UNKNOWN     " contract="1" rowguid="15092A31-E797-41E2-92D0-
2DE69FEF6E9E"/>
<authors au_id="111-11-1112" au_lname="pubs1" au_fname="pubs"
phone="UNKNOWN     " contract="1" rowguid="DEADAC5F-2C06-4B7A-82D6-
F56E742990C4"/>

might be represented internally as:

<a b="111-11-1111" c="pubs" d="pubs" e="UNKNOWN     " f="1" 
g="15092A31-E797-41E2-92D0-2DE69FEF6E9E"/>
<a b="111-11-1112" c="pubs1" d="pubs" e="UNKNOWN     " f="1"
g="DEADAC5F-2C06-4B7A-82D6-F56E742990C4"/>

Notice the conversation of space even in this article. When SQL Server is storing the XML document in XML data-type columns, it creates a map to map the internal identifiers, so it you can perform XQuery queries on the XML documents using element, or attribute names.

XML data-type columns have several limitations:

you cannot use them as a Primary Key
you cannot use them as a Foreign Key
you cannot use them as a Unique Constraint
you cannot declare them with the Collate Keyword

In addition to using the XML data type in a column, you can also use it in declaring a parameter, or a variable.

Typed vs. untyped

There are two ways of storing XML documents in the XML data-type columns:

typed
untyped

A typed XML document is one with which you have optionally associated a schema in parameters, variables, or columns of the XML datatype. Doing so is a good idea, since:

you can validate data as you push it into the XML data-type column
this is an optimization for the XQuery engine

To associate an XML document with a schema, you must first register the schema and, second, associate the XML data-type column with that schema.

Here is how you would associate an XML data-type column with the schema MySchema:

Register your schema using Create Schema Collection.
Declare your XML data-type column as using this schema.

Here is an example (more complete examples are in the attached code sample:

CREATE DATABASE XMLData 
GO
USE XMLData
GO
CREATE XML SCHEMA COLLECTION EmployeeDataXSD AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:company="http://www.example.com/mycompany" 
      targetNamespace="http://www.example.com/mycompany"
elementFormDefault="qualified">
      <xs:element name="employee">
          <xs:complexType>
                <xs:all>
                      <xs:element name="EmployeeID" 
type="xs:string"/>
                      <xs:element name="LastName" type="xs:string"/>
                      <xs:element name="FirstName" type="xs:string"/>
                      <xs:element name="Title" type="xs:string"/>
                      <xs:element name="TitleOfCourtesy" 
type="xs:string" />
                      <xs:element name="BirthDate" type="xs:string"/>
                      <xs:element name="HireDate" type="xs:string"/>
                      <xs:element name="Address" type="xs:string"/> 
                      <xs:element name="City" type="xs:string"/>
                      <xs:element name="Region" type="xs:string"/>
                      <xs:element name="PostalCode" 
type="xs:string"/>
                      <xs:element name="Country" type="xs:string"/>
                      <xs:element name="HomePhone" type="xs:string"/>
                      <xs:element name="Extension" type="xs:string"/>
                </xs:all>
          </xs:complexType>
    </xs:element>
</xs:schema>'
GO
CREATE TABLE EmployeeData( id INT PRIMARY KEY IDENTITY, xml_data 
XML( EmployeeDataXSD ))
GO
INSERT INTO EmployeeData VALUES ('<employee 
xmlns="http://www.example.com/mycompany" >
<EmployeeID>1</EmployeeID>
<LastName>Davolio</LastName>
<FirstName>asdf</FirstName>
<Title>Sales Representative</Title>
<TitleOfCourtesy>Ms.</TitleOfCourtesy>
<BirthDate>1948-12-08T00:00:00</BirthDate>
<HireDate>1992-05-01T00:00:00</HireDate>
<Address>507 - 20th Ave. E.Apt. 2A</Address>
<City>Seattle</City>
<Region>WA</Region>
<PostalCode>98122</PostalCode>
<Country>USA</Country>
<HomePhone>(206) 555-9857</HomePhone>
<Extension>5467</Extension></employee>')
GO

XQuery

Now that we have stored our XML in our XML data type, how do we access it?

We use XQuery. XQuery has four methods:

Exists (which determines if elements or attributes have specific values)
Modify
Query
Value

Exists

The following is an example in which we are returning rows that contain an XML element with a value of “2”:

SELECT xml_data FROM EmployeeData WHERE xml_data.exist('declare
namespace xd="http://www.example.com/mycompany";
/xd:employee[xd:EmployeeID eq "2"]')=1

This will return all rows that have XML documents stored in the in the xml_data column in which the element EmployeeID has a value of “2.”

The following query will return all rows that have XML documents stored in the xml_data column in which the element EmployeeID does not have a value of “2.”

SELECT xml_data FROM EmployeeData WHERE xml_data.exist('declare
namespace xd="http://www.example.com/mycompany";
/xd:employee[xd:EmployeeID eq "2"]')= 0

Modify

“Modify” is used to modify an XML document stored in an XML data-type column. The following is an example of how to do it. In this example, we are changing the last name of Anne Dodsworth to Anne Smith:

UPDATE EmployeeData SET xml_data.modify('declare namespace 
xd="http://www.example.com/mycompany";
replace value of (xd:employee/xd:LastName)[1] with "smith"')
WHERE xml_data.exist('declare namespace
xd="http://www.example.com/mycompany"; /xd:employee[xd:FirstName eq
"Anne" and xd:LastName eq "Dodsworth"]')=1

Query

The query method can be used to retrieve either the entire contents of an XML document or a selected section of the XML document. An example follows:

--returning the entire document
SELECT id, xml_data.query('declare namespace 
xd="http://www.example.com/mycompany"; /xd:employee') AS 
Body FROM EmployeeData

--only returning the last name
SELECT id, xml_data.query('declare namespace
xd="http://www.example.com/mycompany"; /xd:employee/xd:LastName') AS
Body FROM EmployeeData

Value

The value method allows you to extract scalar values from an XML datatype. The following is an example of this method:

SELECT id, xml_data.value('declare namespace
xd="http://www.example.com/mycompany";
(xd:employee/xd:LastName)[1]','varchar(20)')as LastName from
EmployeeData

XML Indexes

As the XML data-type columns can hold up to 2GB of data, and you may be querying portions of the XML document, it is advantageous to index these columns.

There are two types of indexes you can place on XML documents stored in the XML data-type columns:

Primary – covers all elements in a column
Secondary – cover paths, values, and properties.

The creation syntax is highly symmetric with the Create Index statements you use to create clustered, nonclustered, and unique indexes.

The following is an example of how to create a primary index:

CREATE PRIMARY XML INDEX xml_data_index on EmployeeData(xml_data)

Note that you must have a clustered primary key on the table on which you wish to index XML data type column.

Here are some examples of how to create a secondary index:

Create XML index xml_data_index_secondary_path on EmployeeData(xml_data) 
USING XML INDEX xml_data_index FOR PATH
Create XML index xml_data_index_secondary_value on 
EmployeeData(xml_data) USING XML INDEX xml_data_index FOR VALUE
Create XML index xml_data_index_secondary_property on 
EmployeeData(xml_data) USING XML INDEX xml_data_index FOR PROPERTY

FOR XML Enhancements

In SQL Server 2000, you could return your results set in XML format using the FOR XML predicate. Getting well-formed XML out of such a query was often highly convoluted and could involve unioning a table with itself three times to get reasonable looking XML, and ordering child elements was impossible. There have been substantial improvements in the FOR XML predicates in SQL SERVER 2005.

One of the more interesting enhancements is the ability to return XML data using the XML data type.

Here is an example from SQL Server 2000 (also backwardly compatible in SQL Server 2005):

SELECT * FROM EmployeeData WHERE id =1 FOR XML AUTO
GO
--returns
XML_F52E2B61-18A1-11d1-B105-00805F49916B
-----------------------------------------------------------------------
<EmployeeData
id="1"><xml_data><employee
xmlns="http://www.example.com/mycompany"><EmployeeID>1</EmployeeID><Las
tName>Davolio</LastName><FirstName>asdf</FirstName><Title>Sales
Representative</Title><TitleOfCourtesy>Ms.</TitleOfCourtesy><BirthDate>
1948-12-08T00:00:00</BirthDate><HireDate>1992-05
01T00:00:00</HireDate><Address>507 - 20th Ave. E.Apt.
2A</Address><City>Seattle</City><Region>WA</Region><P
ostalCode>98122</PostalCode><Country>USA</Country><HomePhone>(206)555-
9857</HomePhone><Extension>5467</Extension></employee></xml_data></Empl
oyeeData>

(1 row(s) affected)

Given that, in SQL Server 2005, using the TYPE directive, your results set is returned as an XML data type, the following is an example:

SELECT * FROM EmployeeData WHERE id =1 FOR XML AUTO, TYPE
GO
--returns the below – not the absence of a column header
<EmployeeData id="1"><xml_data><employee
xmlns="http://www.example.com/mycompany"><EmployeeID>1</EmployeeID><Las
tName>Davolio</LastName><FirstName>asdf</FirstName><Title>Sales
Representative</Title><TitleOfCourtesy>Ms.</TitleOfCourtesy><BirthDate>
1948-12-08T00:00:00</BirthDate><HireDate>1992-05
01T00:00:00</HireDate><Address>507 - 20th Ave. E.Apt.
2A</Address><City>Seattle</City><Region>WA</Region><P
ostalCode>98122</PostalCode><Country>USA</Country><HomePhone>(206)555-
9857</HomePhone><Extension>5467</Extension></employee></xml_data></Empl
oyeeData>

(1 row(s) affected)

Now it is also possible to do nested queries and sub-queries, something impossible with SQL Server 2000, and you can also do inline schema generation, illustrated as follows:

Create Table Test
(PK INT not null Primary Key,
CharCol1 char(20))
GO
SELECT * FROM test FOR XML AUTO, XMLSCHEMA
GO
--generates
XML_F52E2B61-18A1-11d1-B105-00805F49916B
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet2"
xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet2"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
elementFormDefault="qualified">
<xsd:import namespace =
"http://schemas.microsoft.com/sqlserver/2004/sqltypes
schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sq
ltypes.xsd"/>
<xsd:element name="test">
<xsd:complexType>
<xsd:attribute name="PK" type="sqltypes:int" use="required"/>
<xsd:attribute name="CharCol1">
<xsd:simpleType>
<xsd:restriction base="sqltypes:char" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="20"/>
</xsd:restriction>
</xsd:simpleType>
</xsd:attribute>
</xsd:complexType>
</xsd:element>
</xsd:schema>

(0 row(s) affected)

OpenRowSet Enhancements

You can now use OpenRowSet to insert XML documents into your database. In SQL Server 2000, you could use OpenXML, but this was very memory intensive and was best used when you were infrequently loading XML data into your database. The following is an example of loading a csv into the EmployeeData table shown previously:

Insert into EmployeeData(xml_data)<
Select * from Openrowset(bulk 'c:\employees.csv, SINGLE_CLOB) as X

Summary

This article highlighted some of the most significant new XML features in SQL Server 2005. The most significant new feature is the ability to store XML documents in the new XML data type and to be able to index, retrieve, query, and modify elements and their attributes using XQuery.

Hilary Cotter has been involved in IT for more than 20 years as a Web and database consultant in the tri-state area of New York, New Jersey and Connecticut. He was first recognized by Microsoft in 2001 with the Microsoft SQL Server MVP award. After receiving his Bachelor of Applied Science degree in Mechanical Engineering from the University of Toronto, he studied both economics at the University of Calgary and Computer Science at UC Berkeley. Hilary has worked for Microsoft, Merrill Lynch, UBS-Paine Webber, MetLife, VoiceStream, Tiffany & Co., Pacific Bell, Cahners, Novartis, Petro-Canada, and Johnson and Johnson. He has just completed A Guide to SQL Server 2000 Transactional and Snapshot Replication and has a companion volume on merge replication in the works for 2005.

Contributors : Hilary Cotter
Last modified 2006-07-28 03:08 PM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?