Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » SQL Server » SQL Server Articles Archive » Bulk Loading XML Data Using MSXML
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3555
 

Bulk Loading XML Data Using MSXML

by John Paul Cook

Microsoft XML Core Services (MSXML) offers several programmatic extensions for writing XML applications. The Bulk Load 3.0 COM object that comes with MSXML 3.0 provides a high performance option for bulk loading XML data into a relational database. MSXML 3.0 comes with Windows XP, Internet Explorer 6 and BizTalk 2002. MSXML can be downloaded from www.microsoft.com/xml. The latest version as of publication time is version 4.0. Be sure to read the installation notes before installing MSXML to avoid possible conflicts with other applications. For example, if installing BizTalk Server 2002, it must be installed before installing MSXML 4.0. Two different test platforms were used in the writing of this article. Both machines had XP Professional with Visual Studio Enterprise, .NET Framework Service Pack 1, and SQL Server 2000 Service Pack 2 Developer Edition. One machine had MSXML 4.0 installed on it without BizTalk Server 2002. The other machine had BizTalk Server 2000 installed first and afterwards MSXML 4.0 was installed.

COM and the .NET Framework are two fundamentally different approaches to application development. COM type libraries must be converted to .NET assembly metadata to achieve COM interop in .NET. By using the Visual Studio .NET IDE, this can be done automatically by following the simple steps that follow. It is assumed that MSXML is already installed.

Begin by going to the Visual Studio .NET menu bar and selecting Project, then Add Reference.

Alternatively, go to the Solution Explorer, right click on References and select Add Reference.

When the Add Reference dialog box appears, select Microsoft SQLXML BulkLoad 3.0 Type Library from the list, click the Select button, and click OK to complete your selection.

The necessary metadata is automatically generated. The COM object appears as a reference in the solution explorer.

Use the Object Browser to see what’s available in the component. Go to the menu bar and select View, Other Windows, Object Browser.

The Object Browser shows you the interfaces and members available.

Testing for this article was done by loading 1 million rows into the Region table in the Northwind database. Here is the format of the XML file that was bulk loaded:

<Locations>
<Location>
<LocationID>11</LocationID>
<LocationDescription>A11</LocationDescription>
.
.
.
</Location>
<Location>
<LocationID>1000010</LocationID>
<LocationDescription>A1000010</LocationDescription>
</Location>
</Locations>

Notice that it has a root element, which is the <Locations> tag. It is necessary to have a mapping schema file that maps the elements in the XML file to the Region table:

Notice that the sql:relation and sql:field attributes specify the mappings to the Region table and its columns.

A SQLXMLBulkLoad3 object must be instantiated. In VB.NET, use the following code:

Dim bulkLoadObj As New SQLXMLBULKLOADLib.SQLXMLBulkLoad3Class()

In C#, instantiate an object from the class by using this code.

Notice that the Object Browser reveals a ConnectionString property and an Execute method. The ConnectionString property is a valid OLE DB connection string. In this example, a connection is made to a SQL Server. Instead, the connection could have been to an Oracle server. VB.Net syntax is shown first, followed by C# syntax.

bulkLoadObj.ConnectionString = "Provider=SQLOLEDB;Data Source=localhost;" _
                  & "Integrated Security=SSPI;Initial Catalog=northwind"

bulkLoadObj.ConnectionString = "Provider=SQLOLEDB;Data Source=localhost;" +
		      "Integrated Security=SSPI;Initial Catalog=northwind";

The Execute method requires two parameters: a mapping schema file and the XML file to be bulk loaded.

bulkLoadObj.Execute("C:/Locations.xsd", "/sql/sql_articles/cook3/C/Locations.xml") ' VB.NET

bulkLoadObj.Execute("C:/Locations.xsd", "/sql/sql_articles/cook3/C/Locations.xml"); // C#

As you can see, only three lines of code had to be added to our Visual Studio.NET project to accomplish bulk loading of XML. Adding error handling and other supportive code is left as an exercise for the reader. For those cases where something quick and dirty will suffice, MSXML’s COM object could be called from VBScript instead.

A common question concerns the speed of loading XML data files. After all, the tags greatly increase the number of bytes that must be processed. The same million rows of data take much less space formatted as a bcp file. A bcp file was created and loaded using SQL Bulk Load (which is just bcp invoked from T-SQL instead of a command line) for comparison purposes. Here is the bcp file format of locations.txt:

11,A11
  .
  .
  .
1000010,A1000010

The XML file contained seven times as many bytes as the bcp file. Using MSXML to bulk load the XML took only about 2.2 times as long as bcp. It is the XML tags that make the XML file so much larger. The obvious question is what performance improvement can be achieved by reducing the size of the XML tags? A new file called a.xml was created using the following format:

 <t>
 <a>
 <b>11</b>
 <c>All</c>
       .
       .
       .
 </a>
 <a>
 <b>1011101</10>
 <c>A1000010</c>
 </a>
 </t>

After a new mapping schema was made to accompany a.xml, which was bulk loaded using MSXML’s bulk loading object. Test platforms were 600 MHz Pentium III’s with 512 mb of ram.

file             size          elapsed load time 

locations.txt     15,409 KB    103 seconds
locations.xml    107,206 KB    225 seconds
a.xml             40,799 KB    157 seconds

Having fewer bytes in the XML tags did greatly speed up loading of this dataset. It came at the cost of clarity. The shorter tags are devoid of any descriptive meaning. Even if you have control over the creation of the tags, it’s impractical to focus on speed of loading to the exclusion of maintainability.

On another practical matter, the XML files shown above all contain a root element. What do you do if you receive a large XML file without a root element? Adding root element tags to a file is inconvenient. Fortunately, it’s also not necessary. The MSXML bulk load object has a Boolean Fragment property that when set to true obviates the need for a root element. The Boolean CheckConstraints property can be used to enforce primary and foreign key constraints. If this property is set to true, it’s a good idea to specify an error log file by using the ErrorLogFile string property. The Boolean SchemaGen property can be used to create a new table in the target database. For more information, see the MSXML SDK that’s installed with the MSXML download.

As you have seen, Visual Studio .NET and MSXML simplifies the task of bulk loading XML with the aid of a mapping schema. The bulk load component’s 16 properties provide a great deal of flexibility for configuration in a production environment.

--

John Paul Cook is a database and systems architect in Houston, Texas. His primary focus is helping large enterprise customers succeed with SQL Server, Oracle, and the .NET framework.


Contributors : John Paul Cook
Last modified 2005-04-12 06:21 AM

Bulk Loading XML Data Using MSXML

Posted by ksk3888 at 2007-09-07 07:52 AM
Hi John,
I read your article "Bulk Loading XML Data Using MSXML". Very nice article. I'm also doing the same exact thing. I'm having a serious performance issue. I've to load over 10 Million XML files into SQL 2005 Database. I tried to load 5000 xml files initially, with each file size: 7 KB. It took 12 HOURS to load 5000 files. This is too slow. Following are the settings for SQLXMLBulkLoad4Class that i'm using.

objXBL.SGDropTables = false;
objXBL.SchemaGen = false;
objXBL.KeepIdentity = false;
objXBL.BulkLoad = true;
objXBL.Transaction = false;
objXBL.ForceTableLock = true;

Please give me your suggestions / comments.

Thanks a Lot,

Sam Kumar.
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone