Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » SQL Server » SQL Server Articles Archive » Saving and Retrieving Binary Files With SQL Server - Part 1
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3554
 

Saving and Retrieving Binary Files With SQL Server - Part 1

by Eric Charran

Part 1  |  Part 2

Overview

SQL Server supports the ability for clients to store objects within tables. Generically known as Binary Large Objects (BLOBs), these objects can be complex data types and represent physical file objects. Common uses for this capability within the database layer include storing photos or thumbnail information for personnel databases, and storing specific content for Web sites such as images, or documents that can be retrieved and stored. The benefit of storing this information in binary format in the database is that the data is returned as part of the Tabular Data Stream. This eliminates file-level access and simplifies the overall physical implementation architecture. Also, the binary data can be backed up and restored along with the rest of the database.

Another major advantage of storing documents in the database is SQL Server’s built in capability to Full-Text Index the documents. This provides applications and clients the ability to search for words and phrases inside of a document stored within a table using the Full-Text search predicates built into TSQL (i.e., the CONTAINS operator). When a binary column is Full-Text Indexed in SQL Server, developers can write code using stored procedures that will search within the document, as well as meta data about the document.

To achieve the goal of storing, searching, and retrieving documents from SQL Server, it is important for developers to be able to write interfaces that enable users to place documents into the database as well as retrieve them. The retrieval methods should be provided through application searching interfaces (i.e., Web forms or Windows Forms and stored procedures).

This part of the Saving and Retrieving Binary Files with SQL Server series will focus on the database configuration portion of the solution. The second part of the series will expound upon the application configuration and coding portion of the solution.

SQL Server Setup

In establishing the data layer for allowing an application to store and retrieve binary objects, Microsoft recommends using several data types designed to support binary and text data. Developers will need to know how to best choose the correct data type for the data the application will store.

Data Type Selection

Binary data can be stored in several ways within a SQL Server table. Which storage methods are optimal depends on how much data storage is required for the application. The data types include binary, image, and varbinary, and the difference between these is primarily in their storage capabilities. A binary data type column must hold one row’s worth of data in a SQL Server table (equivalent to 8KB for each row). Similarly, the varbinary data type only allocates the amount of space required for the storage of the object up to 8KB. The relationship between the binary and varbinary is similar to the relationship between the char and varchar data type, in that the varchar column will expand to store the data in the column up to its specified limit.

The image data type is designed to hold data that is greater in size than 8KB. The image data type is the data type of choice for developers interested in storing complex binaries within the database.

Data Model

Since data will be stored in an image data type within a table, developers should give thought to storing additional data along with the binary document. Meta data such as file name, a description of the file, and the name of the user who uploaded the document is particularly valuable information that can be maintained and provided to the application to help users navigate the document store within the database. Additionally, the Full-Text Index feature of SQL Server requires the storage and specification for each document of its file extension to allow for searching.

Typically, in a Third Normal Form schema, this information will exist in an entity of its own, and can potentially be related to other entities within the database. Stored procedures should provide the capability to retrieve and store this information.

Binary Storage and Retrieval Stored Procedures

Binary document storage and retrieval can be accomplished through a two-layer interface from the database perspective. In essence, there should be a layer that solely provides the capability to upload and retrieve documents, and a layer that allows users to search for documents to be retrieved.

The document storage and retrieval stored procedures should be quite compact and elementary. The save stored procedure, for example, should accept an image parameter and other basic parameters dealing with the document meta data. This stored procedure conducts an insert into the table that stores the document and can be written as a simple insert statement using the parameters supplied.

The retrieval methodology should include a stored procedure that contains a series of methods to enable users to search for the document. One method involves maintaining a table of details (an independent table of meta data linked to the document table). The stored procedure can search through these details using the LIKE operator in TSQL. The other approach is to maintain a Full-Text Index on the documents table to allow searching using the CONTAINS operator within the documents. The stored procedure can combine these approaches with direct searches on the meta data using TSQL LIKE operators (i.e., users can search based on their name to see all documents they uploaded, the documents file name, upload dates, and so on). A bit-driven interface into the stored procedure can reflect which type of search should be conducted.

The aforementioned stored procedure should return information about the documents found and the primary key from the documents table. Then, the application can use this key to retrieve the specified document later in the application workflow. Using this approach, a user can search for documents without the overhead of bringing back multiple documents and their associated data. This allows the application to retrieve only the documents that are needed by the search. The application will pass the document primary key into a third stored procedure designed specifically for document data retrieval based specifically on the table’s primary key. The outline of the database objects are as follows:

Database Object  Description
Search Stored Procedure                 Returns document search information by searching the meta data in the documents table or associated entities. This stored procedure uses a combination of Full-Text Indexing (by specifying the CONTAINS operator) or traditional TSQL searches (by specifying the LIKE operator). It searches on document-related data including upload dates, file names, and descriptions. This procedure returns the search information along with the document primary key for subsequent document retrieval.
Document Save Stored Procedure Inserts document data as image data into the database table along with associated meta data, including the file type, file name, upload date/time, and so on.
Document Retrieval Stored Procedure Retrieves the document from the database table by selecting the image column based on the document primary key input parameter.
Document Table Contains an image column as well as other traditional columns to store the document binary data as well as supporting meta data. For Full-Text Indexing this column must store the document extension.
Supporting Tables Tables that are related to the documents table via referential integrity that contain diversified data that can be searched in addition to the document meta data for document retrieval.

 Table 1.1: Database Objects for document binary storage.

Implementation

Configuring and developing the database portion of the solution is relatively straight-forward. Much of the complexity of managing the conversion of binaries into the image data type is managed by the application. As a result, creating the data model and the stored procedures to facilitate the movement of documents in and out of the database is as simple as searching for a document via its primary key and returning the image column, or inserting a row from a stored procedure with an image parameter.

The searching component provided by the database is where SQL Server will be relied upon to conduct most of the work. By designing a versatile stored procedure, the application will be able to provide users with multiple methods of document location capability. It is important to undergo the traditional database development lifecycle and take into account concepts such as domain integrity, entity integrity and referential integrity, while paying strict attention to indexing and optimization of search queries. Because searching for documents requires much less flexibility than designing a search in a traditional OLTP application, developers have the luxury of avoiding dynamic SQL.

Additionally, developers and administrators must become familiar with all aspects of Full-Text indexing, including its architecture, the concept of incremental and full population and their scheduling requirements and other important aspects of maintaining a Full-Text Index.

--

Eric Charran is currently working as a Technical Architect and Web Developer, in Horsham, PA. Eric holds Microsoft Certifications in SQL Server and has significant experience in planning, designing, and modeling n-tier applications using SQL Server 2000, as well as architecting and implementing .NET framework based applications, solutions and Web services. Eric’s other professional skills encompass Database Administration, Data Warehousing and Application Architecture, Modeling and Design, as well as Data Warehouse Transformation and Population using DTS. Eric is also skilled in developing and implementing ASP.NET and Windows forms applications, using technologies such as COM+, the .NET framework, VB.NET and other .NET languages.


Contributors : Eric Charran
Last modified 2006-01-09 02:56 PM

Code

Posted by tassflint at 2006-03-17 09:24 AM
I would also like to get the full code in VB.Net. The code I have written is much the same, yet I am not really getting anything from it, so it would help me greatly to be able to see where I have made a mistake.

Thanks
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone