
DBAzine.com

A Conceptual Meta-model for Unstructured Data - Part 2

by Robert S. Seiner

This is part two of a two-part article describing a conceptual meta-model that can be used to support the management of unstructured data. Part one, published in DBAzine.com, described unstructured data, introduced the primary conceptual entity (the “artifact”), and began to detail several unstructured data meta-data types. Part two revisits the beginning of part one and completes the description of the meta-model by detailing the remaining unstructured data meta-data types.

Introduction

As discussed in part one of this article, meta-data is not just about structured data anymore. Meta-data is also data about unstructured data.

This article does not define or debate the differences between structured and unstructured data. This author considers structured data to be tabular or delimited by nature and recorded in a file or database table. For the purpose of this article, unstructured data will be referred to as "artifacts." Artifacts include data, documents, and content recorded in electronic format that can be managed and leveraged for the benefit of your company, your customers, your suppliers, and so on. Artifacts include word processing files, HTML files (Web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, emails, and any other data that is not in tabular or delimited format. Some people call this recorded knowledge. Some people call this Web content. Some people call these documents, as in document management. Everybody calls it valuable. For this article, that is the definition of unstructured data.

As with structured data, a company that wants to manage unstructured data artifacts needs to record meta-data about those artifacts, organize that meta-data, and make it available to the knowledge workers of the organization so they can locate artifacts when they need them. The conceptual model (refer to Figure 1) described in the first part of this article represents many of the types of meta-data that can be recorded about artifacts. The model may not include absolutely everything that you need to know about artifacts, but it should provide a good start toward understanding the relationship between meta-data and unstructured data.

Figure 1: Unstructured data conceptual meta-model.

The rest of the article walks through the remainder of the conceptual model, entity by entity, and offers a brief description of the additional types of meta-data that can and should be recorded about unstructured data and artifacts.

Since the model is focused on the artifact, it is worth describing the artifact once again before starting to define the remaining unstructured data meta-data types.

Artifact

The artifact sits in the middle of the conceptual meta-model for unstructured data. As stated earlier, artifacts are the basic occurrences of unstructured data. Artifacts can include word processing files, HTML files (Web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, emails — any data that is not in tabular or delimited format. All of the meta-data recorded in the conceptual entities surrounding the artifact relate directly to the artifact.

While it makes sense to have naming standards for managed artifacts, such a standard is difficult to enforce across an organization. An overwhelming number of artifacts already exist throughout your company, and many more are constantly being created. Most companies will not consider going back and renaming existing artifacts to follow a naming convention. Artifact names tend to follow personal preference: the name originates when the artifact is stored locally on a desktop, laptop, network drive, and so on, typically by its author. Therefore, it can be difficult to locate artifacts by name alone, which is why the additional artifact-related meta-data listed below is needed.

Samples of meta-data related to Artifacts:

      • Artifact Name — represents the full technical name of the artifact — for example, projectname.doc, presentation.ppt, datamovement.xml, audiofile.wav, and so on.
      • Artifact Description — represents a brief textual description of what is stored in the artifact.
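Because artifact names follow personal preference, a search that looks only at names will miss things. The following is a minimal Python sketch, not part of the original model, of an artifact record holding the two sample attributes above, plus a hypothetical `find_artifacts` helper that matches a keyword against both the name and the description. The class name, function name, and sample artifacts are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """Hypothetical record holding the two sample Artifact attributes."""
    name: str         # Artifact Name, e.g. "projectname.doc"
    description: str  # Artifact Description: brief text about the content


def find_artifacts(artifacts, keyword):
    """Return artifacts whose name or description contains keyword (case-insensitive).

    Searching the description as well as the name helps when names were
    chosen by personal preference and follow no naming standard.
    """
    needle = keyword.lower()
    return [a for a in artifacts
            if needle in a.name.lower() or needle in a.description.lower()]


catalog = [
    Artifact("projectname.doc", "Project charter for the warehouse rollout"),
    Artifact("bobs_final_v2.ppt", "Warehouse rollout kickoff presentation"),
    Artifact("audiofile.wav", "Recorded steering committee call"),
]
print([a.name for a in find_artifacts(catalog, "warehouse")])
# ['projectname.doc', 'bobs_final_v2.ppt']
```

Note that neither matching file name contains the word "warehouse"; only the description makes them findable.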

The rest of the meta-data described in this article is meta-data related to artifacts. Some companies may consider putting timestamps on the meta-data to store historic information that is known about the artifact. Other companies may elect to manage only a subset of the meta-data entities that are covered below.

Data Related

Data Related meta-data is included in the conceptual meta-model to provide a place to relate artifacts to specific pieces of structured data in the organization. Business rules, data models, data flow diagrams, architecture definitions (technology, data, or enterprise architecture), data dictionaries, application development documentation, report catalogs, and the reports themselves are all vital artifacts for the operation of your business, and they can be related to specific structured data in the organization. These artifacts need to be managed just like all others. This entity allows the knowledge worker to identify and locate artifacts that have a relationship to structured data in the organization.

Like the other meta-data listed in part one, the data names and type codes will require their share of due diligence to make certain that they are clearly defined, recorded, and managed to eliminate duplicates, and that they stay consistent with your company's information technology. The data-related meta-data can also be used to link an artifact to your "data about data" in your enterprise or application meta-data repositories.

Samples of meta-data related to Data:

      • Data Name — represents a database name, table name, column name, core element name, or any other specific structured data (logical or physical) that can be related to an artifact.
      • Data Type Code — specifies the type of structured data represented by the Data Name above: "Table," "Column," "Entity," "Database," and so on.
      • Repository Link Code — represents a unique identifier that couples the artifact with specific data in an enterprise or application repository.
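To illustrate how a controlled list of type codes helps eliminate duplicates, here is a minimal Python sketch under stated assumptions: the `DataLink` record, the `DATA_TYPE_CODES` vocabulary, the `artifacts_for_data` helper, and the repository link format are all hypothetical. Each record links one artifact to one piece of structured data, and an unrecognized type code is rejected when the record is created.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary for Data Type Code; a fixed set keeps
# "Table", "TABLE" and "Tbl" from becoming three different codes.
DATA_TYPE_CODES = {"DATABASE", "TABLE", "COLUMN", "ENTITY", "CORE ELEMENT"}


@dataclass(frozen=True)
class DataLink:
    artifact_name: str         # the artifact being described
    data_name: str             # Data Name, e.g. "CUSTOMER"
    data_type_code: str        # Data Type Code, must be in DATA_TYPE_CODES
    repository_link_code: str  # Repository Link Code (hypothetical format)

    def __post_init__(self):
        # Reject codes outside the controlled vocabulary.
        if self.data_type_code not in DATA_TYPE_CODES:
            raise ValueError(f"unknown Data Type Code: {self.data_type_code!r}")


def artifacts_for_data(links, data_name):
    """List, without duplicates, the artifacts related to one piece of structured data."""
    return sorted({link.artifact_name for link in links if link.data_name == data_name})


links = [
    DataLink("customer_model.vsd", "CUSTOMER", "TABLE", "REPO-0001"),
    DataLink("billing_rules.doc", "CUSTOMER", "TABLE", "REPO-0001"),
    DataLink("billing_rules.doc", "INVOICE", "TABLE", "REPO-0002"),
]
print(artifacts_for_data(links, "CUSTOMER"))
# ['billing_rules.doc', 'customer_model.vsd']
```

In practice the Repository Link Code would point into whatever enterprise or application repository a company already runs; the string used here is only a placeholder.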

Time or Time/Date

There are many ways that the Time meta-data can be related to artifacts. The Time meta-data by itself has no meaning and must include an additional Time Type Code that denotes what the specific time represents.

Time Type Codes can specify "CREATE DATE," "RETIRE DATE," "REVIEW DATE," "LAST UPDATE DATE," and other dates pertaining to the specific artifact. For these Time Type Codes, it may also be important to record meta-data that identifies the accountable party who took, or will take, the action associated with the artifact at the recorded time, whether in the past (created, updated) or in the future (retire, review).

Time meta-data may also represent ranges of time. For example, if an artifact should be available only between two dates, this information can be recorded with start and end Time/Date meta-data.

Samples of meta-data related to Time/Date:

      • Artifact Time(s)/Date(s) — represents the specific occurrences of Time/Date associated with the artifact and further defined by the Time/Date Type code.
      • Time/Date Type code — represents the meaning of the Time/Date recorded. Examples of values that may be recorded in this code are listed in the second paragraph in this section.
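The pairing of a Time/Date with a type code, and the use of start and end dates to define an availability range, can be sketched in Python as follows. This is an illustration under assumptions: the `ArtifactTime` record and the `is_available` helper are hypothetical, and the codes "AVAILABLE START" and "AVAILABLE END" are invented for the example rather than taken from the model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ArtifactTime:
    time_type_code: str                      # e.g. "CREATE DATE", "REVIEW DATE"
    value: date                              # the Artifact Time/Date itself
    accountable_party: Optional[str] = None  # who did, or will do, the action


def is_available(times, on):
    """True if `on` falls inside the artifact's availability window.

    Assumes hypothetical Time Type Codes "AVAILABLE START" and
    "AVAILABLE END"; a missing bound is treated as open-ended.
    """
    start = next((t.value for t in times if t.time_type_code == "AVAILABLE START"), None)
    end = next((t.value for t in times if t.time_type_code == "AVAILABLE END"), None)
    return (start is None or start <= on) and (end is None or on <= end)


times = [
    ArtifactTime("CREATE DATE", date(2003, 1, 15), "J. Smith"),
    ArtifactTime("REVIEW DATE", date(2004, 1, 15), "Data Steward"),
    ArtifactTime("AVAILABLE START", date(2003, 2, 1)),
    ArtifactTime("AVAILABLE END", date(2003, 12, 31)),
]
print(is_available(times, date(2003, 6, 1)))  # True
print(is_available(times, date(2004, 3, 1)))  # False
```

A bare date such as 2004-01-15 means nothing on its own; only the accompanying type code tells the reader that it is a review date.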

Media Type

Media Type meta-data represents the format type of the artifact and may be associated with the tools or software that are required to view, edit, and manage the artifact. For example, an artifact may be a word processing document, spreadsheet, graphic, audio file, video file, and so on, that can only be opened using specific tools (e.g., Word, Excel, Access, Illustrator). It is important to identify the format of the artifact so the knowledge worker knows which media types are standard and acceptable, and which tools are needed to view the artifact.

Like the other meta-data listed above, the media type codes will require their share of due diligence to make certain that media types are clearly defined, recorded, and managed to eliminate duplicates, and that they stay consistent with your company's information technology.

Samples of meta-data related to Media Type:

      • Media Type Code — represents the media used to record the artifact; for example, word processing document, spreadsheet, presentation file, graphic, audio file.
      • Media Tool Code — represents the tools or software required to open, view, or change the artifact; for example, Microsoft Word, Lotus, PowerPoint, Visio, Wave Player.
      • Media Version Code — represents the version of the Media Tool that will be required to open, view, or change the artifact.
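One way the three Media Type attributes could answer the knowledge worker's question "can I open this?" is shown in the Python sketch below. It rests on assumptions not stated in the article: the `MediaType` record and the `can_open` helper are hypothetical, version codes are assumed to be numeric and dotted (such as "9.0"), and a newer tool version is assumed to open files that require an older one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaType:
    media_type_code: str     # e.g. "SPREADSHEET", "AUDIO FILE"
    media_tool_code: str     # e.g. "EXCEL", "WAVE PLAYER"
    media_version_code: str  # minimum tool version, e.g. "9.0"


def parse_version(code):
    """Turn a dotted version code like "9.0" into (9, 0) so versions compare numerically."""
    return tuple(int(part) for part in code.split("."))


def can_open(media, installed_tools):
    """Check whether a worker's installed tools can open the artifact.

    `installed_tools` maps a Media Tool Code to the installed version code.
    Assumes newer versions of a tool open files made for older versions.
    """
    installed = installed_tools.get(media.media_tool_code)
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(media.media_version_code)


budget = MediaType("SPREADSHEET", "EXCEL", "9.0")
print(can_open(budget, {"EXCEL": "10.0"}))  # True
print(can_open(budget, {"EXCEL": "8.5"}))   # False
print(can_open(budget, {"WORD": "10.0"}))   # False
```

Comparing tuples of integers rather than raw strings avoids the classic mistake of "10.0" sorting before "9.0".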

Package

Package meta-data can be used to associate artifacts with other artifacts. For example, user manuals, books, procedure guides, and so on, are often made up of many "chapters," "pieces," or graphics. The artifacts that make up a package can refer to other artifacts. The individual chapters, pieces, and graphics may stand alone and require management as individual artifacts, while also needing to be managed as part of a larger package of artifacts.

Like the other meta-data listed previously, the package type codes will require their share of due diligence to make certain that package types are clearly defined, recorded, and managed to eliminate duplicates, and that they stay consistent with your company's need for packaging artifacts.

Samples of meta-data related to Package:

      • Package Name — represents the name of the artifact that includes other artifacts. For example, Data Architecture Plan, Equipment Operating Manual, Store Layouts.
      • Package Type Code — represents the type of package of artifacts and contains values that identify "USER MANUAL," "PROCEDURE GUIDE," "GRAPHICS PACKAGE," and so on, depending on how your company packages artifacts.
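Because a package is itself an artifact that includes other artifacts, and a chapter can belong to a package while still being managed on its own, the relationship forms a simple hierarchy. The Python sketch below is a hypothetical illustration: the `members` mapping, the function names, and the sample manual are assumptions. It expands a package into its individual artifacts and, going the other way, finds the packages that include a given artifact.

```python
def expand_package(package_name, members, _seen=None):
    """Return every individual artifact in a package, descending into sub-packages.

    `members` maps a Package Name to the names it directly contains; a name
    with no entry is treated as a plain artifact. `_seen` guards against
    cycles, so each package is expanded at most once.
    """
    seen = set() if _seen is None else _seen
    if package_name in seen:
        return []
    seen.add(package_name)
    leaves = []
    for name in members.get(package_name, []):
        if name in members:
            leaves.extend(expand_package(name, members, seen))
        else:
            leaves.append(name)
    return leaves


def packages_containing(artifact_name, members):
    """List the packages that directly include an artifact."""
    return sorted(p for p, names in members.items() if artifact_name in names)


members = {
    "Equipment Operating Manual": ["intro.doc", "Safety Chapter", "wiring.vsd"],
    "Safety Chapter": ["safety.doc", "warning_label.gif"],
}
print(expand_package("Equipment Operating Manual", members))
# ['intro.doc', 'safety.doc', 'warning_label.gif', 'wiring.vsd']
print(packages_containing("safety.doc", members))
# ['Safety Chapter']
```

Storing only direct membership keeps each chapter an independent artifact. The full contents of a manual are derived when needed rather than copied into it.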

Status and Version

Status meta-data can be recorded about each artifact to identify the history and the present state of each artifact, including "PRODUCTION," "TEST," "UNDER REVIEW," "RETIRED," and more. The status code, when used along with a status date field, can be used to monitor the activity of each artifact by recording when the artifact was a draft, approved, under review, in production, back under review, and so on.

Version meta-data can be used when multiple copies or versions of the same artifact exist. For example, user manuals may contain different information for different releases of applications, software products, procedures, and so on.

Like the other meta-data listed previously, the status and version meta-data will require their share of due diligence to make certain that it is clearly defined, recorded, and managed to eliminate duplicates, and that it stays consistent with your company's need for applying status and versioning to artifacts.

Samples of meta-data related to Status and Version:

      • Status Code — represents the status of an artifact at the point in time represented by the Status/Version Date. Examples are listed in the first paragraph of this section.
      • Version Code — represents the numbered or codified version of an artifact. For example, typical values may include "Release 1.0," "Release 1.1," "English Version," "Spanish Version."
      • Status/Version Date — represents the date or point in time when the status or version of the artifact was recorded and made available.
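Recording a status code with a date, as described above, turns the status meta-data into a history from which both the present state and the activity trail can be read. The following Python sketch assumes a hypothetical `StatusVersion` record and helper names, and it shows one way to do this per version of an artifact.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StatusVersion:
    status_code: str           # e.g. "DRAFT", "UNDER REVIEW", "PRODUCTION"
    version_code: str          # e.g. "Release 1.0", "Spanish Version"
    status_version_date: date  # when this status or version was recorded


def current_status(history, version_code):
    """Return the most recent status recorded for one version, or None if there is none."""
    entries = [h for h in history if h.version_code == version_code]
    if not entries:
        return None
    return max(entries, key=lambda h: h.status_version_date).status_code


def status_trail(history, version_code):
    """Return the status codes for one version in date order: the artifact's activity."""
    entries = sorted((h for h in history if h.version_code == version_code),
                     key=lambda h: h.status_version_date)
    return [h.status_code for h in entries]


history = [
    StatusVersion("DRAFT", "Release 1.0", date(2003, 1, 10)),
    StatusVersion("PRODUCTION", "Release 1.0", date(2003, 3, 1)),
    StatusVersion("UNDER REVIEW", "Release 1.0", date(2003, 2, 1)),
    StatusVersion("DRAFT", "Release 1.1", date(2003, 6, 1)),
]
print(current_status(history, "Release 1.0"))  # PRODUCTION
print(status_trail(history, "Release 1.0"))    # ['DRAFT', 'UNDER REVIEW', 'PRODUCTION']
```

Keeping every dated status row, instead of overwriting a single status field, is what makes the history in the first paragraph of this section possible.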

Project and Process

Project and Process meta-data can be used to relate a specific artifact to a project, process, or specific task. Companies that create project plans may want to link a specific project task to a deliverable (which will become an artifact).

Companies may elect to link artifacts to a specific step or section of a process. For example, when setting up a display case in a new store, there can be pictures that demonstrate exactly how each section of the display case is supposed to look. When building a house, there are schematic diagrams (read: artifacts) that are required for the landscape, architecture, wiring, plumbing, and HVAC. And when describing a process, there are supporting documents for each step of the process along the way.

The ability to create a link between projects, processes, and artifacts potentially requires several pieces of meta-data, depending on how projects and processes are developed at your company. If your company follows a strict planning process including the development and maintenance of detailed project plans, the coding that is used for that planning can be linked to the artifacts via this Project and Process meta-data.

Additionally, you may need to develop a Project/Process Artifact Type Code that defines the relationship between the artifact and the process: "DELIVERED," "UTILIZED," "INPUT," "OUTPUT."

Like the other meta-data listed previously, the project and process meta-data will require due diligence to make certain that the projects and processes linked to the artifacts are clearly defined, recorded, and managed to eliminate duplicates, and that they stay consistent with your company's need for managing projects and processes.

Samples of meta-data related to Project and Process:

      • Project Name — represents the business name of the project that is being linked to an artifact.
      • Process Name or Code — represents the process name or coded value of the process that is linked to the artifact.
      • Project/Process Artifact Type Code — represents the relationship between the project or process and the artifact. Potential values for this meta-data are defined in the next-to-last paragraph of this section.
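A link record carrying the three sample attributes, restricted to the relationship values suggested above ("DELIVERED," "UTILIZED," "INPUT," "OUTPUT"), might look like the following Python sketch. The record name, the helper, and the project and process codes such as "SETUP-03" are hypothetical. Either the project or the process may be left empty, because an artifact can be tied to only one of them.

```python
from dataclasses import dataclass
from typing import Optional

# Relationship values suggested in the text for the Project/Process Artifact Type Code.
LINK_TYPES = {"DELIVERED", "UTILIZED", "INPUT", "OUTPUT"}


@dataclass(frozen=True)
class ProjectProcessLink:
    artifact_name: str
    project_name: Optional[str]  # Project Name, if linked to a project
    process_code: Optional[str]  # Process Name or Code, if linked to a process step
    link_type_code: str          # Project/Process Artifact Type Code

    def __post_init__(self):
        # Only the agreed relationship values are accepted.
        if self.link_type_code not in LINK_TYPES:
            raise ValueError(f"unknown link type: {self.link_type_code!r}")


def artifacts_for(links, link_type_code, project_name=None, process_code=None):
    """Artifacts with a given relationship to a project and/or process step.

    A filter left as None matches any project or process.
    """
    return [link.artifact_name for link in links
            if link.link_type_code == link_type_code
            and (project_name is None or link.project_name == project_name)
            and (process_code is None or link.process_code == process_code)]


links = [
    ProjectProcessLink("display_case_photo.jpg", "New Store Opening", "SETUP-03", "INPUT"),
    ProjectProcessLink("store_layout.vsd", "New Store Opening", None, "DELIVERED"),
    ProjectProcessLink("wiring_schematic.dwg", None, "WIRING", "UTILIZED"),
]
print(artifacts_for(links, "DELIVERED", project_name="New Store Opening"))
# ['store_layout.vsd']
```

Where a company already codes its project plans, those existing task codes would take the place of the placeholder process codes used here.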

Event

Event meta-data can be used to associate artifacts with events (happenings) that take place at your company or within part of your company. Events can be one-time events or events that repeat periodically. And, to be consistent with previously defined meta-data, events can be categorized into types, and events can be parts of other events. For example, a store may have a procedure or set of procedures that it follows for its annual holiday sale or promotion; a manufacturing company may produce certain goods only during an event at a certain time of year; a conference may be held once a year. The information, pictures, processes, and more that are used to support these events are likely recorded in artifacts (e.g., documents, graphics).

Artifacts should be linked to events so companies can retrieve the information they need to prepare for or follow through with these events. Event meta-data can identify specific events, event types, and can relate events to other events.

As with the other meta-data listed above, creating codes for events and event types requires due diligence to make certain that this meta-data is clearly defined, recorded, and managed to eliminate duplicates, and that it stays consistent with the business. Recording and managing event codes and event type codes may also be very useful in describing how the business operates.

Samples of meta-data related to Events:

      • Event Code — represents the specific event to which the artifact is linked.
      • Event Type Code — represents the type of event identified by the Event Code. For example, "CONFERENCE," "MEETING NOTES," "BANQUET," "2003 SUMMER AUDIT," and so on.
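Since events can be parts of other events, retrieving everything needed for an event usually means including the artifacts of its sub-events too. The Python sketch below illustrates this under assumptions. The `parent_code` field is a hypothetical way to record the "part of" relationship, artifact links are modeled as simple (artifact, event code) pairs, and the event codes are invented examples.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    event_code: str                    # Event Code
    event_type_code: str               # Event Type Code, e.g. "CONFERENCE"
    parent_code: Optional[str] = None  # hypothetical: enclosing event, if any


def event_and_subevents(events, code):
    """Codes of an event plus every event nested beneath it, breadth-first."""
    result, seen, queue = [], set(), deque([code])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        result.append(current)
        queue.extend(e.event_code for e in events if e.parent_code == current)
    return result


def artifacts_for_event(events, artifact_links, code):
    """Artifacts linked to an event or to any of its sub-events.

    `artifact_links` is a list of (artifact_name, event_code) pairs.
    """
    wanted = set(event_and_subevents(events, code))
    return sorted(name for name, event_code in artifact_links if event_code in wanted)


events = [
    Event("HOLIDAY-SALE-2003", "PROMOTION"),
    Event("HOLIDAY-SALE-2003-SETUP", "MEETING NOTES", parent_code="HOLIDAY-SALE-2003"),
    Event("ANNUAL-CONF-2003", "CONFERENCE"),
]
artifact_links = [
    ("sale_signage.gif", "HOLIDAY-SALE-2003"),
    ("setup_minutes.doc", "HOLIDAY-SALE-2003-SETUP"),
    ("keynote.ppt", "ANNUAL-CONF-2003"),
]
print(artifacts_for_event(events, artifact_links, "HOLIDAY-SALE-2003"))
# ['sale_signage.gif', 'setup_minutes.doc']
```

The `seen` set keeps the traversal from looping if event codes were ever recorded with a circular parent relationship.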

Conclusion

This article walked through the remainder of the conceptual meta-model for unstructured data, entity by entity, and offered a brief description and some samples of each type of meta-data that should be considered when recording meta-data about unstructured data and artifacts. Hopefully, these two articles have broadened your thinking about the meta-data component of managing unstructured data. Perhaps in the future, I will consider changing my meta-data definition to read as follows:

Meta-Data is ...

“Data recorded in IT tools that improves both business and technical understanding of data and data-related people and processes.”

Whew! That's a mouthful. "Data About Data" is a lot easier to remember.

--

Robert (Bob) S. Seiner is recognized as the publisher of The Data Administration Newsletter (TDAN.com), an award-winning electronic publication that focuses on sharing information about data, information, content and knowledge management disciplines. Mr. Seiner speaks often at major conferences and user group meetings across the U.S. He can be reached at the newsletter at rseiner@tdan.com or 412-220-9643 (fax 9644).

Mr. Seiner is the owner and principal of KIK Consulting Services, a company that focuses on Consultative Mentoring or, simply stated, teaching companies' employees how to better manage and leverage their data, information, content, and knowledge assets. Mr. Seiner's firm focuses on data governance/stewardship, meta-data management, business intelligence and knowledge management. KIK has developed a 4-Step Method© for Consultative Mentoring that involves customizing industry best practices to work in your environment.

For more information about Mr. Seiner, KIK Consulting Services and The Data Administration Newsletter (TDAN.com), please visit www.tdan.com and www.tdan.com/kik.htm.


Contributors : Robert S. Seiner
Last modified 2006-01-04 12:02 PM