DBAzine.com

A Conceptual Meta-model for Unstructured Data - Part 1

by Robert S. Seiner

This is part one of a two-part article that describes a conceptual meta-model that can be used to support the management of unstructured data. Part one quickly describes unstructured data, describes the primary conceptual entity (the “artifact”), and begins to detail several unstructured data meta-data types. Part two will complete the description of the meta-model by detailing the remaining unstructured data meta-data types.

Introduction

Meta-data is not just about structured data anymore. Five years ago, I published a definition for meta-data that makes as much good sense today as it did when I wrote it. I defined meta-data as …

“Information documented in IT tools that improves both business and technical understanding of data and data-related processes.”

I have seen my definition repeated in DM Review magazine, Intelligent Enterprise magazine, and several other publications by some astute :) authors. I can see sticking with that definition because it says so much more than “data about data.” However, when looking closely at my definition ... or the industry definition, the question that could pop into people's minds is — what exactly is data? My definition mentions data and data-related processes, but doesn't clearly specify structured or unstructured data.

This article is not intended to define or debate the differences between structured and unstructured data. This author considers structured data to be tabular or delimited by nature and recorded in a file or database table. For the purpose of this article, unstructured data will be referred to as “artifacts.”

Artifacts include data/documents/content recorded in electronic format that can be managed and leveraged for the benefit of your company, your customers, your suppliers, and so on. Artifacts include word processing files, HTML files (Web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, emails ... any data that is not in tabular or delimited format. Some people call this recorded knowledge. Some people call this Web content. Some people call this data “documents,” as in document management. Everybody calls it valuable. For this article, that is the definition of unstructured data.

Just like structured data ... to manage artifacts of unstructured data, a company needs to record meta-data about those artifacts, organize that meta-data, and make that meta-data available to the knowledge workers of the organization so they can locate artifacts when they need them. The conceptual model (refer to figure 1) described in this article represents many of the types of meta-data that can be recorded about artifacts. The model may not include absolutely everything that you need to know about the artifacts, but it should provide a good start toward understanding the relationship between meta-data and unstructured data.

Figure 1: Unstructured data conceptual meta-model.

The rest of the article walks through the conceptual model entity by conceptual entity and offers a brief description of each of the types of meta-data that can and should be recorded about unstructured data and artifacts.

Artifact

The artifact sits in the middle of the conceptual meta-model for unstructured data. As stated earlier, artifacts are the basic occurrences of unstructured data. Artifacts can include word processing files, HTML files (Web pages), project plans, presentation files, spreadsheets, graphics, audio files, video files, emails ... any data that is not in tabular or delimited format. All of the meta-data recorded in the conceptual entities surrounding the artifact relate directly to the artifact.

While it makes sense to have naming standards for managed artifacts, such a standard is difficult to enforce across an organization. An overwhelming number of artifacts already exist throughout your company, and many more are constantly being created. Most companies will not consider going back and renaming existing artifacts to follow a naming convention. How artifacts are named tends to follow personal preference, and the artifact name originates when the artifact is stored locally on a desktop, laptop, network drive ... typically by the author of that artifact. It can be difficult to locate artifacts by their name alone; thus, the need for the additional artifact-related meta-data that is listed below.

Samples of meta-data related to artifacts:

      • Artifact Name — represents the full technical name of the artifact (for example, projectname.doc, presentation.ppt, datamovement.xml, audiofile.wav, and so on)
      • Artifact Description — represents a brief textual description of what is stored in the artifact
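As a minimal sketch of the two items above, the artifact can be pictured as a small record that is searched by its description when the file name alone is not helpful. The class name, function name, and sample values are assumptions made for illustration; the article does not prescribe an implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    """One occurrence of unstructured data with its two basic meta-data items."""
    name: str         # full technical name, e.g. "projectname.doc"
    description: str  # brief text describing what the artifact holds


def find_by_description(artifacts, keyword):
    """Return artifacts whose description mentions the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [a for a in artifacts if needle in a.description.lower()]


# Hypothetical sample catalog
catalog = [
    Artifact("projectname.doc", "Project charter for the store opening"),
    Artifact("presentation.ppt", "Quarterly sales review slides"),
]
matches = find_by_description(catalog, "sales")
```

Neither file name hints at its content, which is the situation the description meta-data is meant to address.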

The rest of the meta-data described in this article is meta-data related to artifacts. Some companies may consider putting timestamps on the meta-data to store historic information that is known about the artifact. Some companies may elect to manage only a subset of the meta-data entities that are covered below.

Business Function

Many organizations are designed by business function. For example, your organization may have an accounting function, a payroll function, a human resources function, sales, marketing, manufacturing, purchasing, IT … and the list goes on. Your company may have many of each of these business functions. In many organizations, these functions are divided into sub-functions or sub-sub functions that correspond to the organizational chart. Organization by business function makes sense for many companies, and categorizing your artifacts by business function makes sense, too. This conceptual entity identifies the meta-data that links specific artifacts to business functions, whether that business function is the origination point for the artifact or another business function area that makes use of the artifact.

To be successful tagging artifacts with business function meta-data, organizations might start with the highest level of the organizational chart (all companies, subsidiaries ...) and define company codes to represent the companies and business function codes to represent the functions. The list of codes and names should be defined appropriately, recorded and utilized consistently, reviewed periodically, and managed to eliminate duplicates so it stays consistent with the company’s business.

Samples of meta-data related to business function:

      • Company code/business function code — these two pieces of meta-data may need to be stored together for companies that have multiple business entities (companies) that have similar business functions. Depending on your organization, you may need to capture a company code for each business function code.
      • Company/business function relationship type code — this code would identify the type of relationship between the company/business function and the artifact. For example, “ORIGIN” in this code could tell the knowledge worker that the artifact was authored by a specific company/business function(s); “CONSUMER” can represent that this company/business function makes use of the artifact; and “APPROVAL” can represent that this company/business function has the responsibility of approving the artifact.
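One way to picture these items is as link records that tie an artifact to a company code, a business function code, and a relationship type. The sketch below assumes hypothetical code values (“CO1”, “ACCT”, and so on) and a hypothetical helper name; only the three relationship types come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessFunctionLink:
    artifact_name: str
    company_code: str       # stored together with the function code
    function_code: str
    relationship_type: str  # "ORIGIN", "CONSUMER", or "APPROVAL"


# Hypothetical links: two companies share the same "ACCT" function code,
# which is why the company code is kept alongside it.
links = [
    BusinessFunctionLink("budget.xls", "CO1", "ACCT", "ORIGIN"),
    BusinessFunctionLink("budget.xls", "CO1", "SALES", "CONSUMER"),
    BusinessFunctionLink("budget.xls", "CO2", "ACCT", "APPROVAL"),
]


def functions_for(links, artifact_name, relationship_type):
    """Return (company code, function code) pairs with the given relationship."""
    return [
        (link.company_code, link.function_code)
        for link in links
        if link.artifact_name == artifact_name
        and link.relationship_type == relationship_type
    ]


originators = functions_for(links, "budget.xls", "ORIGIN")
```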

Subject Area

Subject area meta-data may be used to relate an artifact to an enterprise data model or business models with the “subject area” type of categorization or breakdown. By creating a code for each subject area and linking subject area codes to artifacts, knowledge workers are given an additional way to search for artifacts.

Just like company/business function code, the Subject Area code will require its share of due diligence to make certain subject areas are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business.

Samples of meta-data related to subject area:

      • Subject Area code — represents the specific subject area to which the artifact is linked
      • Subject Area type code — represents the type of subject area classification – enterprise data model, business model, organizational model ...

Purpose

Purpose meta-data identifies the uses (intended or otherwise) of the artifact. Purposes may also be broken into Sub-purposes. For example, if “store opening” was defined as the purpose, “equipment delivery” may be a Sub-purpose. The same holds true for a purpose of “financial reporting” and a Sub-purpose of “report distribution.” The purpose code can be used to identify why the artifact was created and how it is to be used.

You must ensure that purposes and Sub-purposes are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business. Recording and managing purpose codes themselves may be very useful in describing how the business operates.

Samples of meta-data related to purpose:

      • Purpose code — represents the specific purpose to which the artifact is linked
      • Purpose sub-code — represents a sub-type within a purpose to further define how the artifact will be used
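The purpose and sub-purpose codes can be sketched as a managed code list that is checked whenever an artifact is tagged, which is one possible way to keep the codes consistent. The code values below are hypothetical spellings of the examples in the text, and the function names are assumptions.

```python
from collections import defaultdict

# Managed code list: each purpose code maps to its allowed sub-codes.
# The values are hypothetical, derived from the examples in the text.
PURPOSES = {
    "STORE_OPENING": {"EQUIPMENT_DELIVERY"},
    "FINANCIAL_REPORTING": {"REPORT_DISTRIBUTION"},
}


def tag_artifact(index, artifact_name, purpose_code, sub_code=None):
    """Link an artifact to a purpose (and optional sub-purpose), rejecting unknown codes."""
    if purpose_code not in PURPOSES:
        raise ValueError(f"unknown purpose code: {purpose_code}")
    if sub_code is not None and sub_code not in PURPOSES[purpose_code]:
        raise ValueError(f"unknown sub-code {sub_code} for {purpose_code}")
    index[purpose_code].append((artifact_name, sub_code))


def artifacts_for(index, purpose_code, sub_code=None):
    """Return artifact names for a purpose; narrow to a sub-purpose when one is given."""
    return [
        name
        for name, sub in index.get(purpose_code, [])
        if sub_code is None or sub == sub_code
    ]


index = defaultdict(list)
tag_artifact(index, "floorplan.ppt", "STORE_OPENING")
tag_artifact(index, "delivery_schedule.xls", "STORE_OPENING", "EQUIPMENT_DELIVERY")
```

Rejecting codes that are not on the list is a design choice of this sketch; an organization might instead route unknown codes to a review step.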

Steward

Steward meta-data, in the context of an artifact, is meta-data about a person who is accountable for the artifact. Notice that I said “a person” and not “THE person.” Depending on your organization and how you define stewardship, there may be several steward types, and thus, the need to record a steward type code along with the steward (person) information. Keep in mind that you may have several stewards of the same or different types that may be associated with a single artifact.

Different steward type codes may include “AUTHOR,” “REVIEWER,” “APPROVER,” “USER” (knowledge workers). Consider using several steward types (as opposed to assigning a single steward per artifact) to identify, record, and track the different types of accountability for the management of the specific artifact. The link between the steward and the artifact may also be important if the artifact is time-sensitive and must be reviewed periodically to make certain it is current.

Just like the other codes listed above, you should make sure that the steward type code gets its share of due diligence to make certain accountabilities per steward type are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business.

Samples of meta-data related to steward:

      • Steward type code — represents the type of steward that is being associated to the artifact
      • Steward person identifier — represents the link to the steward (person). It is not recommended that this identifier be the steward’s name (for obvious reasons). This meta-data may contain an employee id, social security number, or the specific data at your company that can be used to associate an artifact to a person
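Because one artifact may have several stewards of the same or different types, the steward meta-data is naturally a list of links rather than a single field on the artifact. The following is a sketch under that assumption; the employee ids and file name are invented for illustration, and the steward type codes are the ones named in the text.

```python
from dataclasses import dataclass

STEWARD_TYPES = {"AUTHOR", "REVIEWER", "APPROVER", "USER"}


@dataclass(frozen=True)
class StewardLink:
    artifact_name: str
    steward_type: str
    person_id: str  # opaque identifier (e.g. an employee id), not a name

    def __post_init__(self):
        # Reject steward types that are not on the managed list.
        if self.steward_type not in STEWARD_TYPES:
            raise ValueError(f"unknown steward type: {self.steward_type}")


def stewards_of(links, artifact_name, steward_type=None):
    """Return person ids linked to an artifact, optionally filtered by steward type."""
    return [
        link.person_id
        for link in links
        if link.artifact_name == artifact_name
        and (steward_type is None or link.steward_type == steward_type)
    ]


# Hypothetical links: one author and two reviewers for the same artifact
links = [
    StewardLink("policy.doc", "AUTHOR", "E1001"),
    StewardLink("policy.doc", "REVIEWER", "E1002"),
    StewardLink("policy.doc", "REVIEWER", "E1003"),
]
reviewers = stewards_of(links, "policy.doc", "REVIEWER")
```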

Location

A Location may be as specific or as varied as a country, region, state, city, complex, building, floor, or mail-stop. Location information can be used to link people associated with a specific location to the artifact. Location can also be used to secure who sees or uses the artifact and to identify where the artifact is relevant or should be distributed.

It may be necessary to record information about numerous locations for each artifact and also to delineate the locations by location type (e.g., market region, office complex, building).

Like the other codes listed above, location codes and location types will require due diligence to make certain location definitions are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business.

Samples of meta-data related to location:

      • Location code — represents the code for the location being linked to the artifact
      • Location type code — represents the type of location that is being associated to the artifact (e.g., sales region, physical location, mail center)

Community and Audience

Community or audience meta-data identifies a group of knowledge workers that are associated to an artifact. Depending on your organization, you may have communities made up of other communities or you may need ways to identify types of communities. Examples of community types could include user-group communities, management teams, project teams, and so on, that may be related or linked to a specific artifact. Sub-communities may break these down further: User Groups may break down to ER/Win Users, Portal Users, Data Warehouse Users; Project Teams may break down to specific projects by name; Management Teams may break down into different levels of managers, and so on.

Like the other meta-data listed above, community codes and community type codes will require due diligence to make certain the codes are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business. Companies should consider both opt-in and organization-defined communities that are designed for specific purposes.

Samples of meta-data related to community:

      • Community code — represents the specific community of interest (interested people) to which the artifact is linked
      • Community type code — represents the type of community classification (e.g., user group, management team, project team)
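Communities made up of other communities can be sketched as a simple tree of community codes. The sketch below assumes that an artifact linked to a parent community is also relevant to its sub-communities; the article does not state that rule, and the codes, file names, and function names are hypothetical.

```python
# Parent community code -> sub-community codes (hypothetical codes based on
# the User Groups and Project Teams examples in the text).
SUB_COMMUNITIES = {
    "USER_GROUPS": ["ERWIN_USERS", "PORTAL_USERS", "DW_USERS"],
    "PROJECT_TEAMS": ["PROJECT_ALPHA"],
}

# Artifact name -> community codes it is linked to (hypothetical data).
ARTIFACT_COMMUNITIES = {
    "modeling_standards.doc": ["USER_GROUPS"],
    "status_report.doc": ["PROJECT_ALPHA"],
}


def expand(code, tree=SUB_COMMUNITIES):
    """Return the community code followed by all of its descendants."""
    result = [code]
    for child in tree.get(code, []):
        result.extend(expand(child, tree))
    return result


def artifacts_for_community(code):
    """Artifacts linked to the community or to any community that contains it."""
    return sorted(
        name
        for name, linked in ARTIFACT_COMMUNITIES.items()
        if any(code in expand(parent) for parent in linked)
    )


dw_artifacts = artifacts_for_community("DW_USERS")
```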

Security

Security meta-data can be used to ensure that only specific people, groups, or users at specific locations are granted access to an artifact, and to identify the type of access that each group has to the artifact. This type of meta-data may also be used to answer questions such as, “What artifacts can John Smith view? change? eliminate?”, “What artifacts can be changed by the ABC Management Team?”, and so on.

Like the other meta-data listed above, the security groups and their relationship to knowledge workers will require their share of due diligence to make certain security group definitions are clearly defined, recorded, managed to eliminate duplicates, and that they stay consistent with the business.

Samples of meta-data related to security:

      • Security grouping code — represents the security grouping by community, location, steward type, others, and so on, to which the artifact is linked
      • Security grouping type code — represents the type of grouping, by “COMMUNITY,” “LOCATION,” “STEWARD TYPE,” and so on, to which security is being applied
      • Security type code — this field may specify the type of security that is associated with the artifacts (e.g., “ALL ACCESS,” “CHANGE,” “READ-ONLY”)
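A question such as “What artifacts can John Smith view?” can be answered by matching a person's group memberships against the security meta-data. This is a minimal sketch: the mapping from security type to permitted actions is an assumption, as are the grouping codes, file names, and function name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityRule:
    artifact_name: str
    grouping_type: str  # "COMMUNITY", "LOCATION", "STEWARD TYPE", ...
    grouping_code: str
    security_type: str  # "ALL ACCESS", "CHANGE", or "READ-ONLY"


# Assumption: which actions each security type permits.
PERMITS = {
    "READ-ONLY": {"view"},
    "CHANGE": {"view", "change"},
    "ALL ACCESS": {"view", "change", "eliminate"},
}


def artifacts_allowed(rules, memberships, action):
    """Return artifact names the person may act on, given (type, code) memberships."""
    return sorted({
        rule.artifact_name
        for rule in rules
        if (rule.grouping_type, rule.grouping_code) in memberships
        and action in PERMITS[rule.security_type]
    })


# Hypothetical rules and memberships
rules = [
    SecurityRule("plan.doc", "COMMUNITY", "ABC_MGMT", "CHANGE"),
    SecurityRule("handbook.doc", "LOCATION", "PGH", "READ-ONLY"),
    SecurityRule("payroll.xls", "STEWARD TYPE", "APPROVER", "ALL ACCESS"),
]
john_smith = {("COMMUNITY", "ABC_MGMT"), ("LOCATION", "PGH")}

viewable = artifacts_allowed(rules, john_smith, "view")
changeable = artifacts_allowed(rules, john_smith, "change")
```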

Conclusion — Part One

Part one of this article walked through half of a conceptual meta-model for unstructured data, including descriptions of the Primary Entity (artifacts that reside in the middle of the model) and the Business Function, Subject Area, Purpose, Steward, Location, Community and Audience, and Security meta-data that should be considered when recording meta-data about unstructured data. Hopefully, this article began to broaden your thinking about the meta-data component of managing unstructured data.

The second part of this article will rehash briefly what was covered in part one and will cover additional conceptual meta-data entities for unstructured data that complete the meta-model. These entities will include Data-related meta-data, Time meta-data, Event, Project and Process, Status and Version, Package, and Media type meta-data.

--

Robert (Bob) S. Seiner is recognized as the publisher of The Data Administration Newsletter (TDAN.com), an award winning electronic publication that focuses on sharing information about data, information, content and knowledge management disciplines. Mr. Seiner speaks often at major conferences and user group meetings across the U.S. He can be reached at the newsletter at rseiner@tdan.com or 412-220-9643 (fax 9644).

Mr. Seiner is the owner and principal of KIK Consulting Services, a company that focuses on Consultative Mentoring or simply stated ... teaching a company's employees how to better manage and leverage their data, information, content, and knowledge assets. Mr. Seiner's firm focuses on data governance/stewardship, meta-data management, business intelligence and knowledge management. KIK has developed a 4-Step Method© for Consultative Mentoring that involves customizing industry best practices to work in your environment.

For more information about Mr. Seiner, KIK Consulting Services and The Data Administration Newsletter (TDAN.com), please visit www.tdan.com and www.tdan.com/kik.htm.


Contributors : Robert S. Seiner
Last modified 2006-01-04 12:02 PM