Getting Down to Data Basics
Well, this is the fourth eDBA column I have written for dbazine and I think it's time to start over at the beginning. Up to this point we have focused on the transition from DBA to eDBA, but some e-businesses are brand new to database management. These organizations are implementing eDBA before implementing DBA. And the sad fact of the matter is that many are not implementing any formalized type of DBA at all.
Some daring young enterprises embark on Web-enabled database implementation with nothing more than a bevy of application developers. This approach is sure to fail. If you take nothing else away from this article, make sure you understand this: every organization that manages data using a database management system (DBMS) requires a database administration group to ensure the effective use and deployment of the company's databases.
In short, e-businesses that are brand new to database development need a primer on database design and administration. So, with that in mind, it's time to get back to data basics.
Data Modeling and Database Design
Novice database developers frequently begin with the quick-and-dirty approach to database implementation. They approach database design from a programming perspective. That is, novices do not have experience with databases and data requirements gathering, so they attempt to design databases like the flat files they are accustomed to using. This is a major mistake, as anyone using this approach quickly finds out once the databases and application moves to production.
At a minimum, performance will suffer and data may not be as readily available as required. At worst, data integrity problems may arise rendering the entire application unusable. A relational database design can not be thrown together quickly by novices.
What is required is a practiced and formal approach to gathering data requirements and modeling that data. This modeling effort requires that the naming entities and data elements follow an established and standard naming convention. Failure to apply standard names will result in the creation of databases that are difficult to use because no one knows its actual contents.
Data modeling also requires the collection of data types and lengths, domains (valid values), relationships, anticipated cardinality (number of instances), and constraints (mandatory, optional, unique, etc.). Once collected and the business usage of the data is known, a process called normalization is applied to the data model.
Normalization is an iterative process that I will not cover in detail here. Suffice it to say, the a normalized data model reduces data redundancy and inconsistencies by ensuring that the data elements are designed appropriately. A series of normalization rules are applied to the entities and data elements, each of which is called a "normal form." If the data conforms to the first rule, the data model is said to be in "first normal form," and so on.
A database design in First Normal Form (1NF) will have no repeating groups and each instance of an entity can be identified by a primary key. For Second Normal Form (2NF), instances of an entity must not depend on anything other than the primary key for that entity. Third Normal Form (3NF) removes data elements that do not depend on the primary key. If the contents of a group of data elements can apply to more than a single entity instance, those data elements belong in a separate entity.
There are further levels of normalization that I will not discuss in this column to keep the discussion moving along. For an introductory discussion of normalization visit http://wdvl.internet.com/Authoring/DB/Normalization.
Physical Database Design
But you can not stop after developing a logical data model in 3NF. The logical model must be adapted to a physical database implementation. Contrary to popular belief this is not a simple transformation of entities to tables. Many other physical design factors must be planned and implemented. These factors include:
- A relational table is not the same as a file or a data set. The DBA must design and create the physical storage structures to be used by the relational databases to be implemented.
- The order of columns may need to be different than that specified by the data model based on the functionality of the RDBMS being used. Column order and access may have an impact on database logging, locking, and organization. The DBA must understand these issues and transform the logical model appropriately.
- The logical data model needs to be analyzed to determine which relationships need to be physically implemented using referential integrity (RI). Not all relationships should be defined using RI due to processing and performance reasons.
- Indexes must be designed to ensure optimal performance. To create the proper indexes the DBA must examine the database design in conjunction with the proposed SQL to ensure that database queries are supported with the proper indexes.
- Database security and authorization must be defined for the new database objects and its users.
These are not simple tasks that can be performed by individuals without database design and implementation skills. That is to say, DBAs are required.
The DBA Management Discipline
Database administration must be approached as a management discipline. The term discipline implies planning and implementation, according to that plan. When database administration is treated as a management discipline, the treatment of data within your organization will improve. It is the difference between being reactive and proactive.
All too frequently the DBA group is overwhelmed by requests and problems. This occurs for many reasons, including understaffing, overcommitment to application development projects, lack of repeatable processes, lack of budget and so on.
When operating in this manner, the database administrator is being reactive. The reactive DBA functions more like a firefighter. His attention is focused on resolving the biggest problem being brought to his attention. A proactive DBA can avoid many problems altogether by developing and implementing a strategic blueprint to follow when deploying databases within their organization.
The 17 Skills Required of a DBA
Implementing a DBA function in your organization requires careful thought and planning. The previous sections of this article are just a beginning. The successful eDBA will need to acquire and hone expertise in the following areas:
- Data modeling and database design. The DBA must possess the ability to create an efficient physical database design from a logical data model and application specifications. The physical database may not conform to the logical model 100 percent due to physical DBMS features, implementation factors, or performance requirements. If the data resource management discipline has not been created, the DBA also must be responsible for creating data modeling, normalization, and conceptual and logical design.
- Metadata management and repository usage. The DBA is required to understand the technical data requirements of the organization. But this is not a complete description of his duties. Metadata, or data about the data, also must be maintained. The DBA, or sometimes the Data Administrator (DA), must collect, store, manage, and enable the ability to query the organization's metadata. Without metadata, the data stored in databases lacks true meaning.
- Database schema creation and management. A DBA must be able to translate a data model or logical database design into an actual physical database implementation and to manage that database once it has been implemented.
- Procedural skills. Modern databases manage more than merely data. The DBA must possess procedural skills to help design, debug, implement, and maintain stored procedures, triggers, and user-defined functions that are stored in the DBMS. For more on this topic check out www.craigsmullins.com/db2procd.htm.
- Capacity planning. Because data consumption and usage continues to grow, the DBA must be prepared to support more data, more users, and more connection. The ability to predict growth based on application and data usage patterns and to implement the necessary database changes to accommodate the growth is a core capability of the DBA.
- Performance management and tuning. Dealing with performance problems is usually the biggest post-implementation nightmare faced by DBAs. As such, the DBA must be able to proactively monitor the database environment and to make changes to data structures, SQL, application logic or the DBMS subsystem to optimize performance.
- Ensuring availability. Applications and data are more and more required to be up and available 24 hours a day, seven days a week. The DBA must be able to ensure data availability using non-disruptive administration tactics.
- SQL code reviews and walk-throughs. Although application programmer usually write SQL, DBAs are usually blamed for poor performance. Therefore, DBAs must possess in-depth SQL knowledge so they can understand and review SQL and host language programs and to recommend changes for optimization.
- Backup and recovery. Everyone owns insurance of some type because we want to be prepared in case something bad happens. Implementing robust backup and recovery procedures is the insurance policy of the DBA. The DBA must implement an appropriate database backup and recovery strategy based on data volatility and application availability requirements.
- Ensuring data integrity. DBAs must be able to design databases so that only accurate and appropriate data is entered and maintained. To do so, the DBA can deploy multiple types of database integrity including entity integrity, referential integrity, check constraints, and database triggers. Furthermore, the DBA must ensure the structural integrity of the database.
- General database management. The DBA is the central source of database knowledge in the organization. As such he must understand the basic tenets of relational database technology and be able to accurately communicate them to others.
- Data security. The DBA is charged with the responsibility to ensure that only authorized users have access to data. This requires the implementation of a rigorous security infrastructure for production and test databases.
- General systems management and networking skills. Because once databases are implemented they are accessed throughout the organization and interact with other technologies, the DBA must be a jack of all trades. Doing so requires the ability to integrate database administration requirements and tasks with general systems management requirements and tasks (like job scheduling, network management, transaction processing, and so on).
- ERP and business knowledge. For e-businesses doing Enterprise Resource Planning (ERP) the DBA must understand the requirements of the application users and be able to administer their databases to avoid interruption of business. This sounds easy, but most ERP applications (SAP, Peoplesoft, etc.) use databases differently than homegrown applications. So DBAs require an understanding of how the ERP packaged applications impact the e-business and how the databases used by those packages differ from traditional relational databases. Some typical differences include application-enforced RI, program locking, and the creation of database objects (tables, indexes, etc.) on-the-fly. These differences require different DBA techniques to manage the ERP package effectively.
- Extensible data type administration. The functionality of modern DBMSes can be extended using user-defined data types. The DBA must understand how these extended data types are implemented by the DBMS vendor and be able to implement and administer any extended data types implemented in their databases.
- Web-specific technology expertise. For e-businesses, DBAs are required to have knowledge of Internet and Web technologies to enable databases to participate in Web-based applications. Examples of this type of technology include HTTP, FTP, XML, CGI, Java, TCP/IP, Web servers, firewalls and SSL. Other DBMS-specific technologies include IBM's Net.Data for DB2 and Oracle Portal (formerly WebDB).
- Storage management techniques. The data stored in every database resides on disk somewhere (unless it is stored on one of the new Main Memory DBMS products). The DBA must understand the storage hardware and software available for use, and how it interacts with the DBMS being used. Storage technologies include raw devices, RAID, SANs and NAS.
Meeting the Demand
The number of mission-critical Web-based applications that rely on back-end databases is increasing. Established and emerging e-businesses achieve enormous benefits from the Web/database combination, such as rapid application development, cross-platform deployment and robust, scalable access to data. E-business usage of database technology will continue to grow, and so will the demand for the eDBA. Make sure your organization is prepared to manage its Web-enabled databases before moving them to production. Or be prepared to encounter plenty of problems.
Craig Mullins is an independent consultant and president of Mullins Consulting, Inc. Craig has extensive experience in the field of database management having worked as an application developer, a DBA, and an instructor with multiple database management systems including DB2, Sybase, and SQL Server. Craig is also the author of the DB2 Developer’s Guide, the industry-leading book on DB2 for z/OS, and Database Administration: Practices and Procedures, the industry’s only book on heterogeneous DBA procedures. You can contact Craig via his web site at http://www.craigsmullins.com.
Contributors : Craig S. Mullins
Last modified 2006-01-16 07:08 AM