An Oracle Instructor's Guide to Oracle9i - External Tables

by Christopher Foot

This is the second article in a three-part series on Oracle's latest release, Oracle9i. The first article offered information on persistent initialization parameter files, remote startup/shutdown, database managed undo segments, resumable space allocation and flashback query.

In this installment, we'll discuss external tables, tablespace changes, Oracle managed files, multiple blocksizes and cache configuration, on-line table reorganization and index monitoring.

The last article in this series will cover RAC (Real Application Clusters), fail safe, data guard, fine-grained resource management, fine-grained auditing and label security.

External Tables

Seasoned data warehouse administrators know that getting data out of the data warehouse is not the only challenging issue they must address. Extracting, transforming and loading data into the warehouse can also be a formidable (and quite expensive) task.

Before we begin our discussion, we need to understand that the data warehouse always contains data from external sources. The data is extracted from the source systems, transformed from operational data to business data using business rules, and ultimately loaded into the data warehouse tables. This process of extracting data from source systems and populating the data warehouse is called Extraction, Transformation and Loading, or ETL. Shops deploying data warehouses have the option of purchasing third-party ETL tools or writing scripts and programs to perform the transformation process manually.

Before Oracle9i, the most common methods of manually performing complex transformations were:

      • The extracted data would be loaded into staging tables in the data warehouse. The staged data would be transformed in the database and then used as input to programs that updated the permanent data warehouse tables.
      • The data would be transformed in flat files stored outside of the database. When the transformation process was complete, the data would be loaded into the data warehouse.

Oracle9i introduces external tables, which provide a mechanism to view data stored in external sources as if it were a table in the database. This ability to read external data provides a more straightforward method of loading and transforming data from external sources. Administrators no longer need to reserve space inside the database for staging tables or write external programs to transform the data outside of the database environment. By making it no longer necessary to stage data in the Oracle database, Oracle9i's external tables have essentially streamlined the ETL function by merging the transformation and loading processes.

External tables in Oracle are read only and cannot have indexes built upon them. Their main use is as a data source for more traditional Oracle table structures. Data warehouse administrators are able to use CREATE TABLE ... AS SELECT and INSERT INTO ... SELECT statements to populate Oracle tables using the external source as input.

Much of the data validation and cleansing that occurs during the ETL process requires access to existing data stored in the data warehouse. Since the external table data is viewed by the database as ordinary table data, SQL, PL/SQL and Java can be used to perform the data transformations. Joins, sorts, referential integrity verification, ID lookups and advanced string manipulations can be performed in the database environment. In addition, advanced SQL statements such as MERGE (the "upsert") and multi-table INSERT allow data to be easily integrated into the warehouse environment. The power of the database can be fully utilized to facilitate the transformation process.
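
The sketch below shows the kind of transformation this makes possible. It is only an illustration: the warehouse table emp_dim and the external staging table emp_stage are hypothetical names, and the column list is invented for the example.

    -- Merge (upsert) rows read directly from an external staging table
    -- into a permanent warehouse table: matching employees are updated,
    -- new employees are inserted.
    MERGE INTO emp_dim d
    USING (SELECT empno, ename, sal FROM emp_stage) s
    ON (d.empno = s.empno)
    WHEN MATCHED THEN
      UPDATE SET d.ename = s.ename,
                 d.sal   = s.sal
    WHEN NOT MATCHED THEN
      INSERT (d.empno, d.ename, d.sal)
      VALUES (s.empno, s.ename, s.sal);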

External table definitions do not describe how the data is stored externally, rather they describe how the external data is to be presented to the Oracle database engine. Let's take a quick look at an external table definition:

CREATE TABLE empxt
(empno    NUMBER(4),
 ename    VARCHAR2(10),
 job      VARCHAR2(9),
 mgr      NUMBER(4),
 hiredate VARCHAR2(20),
 sal      NUMBER(7,2),
 comm     NUMBER(7,2),
 deptno   NUMBER(2))
ORGANIZATION EXTERNAL
(TYPE ORACLE_LOADER
 DEFAULT DIRECTORY dat_dir
 ACCESS PARAMETERS
 (records delimited by newline
  badfile bad_dir:'empxt%a_%p.bad'
  logfile log_dir:'empxt%a_%p.log'
  fields terminated by ','
  missing field values are null
  (empno, ename, job, mgr,
   hiredate, sal, comm, deptno))
 LOCATION ('empxt1.dat', 'empxt2.dat'))
REJECT LIMIT UNLIMITED;

Most of the above table's definition should be familiar to us. However, a few parameters warrant further investigation:

      • ORGANIZATION EXTERNAL - Designates that the table's data resides in an external location.
      • TYPE - Indicates the access driver. The access driver is the API that interprets the external data for the database. If you do not specify TYPE in the table's definition, Oracle uses the default access driver, ORACLE_LOADER.
      • DEFAULT DIRECTORY - Specifies the default directory object, which corresponds to a directory on the file system where the external data resides. The default directory can contain both source data and output files (logs, bad files, discard files, etc.). Directory objects that refer to directories on the file system must already have been created with the CREATE DIRECTORY SQL statement (see the sketch after this list). In addition, READ access must be granted to directory objects containing the source data, and WRITE access must be granted to all directories that are to contain output files (BAD_DIR, LOG_DIR). Users wanting access to external table data must be granted the appropriate privileges on the directory objects as well as the table.
      • ACCESS PARAMETERS - Assigns values to access driver parameters.
      • BADFILE, LOGFILE - Oracle load utility output files.
      • LOCATION - Specifies the location for each external data source. The Oracle server does not interpret this clause. The access driver specified interprets this information in the context of the external data.
      • PARALLEL (not specified) - Enables parallel query processing on the external data source.
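
Before the empxt table above can be created, the directory objects it references must exist and the appropriate privileges must be granted. The following is a minimal sketch of that setup; the operating system paths are hypothetical, and the final INSERT assumes a conventional emp table with compatible columns already exists in the warehouse:

    -- Create the directory objects referenced by the external table
    -- (the file system paths are hypothetical and must already exist).
    CREATE DIRECTORY dat_dir AS '/u01/etl/data';
    CREATE DIRECTORY bad_dir AS '/u01/etl/bad';
    CREATE DIRECTORY log_dir AS '/u01/etl/log';

    -- READ access on the source directory, WRITE where output files go.
    GRANT READ ON DIRECTORY dat_dir TO scott;
    GRANT WRITE ON DIRECTORY bad_dir TO scott;
    GRANT WRITE ON DIRECTORY log_dir TO scott;

    -- Load and transform in one pass: the external hiredate string is
    -- converted to a DATE as the rows are read.
    INSERT INTO emp
      SELECT empno, ename, job, mgr,
             TO_DATE(hiredate, 'DD-MON-YYYY'),
             sal, comm, deptno
      FROM empxt;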

Oracle9i external tables provide great benefits to warehouse environments by combining the transformation and external data access processes. Oracle calls the process "pipelining" and describes it as "a whole new model for loading and transforming external data."

There is a wealth of information available on Oracle9i external tables. Rather than provide an in-depth description of how to implement and administer them here, I'll point you to Dave Moore's excellent DBAzine.com article titled "External Tables in Oracle9i." His suggestion to use the external table feature to search the database alert log with SQL statements is a great idea!

Tablespace Changes

Oracle9i provides the database administrator with a variety (read: a bewildering array) of new tablespace parameters and block sizes. Administrators are now able to create Oracle managed, user managed, locally managed and dictionary managed tablespaces; specify AUTOALLOCATE, UNIFORM, PERMANENT or UNDO; and select block sizes of 2K, 4K, 8K, 16K or 32K.

The tablespace definition below combines a few of the aforementioned options:

    CREATE TABLESPACE oracle_local_auto DATAFILE SIZE 5M BLOCKSIZE 2K;

Many of the parameters were intentionally left unspecified to highlight some of the default specifications for Oracle9i tablespaces. Although some of the parameters we will review were introduced in earlier releases, it is important to discuss them to obtain a clear understanding of tablespace administration in Oracle9i. Let's continue our discussion by taking a closer look at the tablespace's definition:

      • Because a datafile specification was not provided, the tablespace is Oracle managed. The datafile clause is only optional if the DB_CREATE_FILE_DEST initialization parameter is set. The parameter specifies an operating system directory that is the default storage location for Oracle managed datafiles. The operating system directory must already exist and must have the proper security permissions to allow Oracle to create files in it. If a datafile specification and SIZE parameter are not specified, a 100 megabyte file is created by default. During tablespace creation, the database server selects a file name for the Oracle managed file and creates the file in the directory specified in the DB_CREATE_FILE_DEST initialization parameter. When the tablespace is dropped, Oracle automatically removes the Oracle managed files associated with the dropped tablespace. By default, an Oracle managed datafile is autoextensible with an unlimited maximum size.
      • The tablespace will be locally managed because we did not specify EXTENT MANAGEMENT DICTIONARY during creation. Oracle has changed the default from dictionary managed in Oracle8i to locally managed in Oracle9i. Locally managed tablespaces track all extent information in the tablespace itself, using bitmaps. Tracking extents in bitmaps improves the speed and concurrency of space operations.

        Administrators are able to override local extent management by specifying EXTENT MANAGEMENT DICTIONARY in the tablespace definition. Dictionary managed tablespaces rely on data dictionary tables to track space utilization within the tablespace. The SYSTEM tablespace is always dictionary managed.
      • The tablespace will use the default free space management setting of SEGMENT SPACE MANAGEMENT MANUAL. As a result, Oracle will use freelists to manage free space within segments in the tablespace. Free lists are lists of data blocks that have space available for inserting rows.

        Administrators have the option of overriding the default specification of SEGMENT SPACE MANAGEMENT MANUAL with SEGMENT SPACE MANAGEMENT AUTO. SEGMENT SPACE MANAGEMENT AUTO tells Oracle to use bitmaps to manage free space within a segment. The bitmap structure stores information that describes the amount of space in the blocks that are available for row inserts. As free space within each block grows and shrinks, its new state is reflected in the bitmap. Bitmaps allow Oracle to manage free space more automatically. As a result, tracking free space within segments using bitmaps provides a simpler and more efficient method of free space management. Only permanent, locally managed tablespaces can specify automatic segment space management.
      • The extent management will be AUTOALLOCATE (extent sizes defined and managed by Oracle) because a default storage clause was not specified. If the default storage clause is not specified, or if it is specified with PCTINCREASE not equal to 0 and/or INITIAL not equal to NEXT, then Oracle creates a locally managed tablespace with extents managed automatically (AUTOALLOCATE).

        Administrators are also able to specify that the tablespace is managed with uniform extents of a specific size by specifying UNIFORM SIZE in the tablespace's definition or by specifying INITIAL = NEXT and PCTINCREASE = 0. This specification tells Oracle to create a uniform locally managed tablespace with uniform extent size = INITIAL.
      • The tablespace datafile will have a 2K blocksize. Oracle9i allows administrators to specify a nonstandard block size for tablespaces. In order for the tablespace specification to override the standard database blocksize specified during database creation, the DB_CACHE_SIZE and DB_nK_CACHE_SIZE (where nK matches the tablespace block size) parameters must be set in the initialization parameter file; a sketch follows this list. Oracle9i allows administrators to choose from 2K, 4K, 8K, 16K and 32K blocksizes.
      • Finally, it will be autoextensible (the file will be able to automatically grow in size) because autoextensible is the default for an Oracle managed file.
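
The following sketch pulls these pieces together: it points DB_CREATE_FILE_DEST at a directory for Oracle managed files, configures a buffer cache for the nonstandard 2K block size, and then creates two tablespace variants discussed above. The directory path and the 16M cache size are assumptions for illustration, and the ALTER SYSTEM commands presume a server parameter file and available SGA headroom:

    -- Default location for Oracle managed datafiles (hypothetical path).
    ALTER SYSTEM SET db_create_file_dest = '/u02/oradata/orcl';

    -- A buffer cache must exist for the nonstandard 2K block size.
    ALTER SYSTEM SET db_2k_cache_size = 16M;

    -- Locally managed with uniform 128K extents.
    CREATE TABLESPACE ts_uniform DATAFILE SIZE 5M BLOCKSIZE 2K
      EXTENT MANAGEMENT LOCAL UNIFORM SIZE 128K;

    -- Locally managed with automatic (bitmap) segment space management.
    CREATE TABLESPACE ts_assm DATAFILE SIZE 5M BLOCKSIZE 2K
      EXTENT MANAGEMENT LOCAL SEGMENT SPACE MANAGEMENT AUTO;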

Let's take a look at our tablespace definition again. This time we will provide all of the specifications for some of the features we have just discussed:

    CREATE TABLESPACE oracle_local_auto
    DATAFILE SIZE 5M AUTOEXTEND ON
    BLOCKSIZE 2K
    EXTENT MANAGEMENT LOCAL
    SEGMENT SPACE MANAGEMENT MANUAL;

Oracle9i's new tablespace definitions allow administrators to tailor their environments to meet application requirements. We'll end our discussion on Oracle9i tablespaces with a few quick recommendations:

      • Oracle managed - In addition to the benefits of not needing to create filenames and define specific storage requirements, managed files provide the additional advantage of being deleted from the operating system when the DROP TABLESPACE statement is executed. But these benefits do not outweigh the disadvantage of losing the flexibility of specifying different mountpoints or drives manually. Most administrators will prefer to have the flexibility of placing files on different drives or mountpoints and to not be forced into using one directory specification (whether that directory is striped or not).
      • Multiple block sizes - Multiple blocksize specifications allow administrators to tailor physical storage specifications to a data object's size and usage to maximize I/O performance. In addition, it also allows administrators to easily use the transportable tablespace feature to transfer tablespaces between databases having different default blocksizes (i.e. moving data from an OLTP application to a data warehouse).
      • Locally managed - Oracle highly recommends locally managed tablespaces for all tablespaces except the SYSTEM tablespace. Because extent management is tracked internally, tablespaces no longer need to be coalesced. In addition, allocating or releasing space in a locally managed tablespace avoids recursive space management operations (updates to data dictionary tables that track space utilization). Oracle also states that data objects with high numbers of extents have less of a performance impact on locally managed tablespaces than they do on their dictionary managed counterparts.

Online Table Reorganizations

Oracle9i allows DBAs to perform complex table redefinitions on-line. Administrators now have the capability to change column names and datatypes, manipulate data, add and drop columns and partition tables while the table is being accessed by on-line transactions (for a complete list of changes, please refer to the Oracle9i Administration Guide). This new feature provides significant benefits over more traditional methods of altering tables, which require the object to be taken off-line during the redefinition process. Oracle9i provides a set of procedures stored in the PL/SQL package DBMS_REDEFINITION as the mechanism to perform on-line redefinitions.

Most tables in Oracle can be redefined. The Oracle9i Administration Guide provides a listing of table specifications that will prohibit a table from being redefined on-line. For example, one requirement is that the table being redefined must have a primary key. Oracle9i provides a procedure that will check the table to determine if it can be redefined. The example below shows the table SCOTT.SOURCE_EMP being checked to determine if it meets the on-line redefinition criteria:

     EXEC dbms_redefinition.can_redef_table ('SCOTT', 'SOURCE_EMP');

Administrators create an empty work table in the same schema as the table to be redefined. This work table is created with all of the desired attributes and will become the new table when the redefinition is executed. The two table definitions below show our source table (SCOTT.SOURCE_EMP) and the table containing our desired attributes (SCOTT.WORK_EMP):

     CREATE TABLE scott.source_emp
(empno NUMBER(4) PRIMARY KEY,
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4),
hiredate DATE,
sal NUMBER(7, 2),
comm NUMBER(7, 2),
deptno NUMBER(2));

CREATE TABLE scott.work_emp
(enum NUMBER PRIMARY KEY,
lname VARCHAR2(20),
new_col TIMESTAMP,
salary NUMBER);

After the redefinition process is complete, SCOTT.WORK_EMP will become the new SCOTT.SOURCE_EMP table and SCOTT.SOURCE_EMP will become SCOTT.WORK_EMP. The tables are in effect "swapped" during the final phase of transformation.

The next step is to transfer the data from the SCOTT.SOURCE_EMP table to SCOTT.WORK_EMP using the DBMS_REDEFINITION.START_REDEF_TABLE procedure. The step also links the two tables together for the remainder of the redefinition process. Administrators code column mappings and data modifications during this step to transform the data. The statement below shows the SCOTT.SOURCE_EMP data being manipulated as it is being transferred to the SCOTT.WORK_EMP table:

    EXEC dbms_redefinition.start_redef_table
    ('SCOTT', 'SOURCE_EMP', 'WORK_EMP', 'EMPNO ENUM, ENAME LNAME, SAL*3 SALARY');

The above redefinition statement multiplies the source SAL column by three to populate SALARY, and renames columns EMPNO to ENUM and ENAME to LNAME. The work table also has a new column (NEW_COL) and does not carry column definitions for JOB, MGR, HIREDATE, COMM or DEPTNO.

Triggers, indexes, constraints and grants can now be created on the work table. Referential constraints must be created using the DISABLE option. All triggers, indexes, constraints and grants replace those on the source table being redefined.
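
A minimal sketch of this step, continuing the example (the index name, constraint name, parent table and grantee are all hypothetical):

    -- Supporting objects created on the work table will replace those
    -- on the source table when the redefinition completes.
    CREATE INDEX scott.work_emp_lname_idx ON scott.work_emp (lname);

    -- Referential constraints on the work table must be created DISABLED.
    ALTER TABLE scott.work_emp
      ADD CONSTRAINT work_emp_fk FOREIGN KEY (enum)
      REFERENCES scott.parent_tab (enum) DISABLE;

    GRANT SELECT ON scott.work_emp TO report_user;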

The final step of the redefinition process is to execute DBMS_REDEFINITION.FINISH_REDEF_TABLE, which performs the following functions:

      • The work table becomes the new source table. The new source table's definition includes all grants, indexes, constraints and triggers created on the work table during the transformation process.

      • All referential integrity constraints created on the work table are enabled.
      • The source table becomes the new work table. All grants, indexes, constraints and triggers that were on the old source table are also transferred. Referential integrity constraints on the new work table are disabled.
      • All DML statements applied to the old source table during the redefinition process are transferred to the work (new source) table.
      • The tables are locked for the length of time it takes to perform the table name "swap."
      • PL/SQL procedures that access the table being redefined are invalidated. They may remain invalidated if the redefinition process has changed the table structure in such a way that they can no longer successfully access the table data.
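
Continuing the example, the final call itself is a single statement:

    EXEC dbms_redefinition.finish_redef_table ('SCOTT', 'SOURCE_EMP', 'WORK_EMP');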

During the time period between the executions of START_REDEF_TABLE and FINISH_REDEF_TABLE, Oracle9i saves all DML changes being applied to the source table. These recorded changes are applied to the work table during the final step of the transformation process. The number of stored changes that need to be applied has a direct effect on the length of time it takes FINISH_REDEF_TABLE to execute. A large number of changes applied to the source table during the redefinition process may cause the FINISH_REDEF_TABLE step to become quite "lengthy." Administrators are able to execute the DBMS_REDEFINITION.SYNC_INTERIM_TABLE procedure to periodically synchronize the source and work tables between START_REDEF_TABLE and FINISH_REDEF_TABLE. Periodically synchronizing the tables reduces the number of stored changes that remain to be applied and the amount of time it takes FINISH_REDEF_TABLE to execute.
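
A synchronization call issued between the start and finish steps looks like this:

    EXEC dbms_redefinition.sync_interim_table ('SCOTT', 'SOURCE_EMP', 'WORK_EMP');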

Oracle9i supplies DBMS_REDEFINITION.ABORT_REDEF_TABLE, which can be used to cancel the redefinition process. Administrators are able to abort the process at any time between the executions of START_REDEF_TABLE and FINISH_REDEF_TABLE.
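
For example:

    EXEC dbms_redefinition.abort_redef_table ('SCOTT', 'SOURCE_EMP', 'WORK_EMP');

Aborting removes the association between the source and work tables; the work table itself is left in place and must be dropped manually if it is no longer needed.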

Index Monitoring

Determining if an index will increase performance is a pretty straightforward process. The administrator is focusing the tuning effort on a particular table or query and is able to gather the specific information necessary to support the decision.

Dropping unused indexes is also an important part of application tuning. Indexes force Oracle to incur additional I/O every time a row is inserted into or deleted from the table they are built upon. Every update of the table's columns incurs additional I/O to all indexes defined on those columns. Unused indexes also waste space and add unnecessary administrative complexity.

Determining whether indexes were being used in releases prior to Oracle9i was a time-consuming and error-prone process. EXPLAIN PLAN and trace output could be used, but there was no single mechanism that monitored index usage at the database level.

Oracle9i simplifies the index usage monitoring process by providing the ALTER INDEX ... MONITORING USAGE command. To successfully start or stop a monitoring session, the user must be logged on as the schema owner of the index. The first statement below turns monitoring on for the index SCOTT.EMPIDX, while the second ends the monitoring session:

    ALTER INDEX scott.empidx MONITORING USAGE;
    ALTER INDEX scott.empidx NOMONITORING USAGE;

The V$OBJECT_USAGE view can then be queried to determine whether the index was used during the monitoring session. When the session is started, Oracle clears the information in V$OBJECT_USAGE for the index being monitored and records the time the monitoring session started. After the monitoring session is concluded, the USED column in V$OBJECT_USAGE will contain the value 'YES' if the index was used during the session and 'NO' if it was not.
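
A quick query (run as the index owner) shows the outcome of a monitoring session; the WHERE clause assumes the EMPIDX index from the example above:

    SELECT index_name, table_name, monitoring, used,
           start_monitoring, end_monitoring
    FROM v$object_usage
    WHERE index_name = 'EMPIDX';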

In the last installment of this series, we'll discuss RAC (Real Application Clusters), fail safe, data guard, fine-grained resource management, fine-grained auditing and label security.

Thanks and see you in class!

--

Christopher Foot has been involved in database management for over 18 years, serving as a database administrator, database architect, trainer, speaker, and writer. Currently, Chris is employed as a Senior Database Architect at RemoteDBA Experts, a remote database services provider. Chris is the author of over forty articles for a variety of magazines and is a frequent lecturer on the database circuit, having given over a dozen speeches to local, national and international Oracle user groups. His book, OCP Instructors Guide for DBA Certification, can be found at http://www.dba-oracle.com/bp/bp_book14_OCP.htm.

