Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » Datawarehouse » Data Warehouse Articles Archive » An Introduction to Oracle Warehouse Builder 10g
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3565
 

An Introduction to Oracle Warehouse Builder 10g

by Mark Rittman

Whilst many Oracle DBAs build and maintain their data warehouses using nothing more than UNIX, SQL, and PL/SQL scripts, many are now starting to use graphical tools such as Oracle Warehouse Builder 10g (OWB) to help with development. What do tools such as OWB10g offer the DBA, and how can they help with the warehouse design and build?

Introduction

The Oracle database has many features that make it well suited to data warehousing, including support for very large databases, automated summary management, and an embedded multidimensional OLAP engine. Recent versions of Oracle have come with built-in extraction, transformation, and load (ETL) features, and it is possible to build an Oracle data warehouse using just these features, and SQL*Plus.

DBAs will, however, probably be aware that a number of vendors, including Oracle, offer tools and applications that are designed to assist with the design, build, and load of data warehouses — a type of application known as an ETL tool, with Oracle Oracle’s own offering being Oracle Warehouse Builder 10g. So, what does Oracle Warehouse Builder offer the Oracle DBA, and how can it help with the warehouse project lifecycle?

Product Architecture

Oracle Warehouse Builder is a GUI tool used for building Oracle data warehouses and data marts. Written in Java and currently now at version 10, versions are generally provided for Microsoft Windows, Intel Linux, and Solaris/HP-UX/AIX/Tru-64. OWB is licensed as part of the 10g Developer Suite, though as versions are generally released more frequently than  updates to 10gDS, it’s best to check on the OTN download site to obtain the latest version.

OWB10g consists of two main components: the design environment, which is concerned with working with metadata (abstract representations of warehouse objects, together with business rules), and the runtime environment, which takes this metadata and turns it into physical database objects and process flows.

The design environment consists of the OWB client and a number of utility programs used for setting up the runtime environment, reporting on the repository, and providing a command-line interface for OWB.

All of the client tools work against metadata in the OWB repository, which is usually installed in its own schema on a development server. OWB comes with a Web-based reporting environment, which uses a desktop installation of OC4J to provide reports against the metadata in the repository.

Once the DBA has built the logical model in the repository, he needs to translate it into a physical design using the OWB Runtime component. This runtime component creates physical database objects, together with scripts and packages to extract data from source system, and populate the warehouse objects. OWB10g can extract data from any database that has either an ODBC driver, or an Oracle Gateway, together with flat files and data held in SAP. In addition, customers who license the business intelligence functionality for Oracle Applications are provided with a customized version of OWB, for extraction of applications data using business terms.

Graphical Object Definition

OWB10g allows you to create source modules that import data from flat files, relational databases, or SAP. When deployed, these import routines consist of SQL*Loader scripts, external table definitions, database links through to Oracle databases and databases accessed through Generic Connectivity or Oracle Gateways, and ABAP programs for extracting data out of SAP. Import definitions are defined using the OWB client graphical user interface, saved to the metadata repository, and generated using the OWB runtime component.

These OWB source modules are then used later on in the process as the start point for data transformations, which eventually go on to load the warehouse itself. As such, although a graphical user interface is used to define these sources, an experienced DBA or developer familiar with SQL*Loader syntax or Oracle DDL could probably hand-code the equivalent scripts in a shorter amount of time, as could an ABAP programmer working in a SAP environment. However, by creating source definitions within the OWB client, the definitions of these sources are captured within the central repository, providing a form of documentation as well as a way of automatically reconciling these definitions with possible changes in the underlying source data. Likewise, OWB provides a graphical interface for building warehouse objects, either as regular Oracle tables or with the additional ROLAP metadata available with the Enterprise Edition of Oracle 8i, 9i, and 10g

Source-to-Target Data Mappings

Although a GUI interface for defining warehouse objects is welcome, the big productivity gain is through the mapping and transformation capability within OWB10g. OWB allows the DBA to define a mapping canvas that has one or more data sources and targets, with data being processed through SQL transformations or PL/SQL functions.

The palette on the right-hand side of the mapping allows you to drop tables, views, external tables, flat files, and materialized views onto the mapping canvas; draw data mappings between the objects; and apply a range of SQL and PL/SQL transformations to the data, including the ability to use pipelined table functions. As part of the OWB10g package, a number of additional PL/SQL transformations are provided, including a name-and-address cleansing feature that matches addresses in the warehouse with third-party name and address files, a “match-merge” feature, PL/SQL transformations that utilize the XML Toolkit, and ones used for loading data into analytic workspaces

How Efficient is the Generated Code?

ETL tools such as OWB10g usually work by generating program code that is then executed to deploy and load the warehouse. A common DBA concern about such an approach is that the generated code may not as efficient, or optimized, as code that the DBA writes himself. So what does OWB10g-generated code look like?

Firstly, it’s worth understanding that as Oracle has chosen to only support the enterprise editions of Oracle 8i, 9i, and 10g as deployment platforms, the code that is generated is Oracle SQL and PL/SQL, and leverages the data warehousing features of the Oracle database. For example, when deploying table definitions, all options such as parallel access mode, partitioning, logging mode, and so on, are available to use, and the default way to build warehouse objects is to use the ROLAP functionality (dimensions, facts, hierarchies, and so on) that comes with the enterprise edition. When deploying data mappings, OWB first tries to generate the mapping as a set-based SQL update, only falling back to a cursor-based PL/SQL update if a single SQL statement cannot be used. For a regular SQL update, the only code that OWB adds to the base SQL statement is a PL/SQL wrapper that runs the code within a package and logs the mapping results to a set of audit tables.

To take an example, a simple mapping that inserts or updates a dimension table from an external table would be built using the following OWB data mapping:

If we were writing this manually and deploying to an Oracle9i or 10g database, we would use the MERGE command to merge into the underlying CHANNEL table by selecting from the CHANNELS_EXT external table, and, if possible, use parallelism and direct path inserts. Looking at the code generated by OWB, this is indeed the approach that the tool takes:

CREATE OR REPLACE PACKAGE "MAP_CHANNEL" AS
sql_stmt  VARCHAR2(32767);
get_abort BOOLEAN := FALSE;
get_trigger_success BOOLEAN := TRUE;
get_errors NUMBER(22) := 0;
get_status NUMBER(22) := 0;
-- Status variable for Batch cursors
"CHANNEL_St" BOOLEAN;

-- Function Main -- Entry point in package "MAP_CHANNEL"
FUNCTION Main RETURN NUMBER; 

END "MAP_CHANNEL";

/

CREATE OR REPLACE PACKAGE BODY "MAP_CHANNEL" AS

-----------------------------------------------------------------------
-- Function "CHANNEL_Bat"
-- performs batch extraction
-- Returns TRUE on success
-- Returns FALSE on failure
------------------------------------------------------------------------
FUNCTION "CHANNEL_Bat" RETURN BOOLEAN IS

BEGIN  
  EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';      

  BEGIN    
    MERGE    
    /*+ APPEND PARALLEL(CHANNEL, DEFAULT, DEFAULT)  */    
    INTO      
      "CHANNEL"    
    USING      
      (SELECT        
      "CHANNELS_EXT"."CHANNEL_CLASS" "CHANNEL_CLASS",        
      "CHANNELS_EXT"."CHANNEL_DESC" "CHANNEL_DESC",        
      "CHANNELS_EXT"."CHANNEL_ID" "CHANNEL_ID"      
      FROM  "CHANNELS_EXT" "CHANNELS_EXT"        
      )  "MERGEQUERY_186"    
    ON (        
      "CHANNEL"."CNL_ID" = "MERGEQUERY_186"."CHANNEL_ID" )    
    WHEN NOT MATCHED THEN      
      INSERT        
        ("CHANNEL"."CLS_CLASS_NAME",        
        "CHANNEL"."CNL_NAME",        
        "CHANNEL"."CNL_ID")      
      VALUES        
        ("MERGEQUERY_186"."CHANNEL_CLASS",        
        "MERGEQUERY_186"."CHANNEL_DESC",        
        "MERGEQUERY_186"."CHANNEL_ID")    
    WHEN MATCHED THEN      
      UPDATE      
      SET        
        "CLS_CLASS_NAME" = "MERGEQUERY_186"."CHANNEL_CLASS",    
    "CNL_NAME" = "MERGEQUERY_186"."CHANNEL_DESC";    
    COMMIT;  
  EXCEPTION WHEN OTHERS THEN    
    ROLLBACK;    
    COMMIT;    
    RETURN FALSE;  
    END;  
    COMMIT;  
    RETURN TRUE;
  END "CHANNEL_Bat";


  FUNCTION Main RETURN NUMBER IS
  get_batch_status           BOOLEAN := TRUE;
  BEGIN

    -- Initialize all batch status variables  
    "CHANNEL_St" := FALSE;      


    "CHANNEL_St" := "CHANNEL_Bat";   

  RETURN get_status;
  END Main;
  END "MAP_CHANNEL";

  /

Note the use of “EXECUTE IMMEDIATE 'ALTER SESSION ENABLE PARALLEL DML';” and the “MERGE /*+ APPEND PARALLEL(CHANNEL, DEFAULT, DEFAULT)  */ INTO” within the script, and the minimal amount of cruft that OWB adds to the overall statement.

Another useful benefit of using an ETL tool such as OWB10g is that it provides a level of abstraction between the required process (for example, loading a table, merging into a dimension, pivoting rows and columns) and the actual implementation method for your particular Oracle version. An example of this would be our previous data mapping, which, in Oracle8i and earlier, would have been accomplished using separate INSERT and UPDATE statements, but with Oracle9i onwards, is better served by using a single MERGE statement. Using OWB10g allows you to define your process at the logical level with the tool that chooses the most appropriate execution method for your particular version of Oracle. This allows you to take advantage of any additional ETL functions Oracle puts into future releases of the database without having to revise your mappings.

The OMB*Plus Scriping Environment

A DBA concern with graphical ETL tools is that, whilst they might make a novice Oracle developer more productive, they actually slow down more experienced developers who are more used to working with the command line. Whilst tools such as OWB10g are great for novice users who need visual prompts when putting together objects and mappings in a warehouse, DBAs and UNIX sysadmins are more used to defining objects using scripts, and often write programs to generate or alter batches of objects in bulk. Another issue often facing developers working with GUI applications is that there’s normally no way to record and replay a series of development steps to apply them in one go to your project (e.g., to automate testing or to help with refactoring code).

To address these issues, OWB10g has a scripting capability that allows us to create, alter, generate, or drop both warehouse objects and mappings using the command line. OWB scripting consists of two components:

      • OMB Scripting, a programming language based on Tcl that allows us to manipulate objects in the OWB repository, and
      • OMB*Plus, a “SQL*Plus lookalike” used for interactively running OMB Scripting commands

OMB Scripting acts as an interface to the OWB Java API, translating scripting commands into API calls and allowing DBAs to work with the OWB repository without using the GUI interface. For example, to create a table definition in the OWB repository, you’d issue the following OMB*Plus command:

OMBCREATE TABLE 'STG_CUSTOMERS' \
  SET PROPERTIES (DESCRIPTION, BUSINESS_NAME) VALUES ('Customer staging
table','Staging Customers') \
ADD COLUMN 'CUST_ID' \
  SET PROPERTIES (DATATYPE, PRECISION, SCALE, NOT_NULL) VALUES
('NUMBER',7,0,1) \
  ADD COLUMN 'CUST_NAME' \

whilst to create a mapping, you might issue the following OMB*Plus command:

OMBCREATE MAPPING 'LOAD_COUNTRIES_MAP' \
SET PROPERTIES (BUSINESS_NAME, DESCRIPTION) \
VALUES ('Load Countries from HR Mapping', 'Mapping to load the countries
table from HR schema') \
ADD TABLE OPERATOR 'COUNTRIES' BOUND TO TABLE '../HR_SRC/COUNTRIES'\
ADD TABLE OPERATOR 'REGIONS' BOUND TO TABLE '../HR_SRC/REGIONS'\
ADD TABLE OPERATOR 'GEOGRAPHIES' BOUND TO TABLE 'GEOGRAPHIES' \
ADD JOINER OPERATOR 'JOIN' SET PROPERTIES (JOIN_CONDITION) \
VALUES ('INGRP1.REGION_ID = INGRP2.REGION_ID') \
ADD CONNECTION FROM ATTRIBUTE 'COUNTRY_ID' OF GROUP 'INOUTGRP1' OF
OPERATOR 'COUNTRIES' TO \
GROUP 'INGRP1' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'COUNTRY_NAME' OF GROUP 'INOUTGRP1' OF
OPERATOR 'COUNTRIES' TO \
GROUP 'INGRP1' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'REGION_ID' OF GROUP 'INOUTGRP1' OF
OPERATOR 'COUNTRIES' TO \
GROUP 'INGRP1' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'REGION_ID' OF GROUP 'INOUTGRP1' OF
OPERATOR 'REGIONS' TO \
GROUP 'INGRP2' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM ATTRIBUTE 'REGION_NAME' OF GROUP 'INOUTGRP1' OF
OPERATOR 'REGIONS' TO \
GROUP 'INGRP2' OF OPERATOR 'JOIN' \
ADD CONNECTION FROM GROUP 'OUTGRP1' OF OPERATOR 'JOIN' TO \
GROUP 'INOUTGRP1' OF OPERATOR 'GEOGRAPHIES' BY NAME

As OMB scripting is based on Tcl with extensions for dealing with the OWB repository, you can use all of the Tcl programming constructs to programmatically create or maintain objects in the OWB repository, allowing you to for example loop through all of the tables within a module and add an additional MODIFIED DATE column.

Interestingly, OMB Scripting still gives the DBA the possibility of using scripts to create object definitions, although those scripts actually create metadata repository entries rather than actual physical objects in the database. This provides a level of abstraction from the actual physical implementation, but keeps the familiar command-line interface.

Metadata and Change Management

Whilst an experienced DBA can create a data warehouse with nothing more than a set of hand-written scripts, the area in which this approach tends to fall down is the capture and maintenance of metadata — “data about data” — that comprises the logical, business definitions of warehouse objects: business rules to define how they are used, together with information about the sources of data and the way in which the data has been processed prior to loading into the warehouse. One of the key DBA benefits with OWB10g is that it makes it easy to capture and propagate this metadata, storing it in the repository and providing a means to transfer it into other Oracle tools such as Oracle Discoverer and the OLAP Option.

OWB10g captures metadata as part of the normal process when defining warehouse objects, by either using the OWB client graphical user interface or the OMB*Plus scripting interface. This metadata is then grouped together into projects, which can be version controlled and reported on using the WSeb-based OWB10g Design Browser.

Metadata within Oracle Warehouse Builder is built around the Common Warehouse Model (CWM), a standard maintained by the Object Management Group. CWM-compliant metadata is (in theory) easily exchangeable between different vendors’ tools. OWB10g has a built-in metadata import facility for bringing in metadata from Computer Associates' Erwin, Powersoft’s Powerdesigner and in the standard OMG CWM 1.0 format, and can produce export files in the OMG CWM 1.0 format. From OWB 9.2 onwards, there is also a separately licensable product from Metaintegration Technology Inc known as “MITI Bridges,” which integrates in with the OWB client and extends the metadata exchange options to include products such as Cognos ReportNet, Microsoft Visio and Business Objects.

The version control and reporting features within OWB, referred to as OWB10g’s “Lifecycle Management” feature, are probably the key feature in the product that you can’t get by just manually creating and deploying scripts. Whilst using OWB on small projects can sometimes be seen as a bit of an overhead (especially for experienced DBAs who know Oracle commands) these features really become useful when your data warehouse scales up and starts to incorporate feeds from lots of disparate systems. OWB10g can manage and report on the complete flow of data from source to target, and propagate metadata you capture within OWB to all the BI & reporting tools in the Oracle technology stack. From a technical DBA’s point of view, compared to directly interacting with the Oracle server this is the less “glamorous” part of building a data warehouse, but for business users this is where much of the value lies, and by using an ETL tool like OWB10g you can ensure this information is properly stored in an accessible repository rather than spread amongst a series of spreadsheets and text files which eventually get out of date.

OLAP Integration

Those of you that read my previous article on the Oracle 10g OLAP Option may well be considering moving some of your relational data into multidimensional analytic workspaces. Whilst Oracle’s Analytic Workspace Manager provides an easy way to copy existing relational star-schemas into an analytic workspace, it does not provide any ETL functionality and does not have any of OWB’s metadata or change management functionality.

OWB10g does however have this functionality, and allows you to take a cube definition within the repository and deploy it to an analytic workspace. Two new PL/SQL transformations, WB_OLAP_LOAD_CUBE and WB_OLAP_LOAD_DIMENSION can be added to an existing data mapping to allow relational fact tables and dimension tables to be copied across into analytic workspace variables and dimensions, and entire relational star schemas can be “cloned” into analytic workspaces using the OLAP Metadata transfer bridge.

Although this functionality is welcome, there are limitations with the way it is current implemented, which may well be addressed in future, releases. A current limitation with the Oracle 10g OLAP Option is that the PL/SQL routines Oracle provide for creating and populating analytic workspace objects all rely on relational versions having being created previously, and because of this OWB10g first requires that you build and deploy relational versions of your facts and dimensions before you can deploy them to the multidimensional analytic workspace. In addition, the features that allow you to deploy to analytic workspaces (the two new PL/SQL transformations, and the OLAP Metadata Bridge) are clearly add-ons rather than integrated features of the product. Indications suggest that a future release of OWB (known as Oracle Warehouse Builder 10g Release 2) will feature a completely redesigned interface for building cubes and dimensions and will take advantage of the new AWXML feature with Oracle 10g OLAP, that allows DBAs and developers to build analytic workspace objects using XML definintions directly from relational tables or source files without the intermediate step of creating relational star schemas.

Issues And Drawbacks

OWB isn’t perfect however and from working on several projects that use the tool, there are a few drawbacks that prospective DBAs and developers should be aware of. For example, although OWB10g leverages most of the new ETL functions in Oracle 9i and 10g, it often lags behind the database when new features are introduced, a good example being the Change Data Capture feature that was introduced in Oracle 9i that is still not supported in OWB10g.

Support for summary management and materialized views are also fairly weak within OWB10g. Materialized views can be created like any other database object, but the SQL to define the aggregation within the materialized view has to be manually keyed and there is no graphical way to pick columns for grouping or summary. OLAP materialized views are also not directly supported, which you may wish to use if you’ve licensed the OLAP Option.

Another area that is lacking is support for change data capture and streams, features which were introduced with Oracle 9i and are used to propagate changes in the source system through to the target warehouse. Again, these features are slated for inclusion in OWB10g Release 2, although this is several years since these features were first introduced with Oracle 9i.

Other areas of concern are that, as OWB10g is a Java application, it has a significant “footprint” on the desktop and developers are recommended to run OWB10g on at least a Pentium III with no less that 512MB of RAM. Lastly, a final consideration with OWB10g is that, like any tool, it has its own learning curve, which may not be worthwhile if your project is small or does not have a requirement for lifecycle or metadata management.

Further Reading

DBAs and developers looking to find out more information on OWB10g should first check out the OTN Oracle Warehouse Builder homepage which has an extensive collateral library with white papers on OWB10g’s architecture and components. Also, readers might be find the archive of OWB postings on my Oracle weblog useful, which contains a number of tips on working with OWB gathered from several years of working with the product.

--

Mark Rittman is a Certified Oracle Professional DBA and works as a Consulting Manager at SolStonePlus, specializing in developing BI and DW applications using the Oracle Database, Application Server, Discoverer, Warehouse Builder, and Oracle OLAP. Outside of Plus Consultancy, Mark chairs the UKOUG BI and Reporting Tools SIG, and runs a Weblog dedicated to Oracle BI and Data Warehousing technology. Mark recently was awarded an OTN Community Award for contributions to the Oracle developer community, and is a regular speaker at Oracle User events in both the UK and Europe.


Contributors : Mark Rittman
Last modified 2006-02-08 02:53 PM
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone