Skip to content

DBAzine.com

Sections
Personal tools
You are here: Home » Oracle » Oracle Articles Archive » Hierarchical Query Enhancements in Oracle Database 10g
Seeking new owner for this high-traffic DBAzine.com site.
Tap into the potential of this DBA community to expand your business! Interested? Contact us today.
Who Are You?
I am a:
Mainframe True Believer
Distributed Fast-tracker

[ Results | Polls ]
Votes : 3549
 

Hierarchical Query Enhancements in Oracle Database 10g

by Sanjay Mishra

Some applications make extensive use of hierarchical data such as an organization chart, a bill of material in a manufacturing and assembly plant, or a family tree. These types of information are most conveniently represented in a tree structure. However, such data can be easily fit into a relational table by using a self-referential relationship, as in the following definition of the EMPLOYEE table:

CREATE TABLE EMPLOYEE (
EMP_ID          NUMBER (4) CONSTRAINT EMP_PK PRIMARY KEY,
EMP_NAME        VARCHAR2 (15) NOT NULL,
DEPT_ID         NUMBER (2) NOT NULL,
MGR_ID          NUMBER (4) CONSTRAINT EMP_FK REFERENCES EMPLOYEE 9EMP_ID)
SALARY          NUMBER (7,2) NOT NULL,
HIRE_DATE       DATE NOT NULL);

Note the foreign key constraint EMP_FK on the column MGR_ID, which references the EMP_ID column of the same table (and hence the term “self-referential”). In this table, the MGR_ID represents the EMP_ID of an employee’s manager.

Oracle provides some useful extensions to ANSI SQL to manipulate hierarchical data represented in a relational table. Up to Oracle9i, Oracle’s hierarchical extensions include the START WITH … CONNECT BY clause, the PRIOR operator, and the LEVEL pseudo-column. The following example lists the employees in a hierarchical order and indents the subordinates under an employee:

SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID
FROM EMPLOYEE
START WITH MGR_ID IS NULL
CONNECT BY PRIOR EMP_ID = MGR_ID;

     LEVEL EMPLOYEE                 EMP_ID     MGR_ID
---------- -------------------- ---------- ----------
         1 KING                       7839
         2   JONES                    7566       7839
         3     SCOTT                  7788       7566
         4       ADAMS                7876       7788
         3     FORD                   7902       7566
         4       SMITH                7369       7902
         2   BLAKE                    7698       7839
         3     ALLEN                  7499       7698
         3     WARD                   7521       7698
         3     MARTIN                 7654       7698
         3     TURNER                 7844       7698
         3     JAMES                  7900       7698
         2   CLARK                    7782       7839
         3     MILLER                 7934       7782

Using Oracle provided extensions, you can construct complex hierarchical operations on a tree-structured data. With Oracle Database 10g, new features have been added to Oracle’s support for hierarchical queries. In this article, we will discuss these new features in detail, with examples.

New Features

The new hierarchical query features in Oracle Database 10g are:

      • New Operator
          • CONNECT_BY_ROOT
      • New Pseudocolumns
          • CONNECT_BY_ISCYCLE
          • CONNECT_BY_ISLEAF
      • New Function
          • SYS_CONNECT_BY_PATH (Oracle9i)
      • New Keywords
          • NOCYCLE
          • SIBLINGS (Oracle9i)

We will discuss each of these in the following sections.

CONNECT_BY_ROOT

The CONNECT_BY_ROOT operator, when applied to a column, returns the value for that column for the root row. The following example illustrates how the CONNECT_BY_ROOT operator is used:

SELECT EMP_NAME, CONNECT_BY_ISLEAF
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY PRIOR EMP_ID = MGR_ID;

Employee             Top Manager
-------------------- ------------
JONES                JONES
SCOTT                JONES
ADAMS                JONES
FORD                 JONES
SMITH                JONES
BLAKE                BLAKE
ALLEN                BLAKE
WARD                 BLAKE
MARTIN               BLAKE
TURNER               BLAKE
JAMES                BLAKE
CLARK                CLARK
MILLER               CLARK

In this example, the organization tree is built by starting with the rows that have MGR_ID = 7839. This means that anyone whose manager is “7839” will be considered a root for this query. Now, all the employees who come under the organizations under these roots will be displayed in the result set of this query along with the name of their top-most manager in the tree. The CONNECT_BY_ROOT operator determines the top-most node in the tree for a given row.

NOCYCLE

Cycles are not allowed in a true tree structure. But some hierarchical data may contain cycles. In a hierarchical structure, if a descendant is also an ancestor, it is called a cycle. It is sometimes difficult to identify cycles in hierarchical data. The hierarchical construct “START WITH … CONNECT BY … PRIOR” will report an error if there is a cycle in the data.

To allow the “START WITH … CONNECT BY … PRIOR” construct to work properly even if cycles are present in the data, Oracle Database 10g provides a new keyword, NOCYCLE. If there are cycles in the data, you can use the NOCYCLE keyword in the CONNECT BY clause, and you will not get the error mentioned earlier.

The test data we have in the EMPLOYEE table doesn’t have a cycle. To test the NOCYCLE feature, let’s introduce a cycle into the existing EMPLOYEE data, by updating the MGR_ID column of the top-most employee (KING with EMP_ID=7839) with the EMP_ID of one of the lowest-level employees (MARTIN with EMP_ID = 7654).

UPDATE EMPLOYEE
SET MGR_ID = 7654
WHERE MGR_ID IS NULL;

Now, if you perform a hierarchical query, you will get an ORA-01436 error:

SELECT LEVEL, LPAD(' ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY PRIOR EMP_ID = MGR_ID;

     LEVEL EMPLOYEE                 EMP_ID     MGR_ID
---------- -------------------- ---------- ----------
         1 KING                       7839       7654
         2   JONES                    7566       7839
         3     SCOTT                  7788       7566
         4       ADAMS                7876       7788
         3     FORD                   7902       7566
         4       SMITH                7369       7902
         2 BLAKE                      7698       7839
         3     ALLEN                  7499       7698
         3     WARD                   7521       7698
         3     MARTIN                 7654       7698
         4       KING                 7839       7654
         5         JONES              7566       7839
         6           SCOTT            7788       7566
         7             ADAMS          7876       7788
         6           FORD             7902       7566
ERROR:
ORA-01436: CONNECT BY loop in user data

Besides the error, note that the whole tree starting with KING starts repeting under MARTIN. This is erroneous and confusing. The NOCYCLE keyword can be used in the CONNECT BY clause to get rid of this error:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY NOCYCLE PRIOR EMP_ID = MGR_ID;

     LEVEL             EMPLOYEE     EMP_ID     MGR_ID
---------- -------------------- ---------- ----------
         1             KING           7839       7654
         2               JONES        7566       7839
         3             SCOTT          7788       7566
         4               ADAMS        7876       7788
         3             FORD           7902       7566
         4               SMITH        7369       7902
         2             BLAKE          7698       7839
         3               ALLEN        7499       7698
         3               WARD         7521       7698
         3               MARTIN       7654       7698
         3               TURNER       7844       7698
         3               JAMES        7900       7698
         2             CLARK          7782       7839
         3               MILLER       7934       7782

The above query recognizes that there is a cycle and ignore the cycle (as an impact of the NOCYCLE keyword), and returns the rows as if there were no cycle.

CONNECT_BY_ISCYCLE

It is sometimes difficult to identify cycles in hierarchical data. Oracle 10g’s new pseudocolumn CONNECT_BY_ISCYCLE can help you identify the cycles in the data easily. The CONNECT_BY_ISCYCLE can be used only in conjunction with the NOCYCLE keyword in a hierarchical query. The CONNECT_BY_ISCYCLE pseudocolumn returns 1 if the current row has a child which is also its ancestor; otherwise it returns 0. For example:

SELECT EMP_NAME, CONNECT_BY_ISCYCLE
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY NOCYCLE PRIOR EMP_ID = MGR_ID;

EMP_NAME        CONNECT_BY_ISCYCLE
--------------- ------------------
KING                             0
JONES                            0
SCOTT                            0
ADAMS                            0
FORD                             0
SMITH                            0
BLAKE                            0
ALLEN                            0
WARD                             0
MARTIN                           1
TURNER                           0
JAMES                            0
CLARK                            0
MILLER                           0

Note that since MARTIN is KING’s manager in this data set, and also MARTIN comes under the organization tree under KING, the row for MARTIN has a value 1 for CONNECT_BY_ISCYCLE.

Note: For correct results in the subsequent queries, we should revert the data back to its original state by rolling back (if you have not committed) the earlier change we did to force a cycle in the data. If you have already committed the change, then update the MGR_ID for KING to NULL.

CONNECT_BY_ISLEAF

In a tree structure, the nodes at the lowest level of the tree are referred to as leaf nodes. Leaf nodes have no children. CONNECT_BY_ISLEAF is a pseudocolumn that returns 1 if the current row is a leaf, and returns 0 if the current row is not a leaf. For example:

SELECT EMP_NAME, CONNECT_BY_ISLEAF
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY PRIOR EMP_ID = MGR_ID;

EMP_NAME         CONNECT_BY_ISLEAF
---------------  -----------------
KING                             0
JONES                            0
SCOTT                            0
ADAMS                            1
FORD                             0
SMITH                            1
BLAKE                            0
ALLEN                            1
WARD                             1
MARTIN                           1
TURNER                           1
JAMES                            1
CLARK                            0
MILLER                           1

This new feature can help simplify SQL statements that need to identify all the leaf nodes. Without this pseudocolumn, to identify the leaf nodes, you would probably write a query like the following:

SELECT EMP_ID, EMP_NAME, SALARY, HIRE_DATE
FROM EMPLOYEE E
WHERE NOT EXISTS 
(SELECT EMP_ID FROM EMPLOYEE E1 WHERE E.EMP_ID = E1.MGR_ID);

EMP_ID  EMP_NAME        SALARY    HIRE_DATE
------- --------------- ---------- ---------
7369     SMITH                 800 17-DEC-80
7499     ALLEN                1600 20-FEB-81
7521     WARD                 1250 22-FEB-81
7654     MARTIN               1250 28-SEP-81
7844     TURNER               1500 08-SEP-81
7876     ADAMS                1100 23-MAY-87
7900     JAMES                 950 03-DEC-81
7934     MILLER               1300 23-JAN-82

However, this query can be made much simpler with the new pseudocolumn CONNECT_BY_ISLEAF, as shown below:

SELECT EMP_ID, EMP_NAME, SALARY, HIRE_DATE
FROM EMPLOYEE E
WHERE CONNECT_BY_ISLEAF = 1
START WITH MGR_ID IS NULL
CONNECT BY PRIOR EMP_ID = MGR_ID;

EMP_ID EMP_NAME            SALARY HIRE_DATE
------- --------------- ---------- ---------
7876    ADAMS                 1100 23-MAY-87
7369    SMITH                  800 17-DEC-80
7499    ALLEN                 1600 20-FEB-81
7521    WARD                  1250 22-FEB-81
7654    MARTIN                1250 28-SEP-81
7844    TURNER                1500 08-SEP-81
7900    JAMES                  950 03-DEC-81
7934    MILLER                1300 23-JAN-82

SYS_CONNECT_BY_PATH

The SYS_CONNECT_BY_PATH function was introduced in Oracle9i. However, it makes sense to discuss it along with the enhancements in Oracle Database 10g. The SYS_CONNECT_BY_PATH is function takes two arguments — a column name, and a character string — and returns the value of the column from the root node to each node, separated by the character string. For example:

SELECT SYS_CONNECT_BY_PATH(EMP_NAME, '::')
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY NOCYCLE PRIOR EMP_ID = MGR_ID;

SYS_CONNECT_BY_PATH(EMP_NAME,'::')
----------------------------------------
::KING
::KING::JONES
::KING::JONES::SCOTT
::KING::JONES::SCOTT::ADAMS
::KING::JONES::FORD
::KING::JONES::FORD::SMITH
::KING::BLAKE
::KING::BLAKE::ALLEN
::KING::BLAKE::WARD
::KING::BLAKE::MARTIN
::KING::BLAKE::TURNER
::KING::BLAKE::JAMES
::KING::CLARK
::KING::CLARK::MILLER

ORDER SIBLINGS BY

The SIBLINGS keyword was introduced in Oracle9i. However, it makes sense to discuss it along with the enhancements in Oracle Database 10g. A hierarchical query with a “START WITH … CONNECT BY … PRIOR …” construct displays the results in an arbitrary order, as shown in the following example:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY PRIOR EMP_ID = MGR_ID;

  LEVEL EMPLOYEE                 EMP_ID     MGR_ID
------- -------------------- ---------- ----------
1 KING                             7839
2   JONES                          7566       7839
3     SCOTT                        7788       7566
4       ADAMS                      7876       7788
3     FORD                         7902       7566
4       SMITH                      7369       7902
2   BLAKE                          7698       7839
3     ALLEN                        7499       7698
3     WARD                         7521       7698
3     MARTIN                       7654       7698
3     TURNER                       7844       7698
3     JAMES                        7900       7698
2 CLARK                            7782       7839
3   MILLER                         7934       7782

As always, you can use an ORDER BY clause to order the result rows in the way you want. However, in this case, an ORDER BY clause can destroy the hierarchical layers of the displayed data, as shown in the following example:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY PRIOR EMP_ID = MGR_ID
ORDER BY EMP_NAME;

LEVEL EMPLOYEE                 EMP_ID     MGR_ID
----- -------------------- ---------- ----------
    4       ADAMS                7876       7788
    3     ALLEN                  7499       7698
    2   BLAKE                    7698       7839
    2   CLARK                    7782       7839
    3     FORD                   7902       7566
    3     JAMES                  7900       7698
    2   JONES                    7566       7839
    1 KING                       7839
    3     MARTIN                 7654       7698
    3     MILLER                 7934       7782
    3     SCOTT                  7788       7566
    4        SMITH               7369       7902
    3     TURNER                 7844       7698
    3     WARD                   7521       7698

As you can see from the above output, it is impossible to identify the hierarchical relationship between the rows. To resolve this problem, Oracle Database 10g has introduced a new keyword SIBLINGS, that you can use in an ORDER BY clause, and order the result set properly. For example:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || EMP_NAME "EMPLOYEE", EMP_ID, MGR_ID
FROM EMPLOYEE
START WITH EMP_ID = 7839
CONNECT BY PRIOR EMP_ID = MGR_ID
ORDER SIBLINGS BY EMP_NAME;

LEVEL EMPLOYEE                 EMP_ID     MGR_ID
----- -------------------- ---------- ----------
    1 KING                       7839
    2   BLAKE                    7698       7839
    3     ALLEN                  7499       7698
    3     JAMES                  7900       7698
    3     MARTIN                 7654       7698
    3     TURNER                 7844       7698
    3     WARD                   7521       7698
    2   CLARK                    7782       7839
    3     MILLER                 7934       7782
    2   JONES                    7566       7839
    3     FORD                   7902       7566
    4       SMITH                7369       7902
    3     SCOTT                  7788       7566
    4       ADAMS                7876       7788

In the above output, BLAKE, CLARK and JONES are siblings, and they are displayed in the ascending order. So are BLAKE’s children – ALLEN, JAMES, MARTIN, TURNER and WARD.

Conclusion

Oracle Database 10g enhances the already powerful hierarchical query features of the Oracle database. Among the new features are the easy ways to identify leafs and cycles in the data. The ordering of siblings provides a great way to improve the readability of the result sets. Developers who are familiar with Oracle’s hierarchical query constructs will find these features very useful.

--

Sanjay Mishra has more than 12 years of industry experience, and has extensively worked in the areas of database architecture, database management, performance tuning, scalability, ETL, backup / recovery, parallel server, and parallel execution. He has coauthored three Oracle books published by O'Reilly & Associates (Mastering Oracle SQL, Oracle SQL Loader: The Definitive Guide, Oracle Parallel Processing). Sanjay can be reached at smishra_tech@yahoo.com.


Sanjay Mishra
Last modified 2005-04-16 09:23 AM

Typo In CONNECT_BY_ROOT Example?

Posted by ibrandt at 2007-04-07 10:01 AM
In the CONNECT_BY_ROOT example I'm guessing you meant "SELECT EMP_NAME, CONNECT_BY_ROOT...", not "SELECT EMP_NAME, CONNECT_BY_ISLEAF...".
Transaction Management
Reduce downtime and increase repeat sales by improving end-user experience.
Free White Paper
Database Recovery
Feeling the increased demands on data protection and storage requirements?
Download Free Report!
 
 

Powered by Plone