Matrix Transposition in SQL

by Vadim Tropashko

“God made the integers; all the rest is the work of man.”

-- Leopold Kronecker

Matrix transposition is among Frequently Asked Questions. Given a single-column table ORIGINAL,

ENAME

SMITH

ALLEN

WARD

JONES

MARTIN

BLAKE

CLARK

SCOTT

KING

TURNER

ADAMS

JAMES

FORD

MILLER

we’ll explore how to transform it into TRANSPOSED:

Column

S A W J M B C S K T A J F M

M L A O A L L C I U D A O I

I L R N R A A O N R A M R L

T E D E T K R T G N M E D L

H N S I E K T E S S E

N R R

This problem has been discussed in the Usenet thread Matrix transpose in SQL, and general agreement was that it can’t be done in standard SQL-92. One of the latest Oracle magazine code tips -- “Transposing a Table into a Summary Matrix” -- suggests a rather lengthy procedural solution. This article describes a SQL solution with a minimal amount of procedural code. As a bonus, we’ll learn several ways how to program user-defined aggregates in Oracle 9i.

Nesting and Unnesting

Consider the table

EMPNO	POS	LET
7369	1	S
7369	2	M
7369	3	I
7369	4	T
7369	5	H
7499	1	A
7499	2	L
7499	3	L
7499	4	E
7499	5	N
…	…	…

that we call UNNESTED. Both tables in the beginning of the article could be viewed as aggregates of UNNESTED. The ORIGINAL table could be specified as:

    select concat(LET) from UNNESTED group by EMPNO

while the TRANSPOSED table is just a grouping by a different column:

    select concat(LET||’ ‘) from UNNESTED group by POS

(I also added a space padding to make the query result set more readable). So the problem of transposing the ORIGINAL table into TRANSPOSED can be solved by just implementing two steps:

Unnesting ORIGINAL --> UNNESTED

Nesting UNNESTED --> TRANSPOSED

The first step involves an integer enumeration relation, introduced in myprevious article. Reader feedback, and other articles about integer enumeration convinced me to further expand on this topic.

Integer Enumeration for Aggregate Dismembering

Again, I prefer producing arbitrary, large list of integers with a Table Function

CREATE TYPE IntSet AS TABLE OF Integer;

/



CREATE or replace FUNCTION UNSAFE

  RETURN  IntSet PIPELINED IS

BEGIN

    loop

       PIPE ROW(1);

    end loop;

END;

/

select rownum from TABLE(UNSAFE) where rownum < 1000000


select rownum from TABLE(UNSAFE) where rownum < 1000000

In my previous article, I reserved the possibility of using an upper-bound integer range argument that would make the function safe. In other words, the function would never spin out of control whenever a user forgot the stop predicate rownum < 1000000. On the other hand, using the function argument is inferior for two reasons:

predicates are more self-documenting than function arguments, and
we can use subqueries instead of hardcoded limits.

The runtime expense of using the table function is minimal: unlike forced materialization into a real table, logical I/O associated with table function calls is virtually zero.

In DB2, a list of integers can be constructed with recursive SQL:

with iota (n) as

( values(1)

   union all

  select n+1 from iota

  where n<100000

)

select * from iota;

It is slightly inconvenient, however, that the predicate, which limits the list of numbers, must be specified within the recursion subquery, while it naturally belongs to the main query. The problem of pushing the predicate inside an inner query is somewhat similar to the one we saw for UNSAFE table function.

With the list of integers at our disposal, writing an unnesting query is easy:

SQL>select empno, pos, substr(ENAME,i,1) from emp,

2 (select rownum pos from table(unsafe) 

3  where rownum < (select max(length(ename)) from emp));



     EMPNO        POS S

---------- ---------- -

      7369          1 S

      7369          2 M

      7369          3 I

      7369          4 T

      7369          5 H

      7499          1 A

      7499          2 L

      7499          3 L

      7499          4 E

      7499          5 N

      7521          1 W

Looking at strings "SMITH," "ALLEN," etc., as aggregates of letters might seem odd at first, but that is what they really are. We’ll assemble those letters back into aggregates of (different) words.

User Defined Aggregate Functions

There is no aggregate function that would concatenate strings in standard SQL. However, there are multiple ways defining it in Oracle:

Casting the subquery result set into the collection and the defining aggregate on it
Pipelining the user-defined aggregate function:

CREATE or replace FUNCTION CONCAT_LIST( cur SYS_REFCURSOR )

  RETURN  VARCHAR2 IS

    ret VARCHAR2(32000); 

    tmp VARCHAR2(4000);

BEGIN

    loop

        fetch cur into tmp;

        exit when cur%NOTFOUND;

            ret := ret || tmp;

    end loop;

    RETURN ret;

END;

/



select distinct 

    deptno, 

    CONCAT_LIST(CURSOR(

        select ename ||',' from emp ee where e.deptno = ee.deptno

    ) employees

from emp e;

Syntactically, neither of these solutions looks like a group by. However, scalar subquery in the select list is actually more powerful than group by. This idea is emphasized in the article by C.J. Date: “A discussion of some redundancies in SQL.” If you prefer, however, the traditional group by syntax, then there is yet another way to program user-defined aggregates:

Oracle 9i user-defined aggregates:

create or replace type string_agg_type as object (

     total varchar2(4000),

  

     static function

          ODCIAggregateInitialize(sctx IN OUT string_agg_type )

          return number,

  

     member function

          ODCIAggregateIterate(self IN OUT string_agg_type ,

                               value IN varchar2 )

          return number,

  

     member function

          ODCIAggregateTerminate(self IN string_agg_type,

                                 returnValue OUT  varchar2,

                                 flags IN number)

          return number,

  

     member function

          ODCIAggregateMerge(self IN OUT string_agg_type,

                             ctx2 IN string_agg_type)

          return number

);

/



create or replace type body string_agg_type is

  

  static function ODCIAggregateInitialize(sctx IN OUT string_agg_type)

  return number

  is

  begin

      sctx := string_agg_type( null );

      return ODCIConst.Success;

  end;

  

  member function ODCIAggregateIterate(self IN OUT string_agg_type,

                                       value IN varchar2 )

  return number

  is

  begin

      self.total := self.total || value;

      return ODCIConst.Success;

  end;

  

  member function ODCIAggregateTerminate(self IN string_agg_type,

                                         returnValue OUT varchar2,

                                         flags IN number)

  return number

  is

  begin

      returnValue := self.total;

      return ODCIConst.Success;

  end;

  

  member function ODCIAggregateMerge(self IN OUT string_agg_type,

                                     ctx2 IN string_agg_type)

  return number

  is

  begin

      self.total := self.total || ctx2.total;

      return ODCIConst.Success;

  end;

  

  

end;

/



CREATE or replace

FUNCTION stragg(input varchar2 )

RETURN varchar2

PARALLEL_ENABLE AGGREGATE USING string_agg_type;

/



select deptno, stragg(ename)

from emp

group by deptno;

This last solution is probably the best from a performance perspective, since the query with the user-defined aggregate looks exactly like the traditional group by, the usual optimizations can be employed. Compare this to the second method -- pipelining the user-defined aggregate function. In that case the optimizer would certainly not be able to unnest a scalar subquery within a table function (yet).

Now we have all the ingredients necessary for writing the transposition query. According to our program, we need to apply aggregation to the unnested view. I skipped that step, however, and rewrote the query directly to

select CONCAT_LIST(CURSOR(

        SELECT substr(ENAME,i,1)|| '  '  from emp 

    ))

from (select rownum i from table(unsafe) 

      where rownum < (select max(length(ename))+1 from emp))

Here I used solution #2 for user defined aggregates, and the reader is advised writing a transposition query with the other aggregate solutions as well. The last query needs the final touch: taking care of those employee names, which are shorter than the maximum length. It can be easily accommodated with a switch:

select CONCAT_LIST(CURSOR(

       select case when length(ename)<i 

              then ' ' 

              else substr(ENAME,i,1)|| '  ' 

              end 

       from emp 

       ))

from (select rownum i from table(unsafe) 

      where rownum < (select max(length(ename))+1 from emp))



CONCAT_LIST(CURSOR(SELECTCASEWHENLENGTH(ENAME

---------------------------------------------

  S  A  W  J  M  B  C  S  K  T  A  J  F  M

  M  L  A  O  A  L  L  C  I  U  D  A  O  I

  I  L  R  N  R  A  A  O  N  R  A  M  R  L

  T  E  D  E  T  K  R  T  G  N  M  E  D  L

  H  N     S  I  E  K  T     E  S  S     E

              N              R           R

Vadim Tropashko works for Real World Performance group at Oracle Corp. Prior to that he was application programmer and translated "The C++ Programming Language" by B. Stroustrup, 2nd edition into Russian. His current interests include SQL Optimization, Constraint Databases, and Computer Algebra Systems.

Contributors : Vadim Tropashko
Last modified 2005-04-13 03:13 PM

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

Matrix Transposition in SQL

Nesting and Unnesting

Integer Enumeration for Aggregate Dismembering

User Defined Aggregate Functions