Matrix Transposition in SQL
“God made the integers; all the rest is the work of man.”
-- Leopold Kronecker
Matrix transposition is among Frequently Asked Questions. Given a single-column table ORIGINAL,
ENAMESMITH
ALLEN
WARD
JONES
MARTIN
BLAKE
CLARK
SCOTT
KING
TURNER
ADAMS
JAMES
FORD
MILLER
we’ll explore how to transform it into TRANSPOSED:
ColumnS A W J M B C S K T A J F M
M L A O A L L C I U D A O I
I L R N R A A O N R A M R L
T E D E T K R T G N M E D L
H N S I E K T E S S E
N R R
This problem has been discussed in the Usenet thread Matrix transpose in SQL, and general agreement was that it can’t be done in standard SQL-92. One of the latest Oracle magazine code tips -- “Transposing a Table into a Summary Matrix” -- suggests a rather lengthy procedural solution. This article describes a SQL solution with a minimal amount of procedural code. As a bonus, we’ll learn several ways how to program user-defined aggregates in Oracle 9i.
Nesting and Unnesting
Consider the table
EMPNO | POS | LET |
7369 | 1 | S |
7369 | 2 | M |
7369 | 3 | I |
7369 | 4 | T |
7369 | 5 | H |
7499 | 1 | A |
7499 | 2 | L |
7499 | 3 | L |
7499 | 4 | E |
7499 | 5 | N |
… | … | … |
that we call UNNESTED. Both tables in the beginning of the article could be viewed as aggregates of UNNESTED. The ORIGINAL table could be specified as:
select concat(LET) from UNNESTED group by EMPNO
while the TRANSPOSED table is just a grouping by a different column:
select concat(LET||’ ‘) from UNNESTED group by POS
(I also added a space padding to make the query result set more readable). So the problem of transposing the ORIGINAL table into TRANSPOSED can be solved by just implementing two steps:
Unnesting ORIGINAL --> UNNESTED
Nesting UNNESTED --> TRANSPOSED
The first step involves an integer enumeration relation, introduced in myprevious article. Reader feedback, and other articles about integer enumeration convinced me to further expand on this topic.
Integer Enumeration for Aggregate Dismembering
Again, I prefer producing arbitrary, large list of integers with a Table Function
CREATE TYPE IntSet AS TABLE OF Integer; / CREATE or replace FUNCTION UNSAFE RETURN IntSet PIPELINED IS BEGIN loop PIPE ROW(1); end loop; END; / select rownum from TABLE(UNSAFE) where rownum < 1000000 select rownum from TABLE(UNSAFE) where rownum < 1000000
In my previous article, I reserved the possibility of using an upper-bound integer range argument that would make the function safe. In other words, the function would never spin out of control whenever a user forgot the stop predicate rownum < 1000000. On the other hand, using the function argument is inferior for two reasons:
- predicates are more self-documenting than function arguments, and
- we can use subqueries instead of hardcoded limits.
The runtime expense of using the table function is minimal: unlike forced materialization into a real table, logical I/O associated with table function calls is virtually zero.
In DB2, a list of integers can be constructed with recursive SQL:
with iota (n) as ( values(1) union all select n+1 from iota where n<100000 ) select * from iota;
It is slightly inconvenient, however, that the predicate, which limits the list of numbers, must be specified within the recursion subquery, while it naturally belongs to the main query. The problem of pushing the predicate inside an inner query is somewhat similar to the one we saw for UNSAFE table function.
With the list of integers at our disposal, writing an unnesting query is easy:
SQL>select empno, pos, substr(ENAME,i,1) from emp, 2 (select rownum pos from table(unsafe) 3 where rownum < (select max(length(ename)) from emp)); EMPNO POS S ---------- ---------- - 7369 1 S 7369 2 M 7369 3 I 7369 4 T 7369 5 H 7499 1 A 7499 2 L 7499 3 L 7499 4 E 7499 5 N 7521 1 W
Looking at strings "SMITH," "ALLEN," etc., as aggregates of letters might seem odd at first, but that is what they really are. We’ll assemble those letters back into aggregates of (different) words.
User Defined Aggregate Functions
There is no aggregate function that would concatenate strings in standard SQL. However, there are multiple ways defining it in Oracle:
- Casting the subquery result set into the collection and the defining aggregate on it
- Pipelining the user-defined aggregate function:
CREATE or replace FUNCTION CONCAT_LIST( cur SYS_REFCURSOR ) RETURN VARCHAR2 IS ret VARCHAR2(32000); tmp VARCHAR2(4000); BEGIN loop fetch cur into tmp; exit when cur%NOTFOUND; ret := ret || tmp; end loop; RETURN ret; END; / select distinct deptno, CONCAT_LIST(CURSOR( select ename ||',' from emp ee where e.deptno = ee.deptno ) employees from emp e;
Syntactically, neither of these solutions looks like a group by. However, scalar subquery in the select list is actually more powerful than group by. This idea is emphasized in the article by C.J. Date: “A discussion of some redundancies in SQL.” If you prefer, however, the traditional group by syntax, then there is yet another way to program user-defined aggregates:
- Oracle 9i user-defined aggregates:
create or replace type string_agg_type as object ( total varchar2(4000), static function ODCIAggregateInitialize(sctx IN OUT string_agg_type ) return number, member function ODCIAggregateIterate(self IN OUT string_agg_type , value IN varchar2 ) return number, member function ODCIAggregateTerminate(self IN string_agg_type, returnValue OUT varchar2, flags IN number) return number, member function ODCIAggregateMerge(self IN OUT string_agg_type, ctx2 IN string_agg_type) return number ); / create or replace type body string_agg_type is static function ODCIAggregateInitialize(sctx IN OUT string_agg_type) return number is begin sctx := string_agg_type( null ); return ODCIConst.Success; end; member function ODCIAggregateIterate(self IN OUT string_agg_type, value IN varchar2 ) return number is begin self.total := self.total || value; return ODCIConst.Success; end; member function ODCIAggregateTerminate(self IN string_agg_type, returnValue OUT varchar2, flags IN number) return number is begin returnValue := self.total; return ODCIConst.Success; end; member function ODCIAggregateMerge(self IN OUT string_agg_type, ctx2 IN string_agg_type) return number is begin self.total := self.total || ctx2.total; return ODCIConst.Success; end; end; / CREATE or replace FUNCTION stragg(input varchar2 ) RETURN varchar2 PARALLEL_ENABLE AGGREGATE USING string_agg_type; / select deptno, stragg(ename) from emp group by deptno;
This last solution is probably the best from a performance perspective, since the query with the user-defined aggregate looks exactly like the traditional group by, the usual optimizations can be employed. Compare this to the second method -- pipelining the user-defined aggregate function. In that case the optimizer would certainly not be able to unnest a scalar subquery within a table function (yet).
Now we have all the ingredients necessary for writing the transposition query. According to our program, we need to apply aggregation to the unnested view. I skipped that step, however, and rewrote the query directly to
select CONCAT_LIST(CURSOR( SELECT substr(ENAME,i,1)|| ' ' from emp )) from (select rownum i from table(unsafe) where rownum < (select max(length(ename))+1 from emp))
Here I used solution #2 for user defined aggregates, and the reader is advised writing a transposition query with the other aggregate solutions as well. The last query needs the final touch: taking care of those employee names, which are shorter than the maximum length. It can be easily accommodated with a switch:
select CONCAT_LIST(CURSOR( select case when length(ename)<i then ' ' else substr(ENAME,i,1)|| ' ' end from emp )) from (select rownum i from table(unsafe) where rownum < (select max(length(ename))+1 from emp)) CONCAT_LIST(CURSOR(SELECTCASEWHENLENGTH(ENAME --------------------------------------------- S A W J M B C S K T A J F M M L A O A L L C I U D A O I I L R N R A A O N R A M R L T E D E T K R T G N M E D L H N S I E K T E S S E N R R
--
Vadim Tropashko works for Real World Performance group at Oracle Corp. Prior to that he was application programmer and translated "The C++ Programming Language" by B. Stroustrup, 2nd edition into Russian. His current interests include SQL Optimization, Constraint Databases, and Computer Algebra Systems.
Contributors : Vadim Tropashko
Last modified 2005-04-13 03:13 PM