Retrieval: Multiple Tables and Aggregation - Part 3

by Lex de Haan

From the book, Mastering Oracle SQL and SQL Plus*, Chapter 8, Apress, January 2005.

Part 1 | Part 2 | Part 3 | Part 4 | Part 5

8.5 The GROUP BY Component

Until now, we have considered queries showing information about only individual rows. Each row in our query results so far had a one-to-one correspondence with some row in the database. However, in real life, you often want to produce aggregated information from a database, where the rows in the query results represent information about a set of database rows. For example, you might want to produce an overview showing the number of employees (the head count) per department. For this type of query, you need the GROUP BY clause of the SELECT command, as shown in Listing 8-14.

SQL> select e.deptno       as "department"
  2  ,      count(e.empno) as "number of employees"
  3  from   employees e
  4  group  by e.deptno;

department number of employees
---------- -------------------
        10                   3
        20                   5
        30                   6

SQL>

Listing 8-14: The GROUP BY clause.

Listing 8-14 shows the COUNT function at work, to count the number of employees per department. COUNT is an example of a group function, and we’ll look at it and the other group functions in Section 8.6.

The result of this query is a table, of course—just like any result of a query. However, there is no one-to-one mapping anymore between the rows of the EMPLOYEES table and the three rows of the result. Instead, you aggregate employee data per department.

To explain how the GROUP BY operator works, and how the SQL language handles aggregation, Listing 8-15 shows an imaginary representation of an intermediate result. Listing 8-15 shows a pseudo-table, with three rows and six columns. For readability, some columns of the EMPLOYEES table are omitted. In the last column, you see the three different department numbers, and the other five columns show sets of attribute values. These sets are represented by enumerating their elements in a comma separated list between braces. Some of these sets contain null values only, such as e.COMM for departments 10 and 20.

Note: The representation in Listing 8-15 is purely fictitious and only serves educational purposes. Data structures as shown in Listing 8-15 do not occur in reality.

e.EMPNO    e.JOB           e.MGR     e.MSAL    e.COMM    e.DEPTNO
=======    ============    ======    ======    ======    ========
{7782      {'MANAGER'      {7839     {2450     {NULL           10
,7839      ,'DIRECTOR'     ,NULL     ,5000     ,NULL
,7934}     ,'ADMIN' }      ,7782}    ,1300}    ,NULL}
-----------------------------------------------------------------
{7369      {'TRAINER'      {7902     { 800     {NULL           20
,7566      ,'MANAGER'      ,7839     ,2975     ,NULL
,7788      ,'TRAINER'      ,7566     ,3000     ,NULL
,7876      ,'TRAINER'      ,7788     ,1100     ,NULL
,7902}     ,'TRAINER'}     ,7566}    ,3000}    ,NULL}
-----------------------------------------------------------------
{7499      {'SALESREP'     {7698     {1600     { 300           30
,7521      ,'SALESREP'     ,7698     ,1250     , 500
,7654      ,'SALESREP'     ,7698     ,1250     ,1400
,7698      ,'MANAGER'      ,7839     ,2850     ,NULL
,7844      ,'SALESREP'     ,7698     ,1500     , 0
,7900}     ,'ADMIN' }      ,7698}    , 800}    ,NULL}
-----------------------------------------------------------------

Listing 8-15: The effect of GROUP BY e.DEPTNO.

Going back to Listing 8-14, it now becomes clear what the COUNT(e.EMPNO) function does: it returns the number of elements of each e.EMPNO set.

You could argue that (as an effect of the GROUP BY e.DEPTNO clause) the last column in Listing 8-15 (e.DEPTNO) contains “regular” values, and the other five columns become “set-valued” attributes. You can use only e.DEPTNO in the SELECT clause. If you want to see data from the other columns in your query result, you must use group functions (such as COUNT) to aggregate those sets into a single value. See the next section for a discussion of group functions.

Note: To be more precise, we should refer to multisets instead of sets in this context. Duplicate valuesare maintained, as you can see in Listing 8-15.We will discuss multisets in Chapter 12.

Multiple-Column Grouping

You can also group on multiple-column expressions, separated by commas. For example, the query in Listing 8-16 produces an overview of the number of registrations per course.

SQL> select   r.course, r.begindate
  2  ,      count(r.attendee) as attendees
  3  from   registrations r
  4  group  by r.course, r.begindate;

COURSE BEGINDATE   ATTENDEES
------ ----------- ---------
JAV    13-DEC-1999         5
JAV    01-FEB-2000         3
OAU    10-AUG-1999         3
OAU    27-SEP-2000         1
PLS    11-SEP-2000         3
SQL    12-APR-1999         4
SQL    04-OCT-1999         3
SQL    13-DEC-1999         2
XML    03-FEB-2000         2

9 rows selected.

SQL>

Listing 8-16: Grouping on two columns.

This result shows one row for each different (COURSE, BEGINDATE) combination found in the REGISTRATIONS table.

Note: As you can see, the rows in Listing 8-16 are ordered on the columns of the GROUP BY clause. However, if you want a certain ordering of your query results, you should never rely on implicit DBMS behavior and always specify an ORDER BY clause.

GROUP BY and Null Values

If a column expression on which you apply the GROUP BY clause contains null values, these null values end up together in a separate group. See Listing 8-17 for an example.

SQL> select e.comm, count(e.empno)
  2  from   employees e
  3  group  by e.comm;

    COMM COUNT(E.EMPNO)
-------- --------------
       0              1
     300              1
     500              1
    1400              1
                     10

SQL>

Listing 8-17: GROUP BY and null values.

Apparently, we have ten employees without commission.

8.6 Group Functions

In the previous section, we used the COUNT function to count the number of employees per department and the number of registrations per course. COUNT is an example of a group function. All group functions have two important properties in common:

They can be applied only to sets of values.
They return a single aggregated value, derived from that set of values.

That’s why group functions often occur in combination with GROUP BY (and optionally the HAVING clause, covered in Section 8.7) in SQL commands. The most important Oracle group functions are listed in Table 8-1.

Function	Description	Applicable To
COUNT()	Number of values	All datatypes
SUM()	Sum of all values	Numeric data
MIN()	Minimum value	All datatypes
MAX()	Maximum value	All datatypes
AVG()	Average value	Numeric data
MEDIAN()	Median (middle value)	Numeric or date (time) data
STATS_MODE()	Modus (most frequent value)	All datatypes
STDDEV()	Standard deviation	Numeric data
VARIANCE()	Statistical variance	Numeric data

Table 8-1: Common Oracle group functions.

The last column in Table 8-1 shows the applicable datatypes for all group functions. The functions MIN and MAX are applicable to any datatype, including dates and alphanumeric strings. MIN and MAX need only an ordering (sorting) criterion for the set of values. Note also that you can apply the AVG function only to numbers, because the average is defined as the SUM divided by the COUNT, and the SUM function accepts only numeric data.

Let’s look at some group function examples in Listing 8-18.

SQL> select e.deptno
  2  ,      count(e.job)
  3  ,      sum(e.comm)
  4  ,      avg(e.msal)
  5  ,      median(e.msal)
  6  from   employees e
  7  group  by e.deptno;

  DEPTNO COUNT(E.JOB) SUM(E.COMM) AVG(E.MSAL) MEDIAN(E.MSAL)
-------- ------------ ----------- ----------- --------------
      10            3                2916.667           2450
      20            5                    2175           2975
      30            6        2200    1541.667           1375

SQL>

Listing 8-18: Some examples of group functions.

Group Functions and Duplicate Values

If you apply a group function to a set of column values, that set of values may contain duplicate values. By default, these duplicate values are all treated as individual values, contributing to the end result of all group functions applied to the set of values. For example, we have five employees in department 20, but we have only two different jobs in that department. Nevertheless, Listing 8-18 shows 5 as the result of COUNT(e.JOB) for department 20.

If you want SQL group functions to ignore duplicate values (except one, of course), you must specify the keyword DISTINCT immediately after the first parenthesis. Although it is syntactically correct, the addition of DISTINCT is meaningless for the MIN and MAX functions. Look at Listing 8-19 for some examples.

SQL> select count(deptno), count(distinct deptno)
  2  ,      avg(comm), avg(coalesce(comm,0))
  3  from   employees;

COUNT(DEPTNO) COUNT(DISTINCTDEPTNO) AVG(COMM) AVG(COALESCE(COMM,0))
------------- --------------------- --------- ---------------------
           14                     3       550              157.1429

SQL>

Listing 8-19: Using the DISTINCT option for group functions.

Note that Listing 8-19 also shows that you can use group functions in the SELECT clause of a query without a GROUP BY clause. The absence of a GROUP BY clause in combination with the presence of group functions in the SELECT clause always results in a single-row result. In other words, the full table is aggregated into a single row. You can achieve precisely the same result by grouping on a constant expression. Try this yourself; for example, see what happens if you add GROUP BY 'x' to the query in Listing 8-19.

Group Functions and Null Values

The ANSI/ISO SQL standard postulates group functions to ignore null values completely. There is only one exception to this rule: the COUNT(*) function. This special case is discussed later in this section. This is a reasonable compromise. The only other consistent behavior for group functions would be to return a null value as soon as the input contains a null value.This would imply that all your SQL statements (containing group functions) should contain additional code to handle null values explicitly. So, ignoring null values completely is not a bad idea. Just make sure that you understand the consequences of this behavior. See Table 8-2 for some typical examples.

Set X	SUM(X)	MIN(X)	AVG(X)	MAX(X)
{1,2,3,NULL}	6	1	2	3
{1,2,3,0}	6	0	1.5	3
{1,2,3,2}	8	1	2	3

Table 8-2: Behavior of group functions and null values.

The SUM function does not make any distinction between {1,2,3,NULL} and {1,2,3,0}. The MIN and AVG functions don’t make any distinction between {1,2,3,NULL} and {1,2,3,2}. The MAX function gives the same result on all three sets.

Looking back at Listing 8-19, you see an example of function nesting: the AVG function operates on the result of the COALESCE function. This is a typical method to handle null values explicitly. As you can see from Listing 8-19, the results of AVG(COMM) and AVG(COALESCE(COMM,0)) are obviously different. In this case, the Oracle DBMS replaces all null values by zeros before applying the AVG function, because the null values in the COMM column actually mean “not applicable.”

The next query, shown in Listing 8-20, tells us how many different courses are scheduled for each trainer and the total number of scheduled courses.

SQL> select trainer
  2  ,      count(distinct course)
  3  ,      count(*)
  4  from   offerings
  5  group  by trainer;

TRAINER COUNT(DISTINCTCOURSE) COUNT(*)
-------- --------------------- --------
    7369                     2        3
    7566                     2        2
    7788                     2        2
    7876                     1        1
    7902                     2        2
                             3        3

SQL>

Listing 8-20: GROUP BY and DISTINCT.

Apparently, we have three course offerings without a trainer being assigned.

Grouping the Results of a Join

The query in Listing 8-21 shows the average evaluation ratings for each trainer, over all courses delivered.

SQL> select o.trainer, avg(r.evaluation)
  2  from   offerings o
 3          join
  4         registrations r
  5         using (course,begindate)
  6  group  by o.trainer;

 TRAINER AVG(R.EVALUATION)
-------- -----------------
    7369                 4
    7566              4.25
    7788
    7876                 4
    7902                 4

SQL>

Listing 8-21: GROUP BY on a join.

Notice the USING clause in line 5, with the COURSE and BEGINDATE columns. This USING clause with two columns is needed to get the correct join results.

The COUNT(*) Function

As mentioned earlier, group functions operate on a set of values, with one important exception. Besides column names, you can specify the asterisk (*) as an argument to the COUNT function. This widens the scope of the COUNT function from a specific column to the full row level. COUNT(*) returns the number of rows in the entire group.

Note: If you think that SELECT COUNT(1) is faster than SELECT COUNT(*), try a little experiment and prepare to be surprised—you will find out that there is no difference. Don’t trust opinions...

Listing 8-20 already showed an example of using the COUNT(*) function, to get the total number of scheduled courses for each trainer from the OFFERINGS table. Listing 8-22 shows another example of using the COUNT(*) function, this time applied against the EMPLOYEES table.

SQL> select e.deptno, count(*)
  2  from   employees e
  3  group  by e.deptno;

  DEPTNO COUNT(*)
-------- --------
      10        3
      20        5
      30        6

SQL>

Listing 8-22: Count employees per department (first attempt).

Obviously, department 40 is missing in this result. If you want to change the query into an outer join in order to show department 40 as well, you must be careful. What’s wrong with the query in Listing 8-23? Apparently, we suddenly have one employee working for department 40.

SQL> select deptno, count(*)
  2  from   employees e
  3         right outer join
  4         departments d
  5         using (deptno)
  6 group   by deptno;

  DEPTNO COUNT(*)
-------- --------
      10        3
      20        5
      30        6
      40        1

SQL>

Listing 8-23: Count employees per department (second attempt).

Compare the results in Listing 8-23 with the results in Listing 8-24. The only difference is the argument of the COUNT function. Listing 8-24 obviously shows the correct result, because department 40 has no employees. By counting over the primary key e.EMPNO, you are sure that all “real” employees are counted, while the null value introduced by the outer join is correctly ignored. You could have used any other NOT NULL column as well.

SQL> select deptno, count(e.empno)
  2  from   employees e
  3         right outer join
  4         departments d
  5         using (deptno)
  6 group   by deptno;

  DEPTNO COUNT(E.EMPNO)
-------- --------------
      10              3
      20              5
      30              6
      40              0

SQL>

Listing 8-24: Count employees per department (third attempt).

At the end of Chapter 5, you saw an example of a PL/SQL stored function to count all employees per department (Section 5.8, Listing 5-31). In that chapter, I mentioned that this counting problem is not trivial to solve in standard SQL. In Listings 8-22, 8-23, and 8-24, you see that you should indeed be careful. You need an outer join, and you should make sure to specify the correct argument for the COUNT function to get correct results.

Caution: You should be careful with the COUNT function, especially if null values might cause problems (since group functions ignore them) and you want to count row occurrences.

Valid SELECT and GROUP BY Clause Combinations

If your queries contain a GROUP BY clause, some syntax combinations are invalid and result in Oracle error messages, such as the following:

ORA-00937: not a single-group group function.

This always means that there is a mismatch between your SELECT clause and your GROUP BY clause.

To demonstrate valid versus invalid syntax, Table 8-3 shows one invalid and three valid syntax examples. Table 8-3 assumes you have a table T with four columns A, B, C, and D.

Syntax	Valid?
select a, b, max(c) from t ... group by a	No
select a, b, max(c) from t ... group by a,b	Yes
select a, count(b), min(c) from t ... group by a	Yes
select count(c) from t ... group by a	Yes

Table 8-3: Valid and invalid GROUP BY syntax examples.

The examples in Table 8-3 illustrate the following two general rules:

You do not need to select the column expression you group on (see the last example).
Any column expression that is not part of the GROUP BY clause can occur only in the SELECT clause as an argument to a group function. That’s why the first example is invalid.

By the way, all GROUP BY examples so far showed only column names, but you can also group on column expressions, such as in the example shown in Listing 8-25.

SQL> select case mod(empno,2)
  2              when 0 then 'EVEN '
  3                     else 'ODD '
  4         end as empno
  5  ,      sum(msal)
  6  from   employees
  7  group  by mod(empno,2);

EMPNO SUM(MSAL)
----- ---------
EVEN      20225
ODD        8650

SQL>

This query shows the salary sums for employees with even and odd employee numbers.

Contributors : Lex de Haan
Last modified 2006-01-06 11:06 AM

Using clause

Posted by mikharakiri at 2005-07-01 12:50 PM

<quote>This USING clause with two columns is needed to get the correct join results.</quote>

Aside the part that I don't see any point in ANSI/ISO join syntax, I don't understand this sentence at all. Lex, did you imply that without this USING clause the result of join is cartesian product?

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

Retrieval: Multiple Tables and Aggregation - Part 3

From the book, Mastering Oracle SQL and SQL Plus*, Chapter 8, Apress, January 2005.

8.5 The GROUP BY Component

Multiple-Column Grouping

GROUP BY and Null Values

8.6 Group Functions

Group Functions and Duplicate Values

Group Functions and Null Values

Grouping the Results of a Join

The COUNT(*) Function

Valid SELECT and GROUP BY Clause Combinations

Using clause

DBAzine.com

Sections

Personal tools

Menu

Who Are You?

Retrieval: Multiple Tables and Aggregation - Part 3

From the book, Mastering Oracle SQL and SQL *Plus, Chapter 8, Apress, January 2005.

8.5 The GROUP BY Component

Multiple-Column Grouping

GROUP BY and Null Values

8.6 Group Functions

Group Functions and Duplicate Values

Group Functions and Null Values

Grouping the Results of a Join

The COUNT(*) Function

Valid SELECT and GROUP BY Clause Combinations

Using clause

From the book, Mastering Oracle SQL and SQL Plus*, Chapter 8, Apress, January 2005.