Correlated Subqueries
A subquery that is allowed to refer to any of the previously mentioned tables is known as a correlated subquery. We also say that the subquery has a correlated reference to a table in the main query.
The following example uses an uncorrelated subquery to list the employee number and name of employees in department 'A00' with a salary greater than the average salary of the department:
SELECT EMPNO, LASTNAME
FROM EMPLOYEE
WHERE WORKDEPT = 'A00'
AND SALARY > (SELECT AVG(SALARY)
FROM EMPLOYEE
WHERE WORKDEPT = 'A00')
This statement produces the following result:
EMPNO LASTNAME
------ ---------------
000010 HAAS
000110 LUCCHESSI
If you want to know the average salary for every department, the subquery needs to be evaluated once for every department. You can do this through the correlation capability of SQL, which permits you to write a subquery that is executed repeatedly, once for each row of the table identified in the outer-level query.
The following example uses a correlated subquery to list all the employees whose salary is higher than the average salary of their department:
SELECT E1.EMPNO, E1.LASTNAME, E1.WORKDEPT
FROM EMPLOYEE E1
WHERE SALARY > (SELECT AVG(SALARY)
FROM EMPLOYEE E2
WHERE E2.WORKDEPT = E1.WORKDEPT)
ORDER BY E1.WORKDEPT
In this query, the subquery is evaluated once for every department. The result is:
EMPNO LASTNAME WORKDEPT
------ --------------- --------
000010 HAAS A00
000110 LUCCHESSI A00
000030 KWAN C01
000060 STERN D11
000150 ADAMSON D11
000170 YOSHIMURA D11
000200 BROWN D11
000220 LUTZ D11
000070 PULASKI D21
000240 MARINO D21
000270 PEREZ D21
000090 HENDERSON E11
000280 SCHNEIDER E11
000100 SPENSER E21
000330 LEE E21
000340 GOUNOT E21
To write a query with a correlated subquery, use the same basic format of an ordinary outer query with a subquery. However, in the FROM clause of the outer query, just after the table name, place a correlation name. The subquery may then contain column references qualified by the correlation name. For example, if E1 is a correlation name, then E1.WORKDEPT means the WORKDEPT value of the current row of the table in the outer query. The subquery is (conceptually) reevaluated for each row of the table in the outer query.
By using a correlated subquery, you let the system do the work for you and reduce the amount of code you need to write within your application.
Unqualified correlated references are allowed in DB2. For example, the table EMPLOYEE has a column named LASTNAME, but the table SALES has a column named SALES_PERSON, and no column named LASTNAME.
SELECT LASTNAME, FIRSTNME, COMM
FROM EMPLOYEE
WHERE 3 > (SELECT AVG(SALES)
FROM SALES
WHERE LASTNAME = SALES_PERSON)
In this example, the system checks the innermost FROM clause for a LASTNAME column. Not finding one, it then checks the next innermost FROM clause (which in this case is the outer FROM clause). While not always necessary, qualifying correlated references is recommended to improve the readability of the query and to ensure that you are getting the result that you intend.
Implementing a Correlated Subquery
When would you want to use a correlated subquery? The use of a column function is sometimes a clue.
Let's say you want to list the employees whose level of education is higher than the average for their department.
First, you must determine the select-list items. The problem says "List the employees". This implies that LASTNAME from the EMPLOYEE table should be sufficient to uniquely identify employees. The problem also states the level of education (EDLEVEL) and the employees' departments (WORKDEPT) as conditions. While the problem does not explicitly ask for columns to be displayed, including them in the select-list will help illustrate the solution. A part of the query can now be constructed:
SELECT LASTNAME, WORKDEPT, EDLEVEL
FROM EMPLOYEE
Next, a search condition (WHERE clause) is needed. The problem statement says, "...whose level of education is higher than the average for that employee's department". This means that for every employee in the table, the average education level for that employee's department must be computed. This statement fits the description of a correlated subquery. Some unknown property (the average level of education of the current employee's department) is being computed for each row. A correlation name is needed for the EMPLOYEE table:
SELECT LASTNAME, WORKDEPT, EDLEVEL
FROM EMPLOYEE E1
The subquery needed is simple. It computes the average level of education for each department. The complete SQL statement is:
SELECT LASTNAME, WORKDEPT, EDLEVEL
FROM EMPLOYEE E1
WHERE EDLEVEL > (SELECT AVG(EDLEVEL)
FROM EMPLOYEE E2
WHERE E2.WORKDEPT = E1.WORKDEPT)
The result is:
LASTNAME WORKDEPT EDLEVEL
--------------- -------- -------
HAAS A00 18
KWAN C01 20
PULASKI D21 16
HENDERSON E11 16
LUCCHESSI A00 19
PIANKA D11 17
SCOUTTEN D11 17
JONES D11 17
LUTZ D11 18
MARINO D21 17
JOHNSON D21 16
SCHNEIDER E11 17
MEHTA E21 16
GOUNOT E21 16
Suppose that instead of listing the employee's department number, you list the department name. The information you need (DEPTNAME) is in a separate table (DEPARTMENT). The outer-level query that defines a correlation variable can also be a join query (see Selecting Data from More Than One Table for details).
When you use joins in an outer-level query, list the tables to be joined in the FROM clause, and place the correlation name next to the appropriate table name.
To modify the query to list the department's name instead of its number, replace WORKDEPT by DEPTNAME in the select-list. The FROM clause must now also include the DEPARTMENT table, and the WHERE clause must express the appropriate join condition.
This is the modified query:
SELECT LASTNAME, DEPTNAME, EDLEVEL
FROM EMPLOYEE E1, DEPARTMENT
WHERE E1.WORKDEPT = DEPARTMENT.DEPTNO
AND EDLEVEL > (SELECT AVG(EDLEVEL)
FROM EMPLOYEE E2
WHERE E2.WORKDEPT = E1.WORKDEPT)
This statement produces the following result:
LASTNAME DEPTNAME EDLEVEL
--------------- ----------------------------- -------
HAAS SPIFFY COMPUTER SERVICE DIV. 18
LUCCHESSI SPIFFY COMPUTER SERVICE DIV. 19
KWAN INFORMATION CENTER 20
PIANKA MANUFACTURING SYSTEMS 17
SCOUTTEN MANUFACTURING SYSTEMS 17
JONES MANUFACTURING SYSTEMS 17
LUTZ MANUFACTURING SYSTEMS 18
PULASKI ADMINISTRATION SYSTEMS 16
MARINO ADMINISTRATION SYSTEMS 17
JOHNSON ADMINISTRATION SYSTEMS 16
HENDERSON OPERATIONS 16
SCHNEIDER OPERATIONS 17
MEHTA SOFTWARE SUPPORT 16
GOUNOT SOFTWARE SUPPORT 16
The above examples show that the correlation name used in a subquery must be defined in the FROM clause of some query that contains the correlated subquery. However, this containment may involve several levels of nesting.
Suppose that some departments have only a few employees and therefore their average education level may be misleading. You might decide that in order for the average level of education to be a meaningful number to compare an employee against, there must be at least five employees in a department. So now we have to list the employees whose level of education is higher than the average for that employee's department, and only consider departments with at least five employees.
The problem implies another subquery because, for each employee in the outer-level query, the total number of employees in that person's department must be counted:
SELECT COUNT(*)
FROM EMPLOYEE E3
WHERE E3.WORKDEPT = E1.WORKDEPT
Only if the count is greater than or equal to 5 is an average to be computed:
SELECT AVG(EDLEVEL)
FROM EMPLOYEE E2
WHERE E2.WORKDEPT = E1.WORKDEPT
AND 5 <= (SELECT COUNT(*)
FROM EMPLOYEE E3
WHERE E3.WORKDEPT = E1.WORKDEPT)
Finally, only those employees whose level of education is greater than the average for that department are included:
SELECT LASTNAME, DEPTNAME, EDLEVEL
FROM EMPLOYEE E1, DEPARTMENT
WHERE E1.WORKDEPT = DEPARTMENT.DEPTNO
AND EDLEVEL >
(SELECT AVG(EDLEVEL)
FROM EMPLOYEE E2
WHERE E2.WORKDEPT = E1.WORKDEPT
AND 5 <=
(SELECT COUNT(*)
FROM EMPLOYEE E3
WHERE E3.WORKDEPT = E1.WORKDEPT))
This statement produces the following result:
LASTNAME DEPTNAME EDLEVEL
--------------- ----------------------------- -------
PIANKA MANUFACTURING SYSTEMS 17
SCOUTTEN MANUFACTURING SYSTEMS 17
JONES MANUFACTURING SYSTEMS 17
LUTZ MANUFACTURING SYSTEMS 18
PULASKI ADMINISTRATION SYSTEMS 16
MARINO ADMINISTRATION SYSTEMS 17
JOHNSON ADMINISTRATION SYSTEMS 16
HENDERSON OPERATIONS 16
SCHNEIDER OPERATIONS 17
注:
1.什么时候用相关子查询呢?
当查询条件中包含有对column的函数计算时,考虑使用相关子查询;
2.在Hibernate里如何实现相关子查询的功能呢?