简单SQL技巧一例
今天的文跟昨天一样,也是在《SQL.Puzzles》上的一个章节。虽然都是一些简单的SQL处理,但是大致翻了一下,这本书的有些例子还是相当有趣的,如果以后没事就慢慢读一读,不过最近比较忙了,就随手记一下吧,以后就等有看着顺眼的SQL再转了。
ABSENTEES
This problem was presented on the MS ACCESS forum on CompuServeby Jim Chupella. He wanted to create a database that tracks employeeabsentee rates. Here is the table you will use:
CREATE TABLE Absenteeism
(emp_id INTEGER NOT NULL REFERENCES Personnel (emp_id),
absent_date DATE NOT NULL,
reason_code CHAR (40) NOT NULL REFERENCES ExcuseList(reason_code),
severity_points INTEGER NOT NULL CHECK (severity_pointsBETWEEN 1 AND 4),
PRIMARY KEY (emp_id, absent_date));
An employee ID number identifies each employee. The reason_codeis a short text explanation for the absence (for example, “hit by beertruck,” “bad hair day,” and so on) that you pull from an ever-growingand imaginative list, and severity point is a point system that scores thepenalty associated with the absence.
If an employee accrues 40 severity points within a one-year period,you automatically discharge that employee. If an employee is absentmore than one day in a row, it is charged as a long-term illness, not as atypical absence. The employee does not receive severity points on thesecond, third, or later days, nor do those days count toward his or hertotal absenteeism.
Your job is to write SQL to enforce these two business rules, changingthe schema if necessary.
Answer #1
Looking at the first rule on discharging personnel, the most commondesign error is to try to drop the second, third, and later days from the
table. This approach messes up queries that count sick days, and makeschains of sick days very difficult to find.
The trick is to allow a severity score of zero, so you can track the longtermillness of an employee in the Absenteeism table. Simply change the
severity point declaration to “CHECK (severity_points BETWEEN 0AND 4)” so that you can give a zero to those absences that do not count.
This is a trick newbies miss because storing a zero seems to be a waste ofspace, but zero is a number and the event is a fact that needs to be noted.
UPDATE Absenteeism
SET severity_points= 0,
reason_code = 'long term illness'
WHERE EXISTS
(SELECT *
FROM Absenteeism AS A2
WHERE Absenteeism.emp_id = A2.emp_id
AND Absenteeism.absent_date = (A2.absent_date -INTERVAL 1 DAY));
When a new row is inserted, this update will look for another absenceon the day before and change its severity point score and reason_code in
accordance with your first rule.
The second rule for firing an employee requires that you know whathis or her current point score is. You would write that query as follows:
SELECT emp_id, SUM(severity_points)
FROM Absenteeism
GROUP BY emp_id;
This is the basis for a grouped subquery in the DELETE statementyou finally want. Personnel with less than 40 points will return a NULL,and the test will fail.
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
The GROUP BY clause is not really needed in SQL-92, but some olderSQL implementations will require it.
Answer #2
Bert Scalzo, a senior instructor for Oracle Corporation, pointed out thatthe puzzle solution had two flaws and room for performance
improvements.The flaws are quite simple. First, the subquery does not check forpersonnel accruing 40 or more severity points within a one-year period,
as required. It requires the addition of a date range check in the WHERE
clause:
DELETE FROM Personnel
WHERE emp_id = (SELECT A1.emp_id
FROM Absenteeism AS A1
WHERE A1.emp_id = Personnel.emp_id
AND absent_date
BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS
AND CURRENT_TIMESTAMP
GROUP BY A1.emp_id
HAVING SUM(severity_points) >= 40);
Second, this SQL code deletes only offending personnel and not theirabsences. The related Absenteeism row must be either explicitly orimplicitly deleted as well. You could replicate the above deletion for theAbsenteeism table. However, the best solution is to add a cascadingdeletion clause to the Absenteeism table declaration:
CREATE TABLE Absenteeism
( ... emp_id INTEGER NOT NULL
REFERENCES Personnel(emp_id)
ON DELETE CASCADE,
...);
The performance suggestions are based on some assumptions. If youcan safely assume that the UPDATE is run regularly and people do notchange their departments while they are absent, then you can improvethe UPDATE command’s subquery:
UPDATE Absenteeism AS A1
SET severity_points = 0,
reason_code = 'long term illness'
WHERE EXISTS
(SELECT *
FROM absenteeism as A2
WHERE A1.emp_id = A2.emp_id
AND (A1.absent_date + INTERVAL 1 DAY) =A2.absent_date);
There is still a problem with long-term illnesses that span weeks. Thecurrent situation is that if you want to spend your weekends being sick,that is fine with the company. This is not a very nice place to work. If anemployee reports in absent on Friday of week number 1, all of weeknumber 2, and just Monday of week number 3, the UPDATE will catchonly the five days from week number 2 as long-term illness. The Fridayand Monday will show up as sick days with severity points. The subqueryin the UPDATE requires additional changes to the missed-date chaining.
I would avoid problems with weekends by having a code forscheduled days off (weekends, holidays, vacation, and so forth) thatcarry a severity point of zero. A business that has people workingweekend shifts would need such codes.
The boss could manually change the Saturday and Sunday “weekend”codes to “long-term illness” to get the UPDATE to work the way youdescribed. This same trick would also prevent you from losing scheduledvacation time if you got the plague just before going on a cruise. If theboss is a real sweetheart, he or she could also add compensation days forthe lost weekends with a zero severity point to the table, or reschedule anemployee’s vacation by adding absences dated in the future.
While I agreed that I left out the aging on the dates missed, I willargue that it would be better to have another DELETE statement thatremoves the year-old rows from the Absenteeism table, to keep the sizeof the table as small as possible.
The expression
(BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS AND
CURRENT_TIMESTAMP)
could also be
(BETWEEN CURRENT_TIMESTAMP - INTERVAL 1 YEAR AND
CURRENT_TIMESTAMP),
so the system would handle leap years. Better yet, DB2 and some otherSQL products have an AGE(date1) function, which returns the age inyears of something that happened on the date parameter. You wouldthen write (AGE(absent_date) >= 1) instead.
Answer #3
Another useful tool for this kind of problem is a Calendar table, whichhas the working days that can count against the employee. In the 10years since this book was first written, this has become a customary SQLprogramming practice.
SELECT A.emp_id,
SUM(A.severity_points) AS absentism_score
FROM Absenteeism AS A, Calendar AS C
WHERE C1.cal_date = A.absent_date
AND A.absent_date
BETWEEN CURRENT_TIMESTAMP - INTERVAL 365 DAYS
AND CURRENT_TIMESTAMP
AND C1.date_type = ‘work’
GROUP BY emp_id
HAVING SUM(A.severity_points)>= 40;
Some people will also have a column in the Calendar table thatJulianizes the working days. Holidays and weekends would carry thesame Julian number as the preceding workday. For example
(cal_date,Julian_workday) :
('2006-04-21', 42) – Friday
('2006-04-22', 42) – Saturday
('2006-04-23', 42) – Sunday
('2006-04-24', 43) – Monday
You do the math from the current date’s Julian workday number tofind the start of their adjusted one-year period.