Monday, February 20, 2012

list all employees that didnt take courses in 2005 (was "SQL statment")

I need help on the SQL statment.

This SQL statment is extreamly slow and i don't know how to make it run faster. I have a tblTrainingStudents table where it keeps all the employees trainings and tblSHPEmployee is the employee roster and tblBasicSchool is the table where we track all the cadets who graduate from the basic school (from cadets to troopers) in each year. What i want the result to be is list all employees that didn't take courses in say year 2005. New troopers don't have to take any extra courses for year 2005 since they took these course when they were in basic school. I really need someone can help me speed up my SQL statment ASAP.

Thanks.

************************************************** **

SELECT *
FROM
(SELECT *
FROM tblTrainingCourses C, tblSHPEmployee
WHERE year = '#currentYear#'
AND (reg_no LIKE '1%'
OR reg_no LIKE '2%'
OR reg_no LIKE '3%')
AND troop = '#troop#'
AND NOT EXISTS
(SELECT *
FROM tblTrainingStudents S
WHERE C.trainingType = S.trainingType
AND C.courseID = S.courseID
AND year = '#currentYear#'
AND reg_no = regNo)) as employee

WHERE employee.reg_no NOT IN
(SELECT regNo
FROM tblBasicschool
WHERE employee.reg_no = regNo
AND left(enddate,4) = '#currentYear#'
AND (rank='TRP' OR left(regNo, 1) = '3'))
************************************************** *********

ORDER BY troop, lname, fname, minit, trainingType, courseIDI don(t see any need for the "SELECT FROM SELECT", so what about:SELECT *
FROM tblTrainingCourses as C, tblSHPEmployee
WHERE year = '#currentYear#'
AND (reg_no LIKE '1%' OR reg_no LIKE '2%' OR reg_no LIKE '3%')
AND troop = '#troop#'
AND NOT EXISTS
(SELECT 1
FROM tblTrainingStudents
WHERE C.trainingType = trainingType
AND C.courseID = courseID
AND year = '#currentYear#'
AND reg_no = regNo)
AND reg_no NOT IN
(SELECT regNo
FROM tblBasicschool
WHERE left(enddate,4) = '#currentYear#'
AND (rank='TRP' OR regNo LIKE '3%')
)
ORDER BY ...Clearly the "NOT EXISTS" and "NOT IN" will potentially slow down the query (since no index can be used for resolving these conditions).
Similarly the use of left(enddate,4) may slow down things; you should (if possible) replace it with someting like "enddate LIKE '2006%'".|||Peter, thanks for your post. It is still so slow. It takes about 3 mins to run the query. Is there any other way to speed this up a lot more? The tblTrainingStudents is kind of big. I really appreciate your help.|||It looks like your main query produces a cartesian product for I don't see a join-clause between 'tblTrainingCourses' and 'tblSHPEmployee'... or did I miss something?

Grts|||What about the following query?
I've removed tblTrainingCourses, and also removed the redundant condition reg_no = regNo in the second subquery.
You will maybe have to re-insert your conditions on year and troop -- I've no idea in which tables they belong.
Also, for the time being, I replaced the "in current year" condition by ">= '2006-01-01'"; adapt this if necessary.
The conditions on reg_no (with "BETWEEN" and ">= '3'") are more performant than a LIKE; but they assume reg_no to be all digits, and moreover assume an ASCII ordering.
Again, adapt if necessary.SELECT *
FROM tblSHPEmployee AS e
WHERE reg_no BETWEEN '1' AND '39999999'
AND NOT EXISTS
( SELECT regNo
FROM tblTrainingStudents AS s INNER JOIN tblTrainingCourses AS c
ON s.CourseID = c.CourseID
WHERE regNo = e.reg_No
AND year = '2006'
)
AND reg_no NOT IN
( SELECT regNo
FROM tblBasicschool
WHERE endDate >= '2006-01-01'
AND (rank = 'TRP' OR regNo >= '3')
)You may replace the "NOT EXISTS" by a "NOT IN" and vice versa; that could give performance differences (depending on the size of the tables), so try out all four possibilities. Don't forget to remove the condition "regNo = e.reg_No" in the "NOT IN" and to add it in the "NOT EXISTS".|||'LIKE' keywords also slow down queries since they force a table scan. Also you're selecting * when you potentially could not be.|||'LIKE' keywords also slow down queries since they force a table scan.Not necessarily: most RDBMS' will use a matching index scan for "LIKE '3%'". But you're right in the case of a "%" at the beginning of the string.
Also you're selecting * when you potentially could not be.This is less of a performance problem, it's at most a data transmission bottleneck (over a slow communication line).
Except in the case where you would only SELECT a column which happens to be an indexed column (or column combination), since in that case the query *could* be performed with an index-only scan (but that also depends on the WHERE condition of course).|||Not necessarily: most RDBMS' will use a matching index scan for "LIKE '3%'". But you're right in the case of a "%" at the beginning of the string.
This is less of a performance problem, it's at most a data transmission bottleneck (over a slow communication line).
Except in the case where you would only SELECT a column which happens to be an indexed column (or column combination), since in that case the query *could* be performed with an index-only scan (but that also depends on the WHERE condition of course).

I defer to your superior knowledge Peter!

Our DBAs here refuse to allow us to use LIKE in code because most of the time it does a table scan. 'SELECT *' is also banned since not only is it wasteful, it has a tendency to stop working if the table schema changes. :D|||'SELECT *' is also banned since not only is it wasteful, it has a tendency to stop working if the table schema changes.I fully agree with that: it's *always* safer to list all required columns explicitly, even it turns out to be all table columns.
So indeed "SELECT *" should be banned!|||I fully agree with that: it's *always* safer to list all required columns explicitly, even it turns out to be all table columns.
So indeed "SELECT *" should be banned!

ITA Peter...we've still got some 'select *' stuff hanging around but it is slowly being re-written to explicitly list all the columns.|||Or if you want to list all courses together with all employees who did not take that course:SELECT c.CourseName, e.EmployeeName
FROM
tblSHPEmployee AS e
INNER JOIN
tblTrainingCourses AS c
ON NOT EXISTS
( SELECT 1
FROM tblTrainingStudents AS s
WHERE s.CourseID = c.CourseID
AND e.reg_No = s.regNo
)
WHERE
reg_no BETWEEN '1' AND '39999999'
ORDER BY 1, 2Mind you that this may again run a lot slower!|||[Non-ANSI Comment]Assuming this is SQL Server then SELECT * within an exists statement is the one time SELECT * is recommended - the optimiser then selects optimal index.[/Non-ANSI Comment]|||SELECT * within an exists statement is recommended - the optimiser then selects optimal index.Are you sure that SQLServer would not do that with "SELECT 1" ?
I know DB2 will typically do an index scan without data lookup when using "SELECT 1".|||You know - I have lost my original MS link but coincidently today I've read something elsewhere (by the bloomin head of SQL Server Optimisation no less) that contradicts what I posted. So disregard :rolleyes:|||Anyway, in an EXISTS, "SELECT 1" is better than "SELECT *" since the former does not require data access (when an index is available on the column(s) specified in the WHERE condition) while the latter may need data access (unless the optimizer is clever enough to see that the two are equivalent :-)

No comments:

Post a Comment