1

My schema is as follows:

EMPLOYEE (fmane, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno), KEY: ssn
DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate), KEY: dnumber.
PROJECT  (pname, pnumber, plocation, dnum), KEY: pnumber.
WORKS_ON (essn, pno, hours), KEY: (essn, pno)
DEPENDENT  (essn, dependent-name, sex, bdate, relationship), KEY: (essn, dependent-name)

I want to find the last name and SSN of those managers who DO NOT work on any one of the projects located in Chicago.

Note: I modifited the original question to practice on SQL, so I changed managers to employees

here is what I have so far:

  select e.lname
  from Employee e
  where e.ssn not in 
       (select w.essn
        from works_on w, Project p
        where w.pno = p.pnumber
        and p.plocation = 'Cleveland')

So basically are my modification correct? also

  1. Do I need write e.lname or just lname since people told me if there's only one item I can just simply use lname?

  2. The most confusing part when I use NOT IN clause, how do I know when should I use e.ssn or ssn?

Thank you

Bohemian
  • 365,064
  • 84
  • 522
  • 658
Lin Wei
  • 87
  • 5

1 Answers1

0

Using not in (...) is arguably the easiest to read/understand, but typically performs the worst.

The best performing is typically an outer join filtering on missed joins:

select ssn, lname
from Employee e
left join (
    select essn
    from Project p
    join works_on w on pno = pnumber
    where plocation = 'Cleveland') x on essn = ssn
where essn is null

This gives the same result as:

select ssn, lname
from Employee e
where ssn not in (
    select essn
    from Project p
    join works_on w on pno = pnumber
    where plocation = 'Cleveland')

Note how in both approaches, the subquery selects from Project first and uses a where clause to select Cleveland locations, which gives the optimiser the best chance to use an index (if one exists) on location, and/or do the least work to get the employees that work in them (via an index on pno if one exists).

Also note the use on proper joins (whose syntax became partied the SQL standard more than 20 years ago).


Regarding qualifying column names with the table name or alias they come from - it's optional if the column is unambiguous, but generally it's good practice to qualify in production code because:

  • you don't have to consult the scheme to figure out where the column lives (self documenting)
  • if the schema changes so that the column is no longer unambiguous, your query can suddenly break and you might not realise why
Bohemian
  • 365,064
  • 84
  • 522
  • 658
  • can you write an answer for me using not in? I promise I will accept it – Lin Wei Nov 28 '15 at 06:27
  • Thank you so e.lname or lname can be either fine? – Lin Wei Nov 28 '15 at 06:42
  • @lin (note further edits) yes just `lname` is OK, because there is only one such column in the tables selected from – Bohemian Nov 28 '15 at 06:44
  • Tjank you! also lastly why must work_on be used, since I only need find location, which is not in work_on – Lin Wei Nov 28 '15 at 06:48
  • the part says from works_on w, Project p where w.pno = p.pnumber – Lin Wei Nov 28 '15 at 06:48
  • why can't i just be from project instead – Lin Wei Nov 28 '15 at 06:48
  • Project can't tell use the ssn of those that work on projects. You need project to find the projects that are in Cleveland, then using those keys you need works_on to find the employee sans working on Cleveland projects, then finally you can use that list to find employees other than those in that list. – Bohemian Nov 28 '15 at 06:52