1

I have a hierarchical table with an id and parent_id column, the parent_id has a foreign key to the id column. Each row can only have one parent, but multiple rows can be tied to the same parent. I wanted to retrieve all the rows that didn't have any children.

I attempted this with the following query

SELECT *
FROM table
WHERE id NOT IN (
    SELECT DISTINCT(parent_id)
    FROM table
)

This returned 0 rows. If i change the NOT IN to IN it correctly returns the rows that have children (other rows tie to it through their parent_id)

I ended up getting this working:

SELECT *
FROM table
WHERE id NOT IN(
    SELECT id
    FROM table
    WHERE id IN (
        SELECT DISTINCT(parent_id)
        FROM table
    )
)

But I don't understand why the first query didn't work? Can anybody help me to understand what's going on here? Do I not understand how NOT IN should work?

a_horse_with_no_name
  • 440,273
  • 77
  • 685
  • 758
Will Harrison
  • 203
  • 4
  • 15

1 Answers1

1

Try

SELECT *
FROM table
WHERE id NOT IN (
    SELECT DISTINCT(parent_id)
    FROM table
    WHERE parent_id IS NOT NULL
)

For example:

with t(x) as (values(1),(2))
select 3
where 3 not in (select x from t);

┌──────────┐
│ ?column? │
├──────────┤
│        3 │
└──────────┘

but

with t(x) as (values(1),(2),(null))
select 3
where 3 not in (select x from t);

┌──────────┐
│ ?column? │
├──────────┤
└──────────┘

It is because the DBMS can't made decision is id = null or not (the result is undefined)

You could to fix it as mentioned above or using not exists:

with t(x) as (values(1),(2),(null))
select 3
where not exists (select x from t where x = 3);

┌──────────┐
│ ?column? │
├──────────┤
│        3 │
└──────────┘
Abelisto
  • 12,110
  • 2
  • 24
  • 35