3

I'm using Firebird WI-V3.0.4.33054 on Windows.

I'm having trouble optimizing this query, that uses an in clause with a select:

update CADPC p set p.STA = 'L'
where p.COD in (select distinct CODPC from CADPCI_Rec where IDNfr = 27)
and not exists (select * from CADPCI where CODPC = p.COD)

The plan for this query is (and the obvious problem is the P NATURAL part):

PLAN SORT (CADPCI_REC INDEX (PK_CADPCI_REC))
PLAN (CADPCI INDEX (FK_CADPCI_CODPC))
PLAN (P NATURAL)

Select Expression
    -> Filter
        -> Unique Sort (record length: 36, key length: 8)
            -> Filter
                -> Table "CADPCI_REC" Access By ID
                    -> Bitmap
                        -> Index "PK_CADPCI_REC" Range Scan (partial match: 1/3)
Select Expression
    -> Filter
        -> Table "CADPCI" Access By ID
            -> Bitmap
                -> Index "FK_CADPCI_CODPC" Range Scan (full match)
Select Expression
    -> Filter
        -> Table "CADPC" as "P" Full Scan

If, on the other hand, I manually run the select distinct, copy the results and paste in the query, like this:

update CADPC p set p.STA = 'L'
where p.COD in (5699, 5877, 5985)
and not exists (select * from CADPCI where CODPC = p.COD)

Now the optimizer chooses a sensible plan for the P table and the query runs very quickly:

PLAN (CADPCI INDEX (FK_CADPCI_CODPC))
PLAN (P INDEX (PK_CADPC, PK_CADPC, PK_CADPC))

Select Expression
    -> Filter
        -> Table "CADPCI" Access By ID
            -> Bitmap
                -> Index "FK_CADPCI_CODPC" Range Scan (full match)
Select Expression
    -> Filter
        -> Table "CADPC" as "P" Access By ID
            -> Bitmap Or
                -> Bitmap Or
                    -> Bitmap
                        -> Index "PK_CADPC" Unique Scan
                    -> Bitmap
                        -> Index "PK_CADPC" Unique Scan
                -> Bitmap
                    -> Index "PK_CADPC" Unique Scan

I have also tried exists in both conditions, but the result is the same: the subquery is re-evaluated for each row.

update CADPC p set p.STA = 'L'
where exists (select * from CADPCI_Rec where IDNfr = 27 and CODPC = p.COD)
and not exists (select * from CADPCI where CODPC = p.COD)

Plan:

PLAN (CADPCI_REC INDEX (PK_CADPCI_REC))
PLAN (CADPCI INDEX (FK_CADPCI_CODPC))
PLAN (P NATURAL)

Select Expression
    -> Filter
        -> Table "CADPCI_REC" Access By ID
            -> Bitmap
                -> Index "PK_CADPCI_REC" Range Scan (partial match: 1/3)
Select Expression
    -> Filter
        -> Table "CADPCI" Access By ID
            -> Bitmap
                -> Index "FK_CADPCI_CODPC" Range Scan (full match)
Select Expression
    -> Filter
        -> Table "CADPC" as "P" Full Scan

So, the question is: can I somehow make the engine choose the indexed plan when the in clause contains a select (typically only a couple of records)?

GabrielF
  • 1,913
  • 1
  • 13
  • 29
  • 2
    The problem is the optimizer of Firebird doesn't distinguish between correlated and non-correlated subqueries, so the updated table drives the query, and the subquery is executed for each row. It might be faster to use `merge`, I'll see if I can write an answer when I have time or maybe find an alternative option (a merge can be bit verbose). – Mark Rotteveel Aug 07 '19 at 19:07
  • 1
    @MarkRotteveel you mean the optimizer converts "in" into "exists" there ? Or I can not imagine how else can in-subquery be re-executed for "every row" of the table to be updated. May advanced PLAN options of FB3 somehow tell, what the optimized query "looks like" before the very execution... I know, it is BLR then not SQL any more, but still. – Arioch 'The Aug 08 '19 at 15:01
  • @Arioch'The No, as far as I understand it, it will evaluate the condition for each row of the table, and as part of that evaluation, it will execute the subquery for each row. The behaviour is the equivalent of an exists, but afaik it will not 'convert' it to an exists. – Mark Rotteveel Aug 09 '19 at 07:54
  • 2
    @MarkRotteveel if true, if the result of the "in" query is not cached, but on every row the whole set is re-evaluated as a whole set (instead of checking for that very one row - effectively converting to "exists") - then it is the most inefficient way to do it, picking the worst sides of both approaches. – Arioch 'The Aug 09 '19 at 08:38

2 Answers2

3

You may try EXECUTE BLOCK and "invert control"

essentially issuing an anonymous ad hoc stored procedure

EXECUTE BLOCK AS
  declare id INTEGER;
BEGIN
  for select distinct t1.CODPC from CADPCI_Rec t1 
    left join CADPCI t2 on where t2.CODPC = t1.CODPC
  where t2.CODPC is NULL and t1.IDNfr = 27
  into :id
  do 
    update CADPC p set p.STA = 'L' where p.COD = :ID and p.STA <> 'L';
END

You might also use Global Temporary Tables (GTTs)

and then create list of IDs before actually deleting.

Database preparation (creating a body-less table):

CREATE GLOBAL TEMPORARY TABLE CADPC_mark_IDs
   ( ID integer )
ON COMMIT DELETE ROWS

And then the commands would be like

insert into CADPC_mark_IDs(ID)
select distinct t1.CODPC from CADPCI_Rec t1 
   left join CADPCI t2 on where t2.CODPC = t1.CODPC
where t2.CODPC is NULL and t1.IDNfr = 27

then

update CADPC p set p.STA = 'L'
where p.COD in (select * from CADPC_mark_IDs) and p.STA <> 'L'

then

commit; -- clear the in-memory table for next uses

one more option, like Mark suggested, would be using MERGE, after you converted "where not exist" into "left join" (already done above, hopefully correct).

Something along that

merge into CADPC p
  using (
    select distinct t1.CODPC as id from CADPCI_Rec t1 
      left join CADPCI t2 on where t2.CODPC = t1.CODPC
    where t2.CODPC is NULL and t1.IDNfr = 27
  ) t
on (t.id = p.COD) and (p.STA <> 'L')
when matched then update set p.STA = 'L'
Arioch 'The
  • 15,005
  • 31
  • 59
  • Wow, very complete answer! I thought about the first alternative, and I'm quite sure it's gonna work. It's definitely what I'm going for if nothing else works. Second one I think isn't gonna work, since the problem seems to be the `in (subquery)`, which it keeps (I'll try it anyway, so that we know for sure). 3rd alternative, I had no idea it even existed. I'll try tomorrow and let you know. – GabrielF Aug 08 '19 at 23:43
  • @GabrielF also notice that FB3 seems to have introduced some advanced options of displaying the query plans, more "graphic" so to say. – Arioch 'The Aug 09 '19 at 08:40
  • 1
    As I imagined, the second solution still uses `plan (P natural)` (although it's somewhat faster, I don't know why). – GabrielF Aug 13 '19 at 17:55
  • The merge solution is the one. Very quick and elegant. – GabrielF Aug 13 '19 at 17:55
  • @GabrielF it might be faster if Mark's insight that the in-subquery would be re-executed time and again for EVERY row in the master table holds water. Clear separation of those commands with keeping IDs cached in GTT would prevent those re-executions. Also, it would be better to explicitly prevent updates of rows which are already L-marked, `set p.STA = 'L' where p.STA <> 'L'` – Arioch 'The Aug 14 '19 at 08:20
2

Try using exists for both:

update CADPC p
    set p.STA = 'L'
    where exists (select 1 from CADPCI_Rec where r.IDNfr = 27 and p.COD = r.CODPC) and
         not exists (select 1 from CADPCI c2 where c2.CODPC = p.COD);

In particular, you want an index on CADPCI_Rec(CODPC, IDNfr).

Gordon Linoff
  • 1,122,135
  • 50
  • 484
  • 624
  • I tried that. `exists` on both subqueries still uses the `P NATURAL` plan and has no distinguishable performance gain. – GabrielF Aug 08 '19 at 23:39
  • @GabrielF try the opposite approach then, try to remove `not exists` from your query either and only leave one single IN-condition, encapsulating both CADPCI_Rec and CADPCI inside it – Arioch 'The Aug 09 '19 at 08:42
  • BTW, what was the plan of "exists on both subqueries", full of it ? – Arioch 'The Aug 09 '19 at 15:55
  • 1
    @Arioch'The I have edited the question to include the explained plan on all queries, including this one. – GabrielF Aug 13 '19 at 17:39