PostgreSQL 11 goes for parallel seq scan on partitioned table where index should be enough

Question

The problem is that I keep getting a seq scan on a rather simple query in a very trivial setup. What am I doing wrong?

Postgres 11 on Windows Server 2016
Config changes done: constraint_exclusion = partition
A single table partitioned into 200 subtables, tens of millions of records per partition.
An index on the field in question (assuming it is partitioned as well)

Here's the create statement:

CREATE TABLE A (
    K int NOT NULL,
    X bigint NOT NULL,
    Date timestamp NOT NULL,
    fy smallint NOT NULL,
    fz decimal(18, 8) NOT NULL,
    fw decimal(18, 8) NOT NULL,
    fv decimal(18, 8) NULL,
    PRIMARY KEY (K, X)
) PARTITION BY LIST (K);

CREATE TABLE A_1 PARTITION OF A FOR VALUES IN (1);
CREATE TABLE A_2 PARTITION OF A FOR VALUES IN (2);
...
CREATE TABLE A_200 PARTITION OF A FOR VALUES IN (200);
CREATE TABLE A_Default PARTITION OF A DEFAULT;

CREATE INDEX IX_A_Date ON A (Date);

The query in question:

SELECT K, MIN(Date), MAX(Date)
FROM A
GROUP BY K;

That always results in a sequential scan that takes several minutes, even though there is clearly no need for the table data at all: the Date field is indexed, and I'm only asking for the first and last leaf of its B-tree.

Originally the index was on (K, Date), and it quickly became clear to me that Postgres would not use it in any query where I expected it to be used. An index on (Date) did the trick for other queries, and it seems that Postgres claims to create indexes on the partitions automatically. However, this specific simple query always goes for a seq scan.

Any thoughts appreciated!

UPDATE

Query plan (analyze, buffers) is as follows:

Finalize GroupAggregate  (cost=4058360.99..4058412.66 rows=200 width=20) (actual time=148448.183..148448.189 rows=5 loops=1)
  Group Key: a_16.k
  Buffers: shared hit=5970 read=548034 dirtied=4851 written=1446
  ->  Gather Merge  (cost=4058360.99..4058407.66 rows=400 width=20) (actual time=148448.166..148463.953 rows=8 loops=1)
    Workers Planned: 2
    Workers Launched: 2
    Buffers: shared hit=5998 read=1919356 dirtied=4865 written=1454
    ->  Sort  (cost=4057360.97..4057361.47 rows=200 width=20) (actual time=148302.271..148302.285 rows=3 loops=3)
        Sort Key: a_16.k
        Sort Method: quicksort  Memory: 25kB
        Worker 0:  Sort Method: quicksort  Memory: 25kB
        Worker 1:  Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=5998 read=1919356 dirtied=4865 written=1454
        ->  Partial HashAggregate  (cost=4057351.32..4057353.32 rows=200 width=20) (actual time=148302.199..148302.203 rows=3 loops=3)
            Group Key: a_16.k
            Buffers: shared hit=5984 read=1919356 dirtied=4865 written=1454
            ->  Parallel Append  (cost=0.00..3347409.96 rows=94658849 width=12) (actual time=1.678..116664.051 rows=75662243 loops=3)
                Buffers: shared hit=5984 read=1919356 dirtied=4865 written=1454
                ->  Parallel Seq Scan on a_16  (cost=0.00..1302601.32 rows=42870432 width=12) (actual time=0.320..41625.766 rows=34283419 loops=3)
                    Buffers: shared hit=14 read=873883 dirtied=14 written=8
                ->  Parallel Seq Scan on a_19  (cost=0.00..794121.94 rows=26070794 width=12) (actual time=0.603..54017.937 rows=31276617 loops=2)
                    Buffers: shared read=533414
                ->  Parallel Seq Scan on a_20  (cost=0.00..447025.50 rows=14900850 width=12) (actual time=0.347..52866.404 rows=35762000 loops=1)
                    Buffers: shared hit=5964 read=292053 dirtied=4850 written=1446
                ->  Parallel Seq Scan on a_18  (cost=0.00..198330.23 rows=6450422 width=12) (actual time=4.504..27197.706 rows=15481014 loops=1)
                    Buffers: shared read=133826
                ->  Parallel Seq Scan on a_17  (cost=0.00..129272.31 rows=4308631 width=12) (actual time=3.014..18423.307 rows=10340224 loops=1)
                    Buffers: shared hit=6 read=86180 dirtied=1
                ...
                ->  Parallel Seq Scan on a_197  (cost=0.00..14.18 rows=418 width=12) (actual time=0.000..0.000 rows=0 loops=1)
                ->  Parallel Seq Scan on a_198  (cost=0.00..14.18 rows=418 width=12) (actual time=0.001..0.002 rows=0 loops=1)
                ->  Parallel Seq Scan on a_199  (cost=0.00..14.18 rows=418 width=12) (actual time=0.001..0.001 rows=0 loops=1)
                ->  Parallel Seq Scan on a_default  (cost=0.00..14.18 rows=418 width=12) (actual time=0.001..0.002 rows=0 loops=1)
Planning Time: 16.893 ms
Execution Time: 148466.519 ms

UPDATE 2

Just to avoid future comments like “you should index on (K, Date)”:

The query plan with both indexes in place is exactly the same: the analysis numbers are the same, and even the buffer hits/reads are almost the same.

Your query requests all rows from all partitions, so an index is very likely not helpful. Additionally, your index contains only the `date` column, not the `K` column, so Postgres would need to look up the `K` value for each `date` value using random I/O, which is most probably slower than a seq scan. You could try an index on `k, date` instead. What is the value of `random_page_cost`? If you are certain that random I/O would be faster, lowering it might convince the planner to favor an index scan. - a_horse_with_no_name, Feb 24 '19 at 21:35
Getting back to the index on (K, Date) was the first thing I tried, and it did no good. - Vitaly, Feb 24 '19 at 21:46
`what am I doing wrong?` You are using Windows? You use Date as an identifier (for a timestamp...)? - wildplasser, Feb 24 '19 at 21:58
X (bigint) is the identifier, and I'm using Date as a date because I need a date there. And Windows ... is it relevant after all? - Vitaly, Feb 24 '19 at 22:01
The timing indeed seems pretty slow. 27 seconds to read 15 million rows from shared memory isn't right. But reading from disk also seems quite slow: 292053 blocks, or about 2GB, in 52 seconds. That could well be caused by Windows, as NTFS isn't the fastest file system out there. One reason for slow I/O performance could be a virus scanner. But I have no clue what could make accessing blocks from the cache that slow. How many CPUs does that server have? Maybe you could alleviate the problem a bit by increasing `max_parallel_workers_per_gather` and `max_parallel_workers`. - a_horse_with_no_name, Feb 25 '19 at 06:47
Well, my concern is not the overall performance of the system but rather the query plan Postgres chooses to execute the query. - Vitaly, Feb 25 '19 at 07:01

Laurenz Albe · Accepted Answer · 2019-02-25T12:03:53.763

Aggregate push-down into parallel plans can be enabled by setting enable_partitionwise_aggregate to on.

That will probably speed up your query somewhat, because PostgreSQL doesn't have to pass as much data between the parallel workers.
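
As a rough sketch, the setting can be tried at session level before re-running the query from the question. The parameter is off by default in PostgreSQL 11; the table and column names are the ones from the question, and whether the plan actually improves on this data is something to verify with EXPLAIN rather than assume.

```sql
-- Enable partitionwise aggregation for the current session only
SET enable_partitionwise_aggregate = on;

-- Re-check the plan; the planner may now aggregate each partition
-- separately instead of aggregating above the Append node
EXPLAIN (ANALYZE, BUFFERS)
SELECT k, min(date), max(date)
FROM a
GROUP BY k;
```

Note that this only changes where the aggregation happens; the partitions themselves can still be read with sequential scans.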

But it looks like PostgreSQL isn't smart enough to figure out it can use the index to speed up min and max for each partition, although it is smart enough to do that with a non-partitioned table.

There is no pretty way to work around that; you could resort to querying each partition:

SELECT k, min(min_date), max(max_date)
FROM (
   SELECT 1 AS k, MIN(date) AS min_date, MAX(date) AS max_date FROM a_1
   UNION ALL
   SELECT 2, MIN(date), MAX(date) FROM a_2
   UNION ALL
   ...
   SELECT 200, MIN(date), MAX(date) FROM a_200
   UNION ALL
   SELECT k, MIN(date), MAX(date) FROM a_default GROUP BY k
) AS all_a
GROUP BY k;

Yuck! There is clearly room for improvement here.
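
To avoid typing 200 branches by hand, the per-partition query could be generated with dynamic SQL. The following is a minimal sketch, not a tested solution: the function name `a_date_range` is hypothetical, it assumes the partitions are named a_1 to a_200 and hold k = 1 to 200 as in the CREATE statements from the question, it leaves out a_default (whose k values are not constant), and the filter on empty partitions is an addition.

```sql
-- Hypothetical helper: builds and runs the per-partition query.
-- Assumes partitions a_1 .. a_200 hold k = 1 .. 200, as in the question.
CREATE FUNCTION a_date_range()
RETURNS TABLE (k int, min_date timestamp, max_date timestamp)
LANGUAGE plpgsql STABLE AS
$$
DECLARE
    inner_sql text;
BEGIN
    -- One branch per partition, with k as a constant and no GROUP BY,
    -- so each branch is eligible for the index-based MIN/MAX path.
    SELECT string_agg(
               format('SELECT %s AS k, min(date) AS min_date, max(date) AS max_date FROM %I',
                      n, 'a_' || n),
               ' UNION ALL ' ORDER BY n)
    INTO inner_sql
    FROM generate_series(1, 200) AS n;

    -- An aggregate without GROUP BY returns a row of NULLs for an empty
    -- partition; drop those rows (an addition, to match what the
    -- original GROUP BY query would return).
    RETURN QUERY EXECUTE
        'SELECT k, min_date, max_date FROM (' || inner_sql ||
        ') AS all_a WHERE min_date IS NOT NULL ORDER BY k';
END
$$;

-- Usage
SELECT * FROM a_date_range();
```

Since every branch already yields exactly one row per k, the outer GROUP BY from the hand-written query is not needed in this variant.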

I dug into the code and found the reason in src/backend/optimizer/plan/planagg.c:

/*
 * preprocess_minmax_aggregates - preprocess MIN/MAX aggregates
 *
 * Check to see whether the query contains MIN/MAX aggregate functions that
 * might be optimizable via indexscans.  If it does, and all the aggregates
 * are potentially optimizable, then create a MinMaxAggPath and add it to
 * the (UPPERREL_GROUP_AGG, NULL) upperrel.
[...]
 */
void
preprocess_minmax_aggregates(PlannerInfo *root, List *tlist)
{
[...]
    /*
     * Reject unoptimizable cases.
     *
     * We don't handle GROUP BY or windowing, because our current
     * implementations of grouping require looking at all the rows anyway, and
     * so there's not much point in optimizing MIN/MAX.
     */
    if (parse->groupClause || list_length(parse->groupingSets) > 1 ||
        parse->hasWindowFuncs)
        return;

Basically, PostgreSQL punts when it sees a GROUP BY clause.
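
The effect of that check can be observed, as a sketch, by comparing two plans on a single partition. Here a_16 is taken from the plan in the question, the index on date is assumed to exist on that partition, and the plan shapes described afterwards are expectations, not captured output.

```sql
-- No GROUP BY: eligible for the MIN/MAX index optimization
EXPLAIN SELECT min(date), max(date) FROM a_16;

-- Same aggregates with GROUP BY: preprocess_minmax_aggregates returns early
EXPLAIN SELECT k, min(date), max(date) FROM a_16 GROUP BY k;
```

When the optimization applies, the first plan typically shows a Result node with InitPlan subplans, each a Limit over an index scan, while the second one has to aggregate over a scan of every row.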

"Yuck! There is clearly room for improvement here." The topic starter could generate the inner SQL with dynamic SQL, which PostgreSQL supports. - Raymond Nijland, Feb 25 '19 at 11:37
Also, I now notice you are mixing a non-aggregate column with aggregate columns, which is not allowed by the SQL standard, so PostgreSQL will most likely raise an error on your query. - Raymond Nijland, Feb 25 '19 at 11:57
No problem. I should have said in the outer SQL; the inner SQL is correct because the columns there are constants, so it's allowed there. - Raymond Nijland, Feb 25 '19 at 12:06
Thank you! That explains it exactly. I'll keep that nuance in mind when working with PostgreSQL. - Vitaly, Feb 25 '19 at 12:52
