
I have a PostgreSQL table of point locations that contains 9.5 million rows. I am attempting to run this WFS request: /geoserver/workspace/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=workspace:layer&maxFeatures=50&outputFormat=application/json but it takes several minutes to respond.

When I look at the geoserver logs I see this:

Request: getFeature
    service = WFS
    version = 1.0.0
    baseUrl = http://my_geoserver_url/geoserver/
    query[0]:
        typeName[0] = {workspace}layer
    maxFeatures = 50
    outputFormat = application/json
    resultType = results
19 Nov 15:26:04 INFO [wfs.json] - about to encode JSON

At that point it stalls for many minutes.

When I look at the currently active queries of my PostgreSQL server I see this: SELECT count(*) FROM "public"."layer"
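
(For reference, here is roughly how I check the active queries; a sketch against the standard pg_stat_activity view:)

    -- List non-idle sessions with their current statement and runtime:
    SELECT pid, state, now() - query_start AS runtime, query
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC;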

That count(*) query by itself takes 335 seconds to return a response. First of all, wtf? Even if it has to count row by row, 9.5 million isn't THAT many rows. Is there any way to speed up this operation?

Second, why is it trying to do SELECT count(*) FROM "public"."layer" and is there any way to prevent it? I specified maxFeatures = 50, so why count them?
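
(Side note: if an exact count is not needed, the planner's own estimate comes back instantly. A sketch; reltuples is approximate and only as fresh as the last VACUUM/ANALYZE or autovacuum run:)

    -- Approximate row count from planner statistics (instant, but an estimate only):
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE oid = 'public.layer'::regclass;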

Version: PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Machine: n1-standard-2 (2 vCPUs, 7.5 GB memory) (from Google Cloud)

Variables:

allow_system_table_mods off
application_name    Navicat
archive_command (disabled)
archive_mode    off
archive_timeout 0
array_nulls on
authentication_timeout  60
autovacuum  on
autovacuum_analyze_scale_factor 0.1
autovacuum_analyze_threshold    50
autovacuum_freeze_max_age   200000000
autovacuum_max_workers  3
autovacuum_multixact_freeze_max_age 400000000
autovacuum_naptime  60
autovacuum_vacuum_cost_delay    20
autovacuum_vacuum_cost_limit    -1
autovacuum_vacuum_scale_factor  0.2
autovacuum_vacuum_threshold 50
autovacuum_work_mem -1
backend_flush_after 0
backslash_quote safe_encoding
bgwriter_delay  200
bgwriter_flush_after    64
bgwriter_lru_maxpages   100
bgwriter_lru_multiplier 2
block_size  8192
bonjour off
bonjour_name    
bytea_output    hex
check_function_bodies   on
checkpoint_completion_target    0.5
checkpoint_flush_after  32
checkpoint_timeout  300
checkpoint_warning  30
client_encoding UNICODE
client_min_messages notice
cluster_name    
commit_delay    0
commit_siblings 5
config_file /var/lib/postgresql/data/postgresql.conf
constraint_exclusion    partition
cpu_index_tuple_cost    0.005
cpu_operator_cost   0.0025
cpu_tuple_cost  0.01
cursor_tuple_fraction   0.1
data_checksums  off
data_directory  /var/lib/postgresql/data
data_directory_mode 0700
data_sync_retry off
DateStyle   ISO, MDY
db_user_namespace   off
deadlock_timeout    1000
debug_assertions    off
debug_pretty_print  on
debug_print_parse   off
debug_print_plan    off
debug_print_rewritten   off
default_statistics_target   100
default_tablespace  
default_text_search_config  pg_catalog.english
default_transaction_deferrable  off
default_transaction_isolation   read committed
default_transaction_read_only   off
default_with_oids   off
dynamic_library_path    $libdir
dynamic_shared_memory_type  posix
effective_cache_size    524288
effective_io_concurrency    1
enable_bitmapscan   on
enable_gathermerge  on
enable_hashagg  on
enable_hashjoin on
enable_indexonlyscan    on
enable_indexscan    on
enable_material on
enable_mergejoin    on
enable_nestloop on
enable_parallel_append  on
enable_parallel_hash    on
enable_partition_pruning    on
enable_partitionwise_aggregate  off
enable_partitionwise_join   off
enable_seqscan  on
enable_sort on
enable_tidscan  on
escape_string_warning   on
event_source    PostgreSQL
exit_on_error   off
external_pid_file   
extra_float_digits  0
force_parallel_mode off
from_collapse_limit 8
fsync   on
full_page_writes    on
geqo    on
geqo_effort 5
geqo_generations    0
geqo_pool_size  0
geqo_seed   0
geqo_selection_bias 2
geqo_threshold  12
gin_fuzzy_search_limit  0
gin_pending_list_limit  4096
hba_file    /var/lib/postgresql/data/pg_hba.conf
hot_standby on
hot_standby_feedback    off
huge_pages  try
ident_file  /var/lib/postgresql/data/pg_ident.conf
idle_in_transaction_session_timeout 0
ignore_checksum_failure off
ignore_system_indexes   off
integer_datetimes   on
IntervalStyle   postgres
jit off
jit_above_cost  100000
jit_debugging_support   off
jit_dump_bitcode    off
jit_expressions on
jit_inline_above_cost   500000
jit_optimize_above_cost 500000
jit_profiling_support   off
jit_provider    llvmjit
jit_tuple_deforming on
join_collapse_limit 8
krb_caseins_users   off
krb_server_keyfile  FILE:/etc/postgresql-common/krb5.keytab
lc_collate  en_US.utf8
lc_ctype    en_US.utf8
lc_messages en_US.utf8
lc_monetary en_US.utf8
lc_numeric  en_US.utf8
lc_time en_US.utf8
listen_addresses    *
lo_compat_privileges    off
local_preload_libraries 
lock_timeout    0
log_autovacuum_min_duration -1
log_checkpoints off
log_connections off
log_destination stderr
log_directory   log
log_disconnections  off
log_duration    off
log_error_verbosity default
log_executor_stats  off
log_file_mode   0600
log_filename    postgresql-%Y-%m-%d_%H%M%S.log
log_hostname    off
log_line_prefix %m [%p] 
log_lock_waits  off
log_min_duration_statement  -1
log_min_error_statement error
log_min_messages    warning
log_parser_stats    off
log_planner_stats   off
log_replication_commands    off
log_rotation_age    1440
log_rotation_size   10240
log_statement   none
log_statement_stats off
log_temp_files  -1
log_timezone    UTC
log_truncate_on_rotation    off
logging_collector   off
maintenance_work_mem    65536
max_connections 100
max_files_per_process   1000
max_function_args   100
max_identifier_length   63
max_index_keys  32
max_locks_per_transaction   64
max_logical_replication_workers 4
max_parallel_maintenance_workers    2
max_parallel_workers    8
max_parallel_workers_per_gather 2
max_pred_locks_per_page 2
max_pred_locks_per_relation -2
max_pred_locks_per_transaction  64
max_prepared_transactions   0
max_replication_slots   10
max_stack_depth 2048
max_standby_archive_delay   30000
max_standby_streaming_delay 30000
max_sync_workers_per_subscription   2
max_wal_senders 10
max_wal_size    1024
max_worker_processes    8
min_parallel_index_scan_size    64
min_parallel_table_scan_size    1024
min_wal_size    80
old_snapshot_threshold  -1
operator_precedence_warning off
parallel_leader_participation   on
parallel_setup_cost 1000
parallel_tuple_cost 0.1
password_encryption md5
port    5432
post_auth_delay 0
postgis.backend geos
pre_auth_delay  0
quote_all_identifiers   off
random_page_cost    4
restart_after_crash on
row_security    on
search_path public, "$user", public
segment_size    131072
seq_page_cost   1
server_encoding UTF8
server_version  11.2 (Debian 11.2-1.pgdg90+1)
server_version_num  110002
session_preload_libraries   
session_replication_role    origin
shared_buffers  16384
shared_preload_libraries    
ssl off
ssl_ca_file 
ssl_cert_file   server.crt
ssl_ciphers HIGH:MEDIUM:+3DES:!aNULL
ssl_crl_file    
ssl_dh_params_file  
ssl_ecdh_curve  prime256v1
ssl_key_file    server.key
ssl_passphrase_command  
ssl_passphrase_command_supports_reload  off
ssl_prefer_server_ciphers   on
standard_conforming_strings on
statement_timeout   0
stats_temp_directory    pg_stat_tmp
superuser_reserved_connections  3
synchronize_seqscans    on
synchronous_commit  on
synchronous_standby_names   
syslog_facility local0
syslog_ident    postgres
syslog_sequence_numbers on
syslog_split_messages   on
tcp_keepalives_count    9
tcp_keepalives_idle 7200
tcp_keepalives_interval 75
temp_buffers    1024
temp_file_limit -1
temp_tablespaces    
TimeZone    UTC
timezone_abbreviations  Default
trace_notify    off
trace_recovery_messages log
trace_sort  off
track_activities    on
track_activity_query_size   1024
track_commit_timestamp  off
track_counts    on
track_functions none
track_io_timing off
transaction_deferrable  off
transaction_isolation   read committed
transaction_read_only   off
transform_null_equals   off
unix_socket_directories /var/run/postgresql
unix_socket_group   
unix_socket_permissions 0777
update_process_title    on
vacuum_cleanup_index_scale_factor   0.1
vacuum_cost_delay   0
vacuum_cost_limit   200
vacuum_cost_page_dirty  20
vacuum_cost_page_hit    1
vacuum_cost_page_miss   10
vacuum_defer_cleanup_age    0
vacuum_freeze_min_age   50000000
vacuum_freeze_table_age 150000000
vacuum_multixact_freeze_min_age 5000000
vacuum_multixact_freeze_table_age   150000000
wal_block_size  8192
wal_buffers 512
wal_compression off
wal_consistency_checking    
wal_keep_segments   0
wal_level   replica
wal_log_hints   off
wal_receiver_status_interval    10
wal_receiver_timeout    60000
wal_retrieve_retry_interval 5000
wal_segment_size    16777216
wal_sender_timeout  60000
wal_sync_method fdatasync
wal_writer_delay    200
wal_writer_flush_after  128
work_mem    4096
xmlbinary   base64
xmloption   content
zero_damaged_pages  off

explain (analyze, buffers, format text):

  Buffers: shared hit=2940 read=1601607
  ->  Gather  (cost=1655905.64..1655905.85 rows=2 width=8) (actual time=191335.534..191349.259 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=2940 read=1601607
        ->  Partial Aggregate  (cost=1654905.64..1654905.65 rows=1 width=8) (actual time=191277.227..191277.227 rows=1 loops=3)
              Buffers: shared hit=2940 read=1601607
              ->  Parallel Seq Scan on forecast  (cost=0.00..1644833.91 rows=4028691 width=0) (actual time=8.037..190681.354 rows=3222220 loops=3)
                    Buffers: shared hit=2940 read=1601607
Planning Time: 0.351 ms
Execution Time: 191349.424 ms

Indexes:

CREATE INDEX forecast_init_utc ON public.forecast USING btree (forecast_init_utc)
CREATE UNIQUE INDEX forecast_pkey ON public.forecast USING btree (ogc_fid)
CREATE INDEX forecast_wkb_geometry_geom_idx ON public.forecast USING gist (wkb_geometry)
CREATE INDEX forecast_init_local ON public.forecast USING btree (forecast_init_local)
CREATE INDEX country_code ON public.forecast USING btree (country_code)
CREATE INDEX store_num ON public.forecast USING btree (store_num)

Edit: It appears that VACUUM ANALYZE did the trick. It took 728 seconds to run, but now the query returns quickly. How often do I need to run that?
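
(Per the comments below, autovacuum should normally keep this up to date on its own; one option is to make it more aggressive for just this table. A sketch, with illustrative thresholds rather than tuned values:)

    -- Have autovacuum kick in sooner on this one table (values are illustrative):
    ALTER TABLE public.forecast SET (
        autovacuum_vacuum_scale_factor = 0.02,
        autovacuum_analyze_scale_factor = 0.01
    );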

Here is the new explain (analyze, buffers, format text):

Finalize Aggregate  (cost=204220.77..204220.78 rows=1 width=8) (actual time=7116.754..7116.755 rows=1 loops=1)
  Buffers: shared hit=525705 read=25977
  ->  Gather  (cost=204220.55..204220.77 rows=2 width=8) (actual time=7115.613..7123.979 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        Buffers: shared hit=525705 read=25977
        ->  Partial Aggregate  (cost=203220.55..203220.57 rows=1 width=8) (actual time=7051.177..7051.178 rows=1 loops=3)
              Buffers: shared hit=525705 read=25977
              ->  Parallel Index Only Scan using store_num on forecast  (cost=0.43..193125.85 rows=4037882 width=0) (actual time=18.684..6564.663 rows=3229577 loops=3)
                    Heap Fetches: 0
                    Buffers: shared hit=525705 read=25977
Planning Time: 0.896 ms
Execution Time: 7124.113 ms
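
(The "Heap Fetches: 0" line is the tell: VACUUM populated the visibility map, which is what lets the planner use a true index-only scan. How much of the table is marked all-visible can be checked with a quick sketch:)

    -- Pages covered by the visibility map; all-visible pages let
    -- index-only scans skip heap fetches entirely:
    SELECT relpages, relallvisible
    FROM pg_class
    WHERE oid = 'public.forecast'::regclass;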
  • Which Postgres version? Is there a primary key on the table `layer`? Is that table well vacuumed? What kind of hardware (disk types, memory, CPUs) are you using? – a_horse_with_no_name Nov 19 '19 at 15:33
  • Please **[edit]** your question and add the [execution plan](https://www.postgresql.org/docs/current/static/using-explain.html) of that query generated using **`explain (analyze, buffers, format text)`** (_not_ just a "simple" explain) as [formatted text](http://stackoverflow.com/help/formatting) and make sure you preserve the indentation of the plan. Paste the text, then put ``` on the line before the plan and on a line after the plan. Please also include complete `create index` statements for all indexes as well. – a_horse_with_no_name Nov 19 '19 at 15:33
  • PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit – Mike Furlender Nov 19 '19 at 15:34
  • @a_horse_with_no_name done - thanks for telling me to do that; it slipped my mind. I full vacuumed it yesterday, and yes there is a primary key on it. – Mike Furlender Nov 19 '19 at 15:42
  • Am I perhaps running out of memory? Would bumping the specs significantly speed things up? – Mike Furlender Nov 19 '19 at 15:47
  • 1
    I am surprised Postgres didn't choose the scan on the PK index which should be smaller than the Seq Scan on the table. The throughput is about 60MB/Second which seems to be expected in such an environment. You have an average of 4 rows per block. This can either indicate a bloated table (a `vacuum full` would fix that) or an extremely wide table. Does Postgres start to use an Index Only Scan if you run a `vacuum analyze` on the table (note: no `full` option). – a_horse_with_no_name Nov 19 '19 at 16:00
  • @a_horse_with_no_name It is indeed a very wide table. I will try vacuum analyze and get back to you, but it will probably take a long time. – Mike Furlender Nov 19 '19 at 16:20
  • A `vacuum analyze` shouldn't take very long. – a_horse_with_no_name Nov 19 '19 at 16:21
  • wow, that `vacuum analyze` command did the trick. It took 728 seconds to run, but now it is very fast! – Mike Furlender Nov 19 '19 at 16:46
  • 1
    You should configure your system to make vacuum more aggressive either for all tables, or for just that table. Or check if you have sessions that are `idle in transaction` those would prevent regular vacuum – a_horse_with_no_name Nov 19 '19 at 17:11

1 Answer


@a_horse_with_no_name's suggestion to run VACUUM ANALYZE on the table made a huge difference, but there is also a setting to turn off those SELECT count(*) queries entirely. In the GeoServer admin panel it is under Layers -> <Layer Name> -> Publishing -> "Skip the counting of the numberMatched attribute".
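
(Following up on the comment about `idle in transaction` sessions holding back vacuum, a sketch for spotting them via pg_stat_activity:)

    -- Long-lived 'idle in transaction' sessions prevent vacuum from
    -- reclaiming dead rows:
    SELECT pid, usename, now() - xact_start AS xact_age, query
    FROM pg_stat_activity
    WHERE state = 'idle in transaction'
    ORDER BY xact_age DESC;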
