Code Monkey home page Code Monkey logo

Comments (7)

lonvia avatar lonvia commented on June 12, 2024 1

I tried with NOT MATERIALIZED and it didn't change what the query planner was doing. So it looks like at least in this negative form it is advisory only.

from osm2pgsql.

pnorman avatar pnorman commented on June 12, 2024

Do you have an EXPLAIN ANALYZE of the good and bad plan?

from osm2pgsql.

lonvia avatar lonvia commented on June 12, 2024

An EXPLAIN ANALYSE is naturally not possible short of waiting for a couple of hours.

Bad plan:

nominatim=# EXPLAIN                
WITH changed_buckets AS (
      SELECT array_agg(id) AS node_ids, id >> 5 AS bucket
      FROM (SELECT osm_id as id FROM place LIMIT 1000) as x GROUP BY id >> 5)
      SELECT DISTINCT w.id
      FROM "public"."planet_osm_ways" w, changed_buckets b
      WHERE w.nodes && b.node_ids
      AND planet_osm_index_bucket(w.nodes) && (ARRAY[b.bucket])::bigint[];
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=46.34..55985610375.59 rows=20752577 width=8)
   ->  Nested Loop  (cost=46.34..55985558494.15 rows=20752577 width=8)
         Join Filter: ((w.nodes && (array_agg(x.id))) AND (planet_osm_index_bucket(w.nodes) && ARRAY[((x.id >> 5))]))
         ->  Index Scan using planet_osm_ways_pkey on planet_osm_ways w  (cost=0.57..472413790.47 rows=965236160 width=108)
         ->  Materialize  (cost=45.77..52.22 rows=215 width=40)
               ->  HashAggregate  (cost=45.77..48.99 rows=215 width=40)
                     Group Key: (x.id >> 5)
                     ->  Subquery Scan on x  (cost=0.57..40.77 rows=1000 width=16)
                           ->  Limit  (cost=0.57..28.27 rows=1000 width=8)
                                 ->  Index Only Scan using place_id_idx on place  (cost=0.57..9085193.42 rows=328039168 width=8)
(10 rows)

Good plan:

nominatim=# EXPLAIN                                            
WITH changed_buckets AS (
      SELECT array_agg(id) AS node_ids, id >> 5 AS bucket
      FROM (SELECT osm_id as id FROM place LIMIT 1000) as x GROUP BY id >> 5)
      SELECT DISTINCT w.id
      FROM "public"."planet_osm_ways" w, changed_buckets b
      WHERE w.nodes && b.node_ids
      AND planet_osm_index_bucket(w.nodes) && (ARRAY[b.bucket])::bigint[];
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=59521649340.32..59521753103.21 rows=20752577 width=8)
   ->  Sort  (cost=59521649340.32..59521701221.77 rows=20752577 width=8)
         Sort Key: w.id
         ->  Nested Loop  (cost=72489.62..59518154417.96 rows=20752577 width=8)
               ->  HashAggregate  (cost=45.77..48.99 rows=215 width=40)
                     Group Key: (x.id >> 5)
                     ->  Subquery Scan on x  (cost=0.57..40.77 rows=1000 width=16)
                           ->  Limit  (cost=0.57..28.27 rows=1000 width=8)
                                 ->  Index Only Scan using place_id_idx on place  (cost=0.57..9085193.42 rows=328039168 width=8)
               ->  Bitmap Heap Scan on planet_osm_ways w  (cost=72443.86..276827659.72 rows=96524 width=108)
                     Recheck Cond: (planet_osm_index_bucket(nodes) && ARRAY[((x.id >> 5))])
                     Filter: (nodes && (array_agg(x.id)))
                     ->  Bitmap Index Scan on planet_osm_ways_nodes_bucket_idx  (cost=0.00..72419.73 rows=9652362 width=0)
                           Index Cond: (planet_osm_index_bucket(nodes) && ARRAY[((x.id >> 5))])
(14 rows)

from osm2pgsql.

pnorman avatar pnorman commented on June 12, 2024

Bad plan: https://explain.depesz.com/s/S6gG
Good plan: https://explain.depesz.com/s/jm3H

On the server getting the bad plan, can you force it to the good plan by overwriting SET enable_* variables?

Can you get an EXPLAIN ANALYZE for just the good plan?

The bad plan is materializing the CTE. The documentation for 14 and 15 states

However, if a WITH query is non-recursive and side-effect-free (that is, it is a SELECT containing no volatile functions) then it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the WITH query just once, but not if it references the WITH query more than once.

I guess it can still materialize a CTE just like it can materialize a query plan node if it's required for the plan.

Both plans are estimating 20 million rows. This sounds very off, as we're doing batches of 1000, and indicates the problem is likely statistics-related.

The HashAggregate node statistics look reasonable as it is <1000. Reality is probably higher than 215 if places are randomly selected.

The estimate of 9.6 million rows for the good plan's bitmap index scan seems high.

I would expect PostgreSQL is already tracking the univariate statistics for planet_osm_index_bucket(nodes). In case it's not, can you try CREATE STATISTICS planet_osm_ways_bucket_stats ( planet_osm_index_bucket(nodes) ) FROM planet_osm_ways and see if that changes the behavior?

from osm2pgsql.

lonvia avatar lonvia commented on June 12, 2024

Sorry if that wasn't clear but the underlying issue is indeed the bad statistics for GIN indexes on array columns. That's a well known and long-standing issue. Last time it did bite us was when JIT and parallel indexing was introduced (#1045).

Problem is that our tables have now reached a size where Postgres resorts to sequential scans instead of using the index. This is bad because forcing Postgres to use an index is much more difficult than preventing it from using an index or JIT or parallel processing. The ideas above are about how to best do that without breaking things in a slightly different setup.

Tried CREATE STATISTICS but it made no difference. Maybe I did something wrong because it only took a few seconds to create it. I would have expected to take longer.

enable_material = off helps and is the workaround I'm using currently to get updates through on that machine. It looks like it slows down other queries of the update process. Updates are still very slow but at least not at a stand-still. So not a permanent solution.

from osm2pgsql.

rustprooflabs avatar rustprooflabs commented on June 12, 2024

Can the query be updated to WITH changed_buckets AS NOT MATERIALIZED (...)? Might help encourage the good plan without fiddling with enable_material when that might benefit other queries.

from osm2pgsql.

joto avatar joto commented on June 12, 2024

Workaround is merged. Closing here.

from osm2pgsql.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.