
Adaptive query optimization

Adaptive query optimization (AQO) is an extension of the standard PostgreSQL cost-based query optimizer. Its basic principle is to use query execution statistics to improve cardinality estimation. Experimental evaluation shows that this improvement sometimes provides an enormous speed-up for rather complicated queries.

Installation

The module works with PostgreSQL 9.6 and above. To avoid compatibility issues, the git repository provides the following branches:

  • stable9_6 - for PG v9.6.
  • stable11 - for PG v10 and v11.
  • stable12 - for PG v12.
  • stable13 - for PG v13.
  • stable14 - for PG v14.
  • stable15 - for PG v15.
  • master - works correctly with PG v15 and the PostgreSQL master branch.

The module contains a patch and an extension. The patch has to be applied to the PostgreSQL sources. Because the patch affects header files, PostgreSQL must be rebuilt completely after applying it (make clean and make install). The extension has to be unpacked into the contrib directory and then compiled and installed with make install.

cd postgresql-9.6                                                # enter postgresql source directory
git clone https://github.com/postgrespro/aqo.git contrib/aqo        # clone aqo into contrib
patch -p1 --no-backup-if-mismatch < contrib/aqo/aqo_pg<version>.patch  # patch postgresql
make clean && make && make install                               # recompile postgresql
cd contrib/aqo                                                   # enter aqo directory
make && make install                                             # install aqo
make check                                              # check whether it works correctly (optional)

The version tag in the patch name corresponds to the matching PostgreSQL release: for PostgreSQL 9.6 use the aqo_pg9_6.patch file; for PostgreSQL 10, aqo_pg10.patch; for PostgreSQL 11, aqo_pg11.patch; and so on. You can also check the git tags on the master branch to determine the suitable PostgreSQL version more precisely.
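
For example, when building against PostgreSQL 13 sources, the patch step would look like this (the file name is an assumption following the aqo_pg<version> convention above):

patch -p1 --no-backup-if-mismatch < contrib/aqo/aqo_pg13.patch   # file name assumed per the convention above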

In your database:

CREATE EXTENSION aqo;

Modify your postgresql.conf:

shared_preload_libraries = 'aqo'

and restart PostgreSQL.

It is essential that the library is preloaded during server startup, because adaptive query optimization must be enabled on a per-cluster basis rather than per-database.
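
After the restart, a quick sanity check confirms the library was preloaded (SHOW is standard PostgreSQL):

SHOW shared_preload_libraries;   -- should include 'aqo'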

Usage

The typical case is as follows: you have a complicated query that executes for too long, and EXPLAIN ANALYZE shows that the likely reason is bad cardinality estimation.

Example:

                                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15028.15..15028.16 rows=1 width=96) (actual time=8168.188..8168.189 rows=1 loops=1)
   ->  Nested Loop  (cost=8.21..15028.14 rows=1 width=48) (actual time=199.500..8167.708 rows=88 loops=1)
         ->  Nested Loop  (cost=7.78..12650.75 rows=5082 width=37) (actual time=0.682..3015.721 rows=785477 loops=1)
               Join Filter: (t.id = ci.movie_id)
               ->  Nested Loop  (cost=7.21..12370.11 rows=148 width=41) (actual time=0.666..404.791 rows=14165 loops=1)
                     ->  Nested Loop  (cost=6.78..12235.17 rows=270 width=20) (actual time=0.645..146.855 rows=35548 loops=1)
                           ->  Seq Scan on keyword k  (cost=0.00..3632.40 rows=8 width=20) (actual time=0.126..29.117 rows=8 loops=1)
                                 Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[]))
                                 Rows Removed by Filter: 134162
                           ->  Bitmap Heap Scan on movie_keyword mk  (cost=6.78..1072.32 rows=303 width=8) (actual time=0.919..13.800 rows=4444 loops=8)
                                 Recheck Cond: (keyword_id = k.id)
                                 Heap Blocks: exact=23488
                                 ->  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.71 rows=303 width=0) (actual time=0.535..0.535 rows=4444 loops=8)
                                       Index Cond: (keyword_id = k.id)
                     ->  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) (actual time=0.007..0.007 rows=0 loops=35548)
                           Index Cond: (id = mk.movie_id)
                           Filter: (production_year > 2000)
                           Rows Removed by Filter: 1
               ->  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.56..1.47 rows=34 width=8) (actual time=0.009..0.168 rows=55 loops=14165)
                     Index Cond: (movie_id = mk.movie_id)
         ->  Index Scan using name_pkey on name n  (cost=0.43..0.46 rows=1 width=19) (actual time=0.006..0.006 rows=0 loops=785477)
               Index Cond: (id = ci.person_id)
               Filter: (name ~~ '%Downey%Robert%'::text)
               Rows Removed by Filter: 1
 Planning time: 40.047 ms
 Execution time: 8168.373 ms
(26 rows)

Then you can use the following pattern:

BEGIN;
SET aqo.mode = 'learn';
EXPLAIN ANALYZE <query>;
RESET aqo.mode;
-- ... do EXPLAIN ANALYZE <query> while cardinality estimations in the plan are bad
--                                      and the plan is bad
COMMIT;

Warning: execute the query repeatedly until the plan stops changing!

When the plan stops changing, you can often observe performance improvement:

                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=112883.89..112883.90 rows=1 width=96) (actual time=738.731..738.731 rows=1 loops=1)
   ->  Nested Loop  (cost=1.85..112883.23 rows=88 width=48) (actual time=73.826..738.618 rows=88 loops=1)
         ->  Nested Loop  (cost=1.43..110496.69 rows=5202 width=36) (actual time=72.917..723.994 rows=5202 loops=1)
               Join Filter: (t.id = mk.movie_id)
               ->  Nested Loop  (cost=0.99..110046.39 rows=306 width=40) (actual time=72.902..720.310 rows=306 loops=1)
                     ->  Nested Loop  (cost=0.56..109820.42 rows=486 width=19) (actual time=72.856..717.429 rows=486 loops=1)
                           ->  Seq Scan on name n  (cost=0.00..107705.93 rows=2 width=19) (actual time=72.819..717.148 rows=2 loops=1)
                                 Filter: (name ~~ '%Downey%Robert%'::text)
                                 Rows Removed by Filter: 4167489
                           ->  Index Scan using person_id_cast_info on cast_info ci  (cost=0.56..1054.82 rows=243 width=8) (actual time=0.024..0.091 rows=243 loops=2)
                                 Index Cond: (person_id = n.id)
                     ->  Index Scan using title_pkey on title t  (cost=0.43..0.45 rows=1 width=21) (actual time=0.005..0.006 rows=1 loops=486)
                           Index Cond: (id = ci.movie_id)
                           Filter: (production_year > 2000)
                           Rows Removed by Filter: 0
               ->  Index Scan using movie_id_movie_keyword on movie_keyword mk  (cost=0.43..1.26 rows=17 width=8) (actual time=0.004..0.008 rows=17 loops=306)
                     Index Cond: (movie_id = ci.movie_id)
         ->  Index Scan using keyword_pkey on keyword k  (cost=0.42..0.45 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=5202)
               Index Cond: (id = mk.keyword_id)
               Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[]))
               Rows Removed by Filter: 1
 Planning time: 51.333 ms
 Execution time: 738.904 ms
(23 rows)

The settings system in AQO works with normalised queries, i.e. queries with their constants removed. For example, the normalised version of SELECT * FROM tbl WHERE a < 25 AND b = 'str'; is SELECT * FROM tbl WHERE a < CONST AND b = CONST;

So two queries have the same normalised form if and only if they differ only in their constants.
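
For illustration, both of the statements below normalise to the same form and therefore share a single AQO entry (the CONST notation follows the text above):

SELECT * FROM tbl WHERE a < 25  AND b = 'str';
SELECT * FROM tbl WHERE a < 100 AND b = 'other';
-- both normalise to: SELECT * FROM tbl WHERE a < CONST AND b = CONST;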

Each normalised query has its own hash. The correspondence between a normalised query hash and the query text is stored in the aqo_query_texts table:

SELECT * FROM aqo_query_texts;
 query_hash  |                                query_text
-------------+----------------------------------------------------------------------------
           0 | COMMON feature space (do not delete!)
 -1104999304 | SELECT                                                                    +
             |     MIN(k.keyword) AS movie_keyword,                                      +
             |     MIN(n.name) AS actor_name,                                            +
             |     MIN(t.title) AS hero_movie                                            +
             | FROM                                                                      +
             |     cast_info AS ci,                                                      +
             |     keyword AS k,                                                         +
             |     movie_keyword AS mk,                                                  +
             |     name AS n, title AS t                                                 +
             | WHERE                                                                     +
             |     k.keyword in ('superhero', 'sequel', 'second-part', 'marvel-comics',  +
             |                   'based-on-comic', 'tv-special', 'fight', 'violence') AND+
             |     n.name LIKE '%Downey%Robert%' AND                                     +
             |     t.production_year > 2000 AND                                          +
             |     k.id = mk.keyword_id AND                                              +
             |     t.id = mk.movie_id AND                                                +
             |     t.id = ci.movie_id AND                                                +
             |     ci.movie_id = mk.movie_id AND                                         +
             |     n.id = ci.person_id;
(2 rows)

The most useful settings are learn_aqo and use_aqo. In the example pattern above, if you want to freeze the plan and prevent AQO from learning further from query execution statistics (which is not recommended, especially if the data tends to change significantly), you can run UPDATE aqo_queries SET learn_aqo = false WHERE query_hash = <query_hash>; before the commit.
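
Put together, a plan-freezing session might look like the sketch below, based on the pattern above; <query> and <query_hash> are placeholders to fill in:

BEGIN;
SET aqo.mode = 'learn';
EXPLAIN ANALYZE <query>;                    -- repeat until the plan stabilises
UPDATE aqo_queries SET learn_aqo = false
    WHERE query_hash = <query_hash>;        -- freeze: keep the model, stop learning
RESET aqo.mode;
COMMIT;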

The extension includes two GUCs that display the executed cardinality predictions for a query. aqo.show_details = 'on' (default: off) lets you see the AQO cardinality prediction results for each node of a query plan, plus an AQO summary. aqo.show_hash = 'on' (default: off) prints the hash signature for each plan node and for the overall query. This is system-specific information intended for situational analysis.
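
For example (a sketch; the annotation format in the comment matches the output quoted in the issue reports further down and may vary by version):

SET aqo.show_details = 'on';
EXPLAIN ANALYZE <query>;
-- plan nodes gain annotations such as:
--   (AQO: cardinality=9738, error=0%, fss hash = -1150030445)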

A more detailed reference for the AQO settings mechanism follows.

Advanced tuning

AQO has two kinds of settings: per-query-type settings stored in the aqo_queries table in the database, and the aqo.mode GUC variable.

If aqo.mode = 'disabled', AQO is disabled for all queries, so PostgreSQL uses its own cardinality estimations during query optimization. This is useful if you want to disable AQO for all queries temporarily, in the current session or for the whole cluster, without removing or changing the collected statistics and settings.

Otherwise, if the normalized query hash is stored in aqo_queries, AQO uses settings from there to process the query.

Those settings are:

The learn_aqo setting controls whether AQO collects statistics on executions of the same query type. Enabling it may add computational overhead, but it is essential when the AQO model does not fit the data, which happens when AQO first encounters a new query type or when the data distribution in the database changes.

The use_aqo setting controls whether AQO cardinality predictions are used for the next execution of that query type. Disabling AQO usage is reasonable for cases in which query execution time increases after applying AQO; this happens sometimes because of the incompleteness of cost models.

The fs setting is for extra-advanced AQO tuning. It may be changed manually to make a number of queries share the same model. This may decrease the amount of memory used for models and even the query execution time, but it may also cause bad AQO behavior, so please use it only if you know exactly what you are doing.

The auto_tuning setting determines whether AQO tries to tune the learn_aqo and use_aqo settings for the query on its own.
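
You can inspect these per-query settings directly; the column names below follow this text but may differ between AQO versions:

SELECT query_hash, learn_aqo, use_aqo, auto_tuning FROM aqo_queries;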

If the normalized query hash is not stored in aqo_queries, AQO behaviour depends on the aqo.mode.

If aqo.mode is 'controlled', the unknown query is simply ignored: the standard PostgreSQL optimizer is used, and the query execution statistics are ignored.

If aqo.mode is 'learn', the normalized query hash is added to aqo_queries with the default settings learn_aqo=true, use_aqo=true, auto_tuning=false, and fs = queryid, which means that AQO uses a separate machine learning model to optimize this query type. After that, the query is processed as if it had already been in aqo_queries.

aqo.mode = 'intelligent' behaves similarly; the only difference is that the default auto_tuning value in this case is true.

If aqo.mode is 'forced', the query is not added to the aqo_queries table; instead, the special COMMON feature space with identifier fspace=0 is used for query optimization, and the COMMON machine learning model is updated with the execution statistics of this query.
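
After running queries in 'forced' mode, the shared model accumulates in the COMMON feature space, which can be checked as sketched below (the aqo_data table and fspace_hash column appear in the issue reports further down; names may differ between versions):

SET aqo.mode = 'forced';
-- ... run the dynamically generated workload ...
SELECT count(*) FROM aqo_data WHERE fspace_hash = 0;   -- rows learned into the COMMON space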

Comments on AQO modes

'controlled' mode is the default mode to use in production, because it uses the standard PostgreSQL optimizer for all unknown query types and the predefined settings for the known ones.

'learn' mode is a base mode, necessary for memorizing a new normalized query. The usage pattern is as follows:

SET aqo.mode = 'learn';
<query>
SET aqo.mode = 'controlled';
<query>
<query>
...
-- until convergence

'learn' mode is not recommended for permanent use on the whole cluster, because it enables AQO for every query type, even those that don't need it, which may lead to unnecessary computational overhead and performance degradation.

'intelligent' mode is an attempt to perform machine learning optimizations completely automatically, in a self-tuning manner, i.e. to determine for which queries it is reasonable to use machine learning models and for which it is not. If you want to rely on it completely, you may use it on a per-cluster basis: just add the line aqo.mode = 'intelligent' to your postgresql.conf. Nevertheless, it may still not work very well, so we do not recommend using it in production.

For handling workloads with dynamically generated query structures, the forced mode aqo.mode = 'forced' is provided. We cannot guarantee an overall performance improvement with this mode, but you may try it nevertheless. On the one hand it lacks intelligent tuning, so performance for some queries may even decrease; on the other hand it may work for dynamic workloads and consumes less memory than 'intelligent' mode.

Recipes

If you want to freeze the optimizer's behavior (i.e. disable learning under the workload), use

UPDATE aqo_queries SET learn_aqo = false, auto_tuning = false;

If you want to disable AQO for all queries, you may use

UPDATE aqo_queries SET use_aqo = false, learn_aqo = false, auto_tuning = false;

If you want to disable AQO for all queries temporarily, in the current session or for the whole cluster, without removing or changing the collected statistics and settings, use the disabled mode:

SET aqo.mode = 'disabled';

or

ALTER SYSTEM SET aqo.mode = 'disabled';

Limitations

Note that the extension doesn't work with any kind of temporary objects, because for query normalization AQO uses the internal OIDs of objects, and these differ between dynamically created objects even when their names are equal. That is why the 'intelligent', 'learn' and 'forced' AQO modes cannot be used as the system-wide setting when such objects appear in the workload. In this case you can set aqo.mode = 'controlled' globally and switch to another aqo.mode inside a transaction to store settings for the queries without temporary objects.
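
The per-transaction switch can use SET LOCAL, which is standard PostgreSQL and reverts automatically at transaction end:

BEGIN;
SET LOCAL aqo.mode = 'learn';   -- reverts to the cluster-wide 'controlled' at COMMIT
<query without temporary objects>
COMMIT;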

The extension doesn't collect statistics on replicas, because replicas are read-only. Nevertheless, a binary replica may use the query execution statistics collected on the master. A version that overcomes the replica limitations is coming soon.

'learn' and 'intelligent' modes are not suited to per-cluster use with workloads whose query structure is dynamically generated, because they memorize all normalized query hashes, and those hashes differ for every query in such a workload. Dynamically generated constants are fine.

License

© Postgres Professional, 2016-2022. Licensed under The PostgreSQL License.

Reference

The paper on the proposed method is still under development, but a draft version with experiments is available here.


Issues

One of the Join Order Benchmark queries stops reducing cardinality error

In one case (and one case only), the cardinality error reaches a stable value without progressing towards 0. Here are the query and the plan after convergence. I also have the states of the aqo tables, which I can forward on request.

Here is one line of the EXPLAIN output showing a non-converged estimated vs. actual row count:

Hash Join (cost=96459.24..130001.90 rows=68198 width=29) (actual time=973.541..1415.390 rows=198750 loops=3)

The query:

EXPLAIN (analyze,verbose,buffers) SELECT MIN(chn.name) AS character, MIN(t.title) AS movie_with_american_producer FROM char_name AS chn, cast_info AS ci, company_name AS cn, company_type AS ct, movie_companies AS mc, role_type AS rt, title AS t WHERE ci.note like '%(producer)%' AND cn.country_code = '[us]' AND t.production_year > 1990 AND t.id = mc.movie_id AND t.id = ci.movie_id AND ci.movie_id = mc.movie_id AND chn.id = ci.person_role_id AND rt.id = ci.role_id AND cn.id = mc.company_id AND ct.id = mc.company_type_id;

The text format plan:

QUERY PLAN
Finalize Aggregate (cost=242898.15..242898.16 rows=1 width=64) (actual time=6689.777..6689.777 rows=1 loops=1)
Output: min(chn.name), min(t.title)
Buffers: shared hit=23037955
-> Gather (cost=242897.93..242898.14 rows=2 width=64) (actual time=6689.740..6689.811 rows=3 loops=1)
Output: (PARTIAL min(chn.name)), (PARTIAL min(t.title))
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=23037955
-> Partial Aggregate (cost=241897.93..241897.94 rows=1 width=64) (actual time=6670.368..6670.368 rows=1 loops=3)
Output: PARTIAL min(chn.name), PARTIAL min(t.title)
Buffers: shared hit=23037955
Worker 0: actual time=6663.069..6663.069 rows=1 loops=1
Buffers: shared hit=7855588
Worker 1: actual time=6658.810..6658.810 rows=1 loops=1
Buffers: shared hit=7632929
-> Nested Loop (cost=96460.50..241897.93 rows=1 width=33) (actual time=2452.152..6670.319 rows=3 loops=3)
Output: chn.name, t.title
Inner Unique: true
Buffers: shared hit=23037955
Worker 0: actual time=2371.185..6663.041 rows=2 loops=1
Buffers: shared hit=7855588
Worker 1: actual time=2539.235..6658.735 rows=4 loops=1
Buffers: shared hit=7632929
-> Nested Loop (cost=96460.36..241897.77 rows=1 width=37) (actual time=2452.140..6670.293 rows=3 loops=3)
Output: chn.name, ci.role_id, t.title
Inner Unique: true
Buffers: shared hit=23037933
Worker 0: actual time=2371.169..6663.020 rows=2 loops=1
Buffers: shared hit=7855583
Worker 1: actual time=2539.220..6658.703 rows=4 loops=1
Buffers: shared hit=7632920
-> Nested Loop (cost=96460.23..241897.62 rows=1 width=41) (actual time=2452.113..6670.247 rows=3 loops=3)
Output: chn.name, ci.role_id, mc.company_type_id, t.title
Inner Unique: true
Buffers: shared hit=23037911
Worker 0: actual time=2371.132..6662.976 rows=2 loops=1
Buffers: shared hit=7855578
Worker 1: actual time=2539.188..6658.652 rows=4 loops=1
Buffers: shared hit=7632911
-> Nested Loop (cost=96459.80..199048.11 rows=89456 width=29) (actual time=973.601..6596.097 rows=260701 loops=3)
Output: ci.person_role_id, ci.role_id, mc.company_type_id, t.title
Join Filter: (t.id = ci.movie_id)
Buffers: shared hit=23037869
Worker 0: actual time=968.453..6589.125 rows=263841 loops=1
Buffers: shared hit=7855569
Worker 1: actual time=976.696..6582.546 rows=264690 loops=1
Buffers: shared hit=7632894
-> Hash Join (cost=96459.24..130001.90 rows=68198 width=29) (actual time=973.541..1415.390 rows=198750 loops=3)
Output: mc.movie_id, mc.company_type_id, t.title, t.id
Inner Unique: true
Hash Cond: (mc.movie_id = t.id)
Buffers: shared hit=270485
Worker 0: actual time=968.404..1435.629 rows=203090 loops=1
Buffers: shared hit=90069
Worker 1: actual time=976.625..1420.650 rows=197823 loops=1
Buffers: shared hit=90479
-> Hash Join (cost=6990.11..39505.27 rows=391430 width=8) (actual time=49.950..343.319 rows=384599 loops=3)
Output: mc.movie_id, mc.company_type_id
Inner Unique: true
Hash Cond: (mc.company_id = cn.id)
Buffers: shared hit=54971
Worker 0: actual time=49.496..350.360 rows=393049 loops=1
Buffers: shared hit=18231
Worker 1: actual time=49.708..348.897 rows=389949 loops=1
Buffers: shared hit=18641
-> Parallel Seq Scan on public.movie_companies mc (cost=0.00..29661.37 rows=1087137 width=12) (actual time=0.143..82.673 rows=869710 loop
s=3)
Output: mc.id, mc.movie_id, mc.company_id, mc.company_type_id, mc.note
Buffers: shared hit=37478
Worker 0: actual time=0.111..81.862 rows=860027 loops=1
Buffers: shared hit=12390
Worker 1: actual time=0.136..85.262 rows=892083 loops=1
Buffers: shared hit=12800
-> Hash (cost=5932.46..5932.46 rows=84612 width=4) (actual time=49.380..49.380 rows=84843 loops=3)
Output: cn.id
Buckets: 131072 Batches: 1 Memory Usage: 4007kB
Buffers: shared hit=17433
Worker 0: actual time=48.959..48.959 rows=84843 loops=1
Buffers: shared hit=5811
Worker 1: actual time=49.206..49.206 rows=84843 loops=1
Buffers: shared hit=5811
-> Seq Scan on public.company_name cn (cost=0.00..5932.46 rows=84612 width=4) (actual time=0.110..35.890 rows=84843 loops=3)
Output: cn.id
Filter: ((cn.country_code)::text = '[us]'::text)
Rows Removed by Filter: 150154
Buffers: shared hit=17433
Worker 0: actual time=0.058..35.900 rows=84843 loops=1
Buffers: shared hit=5811
Worker 1: actual time=0.126..35.363 rows=84843 loops=1
Buffers: shared hit=5811
-> Hash (cost=67601.90..67601.90 rows=1749378 width=21) (actual time=914.259..914.259 rows=1749032 loops=3)
Output: t.title, t.id
Buckets: 2097152 Batches: 1 Memory Usage: 110019kB
Buffers: shared hit=215514
Worker 0: actual time=909.719..909.719 rows=1749032 loops=1
Buffers: shared hit=71838
Worker 1: actual time=917.516..917.516 rows=1749032 loops=1
Buffers: shared hit=71838
-> Seq Scan on public.title t (cost=0.00..67601.90 rows=1749378 width=21) (actual time=0.099..446.134 rows=1749032 loops=3)
Output: t.title, t.id
Filter: (t.production_year > 1990)
Rows Removed by Filter: 779280
Buffers: shared hit=215514
Worker 0: actual time=0.117..418.654 rows=1749032 loops=1
Buffers: shared hit=71838
Worker 1: actual time=0.059..457.994 rows=1749032 loops=1
Buffers: shared hit=71838
-> Index Scan using movie_id_cast_info on public.cast_info ci (cost=0.56..1.00 rows=1 width=12) (actual time=0.015..0.026 rows=1 loops=596250)
Output: ci.id, ci.person_id, ci.movie_id, ci.person_role_id, ci.note, ci.nr_order, ci.role_id
Index Cond: (ci.movie_id = mc.movie_id)
Filter: (ci.note ~~ '%(producer)%'::text)
Rows Removed by Filter: 34
Buffers: shared hit=22767384
Worker 0: actual time=0.015..0.025 rows=1 loops=203090
Buffers: shared hit=7765500
Worker 1: actual time=0.015..0.026 rows=1 loops=197823
Buffers: shared hit=7542415
-> Index Scan using char_name_pkey on public.char_name chn (cost=0.43..0.48 rows=1 width=20) (actual time=0.000..0.000 rows=0 loops=782104)
Output: chn.id, chn.name, chn.imdb_index, chn.imdb_id, chn.name_pcode_nf, chn.surname_pcode, chn.md5sum
Index Cond: (chn.id = ci.person_role_id)
Buffers: shared hit=42
Worker 0: actual time=0.000..0.000 rows=0 loops=263841
Buffers: shared hit=9
Worker 1: actual time=0.000..0.000 rows=0 loops=264690
Buffers: shared hit=17
-> Index Only Scan using company_type_pkey on public.company_type ct (cost=0.13..0.15 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=10)
Output: ct.id
Index Cond: (ct.id = mc.company_type_id)
Heap Fetches: 4
Buffers: shared hit=22
Worker 0: actual time=0.019..0.019 rows=1 loops=2
Buffers: shared hit=5
Worker 1: actual time=0.010..0.010 rows=1 loops=4
Buffers: shared hit=9
-> Index Only Scan using role_type_pkey on public.role_type rt (cost=0.14..0.15 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=10)
Output: rt.id
Index Cond: (rt.id = ci.role_id)
Heap Fetches: 4
Buffers: shared hit=22
Worker 0: actual time=0.008..0.008 rows=1 loops=2
Buffers: shared hit=5
Worker 1: actual time=0.006..0.006 rows=1 loops=4
Buffers: shared hit=9
Planning time: 11.438 ms
Execution time: 6698.929 ms
Plan Hash: -1203362165
Using aqo: true
(146 rows)

DockerHub Distribution

It would be great to have a ready-to-run Docker image. This would lower the hurdle to getting started with aqo.

ERROR: could not find block containing chunk

The error above is raised when attempting to EXPLAIN ANALYZE the statement shown below while it is being managed by aqo. If aqo.mode is set to 'disabled', the error no longer appears and the EXPLAIN ANALYZE completes successfully. If aqo.mode is subsequently reset, the error reappears. This is currently reproducible.

In case there is some sort of interaction here, I'll mention that pg_stat_statements and apg_plan_mgmt were also overriding the ExecutorEnd hook in this test, with shared_preload_libraries listing them in the order pg_stat_statements,aqo,apg_plan_mgmt.

imdbload=# EXPLAIN (ANALYZE) SELECT MIN(n.name) AS member_in_charnamed_movie FROM cast_info AS ci, company_name AS cn, keyword AS k, movie_companies AS mc, movie_keyword AS mk, name AS n, title AS t WHERE k.keyword ='character-name-in-title' AND n.name LIKE '%Bert%' AND n.id = ci.person_id AND ci.movie_id = t.id AND t.id = mk.movie_id AND mk.keyword_id = k.id AND t.id = mc.movie_id AND mc.company_id = cn.id AND ci.movie_id = mc.movie_id AND ci.movie_id = mk.movie_id AND mc.movie_id = mk.movie_id;
ERROR: could not find block containing chunk 0x2b625af3cd80
imdbload=# set aqo.mode = 'disable';
ERROR: invalid value for parameter "aqo.mode": "disable"
HINT: Available values: intelligent, forced, controlled, learn, disabled.
imdbload=# set aqo.mode = 'disabled';
SET
imdbload=# EXPLAIN (ANALYZE) SELECT MIN(n.name) AS member_in_charnamed_movie FROM cast_info AS ci, company_name AS cn, keyword AS k, movie_companies AS mc, movie_keyword AS mk, name AS n, title AS t WHERE k.keyword ='character-name-in-title' AND n.name LIKE '%Bert%' AND n.id = ci.person_id AND ci.movie_id = t.id AND t.id = mk.movie_id AND mk.keyword_id = k.id AND t.id = mc.movie_id AND mc.company_id = cn.id AND ci.movie_id = mc.movie_id AND ci.movie_id = mk.movie_id AND mc.movie_id = mk.movie_id;
QUERY PLAN
Aggregate (cost=2938.32..2938.33 rows=1 width=32) (actual time=4747.865..4747.865 rows=1 loops=1)
-> Nested Loop (cost=2.71..2938.31 rows=1 width=15) (actual time=26.390..4744.284 rows=11538 loops=1)
-> Nested Loop (cost=2.28..2937.87 rows=1 width=27) (actual time=26.380..4718.974 rows=11538 loops=1)
-> Nested Loop (cost=1.86..2937.43 rows=1 width=31) (actual time=26.365..4683.451 rows=11538 loops=1)
-> Nested Loop (cost=1.43..2936.94 rows=1 width=23) (actual time=26.357..4668.745 rows=1972 loops=1)
-> Nested Loop (cost=1.00..2698.36 rows=528 width=12) (actual time=0.653..1276.244 rows=1038393 loops=1)
-> Nested Loop (cost=0.43..2662.48 rows=34 width=4) (actual time=0.644..39.356 rows=41840 loops=1)
-> Seq Scan on keyword k (cost=0.00..2626.12 rows=1 width=4) (actual time=0.628..10.811 rows=1 loops=1)
Filter: (keyword = 'character-name-in-title'::text)
Rows Removed by Filter: 134169
-> Index Scan using keyword_id_movie_keyword on movie_keyword mk (cost=0.43..36.02 rows=34 width=8) (actual time=0.012..22.448 rows=41840 loops=1)
Index Cond: (keyword_id = k.id)
-> Index Scan using movie_id_cast_info on cast_info ci (cost=0.56..0.91 rows=15 width=8) (actual time=0.004..0.026 rows=25 loops=41840)
Index Cond: (movie_id = mk.movie_id)
-> Index Scan using name_pkey on name n (cost=0.43..0.45 rows=1 width=19) (actual time=0.003..0.003 rows=0 loops=1038393)
Index Cond: (id = ci.person_id)
Filter: (name ~~ '%Bert%'::text)
Rows Removed by Filter: 1
-> Index Scan using movie_id_movie_companies on movie_companies mc (cost=0.43..0.47 rows=2 width=8) (actual time=0.004..0.006 rows=6 loops=1972)
Index Cond: (movie_id = ci.movie_id)
-> Index Only Scan using company_name_pkey on company_name cn (cost=0.42..0.44 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=11538)
Index Cond: (id = mc.company_id)
Heap Fetches: 11538
-> Index Only Scan using title_pkey on title t (cost=0.43..0.45 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=11538)
Index Cond: (id = ci.movie_id)
Heap Fetches: 11538
Planning time: 6.762 ms
Execution time: 4747.965 ms
(28 rows)
imdbload=#

Getting join cardinality convergence on a difficult query

The Join Order Benchmark contains the difficult query shown below. AQO improved the cardinality estimates enough to bring this plan from 40 seconds down to 30 seconds on an AWS r4.large, but as you can see from the plan below, the estimates at the upper levels of the plan are still quite far from the actuals after more than 10 iterations.

If you are able to get this plan to converge, can you describe how you were able to do it? What is the best known plan for this query?

I tried adding a pg_hint_plan Rows hint at one point to see whether it would help aqo get past a stuck point, but it made the plan much worse (60 sec), and the plan again failed to converge from that point after multiple iterations. In general, is it helpful or harmful to try to "help" aqo by specifying a Rows hint?

Thank you,

/Jim Finnerty

================================

imdbload=# explain (analyze) SELECT MIN(n.name) AS voicing_actress, MIN(t.title) AS jap_engl_voiced_movie FROM aka_name AS an, char_name AS chn, cast_info AS ci, company_name AS cn, info_type AS it, movie_companies AS mc, movie_info AS mi, name AS n, role_type AS rt, title AS t WHERE ci.note in ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') AND cn.country_code ='[us]' AND it.info = 'release dates' AND n.gender ='f' AND rt.role ='actress' AND t.production_year > 2000 AND t.id = mi.movie_id AND t.id = mc.movie_id AND t.id = ci.movie_id AND mc.movie_id = ci.movie_id AND mc.movie_id = mi.movie_id AND mi.movie_id = ci.movie_id AND cn.id = mc.company_id AND it.id = mi.info_type_id AND n.id = ci.person_id AND rt.id = ci.role_id AND n.id = an.person_id AND ci.person_id = an.person_id AND chn.id = ci.person_role_id;
QUERY PLAN

Finalize Aggregate (cost=320321.28..320321.29 rows=1 width=64) (actual time=30947.238..30947.238 rows=1 loops=1)
-> Gather (cost=320321.06..320321.27 rows=2 width=64) (actual time=26312.904..30948.645 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial Aggregate (cost=319321.06..319321.07 rows=1 width=64) (actual time=27533.687..27533.687 rows=1 loops=3)
-> Hash Join (cost=266398.43..315653.05 rows=733602 width=32) (actual time=14696.038..26887.515 rows=586882 loops=3)
Hash Cond: (mi.info_type_id = it.id)
-> Nested Loop (cost=266396.00..315336.23 rows=115441 width=36) (actual time=14695.834..25412.125 rows=3447230 loops=3)
Join Filter: (t.id = mi.movie_id)
-> Nested Loop (cost=266395.57..307771.28 rows=3908 width=44) (actual time=14695.811..17634.932 rows=88661 loops=3)
-> Nested Loop (cost=266395.14..300561.72 rows=13115 width=52) (actual time=14694.376..17106.971 rows=32440 loops=3)
-> Nested Loop (cost=266394.71..294819.73 rows=7332 width=33) (actual time=14694.349..16690.333 rows=32462 loops=3)
-> Nested Loop (cost=266394.28..285195.83 rows=14323 width=37) (actual time=14694.315..16280.735 rows=34797 loops=3)
-> Merge Join (cost=266393.85..271532.82 rows=26327 width=16) (actual time=14694.278..15562.276 rows=63185 loops=3)
Merge Cond: (mc.movie_id = ci.movie_id)
-> Sort (cost=91113.37..92107.56 rows=397675 width=4) (actual time=2670.659..2864.283 rows=384599 loops=3)
Sort Key: mc.movie_id
Sort Method: external merge Disk: 4920kB
-> Hash Join (cost=7342.99..48688.15 rows=397675 width=4) (actual time=262.085..1904.540 rows=384599 loops=3)
Hash Cond: (mc.company_id = cn.id)
-> Parallel Seq Scan on movie_companies mc (cost=0.00..29661.37 rows=1087137 width=8) (actual time=0.224..667.425 rows=869710 loops=3)
-> Hash (cost=5932.46..5932.46 rows=85962 width=4) (actual time=260.817..260.817 rows=84843 loops=3)
Buckets: 131072 Batches: 2 Memory Usage: 2525kB
-> Seq Scan on company_name cn (cost=0.00..5932.46 rows=85962 width=4) (actual time=0.197..194.391 rows=84843 loops=3)
Filter: ((country_code)::text = '[us]'::text)
Rows Removed by Filter: 150154
-> Materialize (cost=175280.28..176661.11 rows=276166 width=12) (actual time=12018.184..12343.888 rows=305714 loops=3)
-> Sort (cost=175280.28..175970.70 rows=276166 width=12) (actual time=12018.176..12199.636 rows=276140 loops=3)
Sort Key: ci.movie_id
Sort Method: external merge Disk: 5888kB
-> Nested Loop (cost=0.56..145600.04 rows=276166 width=12) (actual time=0.708..11363.055 rows=276166 loops=3)
-> Seq Scan on role_type rt (cost=0.00..1.15 rows=1 width=4) (actual time=0.027..0.033 rows=1 loops=3)
Filter: ((role)::text = 'actress'::text)
Rows Removed by Filter: 11
-> Index Scan using role_id_cast_info on cast_info ci (cost=0.56..142837.23 rows=276166 width=16) (actual time=0.676..11244.881 rows=276166 loops=3)
Index Cond: (role_id = rt.id)
Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)","(voice: English version)"}'::text[]))
Rows Removed by Filter: 7175807
-> Index Scan using title_pkey on title t (cost=0.43..0.52 rows=1 width=21) (actual time=0.010..0.010 rows=1 loops=189554)
Index Cond: (id = mc.movie_id)
Filter: (production_year > 2000)
Rows Removed by Filter: 0
-> Index Only Scan using char_name_pkey on char_name chn (cost=0.43..0.67 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=104391)
Index Cond: (id = ci.person_role_id)
Heap Fetches: 27791
-> Index Scan using name_pkey on name n (cost=0.43..0.78 rows=1 width=19) (actual time=0.011..0.011 rows=1 loops=97386)
Index Cond: (id = ci.person_id)
Filter: ((gender)::text = 'f'::text)
Rows Removed by Filter: 0
-> Index Only Scan using person_id_aka_name on aka_name an (cost=0.42..0.52 rows=3 width=4) (actual time=0.010..0.014 rows=3 loops=97320)
Index Cond: (person_id = n.id)
Heap Fetches: 55315
-> Index Scan using movie_id_movie_info on movie_info mi (cost=0.43..1.45 rows=39 width=8) (actual time=0.013..0.061 rows=39 loops=265983)
Index Cond: (movie_id = mc.movie_id)
-> Hash (cost=2.41..2.41 rows=1 width=4) (actual time=0.058..0.058 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4) (actual time=0.023..0.052 rows=1 loops=3)
Filter: ((info)::text = 'release dates'::text)
Rows Removed by Filter: 112
Planning time: 203.974 ms
Execution time: 30954.839 ms
SQL Hash: -1864849711, Plan Hash: 1163570348
Using aqo: true
(63 rows)

First-time user question - setting up and running aqo

The following plan shows a large cardinality estimation error that AQO isn't able to improve. Is this hitting a known limitation of AQO?

There is also an index-based plan that delivers the order needed by the aggregate with a lower startup cost; it should have been selected in this case because of the LIMIT 1.

The plan:

Limit (cost=6479.18..6498.46 rows=1 width=4) (actual time=51.979..51.979 rows=1 loops=1)
-> Subquery Scan on tmp (cost=6479.18..6498.46 rows=1 width=4) (actual time=51.978..51.978 rows=1 loops=1)
Filter: ((tmp.pstn_vrsn_nbr = tmp.a) AND (((tmp.prty_1_trd_rfrnc_id = 'MQa4147f9c7ac809116532482232ee0630'::text) AND (tmp.prty_1_id = 'EQP2317'::text)) OR ((tmp.prty_2_trd_rfrnc_id = 'MQ7867600a47151071bd01bc6c71c0f4d9'::text) AND (tmp.prty_2_id = 'EQP2895'::text))))
-> WindowAgg (cost=6479.18..6487.62 rows=482 width=96) (actual time=51.974..51.974 rows=1 loops=1)
-> Sort (cost=6479.18..6480.39 rows=482 width=92) (actual time=51.962..51.962 rows=2 loops=1)
Sort Key: pos.dtcc_trd_ref_id
Sort Method: quicksort Memory: 102kB
-> Gather (cost=1000.00..6457.70 rows=482 width=92) (actual time=0.343..51.517 rows=519 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on "position" pos (cost=0.00..5409.50 rows=201 width=92) (actual time=0.314..41.780 rows=173 loops=3)
Filter: ((prty_1_id = 'EQP2317'::text) OR (prty_2_id = 'EQP2895'::text))
Rows Removed by Filter: 83160
Planning time: 0.747 ms
Execution time: 53.444 ms
Using aqo: true
(16 rows)

The SQL:

explain analyze
SELECT 1
FROM
(SELECT
dtcc_trd_ref_id,
pstn_vrsn_nbr,
prty_1_trd_rfrnc_id,
prty_1_id,
prty_2_trd_rfrnc_id,
prty_2_id ,
MAX(pstn_vrsn_nbr) OVER (PARTITION BY dtcc_trd_ref_id) AS a
FROM ddl_position.position pos
WHERE (prty_1_id = 'EQP2317' or prty_2_id = 'EQP2895')
) as tmp
WHERE
pstn_vrsn_nbr = a
AND ((prty_1_trd_rfrnc_id = 'MQa4147f9c7ac809116532482232ee0630' AND prty_1_id = 'EQP2317') OR
(prty_2_trd_rfrnc_id = 'MQ7867600a47151071bd01bc6c71c0f4d9' AND prty_2_id = 'EQP2895')
)
limit 1 ;

query_id usage in bundle of AQO and pg_stat_statements (PGSS) extensions

In stable14 and beyond we intend to use the same query_id in both extensions.
While adding AQO TAP-tests for this feature, we found the following issues:

  1. AQO stores non-parameterized query texts.
  2. In the case of EXPLAIN ANALYZE, the query_id values in the two extensions differ.
  3. We should also check the case of queries inside stored procedures.

AQO state for IndexScan doesn't contain Index column

explore_query_plan.sql.zip

Index Scan using person_pkey on person (cost=0.29..3226.29 rows=9738 width=8) (actual rows=9738 loops=1) (AQO: cardinality=9738, error=0%, fss hash = -1150030445)

SELECT * FROM aqo_data WHERE fsspace_hash=-1150030445;
fspace_hash | fsspace_hash | nfeatures | features | targets
-------------+--------------+-----------+----------+---------
1916361181 | -1150030445 | 0 | | {0.01}

Index Only Scan using person_id_age_idx on public.person (cost=0.29..8.31 rows=1 width=8) (actual rows=1 loops=10) (AQO: cardinality=1, error=0%, fss hash = 1972255378)

SELECT * FROM aqo_data WHERE fsspace_hash=1972255378;
fspace_hash | fsspace_hash | nfeatures | features | targets
-------------+--------------+-----------+-------------------------+---------
1916361181 | 1972255378 | 1 | {{-11.512925464970229}} | {0}

WARNING: Unexpected number of features for hash

The following warnings were issued while explain-analyzing the query shown below from the Join Order Benchmark. This also caused postgres to crash. I'll delete the data and retry.

imdbload=# explain (hashes, verbose, analyze, buffers) SELECT MIN(n.name) AS voicing_actress, MIN(t.title) AS jap_engl_voiced_movie FROM aka_name AS an, char_name AS chn, cast_info AS ci, company_name AS cn, info_type AS it, movie_companies AS mc, movie_info AS mi, name AS n, role_type AS rt, title AS t WHERE ci.note in ('(voice)', '(voice: Japanese version)', '(voice) (uncredited)', '(voice: English version)') AND cn.country_code ='[us]' AND it.info = 'release dates' AND n.gender ='f' AND rt.role ='actress' AND t.production_year > 2000 AND t.id = mi.movie_id AND t.id = mc.movie_id AND t.id = ci.movie_id AND mc.movie_id = ci.movie_id AND mc.movie_id = mi.movie_id AND mi.movie_id = ci.movie_id AND cn.id = mc.company_id AND it.id = mi.info_type_id AND n.id = ci.person_id AND rt.id = ci.role_id AND n.id = an.person_id AND ci.person_id = an.person_id AND chn.id = ci.person_role_id;
WARNING: unexpected number of features for hash (-101880755, -778540527): expected 0 features, obtained 8
WARNING: unexpected number of features for hash (-101880755, -1173058144): expected 7 features, obtained 6
WARNING: unexpected number of features for hash (-101880755, 1726969303): expected 9 features, obtained 8
WARNING: unexpected number of features for hash (-101880755, 1200086649): expected 8 features, obtained 6

How can I move aqo data from one DB to another?

Our DB has a huge number of queries. For some, Postgres makes good plans; for others, bad ones.
I want to use aqo on a copy of the database. Once it has gathered enough data to fix only the query plans that need it, I want to copy the aqo data to the production DB.
But I only see views named aqo*. Only views?
Can I copy the views to another DB? And how?

How can I use this extension with Postgres running in Docker?

Hello there! First, thanks for creating this repository; it looks very promising and I can't wait to test it out.

I have some trouble understanding the steps to get going, especially when using Postgres in a Docker container. So I have some questions:

For now I'm using an official image (postgres:13.4-buster); is it possible to install the extension in that container? Or do I need to create my own Dockerfile that builds from the official image with some additional steps?

If so, what are the steps required? I'm sure I'm not the only one using Postgres in Docker; maybe it would be a good idea to cover this in the README? What do you think?

Thanks!

License?

There's no license information for this project, which makes it rather hard to use for people who work in an environment where these things are of concern.

Could you add a license file stating the terms under which this code is released?

Document dependency of aqo and pg_hint_plan

The relationship between pg_hint_plan Rows hints and aqo should be explored and documented, and if there is a dependency on the order of aqo and pg_hint_plan in shared_preload_libraries, that should be documented as well.

Cannot build aqo

I'm trying to build aqo on PostgreSQL 12.5, but the build always fails with an error like the one in the attached picture.
The steps in README.md were followed strictly.

[image attachment: build error]

Hook calling order

The copy_generic_path_info_hook currently performs its own action first and only at the end checks whether a prev_copy_generic_path_info_hook was defined, calling that previous action last if so. The test and call of the previous hook should occur at the start instead, so that the order of execution follows the order specified in shared_preload_libraries when more than one hook override is installed.

A sequence of hook callbacks following the call-my-predecessor-first convention would call the hooks in the order specified in shared_preload_libraries.

AQO service storage problems

One of the main aqo problems is its storage (the aqo_* relations). The extension writes the query and the learning result during the (sub)transaction, before and after query execution. This induces some problems:

  1. While writing to the service relations we change the CommandCounter state. This cannot happen in parallel mode (we currently disable aqo in that case). Also, changing the CommandCounter state during the execution of partitioning triggers leads to an infinite cycle of invalidation messages (this can be demonstrated with the alter_table.sql regression test). And there are some other problems.
  2. Parameterized plans have the same hash value. This induces aqo-related deadlocks that would not occur without aqo learning, meaning that under high load we need to use AQO_MODE_FORCED mode and train aqo before the high-load period.

Recommended way to delete queries

What is the recommended way to delete queries that a user doesn't want to track any more? Do you recommend just deleting unwanted rows from aqo_query_texts and aqo_queries, or is there an API for that?
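
For reference, the manual approach this question describes would look like the sketch below (the source does not confirm an official API; the table names follow the aqo_* convention used above):

DELETE FROM aqo_queries     WHERE query_hash = <query_hash>;
DELETE FROM aqo_query_texts WHERE query_hash = <query_hash>;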
