ioguix / pgsql-bloat-estimation Goto Github PK

View Code? Open in Web Editor NEW

497.0 27.0 92.0 28 KB

Queries to mesure statistical bloat in indexes and tables for PostgreSQL

License: BSD 2-Clause "Simplified" License

pgsql-bloat-estimation's Introduction

pgsql-bloat-estimation

Queries to mesure statistical bloat in btree indexes and tables for PostgreSQL.

Three different kind of non used space should be considered:

the alignment padding: depending on the type, PostgreSQL adds some padding to your fields to align them correctly in the row. This is related to some CPU manipulation optimisation.
the fillfactor: this allows you to set up a ratio of free space to keep in your tables or indexes. See the PostgreSQL documentation for more information
the bloat itself: this is the extra space not needed by the table or the index to keep your rows. This should be mapped and under control by autovacuum and/or your vacuum maintenance procedure.

ToC:

Bloat in Tables
Bloat in btree indexes
Caveats

Bloat in Tables

The queries from the "table" folder estimate the bloat for tables. They expose these fields:

current_database: name of the current database.
schemaname: schema of the table.
tblname: the table name.
real_size: real size of the table.
extra_size: estimated extra size not used/needed in the table. This extra size is composed by the fillfactor, bloat and alignment padding spaces.
extra_pct: estimated percentage of the real size used by extra_size.
fillfactor: the fillfactor of the table.
bloat_size: estimated size of the bloat without the extra space kept for the fillfactor.
bloat_pct: estimated percentage of the real size used by bloat_size.
is_na: is the estimation "Not Applicable" ? If true, do not trust the stats.

AS 7.4, 8.0 and 8.1 do not have fillfactor, extra_size, extra_pct and bloat_size are not reported.

Bloat in btree indexes

The queries from the "btree" folder estimate the bloat for btree indexes. They expose these fields:

current_database: name of the current database
schemaname: schema of the table
tblname: the table name
idxname: the index name
real_size: real size of the index
extra_size: estimated extra size not used/needed by the index. This extra size is composed by the fillfactor, bloat and alignment padding spaces.
extra_pct: estimated percentage of the real size used by extra_size.
fillfactor: the fillfactor of the index.
bloat_size: estimated size of the bloat without the extra space kept for the fillfactor.
bloat_pct: estimated percentage of the real size used by bloat_size.
is_na: is the estimation "Not Applicable" ? If true, do not trust the stats.

The query in file btree_bloat-superuser.sql is much faster than other one. However, it must be executed by a superuser role only.

Caveats

The `is_na` column

This field allows you to filter out statistics considered wrong by the query itself. Just uncomment the WHERE clause.

This excludes:

any table or index using the name type. Statistics for this type are not correlated to its space use, leading to wrong statistics. A lot of relations from pg_catalog reports negative stats because of this.
any table which lacks some statistics. For example, the json type had no statistics available before 9.5

Toasted fields

PostgreSQL has an internal mechanism to compress/slice large values from large rows outside of the heap space in a relation called TOAST. See the following page for more informations: https://www.postgresql.org/docs/current/storage-toast.html

Depending on your values, some fields might be toasted and some other not among the same table. Unfortunately, there's no statistics available about how much data has been toasted away from the heap, how much has been compressed inside the heap, their real size, etc. In short, the average field statistics in heap do not consider if the value is a toast pointer or a real value.

Because of this, statistics on variable length fields might be largely underestimated, even leading to negative bloat for some tables.

There's no way to include such situation in the existing is_na column.

Alignment padding

Unfortunately, as it is not possible to compute the space wasted by the alignment paddings, it is always reported in the bloat fields. Sometime, this space can takes up to 10% or more of the table size. See the chapter "The alignment deviation" from this page for more information.

This means you can estimate this space by running the query on non-bloated table. The bloat fields will then only report this alignment padding space. For large table, you can sample it in a smaller table of 100 pages or so, keeping the same field order. The bloat estimation query will report the same average space wasted by alignment padding from this table.

Size of tables/indexes

Small table or indexes (few pages) will certainly reports high bloat percentage. Each pages beeing 8kB, the less you have rows to fill them, the smaller they are, the more you will have natural spaces in there.

As example, if you need 100 rows to fill one page and your table have 150 rows, your table will be on 2 pages, 16kB. The second page having only 50 rows, You'll have a natural bloat of 4kB, 25% of your table.

pgsql-bloat-estimation's People

Contributors

Stargazers

Watchers

Forkers

larcara feikesteenbergen michaelcrobinson syaroslavtsev fabriziomello dleeman bspindler alexeyklyukin elrendir95 ryninho sammcj cdsboy blunney1 davidgoussev emerichunter postingres tilkow jcott28 logbue-dataxu anayrat frankiee cabecada lcavadas mitchellspryn vladislavepolyakov pgiraud mgiay singhai0 ratheep drake-adl paulmallon deepdiver13 jm-g zhuomingliang d0k1 mvh ryefccd jaa-aa ikyaqoob apermir mcardinalupgrade hunleyd ahmedalhulaibi venumadhavvennu daamien thomaspoty simbataisa apeksharma rustprooflabs steschas madtibo wlokes nziebart jobinau missive guanqingtao gerdansantos aalokinstitute dacofr rajalakshmi-vidispine vitek0585 vibhorkumar123 ayodelec dazuiba008 maizarrahman jonathanperret kzaorski bradnicholson mechos3d ablignaut caddleman kristysiu-finn salilpworkz acidbluebriggs brezencat hds1999 tszgc shangzixie jdedb evgenyek nballance danielabelski taltaf913 rjhalter986 bolajiwahab glauciobb igor-mineev abzelev zhaoyanjie001

pgsql-bloat-estimation's Issues

replacing JOIN pg_catalog.pg_stats with LEFT JOIN

Currently, the script joins with pg_stats (and limits infos about the index only if a corresponding entry is found in pgstats). How about if we do a left join instead? then we need to adjust the code to avoid division by zero and we are done.

-- WARNING: executed with a non-superuser role, the query inspect only index on tables you are granted to read.
-- WARNING: rows with is_na = 't' are known to have bad statistics ("name" type is not supported).
-- This query is compatible with PostgreSQL 8.2 and after
SELECT current_database(), nspname AS schemaname, tblname, idxname, bs*(relpages)::bigint AS real_size,
  bs*(relpages-est_pages)::bigint AS extra_size,
  100 * (relpages-est_pages)::float / relpages AS extra_ratio,
  fillfactor,
  CASE WHEN relpages > est_pages_ff
    THEN bs*(relpages-est_pages_ff)
    ELSE 0
  END AS bloat_size,
  100 * (case when relpages > est_pages_ff then relpages-est_pages_ff else 0 end)::float / relpages AS bloat_ratio,
  is_na
  -- , 100-(pst).avg_leaf_density AS pst_avg_bloat, est_pages, index_tuple_hdr_bm, maxalign, pagehdr, nulldatawidth, nulldatahdrwidth, reltuples, relpages -- (DEBUG INFO)
FROM (
  SELECT 
      coalesce(1 +
         ceil(reltuples/greatest(floor(greatest(bs-pageopqdata-pagehdr,1)/(4+nulldatahdrwidth)::float), 1)), 0 -- ItemIdData size + computed avg size of a tuple (nulldatahdrwidth)
      ) AS est_pages,
      coalesce(1 +
         ceil(reltuples/greatest(floor(greatest(bs-pageopqdata-pagehdr,1)*fillfactor/(100*(4+nulldatahdrwidth)::float)), 1)), 0
      ) AS est_pages_ff,
      bs, nspname, tblname, idxname, relpages, fillfactor, is_na
  FROM (
      SELECT maxalign, bs, nspname, tblname, idxname, reltuples, relpages, idxoid, fillfactor,
            ( index_tuple_hdr_bm +
                maxalign - CASE -- Add padding to the index tuple header to align on MAXALIGN
                  WHEN index_tuple_hdr_bm%maxalign = 0 THEN maxalign
                  ELSE index_tuple_hdr_bm%maxalign
                END
              + nulldatawidth + maxalign - CASE -- Add padding to the data to align on MAXALIGN
                  WHEN nulldatawidth = 0 THEN 0
                  WHEN nulldatawidth::integer%maxalign = 0 THEN maxalign
                  ELSE nulldatawidth::integer%maxalign
                END
            )::numeric AS nulldatahdrwidth, pagehdr, pageopqdata, is_na
            -- , index_tuple_hdr_bm, nulldatawidth -- (DEBUG INFO)
      FROM (
          SELECT n.nspname, i.tblname, i.idxname, i.reltuples, i.relpages,
              i.idxoid, i.fillfactor, current_setting('block_size')::numeric AS bs,
              CASE -- MAXALIGN: 4 on 32bits, 8 on 64bits (and mingw32 ?)
                WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8
                ELSE 4
              END AS maxalign,
              /* per page header, fixed size: 20 for 7.X, 24 for others */
              24 AS pagehdr,
              /* per page btree opaque data */
              16 AS pageopqdata,
              /* per tuple header: add IndexAttributeBitMapData if some cols are null-able */
              CASE WHEN max(coalesce(s.null_frac,0)) = 0
                  THEN 2 -- IndexTupleData size
                  ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
              END AS index_tuple_hdr_bm,
              /* data len: we remove null values save space using it fractionnal part from stats */
              sum( (1-coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 1024)) AS nulldatawidth,
              max( CASE WHEN i.atttypid = 'pg_catalog.name'::regtype THEN 1 ELSE 0 END ) > 0 AS is_na
          FROM (
              SELECT ct.relname AS tblname, ct.relnamespace, ic.idxname, ic.attpos, ic.indkey, ic.indkey[ic.attpos], ic.reltuples, ic.relpages, ic.tbloid, ic.idxoid, ic.fillfactor,
                  coalesce(a1.attnum, a2.attnum) AS attnum, coalesce(a1.attname, a2.attname) AS attname, coalesce(a1.atttypid, a2.atttypid) AS atttypid,
                  CASE WHEN a1.attnum IS NULL
                  THEN ic.idxname
                  ELSE ct.relname
                  END AS attrelname
              FROM (
                  SELECT idxname, reltuples, relpages, tbloid, idxoid, fillfactor, indkey,
                      pg_catalog.generate_series(1,indnatts) AS attpos
                  FROM (
                      SELECT ci.relname AS idxname, ci.reltuples, ci.relpages, i.indrelid AS tbloid,
                          i.indexrelid AS idxoid,
                          coalesce(substring(
                              array_to_string(ci.reloptions, ' ')
                              from 'fillfactor=([0-9]+)')::smallint, 90) AS fillfactor,
                          i.indnatts,
                          pg_catalog.string_to_array(pg_catalog.textin(
                              pg_catalog.int2vectorout(i.indkey)),' ')::int[] AS indkey
                      FROM pg_catalog.pg_index i
                      JOIN pg_catalog.pg_class ci ON ci.oid = i.indexrelid
                      WHERE ci.relam=(SELECT oid FROM pg_am WHERE amname = 'btree')
                      AND ci.relpages > 0
                  ) AS idx_data
              ) AS ic
              JOIN pg_catalog.pg_class ct ON ct.oid = ic.tbloid
              LEFT JOIN pg_catalog.pg_attribute a1 ON
                  ic.indkey[ic.attpos] <> 0
                  AND a1.attrelid = ic.tbloid
                  AND a1.attnum = ic.indkey[ic.attpos]
              LEFT JOIN pg_catalog.pg_attribute a2 ON
                  ic.indkey[ic.attpos] = 0
                  AND a2.attrelid = ic.idxoid
                  AND a2.attnum = ic.attpos
            ) i
            JOIN pg_catalog.pg_namespace n ON n.oid = i.relnamespace
             left JOIN pg_catalog.pg_stats s ON s.schemaname = n.nspname
                                      AND s.tablename = i.attrelname
                                      AND s.attname = i.attname
            GROUP BY 1,2,3,4,5,6,7,8,9,10,11
      ) AS rows_data_stats
  ) AS rows_hdr_pdg_stats
) AS relation_stats
--ORDER BY nspname, tblname, idxname;
order by real_size desc

Use live_tuples instead of reltuples

See discussion: https://www.postgresql.org/message-id/flat/16db4468-edfa-830a-f921-39a50498e77e%402ndquadrant.com#[email protected]

B-tree bloat estimation in PostgreSQL 13

Hi there,
I've faced with an issue while trying to estimate b-tree indexes bloat on newest PostgreSQL 13.
Old sql-queries gives irrelevant result on the same database while with PG 12 they work fine.
Could you please tell me do you have any plans to maintain compatibility with PG 13?

Thank you in advance

Toasted text

As discussed today on daliconsultants:

In pg_stats, toasted data is exactly 18 bytes when data is totally toasted. So the query computes that the bloat is higher than real.

Perhaps is it possible to exclude text columns with 18.0 bytes and an extended/external storage ? That would not solve the case for columns with very different sizes, where avg_width is a mix of 18 and real sizes.

Column says "ratio" but actually reports percentage

The extra_ratio and bloat_ratio columns appear to not be ratios or proportions but rather percentages.

When fillfactor is 100%, bloat size is a negative value

according to the following line,
"Line 24 : FROM '%fillfactor=#"__#"%' FOR '#')::smallint, 100) AS fillfactor,",
It returns 10 when the fillfactor's value is 100.
So, there is a case where bloat size has negative value.
I wonder why you did not parse it to three digits.
Thanks.

Why `IndexTupleData` size is 2

Hi, I noticed when you compute the b-tree bloat, the IndexTupleData size is 2. But I couldn't understand why the size of t_tid + t_info is 2 after I read the source code the 35th line in src/include/access/itup.h?

your sql:

 CASE WHEN max(coalesce(s.stanullfrac,0)) = 0
                  THEN 2 -- IndexTupleData size
                  ELSE 2 + (( 32 + 8 - 1 ) / 8) -- IndexTupleData size + IndexAttributeBitMapData size ( max num filed per index + 8 - 1 /8)
              END AS index_tuple_hdr_bm,

the source code:

typedef struct IndexTupleData
{
	ItemPointerData t_tid;		/* reference TID to heap tuple */

	/* ---------------
	 * t_info is laid out in the following fashion:
	 *
	 * 15th (high) bit: has nulls
	 * 14th bit: has var-width attributes
	 * 13th bit: AM-defined meaning
	 * 12-0 bit: size of tuple
	 * ---------------
	 */

	unsigned short t_info;		/* various info about tuple*/

} IndexTupleData;				/* MORE DATA FOLLOWS AT END OF STRUCT */

Deduplicated B-tree

I think(?) these queries don't account for B-tree deduplication in PostgreSQL 13+.

That would lead to an underestimate of bloat for indices with duplicated values.

Question : is_na column for jsonb in postgreSQL 9.5

Hi @ioguix ,
Thank you for this useful tools.

I run table/table_bloat.sql against RDS Postgres instance running postgres 9.5.15.

Reading the output, only pg_catalog.pg_description table have is_na value f

Reading the README I found

any table which lacks some statistics. For example, the json type had no statistics available before 9.5

Checking my table schema I found that one of the column use jsonb type.

I'm confused is the is_na value in ouput is correct because my table use jsonb data type so the calculation is not correct and cannot be used as reference or the is_na value is incorrect since I use postgresql 9.5.15.

Thank you in advance.

Alignement issues

As far as I understand the query, alignement is not taken care of. It is probably a bit far of the actual concern of the query but it might prove handy to tell users: such columns ordering means you lose that much space on disk because of alignment.

Why do you assume 4 tuples per TOAST page?

Where do you get that logic from? Trying to understand the calculations ...

exclude table with unavailable stats

Some type has no avg length in stats, eg. json.

Mark table where some stats are missing as N/A.

`real_size` in `pgsql-bloat-estimation/btree/btree_bloat.sql`

according to the definition of real_size: the real size of the index. the real_size is one-page size * the number of pages, but the 4th code, real_size = bs*(relpages). the relpages is Size of the on-disk representation of this table in pages (of size BLCKSZ), bs is also the size.

bloat_ratio isn't well-documented

I'm new to this stuff. The docs say

bloat_ratio: estimated ratio of the real size used by bloat_size.

Should this read "percentage of the real table size estimated to be bloat"? So a value of 50.2 means "50.2%"?

(thanks for sharing these queries!)

Table bloat query failing on PG 12

Hi, currently table bloat query breaks due to removal of pg_class.relhasoids.
I think best would be to just remove this part as it's been a deprecated feature for long time so only precision for system catalogs could suffer a bit, but they should be small anyways.

`table_bloat.sql` might fail with division by zero

metric table_bloat.sql fails with division by zero

Details: cybertec-postgresql/pgwatch2#468

Big bloat discrepencies compared to pgstattuple

Using the table bloat checker here I get this on my database (PG 14):

 current_database | schemaname |          tblname           |  real_size  | extra_size  |     extra_pct      | fillfactor | bloat_size  |     bloat_pct      | is_na
------------------+------------+----------------------------+-------------+-------------+--------------------+------------+-------------+--------------------+-------
 cc_perf          | cc_merge   | ccx_ex_contactdrpservices  | 29351370752 | 14890582016 | 50.732151972784294 |        100 | 14890582016 | 50.732151972784294 | f

But with pgstattuple I get this:
SELECT tuple_percent, dead_tuple_percent, free_percent FROM pgstattuple('cc_merge.ccx_ex_contactdrpservices');

 tuple_percent | dead_tuple_percent | free_percent
---------------+--------------------+--------------
         94.65 |                  0 |         3.68

Why the big discrepency?