Code Monkey home page Code Monkey logo

zson's Introduction

ZSON

ZSON Logo

About

ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents (not only keys, but also values, array elements, etc).

In some cases ZSON can save half of your disk space and give you about 10% more TPS. Memory is saved as well. See docs/benchmark.md. Everything depends on your data and workload, though. Don't believe any benchmarks, re-check everything on your data, configuration, hardware, workload and PostgreSQL version.

ZSON was originally created in 2016 by Postgres Professional team: researched and coded by Aleksander Alekseev; ideas, code review, testing, etc by Alexander Korotkov and Teodor Sigaev.

See also discussions on pgsql-general@, Hacker News, Reddit and HabraHabr.

Install

Build and install ZSON:

cd /path/to/zson/source/code
make
sudo make install

Run tests:

make installcheck

Connect to PostgreSQL:

psql my_database

Enable extension:

create extension zson;

Uninstall

Disable extension:

drop extension zson;

Uninstall ZSON:

cd /path/to/zson/source/code
sudo make uninstall

Usage

First ZSON should be trained on common data using zson_learn procedure:

zson_learn(
    tables_and_columns text[][],
    max_examples int default 10000,
    min_length int default 2,
    max_length int default 128,
    min_count int default 2
)

Example:

select zson_learn('{{"table1", "col1"}, {"table2", "col2"}}');

You can create a temporary table and write some common JSONB documents into it manually or use the existing tables. The idea is to provide a subset of real data. Let's say some document type is twice as frequent as another document type. ZSON expects that there will be twice as many documents of the first type as those of the second one in a learning set.

Resulting dictionary could be examined using this query:

select * from zson_dict;

Now ZSON type could be used as a complete and transparent replacement of JSONB type:

zson_test=# create table zson_example(x zson);
CREATE TABLE

zson_test=# insert into zson_example values ('{"aaa": 123}');
INSERT 0 1

zson_test=# select x -> 'aaa' from zson_example;
-[ RECORD 1 ]-
?column? | 123

Migrating to a new dictionary

When a schema of JSONB documents evolves ZSON could be re-learned:

select zson_learn('{{"table1", "col1"}, {"table2", "col2"}}');

This time second dictionary will be created. Dictionaries are cached in memory so it will take about a minute before ZSON realizes that there is a new dictionary. After that old documents will be decompressed using the old dictionary and new documents will be compressed and decompressed using the new dictionary.

To find out which dictionary is used for a given ZSON document use zson_info procedure:

zson_test=# select zson_info(x) from test_compress where id = 1;
-[ RECORD 1 ]---------------------------------------------------
zson_info | zson version = 0, dict version = 1, ...

zson_test=# select zson_info(x) from test_compress where id = 2;
-[ RECORD 1 ]---------------------------------------------------
zson_info | zson version = 0, dict version = 0, ...

If all ZSON documents are migrated to the new dictionary the old one could be safely removed:

delete from zson_dict where dict_id = 0;

In general, it's safer to keep old dictionaries just in case. Gaining a few KB of disk space is not worth the risk of losing data.

When it's a time to re-learn?

Unfortunately, it's hard to recommend a general approach.

A good heuristic could be:

select pg_table_size('tt') / (select count(*) from tt)

... i.e. average document size. When it suddenly starts to grow it's time to re-learn.

However, developers usually know when they change a schema significantly. It's also easy to re-check whether the current schema differs a lot from the original one using zson_dict table.

Known limitations

Intalling ZSON in a schema other than public is not supported (i.e. CREATE EXTENSION zson WITH SCHEMA ...).

zson's People

Contributors

afiskon avatar arssher avatar ildus avatar ligurio avatar madislohmus avatar olshevskiy87 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

zson's Issues

pg_config: command not found and missing magic block

Hi,
When postgresql compiled from source and installed into default dirs pg_config not accessible from PATH. My suggestion below. One can easily change pg_config location in Makefile and there won't be no warnings.
I suggest to modify Makefile:

  1. Instead of
    PGXS := $(shell pg_config --pgxs)
    do
    PG_CONFIG = pg_config
    PGXS := $(shell $(PG_CONFIG) --pgxs)
  2. Add OBJS = zson.o. I don't understand why it's removed. Without it i can compile extension but get error when try to install it
    incompatible library "/usr/local/pgsql/lib/zson.so": missing magic block

pg_dump doesnt work

pg_dump -Fc -v bittrex > newtry3
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
pg_dump: reading user-defined access methods
pg_dump: reading user-defined operator classes
pg_dump: reading user-defined operator families
pg_dump: reading user-defined text search parsers
pg_dump: reading user-defined text search templates
pg_dump: reading user-defined text search dictionaries
pg_dump: reading user-defined text search configurations
pg_dump: reading user-defined foreign-data wrappers
pg_dump: reading user-defined foreign servers
pg_dump: reading default privileges
pg_dump: reading user-defined collations
pg_dump: reading user-defined conversions
pg_dump: reading type casts
pg_dump: reading transforms
pg_dump: reading table inheritance information
pg_dump: reading event triggers
pg_dump: finding extension tables
pg_dump: finding inheritance relationships
pg_dump: reading column info for interesting tables
pg_dump: finding the columns and types of table "public.test"
pg_dump: finding the columns and types of table "public.zson_dict"
pg_dump: finding default expressions of table "public.zson_dict"
pg_dump: flagging inherited columns in subtables
pg_dump: reading indexes
pg_dump: reading indexes for table "public.zson_dict"
pg_dump: flagging indexes in partitioned tables
pg_dump: reading extended statistics
pg_dump: reading constraints
pg_dump: reading triggers
pg_dump: reading rewrite rules
pg_dump: reading policies
pg_dump: reading row security enabled for table "public.test"
pg_dump: reading policies for table "public.test"
pg_dump: reading row security enabled for table "public.zson_dict_dict_id_seq"
pg_dump: reading policies for table "public.zson_dict_dict_id_seq"
pg_dump: reading row security enabled for table "public.zson_dict"
pg_dump: reading policies for table "public.zson_dict"
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading publication membership for table "public.test"
pg_dump: reading subscriptions
pg_dump: reading large objects
pg_dump: reading dependency data
pg_dump: saving encoding = SQL_ASCII
pg_dump: saving standard_conforming_strings = on
pg_dump: saving search_path =
pg_dump: saving database definition
pg_dump: dumping contents of table "public.test"
pg_dump: Dumping the contents of table "test" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: relation "zson_dict" does not exist
LINE 1: select word_id, word from zson_dict where dict_id = $1 order...
^
QUERY: select word_id, word from zson_dict where dict_id = $1 order by word
pg_dump: The command was: COPY public.test (id, datetime, bid, ask, last, volume, orderbook) TO stdout;
<

pg_dump doesnt work with postgres 11.1 and zson.

Actual space consumption higher despite repetitive nature

Hello.
I am trying to evaluate ZSON to store some metadata (as one of many columns in PG table), that should be very well suited for ZSON

  • table has 200k rows
  • json column uses 50MB of disk space, average size is 327 (some are null)
  • dev environment - data are very repetitive (see later), i'm not fiddling with prod ofc.

Steps:

  1. Performed steps from Install section
  2. Ran zson_learn with defaults on a jsonb column of a table. The whole dictionary consists of 28 rows.
  3. Altered table to add a column of zson type
  4. Ran update table, to set zson_column to be jsonb_to_zson(original_column)

The problem is, that according to pg_column_size, the zson column is 7% larger compared to jsonb type.
Also, if I run zson_info, the result is no different
zson version = 0, dict version = 0, jsonb size = 327, zson size (without pglz compression) = 349 (106.73%)
zson version = 0, dict version = 0, jsonb size = 324, zson size (without pglz compression) = 362 (111.73%)
zson version = 0, dict version = 0, jsonb size = 331, zson size (without pglz compression) = 349 (105.44%)

The json looks lite this:
{ "variables": [ { "type": "ComplexValue", "variableCode": "oneOfVeryFewKeyTypes", "variableValue": [ { "type": "StringValue", "variableCode": "otherOfVeryFewKeyTypes", "variableValue": "oneOf20Values" }, { "type": "StringValue", "variableCode": "otherOfVeryFewKeyTypes", "variableValue": "anotherOf20Values" } ] } ], "schemaVersion": "v4" }

What I am doing wrong?
Thanks in advance.

Extension does not load on Postgres 12

postgres=# CREATE EXTENSION zson;
ERROR:  could not load library "/usr/lib/postgresql/12/lib/zson.so": /usr/lib/postgresql/12/lib/zson.so: undefined symbol: PG_RETURN_JSONB

I believe PG_RETURN_JSONB needs to be switched out to PG_RETURN_JSONB_P, and PG_GETARG_JSONB switched out for PG_GETARG_JSONB_P (I see a similar change made in another extension)

Extension is not relocatable

SQL needs to reference @extschema@ to be relocatable. Need to update references to table zson_dict in C. Currently zson assumes all calls are made within the same schema.

Anything like this for just text?

I'm just wondering if there is anything like this with regard to the column but just for text. That is, capable of compression across different rows with the same value?

JSON schema validation

Does zson work with any json schema validation extension, like is_jsonb_valid or postgres_json_schema or some other extension? (preferably the former)

Is this only possible by converting to jsonb? If yes, does it makes sense for this extension to also provide a validation function?

PostgreSQL 13 support

Hey guys! Do you plan to support the project? For example, the extension works with PostgreSQL up to 11 version, but current is already 13.

Create extension fails

Using Docker image postgres:10.8. I install it this way.

apt-get update
apt-get install -y gcc git vim postgresql-server-dev-all postgresql-common
# This rm is required so that pg_config doesn't refer to the wrong directory.
rm -f -r /usr/lib/postgresql/11 
cd /tmp
git clone https://github.com/postgrespro/zson.git zson
cd zson
make
make install
make installcheck

installcheck fails because it's trying to connect using sockets. This is probably OK? However, when I psql to the server and try to create the extension I get this:

postgres=# create extension zson;
ERROR:  could not open extension control file "/usr/local/share/postgresql/extension/zson.control": No such file or directory

When I search the file, this is where I find it

root@d76a8502d69c:/tmp/zson# find / -name zson.control
/tmp/zson/zson.control
/usr/share/postgresql/10/extension/zson.control

Usage example doesn't work

Hiiii, I installed this extension on pg15 and tried the usage example given in the README. I encountered the following error.... not sure if I missed something important or it is that ZSON doesn't work on 15.

tsitamani=# create table zson_example(x zson);
CREATE TABLE

tsitamani=# select * from zson_example ;
 x
---
(0 rows)


tsitamani=# insert into zson_example values ('{"aaa": 123}');
ERROR:  Unable to compress jsonb
LINE 1: insert into zson_example values ('{"aaa": 123}');
                                         ^
DETAIL:  zson_dict is not initialized
HINT:  You probably forgot to execute zson_learn(). In case you are restoring from a backup made via pg_dump, just move the content of zson_dict table above the current line.

Seeing the HINT, I then execute zson_learn, but that doesn't help either, which make sense because the example table is still empty.

tsitamani=# select zson_learn('{{"zson_example", "x"}}');
                                  zson_learn
-------------------------------------------------------------------------------
 Done! Run " select * from zson_dict where dict_id = 0; " to see a dictionary.
(1 row)


tsitamani=# insert into zson_example values ('{"aaa": 123}');
ERROR:  Unable to compress jsonb
LINE 1: insert into zson_example values ('{"aaa": 123}');
                                         ^
DETAIL:  zson_dict is not initialized
HINT:  You probably forgot to execute zson_learn(). In case you are restoring from a backup made via pg_dump, just move the content of zson_dict table above the current line.

Windows installation?

How are we supposed to install this extension under Windows since we cannot use make?

Create Git tags and GitHub releases

Please create create Git tags and GitHub releases to allow subscribers of services like sibbell.com to be automatically notified.

Thank you very much for this great extension, transparent JSONB compression should have been built-in from the beginning, another milestone to make Postgres the most powerful and fastest NoSQL datastore for many use cases, kudos!

pg_dump fails on PG 15.2

pg_dump: error: Dumping the contents of table "test_compress" failed: PQgetResult() failed.
pg_dump: detail: Error message from server: ERROR:  relation "zson_dict" does not exist
LINE 1: select word_id, word from zson_dict where dict_id = $1 order...
                                  ^
QUERY:  select word_id, word from zson_dict where dict_id = $1 order by word
pg_dump: detail: Command was: COPY public.test_compress (id, x) TO stdout;

per report by @Object905

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.