Code Monkey home page Code Monkey logo

rum's Introduction

Build Status PGXN version GitHub license

Postgres Professional

RUM - RUM access method

Introduction

The rum module provides an access method to work with a RUM index. It is based on the GIN access method's code.

A GIN index allows performing fast full-text search using tsvector and tsquery types. But full-text search with a GIN index has several problems:

  • Slow ranking. It needs positional information about lexemes to do ranking. A GIN index doesn't store positions of lexemes. So after index scanning, we need an additional heap scan to retrieve lexeme positions.
  • Slow phrase search with a GIN index. This problem relates to the previous problem. It needs positional information to perform phrase search.
  • Slow ordering by timestamp. A GIN index can't store some related information in the index with lexemes. So it is necessary to perform an additional heap scan.

RUM solves these problems by storing additional information in a posting tree. For example, positional information of lexemes or timestamps. You can get an idea of RUM with the following diagram:

How RUM stores additional information

A drawback of RUM is that it has slower build and insert times than GIN. This is because we need to store additional information besides keys and because RUM uses generic Write-Ahead Log (WAL) records.

License

This module is available under the license similar to PostgreSQL.

Installation

Before building and installing rum, you should ensure following are installed:

  • PostgreSQL version is 9.6+.

Typical installation procedure may look like this:

Using GitHub repository

$ git clone https://github.com/postgrespro/rum
$ cd rum
$ make USE_PGXS=1
$ make USE_PGXS=1 install
$ make USE_PGXS=1 installcheck
$ psql DB -c "CREATE EXTENSION rum;"

Using PGXN

$ USE_PGXS=1 pgxn install rum

Important: Don't forget to set the PG_CONFIG variable in case you want to test RUM on a custom build of PostgreSQL. Read more here.

Common operators and functions

The rum module provides next operators.

Operator Returns Description
tsvector <=> tsquery float4 Returns distance between tsvector and tsquery.
timestamp <=> timestamp float8 Returns distance between two timestamps.
timestamp <=| timestamp float8 Returns distance only for left timestamps.
timestamp |=> timestamp float8 Returns distance only for right timestamps.

The last three operations also work for types timestamptz, int2, int4, int8, float4, float8, money and oid.

Operator classes

rum provides the following operator classes.

rum_tsvector_ops

For type: tsvector

This operator class stores tsvector lexemes with positional information. It supports ordering by the <=> operator and prefix search. See the example below.

Let us assume we have the table:

CREATE TABLE test_rum(t text, a tsvector);

CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_rum
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');

INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');
INSERT INTO test_rum(t) VALUES ('It is a beautiful');
INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');

To create the rum index we need create an extension:

CREATE EXTENSION rum;

Then we can create new index:

CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);

And we can execute the following queries:

SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
    FROM test_rum
    WHERE a @@ to_tsquery('english', 'beautiful | place')
    ORDER BY a <=> to_tsquery('english', 'beautiful | place');
                t                |  rank
---------------------------------+---------
 It looks like a beautiful place | 8.22467
 The situation is most beautiful | 16.4493
 It is a beautiful               | 16.4493
(3 rows)

SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
    FROM test_rum
    WHERE a @@ to_tsquery('english', 'place | situation')
    ORDER BY a <=> to_tsquery('english', 'place | situation');
                t                |  rank
---------------------------------+---------
 The situation is most beautiful | 16.4493
 It looks like a beautiful place | 16.4493
(2 rows)

rum_tsvector_hash_ops

For type: tsvector

This operator class stores a hash of tsvector lexemes with positional information. It supports ordering by the <=> operator. It doesn't support prefix search.

rum_TYPE_ops

For types: int2, int4, int8, float4, float8, money, oid, time, timetz, date, interval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, timestamp, timestamptz

Supported operations: <, <=, =, >=, > for all types and <=>, <=| and |=> for int2, int4, int8, float4, float8, money, oid, timestamp and timestamptz types.

This operator supports ordering by the <=>, <=| and |=> operators. It can be used with rum_tsvector_addon_ops, rum_tsvector_hash_addon_ops' and rum_anyarray_addon_ops` operator classes.

rum_tsvector_addon_ops

For type: tsvector

This operator class stores tsvector lexemes with any supported by module field. See the example below.

Let us assume we have the table:

CREATE TABLE tsts (id int, t tsvector, d timestamp);

\copy tsts from 'rum/data/tsts.data'

CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d)
    WITH (attach = 'd', to = 't');

Now we can execute the following queries:

EXPLAIN (costs off)
    SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit
   ->  Index Scan using tsts_idx on tsts
         Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
         Order By: (d <=> 'Mon May 16 14:21:25 2016'::timestamp without time zone)
(4 rows)

SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
 id  |                d                |   ?column?
-----+---------------------------------+---------------
 355 | Mon May 16 14:21:22.326724 2016 |      2.673276
 354 | Mon May 16 13:21:22.326724 2016 |   3602.673276
 371 | Tue May 17 06:21:22.326724 2016 |  57597.326724
 406 | Wed May 18 17:21:22.326724 2016 | 183597.326724
 415 | Thu May 19 02:21:22.326724 2016 | 215997.326724
(5 rows)

Warning: Currently RUM has bogus behaviour when one creates an index using ordering over pass-by-reference additional information. This is due to the fact that posting trees have fixed length right bound and fixed length non-leaf posting items. It isn't allowed to create such indexes.

rum_tsvector_hash_addon_ops

For type: tsvector

This operator class stores a hash of tsvector lexemes with any supported by module field.

It doesn't support prefix search.

rum_tsquery_ops

For type: tsquery

It stores branches of query tree in additional information. For example, we have the table:

CREATE TABLE query (q tsquery, tag text);

INSERT INTO query VALUES ('supernova & star', 'sn'),
    ('black', 'color'),
    ('big & bang & black & hole', 'bang'),
    ('spiral & galaxy', 'shape'),
    ('black & hole', 'color');

CREATE INDEX query_idx ON query USING rum(q);

Now we can execute the following fast query:

SELECT * FROM query
    WHERE to_tsvector('black holes never exists before we think about them') @@ q;
        q         |  tag
------------------+-------
 'black'          | color
 'black' & 'hole' | color
(2 rows)

rum_anyarray_ops

For type: anyarray

This operator class stores anyarray elements with length of the array. It supports operators &&, @>, <@, =, % operators. It also supports ordering by <=> operator. For example, we have the table:

CREATE TABLE test_array (i int2[]);

INSERT INTO test_array VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');

CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);

Now we can execute the query using index scan:

SET enable_seqscan TO off;

EXPLAIN (COSTS OFF) SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC;
                QUERY PLAN
------------------------------------------
 Index Scan using idx_array on test_array
   Index Cond: (i && '{1}'::smallint[])
   Order By: (i <=> '{1}'::smallint[])
(3 rows

SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC;
     i
-----------
 {1}
 {1,2}
 {1,2,3}
 {1,2,3,4}
(4 rows)

rum_anyarray_addon_ops

For type: anyarray

This operator class stores anyarray elements with any supported by module field.

Todo

  • Allow multiple additional information (lexemes positions + timestamp).
  • Improve ranking function to support TF/IDF.
  • Improve insert time.
  • Improve GENERIC WAL to support shift (PostgreSQL core changes).

Authors

Alexander Korotkov [email protected] Postgres Professional Ltd., Russia

Oleg Bartunov [email protected] Postgres Professional Ltd., Russia

Teodor Sigaev [email protected] Postgres Professional Ltd., Russia

Arthur Zakirov [email protected] Postgres Professional Ltd., Russia

Pavel Borisov [email protected] Postgres Professional Ltd., Russia

Maxim Orlov [email protected] Postgres Professional Ltd., Russia

rum's People

Contributors

0xflotus avatar anna-akenteva avatar artur-kink avatar df7cb avatar feodor avatar funbringer avatar funny-falcon avatar kovdb75 avatar kulaginm avatar maksm90 avatar pashkinelfe avatar shadforth avatar sokolcati avatar vbwagner avatar za-arthur avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

rum's Issues

Not seeing huge performance gain - could be my query?

Postgresql 9.6.4
Windows Server 2016 with 32GB ram / SSD

160K hits, Rum index query time: 80+ seconds

When querying a table partition with 500K rows for a search term that returns 160K hits I am not seeing a large performance increase using rum tsvector <=> vs gin ts_rank. tsvector column includes weights. Each document is pretty large, similar to Wikipedia. They are all legal documents.

Judging by explain (analyze, buffers) output, looks like majority of time is reading and not sorting. Could this be why I am not seeing performance gain? Also, output shows there's still a bitmap heap scan performed when searching the RUM index.

I clear shared buffers and restart Postgresql (thanks https://stackoverflow.com/a/43186594/812610) so there's nothing in memory.

Appreciate any help and thanks for putting RUM on github!

RUM tsvector column: tsv_natural_rum

Index: CREATE INDEX code_docs_fl_tsv_natural_rum_idx on code_docs_fl using rum (tsv_natural_rum rum_tsvector_ops)

Query:

SELECT id, node_id, 
            ts_headline('mcc_natural_config', title, query, 'highlightall=true') title, 
	    ts_headline('mcc_natural_config', content, query, 'maxfragments=1') fragment, 
            ancestors, product_id, product_name, client_name, state_abbr, 
 	    0 AS full_count, --window function for getting full count here but removed it for simplicity
            tsv_natural_rum <=> query AS rank 
FROM code_docs_fl, 
           (select ts_rewrite(plainto_tsquery('mcc_natural_config',  'code'), 'SELECT target, sub FROM 
                aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target') query) query
WHERE tsv_natural_rum @@ query
ORDER BY tsv_natural_rum <=> query
LIMIT 10 OFFSET 0

Result:

Count: 159,613
Limit  (cost=9279.80..9284.95 rows=10 width=435) (actual time=81780.797..82289.912 rows=10 loops=1)
  Buffers: shared hit=314429 read=190532
  ->  Result  (cost=9279.80..10576.57 rows=2518 width=435) (actual time=81780.786..82289.862 rows=10 loops=1)
        Buffers: shared hit=314429 read=190532
        ->  Sort  (cost=9279.80..9286.10 rows=2518 width=829) (actual time=81764.315..81764.391 rows=10 loops=1)
              Sort Key: ((code_docs_fl.tsv_natural_rum <=> (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text))))
              Sort Method: top-N heapsort  Memory: 35kB
              Buffers: shared hit=314405 read=190481
              ->  Nested Loop  (cost=43.52..9225.39 rows=2518 width=829) (actual time=174.173..80714.142 rows=159613 loops=1)
                    Buffers: shared hit=314402 read=190481
                    ->  Result  (cost=0.00..0.26 rows=1 width=32) (actual time=2.055..2.057 rows=1 loops=1)
                          Buffers: shared hit=56 read=4
                    ->  Bitmap Heap Scan on code_docs_fl  (cost=43.52..9193.65 rows=2518 width=821) (actual time=163.860..18193.512 rows=159613 loops=1)
                          Recheck Cond: (tsv_natural_rum @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                          Heap Blocks: exact=68933
                          Buffers: shared hit=4 read=69085
                          ->  Bitmap Index Scan on code_docs_fl_tsv_natural_rum_idx  (cost=0.00..42.89 rows=2518 width=0) (actual time=125.595..125.595 rows=159613 loops=1)
                                Index Cond: (tsv_natural_rum @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                                Buffers: shared read=156
Planning time: 276.156 ms
Execution time: 82299.866 ms

Gin TS_Rank_CD tsvector column: tsv_natural (exact same contents as tsv_natural_rum)

Index: gin index on column tsv_natural

Query:

SELECT id, node_id, 
            ts_headline('mcc_natural_config', title, query, 'highlightall=true') title, 
	    ts_headline('mcc_natural_config', content, query, 'maxfragments=1') fragment, 
            ancestors, product_id, product_name, client_name, state_abbr, 
 	    0 AS full_count, --window function for getting full count here but removed it for simplicity
            ts_rank_cd(tsv_natural, query, 4|1) AS rank 
FROM code_docs_fl, 
           (select ts_rewrite(plainto_tsquery('mcc_natural_config',  'code'), 'SELECT target, sub FROM 
                aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target') query) query
WHERE tsv_natural @@ query
ORDER BY rank DESC
LIMIT 10 OFFSET 0

Result:

Count: 159,613
Limit  (cost=10959.80..10964.95 rows=10 width=435) (actual time=104597.190..104597.762 rows=10 loops=1)
  Buffers: shared hit=415488 read=255421 dirtied=11754 written=3715
  ->  Result  (cost=10959.80..12256.57 rows=2518 width=435) (actual time=104597.180..104597.728 rows=10 loops=1)
        Buffers: shared hit=415488 read=255421 dirtied=11754 written=3715
        ->  Sort  (cost=10959.80..10966.10 rows=2518 width=829) (actual time=104597.046..104597.061 rows=10 loops=1)
              Sort Key: (ts_rank_cd(code_docs_fl.tsv_natural, (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)), 5)) DESC
              Sort Method: top-N heapsort  Memory: 27kB
              Buffers: shared hit=415488 read=255421 dirtied=11754 written=3715
              ->  Nested Loop  (cost=1723.52..10905.39 rows=2518 width=829) (actual time=535.603..103393.362 rows=159613 loops=1)
                    Buffers: shared hit=415485 read=255421 dirtied=11754 written=3715
                    ->  Result  (cost=0.00..0.26 rows=1 width=32) (actual time=1.971..1.972 rows=1 loops=1)
                          Buffers: shared hit=55 read=5
                    ->  Bitmap Heap Scan on code_docs_fl  (cost=1723.52..10873.65 rows=2518 width=976) (actual time=525.526..25184.048 rows=159613 loops=1)
                          Recheck Cond: (tsv_natural @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                          Heap Blocks: exact=119510
                          Buffers: shared hit=1 read=120034 dirtied=7722 written=1293
                          ->  Bitmap Index Scan on code_docs_fl_tsv_natural_idx  (cost=0.00..1722.89 rows=2518 width=0) (actual time=450.870..450.870 rows=322447 loops=1)
                                Index Cond: (tsv_natural @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                                Buffers: shared hit=1 read=524
Planning time: 255.426 ms
Execution time: 104601.713 ms

make USE_PGXS=1 ends with error 1

Hi,
when I run "make USE_PGXS=1" it ends with "Error 1".

My env:

  • psql (PostgreSQL) 9.6.0,
  • gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-4),
  • centos-release-7-2.1511.el7.centos.2.10.x86_64

Here is the full text:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
-fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_ADJ=0 -fpic -I. -I./ -I/usr/pgsql-9.6/include/server -I/usr/pgsql-9.6/include/internal -D_GNU_SOURCE -I/usr/include/lib
xml2  -I/usr/include  -c -o rumsort.o rumsort.c
In file included from /usr/pgsql-9.6/include/server/utils/probes.h:10:0,
                 from rumsort.c:136:
rumsort.c: In function ‘rum_tuplesort_begin_rum’:
rumsort.c:1160:9: error: ‘enforceUnique’ undeclared (first use in this function)
         enforceUnique,
         ^
rumsort.c:1159:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
rumsort.c:1160:9: note: each undeclared identifier is reported only once for each function it appears in
         enforceUnique,
         ^
rumsort.c:1159:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
/usr/pgsql-9.6/include/server/utils/probes.h:285:1: error: first argument to ‘__builtin_choose_expr’ not a constant
 DTRACE_PROBE5 (postgresql, sort__start, arg1, arg2, arg3, arg4, arg5)
 ^
rumsort.c:1159:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
/usr/pgsql-9.6/include/server/utils/probes.h:285:1: error: first argument to ‘__builtin_choose_expr’ not a constant
 DTRACE_PROBE5 (postgresql, sort__start, arg1, arg2, arg3, arg4, arg5)
 ^
rumsort.c:1159:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^                                                                                                                                                                                                                                  [2/1925]
/usr/pgsql-9.6/include/server/utils/probes.h:285:1: error: first argument to ‘__builtin_choose_expr’ not a constant
 DTRACE_PROBE5 (postgresql, sort__start, arg1, arg2, arg3, arg4, arg5)
 ^
rumsort.c:1159:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
rumsort.c: In function ‘rum_tuplesort_begin_rumkey’:
rumsort.c:1193:9: error: ‘enforceUnique’ undeclared (first use in this function)
         enforceUnique,
         ^
rumsort.c:1192:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
/usr/pgsql-9.6/include/server/utils/probes.h:285:1: error: first argument to ‘__builtin_choose_expr’ not a constant
 DTRACE_PROBE5 (postgresql, sort__start, arg1, arg2, arg3, arg4, arg5)
 ^
rumsort.c:1192:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
/usr/pgsql-9.6/include/server/utils/probes.h:285:1: error: first argument to ‘__builtin_choose_expr’ not a constant
 DTRACE_PROBE5 (postgresql, sort__start, arg1, arg2, arg3, arg4, arg5)
 ^
rumsort.c:1192:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
/usr/pgsql-9.6/include/server/utils/probes.h:285:1: error: first argument to ‘__builtin_choose_expr’ not a constant
 DTRACE_PROBE5 (postgresql, sort__start, arg1, arg2, arg3, arg4, arg5)
 ^
rumsort.c:1192:2: note: in expansion of macro ‘TRACE_POSTGRESQL_SORT_START’
  TRACE_POSTGRESQL_SORT_START(INDEX_SORT,
  ^
make: *** [rumsort.o] Error 1

got errors when build by msys2

i have trying to build postgres and rum by msys2.

so i installed msys2 and clone newest postgres code and build it successfully.

but when i build rum, i got these errors:

src/rumutil.o:rumutil.c:(.text+0x267): undefined reference to __imp_RumFuzzySearchLimit' src/rumutil.o:rumutil.c:(.text+0x2ce): undefined reference to __imp_RumArraySimilarityThreshold'
src/rumutil.o:rumutil.c:(.text+0x337): undefined reference to `__imp_RumArraySimilarityFunction'
collect2.exe: error: ld returned 1 exit status

Did i miss something?

Can't install without USE_PGXN=1

Hello!
I have the following issue.

➜ ~ sudo pgxn install rum

INFO: best version: rum 1.1.0
INFO: saving /tmp/tmpYlffn2/rum-1.1.0.zip
INFO: unpacking: /tmp/tmpYlffn2/rum-1.1.0.zip
INFO: building extension
Makefile:31: ../../src/Makefile.global: No such file or directory
Makefile:32: /contrib/contrib-global.mk: No such file or directory
make: *** Brak reguł do wykonania obiektu `/contrib/contrib-global.mk'. Stop.
ERROR: command returned 2: make PG_CONFIG=/usr/bin/pg_config all

My first try:

export USE_PGXS=1
sudo pgxn install rum  #The same problem.
pgxn install rum --sudo sudo  #No success

Finally this works

sudo su
export USE_PGXS=1
pgxn install rum

I think this hint should be in documentation.
You can also consider this point of view: http://blog.pgxn.org/post/20908326485/lose-use-pgxs

test rum failed with PostgreSQL 9.6 beta4

HI,
When i use make installcheck test rum, there is one failed.

make USE_PGXS=1  installcheck
/home/digoal/pgsql9.6/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --bindir='/home/digoal/pgsql9.6/bin'    --dbname=contrib_regression rum ruminv timestamp orderby altorder
(using postmaster on /data04/digoal/pg_root1926, port 1926)
============== dropping database "contrib_regression" ==============
DROP DATABASE
============== creating database "contrib_regression" ==============
CREATE DATABASE
ALTER DATABASE
============== running regression test queries        ==============
test rum                      ... FAILED
test ruminv                   ... ok
test timestamp                ... ok
test orderby                  ... ok
test altorder                 ... ok

======================
 1 of 5 tests failed. 
======================

The differences that caused some tests to fail can be viewed in the
file "/home/digoal/rum/regression.diffs".  A copy of the test summary that you see
above is saved in the file "/home/digoal/rum/regression.out".

make: *** [installcheck] Error 1

diff:

cat /home/digoal/rum/regression.diffs
*** /home/digoal/rum/expected/rum.out   2016-08-20 20:17:37.685721686 +0800
--- /home/digoal/rum/results/rum.out    2016-08-20 20:20:45.365726520 +0800
***************
*** 123,141 ****
   rum_ts_distance |                                    t                                     |                               a                               
  -----------------+--------------------------------------------------------------------------+---------------------------------------------------------------
           16.4493 | my appreciation of you in a more complimentary way than by sending this  | 'appreci':2 'complimentari':8 'send':12 'way':9
!          16.4493 | itself. Put on your “specs” and look at the castle, half way up the      | 'castl':10 'half':11 'look':7 'put':2 'spec':5 'way':12
           16.4493 | so well that only a fragment, as it were, gave way. It still hangs as if | 'fragment':6 'gave':10 'hang':14 'still':13 'way':11 'well':2
!          16.4493 | thinking--“to go or not to go?” We are this far on the way. Reached      | 'far':11 'go':3,7 'reach':15 'think':1 'way':14
  (4 rows)

  SELECT rum_ts_distance(a, to_tsquery('pg_catalog.english', 'way & (go | half)')), *
        FROM test_rum
        WHERE a @@ to_tsquery('pg_catalog.english', 'way & (go | half)')
        ORDER BY a <=> to_tsquery('pg_catalog.english', 'way & (go | half)');
!  rum_ts_distance |                                  t                                  |                            a                            
! -----------------+---------------------------------------------------------------------+---------------------------------------------------------
!          8.22467 | itself. Put on your “specs” and look at the castle, half way up the | 'castl':10 'half':11 'look':7 'put':2 'spec':5 'way':12
!          57.5727 | thinking--“to go or not to go?” We are this far on the way. Reached | 'far':11 'go':3,7 'reach':15 'think':1 'way':14
  (2 rows)

  SELECT
--- 123,141 ----
   rum_ts_distance |                                    t                                     |                               a                               
  -----------------+--------------------------------------------------------------------------+---------------------------------------------------------------
           16.4493 | my appreciation of you in a more complimentary way than by sending this  | 'appreci':2 'complimentari':8 'send':12 'way':9
!          16.4493 | itself. Put on your “specs” and look at the castle, half way up the      | 'castl':10 'half':11 'look':7 'put':2 'way':12 '“specs”':5
           16.4493 | so well that only a fragment, as it were, gave way. It still hangs as if | 'fragment':6 'gave':10 'hang':14 'still':13 'way':11 'well':2
!          16.4493 | thinking--“to go or not to go?” We are this far on the way. Reached      | 'far':12 'go':3,7 'reach':16 'think':1 'way':15 '“to':2 '”':8
  (4 rows)

  SELECT rum_ts_distance(a, to_tsquery('pg_catalog.english', 'way & (go | half)')), *
        FROM test_rum
        WHERE a @@ to_tsquery('pg_catalog.english', 'way & (go | half)')
        ORDER BY a <=> to_tsquery('pg_catalog.english', 'way & (go | half)');
!  rum_ts_distance |                                  t                                  |                               a                               
! -----------------+---------------------------------------------------------------------+---------------------------------------------------------------
!          8.22467 | itself. Put on your “specs” and look at the castle, half way up the | 'castl':10 'half':11 'look':7 'put':2 'way':12 '“specs”':5
!          65.7974 | thinking--“to go or not to go?” We are this far on the way. Reached | 'far':12 'go':3,7 'reach':16 'think':1 'way':15 '“to':2 '”':8
  (2 rows)

  SELECT
***************
*** 144,153 ****
        *
        FROM test_rum
        ORDER BY a <=> to_tsquery('pg_catalog.english', 'way & (go | half)') limit 2;
!  ?column? | rum_ts_distance |                                  t                                  |                            a                            
! ----------+-----------------+---------------------------------------------------------------------+---------------------------------------------------------
!   8.22467 |         8.22467 | itself. Put on your “specs” and look at the castle, half way up the | 'castl':10 'half':11 'look':7 'put':2 'spec':5 'way':12
!   57.5727 |         57.5727 | thinking--“to go or not to go?” We are this far on the way. Reached | 'far':11 'go':3,7 'reach':15 'think':1 'way':14
  (2 rows)

  INSERT INTO test_rum (t) VALUES ('foo bar foo the over foo qq bar');
--- 144,153 ----
        *
        FROM test_rum
        ORDER BY a <=> to_tsquery('pg_catalog.english', 'way & (go | half)') limit 2;
!  ?column? | rum_ts_distance |                                  t                                  |                               a                               
! ----------+-----------------+---------------------------------------------------------------------+---------------------------------------------------------------
!   8.22467 |         8.22467 | itself. Put on your “specs” and look at the castle, half way up the | 'castl':10 'half':11 'look':7 'put':2 'way':12 '“specs”':5
!   65.7974 |         65.7974 | thinking--“to go or not to go?” We are this far on the way. Reached | 'far':12 'go':3,7 'reach':16 'think':1 'way':15 '“to':2 '”':8
  (2 rows)

  INSERT INTO test_rum (t) VALUES ('foo bar foo the over foo qq bar');

======================================================================

why rum use left link?

Hi:

when I read the rum source, I find that RUM add a left link in RumPageOpaqueData struct, it different from gin. why add this? because of ranking from index??

thanks.

joseph

postgresql-11-rum package

Hello!
How to use postgresql-11-rum package?

sudo apt-get install postgresql-11-rum

Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  postgresql-11-rum
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 315 kB of archives.
After this operation, 815 kB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt xenial-pgdg/main amd64 postgresql-11-rum amd64 1.3.0-1.pgdg16.04+1 [315 kB]
Fetched 315 kB in 1s (172 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 1.)
debconf: falling back to frontend: Readline
Selecting previously unselected package postgresql-11-rum.
(Reading database ... 39937 files and directories currently installed.)
Preparing to unpack .../postgresql-11-rum_1.3.0-1.pgdg16.04+1_amd64.deb ...
Unpacking postgresql-11-rum (1.3.0-1.pgdg16.04+1) ...
Processing triggers for postgresql-common (195.pgdg16.04+1) ...
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76.)
debconf: falling back to frontend: Readline
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:
Setting up postgresql-11-rum (1.3.0-1.pgdg16.04+1) 

and then:

psql -h /var/run/postgresql -c "CREATE EXTENSION IF NOT EXISTS rum WITH SCHEMA public;"
ERROR:  could not load library "/usr/lib/postgresql/11/lib/rum.so": /usr/lib/postgresql/11/lib/rum.so: undefined symbol: rb_insert

Combining ts_query AND IN (BIGINT[]) is ~4x slower than GIN

Hi.

Having this index:

create index origo_email_delivery_fts_all_folder_idx ON origo_email_delivery using gin (fts_all, folder_id)

And this query:

EXPLAIN ANALYZE
SELECT
    del.entity_id,
    del.received_timestamp,
    del.received_timestamp <=> '3000-01-01' :: TIMESTAMP,
    del.folder_id
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas&joseph')
      AND del.folder_id IN (44961, 204483, 44965, 2470519)
ORDER BY del.received_timestamp <=> '3000-01-01' :: TIMESTAMP
LIMIT 10 offset 10000

Using GIN-index:

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                              QUERY PLAN                                                                               │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32018.97..32018.97 rows=1 width=32) (actual time=56.675..56.676 rows=10 loops=1)                                                                         │
│   ->  Sort  (cost=31994.41..32018.97 rows=9822 width=32) (actual time=56.077..56.469 rows=10010 loops=1)                                                              │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                                       │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                                        │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=404.67..31343.13 rows=9822 width=32) (actual time=35.911..53.027 rows=14806 loops=1)                  │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))                       │
│               Heap Blocks: exact=13043                                                                                                                                │
│               ->  Bitmap Index Scan on origo_email_delivery_fts_all_folder_idx  (cost=0.00..402.22 rows=9822 width=0) (actual time=34.493..34.493 rows=14806 loops=1) │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))                   │
│ Planning time: 0.614 ms                                                                                                                                               │
│ Execution time: 56.726 ms                                                                                                                                             │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

And with RUM-index:

create index rum_idx ON origo_email_delivery using rum (fts_all rum_tsvector_addon_ops, folder_id, received_timestamp) WITH (attach=received_timestamp, "to"=fts_all)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                       QUERY PLAN                                                                       │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32066.97..32066.97 rows=1 width=32) (actual time=217.899..217.900 rows=10 loops=1)                                                        │
│   ->  Sort  (cost=32042.41..32066.97 rows=9822 width=32) (actual time=217.300..217.688 rows=10010 loops=1)                                             │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                        │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                         │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=452.67..31391.13 rows=9822 width=32) (actual time=196.797..214.216 rows=14806 loops=1) │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))        │
│               Heap Blocks: exact=13043                                                                                                                 │
│               ->  Bitmap Index Scan on rum_idx  (cost=0.00..450.22 rows=9822 width=0) (actual time=195.369..195.369 rows=14806 loops=1)                │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))    │
│ Planning time: 0.721 ms                                                                                                                                │
│ Execution time: 217.969 ms                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

217 / 56 = 3,875 time slower. Can anything be done to speed this up?

Thanks.

server process was terminated when create tsquery index using rum

Hi,
I am trying to create tsquery index.
CREATE TABLE query (q tsquery, tag text);
insert into query (q,tag) values('(AA|BB|CC|DD|FF|GGGG)&(HH|II|JJ|KK|LL|MM|NN)&!(XXX|YYY|ZZZ|ABC)','sn');
create index query_idx on query using rum(q);
server process was terminated by signal 11: Segmentation fault.
server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.

When i am trying to insert
insert into query (q,tag) values('(AA|BB|CC|DD|FF|GGGG)&(HH|II|JJ|KK|LL|MM|NN)&!(XXX|YYY|ZZZ)','sn');
it works well.
My Postgresql version is 10.1 & 9.6.

clang warns about unused variable in rum_ts_utils.c

clang warns about the unused variable 'extra_data' at line 722 in rum_ts_utils.c. adding a void cast of extra_data will safely silence the warning.

(void)extra_data

or just remove the declaration of 'extra_data'.

Support indexing tsquery phrases

Currently indexing with rum fails, if the tsquery column or expression contains phrases:

ERROR: Indexing of phrase tsqueries isn't supported yet

It would be very useful if this feature was supported, since it would allow to quickly check if a phrase is contained in some text (for example to filter text containing blacklisted phrases).

bug, return duplicate tuples

9.6 beta4 , has a bug:

postgres=# insert into test15 values (to_tsvector('jiebacfg', 'hello china, i''m digoal')), (to_tsvector('jiebacfg', 'hello world, i''m postgresql')), (to_tsvector('jiebacfg', 'how are you, i''m digoal'));
INSERT 0 3
postgres=# select * from test15;
                         c1                          
-----------------------------------------------------
 ' ':2,5,9 'china':3 'digoal':10 'hello':1 'm':8
 ' ':2,5,9 'hello':1 'm':8 'postgresql':10 'world':3
 ' ':2,4,7,11 'digoal':12 'm':10
(3 rows)

postgres=# create index idx_test15 on test15 using rum(c1 rum_tsvector_ops);
CREATE INDEX
postgres=# select *,c1 <=> to_tsquery('hello') from test15;
                         c1                          | ?column? 
-----------------------------------------------------+----------
 ' ':2,5,9 'china':3 'digoal':10 'hello':1 'm':8     |  16.4493
 ' ':2,5,9 'hello':1 'm':8 'postgresql':10 'world':3 |  16.4493
 ' ':2,4,7,11 'digoal':12 'm':10                     | Infinity
(3 rows)

return duplicate tuples;
postgres=# select ctid,*,c1 <=> to_tsquery('hello') from test15 order by c1 <=> to_tsquery('hello');
 ctid  |                         c1                          | ?column? 
-------+-----------------------------------------------------+----------
 (0,1) | ' ':2,5,9 'china':3 'digoal':10 'hello':1 'm':8     |  16.4493
 (0,2) | ' ':2,5,9 'hello':1 'm':8 'postgresql':10 'world':3 |  16.4493
 (0,1) | ' ':2,5,9 'china':3 'digoal':10 'hello':1 'm':8     |  16.4493
 (0,2) | ' ':2,5,9 'hello':1 'm':8 'postgresql':10 'world':3 |  16.4493
 (0,3) | ' ':2,4,7,11 'digoal':12 'm':10                     | Infinity
(5 rows)

Ranks are above one

postgres=# CREATE TRIGGER tsvectorupdate                                                                                                                                                         
postgres-# BEFORE UPDATE OR INSERT ON test_rum                                                                                                                                                   
postgres-# FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');                                                                                               
postgres=# INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');                                                                                                                   
INSERT 0 1                                                                                                                                                                                       
postgres=# INSERT INTO test_rum(t) VALUES ('It is a beautiful');                                                                                                                                 
INSERT 0 1                                                                                                                                                                                       
postgres=# INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');                                                                                                                   
INSERT 0 1                                                                                                                                                                                       
postgres=# select * from test_rum ;                                                                                                                                                              
postgres=# select * from test_rum ;                                                                                                                                                              
                t                |                   a                                                                                                                                           
---------------------------------+----------------------------------------                                                                                                                       
 The situation is most beautiful | 'beauti':5 'situat':2                                                                                                                                         
 It is a beautiful               | 'beauti':4                                                                                                                                                    
 It looks like a beautiful place | 'beauti':5 'like':3 'look':2 'place':6                                                                                                                        
(3 rows)                                                                                                                                                                                         
postgres=# WITH tsq as (SELECT to_tsquery('english', 'place|beautiful') AS query) SELECT t, rank FROM (SELECT t, a <=> query AS rank FROM test_rum, tsq WHERE a @@ query = true) matches ORDER BY rank ASC LIMIT 10;
                t                |  rank   
---------------------------------+---------
 It looks like a beautiful place | 8.22467
 The situation is most beautiful | 16.4493
 It is a beautiful               | 16.4493
(3 rows)

PostgreSQL 11?

Hello!
I have the following error:

$sudo USE_PGXS=1 pgxn install rum

INFO: best version: rum 1.1.0
INFO: saving /tmp/tmp32u9l4/rum-1.1.0.zip
INFO: unpacking: /tmp/tmp32u9l4/rum-1.1.0.zip
INFO: building extension
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fPIC -pie -fno-omit-frame-pointer -fPIC -I. -I./ -I/usr/include/postgresql/11/server -I/usr/include/postgresql/internal -I/usr/include/x86_64-linux-gnu -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include/mit-krb5  -c -o src/rumsort.o src/rumsort.c
In file included from /usr/include/postgresql/11/server/catalog/index.h:18:0,
                 from src/rumsort.c:129:
/usr/include/postgresql/11/server/nodes/execnodes.h:1849:2: error: unknown type name ‘TuplesortInstrumentation’
  TuplesortInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
  ^
In file included from src/rumsort.c:138:0:
/usr/include/postgresql/11/server/utils/probes.h:10:21: fatal error: sys/sdt.h: No such file or directory
compilation terminated.
<builtin>: recipe for target 'src/rumsort.o' failed
make: *** [src/rumsort.o] Error 1
ERROR: command returned 2: make PG_CONFIG=/usr/bin/pg_config all

My pg_config:

root@DKXP:/home/a.formella# pg_config
BINDIR = /usr/lib/postgresql/11/bin
DOCDIR = /usr/share/doc/postgresql-doc-11
HTMLDIR = /usr/share/doc/postgresql-doc-11
INCLUDEDIR = /usr/include/postgresql
PKGINCLUDEDIR = /usr/include/postgresql
INCLUDEDIR-SERVER = /usr/include/postgresql/11/server
LIBDIR = /usr/lib/x86_64-linux-gnu
PKGLIBDIR = /usr/lib/postgresql/11/lib
LOCALEDIR = /usr/share/locale
MANDIR = /usr/share/postgresql/11/man
SHAREDIR = /usr/share/postgresql/11
SYSCONFDIR = /etc/postgresql-common
PGXS = /usr/lib/postgresql/11/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=/usr/lib/x86_64-linux-gnu' '--libexecdir=/usr/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--with-icu' '--with-tcl' '--with-perl' '--with-python' '--with-pam' '--with-openssl' '--with-libxml' '--with-libxslt' '--with-tclconfig=/usr/lib/x86_64-linux-gnu/tcl8.6' '--with-includes=/usr/include/tcl8.6' 'PYTHON=/usr/bin/python' '--mandir=/usr/share/postgresql/11/man' '--docdir=/usr/share/doc/postgresql-doc-11' '--sysconfdir=/etc/postgresql-common' '--datarootdir=/usr/share/' '--datadir=/usr/share/postgresql/11' '--bindir=/usr/lib/postgresql/11/bin' '--libdir=/usr/lib/x86_64-linux-gnu/' '--libexecdir=/usr/lib/postgresql/' '--includedir=/usr/include/postgresql/' '--with-extra-version= (Ubuntu 11.0-1.pgdg16.04+2)' '--enable-nls' '--enable-integer-datetimes' '--enable-thread-safety' '--enable-tap-tests' '--enable-debug' '--enable-dtrace' '--disable-rpath' '--with-uuid=e2fs' '--with-gnu-ld' '--with-pgport=5432' '--with-system-tzdata=/usr/share/zoneinfo' '--with-llvm' '--with-systemd' '--with-selinux' 'MKDIR_P=/bin/mkdir -p' 'TAR=/bin/tar' 'CFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fPIC -pie -fno-omit-frame-pointer' 'LDFLAGS=-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now' '--with-gssapi' '--with-ldap' '--with-includes=/usr/include/mit-krb5' '--with-libs=/usr/lib/mit-krb5' '--with-libs=/usr/lib/x86_64-linux-gnu/mit-krb5' 'build_alias=x86_64-linux-gnu' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2' 'CXXFLAGS=-g -O2 -fstack-protector-strong -Wformat -Werror=format-security'
CC = gcc
CPPFLAGS = -I/usr/include/x86_64-linux-gnu -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include/mit-krb5
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -fPIC -pie -fno-omit-frame-pointer
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -L/usr/lib/llvm-6.0/lib -L/usr/lib/x86_64-linux-gnu/mit-krb5 -Wl,--as-needed
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lselinux -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -ledit -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 11.0 (Ubuntu 11.0-1.pgdg16.04+2)

Speedup DESC queries

From #15.
The DESC query slower:

explain analyze select * from rum where tsvector @@ plainto_tsquery('simple', 'cow') order by id <=> 1000000 limit 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3.00..3.53 rows=10 width=56) (actual time=31.367..31.375 rows=10 loops=1)
   ->  Index Scan using rum_tsvector_id_idx on rum  (cost=3.00..2635.10 rows=49960 width=56) (actual time=31.365..31.371 rows=10 loops=1)
         Index Cond: (tsvector @@ '''cow'''::tsquery)
         Order By: (id <=> 1000000)
 Planning time: 0.133 ms
 Execution time: 31.427 ms

than the next ASC query:

explain analyze select * from rum where tsvector @@ plainto_tsquery('simple', 'cow') order by id <=> 0 limit 10;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3.00..3.53 rows=10 width=56) (actual time=18.329..18.338 rows=10 loops=1)
   ->  Index Scan using rum_tsvector_id_idx on rum  (cost=3.00..2635.10 rows=49960 width=56) (actual time=18.328..18.335 rows=10 loops=1)
         Index Cond: (tsvector @@ '''cow'''::tsquery)
         Order By: (id <=> 0)
 Planning time: 0.139 ms
 Execution time: 18.393 ms

New release with PG11 support

Hi,
I see that master has support for PG11. Could you tag a new release so I can update the Debian package?
Thanks!

llvm and can't compile

Окружение
CentOS 7.3.1611
Postgres ваниль 11.1
устанавливал постгрес командой:

yum install postgresql11 postgresql11-server postgresql11-contrib postgresql11-libs postgresql11-devel systemtap-sdt-devel

мой pg_config:

[root@publ-test-db01 bin]# ./pg_config
BINDIR = /usr/pgsql-11/bin
DOCDIR = /usr/pgsql-11/doc
HTMLDIR = /usr/pgsql-11/doc/html
INCLUDEDIR = /usr/pgsql-11/include
PKGINCLUDEDIR = /usr/pgsql-11/include
INCLUDEDIR-SERVER = /usr/pgsql-11/include/server
LIBDIR = /usr/pgsql-11/lib
PKGLIBDIR = /usr/pgsql-11/lib
LOCALEDIR = /usr/pgsql-11/share/locale
MANDIR = /usr/pgsql-11/share/man
SHAREDIR = /usr/pgsql-11/share
SYSCONFDIR = /etc/sysconfig/pgsql
PGXS = /usr/pgsql-11/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--enable-rpath' '--prefix=/usr/pgsql-11' '--includedir=/usr/pgsql-11/include' '--mandir=/usr/pgsql-11/share/man' '--datadir=/usr/pgsql-11/share' '--with-icu' 'CLANG=/opt/rh/llvm-toolset-7/root/usr/bin/clang' 'LLVM_CONFIG=/usr/lib64/llvm5.0/bin/llvm-config' '--with-llvm' '--with-perl' '--with-python' '--with-tcl' '--with-tclconfig=/usr/lib64' '--with-openssl' '--with-pam' '--with-gssapi' '--with-includes=/usr/include' '--with-libraries=/usr/lib64' '--enable-nls' '--enable-dtrace' '--with-uuid=e2fs' '--with-libxml' '--with-libxslt' '--with-ldap' '--with-selinux' '--with-systemd' '--with-system-tzdata=/usr/share/zoneinfo' '--sysconfdir=/etc/sysconfig/pgsql' '--docdir=/usr/pgsql-11/doc' '--htmldir=/usr/pgsql-11/doc/html' 'CFLAGS=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' 'LDFLAGS=-Wl,--as-needed' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic
CFLAGS_SL = -fPIC
LDFLAGS = -Wl,--as-needed -L/usr/lib64/llvm5.0/lib -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-11/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lselinux -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 11.1

дополнительно устанавливал:
yum -y install git gcc gcc-c++ clang libstdc++-devel llvm llvm-libs

при компиляции получаю ошибку:

...
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC -shared -o rum.so src/rumsort.o src/rum_ts_utils.o src/rumtsquery.o src/rumbtree.o src/rumbulk.o src/rumdatapage.o src/rumentrypage.o src/rumget.o src/ruminsert.o src/rumscan.o src/rumutil.o src/rumvacuum.o src/rumvalidate.o src/btree_rum.o src/rum_arr_utils.o  -L/usr/pgsql-11/lib  -Wl,--as-needed -L/usr/lib64/llvm5.0/lib  -L/usr/lib64 -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-11/lib',--enable-new-dtags -lm
/opt/rh/llvm-toolset-7/root/usr/bin/clang -Wno-ignored-attributes -fno-strict-aliasing -fwrapv -O2  -I. -I./ -I/usr/pgsql-11/include/server -I/usr/pgsql-11/include/internal  -D_GNU_SOURCE -I/usr/include/libxml2  -I/usr/include -flto=thin -emit-llvm -c -o src/rumsort.bc src/rumsort.c
make: /opt/rh/llvm-toolset-7/root/usr/bin/clang: Command not found
make: *** [src/rumsort.bc] Error 127

неясно почему идет попытка искать llvm в /opt/rh/llvm-toolset-7/root/usr/bin/clang
в чём может быть проблема?
?

ERROR: index row size 1648 exceeds maximum 1336 for index "..."

New problem... Trying to create the rum index:

create index some_index on some_table using rum (some_column rum_tsvector_ops);

Here is the text of the error:

SQL Error [54000]: ERROR: index row size 1648 exceeds maximum 1336 for index "some_index"
org.postgresql.util.PSQLException: ERROR: index row size 1648 exceeds maximum 1336 for index "some_index"

about genericxlog in rum

Hello:

I have a question to ask.
RUM index is a extension for PostgreSQL,so it only can use Generic Xlog to write wal.
I want to put RUM into PG,As part of postgres porcess. so I want to change the the wal's mode,not use Generic xlog. I imitate GIN index to write RUM's wal process,but I have many difficulty to finish it.
so , I want to know do you have any details infomation about rum index wal, because gin and rum has some different in coding.

ths very much.

joseph long

Have it as a part of Postgres core distribution?

Hi,
what is the current status of this project and how soon it might be included in the Postgres core?

I've just had a chat with Amazon RDS support, and while they have a pretty long list of already supported Postgres extensions, they say that it's pretty hard for them to include contribs written in C (hehe), so for large number of Postgres users living in clouds (RDS, Heroku and now GCP) the only way to use RUM is to have it as a part of Postgres.

I'm not seeing a performance win. What am I doing wrong.

I have a database:

220K documents. (31% 9 words or less)

Unique words:
P30 9
Median 57
P90 656
p99 1837
p99.9 2928
max 6398
~130K null records (so ~350 records total)

We have an application where common queries return ~50K documents. When the database is warm, these take ~400ms or more. When cold many seconds. We're hoping that RUM can help, but so far, I'm not seeing an improvement.

Gory (but not complicated) details are here:
https://gist.github.com/jimfulton/4104b5387cf7dce3754bef342a9f84c5

Any idea what we're doing wrong?

Error during RUM installation

Here is the messages:
/usr/bin/mkdir -p '/home/artur/progs/pg9.6/lib/postgresql' /usr/bin/mkdir -p '/home/artur/progs/pg9.6/share/postgresql/extension' /usr/bin/mkdir -p '/home/artur/progs/pg9.6/share/postgresql/extension' /usr/bin/install -c -m 755 rum.so '/home/artur/progs/pg9.6/lib/postgresql/rum.so' /usr/bin/install -c -m 644 .//src/rum.h .//src/rumsort.h '/home/artur/progs/pg9.6/include/server/' /usr/bin/install: указанная цель '/home/artur/progs/pg9.6/include/server/' не является каталогом: Нет такого файла или каталога make: *** [Makefile:37: installincludes] Ошибка 1

I don't have /home/artur/progs/pg9.6/include/server/ directory. But I have /home/artur/progs/pg9.6/include/postgresql/server/.
This error came from 85a4291

Autovacuum after large update, with active RUM index crashes postgres

table with 160k records

UPDATE {table_name} SET {document}=data.{document} FROM
(VALUES %s) AS data ({document}, id)
WHERE {table_name}.id=data.id;

UPDATE {table_name} set {document_vector}=(select to_tsvector({document}));

CREATE INDEX {table_name}_rum_index
ON {table_name} USING RUM ({document_vector} rum_tsvector_ops);

After this set of queries, the autovacuum consumes all available memory before it crashes.

screenshot from 2017-05-30 10-41-09

Avoiding bitmap scan with ANY

Hi.

As Teodor commented on #15 PG will use bitmap scan when having queries like this:

EXPLAIN ANALYZE
SELECT
    del.entity_id,
    del.received_timestamp,
    del.received_timestamp <=> '3000-01-01' :: TIMESTAMP,
    del.folder_id
FROM origo_email_delivery del
WHERE del.fts_all @@ to_tsquery('simple', 'andreas&joseph')
      AND del.folder_id IN (44961, 204483, 44965, 2470519)
ORDER BY del.received_timestamp <=> '3000-01-01' :: TIMESTAMP
LIMIT 10 offset 10000
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                       QUERY PLAN                                                                       │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=32066.97..32066.97 rows=1 width=32) (actual time=217.899..217.900 rows=10 loops=1)                                                        │
│   ->  Sort  (cost=32042.41..32066.97 rows=9822 width=32) (actual time=217.300..217.688 rows=10010 loops=1)                                             │
│         Sort Key: ((received_timestamp <=> '3000-01-01 00:00:00'::timestamp without time zone))                                                        │
│         Sort Method: quicksort  Memory: 1541kB                                                                                                         │
│         ->  Bitmap Heap Scan on origo_email_delivery del  (cost=452.67..31391.13 rows=9822 width=32) (actual time=196.797..214.216 rows=14806 loops=1) │
│               Recheck Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))        │
│               Heap Blocks: exact=13043                                                                                                                 │
│               ->  Bitmap Index Scan on rum_idx  (cost=0.00..450.22 rows=9822 width=0) (actual time=195.369..195.369 rows=14806 loops=1)                │
│                     Index Cond: ((fts_all @@ '''andreas'' & ''joseph'''::tsquery) AND (folder_id = ANY ('{44961,204483,44965,2470519}'::bigint[])))    │
│ Planning time: 0.721 ms                                                                                                                                │
│ Execution time: 217.969 ms                                                                                                                             │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(11 rows)

... preventing ordered output, hence requiring the extra Sort-step and preventing the query from using only the index. This results in all tuples must be processed then sorted to match the LIMIT.

Will this be fixed, ie. will index scan be able to process ANY, or will it require fixing something in PG, if so - what needs to be done?

Thanks.

Error create extention rum

andreak=# create extension rum ;
ERROR:  operator <-> already exists

andreak=# select version();
                                                   version                                                   
-------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6beta1 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.3.1-14ubuntu2) 5.3.1 20160413, 64-bit

I've been using and studying smlar -- any way I can help to extend this with it/vice versa?

I updated smlar for postgresql 9.6, and this project seems very interesting given smlar's limitations:

  • sometimes, the smlar(textarray,textarray) function in tfidf mode will return multiple results with the exact same rank, making it seem like there's no normalization. Does smlar do @@-style matching for each array element, or is there another element of similarity involved?
  • It's difficult to get consistent results with smlar in tfidf mode. Maybe I don't understand smlar completely.
  • I want to add a statistical autocorrect method, similar to this, that uses smlar.stattable and row-level triggers that update it based on changes in the fields that feed data to the table's tsvector field.

Thoughts?

fatal error: sys/sdt.h: No such file or directory

Enviroment: CentOS 7.4, Postgres 10.0

export PATH=/usr/pgsql-10/bin:$PATH
git clone https://github.com/postgrespro/rum
cd rum
make USE_PGXS=1

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -DLINUX_OOM_SCORE_ADJ=0 -fPIC -I. -I./ -I/usr/pgsql-10/include/server -I/usr/pgsql-10/include/internal -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include -c -o src/rumsort.o src/rumsort.c
In file included from src/rumsort.c:138:0:
/usr/pgsql-10/include/server/utils/probes.h:10:21: fatal error: sys/sdt.h: No such file or directory
#include <sys/sdt.h>
^
compilation terminated.

invalid memory alloc request size

HI:
I use rum like this:

CREATE TABLE test_rum(t text, a tsvector);

CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_rum
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');

INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');
INSERT INTO test_rum(t) VALUES ('It is a beautiful');
INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');
CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
**TRUNCATE rumidx;
SELECT t from bttest ORDER BY id <=> to_tsquery('fdfdfafa');**

it well report a error like this
2018-12-06 17:47:22.060 CST [9345] ERROR: invalid memory alloc request size 18446744073709478784
2018-12-06 17:47:22.060 CST [9345] STATEMENT: SELECT t from test_rum ORDER BY a <=> to_tsquery('fdfdfafa');

ruminv tests fail under both mac sierra and ubuntu 4.4

the ruminv regression test is failing identically under both mac sierra and ubuntu4.4

postgres@condor> svn info
Path: .
Working Copy Root Path: /home/postgres/src/postgresql-9.6.1/contrib/rum
URL: https://github.com/postgrespro/rum/trunk
Relative URL: ^/trunk
Repository Root: https://github.com/postgrespro/rum
Repository UUID: 6851b3cf-d1d2-f0c0-02bc-ce57b7db7bff
Revision: 274
Node Kind: directory
Schedule: normal
Last Changed Author: victor.wagner
Last Changed Rev: 273
Last Changed Date: 2017-01-30 02:49:49 -0600 (Mon, 30 Jan 2017)

postgres@condor> cat regression.diffs
*** /home/postgres/src/postgresql-9.6.1/contrib/rum/expected/ruminv.out Fri Feb 3 11:11:33 2017
--- /home/postgres/src/postgresql-9.6.1/contrib/rum/results/ruminv.out Fri Feb 3 11:12:09 2017


*** 10,22 ****
INSERT INTO test_invrum VALUES ('(a|b)&(c|d)'::tsquery);
INSERT INTO test_invrum VALUES ('!a'::tsquery);
SELECT * FROM test_invrum WHERE q @@ ''::tsvector;
! q
! ----------------
! !( 'a' | 'b' )
! !( 'a' & 'b' )
! !'a' | 'b'
! !'a'
! (4 rows)

SELECT * FROM test_invrum WHERE q @@ 'a'::tsvector;
q
--- 10,18 ----
INSERT INTO test_invrum VALUES ('(a|b)&(c|d)'::tsquery);
INSERT INTO test_invrum VALUES ('!a'::tsquery);
SELECT * FROM test_invrum WHERE q @@ ''::tsvector;
! q
! ---
! (0 rows)

SELECT * FROM test_invrum WHERE q @@ 'a'::tsvector;
q

======================================================================

‘state’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Hi,
There are two warnings when i make install rum.

my env:
PostgreSQL 9.6beta4
gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)
CentOS 7.2 x64

make USE_PGXS=1 clean
rm -f rum.so   librum.a  librum.pc
rm -f rumsort.o rum_ts_utils.o rumtsquery.o rumbtree.o rumbulk.o rumdatapage.o rumentrypage.o rumget.o ruminsert.o rumscan.o rumutil.o rumvacuum.o rumvalidate.o rum_timestamp.o 
rm -rf results/ regression.diffs regression.out tmp_check/ log/
digoal@iZ28tqoemgtZ-> make USE_PGXS=1 
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumsort.o rumsort.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rum_ts_utils.o rum_ts_utils.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumtsquery.o rumtsquery.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumbtree.o rumbtree.c
rumbtree.c: In function ‘rumInsertValue’:
rumbtree.c:587:23: warning: ‘state’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      GenericXLogFinish(state);
                       ^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumbulk.o rumbulk.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumdatapage.o rumdatapage.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumentrypage.o rumentrypage.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumget.o rumget.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o ruminsert.o ruminsert.c
ruminsert.c: In function ‘createPostingTree’:
ruminsert.c:81:20: warning: ‘state’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   GenericXLogFinish(state);
                    ^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumscan.o rumscan.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumutil.o rumutil.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumvacuum.o rumvacuum.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rumvalidate.o rumvalidate.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -I. -I./ -I/home/digoal/pgsql9.6/include/server -I/home/digoal/pgsql9.6/include/internal -D_GNU_SOURCE   -c -o rum_timestamp.o rum_timestamp.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fpic -shared -o rum.so rumsort.o rum_ts_utils.o rumtsquery.o rumbtree.o rumbulk.o rumdatapage.o rumentrypage.o rumget.o ruminsert.o rumscan.o rumutil.o rumvacuum.o rumvalidate.o rum_timestamp.o  -L/home/digoal/pgsql9.6/lib -Wl,--as-needed -Wl,-rpath,'/home/digoal/pgsql9.6/lib',--enable-new-dtags  

RUM use GENERIC WAL

Hello

I find that rum use generic WAL, generates too big WALtraffic. can I use internal WAL function to WAL.

Thanks

error when used with postgresql 12 devel version.

error when used with postgresql 12 devel version.

root@iZbp13nu0s9j3x3op4zpd4Z-> USE_PGXS=1 make clean
rm -f rum.so   librum.a  librum.pc
rm -f src/rumsort.o src/rum_ts_utils.o src/rumtsquery.o src/rumbtree.o src/rumbulk.o src/rumdatapage.o src/rumentrypage.o src/rumget.o src/ruminsert.o src/rumscan.o src/rumutil.o src/rumvacuum.o src/rumvalidate.o src/btree_rum.o src/rum_arr_utils.o  src/rumsort.bc src/rum_ts_utils.bc src/rumtsquery.bc src/rumbtree.bc src/rumbulk.bc src/rumdatapage.bc src/rumentrypage.bc src/rumget.bc src/ruminsert.bc src/rumscan.bc src/rumutil.bc src/rumvacuum.bc src/rumvalidate.bc src/btree_rum.bc src/rum_arr_utils.bc
rm -rf results/ regression.diffs regression.out tmp_check/ tmp_check_iso/ log/ output_iso/
root@iZbp13nu0s9j3x3op4zpd4Z-> git checkout stable
Branch stable set up to track remote branch stable from origin.
Switched to a new branch 'stable'
root@iZbp13nu0s9j3x3op4zpd4Z-> USE_PGXS=1 make 
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumsort.o src/rumsort.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rum_ts_utils.o src/rum_ts_utils.c
In file included from /home/digoal/pgsql12/include/server/access/skey.h:19:0,
                 from /home/digoal/pgsql12/include/server/access/genam.h:18,
                 from /home/digoal/pgsql12/include/server/access/amapi.h:15,
                 from /home/digoal/pgsql12/include/server/access/hash.h:20,
                 from src/rum_ts_utils.c:14:
src/rum_ts_utils.c: In function ‘rum_tsquery_distance’:
src/rum_ts_utils.c:1427:20: warning: implicit declaration of function ‘get_float8_infinity’ [-Wimplicit-function-declaration]
   PG_RETURN_FLOAT8(get_float8_infinity());
                    ^
/home/digoal/pgsql12/include/server/fmgr.h:331:52: note: in definition of macro ‘PG_RETURN_FLOAT8’
 #define PG_RETURN_FLOAT8(x)  return Float8GetDatum(x)
                                                    ^
src/rum_ts_utils.c: In function ‘rum_ts_distance_tt’:
src/rum_ts_utils.c:1447:20: warning: implicit declaration of function ‘get_float4_infinity’ [-Wimplicit-function-declaration]
   PG_RETURN_FLOAT4(get_float4_infinity());
                    ^
/home/digoal/pgsql12/include/server/fmgr.h:330:52: note: in definition of macro ‘PG_RETURN_FLOAT4’
 #define PG_RETURN_FLOAT4(x)  return Float4GetDatum(x)
                                                    ^
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumtsquery.o src/rumtsquery.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumbtree.o src/rumbtree.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumbulk.o src/rumbulk.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumdatapage.o src/rumdatapage.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumentrypage.o src/rumentrypage.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumget.o src/rumget.c
src/rumget.c: In function ‘keyGetOrdering’:
src/rumget.c:2175:11: warning: implicit declaration of function ‘get_float8_infinity’ [-Wimplicit-function-declaration]
    return get_float8_infinity();
           ^~~~~~~~~~~~~~~~~~~
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/ruminsert.o src/ruminsert.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumscan.o src/rumscan.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumutil.o src/rumutil.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumvacuum.o src/rumvacuum.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rumvalidate.o src/rumvalidate.c
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/btree_rum.o src/btree_rum.c
src/btree_rum.c: In function ‘rum_int2_distance’:
src/btree_rum.c:262:11: warning: implicit declaration of function ‘get_float8_infinity’ [-Wimplicit-function-declaration]
    diff = get_float8_infinity();         \
           ^
src/btree_rum.c:407:1: note: in expansion of macro ‘RUM_SUPPORT_DIST’
 RUM_SUPPORT_DIST(int2, false, leftmostvalue_int2, btint2cmp, always_false, int2subtract)
 ^~~~~~~~~~~~~~~~
src/btree_rum.c: In function ‘leftmostvalue_float4’:
src/btree_rum.c:440:25: warning: implicit declaration of function ‘get_float4_infinity’ [-Wimplicit-function-declaration]
  return Float4GetDatum(-get_float4_infinity());
                         ^~~~~~~~~~~~~~~~~~~
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -I. -I./ -I/home/digoal/pgsql12/include/server -I/home/digoal/pgsql12/include/internal  -D_GNU_SOURCE   -c -o src/rum_arr_utils.o src/rum_arr_utils.c
In file included from /home/digoal/pgsql12/include/server/access/skey.h:19:0,
                 from /home/digoal/pgsql12/include/server/access/genam.h:18,
                 from /home/digoal/pgsql12/include/server/access/amapi.h:15,
                 from /home/digoal/pgsql12/include/server/access/hash.h:20,
                 from src/rum_arr_utils.c:14:
src/rum_arr_utils.c: In function ‘rum_anyarray_ordering’:
src/rum_arr_utils.c:73:19: warning: implicit declaration of function ‘get_float8_infinity’; did you mean ‘get_attidentity’? [-Wimplicit-function-declaration]
  ( (sml == 0.0) ? get_float8_infinity() : ((float8) 1) / ((float8) (sml)) )
                   ^
/home/digoal/pgsql12/include/server/fmgr.h:331:52: note: in definition of macro ‘PG_RETURN_FLOAT8’
 #define PG_RETURN_FLOAT8(x)  return Float8GetDatum(x)
                                                    ^
src/rum_arr_utils.c:457:20: note: in expansion of macro ‘DIST_FROM_SML’
   PG_RETURN_FLOAT8(DIST_FROM_SML(sml));
                    ^~~~~~~~~~~~~
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -fPIC -shared -o rum.so src/rumsort.o src/rum_ts_utils.o src/rumtsquery.o src/rumbtree.o src/rumbulk.o src/rumdatapage.o src/rumentrypage.o src/rumget.o src/ruminsert.o src/rumscan.o src/rumutil.o src/rumvacuum.o src/rumvalidate.o src/btree_rum.o src/rum_arr_utils.o  -L/home/digoal/pgsql12/lib    -Wl,--as-needed -Wl,-rpath,'/home/digoal/pgsql12/lib',--enable-new-dtags -lm  
root@iZbp13nu0s9j3x3op4zpd4Z-> USE_PGXS=1 make install
/bin/mkdir -p '/home/digoal/pgsql12/lib'
/bin/mkdir -p '/home/digoal/pgsql12/share/extension'
/bin/mkdir -p '/home/digoal/pgsql12/share/extension'
/bin/install -c -m 755  rum.so '/home/digoal/pgsql12/lib/rum.so'
/bin/install -c -d '/home/digoal/pgsql12/include/server/'
/bin/install -c -m 644 .//src/rum.h .//src/rumsort.h '/home/digoal/pgsql12/include/server/'
/bin/install -c -m 644 .//rum.control '/home/digoal/pgsql12/share/extension/'
/bin/install -c -m 644 .//rum--1.0.sql .//rum--1.2.sql .//rum--1.0--1.1.sql .//rum--1.1--1.2.sql  '/home/digoal/pgsql12/share/extension/'
root@iZbp13nu0s9j3x3op4zpd4Z-> psql
psql (12devel)
Type "help" for help.

postgres=# create extension rum;
ERROR:  could not load library "/home/digoal/pgsql12/lib/rum.so": /home/digoal/pgsql12/lib/rum.so: undefined symbol: get_float4_infinity

thank you
best regards.

Weights?

ts_rank and ts_rank_cd let you supply weights for text from different sources (A, B, C, and D).

How does RUM handle text from different sources? Does it treat the sources as the same? Does it use built-in weights? Or is there a way to supply weights?

src/rum.h:792:2: error: unknown type name 'RBTNode' on `env make USE_PGXS=1`

Hello!

For some time i was using docker's postgres by FROM postgres:alpine with installation of rum included in dockerfile below.
My last image was built 2 month ago, so i decided to rebuild it
And i'm currently expiriencing this problem while building an image:

Cloning into 'rum'...
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 -fPIC -I. -I./ -I/usr/local/include/postgresql/server -I/usr/local/include/postgresql/internal  -D_GNU_SOURCE -I/usr/include/libxml2   -c -o src/rumsort.o src/rumsort.c
In file included from src/rumsort.c:142:0:
src/rum.h:792:2: error: unknown type name 'RBTNode'
  RBTNode  rbnode;
  ^~~~~~~
make: *** [<builtin>: src/rumsort.o] Error 1
ERROR: Service 'postgres' failed to build: The command '/bin/sh -c cd /tmp &&     apk add --no-cache --virtual .build-deps git make gcc coreutils libc-dev &&     git clone https://github.com/postgrespro/rum &&     cd rum &&     env make USE_PGXS=1 &&     env make USE_PGXS=1 install &&     cd .. &&     apk del .build-deps &&     rm -rf rum/' returned a non-zero code: 2

I believe it happens right at the env make USE_PGXS=1
Dockerfile:

FROM postgres:alpine

COPY ./create-extension-rum.sh /docker-entrypoint-initdb.d/
COPY ./create-extension-uuid-ossp.sh /docker-entrypoint-initdb.d/

RUN cd /tmp && \
    apk add --no-cache --virtual .build-deps git make gcc coreutils libc-dev && \
    git clone https://github.com/postgrespro/rum && \
    cd rum && \
    env make USE_PGXS=1 && \
    env make USE_PGXS=1 install && \
    cd .. && \
    apk del .build-deps && \
rm -rf rum/ 

Support statistics for `RUM`

From #15.
Estimated row count is not good:

explain (analyze) select * from rum where tsvector @@ 'cow'::tsquery order by id <=> (10^6)::int limit 10;
                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12.00..13.43 rows=10 width=56) (actual time=21.027..21.031 rows=10 loops=1)
   ->  Index Scan using rum_tsvector_id_idx on rum  (cost=12.00..7171.81 rows=50125 width=56) (actual time=21.024..21.028 rows=10 loops=1)
         Index Cond: (tsvector @@ '''cow'''::tsquery)
         Order By: (id <=> 1000000)
 Planning time: 0.181 ms
 Execution time: 21.274 ms

inconsistent results for queries against a table with and without rum indexes

i am getting inconsistent results for queries against a table with and without rum indexes.
i've tested under isolated builds of pg9.6beta3 on both mac and linux. a temporary zip demonstrating the issue is here:

http://setspace.com/~jmscott/rumbug-060ebdc.zip

also, can tsqueries be indexed? i ask because under on linux (not mac) i see this error during regress.

ERROR: Indexing of phrase tsqueries isn't supported yet

could this be related to the inconsistent queries? i do not see the error on macos 10.11.6.

installcheck failed

Using postgresql 10 on Ubuntu, c9.io

Here is what I got:

akakoori:~/workspace/rum $ sudo make USE_PGXS=1 installcheck
/usr/lib/postgresql/10/lib/pgxs/src/makefiles/../../src/test/regress/pg_regress --inputdir=./ --bindir='/usr/lib/postgresql/10/bin'    --dbname=contrib_regression rum rum_validate rum_hash ruminv timestamp orderby orderby_hash altorder altorder_hash limits int2 int4 int8 float4 float8 money oid time timetz date interval macaddr inet cidr text varchar char bytea bit varbit numeric array
(using postmaster on Unix socket, default port)
pg_regress: could not set core size: disallowed by hard limit
============== dropping database "contrib_regression" ==============
psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
command failed: "/usr/lib/postgresql/10/bin/psql" -X -c "DROP DATABASE IF EXISTS \"contrib_regression\"" "postgres"
make: *** [installcheck] Error 2

FYI, I got this when installing:

akakoori:~/workspace/rum $ sudo make USE_PGXS=1 install
/bin/mkdir -p '/usr/lib/postgresql/10/lib'
/bin/mkdir -p '/usr/share/postgresql/10/extension'
/bin/mkdir -p '/usr/share/postgresql/10/extension'
/usr/bin/install -c -m 755  rum.so '/usr/lib/postgresql/10/lib/rum.so'
/usr/bin/install -c -d '/usr/include/postgresql/10/server/'
/usr/bin/install -c -m 644 .//src/rum.h .//src/rumsort.h '/usr/include/postgresql/10/server/'
/usr/bin/install -c -m 644 .//rum.control '/usr/share/postgresql/10/extension/'
/usr/bin/install -c -m 644 .//rum--1.0.sql .//rum--1.2.sql .//rum--1.0--1.1.sql .//rum--1.1--1.2.sql  '/usr/share/postgresql/10/extension/'

I cannot quite understand the error messages, so could someone please tell me what's the problem and how may I fix it?
Thanks.

OPERATOR % Already Exists in Postgresql 9.6.5

I installed RUM in a newly released version of postgresql 9.6, which appears to have added an operator '%'. This seems to conflict with an operator defined in RUM, because when I run

CREATE EXTENSION rum;

I get the error in the title.

Select count number of search results performance

Postgres 9.6.4
Windows Server 2016

I am trying to get total number of search results. I used a windows function count(id) OVER() as count in my search query select clause and that seemed to have caused documents to be loaded into memory instead of just RUM indexes being hit.

I also tried select count(*) from code_docs where tsv_natural @@ $1 ($1 being a tsquery).

Any way to get count of search results and still only hit the rum indexes?

fast update

Hello

Recently,I use rum in my project. I want to use fast update like gin index. I check the previous Issues and find that the fast update is exist before.

So I want to know why delete the function.

Thanks!

Joseph

Missing rumfast.h - trying to compile on windows using Visual Studio

Hi,

The src file rumfast.c includes rumfast.h but that file is not in the src directory and I get compile errors that RumTupleCollector is not defined.

I was able to compile it on Linux so I don't know if I just need to somehow force my windows compiler to ignore the error?

Thanks,
Philip

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.