Comments (19)
An index key can't be larger than 1336 bytes. This is a limitation of the current version. The same applies to GIN (though its limit may be different).
It seems there is some big string in your table. You can check it with the query:
select word, char_length(word)
from ts_stat('select some_column from some_table')
order by char_length(word) desc
limit 10;
from rum.
Yes, I have HUGE strings in this table, and more than one. There is also a tsvector column built from them, with a GIN index on it.
from rum.
Closed
from rum.
We are working on this issue. As far as I know, there are huge URLs in your strings.
Do you search by these URLs? Are they important to you? If not, this is easy to fix: just don't store them in the index (I can explain how).
If they are important, we can fix RUM so that it cuts off URLs. I think we need that anyway, since other users may hit a similar issue.
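One possible way to keep over-long lexemes out of the index is to strip them from the tsvector before it is stored. The following is only a sketch under assumptions, not necessarily the approach the RUM developers had in mind: it borrows the names some_table, some_column and search_vector from this thread, assumes the 'english' text search configuration, picks an arbitrary 1000-byte cutoff to stay under the 1336-byte limit, and needs PostgreSQL 9.6 or later for ts_delete and unnest(tsvector).

```sql
-- Assumption: some_table(some_column text, search_vector tsvector).
-- Assumption: 'english' configuration and a 1000-byte cutoff, both illustrative.
UPDATE some_table
SET search_vector = ts_delete(
        to_tsvector('english', some_column),
        -- Collect every lexeme of this row that is longer than the cutoff.
        ARRAY(
            SELECT lexeme
            FROM unnest(to_tsvector('english', some_column))
            WHERE octet_length(lexeme) > 1000
        )
    );
```

Rows filtered this way can no longer be found by the removed lexemes, so this only fits if the long URLs are not searched.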
from rum.
Glad to hear that you are working on this issue.
We are not using these URLs at the moment, but we may need them in the near future. So it would be great if you could fix RUM with that in mind.
from rum.
We found a solution. We can add a new operator class that stores hashes of lexemes, which lets you store huge strings. But this operator class can't support prefix (or partial) matching.
Do you use prefix matching?
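To illustrate why hashing rules out prefix matching, here is a small sketch. It uses md5 purely as a stand-in, which is an assumption: the hash function the operator class would actually use is not stated here. The point is that the hash of a prefix is unrelated to the hash of the full lexeme, so an index that stores only hashes can answer equality lookups but not 'postgres:*' style queries.

```sql
-- md5 is an illustrative stand-in for whatever hash the operator class uses.
SELECT md5('postgresql') AS full_lexeme_hash,
       md5('postgres')   AS prefix_hash,
       -- Checks whether the prefix's hash is itself a prefix of the full hash.
       md5('postgresql') LIKE md5('postgres') || '%' AS prefix_survives_hashing;
```

Every hash has the same fixed length regardless of input size, which is also why huge strings stop being a problem for the key size limit.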
from rum.
Can you trim the url and raise some notice like:
NOTICE: Index key can't have size more than 1336 bytes.
HINT: key has been trimmed.
from rum.
I think we can trim the URL as well. We will fix the limits for posting trees so that they match those in GIN.
from rum.
Sounds great, thank you.
from rum.
@To4e, could you please check the issue_9_max_item_size branch?
https://github.com/postgrespro/rum/tree/issue_9_max_item_size
from rum.
@select-artur, still having the same problem:
ERROR: index row size 1544 exceeds maximum 1352 for index "some_index"
from rum.
Please try commit 58fee28.
from rum.
Well, I installed it, the tests passed, I rebuilt the extension and started creating the index, and ... after about 3 hours of total suffering, the machine hosting the database rebooted itself:
IO Max: 202797/s
Load: 26.71 29.73 31.23
from rum.
Could you share some more details about the dataset and the machine? Table size, row count, PostgreSQL settings, amount of RAM, processor, disk type and free space, etc.
from rum.
Here you are:
Table size: 20 GB + 51 GB toast,
Row count: 19890432,
Index on search_vector: 6137 MB,
RAM: 25.36 GB,
Number of processors: 8,
Disk type: ssd,
Shared buffers: 16 GB,
work_mem: 1GB,
maintenance_work_mem: 4GB.
And this is the test server.
from rum.
Is there any chance we could access the test server? Or could you share the dataset, or some kind of anonymized dataset where the issue still occurs?
from rum.
Sorry, but no, and no. This information is confidential.
from rum.
But could you try to generate some random data where the same issue occurs?
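As a starting point, something like the following might work. It is only a sketch under assumptions: the table name rum_repro, the index name, the row count and the lexeme length are all made up for illustration, and it is not known whether this reproduces the behaviour seen on the real data. Each row gets a single lexeme of about 1600 bytes, close to the row size in the error reported above.

```sql
-- Hypothetical table with one over-long lexeme per row.
CREATE TABLE rum_repro AS
SELECT i AS id,
       -- 'url' plus 50 repetitions of a 32-character md5 string: 1603 bytes.
       ('url' || repeat(md5(i::text), 50))::tsvector AS search_vector
FROM generate_series(1, 100000) AS i;

-- Requires the rum extension to be installed in the database.
CREATE INDEX rum_repro_idx ON rum_repro
    USING rum (search_vector rum_tsvector_ops);
```

The row count and lexeme length can be scaled up to get closer to the real table.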
from rum.
That is not a good option either, because random data will give different vector sizes, tokens and so on.
from rum.