Comments (6)
Hello,
After further investigation, we found that a previous index creation may have failed but still left index artifacts behind. There were two indices: "_idx" and "_idx1". I dropped them and reran the index command, and it worked that time.
I am not sure what could cause that. We initially had a column without a dimension, and creating the index failed. Then we changed the column and retried creating the index. I'll see if I can reproduce the issue reliably, but it doesn't appear to be related to duplicates in the data; rather, it looks like a duplicate index name.
Thank you.
from pgvector.
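For readers in the same situation, here is a minimal sketch of how leftover indexes could be listed before dropping them. It assumes the schema and table names from the CREATE INDEX statement quoted later in this thread. The index name in the DROP statement is a hypothetical placeholder, since the full index names are not given here.

```sql
-- List every index currently defined on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'embeddings'
  AND tablename = 'mydoccuments';

-- Drop a leftover index by its schema-qualified name.
-- "leftover_idx" is a hypothetical placeholder; substitute a name returned by the query above.
DROP INDEX IF EXISTS embeddings.leftover_idx;
```

Checking the catalog first avoids dropping an index that is still valid and in use.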
Hi @ldhasson, I don't think there's anything pgvector / any extension can do about duplicate key errors on catalog tables.
Also, that seems like a long time for the number of rows and dimensions. See the docs on index build time for how to speed it up.
from pgvector.
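As a rough illustration of the kind of settings the index build time docs discuss, the sketch below raises two standard PostgreSQL settings for the current session before building. The values are illustrative assumptions, not recommendations from this thread, and the benefit from extra workers assumes a pgvector version that supports parallel HNSW builds.

```sql
-- Give the index build more memory for this session (illustrative value;
-- keep it well below the memory actually available on the server).
SET maintenance_work_mem = '8GB';

-- Allow more parallel workers for maintenance commands such as CREATE INDEX
-- (illustrative value; also bounded by max_worker_processes and max_parallel_workers).
SET max_parallel_maintenance_workers = 7;

-- Run the CREATE INDEX statement in this same session so the settings apply.
```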
I am investigating some more, but it looks like the building of the index failed initially and left things around, i.e., not atomic. I may be completely wrong, but something did glitch.
from pgvector.
The index building process is atomic - did your database crash?
from pgvector.
> I am investigating some more, but it looks like the building of the index failed initially and left things around, i.e., not atomic. I may be completely wrong, but something did glitch.
Was the index created CONCURRENTLY? If so, any hiccup may leave it as invalid, including a connection loss on the client that issued the command. Note that it's concurrent to other sessions, not asynchronous or in the background in any way. So the session where you run the concurrent command has to remain up and healthy for the entire duration of the statement.
from pgvector.
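One way to check for the invalid indexes described above is to query the system catalogs. This is a sketch, not a diagnosis of this specific report. The index name in the DROP statement is a hypothetical placeholder.

```sql
-- Find indexes that are marked invalid, e.g. after an interrupted
-- CREATE INDEX CONCURRENTLY.
SELECT n.nspname AS schema_name,
       c.relname AS index_name,
       i.indisvalid,
       i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;

-- An invalid index is not used by queries but still occupies its name,
-- so it has to be dropped before the build is retried.
-- "invalid_idx" is a hypothetical placeholder name.
DROP INDEX IF EXISTS embeddings.invalid_idx;
```

If the query returns no rows, no index is currently flagged as invalid, which points away from an interrupted concurrent build.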
Hello Arthur,
I ran
CREATE INDEX ON embeddings.mydoccuments USING hnsw (chunk_embedding vector_cosine_ops);
Is the build concurrent by default? I didn't think it was. I have to do more testing on my end to see if I can reproduce this. The connection did not drop at the time, even though the build took some time and eventually failed with the error in the original message above.
from pgvector.