Hello all, We are on PG 15.5 using pgvector 0.6. We have a table def

Duplicate error when creating a vector index using HNSW about pgvector HOT 6 CLOSED

ldhasson commented on July 18, 2024

Duplicate error when creating a vector index using HNSW

from pgvector.

Comments (6)

ldhasson commented on July 18, 2024

Hello,

After further investigation, we found that a previous index creation may have failed but still left index artifacts behind. There were two indices: "_idx" and "_idx1". I dropped them and reran the index command, and it worked that time.

I am not sure what could cause that. We initially had a column without a dimension, and creating the index failed. Then we changed the column and retried creating the index. I'll see if I can reproduce the issue reliably, but it doesn't appear to be related to duplicates in the data; rather, it looks like a duplicate index name.

Thank you.
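
For anyone checking for leftovers like the ones described above, here is a minimal sketch that uses the standard `pg_indexes` view. The schema and table names are the ones that appear in the `CREATE INDEX` statement later in this thread. The index name in the `DROP INDEX` line is a hypothetical placeholder, not one of the actual names from this issue.

```sql
-- List every index currently defined on the table, with its definition.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'embeddings'
  AND tablename = 'mydoccuments';

-- Drop a leftover index by name before retrying the build.
-- "leftover_idx" is a hypothetical name; substitute one returned by the query above.
DROP INDEX IF EXISTS embeddings.leftover_idx;
```

Comparing `indexdef` across the rows should show whether two indexes cover the same column with the same operator class.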

ankane commented on July 18, 2024

Hi @ldhasson, I don't think there's anything pgvector / any extension can do about duplicate key errors on catalog tables.

Also, that seems like a long time for the # of rows and dimensions. See the docs on index build time for how to speed it up.
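
As a rough sketch of what the index build time docs suggest: HNSW builds tend to be much faster when the graph fits into `maintenance_work_mem`, and pgvector 0.6.0 and later can use parallel workers for the build. The values below are illustrative assumptions, not recommendations for this particular server; size them to the memory and cores actually available.

```sql
-- Illustrative values only; adjust to the server's available memory and CPU cores.
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7;  -- workers in addition to the leader

-- Same statement as used in this thread, now run with the session settings above.
CREATE INDEX ON embeddings.mydoccuments USING hnsw (chunk_embedding vector_cosine_ops);
```

Both settings are session-level here, so they only affect the connection that runs the build.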

ldhasson commented on July 18, 2024

I am investigating some more, but it looks like the building of the index failed initially and left things around, i.e., not atomic. I may be completely wrong, but something did glitch.

jkatz commented on July 18, 2024

The index building process is atomic - did your database crash?

tureba commented on July 18, 2024

> I am investigating some more, but it looks like the building of the index failed initially and left things around, i.e., not atomic. I may be completely wrong, but something did glitch.

Was the index created CONCURRENTLY? If so, any hiccup may leave it as invalid, including a connection loss on the client that issued the command. Note that it's concurrent to other sessions, not asynchronous or in the background in any way. So the session where you run the concurrent command has to remain up and healthy for the entire duration of the statement.
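
One way to check for this, sketched against the standard PostgreSQL catalogs: an index left behind by an interrupted concurrent build is marked with `indisvalid = false` in `pg_index`. The query below lists any such indexes in the current database.

```sql
-- List indexes that are marked invalid, e.g. after an interrupted
-- CREATE INDEX CONCURRENTLY.
SELECT n.nspname AS schema_name,
       c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
```

An invalid index still occupies its name and storage, so it has to be dropped (or rebuilt) explicitly before a retry with the same name can succeed.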

ldhasson commented on July 18, 2024

Hello Arthur,

I ran

CREATE INDEX ON embeddings.mydoccuments USING hnsw (chunk_embedding vector_cosine_ops);

Is the build concurrent by default? I didn't think it was. I have to do more testing on my end to see if I can reproduce. The connection did not drop at any point, even though the build took some time and eventually failed with the error in the original message above.
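
For reference, a plain `CREATE INDEX` like the one above is not concurrent; in PostgreSQL a concurrent build has to be requested with the `CONCURRENTLY` keyword. A minimal sketch follows. The explicit index name is a hypothetical choice for illustration, not a name from this issue.

```sql
-- Concurrent builds must be requested explicitly.
-- The index name below is a hypothetical example.
CREATE INDEX CONCURRENTLY mydoccuments_chunk_embedding_hnsw_idx
    ON embeddings.mydoccuments
    USING hnsw (chunk_embedding vector_cosine_ops);
```

Giving the index an explicit name also means a retry fails right away with a "relation already exists" error if an earlier index of that name is still around, instead of PostgreSQL picking a fresh auto-generated name for a second index.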
