Comments (12)

vade commented on June 17, 2024

Reading in #409,

I'm also trying to manually dump the pg_data/pg_vector indexes on disk and then run REINDEX DATABASE, which seemed to work.

For 15M vectors, prior to indexing, EXPLAIN ANALYZE on a faceted search took roughly 23 seconds.

After indexing, it took 6 seconds.

from pgvecto.rs.

vade commented on June 17, 2024

I'm not going to close this yet, only because there might be some interesting data in here to debug why the first index pass fails.

vade commented on June 17, 2024

Interesting. It seems like reindexing actually crashes, but the crash happens in the background.

VoVAllen commented on June 17, 2024

What's your hardware? How much memory do you have?

VoVAllen commented on June 17, 2024

Some part might have already crashed. Can you try it with a fresh database? Or manually delete all files under pgdata/pgvecto_rs and run REINDEX?

vade commented on June 17, 2024

Hi there!

I'm running Docker on an M2 Mac Pro with 32 GB RAM. Docker has 5 CPUs and 20 GB allocated.

I was able to run tensorchord/pgvecto-rs:pg16-v0.3.0-alpha.1, and while it seems to build the index from the get-go, it's not obvious to me from EXPLAIN ANALYZE whether the index is being used in our queries.
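One way to check this is to look at the scan nodes in the EXPLAIN output: PostgreSQL prints `Index Scan using <index> on <table>` when the planner picks an index and `Seq Scan on <table>` when it does not. Below is a minimal sketch that pulls those nodes out of the plan text. The helper names (`scan_nodes`, `uses_index`) and the sample index name `items_embedding_idx` are hypothetical, and the parsing only covers the common text-format node types.

```python
import re

# Matches a scan node in text-format EXPLAIN output, e.g.
# "->  Index Scan using items_embedding_idx on items  (cost=...)".
_SCAN_RE = re.compile(
    r"(Seq Scan|Index Only Scan|Index Scan)"
    r"(?: Backward)?(?: using (\S+))?(?: on (\S+))?"
)


def scan_nodes(plan_text):
    """Return (node_type, index_name, relation) for each scan node found."""
    nodes = []
    for line in plan_text.splitlines():
        match = _SCAN_RE.search(line)
        if match:
            nodes.append(match.groups())
    return nodes


def uses_index(plan_text, index_name):
    """True if any scan node in the plan reads through the named index."""
    return any(node[1] == index_name for node in scan_nodes(plan_text))
```

Paste the output of EXPLAIN (or EXPLAIN ANALYZE) into `plan_text`. A `Seq Scan` on the vector table suggests the planner skipped the index for that query. Note that this only shows what the planner chose, not how the index answers the query internally.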

Manually deleting that folder and running REINDEX did work for 0.2.1, but it seems unnecessary for 0.3.0.

Question: After running CREATE INDEX, once it returns / completes, should I expect higher-than-idle CPU usage on the Postgres container? It looks like indexing is still running.

Thanks for any insight @VoVAllen

VoVAllen commented on June 17, 2024

The behavior changed between 0.2.1 and 0.3.0. In 0.2.1, the HNSW index is constructed asynchronously. So when CREATE INDEX finishes, queries are initially served by a brute-force scan, while the real HNSW index is built in background threads. Once construction finishes, queries use the HNSW index and you'll see they are much faster.

Question: After running CREATE INDEX, once it returns / completes, should I expect higher-than-idle CPU usage on the Postgres container? It looks like indexing is still running.

Yes, it's still running in the background process.

In 0.3.0, we decided to let CREATE INDEX finish only once the real index is constructed. So CREATE INDEX takes much longer, but queries use the index directly after that.

VoVAllen commented on June 17, 2024

0.2.1 is a stable version. You can use SELECT * FROM pg_vector_index_stat; to check whether the real index has finished building.
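For anyone scripting this on 0.2.1, here is a rough sketch of polling that view until the background build is done. It assumes the rows from pg_vector_index_stat arrive as dicts and that the view has an `indexname` column and a boolean `idx_indexing` column. Both column names are assumptions, so check them against the output of the SELECT above on your version. `fetch_rows` is a hypothetical callable that runs the query with whatever client you use.

```python
import time


def unfinished_indexes(rows, name_col="indexname", flag_col="idx_indexing"):
    """Names of indexes whose background build still appears to be running.

    The column names are assumptions about pg_vector_index_stat.
    """
    return [row[name_col] for row in rows if row[flag_col]]


def wait_until_built(fetch_rows, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll fetch_rows() until no index reports an ongoing build.

    Returns True once every index is finished, False after max_polls tries.
    """
    for _ in range(max_polls):
        if not unfinished_indexes(fetch_rows()):
            return True
        sleep(poll_seconds)
    return False
```

On 0.3.0 this should not be needed, since CREATE INDEX only returns after the real index is built.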

VoVAllen commented on June 17, 2024

What error did you hit on 0.3?

vade commented on June 17, 2024

Thank you @VoVAllen for the information about the differences between 0.3.0 and 0.2.1. I'm happy testing on the alpha, since for now we are able to be flexible.

Right now, with 0.3.0, I'm not sure I'm seeing the performance I expect, but moving from 0.2.1 to 0.3.0 has removed all crashes and disconnections from our psql client, which is awesome!

vade commented on June 17, 2024

Also, re performance: I'm not implying pgvecto.rs is slow, mostly that we are trying to find settings that work for our expected load. I suspect this can be closed, as the main crashing issue is resolved and most of my concerns are better suited to a Discord conversation / educating me on expected performance.

Thank you again!

gaocegege commented on June 17, 2024

Also, re performance: I'm not implying pgvecto.rs is slow, mostly that we are trying to find settings that work for our expected load. I suspect this can be closed, as the main crashing issue is resolved and most of my concerns are better suited to a Discord conversation / educating me on expected performance.

Thank you again!

Questions about performance are welcome in Discord!
