Comments (2)
pgvector performance is pretty good for relatively small datasets (up to 10 million vectors); larger datasets require PostgreSQL table partitioning, which significantly raises the complexity of the entire system.
That's not what I've observed. In my testing, pgvector scales pretty well vertically within a single table: I've been part of multiple 1-billion-vector benchmarks with all the vectors stored in a single, unpartitioned table, and pgvector (and PostgreSQL itself) performed pretty well. However, at that size you would typically partition a PostgreSQL table anyway, and I have seen pgvector users handle that. I recently wrote a blog post on distributed pgvector queries that explores this. I'm hoping to write another one soon on an 8.3B vector dataset I've been working with, where I stored it all in a single database (though in a partitioned table).
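One way to picture a nearest-neighbor query over a partitioned table: each partition returns its own k closest rows, already sorted by distance, and the global answer is the first k rows of the merged streams. The sketch below is only an illustration of that idea, not how pgvector or PostgreSQL implement it; `merge_top_k` and the `(distance, id)` row shape are hypothetical names chosen for the example.

```python
import heapq
from itertools import islice


def merge_top_k(partition_results, k):
    """Merge per-partition result lists into the global top k.

    Each element of `partition_results` is a list of (distance, id)
    tuples sorted by ascending distance (an assumed row shape).
    """
    # heapq.merge lazily interleaves the already-sorted inputs,
    # so only the first k merged rows are ever materialized.
    merged = heapq.merge(*partition_results, key=lambda row: row[0])
    return list(islice(merged, k))


if __name__ == "__main__":
    part_a = [(0.1, "a"), (0.4, "b")]
    part_b = [(0.2, "c"), (0.3, "d")]
    print(merge_top_k([part_a, part_b], k=3))
```

The merge is only correct if every partition returns at least its own k nearest rows; with an approximate index, recall per partition still bounds the recall of the merged result.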
However, stepping back a second, "scale" is an interesting term here because there are a few items you need to look at with vector database workloads, including:
- Total number of vectors and their dimensionality
- Size of vectors being stored on disk
- Index build time
- Queries / second and query latency, under different levels of concurrency, co-plotted with recall
All of these items are important, but I do want to highlight testing concurrency (which I discussed in an earlier blog post), as this is particularly important for databases: it's a key part of scaling vertically. The good news is that PostgreSQL itself tends to scale pretty well vertically, and a lot of the work on pgvector over the past years has also focused on this. It's one area where I've seen pgvector really shine compared to other vector databases.
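As a rough illustration of measuring queries per second, latency, and recall together at a given concurrency level, here is a minimal sketch. It assumes a hypothetical `search` callable that takes a query and returns a list of neighbor ids (in practice this would wrap a database call), plus precomputed exact neighbors as ground truth; none of these names come from pgvector.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def recall_at_k(returned_ids, true_ids):
    """Fraction of the true nearest neighbors that the query returned."""
    true_set = set(true_ids)
    if not true_set:
        return 0.0
    return len(true_set.intersection(returned_ids)) / len(true_set)


def run_benchmark(search, queries, ground_truth, clients):
    """Run all queries through `search` with `clients` concurrent workers.

    `search` is a hypothetical callable: query -> list of neighbor ids.
    Needs at least two queries so latency percentiles are defined.
    """
    def timed(query):
        start = time.perf_counter()
        ids = search(query)
        return time.perf_counter() - start, ids

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        results = list(pool.map(timed, queries))  # keeps query order
    wall = time.perf_counter() - wall_start

    latencies = [latency for latency, _ in results]
    recalls = [
        recall_at_k(ids, truth)
        for (_, ids), truth in zip(results, ground_truth)
    ]
    cuts = quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "clients": clients,
        "qps": len(queries) / wall,
        "p50": cuts[49],
        "p99": cuts[98],
        "recall": sum(recalls) / len(recalls),
    }
```

Running `run_benchmark` once per concurrency level (for example 1, 8, 32, 64 clients) gives the points you would co-plot: QPS and latency on one axis, recall on the other.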
The next pgvector release is going to include the ability to perform certain types of quantization (scalar quantization and binary quantization). The link shows some of the results with ANN Benchmarks for scalar quantization; I have finished a run with binary quantization that I will share. Both will provide a way to scale pgvector further, as they let you shrink storage and index build time while boosting QPS with little impact on recall.
The last bit around scale is scale of development: pgvector works with lots of existing PostgreSQL tooling, so you can continue to build your vector-driven workload in the same database (or application) that you're already building on (at least if you're using PostgreSQL).
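To make the storage effect concrete, here is a small sketch of both ideas in plain Python. It assumes scalar quantization means storing each component as a 2-byte float and binary quantization means keeping one bit per component (1 if the value is positive); the per-value size formulas are the ones I recall from the pgvector README and should be treated as an assumption, and all function names are hypothetical.

```python
import struct


def storage_bytes(dimensions, kind="vector"):
    """Approximate bytes per stored value (assumed README formulas)."""
    if kind == "vector":    # 4-byte floats
        return 4 * dimensions + 8
    if kind == "halfvec":   # 2-byte floats (scalar quantization)
        return 2 * dimensions + 8
    if kind == "bit":       # 1 bit per dimension (binary quantization)
        return (dimensions + 7) // 8 + 8
    raise ValueError(f"unknown kind: {kind}")


def to_half(vector):
    """Round each component to the nearest IEEE 754 half-precision float."""
    return [struct.unpack("<e", struct.pack("<e", x))[0] for x in vector]


def binary_quantize(vector):
    """Keep only the sign: 1 for positive components, 0 otherwise."""
    return [1 if x > 0 else 0 for x in vector]


def hamming_distance(bits_a, bits_b):
    """Number of positions where two equal-length bit lists differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("bit vectors must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))


if __name__ == "__main__":
    for kind in ("vector", "halfvec", "bit"):
        print(kind, storage_bytes(1536, kind))
    print(binary_quantize([0.5, -1.0, 0.0, 2.0]))
```

Under these assumptions a 1536-dimensional value drops from 6152 bytes to 3080 bytes with 2-byte floats and to 200 bytes with one bit per dimension. Binary quantization loses much more information, so a common pattern is to search on the bits first and re-rank a larger candidate set with the original vectors.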
from pgvector.
pgvector performance is pretty good for relatively small datasets (up to 10 million vectors); larger datasets require PostgreSQL table partitioning, which significantly raises the complexity of the entire system.
https://supabase.com/blog/pgvector-vs-pinecone