I'm a user looking to utilize pgVector. While I understand that pgVector can be used a

Is pgVector performance competitive compared to other VectorDBs? (pgvector issue, closed)

cho2hhun commented on June 4, 2024

Is pgVector performance competitive compared to other VectorDBs


Comments (2)

jkatz commented on June 4, 2024

> pgvector performance is pretty good for relatively small datasets (up to 10kk); larger datasets require PostgreSQL table partitioning, which significantly raises the complexity of the entire system.

That's not what I've observed. In my testing, pgvector scales pretty well vertically within a table: I've been a part of multiple 1 billion vector benchmarks with all the vectors stored within a single, unpartitioned table, and pgvector (let alone PostgreSQL) performs pretty well. However, at that size you would typically partition a PostgreSQL table anyway, and I have seen pgvector users handle that. I recently wrote a blog post on distributed pgvector queries that explores this. I'm hoping to write another one soon on an 8.3B vector dataset I've been working with, where I stored it all in a single database (though in a partitioned table).

However, stepping back a second, "scale" is an interesting term here because there are a few items you need to look at with vector database workloads, including:

Total number of vectors and their dimensionality
Size of vectors being stored on disk
Index build time
Queries / second and query latency, under different levels of concurrency, co-plotted with recall

All of these items are important, but I do want to highlight testing concurrency (I discussed this in one blog post), as it's particularly important for databases and a key part of scaling vertically. The good news is that PostgreSQL itself tends to scale pretty well vertically, and a lot of the work on pgvector in recent years has also focused on this. It's one area where I've seen it really shine compared to other vector databases.
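As a rough illustration of the metrics listed above, here is a minimal sketch of how recall, QPS, and latency could be measured together at a given concurrency level. It is not taken from any pgvector benchmark; `run_query` is a hypothetical stand-in for whatever function sends one nearest-neighbor query to your database, and the helper names are made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def measure(run_query, queries, concurrency):
    """Run every query using `concurrency` worker threads.

    `run_query` is a hypothetical callable that executes one query.
    Returns (queries per second, list of per-query latencies in seconds).
    """
    def timed(query):
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - wall_start
    return len(queries) / elapsed, latencies


def percentile(values, pct):
    """Nearest-rank style percentile, good enough for a quick latency summary."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, int(round(pct / 100 * (len(ordered) - 1))))
    return ordered[idx]
```

Running `measure` at several concurrency levels (for example 1, 8, 32) and reporting QPS and p99 latency next to recall gives the kind of co-plotted picture described above, since a faster configuration is only comparable if its recall is similar.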

The next pgvector release is going to include the ability to perform certain types of quantization (scalar quantization and binary quantization). The link shows some of the results with ANN Benchmarks for scalar quantization; I've finished a run with binary quantization that I will share. Both will provide a way to scale pgvector further, as they let you shrink storage and index build time while boosting QPS with little impact on recall.
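To make the two terms concrete, here is a generic sketch of what quantization does to a vector. This is an illustration of the general idea only and is not pgvector's implementation: the 8-bit scalar scheme and the function names below are assumptions chosen for the example, and the scheme pgvector ships may differ.

```python
def binary_quantize(vec):
    """Keep one bit per dimension: 1 if the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming_distance(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def scalar_quantize(vec, lo, hi, levels=256):
    """Map each float in [lo, hi] to an integer code in [0, levels - 1].

    Illustrative 8-bit scheme (assumption); out-of-range values are clamped.
    """
    scale = (levels - 1) / (hi - lo)
    return [min(levels - 1, max(0, round((x - lo) * scale))) for x in vec]


a = binary_quantize([0.5, -1.0, 0.0, 2.0])   # [1, 0, 0, 1]
b = binary_quantize([0.1, 0.3, -0.2, -0.7])  # [1, 1, 0, 0]
print(hamming_distance(a, b))                # 2
```

Assuming vectors are stored as 32-bit floats, one bit per dimension is a 32x reduction in raw vector size and an 8-bit code is a 4x reduction, which is where the storage and index build savings come from. The cost is lost precision, so recall has to be measured rather than assumed.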

The last bit around scale is scale of development: pgvector works with lots of existing PostgreSQL tooling, so you can continue to build your vector-driven workload in the same database (or application) that you're already building on (at least if you're using PostgreSQL).


sgjurano commented on June 4, 2024

pgvector performance is pretty good for relatively small datasets (up to 10kk); larger datasets require PostgreSQL table partitioning, which significantly raises the complexity of the entire system.
https://supabase.com/blog/pgvector-vs-pinecone
