Comments (4)
I forgot that this was also discussed indirectly upstream.
I guess the main difference is that pgvector types store just the length, oid and array data
Even less. vector is just the dimension and the vector itself (plus a few bytes for the variable-length header and a reserved field). halfvec is similar, whereas sparsevec is set up to handle the absence of data by storing only the non-zero elements.
PostgreSQL array types have a bit more overhead, including the OID of the element type.
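To put rough numbers on the difference, here is a small Python sketch that adds up header and element bytes for each type. The header layouts are my reading of pgvector's type structs and PostgreSQL's ArrayType, so treat them as assumptions. The figures are uncompressed sizes and ignore TOAST compression and the 1-byte short header PostgreSQL uses for small values on disk.

```python
import struct

# Header layouts as struct format strings ("=" means standard sizes, no padding).
# Assumed from pgvector's Vector / HalfVector / SparseVector structs and
# PostgreSQL's ArrayType; all sizes are for the uncompressed value.
VECTOR_HEADER = struct.calcsize("=ihh")      # vl_len_ (int32), dim (int16), unused (int16)
HALFVEC_HEADER = struct.calcsize("=ihh")     # same header as vector
SPARSEVEC_HEADER = struct.calcsize("=iiii")  # vl_len_, dim, nnz, unused (all int32)
ARRAY_HEADER = struct.calcsize("=iiiI")      # vl_len_, ndim, dataoffset, elemtype (OID)


def vector_bytes(dim: int) -> int:
    """Size of a vector value: header plus one 4-byte float per dimension."""
    return VECTOR_HEADER + 4 * dim


def halfvec_bytes(dim: int) -> int:
    """Size of a halfvec value: header plus one 2-byte half float per dimension."""
    return HALFVEC_HEADER + 2 * dim


def sparsevec_bytes(nnz: int) -> int:
    """Size of a sparsevec value: header plus an int32 index and a float4 value per non-zero."""
    return SPARSEVEC_HEADER + (4 + 4) * nnz


def float4_array_bytes(dim: int) -> int:
    """Size of a one-dimensional real[] with no NULLs (so no null bitmap)."""
    # One int32 dimension length and one int32 lower bound per array dimension.
    # PostgreSQL MAXALIGNs this header; 16 + 8 = 24 bytes is already 8-byte aligned.
    dims_and_lbounds = 2 * 4
    return ARRAY_HEADER + dims_and_lbounds + 4 * dim
```

For a 1,536-dimension embedding this works out to 6,152 bytes as vector, 3,080 bytes as halfvec and 6,168 bytes as real[], so under these assumptions the array's extra header costs a fixed 16 bytes per value rather than growing with the dimension.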
So the vector types are a specialization for a single dimension that allow no nulls. Is that a good summary?
Yes -- that's a vector in a nutshell (magnitude and direction).
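The "single dimension, no NULLs" summary can be written down as a small validation function. This is a hypothetical Python helper (to_vector is not a pgvector API). The checks mirror what, as I understand it, pgvector enforces when an array is turned into a vector, and the error messages are paraphrased. VECTOR_MAX_DIM = 16000 is an assumed value based on pgvector's documented limit for the vector type.

```python
import math
from typing import List, Sequence

# Assumed: pgvector's documented maximum number of dimensions for the vector type.
VECTOR_MAX_DIM = 16000


def to_vector(values: Sequence) -> List[float]:
    """Hypothetical helper: accept only input that fits a vector's shape.

    Rejects empty input, too many dimensions, NULLs (None), nested
    (multi-dimensional) input, and NaN or infinite elements.
    """
    if len(values) == 0:
        raise ValueError("vector must have at least 1 dimension")
    if len(values) > VECTOR_MAX_DIM:
        raise ValueError(f"vector cannot have more than {VECTOR_MAX_DIM} dimensions")
    result = []
    for v in values:
        if v is None:
            raise ValueError("array must not contain nulls")
        if isinstance(v, (list, tuple)):
            raise ValueError("array must be 1-D")
        f = float(v)
        if not math.isfinite(f):
            raise ValueError("NaN and infinite values are not allowed in vector")
        result.append(f)
    return result
```

A PostgreSQL array can be multi-dimensional and contain NULLs, which is why it needs extra header fields and a null bitmap; ruling both out is what lets vector get away with just a dimension count and the values.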
While I do think it'd be great to have native vector search support in core PostgreSQL in the fullness of time, the current setup with pgvector allows us to go a bit faster. That said, there are likely some changes we need in core PostgreSQL to make things better (which I'm planning to discuss at PGConf.dev 2024 in a few weeks), such as how we search over data in TOAST tables that happens to be on the hot path of a query, or how we index data that goes beyond 8KB.
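A back-of-the-envelope Python sketch of why TOAST and the 8KB page size come up. It assumes PostgreSQL's defaults (8 kB blocks and a TOAST_TUPLE_THRESHOLD of 2032 bytes, documented as "normally 2 kB") and treats each value as if it were the only column, so tuple and page headers are ignored. likely_toasted and fits_in_page are hypothetical helpers, not PostgreSQL functions.

```python
# Assumed PostgreSQL defaults: 8 kB pages and a TOAST threshold of 2032 bytes.
BLCKSZ = 8192
TOAST_TUPLE_THRESHOLD = 2032


def vector_bytes(dim: int) -> int:
    return 8 + 4 * dim  # 8-byte header + float4 per dimension


def halfvec_bytes(dim: int) -> int:
    return 8 + 2 * dim  # 8-byte header + 2-byte half float per dimension


def bit_bytes(nbits: int) -> int:
    return 8 + (nbits + 7) // 8  # varlena header + bit length, then bits packed 8 per byte


def likely_toasted(value_bytes: int) -> bool:
    """Rough check: a row this large is a TOAST candidate (ignores tuple header, other columns)."""
    return value_bytes > TOAST_TUPLE_THRESHOLD


def fits_in_page(value_bytes: int) -> bool:
    """Rough check: the value alone is smaller than one page (ignores page and tuple overhead)."""
    return value_bytes < BLCKSZ
```

pgvector's documented index limits (2,000 dimensions for vector, 4,000 for halfvec, 64,000 for bit) each come out to 8,008 bytes with these formulas, just under a single 8 kB page, while a common 1,536-dimension vector (6,152 bytes) is already well past the TOAST threshold.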
from pgvector.
@beikov I've spoken at great length numerous times on this topic.
The short answer is that pgvector provides indexable mechanisms for searching over vector data. The slightly longer answer is that while we're still sussing out the optimal ways to store vectors (storage format, indexing methods), pgvector can move a bit faster than the PostgreSQL project, which ships a major release once a year. At the PostgreSQL developer conference last year, this was also discussed as a strategic decision: let pgvector move faster than the project and get new vector support and functionality into the hands of developers sooner.
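For reference, here is what "searching over vector data" computes when there is no index: a brute-force Python sketch of exact k-nearest-neighbor search, using the distance definitions behind pgvector's <-> (L2 distance), <#> (negative inner product) and <=> (cosine distance) operators. The function names are illustrative, not pgvector APIs. HNSW and IVFFlat indexes exist to approximate this result without scoring every row.

```python
import heapq
import math
from typing import Callable, List, Sequence, Tuple

Vec = Sequence[float]


def _check_dims(a: Vec, b: Vec) -> None:
    # Treat mismatched lengths as an error instead of letting zip() silently truncate.
    if len(a) != len(b):
        raise ValueError("different vector dimensions")


def l2_distance(a: Vec, b: Vec) -> float:
    """Euclidean distance, the semantics of the <-> operator."""
    _check_dims(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vec, b: Vec) -> float:
    """Negated dot product, the semantics of <#> (smaller means more similar)."""
    _check_dims(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vec, b: Vec) -> float:
    """1 - cosine similarity, the semantics of <=>; this sketch returns NaN for an all-zero vector."""
    _check_dims(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return float("nan") if norm == 0 else 1.0 - dot / norm


def exact_knn(
    query: Vec,
    rows: List[Vec],
    k: int,
    distance: Callable[[Vec, Vec], float] = l2_distance,
) -> List[Tuple[int, float]]:
    """Score every row and return the k closest as (row index, distance) pairs.

    Conceptually what ORDER BY embedding <-> query LIMIT k does on a sequential scan.
    """
    scored = ((i, distance(query, row)) for i, row in enumerate(rows))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])
```

The cost here grows linearly with the number of rows, which is the main reason an indexable type matters once a table holds many embeddings.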
The good news is that you can cast PostgreSQL array types to vector, and vice versa. Additionally, pgvector 0.7.0 added index support for bit strings, which are a native PostgreSQL type.
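One common use of bit-string indexes is binary quantization. A minimal Python sketch, assuming bit strings are represented as Python strings of '0' and '1': binary_quantize here is modeled on pgvector's function of the same name (as I understand it, each positive element becomes a 1 bit), and the two distances correspond to pgvector's <~> (Hamming) and <%> (Jaccard) operators for bit. Scoring an all-zero pair as 0.0 in Jaccard is a choice made in this sketch.

```python
from typing import Sequence


def binary_quantize(values: Sequence[float]) -> str:
    """Map each element to '1' if it is positive, else '0'."""
    return "".join("1" if v > 0 else "0" for v in values)


def _check_lengths(a: str, b: str) -> None:
    if len(a) != len(b):
        raise ValueError("different bit lengths")


def hamming_distance(a: str, b: str) -> int:
    """Number of positions where the bits differ (the <~> operator)."""
    _check_lengths(a, b)
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a: str, b: str) -> float:
    """1 - |a AND b| / |a OR b| (the <%> operator); 0.0 for two all-zero strings here."""
    _check_lengths(a, b)
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    return 0.0 if either == 0 else 1.0 - both / either
```

Quantizing to one bit per dimension shrinks each value to roughly a thirty-second of its float4 size, at the cost of precision; a common pattern is to search on the bits first and re-rank the candidates using the full vectors.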
Thanks for sharing that. I was not aware that you spoke about this topic in your talks.
I just found this documentation about how standard arrays are stored. I guess the main difference is that pgvector types store just the length, oid and array data. So the vector types are a specialization for a single dimension that allow no nulls. Is that a good summary?
Thanks for the clarification.
Related Issues (20)
- Installation instructions unclear
- Large vector data type will cause performance decline?
- A question regard table_open() in background worker when building index
- jVector Implementation
- Type Error when working with Langchain (Missing Positional Argument: evalue)
- pgvector still use row-based storage instead of columnar storage ?
- Can't get the query planner to use HNSW index
- [search failed] 2000w, 768dim: data search failed
- ERROR: index row size 6160 exceeds btree version 4 maximum 2704 for index
- Make difficulties
- Table Insert Performance with HNSW Index
- Comparison with high-precision data
- Weight in the filters
- can't make pgvector
- src\bitvec.c(43): warning C4141: 'dllexport': used more than once
- Porting indexes from pinecone to pgvector
- Error when creating a halfvec_ip_ops index
- Compiling on a mac (Intel)- clang: error: unsupported argument 'native' to option '-march='
- Ability to skip/offset probes (in ivfflat)
- Question about generating embeddings