

jkatz commented on June 19, 2024

I forgot that this was also discussed indirectly upstream.

> I guess the main difference is that pgvector types store just the length, oid and array data

Even less 😉 vector stores only the dimension and the vector data itself (with some bits used for handling variable-length types and some reserved). halfvec is similar, whereas sparsevec is designed to handle the absence of data.
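
To make the layouts concrete, here is a rough size calculator. It is a sketch based on my understanding of pgvector's C structs (a 4-byte varlena length word, then `int16 dim` + `int16 unused` for vector/halfvec, and `int32 dim` + `int32 nnz` + `int32 unused` followed by indices and values for sparsevec). It assumes the uncompressed in-memory form; on disk PostgreSQL may use a shorter header or TOAST the value. The helper names are hypothetical.

```python
import struct

# Assumption: uncompressed in-memory form with a 4-byte varlena header.
VARLENA_HEADER = 4  # int32 vl_len_


def vector_size(dim: int) -> int:
    """vector: header + int16 dim + int16 unused (reserved) + dim float4 values."""
    return VARLENA_HEADER + 2 + 2 + 4 * dim


def halfvec_size(dim: int) -> int:
    """halfvec: same header as vector, but 2-byte half-precision values."""
    return VARLENA_HEADER + 2 + 2 + 2 * dim


def sparsevec_size(nnz: int) -> int:
    """sparsevec: header + int32 dim + int32 nnz + int32 unused,
    then nnz int32 indices followed by nnz float4 values."""
    return VARLENA_HEADER + 4 + 4 + 4 + (4 + 4) * nnz


def to_sparse(dense: list[float]) -> tuple[int, list[int], list[float]]:
    """Keep only the non-zero entries (0-based indices in this sketch)."""
    indices = [i for i, x in enumerate(dense) if x != 0.0]
    return len(dense), indices, [dense[i] for i in indices]


def pack_vector_body(values: list[float]) -> bytes:
    """Bytes following the varlena header: dim, reserved zero, then float4 data."""
    return struct.pack(f"<hh{len(values)}f", len(values), 0, *values)


dense = [0.0] * 1000
dense[3], dense[700] = 1.5, -2.0
dim, indices, _ = to_sparse(dense)
print(vector_size(dim), halfvec_size(dim), sparsevec_size(len(indices)))
# prints: 4008 2008 32
```

With only two non-zero entries out of 1,000, the sparse form stores 32 bytes instead of roughly 4 KB, which is what "handling the absence of data" buys you.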

The PostgreSQL array types have a bit more overhead, including OID.
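
A rough comparison of that overhead, as a sketch: it assumes a 1-D `float4[]` with no NULLs, the `ArrayType` header as I understand it (`vl_len_`, `ndim`, `dataoffset`, element-type OID, then one `dims` and one `lbound` entry per dimension, MAXALIGN'd), and 8-byte MAXALIGN as on typical 64-bit builds. Function names are hypothetical.

```python
MAXALIGN = 8  # assumption: typical 64-bit build


def maxalign(n: int) -> int:
    """Round n up to the next multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def float4_array_size(n: int) -> int:
    """1-D float4[] with no NULLs, uncompressed."""
    ndim = 1
    # vl_len_ + ndim + dataoffset + elemtype (OID), then dims[] and lbound[]
    header = 4 + 4 + 4 + 4 + 2 * 4 * ndim
    return maxalign(header) + 4 * n


def vector_size(dim: int) -> int:
    """vector: vl_len_ + int16 dim + int16 unused + float4 data."""
    return 4 + 2 + 2 + 4 * dim


for n in (3, 1536):
    print(n, float4_array_size(n), vector_size(n))
# 3 36 20
# 1536 6168 6152
```

Under these assumptions the difference is a constant 16 bytes per value, so the bigger gains from a dedicated type come from the single-dimension, no-NULL restriction rather than raw size.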

> So the vector types are a specialization for a single dimension that allow no nulls. Is that a good summary?

Yes -- that's a vector in a nutshell (magnitude and direction) 😄
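
The "single dimension, no nulls" restriction can be sketched as the checks a conversion from an array to a vector has to make. This is a hypothetical Python helper (with `None` standing in for SQL NULL), not pgvector's actual cast code or its error messages.

```python
def array_to_vector(arr: list) -> list[float]:
    """Hypothetical helper: accept only what a vector can represent,
    i.e. a single dimension with no NULLs (None stands in for SQL NULL)."""
    if any(isinstance(x, (list, tuple)) for x in arr):
        raise ValueError("array must be one-dimensional")
    if any(x is None for x in arr):
        raise ValueError("array must not contain NULLs")
    return [float(x) for x in arr]


print(array_to_vector([1, 2, 3]))  # [1.0, 2.0, 3.0]
for bad in ([[1, 2], [3, 4]], [1.0, None, 3.0]):
    try:
        array_to_vector(bad)
    except ValueError as err:
        print("rejected:", err)
```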

While I do think it'd be great to have native vector search support in core PostgreSQL in the fullness of time, the current setup with pgvector allows us to go a bit faster. That said, there are likely some changes we need in core PostgreSQL to make things better (which I'm planning to discuss at PGConf.dev 2024 in a few weeks), such as how we search over data in TOAST tables when it is part of a query's hot path, and how we index data larger than 8KB.
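
Some back-of-the-envelope arithmetic shows why TOAST and the 8KB page size come up so quickly. This sketch assumes the default 8 kB block size, a TOAST threshold of 2032 bytes (the PostgreSQL docs describe it as "normally 2 kB"), and an 8-byte vector/halfvec header. It ignores heap tuple headers, other columns, and index-tuple overhead, so real limits are somewhat lower.

```python
# Rough arithmetic only: ignores tuple headers, other columns and index overhead.
TOAST_THRESHOLD = 2032  # bytes; "normally 2 kB" with the default 8 kB block size
PAGE_SIZE = 8192        # default PostgreSQL block size in bytes
HEADER = 8              # assumed vector/halfvec header (varlena + dim + unused)


def vector_size(dim: int) -> int:
    """Uncompressed vector: header + 4 bytes per float4 dimension."""
    return HEADER + 4 * dim


def first_dim_over(limit: int, bytes_per_dim: int) -> int:
    """Smallest dimension count whose value size exceeds `limit` bytes."""
    return (limit - HEADER) // bytes_per_dim + 1


for name, per_dim in (("vector", 4), ("halfvec", 2)):
    print(name,
          first_dim_over(TOAST_THRESHOLD, per_dim),
          first_dim_over(PAGE_SIZE, per_dim))
# vector 507 2047
# halfvec 1013 4093

print(vector_size(1536), vector_size(1536) > TOAST_THRESHOLD)  # 6152 True
```

In other words, a single 1536-dimension vector column is already several times past the TOAST threshold, so TOAST access can end up on the hot path of a similarity query.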

jkatz commented on June 19, 2024

@beikov I've spoken at great length numerous times on this topic 😄

The short answer is that pgvector provides indexable mechanisms for searching over vector data. The slightly longer answer is that while we're still sussing out the optimal ways to store vectors (storage format, indexing methods), pgvector can move a bit faster than the PostgreSQL project, which ships a major release once a year. As discussed at the PostgreSQL developer conference last year, this was also a strategic decision: letting pgvector move faster than the project gets new vector support and functionality into the hands of developers sooner.

The good news is that you can cast PostgreSQL array types to vector, and vice versa. Additionally, pgvector 0.7.0 added index support for bit strings, which are a native PostgreSQL type.
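
One reason bit-string indexing is useful is binary quantization: reduce each dimension to one bit, then compare with Hamming distance, one of the distance measures used with bit strings. The sketch below is illustrative only; the "positive means 1" thresholding rule is an assumption, and the helper names are hypothetical rather than pgvector functions.

```python
def binary_quantize(values: list[float]) -> str:
    """One bit per dimension: '1' if the value is positive, else '0' (assumed rule)."""
    return "".join("1" if x > 0 else "0" for x in values)


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


q = binary_quantize([0.3, -1.2, 0.0, 2.5])   # '1001'
d = binary_quantize([0.9, 0.4, -0.1, -3.0])  # '1100'
print(q, d, hamming_distance(q, d))          # 1001 1100 2
```

A 1536-dimension float4 vector shrinks to 1536 bits (192 bytes) this way, at the cost of precision, which is typically recovered by re-ranking candidates with the full vectors.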


beikov commented on June 19, 2024

Thanks for sharing that. I was not aware that you spoke about this topic in your talks.

I just found this documentation about how standard arrays are stored. I guess the main difference is that pgvector types store just the length, oid and array data. So the vector types are a specialization for a single dimension that allow no nulls. Is that a good summary?


beikov commented on June 19, 2024

Thanks for the clarification.
