pgvecto.rs


pgvecto.rs is a Postgres extension that provides vector similarity search functions. It is written in Rust and based on pgrx. It is currently in beta; we invite you to try it out in production and provide us with feedback. Read more at 📝 our launch blog.

Why use pgvecto.rs

  • 💃 Easy to use: pgvecto.rs is a Postgres extension, so you can use it directly within your existing database. This makes it easy to integrate into your existing workflows and applications.
  • 🔗 Async indexing: pgvecto.rs builds its index asynchronously in background threads, so indexing does not block insertions and the index is always ready for new queries.
  • 🥅 Filtering: pgvecto.rs supports filtering. You can set conditions when searching or retrieving points, a feature missing from other Postgres extensions.
  • 🧮 Quantization: pgvecto.rs supports scalar quantization and product quantization up to 64x.
  • 🦀 Rewrite in Rust: Rust's strict compile-time checks ensure memory safety, reducing the risk of bugs and security issues commonly associated with C extensions.

Comparison with pgvector

| Feature | pgvecto.rs | pgvector |
| --- | --- | --- |
| Transaction support | ✅ | ⚠️ |
| Sufficient Result with Delete/Update/Filter | ✅ | ⚠️ |
| Vector Dimension Limit | 65535 | 2000 |
| Prefilter on HNSW | ✅ | ❌ |
| Parallel HNSW Index build | ⚡️ Linearly faster with more cores | 🐌 Only single core used |
| Async Index build | Ready for queries anytime and does not block insertions | ❌ |
| Quantization | Scalar/Product Quantization | ❌ |

More details at pgvecto.rs vs. pgvector

Quick start

For new users, we recommend using the Docker image to get started quickly.

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.1.13

Then you can connect to the database using the psql command line tool. The default username is postgres, and the default password is mysecretpassword.

psql -h localhost -p 5432 -U postgres

Run the following SQL to ensure the extension is enabled.

DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;

pgvecto.rs introduces a new data type vector(n) denoting an n-dimensional vector. The n within the brackets signifies the dimensions of the vector.

You can create a table with the following SQL.

-- create table with a vector column

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vector(3) NOT NULL -- 3 dimensions
);

Tip

vector(n) is a valid data type only if $1 \leq n \leq 65535$. Due to limitations of PostgreSQL, it is possible to create a value of type vector(3) that has $5$ dimensions, and vector (with no dimension given) is also a valid data type. However, a vector still cannot hold $0$ scalars or more than $65535$ scalars. If a column is declared as vector with no dimension, or if it contains values whose dimensions do not match the one declared for the column, you won't be able to create an index on it.

You can then populate the table with vector data as follows.

-- insert values

INSERT INTO items (embedding)
VALUES ('[1,2,3]'), ('[4,5,6]');

-- or insert values by casting an array to a vector

INSERT INTO items (embedding)
VALUES (ARRAY[1, 2, 3]::real[]), (ARRAY[4, 5, 6]::real[]);

We support three operators to calculate the distance between two vectors.

  • <->: squared Euclidean distance, defined as $\Sigma (x_i - y_i) ^ 2$.
  • <#>: negative dot product, defined as $- \Sigma x_iy_i$.
  • <=>: cosine distance, defined as $1 - \frac{\Sigma x_iy_i}{\sqrt{\Sigma x_i^2 \Sigma y_i^2}}$.

-- call the distance function through operators

-- squared Euclidean distance
SELECT '[1, 2, 3]'::vector <-> '[3, 2, 1]'::vector;
-- negative dot product
SELECT '[1, 2, 3]'::vector <#> '[3, 2, 1]'::vector;
-- cosine distance
SELECT '[1, 2, 3]'::vector <=> '[3, 2, 1]'::vector;
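
To check the arithmetic outside the database, the three definitions above can be reproduced in a few lines of Python. This is only a sketch of the formulas, not the extension's implementation; the function names are made up for illustration, and because the database works with 32-bit floats, its results may differ in the trailing digits.

```python
import math

def squared_euclidean(x, y):
    # <->: sum of squared component differences
    return sum((a - b) ** 2 for a, b in zip(x, y))

def negative_dot(x, y):
    # <#>: negated dot product
    return -sum(a * b for a, b in zip(x, y))

def cosine_distance(x, y):
    # <=>: one minus the cosine similarity
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1 - dot / norm

x, y = [1, 2, 3], [3, 2, 1]
print(squared_euclidean(x, y))  # 8
print(negative_dot(x, y))       # -10
print(cosine_distance(x, y))    # about 0.2857, i.e. 1 - 10/14
```

For these two vectors the dot product is 10 and both squared norms are 14, which is where the 1 - 10/14 comes from.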

You can search for the vectors most similar to a query vector like this.

-- query the similar embeddings
SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;
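
Conceptually, the query above ranks rows by squared Euclidean distance to the query vector and keeps the closest five. The Python sketch below does the same thing by brute force over the two example rows. It assumes the rows received ids 1 and 2, and the helper `nearest` is hypothetical; an index-backed search in the database may be approximate rather than an exact sort like this.

```python
def squared_euclidean(x, y):
    # Same definition as the <-> operator
    return sum((a - b) ** 2 for a, b in zip(x, y))

def nearest(rows, query, limit=5):
    # rows is a list of (id, embedding) pairs; sort by distance, keep the top `limit`
    ranked = sorted(rows, key=lambda row: squared_euclidean(row[1], query))
    return ranked[:limit]

# Assumed contents of the items table after the first INSERT
items = [(1, [1, 2, 3]), (2, [4, 5, 6])]
query = [3, 2, 1]

for row_id, embedding in nearest(items, query):
    print(row_id, embedding, squared_euclidean(embedding, query))
# 1 [1, 2, 3] 8
# 2 [4, 5, 6] 35
```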

Half-precision floating-point

The vecf16 type is identical to vector except for its scalar type: it stores 16-bit floating-point numbers. If you want to reduce memory usage to get better performance, you can try replacing the vector type with vecf16.
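
The trade-off can be illustrated with Python's standard `struct` module, which supports both 32-bit (`f`) and 16-bit (`e`) IEEE 754 floats. This sketch counts only the raw scalar payload, ignoring any per-value headers or index overhead in the database, and the dimension of 1536 is just a hypothetical embedding size.

```python
import struct

def storage_bytes(dims, fmt):
    # Bytes needed for the scalar payload only
    return dims * struct.calcsize(fmt)

def to_half_and_back(value):
    # Round-trip a Python float through a 16-bit float
    return struct.unpack("<e", struct.pack("<e", value))[0]

dims = 1536  # hypothetical embedding size
print(storage_bytes(dims, "<f"))  # 6144 bytes with 32-bit floats
print(storage_bytes(dims, "<e"))  # 3072 bytes with 16-bit floats
print(to_half_and_back(0.1))      # close to 0.1, but not exactly 0.1
```

Halving the payload comes at the cost of precision, so it is worth checking search quality on your own data before switching.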

Roadmap 🗂️

Please check out ROADMAP. Want to jump in? Discussions and contributions are welcome!

Contribute 😊

We welcome all kinds of contributions from the open-source community, individuals, and partners.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Alex Chi: 💻
  • AuruTus: 💻
  • Avery: 💻 🤔
  • Ben Ye: 📖
  • Ce Gao: 💼 🖋 📖
  • Jinjing Zhou: 🎨 🤔 📆
  • Joe Passanante: 💻
  • Keming: 🐛 💻 📖 🤔 🚇
  • Mingzhuo Yin: 💻 ⚠️ 🚇
  • Usamoi: 💻 🤔
  • cutecutecat: 💻
  • odysa: 📖 💻
  • yihong: 💻
  • 盐粒 Yanli: 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

Acknowledgements

Thanks to the following projects:

  • pgrx - Postgres extension framework in Rust
  • pgvector - Postgres extension for vector similarity search written in C
