pgvecto.rs

pgvecto.rs is a (🚧 work in progress) Postgres extension that provides vector similarity search functions. It is written in Rust and based on pgrx.

Why use pgvecto.rs

  • 💃 Easy to use: pgvecto.rs is a Postgres extension, so you can use it directly within your existing database and integrate it into your existing workflows and applications.
  • 🦀 Rewrite in Rust: Rewriting in Rust offers benefits such as improved memory safety, better performance, and reduced maintenance costs over time.
  • 🙋 Community: People love Rust. We are happy to help with any questions you may have. Join our Discord to get in touch with us.

Why not a specialty vector database?

Imagine that your existing data is stored in a Postgres database and you want to run vector similarity search over it with a separate vector database. You have to move your data from Postgres to the vector database, and you have to maintain two databases at the same time. This is not a good idea.

Why not just use Postgres to do the vector similarity search? This is the reason we built pgvecto.rs. The user journey looks like this:

-- Update the embedding column for the documents table
UPDATE documents SET embedding = ai_embedding_vector(content) WHERE length(embedding) = 0;

-- Create an index on the embedding column
CREATE INDEX ON documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- Query the similar embeddings
SELECT * FROM documents ORDER BY embedding <-> ai_embedding_vector('hello world') LIMIT 5;

From SingleStore DB Blog:

Vectors and vector search are a data type and query processing approach, not a foundation for a new way of processing data. Using a specialty vector database (SVDB) will lead to the usual problems we see (and solve) again and again with our customers who use multiple specialty systems: redundant data, excessive data movement, lack of agreement on data values among distributed components, extra labor expense for specialized skills, extra licensing costs, limited query language power, programmability and extensibility, limited tool integration, and poor data integrity and availability compared with a true DBMS.

Setting up the development environment

You can use envd to set up the development environment with one command. It creates a Docker container and installs all the dependencies for you.

pip install envd
envd up

Build from source

cargo install cargo-pgrx
cargo pgrx init
cargo pgrx run

Getting Started

Installation

-- install the extension
DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;
-- check the extension related functions
\df+

Calculate the distance

We support three operators to calculate the distance between two vectors:

  • <->: squared Euclidean distance
  • <#>: dot product distance
  • <=>: cosine distance

-- call the distance function through operators

-- squared Euclidean distance
SELECT array[1, 2, 3] <-> array[3, 2, 1];
-- dot product distance
SELECT array[1, 2, 3] <#> array[3, 2, 1];
-- cosine distance
SELECT array[1, 2, 3] <=> array[3, 2, 1];
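
For reference, the three quantities can be checked by hand outside the database. The Python sketch below is independent of the extension and its function names are purely illustrative. It assumes that `<#>` corresponds to the plain dot product and that `<=>` is one minus the cosine similarity; this README does not state the sign or normalization conventions, so treat those as assumptions.

```python
import math


def _check_same_dimension(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def squared_l2(a, b):
    """Squared Euclidean distance: sum of squared component differences."""
    _check_same_dimension(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    """Plain dot product (assumption: no sign flip is applied)."""
    _check_same_dimension(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """Assumption: 1 - cosine similarity. Undefined for zero vectors."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    return 1.0 - dot(a, b) / (norm_a * norm_b)


a, b = [1, 2, 3], [3, 2, 1]
print(squared_l2(a, b))       # 8
print(dot(a, b))              # 10
print(cosine_distance(a, b))  # about 0.2857, i.e. 1 - 10/14
```

With the same inputs as the SQL statements above, the squared Euclidean distance is 4 + 0 + 4 = 8 and the dot product is 3 + 4 + 3 = 10.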

Create a table

You can use the CREATE TABLE statement to create a table with a vector column.

-- create table
CREATE TABLE items (id bigserial PRIMARY KEY, emb numeric[]);
-- insert values
INSERT INTO items (emb) VALUES (ARRAY[1,2,3]), (ARRAY[4,5,6]);
-- query the similar embeddings
SELECT * FROM items ORDER BY emb <-> ARRAY[3,2,1]::real[] LIMIT 5;
-- query the neighbors within a certain distance
SELECT * FROM items WHERE emb <-> ARRAY[3,2,1]::real[] < 5;
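
To make the two queries concrete, here is a brute-force Python sketch of what they compute for the two rows inserted above. It is not how the extension executes them. It assumes `<->` is the squared Euclidean distance as listed earlier, and that the rows receive ids 1 and 2 from a fresh bigserial sequence; the helper names `nearest` and `within` are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# id -> emb, mirroring the two inserted rows (ids assumed to start at 1).
items = {1: [1, 2, 3], 2: [4, 5, 6]}
query = [3, 2, 1]


def nearest(items, query, limit):
    """Equivalent of ORDER BY emb <-> query LIMIT limit."""
    scored = sorted((squared_l2(emb, query), item_id) for item_id, emb in items.items())
    return [(item_id, dist) for dist, item_id in scored[:limit]]


def within(items, query, threshold):
    """Equivalent of WHERE emb <-> query < threshold."""
    return [item_id for item_id, emb in items.items() if squared_l2(emb, query) < threshold]


print(nearest(items, query, 5))  # [(1, 8), (2, 35)]
print(within(items, query, 5))   # []
```

Because the distance is squared, the threshold in the range query is also in squared units: a threshold of 5 corresponds to a Euclidean radius of about 2.24, which is why neither row qualifies here.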

Create an index

We are planning to support the following index types (issue here):

  • IVF
  • HNSW
  • ScaNN

Contributions are welcome if you are interested!
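
As background for contributors, the general idea behind an IVF (inverted file) index can be sketched in a few lines of Python. This is a generic toy illustration, not the design planned for pgvecto.rs: the class name `ToyIVF` and the `probes` parameter are hypothetical, and the centroids are fixed by hand here, whereas a real index would typically learn them from the data (for example with k-means).

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyIVF:
    """Toy inverted-file index: each vector is stored in the list of its nearest centroid."""

    def __init__(self, centroids):
        self.centroids = centroids
        self.lists = [[] for _ in centroids]

    def _closest_lists(self, vector, count):
        # Rank list ids by distance from their centroid to the vector.
        order = sorted(
            range(len(self.centroids)),
            key=lambda i: squared_l2(self.centroids[i], vector),
        )
        return order[:count]

    def add(self, item_id, vector):
        (list_id,) = self._closest_lists(vector, 1)
        self.lists[list_id].append((item_id, vector))

    def search(self, query, limit, probes=1):
        # Scan only the `probes` closest lists instead of every vector.
        candidates = []
        for list_id in self._closest_lists(query, probes):
            candidates.extend(self.lists[list_id])
        candidates.sort(key=lambda item: squared_l2(item[1], query))
        return [item_id for item_id, _ in candidates[:limit]]


index = ToyIVF(centroids=[[0, 0, 0], [5, 5, 5]])
index.add(1, [1, 2, 3])
index.add(2, [4, 5, 6])
print(index.search([3, 2, 1], limit=5))            # [1]
print(index.search([3, 2, 1], limit=5, probes=2))  # [1, 2]
```

The first search misses item 2 because it only scans the list nearest to the query. That trade-off between speed and recall is what makes such an index approximate, and scanning more lists recovers the exact answer at higher cost.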

Contributing

We need your help! Please check out the issues.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Alex Chi: 💻
  • Ce Gao: 💼 🖋 📖
  • Jinjing Zhou: 🎨 🤔 📆
  • Keming: 🐛 💻 📖 🤔 🚇
  • odysa: 📖 💻

This project follows the all-contributors specification. Contributions of any kind welcome!

Acknowledgements

Thanks to the following projects:

  • pgrx - Postgres extension framework in Rust
  • pgvector - Postgres extension for vector similarity search written in C
