tensorchord / pgvecto.rs

Scalable, Low-latency and Hybrid-enabled Vector Search in Postgres. Revolutionize Vector Search, not Database.

Home Page: https://docs.pgvecto.rs/getting-started/overview.html

License: Apache License 2.0

Rust 79.21% Shell 1.57% Dockerfile 0.13% Python 5.70% C 0.75% PLpgSQL 12.65%
llm vector vector-database faiss nearest-neighbor-search gpt chatgpt hacktoberfest rust postgres

pgvecto.rs's Introduction

pgvecto.rs

pgvecto.rs is a Postgres extension that provides vector similarity search functions. It is written in Rust and based on pgrx. Read more at 📝 our blog.

Why use pgvecto.rs

| Feature Category | Feature | Description |
| --- | --- | --- |
| Search Capabilities | 🔍 Vector Search | Ultra-low-latency, high-precision vector search. |
| | 🧩 Sparse Vector Search | Keyword-based vector search using SPLADE or BM25 algorithms. |
| | 📄 Full-Text Search | Comprehensive text search across any language, powered by tsvector. |
| Data Handling | ✔ Complete SQL Support | Full SQL support, enabling joins and filters without limitations or extra configuration. |
| | 🔗 Async indexing | Non-blocking inserts with up-to-date query readiness. |
| | 🔄 Easy Data Management | No need for syncing vectors and metadata with external vector DB, simplifying development. |
| Data Types | 🔢 FP16/INT8 Data type | Supports FP16 and INT8 data types for improved storage and computational efficiency. |
| | 🌓 Binary vector support | Vector indexing with binary vectors, and Jaccard distance support. |
| | 🔪 Matryoshka embeddings | Subvector indexing, like vector[0:256], for enhanced Matryoshka embeddings. |
| | ⬆️ Extended Vector Length | Vector lengths up to 65535 supported, ideal for the latest cutting-edge models. |
| System Performance | 🚀 Production Ready | Battle-tested database ecosystem integrated with PostgreSQL. |
| | ⚙️ High Availability | Logical replication support to ensure high availability. |
| | 💡 Resource Efficient | Efficient attribute storage leveraging PostgreSQL. |
| Security & Permissions | 🔒 Permission Control | Easy access control like read-only roles, powered by PostgreSQL. |

Quick start

For new users, we recommend using the Docker image to get started quickly.

docker run \
  --name pgvecto-rs-demo \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -d tensorchord/pgvecto-rs:pg16-v0.2.1

Then you can connect to the database using the psql command line tool. The default username is postgres, and the default password is mysecretpassword.

psql -h localhost -p 5432 -U postgres

Run the following SQL to ensure the extension is enabled.

DROP EXTENSION IF EXISTS vectors;
CREATE EXTENSION vectors;

pgvecto.rs introduces a new data type vector(n) denoting an n-dimensional vector. The n within the brackets signifies the dimensions of the vector.

You could create a table with the following SQL.

-- create table with a vector column

CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vector(3) NOT NULL -- 3 dimensions
);

Tip

vector(n) is a valid data type only if $1 \leq n \leq 65535$. Due to limits of PostgreSQL, it's possible to create a value of type vector(3) with $5$ dimensions, and vector (without a dimension) is also a valid data type. However, you still cannot put $0$ scalars or more than $65535$ scalars into a vector. If you use vector for a column, or if some values do not match the dimension declared by the column, you won't be able to create an index on it.
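The rules in this tip can be restated as two small checks. The following is a minimal Python sketch for illustration only: `is_valid_vector` and `is_indexable` are hypothetical helper names, not functions provided by pgvecto.rs, and the real checks happen inside the extension.

```python
MAX_DIMS = 65535  # upper bound on vector length stated in the tip


def is_valid_vector(values):
    """A vector value must hold between 1 and 65535 scalars."""
    return 1 <= len(values) <= MAX_DIMS


def is_indexable(values, declared_dims):
    """A value can be indexed only if the column declares a dimension
    (vector(n), not plain vector) and the value matches it.

    declared_dims=None models a column declared as plain `vector`.
    """
    if declared_dims is None:
        return False
    return is_valid_vector(values) and len(values) == declared_dims


print(is_valid_vector([1.0, 2.0, 3.0]))                   # True
print(is_valid_vector([]))                                # False
print(is_indexable([1.0, 2.0, 3.0, 4.0, 5.0], 3))         # False: 5 values in vector(3)
print(is_indexable([1.0, 2.0, 3.0], None))                # False: plain vector column
```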

You can then populate the table with vector data as follows.

-- insert values

INSERT INTO items (embedding)
VALUES ('[1,2,3]'), ('[4,5,6]');

-- or insert values using a cast from array to vector

INSERT INTO items (embedding)
VALUES (ARRAY[1, 2, 3]::real[]), (ARRAY[4, 5, 6]::real[]);

We support three operators to calculate the distance between two vectors.

  • <->: squared Euclidean distance, defined as $\Sigma (x_i - y_i) ^ 2$.
  • <#>: negative dot product, defined as $- \Sigma x_iy_i$.
  • <=>: cosine distance, defined as $1 - \frac{\Sigma x_iy_i}{\sqrt{\Sigma x_i^2 \Sigma y_i^2}}$.
-- call the distance function through operators

-- squared Euclidean distance
SELECT '[1, 2, 3]'::vector <-> '[3, 2, 1]'::vector;
-- negative dot product
SELECT '[1, 2, 3]'::vector <#> '[3, 2, 1]'::vector;
-- cosine distance
SELECT '[1, 2, 3]'::vector <=> '[3, 2, 1]'::vector;
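To make the three definitions concrete, here is a small Python sketch that evaluates them for the same pair of vectors used in the queries above. It is plain Python written for illustration, not the extension's implementation, and it uses double-precision arithmetic, so the last digits may differ from what the database returns.

```python
import math


def squared_l2(x, y):
    # Sum of squared component differences (the <-> operator's definition)
    return sum((a - b) ** 2 for a, b in zip(x, y))


def negative_dot(x, y):
    # Negated dot product (the <#> operator's definition)
    return -sum(a * b for a, b in zip(x, y))


def cosine_distance(x, y):
    # One minus cosine similarity (the <=> operator's definition)
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1 - dot / norm


x, y = [1, 2, 3], [3, 2, 1]
print(squared_l2(x, y))       # 8
print(negative_dot(x, y))     # -10
print(cosine_distance(x, y))  # about 0.2857, i.e. 1 - 10/14
```

All three are "smaller is closer" measures, which is why the dot product is negated: ordering by any of them in ascending order returns the most similar rows first.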

You can search for a vector simply like this.

-- query the similar embeddings
SELECT * FROM items ORDER BY embedding <-> '[3,2,1]' LIMIT 5;

Half-precision floating-point

The vecf16 type is the same as vector in everything except the scalar type. It stores 16-bit floating-point numbers. If you want to reduce memory usage to get better performance, you can try replacing the vector type with the vecf16 type.
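The trade-off can be illustrated with Python's standard `struct` module, whose `e` format code is IEEE 754 half precision. This sketch only compares scalar widths and rounding; it assumes nothing about how vecf16 is laid out on disk or in memory, and the helper names are hypothetical.

```python
import struct


def raw_scalar_bytes(dims, fmt):
    # Bytes needed for the scalars alone: "f" is 32-bit, "e" is 16-bit
    return dims * struct.calcsize("<" + fmt)


def round_trip_f16(value):
    # Encode a Python float as half precision and decode it again
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(raw_scalar_bytes(3, "f"))  # 12
print(raw_scalar_bytes(3, "e"))  # 6
print(round_trip_f16(0.5))       # 0.5, exactly representable
print(round_trip_f16(0.1))       # close to 0.1, but not equal to it
```

Halving the scalar width halves the raw storage, at the cost of rounding values that half precision cannot represent exactly.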

Roadmap 🗂️

Please check out ROADMAP. Want to jump in? Welcome discussions and contributions!

Contribute 😊

We welcome all kinds of contributions from the open-source community, individuals, and partners.

Contributors ✨

Thanks go to these wonderful people (emoji key):

  • Alex Chi 💻
  • AuruTus 💻
  • Avery 💻 🤔
  • Ben Ye 📖
  • Ce Gao 💼 🖋 📖
  • Jinjing Zhou 🎨 🤔 📆
  • Joe Passanante 💻
  • Keming 🐛 💻 📖 🤔 🚇
  • Mingzhuo Yin 💻 ⚠️ 🚇
  • Usamoi 💻 🤔
  • cutecutecat 💻
  • odysa 📖 💻
  • yi wang 💻
  • yihong 💻
  • 盐粒 Yanli 💻

This project follows the all-contributors specification. Contributions of any kind are welcome!

Acknowledgements

Thanks to the following projects:

  • pgrx - Postgres extension framework in Rust
  • pgvector - Postgres extension for vector similarity search written in C

pgvecto.rs's Issues

Diesel Support

Will support for Diesel be added? It is easier to use Diesel to generate SQL queries.

Clarification about pre-filtering performance

Hi! I'm looking at both pgvecto.rs and pgvector, and I think pre-filtering support is a killer feature that makes this project appealing.

But the README seems to recommend against using it in the filtering section. Instead, it suggests applying post-filtering for better optimization.

Could you clarify what optimization issues there are with pre-filtering? Relatedly, I think it would be helpful to add a performance comparison between pre-filtering and post-filtering.

bug: creating index with ivf will incorrectly close the socket.

Bug Report

1. Minimal reproduce step

NOTE: The SQL statements below are all taken from the Indexing section of pgvecto.rs/README.md.

  • Create a table with SQL:
CREATE TABLE items (
  id bigserial PRIMARY KEY,
  embedding vector(3) NOT NULL
);
  • Then create an ivf index with SQL:
CREATE INDEX ivf_index ON items USING vectors (embedding l2_ops)
WITH (options = $$
capacity = 2097152
[vectors]
memmap = "ram"
[algorithm.ivf]
memmap = "ram"
nlist = 1000
nprobe = 10
$$);

2. What did you expect to see? (Required)

The ivf index is created successfully and is visible when showing table info with \dt.

3. What did you see instead (Required)

The extension panicked with:

ERROR:  called `Result::unwrap()` on an `Err` value: Closed
Stack trace from the debug build:
DETAIL:  
   0: pgrx_pg_sys::submodules::panic::register_pg_guard_panic_hook::{{closure}}::{{closure}}
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:312:39
   1: std::thread::local::LocalKey<T>::try_with
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/thread/local.rs:270:16
   2: std::thread::local::LocalKey<T>::with
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/thread/local.rs:246:9
   3: pgrx_pg_sys::submodules::panic::register_pg_guard_panic_hook::{{closure}}
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:309:9
   4: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/alloc/src/boxed.rs:2021:9
   5: std::panicking::rust_panic_with_hook
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:733:13
   6: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:621:13
   7: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/sys_common/backtrace.rs:170:18
   8: rust_begin_unwind
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:617:5
   9: core::panicking::panic_fmt
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/core/src/panicking.rs:67:14
  10: core::result::unwrap_failed
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/core/src/result.rs:1652:5
  11: core::result::Result<T,E>::unwrap
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/core/src/result.rs:1077:23
  12: vectors::postgres::index_build::build::{{closure}}
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/index_build.rs:33:40
  13: vectors::postgres::hook_transaction::client
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/hook_transaction.rs:51:18
  14: vectors::postgres::index_build::build
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/index_build.rs:19:5
  15: vectors::postgres::index::ambuild::ambuild_inner
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/index.rs:177:5
  16: vectors::postgres::index::ambuild::{{closure}}
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/index.rs:171:1
  17: std::panicking::try::do_call
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:524:40
  18: __rust_try
  19: std::panicking::try
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:488:19
  20: std::panic::catch_unwind
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panic.rs:142:14
  21: pgrx_pg_sys::submodules::panic::run_guarded
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:408:11
  22: pgrx_pg_sys::submodules::panic::pgrx_extern_c_guard
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:385:11
  23: vectors::postgres::index::ambuild
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/index.rs:172:5
  24: index_build
  25: index_create
  26: DefineIndex
  27: <unknown>
  28: standard_ProcessUtility
  29: pgrx_pg_sys::pg15::standard_ProcessUtility::{{closure}}
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/target/debug/build/pgrx-pg-sys-da51dafa206a5da2/out/pg15.rs:50457:1
  30: pgrx_pg_sys::submodules::ffi::pg_guard_ffi_boundary_impl
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/ffi.rs:120:26
  31: pgrx_pg_sys::submodules::ffi::pg_guard_ffi_boundary
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/ffi.rs:94:14
  32: pgrx_pg_sys::pg15::standard_ProcessUtility
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/target/debug/build/pgrx-pg-sys-da51dafa206a5da2/out/pg15.rs:50457:1
  33: pgrx::hooks::pgrx_standard_process_utility_wrapper::pgrx_standard_process_utility_wrapper_inner
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:710:5
  34: pgrx::hooks::pgrx_standard_process_utility_wrapper::{{closure}}
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:699:1
  35: std::panicking::try::do_call
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:524:40
  36: __rust_try
  37: std::panicking::try
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:488:19
  38: std::panic::catch_unwind
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panic.rs:142:14
  39: pgrx_pg_sys::submodules::panic::run_guarded
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:408:11
  40: pgrx_pg_sys::submodules::panic::pgrx_extern_c_guard
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:385:11
  41: pgrx::hooks::pgrx_standard_process_utility_wrapper
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:700:1
  42: pgrx::hooks::pgrx_process_utility::pgrx_process_utility_inner::prev
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:459:13
  43: <vectors::postgres::hooks::Hooks as pgrx::hooks::PgHooks>::process_utility_hook
             at /home/aurutus/workspace/RUST_DOJO/pgvecto.rs/src/postgres/hooks.rs:48:13
  44: pgrx::hooks::pgrx_process_utility::pgrx_process_utility_inner
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:473:5
  45: pgrx::hooks::pgrx_process_utility::{{closure}}
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:437:1
  46: std::panicking::try::do_call
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:524:40
  47: __rust_try
  48: std::panicking::try
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panicking.rs:488:19
  49: std::panic::catch_unwind
             at /rustc/8131b9774ebcb6c162fcac71545a13543ec369e7/library/std/src/panic.rs:142:14
  50: pgrx_pg_sys::submodules::panic::run_guarded
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:408:11
  51: pgrx_pg_sys::submodules::panic::pgrx_extern_c_guard
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx-pg-sys/src/submodules/panic.rs:385:11
  52: pgrx::hooks::pgrx_process_utility
             at /home/aurutus/.cargo/git/checkouts/pgrx-82d5ad52313f9598/c0d11a8/pgrx/src/hooks.rs:438:1
  53: <unknown>
  54: <unknown>
  55: PortalRun
  56: <unknown>
  57: PostgresMain
  58: <unknown>
  59: PostmasterMain
  60: main
  61: __libc_start_call_main
             at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  62: __libc_start_main_impl
             at ./csu/../csu/libc-start.c:392:3
  63: _start

This panic is caused by the code at pgvecto.rs/src/postgres/index_build.rs:33:40 in the function build:

...
    let BuildHandle::Leave { x } = build_handler.handle().unwrap() else {
                panic!("Invaild state.")
            };
...

and the unexpected Err comes from src/ipc/transport.rs:94 in the method client_send:

...
        let stream = self.stream.as_mut().ok_or(ClientIpcError::Closed)?;
...

I don't know why the connection is closed prematurely. 🤔

4. What is your pgvecto.rs version? (Required)

The version built from the main branch.

And the pgrx version:

cargo-pgrx 0.10.0-beta.1

The postgresql version:

psql (PostgreSQL) 15.4 (Ubuntu 15.4-1.pgdg22.04+1)

Evaluate Profile-Guided Optimization (PGO) and LLVM BOLT

Hi!

Profile-Guided Optimization (PGO) has improved performance in many different kinds of software; you can check the examples here. That list includes many databases, such as PostgreSQL and ClickHouse.

We need to test PGO on pgvecto.rs and, if it improves performance, at least add a note about it to the README.

There are several additional options. I'd appreciate it if you could provide an easy way to build pgvecto.rs with PGO, so that experienced users can do it themselves for their own usage scenarios. Another option is to optimize the pgvecto.rs build with a generic-enough profile. Providing PGO-optimized binaries could be trickier (since it requires preparing a good-enough profile), but it would be great to see as an option too. Completing task #21 can probably help with it.

As an additional optimization, I suggest taking a look at LLVM BOLT. In my experience, though, it is better to start with PGO and then try BOLT.

Treat this issue as an idea for future improvements.

For Rust projects, I recommend starting with cargo-pgo.

Postgres vector extension comparison table

Y'all have done some awesome work. Thank you!

I would like to ask for a table or section comparing pgvector, pg_embedded, and pgvecto.rs. It would be really helpful, especially for those new to the subject.

docs: Capacity not documented in README

capacity is a required argument for index creation, but the examples don't include it (similarly for vectors_load). I'm also not sure what the expected behavior is w.r.t. RAM usage. Setting a capacity seems to use that many vectors' worth of RAM, and RAM usage doesn't seem to be affected by quantization. Documenting how it works would be helpful.

Edit: Upon further testing, quantization does affect RAM usage. The quantization just doesn't happen immediately; it's done after index creation.
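For a rough sense of scale, the observation above (memory proportional to capacity, before quantization kicks in) can be turned into back-of-the-envelope arithmetic. This Python sketch assumes the reserved space is simply capacity × dimensions × scalar size; that formula is an assumption made for illustration, not documented pgvecto.rs behavior, and it ignores any index overhead. The capacity value 2097152 and the 3-dimensional column are taken from examples elsewhere in these issues.

```python
def raw_vector_bytes(capacity, dims, bytes_per_scalar=4):
    # Assumed model: every slot up to `capacity` holds `dims` scalars.
    # 4 bytes per scalar corresponds to 32-bit floats.
    return capacity * dims * bytes_per_scalar


estimate = raw_vector_bytes(capacity=2097152, dims=3)
print(estimate)          # 25165824 bytes
print(estimate / 2**20)  # 24.0 MiB
```

Under the same assumption, a 768-dimensional column at that capacity would need about 6 GiB for the raw scalars alone, which is why an undocumented required capacity can be surprising.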

bug: ERROR: operator is not unique: unknown <-> unknown

❯ psql -h localhost -U postgres
Password for user postgres: 
psql (15.4)
Type "help" for help.

postgres=# SELECT '[1, 2, 3]' <-> '[3, 2, 1]';
ERROR:  operator is not unique: unknown <-> unknown
LINE 1: SELECT '[1, 2, 3]' <-> '[3, 2, 1]';
                           ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

Binary vectors and Index

Hello,

I want to first thank you for the excellent job you're doing on this extension! pgvector is a great project, but it falls short in performance and extensibility, and this project is really solving those issues.

As an ML engineer, I often need to store binary vectors (generated from a perceptual hash, for example). The current workaround I found is to store the binary vector as an f32 vector of {0,1} values representing the binary data. L2 distance is used because it mirrors the Hamming distance. Unfortunately, this solution consumes unnecessary space, and computing L2 distance on 256-dimensional floats is far more expensive than an xor+popcount on a 256-bit vector.

Looking at the current code, I see that the core type is a pub Scalar, which is a wrapper type around an f32. All computation assumes a &[Scalar]. How difficult would it be to support a &[u8] that maps to the Postgres bytea data type? I would love to contribute to this feature if it is something worth working on 😃! Thanks
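The equivalence this request relies on is easy to check: for vectors whose components are all 0 or 1, each differing position contributes exactly 1 to the squared L2 distance, so it equals the Hamming distance. A minimal Python sketch, using 8 bits instead of 256 for readability; `pack_bits` and `hamming` are hypothetical helper names, not part of the extension.

```python
def squared_l2(x, y):
    # Squared Euclidean distance on float components
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pack_bits(bits):
    # Pack a list of 0/1 values into one integer, first bit most significant
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def hamming(a, b):
    # XOR marks the differing bit positions; counting the ones is the popcount
    return bin(a ^ b).count("1")


x_bits = [1, 0, 1, 1, 0, 0, 1, 0]
y_bits = [1, 1, 0, 1, 0, 1, 1, 0]

as_floats = squared_l2([float(b) for b in x_bits], [float(b) for b in y_bits])
as_packed = hamming(pack_bits(x_bits), pack_bits(y_bits))
print(as_floats, as_packed)  # 3.0 3
```

The packed form needs one bit per dimension instead of 32, and the distance is a single XOR plus a popcount per machine word.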

bug: index is disabled when order operator destination is an array

Reproduce:

CREATE TABLE t (val vector(3));
INSERT INTO t (val) SELECT ARRAY[random(), random(), random()]::real[] FROM generate_series(1, 1000);
CREATE INDEX ON t USING vectors (val l2_ops)
WITH (options = $$
capacity = 2000
[algorithm.hnsw]
$$);

EXPLAIN SELECT 1 FROM t ORDER BY val <-> '[0.5,0.5,0.5]' limit 10;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Limit  (cost=0.00..0.22 rows=10 width=8)
   ->  Index Scan using t_val_idx on t  (cost=0.00..22.51 rows=1000 width=8)
         Order By: (val <-> '[0.5, 0.5, 0.5]'::vector)

EXPLAIN SELECT 1 FROM t ORDER BY val <-> '{0.5,0.5,0.5}'::real[] limit 10;
                           QUERY PLAN                            
-----------------------------------------------------------------
 Limit  (cost=43.65..43.67 rows=10 width=8)
   ->  Sort  (cost=43.65..46.15 rows=1001 width=8)
         Sort Key: ((val <-> ('{0.5,0.5,0.5}'::real[])::vector))
         ->  Seq Scan on t  (cost=0.00..22.01 rows=1000 width=8)

UPDATE SQL question

When I run:
update items set "embedding"='[0,0,0]' where id = 1;

Result:
ERROR: called Result::unwrap() on an Err value: Closed

Is my SQL statement incorrect?

bug: Can not create `ivf` and `vamana` index

postgres=# create table tb_test_item (id bigserial, embedding vector(3));
CREATE TABLE
postgres=# select * from tb_test_item;
 id | embedding 
----+-----------
(0 rows)

postgres=# CREATE INDEX ivf ON tb_test_item USING vectors (embedding l2_ops) WITH (options = $$capacity = 2097152

[algorithm.ivf]
$$);
ERROR:  called `Result::unwrap()` on an `Err` value: Closed

postgres=# create table tb_test_item (id bigserial, embedding vector(3));
CREATE TABLE
postgres=# select * from tb_test_item;
 id | embedding 
----+-----------
(0 rows)

postgres=# CREATE INDEX vamana ON tb_test_item USING vectors (embedding l2_ops) WITH (options = $$capacity = 2097152

[algorithm.vamana]
$$);
ERROR:  called `Result::unwrap()` on an `Err` value: Closed

consider buying pgvecto.rs domain name

It appears to be available as of Sept 29, 2023.

➜ ~ whois pgvecto.rs
% IANA WHOIS server
% for more information on IANA, visit http://www.iana.org
% This query returned 1 object

refer: whois.rnids.rs

domain: RS

organisation: Serbian National Internet Domain Registry (RNIDS)
address: Zorza Klemansoa 18a
address: Belgrade 11008
address: Serbia

contact: administrative
name: RNIDS Admin Contact
organisation: Serbian National Internet Domain Registry (RNIDS)
address: Zorza Klemansoa 18a
address: Belgrade 11008
address: Serbia
phone: +381 11 7281281
fax-no: +381 11 7281282
e-mail: [email protected]

contact: technical
name: RNIDS Tech Contact
organisation: Serbian National Internet Domain Registry (RNIDS)
address: Zorza Klemansoa 18a
address: Belgrade 11008
address: Serbia
phone: +381 11 7281281
fax-no: +381 11 7281282
e-mail: [email protected]

nserver: A.NIC.RS 2001:67c:69c:0:0:0:0:59 91.199.17.59
nserver: B.NIC.RS 195.178.32.2 2a00:e90:0:3:0:0:0:3
nserver: F.NIC.RS 2001:500:14:6032:ad:0:0:1 204.61.216.32
nserver: G.NIC.RS 147.91.8.6
nserver: H.NIC.RS 2001:67c:69c:0:0:0:0:60 91.199.17.60
nserver: L.NIC.RS 194.146.106.114 2001:67c:1010:29:0:0:0:53
ds-rdata: 57382 13 2 9aca8316c4ce272097297cf5700e8a66e00ad0c83c4165bfcc90659438dc1794

whois: whois.rnids.rs

status: ACTIVE
remarks: Registration information: http://www.rnids.rs

created: 2007-09-24
changed: 2021-07-20
source: IANA

whois.rnids.rs

%ERROR:103: Domain is not registered%

No, I don't work for the domain company, but wouldn't it be a shame if some random blue jeans company bought it?

feat: Release Docker images for other major versions of Postgres

It would make adoption easier if images were available for more environments in the wild. Our use-case is a self-hosted photo library where we're considering switching to pgvecto.rs. Since we currently use Postgres 14, it would be convenient if we could switch without maintaining our own Postgres images for amd64 and arm64.

bug: Silent failed when capacity in hnsw less than actual rows

If capacity is smaller than the number of rows, the background process appears to fail silently.

Error logs:

2023-09-12 18:31:17.950 UTC [284] ERROR:  Connection reset by peer (os error 104)
2023-09-12 18:31:17.950 UTC [284] STATEMENT:  
            CREATE INDEX ON train USING vectors (embedding cosine_ops)
            WITH (options = $$
            capacity = 2097152
            [vectors]
            memmap = "disk"
            [algorithm.hnsw]
            memmap = "disk"
            m = 16
            ef = 40
            $$);

docs: fix README create table

https://github.com/tensorchord/pgvecto.rs#create-a-table

I followed the README, and an error was thrown when I tried to query the table:

vectors=# SELECT * FROM items WHERE emb <-> ARRAY[3,2,1] < 5;
ERROR:  operator is not unique: numeric[] <-> integer[]
LINE 1: SELECT * FROM items WHERE emb <-> ARRAY[3,2,1] < 5;
                                      ^
HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.

This is because there are 3 available <-> operators:

 public     | <->  | double precision[] | double precision[] | double precision | 
 public     | <->  | integer[]          | integer[]          | double precision | 
 public     | <->  | real[]             | real[]             | real             | 

To fix this, we must add an explicit type cast, for example:

SELECT * FROM items ORDER BY emb <-> ARRAY[3,2,1]::real[] LIMIT 5;

bug: stuck in a certain case

postgres=# drop table test;
DROP TABLE
postgres=# create table test (embedding vector(3) not null);
CREATE TABLE
postgres=# CREATE INDEX ON test USING vectors (embedding l2_ops)
WITH (options = "capacity = 2097152");
CREATE INDEX
postgres=# insert into test (embedding) values (Array[1,2]::real[]);
ERROR:  called `Result::unwrap()` on an `Err` value: Closed
postgres=# insert into test (embedding) values (Array[1,2,3]::real[]);
<< stuck here >>

I built the index on an empty table, then tried inserting a bad vector (2 dimensions into a vector(3) column), then inserted a normal vector. The second insert gets stuck.

Potential solutions:

  • Check the dimension when inserting a new vector, to prevent unexpected behavior.
  • Handle the Err to release locks.
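Both suggestions can be shown together in a toy model. The Python sketch below is not how pgvecto.rs is structured (the extension's locking and IPC are not described here); `ToyIndex` is a hypothetical class that only illustrates the pattern of validating the dimension up front and releasing the lock on the error path so that later inserts are not blocked.

```python
import threading


class ToyIndex:
    """Hypothetical in-memory stand-in for an index guarded by a lock."""

    def __init__(self, dims):
        self.dims = dims
        self.rows = []
        self._lock = threading.Lock()

    def insert(self, vector):
        # `with` releases the lock even when the check below raises
        with self._lock:
            if len(vector) != self.dims:
                raise ValueError(
                    f"expected {self.dims} dimensions, got {len(vector)}"
                )
            self.rows.append(list(vector))


index = ToyIndex(dims=3)
try:
    index.insert([1.0, 2.0])       # bad vector: rejected with a clear error
except ValueError as err:
    print("rejected:", err)

index.insert([1.0, 2.0, 3.0])      # the next insert proceeds normally
print(len(index.rows))             # 1
```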

feat: Support ARM

Unable to find image 'tensorchord/pgvecto-rs:latest' locally
latest: Pulling from tensorchord/pgvecto-rs
docker: no matching manifest for linux/arm64/v8 in the manifest list entries.
See 'docker run --help'.

Not sure if it is possible to support Apple M1.
