nanopq's Introduction

nanopq

Nano Product Quantization (nanopq): a vanilla implementation of Product Quantization (PQ) and Optimized Product Quantization (OPQ) written in pure Python without any third-party dependencies.

Installing

You can install the package via pip. This library works with Python 3.5+ on Linux.

pip install nanopq

Example

import nanopq
import numpy as np

N, Nt, D = 10000, 2000, 128
X = np.random.random((N, D)).astype(np.float32)  # 10,000 128-dim vectors to be indexed
Xt = np.random.random((Nt, D)).astype(np.float32)  # 2,000 128-dim vectors for training
query = np.random.random((D,)).astype(np.float32)  # a 128-dim query vector

# Instantiate with M=8 sub-spaces
pq = nanopq.PQ(M=8)

# Train codewords
pq.fit(Xt)

# Encode to PQ-codes
X_code = pq.encode(X)  # (10000, 8) with dtype=np.uint8

# Results: create a distance table online, and compute Asymmetric Distance to each PQ-code 
dists = pq.dtable(query).adist(X_code)  # (10000, ) 
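
To make the last step concrete, here is a minimal NumPy sketch of what an asymmetric distance computation does in principle. It is not nanopq's actual implementation: the helper names `build_dtable` and `adist` are illustrative, and random codewords stand in for trained ones.

```python
import numpy as np

def build_dtable(query, codewords):
    """Squared L2 distance from each query sub-vector to every codeword.

    codewords: (M, Ks, Ds) array; query: (M * Ds,) vector.
    Returns an (M, Ks) table.
    """
    M, Ks, Ds = codewords.shape
    sub_queries = query.reshape(M, 1, Ds)  # split the query into M sub-vectors
    return ((codewords - sub_queries) ** 2).sum(axis=2)

def adist(dtable, codes):
    """Sum the table entries selected by each code: (N, M) codes -> (N,) distances."""
    M = dtable.shape[0]
    return dtable[np.arange(M), codes].sum(axis=1)

rng = np.random.default_rng(0)
M, Ks, Ds = 8, 256, 16
codewords = rng.random((M, Ks, Ds)).astype(np.float32)  # stand-in for trained codewords
codes = rng.integers(0, Ks, size=(1000, M)).astype(np.uint8)
query = rng.random(M * Ds).astype(np.float32)

dists = adist(build_dtable(query, codewords), codes)  # shape (1000,)
nearest = int(np.argmin(dists))  # index of the closest PQ-code
```

The table is built once per query, so each database code costs only M lookups and additions.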

Contributors

  • @Hiroshiba fixed a bug of importlib (#3)
  • @calvinmccarter implemented parametric initialization for OPQ (#14)
  • @de9uch1 extended the interface to faiss so that OPQ can be handled (#19)
  • @mpskex implemented (1) initialization of clustering and (2) dot-product for computation (#24)
  • @lsb fixed a typo (#26)

nanopq's Issues

Turn print statements into logging

Hi!

Thanks for writing this package, it looks great!

I'd be interested in turning the print statements (with verbose=True) into logging statements. The verbose flag could then be used to control whether this logging is output to stdout (i.e., by setting the log level). Is this something you are interested in? If so, I could submit a PR.
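
A minimal sketch of what this proposal could look like, using only the standard library. The logger name `pq_sketch` and the function `fit_sketch` are hypothetical and do not exist in nanopq; the loop body is a placeholder for training.

```python
import logging
import sys

logger = logging.getLogger("pq_sketch")  # hypothetical logger name

def fit_sketch(n_iter, verbose=True):
    """Hypothetical training loop that reports progress through logging."""
    # verbose only selects the level; handlers decide where messages go
    logger.setLevel(logging.INFO if verbose else logging.WARNING)
    for i in range(n_iter):
        logger.info("iter: %d / %d", i + 1, n_iter)  # replaces print(...)
    return n_iter

if __name__ == "__main__":
    # The application, not the library, routes log output to stdout
    logging.basicConfig(stream=sys.stdout, format="%(message)s")
    fit_sketch(3, verbose=True)   # progress lines are emitted
    fit_sketch(3, verbose=False)  # progress lines are suppressed
```

Setting the level inside the function mirrors the proposal above. A library could instead leave the level untouched and let callers configure it.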

How to compute distance between PQ codes?

Not sure if this should be a feature request.

Suppose I just want to approximate the distance between two PQ codes (under the same encoder, of course). What is the most efficient way to perform such an operation?
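
One common approach, sketched here under assumptions, is a symmetric distance computation: precompute the codeword-to-codeword distances inside each sub-space once, then sum M table lookups per pair of codes. This is not a nanopq API. The helpers `codeword_distance_tables` and `sdist` are illustrative, and the random `codewords` array stands in for a trained (M, Ks, Ds) codebook.

```python
import numpy as np

def codeword_distance_tables(codewords):
    """Pairwise squared L2 distances between codewords within each sub-space.

    codewords: (M, Ks, Ds) -> tables: (M, Ks, Ks)
    """
    diff = codewords[:, :, None, :] - codewords[:, None, :, :]
    return (diff ** 2).sum(axis=3)

def sdist(tables, code_a, codes_b):
    """Approximate squared distance from one code (M,) to many codes (N, M)."""
    M = tables.shape[0]
    return tables[np.arange(M), code_a, codes_b].sum(axis=1)

rng = np.random.default_rng(0)
M, Ks, Ds = 8, 64, 16
codewords = rng.random((M, Ks, Ds)).astype(np.float32)  # stand-in for trained codewords
codes = rng.integers(0, Ks, size=(1000, M)).astype(np.uint8)

tables = codeword_distance_tables(codewords)  # computed once, reused for every pair
d = sdist(tables, codes[0], codes[1:])        # code 0 against the other 999 codes
```

The result equals the squared distance between the two reconstructed vectors, so it carries the quantization error of both sides.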

Add `shubham0204/pq.rs`, a Rust implementation of `pq.py`, as a community resource in `README.md`

I wanted to learn how product quantization works, and this repository provided excellent code to understand how it works. As I had been learning Rust for a few months now, I decided to re-write the pq.py script in Rust to understand each step thoroughly by self-implementation. Here's the repository containing the Rust code: shubham0204/pq.rs.

The following steps still have to be taken to complete the project:

  1. Complete README.md and add a small usage sample of the Rust API
  2. Prepare a crate and upload it to crates.io

Do let me know if the repository can be included as a community resource. Like me, many other learners would like to study implementations of product quantization in languages other than Python, and a section listing implementations in other languages would be of great help. I'm also working on a detailed blog post that will explain product quantization from first principles, with a Rust implementation.

Centroid of Centroids using NanoPQ

I am looking into computing a centroid of centroids using NanoPQ. Is that possible? I have a first-level nanopq model with M=4, K=16, D=24. The codewords it produces have shape (4, 16, 6). Can this output be used as the input to a second-level nanoPQ to compute a centroid of centroids? I am investigating this in order to process large datasets and reduce processing time.

about reconstructed

Thanks for your work.

```python
import nanopq
import numpy as np

N, Nt, D = 10000, 2000, 128
X = np.random.random((N, D)).astype(np.float32)  # 10,000 128-dim vectors to be indexed
Xt = np.random.random((Nt, D)).astype(np.float32)  # 2,000 128-dim vectors for training
query = np.random.random((D,)).astype(np.float32)  # a 128-dim query vector

pq = nanopq.PQ(M=8, Ks=256)
pq.fit(Xt, seed=123)
X_code = pq.encode(X)  # (10000, 8) with dtype=np.uint8
X_reconstructed = pq.decode(codes=X_code)

# L2 distance between the first vector and its reconstruction
tmp = X[0]
tmp1 = X_reconstructed[0]
dis = np.sqrt(np.sum(np.square(tmp - tmp1)))
```

`dis` is about 2.0+. Does that look right?
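
One hedged way to judge such a number is to compare it with a trivial baseline: reconstructing every vector as the data mean. For i.i.d. uniform data in [0, 1), each coordinate has variance 1/12, so the distance to the mean is roughly sqrt(D / 12), about 3.27 for D = 128. The sketch below checks this with NumPy alone. It does not call nanopq, and reading the result as a sanity bound is an assumption rather than a statement from the maintainer.

```python
import numpy as np

rng = np.random.default_rng(123)
N, D = 10000, 128
X = rng.random((N, D)).astype(np.float32)

mean = X.mean(axis=0)
# Error if every vector were decoded to the mean vector
baseline = np.sqrt(((X - mean) ** 2).sum(axis=1))
# Uniform [0, 1) has variance 1/12 per coordinate
theory = np.sqrt(D / 12.0)

print(f"mean baseline error: {baseline.mean():.2f}, theory: {theory:.2f}")
```

A reconstruction error below this baseline suggests the quantizer is doing something useful. Unstructured random vectors are hard to compress, so a large gap should not be expected on this kind of data.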

Why does parametric init perform worse than the non-parametric one?

Hi, I have two questions.

  1. Why does parametric init perform worse than the non-parametric one according to your unit test `test_parametric_init`? This is inconsistent with the conclusion of the paper "Optimized Product Quantization for Approximate Nearest Neighbor Search" by Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun.

```python
def test_parametric_init(self):
    N, D, M, Ks = 100, 12, 4, 10
    X = np.random.random((N, D)).astype(np.float32)
    opq = nanopq.OPQ(M=M, Ks=Ks)
    opq.fit(X, parametric_init=False, rotation_iter=1)
    err_init = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

    opq = nanopq.OPQ(M=M, Ks=Ks)
    opq.fit(X, parametric_init=True, rotation_iter=1)
    err = np.linalg.norm(opq.rotate(X) - opq.decode(opq.encode(X)))

    self.assertLess(err_init, err)
```

  2. The norm computation should not need to rotate X, because decode already rotates the codes back to the original space at line 255 of opq.py:

self.pq.decode(codes) @ self.R.T
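
The second point can be illustrated with a small NumPy sketch, assuming the rotation matrix R is orthogonal. This is not nanopq code: `R` is a random orthogonal matrix, and `Y` is a stand-in for vectors decoded in the rotated space. The error is the same in either space as long as both sides of the subtraction live in the same space, while mixing the two spaces gives a different number.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 12
X = rng.random((N, D))
R, _ = np.linalg.qr(rng.standard_normal((D, D)))  # random orthogonal matrix
Y = X @ R + 0.01 * rng.standard_normal((N, D))    # stand-in for decoded vectors, rotated space

err_rotated = np.linalg.norm(X @ R - Y)      # both sides in the rotated space
err_original = np.linalg.norm(X - Y @ R.T)   # both sides in the original space
err_mixed = np.linalg.norm(X @ R - Y @ R.T)  # rotated input vs. original-space output

print(err_rotated, err_original, err_mixed)
```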

Typo

I think this should be 8 bits, not 256. Otherwise the package is very helpful, thanks! The text in question:

into 256 bits = 1 byte = uint8)
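
The arithmetic behind this correction, as a short sketch (the variable names are illustrative): 256 is the number of codewords per sub-space, and the index of one codeword takes 8 bits.

```python
import math

Ks = 256                             # number of codewords per sub-space
bits_per_code = int(math.log2(Ks))   # 8 bits identify one of 256 codewords
bytes_per_code = bits_per_code // 8  # 1 byte, i.e. one uint8
max_index = 2 ** bits_per_code - 1   # 255, the largest codeword index
```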
