Code Monkey home page Code Monkey logo

product-quantization's Introduction

product-quantization

A general framework of vector quantization with python.

NEQ, AAAI 2020, Oral

Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.

  • Abstract

    Vector quantization (VQ) techniques are widely used in similarity search for data compression, fast metric computation and etc. Originally designed for Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or implicitly minimize the quantization error. In this paper, we present a new angle to analyze the quantization error, which decomposes the quantization error into norm error and direction error. We show that quantization errors in norm have much higher influence on inner products than quantization errors in direction, and small quantization error does not necessarily lead to good performance in maximum inner product search (MIPS). Based on this observation, we propose norm-explicit quantization (NEQ) --- a general paradigm that improves existing VQ techniques for MIPS. NEQ quantizes the norms of items in a dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the direction vectors, NEQ can simply reuse an existing VQ technique to quantize them without modification. We conducted extensive experiments on a variety of datasets and parameter configurations. The experimental results show that NEQ improves the performance of various VQ techniques for MIPS, including PQ, OPQ, RQ and AQ.

Datasets

The netflix dataset is contained in this repository, you can download more datasets from here, then you can calculate the ground truth with the script

python run_ground_truth.py  --dataset netflix --topk 50 --metric product

Run examples

python run_pq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
python run_opq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
python run_rq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
python run_aq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256 # very slow

Reproduce results of NEQ

python run_norm_pq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
python run_norm_opq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
python run_norm_rq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
python run_norm_aq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256 # very slow

Reference

If you use this code, please cite the following paper

@article{xinyandai,
  title={Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search},
  author={Dai, Xinyan and Yan, Xiao and Ng, Kelvin KW and Liu, Jie and Cheng, James},
  journal={arXiv preprint arXiv:1911.04654},
  year={2019}
}

product-quantization's People

Contributors

wallaceliu avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.