Code Monkey home page Code Monkey logo

flatnav-experimental's Introduction

Near Neighbor Graph Reordering

This repository implements a graph near neighbor index and provides various ways to reorder the nodes in the graph to improve query latency. To reproduce our experiments, there are three tools that can be run:

  • construct - creates a near neighbor index from a data file
  • reorder - applies graph reordering to permute the node ordering of the index
  • query - queries the index and computes recall and other performance metrics

The tools are largely self-documenting and will provide help if run without any command line arguments. Note that the reordering tools can generally run without needing access to the dataset, queries or distance metrics (unless profile-guided reordering is used).

Installation

FlatNav is implemented in C++ with a complete Python extension with cereal as the only external dependency. The C++ library can be built from source using CMake.

FlatNav is supported on x86-64 machines on linux and MacOS (we can extend this to windows if there is sufficient interest). To build from source you will need

  • C++17 compiler with OpenMP support (version >= 2.0)
  • CMake (version >= 3.14)

We provide some helpful scripts for installing the above in the bin directory.

To build the library with CMake, run

> git clone https://github.com/BlaiseMuhirwa/flatnav-experimental.git --recurse-submodules
> cd flatnav-experimental
> ./bin/build.sh -h 

This will display all available build options:

Usage ./build.sh [OPTIONS]

Available Options:
  -t, --tests:                    Build tests
  -e, --examples:                 Build examples
  -v, --verbose:                  Make verbose
  -b, --benchmark:                Build benchmarks
  -bt, --build_type:              Build type (Debug, Release, RelWithDebInfo, MinSizeRel)
  -nmv, --no_simd_vectorization:Disable SIMD instructions
  -h, --help:                     Print this help message

Example Usage:
  ./build.sh -t -e -v

To build the Python bindings, follow instructions here. There are also examples for how to use the library to build an index and run queries on top of it here.

Support for SIMD Extensions

We currently support SIMD extensions for certain platforms as detailed below.

Operation x86_64 arm64v8 Apple silicone
FP32 Inner product SSE, AVX, AVX512 No SIMD support No SIMD support
FP32 L2 distance SSE, AVX, AVX512 No SIMD support No SIMD support

Datasets from ANN-Benchmarks

ANN-Benchmarks provides HDF5 files for a standard benchmark of near-neighbor datasets, queries and ground-truth results. To run on these datasets, we provide a set of tools to process numpy (NPY) files: construct_npy, reorder_npy and query_npy.

To generate the ANNS benchmark datasets, run the following script

> ./bin/download_anns_datasets.sh <dataset-name> [--normalize]

For datasets that use the angular/cosine similarity, you will need to use --normalize option so that the dataset is normalized. Available dataset names include:

  • mnist-784-euclidean
  • sift-128-euclidean
  • glove-25-angular
  • glove-50-angular
  • glove-100-angular
  • glove-200-angular
  • deep-image-96-angular
  • gist-960-euclidean
  • nytimes-256-angular

Using Custom Datasets

The most straightforward way to include a new dataset for this evaluation is to put it into either the ANN-Benchmarks (NPY) format or to put it into the Big ANN-Benchmarks format. The NPY format requires a float32 2-D Numpy array for the train and test sets and an integer array for the ground truth. The Big ANN-Benchmarks format uses the following binary representation. For the train and test data, there is a 4-byte little-endian unsigned integer number of points followed by a 4-byte little-endian unsigned integer number of dimensions. This is followed by a flat list of num_points * num_dimensions values, where each value is a 32-bit float or an 8-bit integer (depending on the dataset type). The ground truth files consist of a 32-bit integer number of queries, followed by a 32-bit integer number of ground truth results for each query. This is followed by a flat list of ground truth results.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.