Code Monkey home page Code Monkey logo

annoy.js's Introduction

Annoy.js

Annoy.js is a 0-dependency Hyperplane (Binary) Search Tree, inspired by Spotify Annoy as described by This Blog Post

Use this package to query for K Approximate Nearest Neighbors from a tree of n-dimensional vectors in O(logn) time.

*Annoy.js is inspired by but not affiliated with Spotify Annoy.

Installation

Install via NPM:

npm install annoy.js

Example

import Annoy from 'annoy.js';

// USING ANNOY TO QUERY K APPROXIMATE NEAREST NEIGHBORS

// 0. Define Annoy constants
const FOREST_SIZE: number = 10;
const MAX_LEAF_SIZE: number = 50;
const VECTOR_LEN: number = 10;

// 1. Init Annoy with constants
const a: Annoy = new Annoy(FOREST_SIZE, VECTOR_LEN, MAX_LEAF_SIZE);

// 2. Generate random data points to add to the Annoy forest
const POINT_COUNT = 100000;
const dataPoints: DataPoint[] = [];
for (let i = 0; i < POINT_COUNT; i++) {
    const v: Vector = [...new Array(VECTOR_LEN)].map(() => Math.random() * 40);

    const dp: DataPoint = {
        // Include a 'vector' property
        v,
        // Add random data to the 'data' property
        d: i,
    };

    dataPoints.push(dp);
}

// 3. Fill Annoy forest with our data points
for (let i = 0; i < dataPoints.length; i++) {
    const dp: DataPoint = dataPoints[i];

    a.add(dp);
}

// 4. Query K Approximate Nearest Neighbors to a random point
const K = 20;
const p: Vector = [...new Array(VECTOR_LEN)].map(() => Math.random() * 40);
const knn: DataPoint[] = a.get(p, K);

// 5. Log our random query point and its K approximate nearest neighbors
console.log(p);
console.log(knn.length);

// ----

// SAVING/LOADING ANNOY TO/FROM A JSON OBJECT

// 1. Serialize to json
const asJson: AnnoyJson = a.toJson();
const asJsonStr: string = JSON.stringify(asJson);

// 2. Deserialize back to an Annoy instance
const rebuiltTree: Annoy = new Annoy(FOREST_SIZE, VECTOR_LEN, MAX_LEAF_SIZE);
console.time('Rebuilt Annoy fromJson');
rebuiltTree.fromJson(asJsonStr);
console.timeEnd('Rebuilt Annoy fromJson');

// 3. Get KNN from rebuilt Annoy
console.time('Rebuilt Annoy KNN');
const rebuiltKnn: DataPoint[] = a.get(p, K);
console.timeEnd('Rebuilt Annoy KNN');

Public Api

Constructor

Annoy(forestSize: number, maxLeafSize: number)

add(point: { vector: number[], data: any }): void

Add a single point to the Annoy forest

get(vector: number[], k: number): { vector: number[], data: any }[]

Get the 'k' approximate nearest neighbors to a given 'point'

How it works

  1. A user initializes Annoy.js with a 'forestSize' and a 'maxLeafSize';

  2. As the user adds points, Annoy.js pools "nearby" points into the leaves of its trees.

  3. Once a leaf (L) has pooled more points than some threshold value (maxLeafSize), L splits into a 'right' leaf (LR) and a 'left' leaf (LL), and the points in L trickle down into LR or LL.

3.1. Prior to splitting, L finds the centroid of its pooled points (C)1 and defines an arbitrary hyperplane (H) that passes through C.

3.2. If a point (P) is on the "left" of the hyperplane, it trickles down to LL. If it is on the right, it trickles down to LR2.

  1. Annoy.js builds 'forestSize' number of trees, where each tree splits its points on a distinct hyperplane, separating each set of points into a slightly different tree structure.

  2. When a user queries Annoy for the "K" nearest neighbors to a point (Q), it traverses each tree in the forest for its pool of nearest neighbors and collects each pool into a set of points (S). If S contains more than K points, Annoy returns the K closest points to Q from S, else Annoy simply returns S.


NOTES:

  1. The centroid of a set of pooled points is the sum of all points divided by the total number of points: (sum of points) / (total # points)

  2. H is defined by 2 components: A vector 'w' that defines the normal to H and a scalar 'b' that defines some offset bias from the origin. A point (p) is on the "left" of H if w โ‹… p < 0. If w โ‹… p >= 0, then it is on the "right" of H.

Benchmarks

My machine was able to index 1,000,000 128-point vectors into Annoy in 3 min 55 sec, while a KNN query of the points took a measly 0.365 ms.

Improvements

License

ISC

annoy.js's People

Contributors

danielkring1 avatar

Stargazers

Roman Nesterov avatar Markus Strazds avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.