Code Monkey home page Code Monkey logo

fit-sne's Introduction

FFT-accelerated Interpolation-based t-SNE (FIt-SNE)

Introduction

t-Stochastic Neighborhood Embedding (t-SNE) is a highly successful method for dimensionality reduction and visualization of high dimensional datasets. A popular implementation of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. We modified this implementation as follows:

  • Computation of the N-body Simulation: Instead of approximating the N-body simulation using Barnes-Hut, we interpolate onto an equispaced grid and use FFT to perform the convolution, dramatically reducing the time to compute the gradient at each iteration of gradient descent. See the this post for some intuition on how it works.
  • Computation of Input Similiarities: Instead of computing nearest neighbors using vantage-point trees, we approximate nearest neighbors using the Annoy library. The neighbor lookups are multithreaded to take advantage of machines with multiple cores. Using "near" neighbors as opposed to strictly "nearest" neighbors is faster, but also has a smoothing effect, which can be useful for embedding some datasets (see Linderman et al. (2017)). If subtle detail is required (e.g. in identifying small clusters), then use vantage-point trees (which is also multithreaded in this implementation).
  • Early exaggeration: In Linderman and Steinerberger (2017), we showed that appropriately choosing the early exaggeration coefficient can lead to improved embedding of swissrolls and other synthetic datasets.
  • Late exaggeration: Increasing the exaggeration coefficient late in the optimization process (e.g. after 800 of 1000 iterations) can improve separation of the clusters.

Check out our preprint for more details and some benchmarks.

R, Matlab, and Python wrappers are fast_tsne.R, fast_tsne.m, and fast_tsne.py respectively. Gioele La Manno implemented a Python (Cython) wrapper, which is available on PyPI here.

Installation

Note: If you update to a new version of FIt-SNE using git pull, be sure to recompile.

OSX and Linux

The only prerequisite is FFTW, which can be downloaded and installed from the website. Then, from the root directory compile the code as:

    g++ -std=c++11 -O3  src/sptree.cpp src/tsne.cpp src/nbodyfft.cpp  -o bin/fast_tsne -pthread -lfftw3 -lm

See here for instructions in case one does not have sudo rights (one can install FFTW in the home directory and provide its path to g++).

Check out examples/ for usage.

Windows

A Windows binary is available here. Please extract to the bin/ folder, and you should be all set.

If you would like to compile it yourself see below. The code has been currently tested with MS Visual Studio 2015 (i.e., MS Visual Studio Version 14).

  1. First open the provided FItSNE solution (FItSNE.sln) using MS Visual Studio and rebuild it.
  2. Copy the binary file ( e.g. x64/Debug/FItSNE.exe) generated by the build process to the bin/ folder
  3. For Windows, we have added all dependencies, including the FFTW library, which is distributed under the GNU General Public License. For the binary to find the FFTW DLLs, you need to either add src/winlibs/fftw/ to your PATH, or to copy the DLLs into the bin/ directory.

As of this commit, only the R wrapper properly calls the Windows executable. The Python and Matlab wrappers can be trivially changed to call it (just changing bin/fast_tsne to bin/FItSNE.exe in the code), and will be changed in future commits.

Many thanks to Josef Spidlen for this Windows implementation!

References

If you use our software, please cite:

George C. Linderman, Manas Rachh, Jeremy G. Hoskins, Stefan Steinerberger, Yuval Kluger. (2017). Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding. (2017) arXiv:1712.09005 (link)

Our implementation is derived from the Barnes-Hut implementation:

Laurens van der Maaten (2014). Accelerating t-SNE using tree-based algorithms. Journal of Machine Learning Research, 15(1):3221โ€“3245. (link)

fit-sne's People

Contributors

carmot avatar christophh avatar dkobak avatar josefflowjo avatar linqiaozhi avatar pavlin-policar avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.