Code Monkey home page Code Monkey logo

falconn's People

Contributors

akashin avatar azadkuh avatar csssaz avatar danix800 avatar dblalock avatar fil avatar ludwigschmidt avatar maumueller avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

falconn's Issues

Maximum inner product search

The readme states: "Despite being designed for the cosine similarity, FALCONN can often be used for nearest neighbor search under the Euclidean distance or a maximum inner product search."

So how would one go about using the maximum inner product search?

Make possible to link all src/test/*.cc to one binary

Now linker fails because of not inlined functon in src/test/test_utils.h.
The fix is trivial:

--- a/src/test/test_utils.h
+++ b/src/test/test_utils.h
@@ -11,7 +11,7 @@
 namespace falconn {
 namespace test {
 
-void compare_vectors(const std::vector<float>& expected,
+inline void compare_vectors(const std::vector<float>& expected,
                      const std::vector<float>& result, float eps) {
   ASSERT_EQ(expected.size(), result.size());
   for (size_t ii = 0; ii < expected.size(); ++ii) {

Write a Julia wrapper

It might make sense to wait with this until Cxx.jl (https://github.com/Keno/Cxx.jl) is supported by a default Julia installation (maybe in 0.5?). Otherwise, we might need to write a C wrapper around FALCONN (which might actually be useful in other cases, too).

Make a list of contributors

We should keep track of people who report bugs, help with testing, etc.!

So far we have:

  • Grey Ballard
  • Dhiraj Holden
  • Ilya Kornakov
  • Andrew Sabisch

Refactor the Makefile

The build targets for all the tests should be generated fully automatically by a script. The list of headers can also be generated by a script.

R / Rcpp

Seeing as you're planning a future Julia wrapper, I thought I'd ask for an R one too, so I can try FALCONN instead of RcppAnnoy.

Release v1.2.1

The main point of this release is to upload a package to PyPi built with a corrected version of swig (see #58 for the discussion). However, it would be good to address the following minor issues:

  • update the list of contributors
  • add more detailed instructions (especially, regarding recent idiosyncrasies with clang and swig, see #58)
  • re-generate and re-upload documentation

Store only hash tables in memory

As far as I understand, the underlying algorithm doesn't require storing the dataset in memory, only the generated hash tables.

Given the favorable computational complexity, falconn is very interesting to use with large datasets, but then storing whole datasets in memory becomes either infeasible or very expensive.

All in all, I'm not sure where the requirement of storing the whole dataset comes from โ€“ is it an inherent limtation of the method or was it just not implemented yet (and if so, are there plans to?)

Migrate Python wrapper to pybind11

pybind11 is a fairly new project for accessing C++ code from Python:

https://github.com/pybind/pybind11

There are three advantages for us:

  • They translate from Scipy sparse matrices to Eigen sparse matrices. This would make #24 easy to implement.
  • It's a header-only library, so we can add it to 'external'.
  • The project is actively developed and has very good documentation. It's specifically for connecting C++ and Python, so the semantics are often clearer than in the more generic swig.

pybind11 is planning to release v2 in early January. v2 brings many improvements for the Eigen wrapper, so it's probably good to start with v2 on our side.

Release v1.3

  • Expose linear scan
  • Make the inner tables return the amount of memory they use
  • Parameter search in the wrapper
  • Add query objects to wrapper for multi-threaded querying (#6)
  • Add check in wrapper for k and bit-packed table (#45)

clang missing brace warnings

Remove the following warnings. From a quick search it looked like clang might be too cautious here.

In file included from falconn/swig/falconn_wrap.cc:3591:
falconn/src/include/falconn/lsh_nn_table.h:179:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    "unknown",
    ^~~~~~~~~~
falconn/src/include/falconn/lsh_nn_table.h:209:5: warning: suggest braces around initialization of subobject [-Wmissing-braces]
    "unknown",
    ^~~~~~~~~~
2 warnings generated.

How to find near neighbors in given candidates instead of all candidates

Hi,

I know the definition of find_near_neighbors are as follows:

def find_near_neighbors(self, query, threshold)
Find all the points within threshold distance from query.

I would like to know how to calculate near neighbors within distance of threshold from a GIVEN candidates. For example, given smaller candidates is provided as a list of indexes.

Actually, I have a large dataset, I only want to calculate a document's nearest neighbors in a time window [-3 days, +3 days]. And I want to avoid to build a LSH index for each days' data.

Thanks very much.

Rename find_closest ?

We have the following methods for getting near(est) neighbors:

  • find_closest
  • find_k_nearest_neighbors
  • find_near_neighbors

I find this a little inconsistent. Should we rename find_closest to find_nearest_neighbor?

Migrate to Eigen's sparse vector class?

It would be nice to use Eigen's sparse vector class because it supports linear algebra operations without additional work on our side (as opposed to the std::vector<std::pair<int, float>> we are currently using). But we should first do a small benchmark to compare the two options.

pip install error on OS X 10.11

Hello,
I'm experiencing the following error while trying to instal falconn with pip install.
Gist
My Setup: OS X 10.11, virtual environment with python 3.5 and numpy 1.11.1 . Shouldn't make a difference, but I have Eigen installed with homebrew.

Python wrapper for the sparse case

Currently, the Python wrapper supports only dense data (via Numpy arrays), would be nice to generalize it to handle sparse data as well.

Slow and memory-costly compilation

Currently, compiling anything that includes FALCONN takes quite a bit of time and RAM (on towhee running make python_package_install eats more than 2.4 Gb of RAM).

It's a bit painful: for instance, one can't run make python_package_install on the machine, where the website is hosted, which is unfortunate, since it'd be way more convenient to generate docs on it, instead of pulling them from somewhere.

Write a contributor guide

Points to cover:

  • Linting clang-format --style=Google -i <file-name>
  • Running unit tests (ideally also set up some continuous integration service, see #13)
  • Pull requests
  • What else?

Use helpers during the construction

Need to use the code like below instead of "exponential" case analysis in the wrapper.

#include <iostream>
#include <memory>

class Base {
public:
    virtual void impl() = 0;

    virtual ~Base() {}
};

template<bool u, bool v> class Derived : public Base {
public:
    void impl() {
        std::cout << u << " " << v << std::endl;
    }
};

class Factory {
public:
    static std::shared_ptr<Base> construct(bool u, bool v) {
        if (u) {
            return construct_1<true>(v);
        }
        else {
            return construct_1<false>(v);
        }
    }
private:
    template<bool u> static std::shared_ptr<Base> construct_1(bool v) {
        if (v) {
            return construct_2<u, true>();
        }
        else {
            return construct_2<u, false>();
        }
    }

    template<bool u, bool v> static std::shared_ptr<Base> construct_2() {
        return std::shared_ptr<Base>(new Derived<u, v>);
    }
};

int main() {
    std::shared_ptr<Base> x = Factory::construct(false, true);
    x->impl();
    return 0;
}

Write a Python wrapper

  • First version of the wrapper
  • Documentation (make clear that the NumPy array must stay valid!)
  • GloVe example
  • Helper functions (compute_number_of_hash_functions, etc.)
  • Performance debugging to get the wrapper closer to C++ performance.
  • Update README (include new license information for numpy.i)
  • Create a Python package.
  • Add a thin wrapper on the Python side.

find_near_neighbors for only one NN

Is there a way to get only one result when using the find_near_neighbors? I am interesting in solving the decision problem; is there a point in the pointset thats lies within an r radius with the query?

By using the find_near_neighbors it tries to find all the points that satisfy this condition.

Release v1.2

  • A bit-packed storage hash table.
  • Multi-threaded table setup.
  • Remove sorted candidate query methods.
  • Update Python wrapper.
  • Run GloVe benchmarks.
  • Test multi-threaded setup on a few more machines / platforms.
  • Update Documentation.
  • Create github release.
  • Update PyPI package.
  • Ludwig run GloVe locally.

Postponed to v1.3:

  • Add query objects to wrapper for multi-threaded querying. (#6)
  • Add check in wrapper for k and bit-packed table. (#45)

setup.py should NOT check if libraries (e.g. numpy) are installed

It is not common practice, and for a good reason.
With that code in place, it's impossible to do a one-pass pip install of a requirements.txt file including both numpy and falconn, because the setup.py of falconn would be run before numpy is installed (and fail!).

pip install FALCONN failed on windows 7

I use Anaconda2 64 bit on my Windows 7 system. After I typed "pip install FALCONN", it downloads the FALCONN package and attempt to install but I got the following errors:

KeyType = int; PointSet = falconn::PlainArrayPointSet]'
falconn/swig/python_wrapper.h:337:31: required from here
falconn/src/include/falconn/wrapper/../core/polytope_hash.h:90:9: error: 'posix_memalign' was not declared in this scope
In file included from falconn/external/eigen/Eigen/Core:344:0,
from falconn/external/eigen/Eigen/Dense:1,
from falconn/src/include/falconn/eigen_wrapper.h:11,
from falconn/src/include/falconn/falconn_global.h:8,
from falconn/swig/falconn_wrap.cc:3689:
falconn/external/eigen/Eigen/src/Core/arch/SSE/PacketMath.h: In function 'Packet Eigen::internal::pmadd(const Packet&, const Packet&, const Packet
&) [with Packet = __vector(2) double]':
falconn/external/eigen/Eigen/src/Core/arch/SSE/PacketMath.h:250:134: warning: control reaches end of non-void function [-Wreturn-type]
falconn/external/eigen/Eigen/src/Core/arch/SSE/PacketMath.h: In function 'Packet Eigen::internal::pmadd(const Packet&, const Packet&, const Packet
&) [with Packet = __vector(4) float]':
falconn/external/eigen/Eigen/src/Core/arch/SSE/PacketMath.h:249:134: warning: control reaches end of non-void function [-Wreturn-type]
error: command 'C:\Anaconda2\MinGW\bin\gcc.exe' failed with exit status 1

----------------------------------------

Command "C:\Anaconda2\python.exe -u -c "import setuptools, tokenize;file='c:\users\admini1\appdata\local\temp\pip-build-xmpbnu\FALCONN\se
tup.py';exec(compile(getattr(tokenize, 'open', open)(file).read().replace('\r\n', '\n'), file, 'exec'))" install --record c:\users\admini
1\ap
pdata\local\temp\pip-yj92v2-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\admini~1\app
data\local\temp\pip-build-xmpbnu\FALCONN\

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.