
Comments (3)

lmcinnes commented on May 26, 2024

So the bad news is that distances calculated during the search are thrown away to save memory, so they just aren't accessible as you would want.

The possibly good news is that there are some hidden features that may do what you want. There is a module, graph_utils, that you can import as pynndescent.graph_utils, which adds functionality to "complete" a knn graph so that it has a single connected component (specifically pynndescent.graph_utils.connect_graph). If that isn't quite what you need for your clustering purposes, then the function find_component_connection_edge, which finds the shortest edge between two components in the graph (well, approximates it with an approximate search), can be modified to return a set of edges, and that might fit your needs. Are any of these useful, or do you need a more general random sampling of distances?
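To make the "shortest edge between two components" idea concrete, here is a minimal brute-force sketch. It is not the pynndescent implementation (which, as noted above, uses an approximate search); the function names, the point format, and the k parameter are all assumptions made for illustration.

```python
import heapq
import math


def shortest_connecting_edge(points, component_a, component_b):
    """Exact closest pair (i, j, distance) with i in component_a, j in component_b.

    Brute force over all cross-component pairs; hypothetical helper,
    not part of pynndescent.
    """
    best = None
    for i in component_a:
        for j in component_b:
            d = math.dist(points[i], points[j])  # Euclidean distance
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


def k_shortest_connecting_edges(points, component_a, component_b, k):
    """Variant that returns the k shortest cross-component edges instead of one."""
    edges = [
        (math.dist(points[i], points[j]), i, j)
        for i in component_a
        for j in component_b
    ]
    # nsmallest sorts by distance first because it is the leading tuple element.
    return [(i, j, d) for d, i, j in heapq.nsmallest(k, edges)]


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
single_edge = shortest_connecting_edge(points, [0, 1], [2, 3])
two_edges = k_shortest_connecting_edges(points, [0, 1], [2, 3], k=2)
```

The second function shows what "return a set of edges" could look like: instead of keeping only the best candidate, keep the k best. The quadratic cost is only acceptable for small components, which is presumably why the library approximates the search.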

from pynndescent.

bobermayer commented on May 26, 2024

Thanks for the reply. Looking a bit more into the code, I suspected as much. I have to check whether my graphs are fully connected to see if the functionality you mention would help.
Other than that, I was wondering if I could increase the heap size, but I guess it's always assumed to be exactly n_neighbors, and a bigger heap would still only keep the smallest distances anyway?
Carrying along a second, bigger (unsorted?) heap could be a less efficient way of saving all those distances. I'm not sure the overhead is worth it, because calculating them is not actually that expensive, so maybe that's what I'll try first.
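For the "check whether my graphs are fully connected" step, a small union-find over the neighbor index array is enough. This is a sketch under assumptions: knn_indices is taken to be an (n_points, n_neighbors) array or list of lists of integer neighbor indices, edges are treated as undirected, and negative entries are assumed to mark unfilled neighbor slots. The function name is hypothetical.

```python
def count_components(knn_indices):
    """Count connected components of a kNN graph, treating edges as undirected."""
    n = len(knn_indices)
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, neighbors in enumerate(knn_indices):
        for j in neighbors:
            j = int(j)
            if j < 0:
                # Assumption: a negative index means "no neighbor found".
                continue
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    return len({find(i) for i in range(n)})


two_islands = count_components([[1], [0], [3], [2]])
one_ring = count_components([[1], [2], [3], [0]])
```

A result of 1 means the graph is already fully connected, in which case a graph-completion step would have nothing to add.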


bobermayer commented on May 26, 2024

OK, so I created a fork where the distances discarded in the heap_push functions are saved to backup arrays instead. This seems to work, but I haven't done any detailed testing yet. I could try to implement this in a more elegant way in case you're interested.
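The idea of keeping what the heap would otherwise throw away can be sketched in plain Python. This is not the code from the fork or from pynndescent's heap_push functions; the class name, the list-based backup store, and the use of heapq are all assumptions chosen to keep the example short.

```python
import heapq


class NeighborHeap:
    """Keep the n_neighbors smallest distances and record everything else.

    Hypothetical illustration: a bounded max-heap (built on heapq by
    negating distances) plus a backup list for discarded candidates.
    """

    def __init__(self, n_neighbors):
        self.n_neighbors = n_neighbors
        self._heap = []       # entries are (-distance, index)
        self.discarded = []   # (distance, index) pairs that did not stay

    def push(self, distance, index):
        entry = (-distance, index)
        if len(self._heap) < self.n_neighbors:
            heapq.heappush(self._heap, entry)
            return
        # heappushpop returns the smallest entry, i.e. the largest distance,
        # which may be the candidate that was just offered.
        evicted = heapq.heappushpop(self._heap, entry)
        self.discarded.append((-evicted[0], evicted[1]))

    def neighbors(self):
        """Kept neighbors as (distance, index), nearest first."""
        return sorted((-neg_d, idx) for neg_d, idx in self._heap)


heap = NeighborHeap(n_neighbors=2)
for dist, idx in [(3.0, 0), (1.0, 1), (5.0, 2), (2.0, 3)]:
    heap.push(dist, idx)
kept = heap.neighbors()
backup = heap.discarded
```

The backup list grows with every rejected candidate, which is the memory cost mentioned at the top of the thread; a real implementation would likely need a cap or a sampling rule.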

However, I'm getting errors with numba when installing the package locally (using python setup.py install): I can run it once, but I get seg faults afterwards. More specifically, I can run it multiple times in the same Python session, but not in successive independent Python sessions. This also happens with your master branch, though.
Stack traces suggest a problem in rp_trees.make_dense_tree (but unrelated to #209, as it happens with both euclidean and correlation distance metrics), and it might be caused by indices = np.arange(data.shape[0]).astype(np.int32) in line 830:

No implementation of function Function(<built-in function arange>) found for signature:
  >>> arange(int64)

Thanks for any input. I have Python 3.8.0, numpy 1.23.5, and numba 0.53.1 (or 0.56.1 on another system, with a similar problem).

Update: turning off the numba caching for the make_tree functions in rp_trees.py appears to resolve this issue.

Another update: maybe this is related to caching problems for recursive functions?
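One way to test the caching hypothesis without editing the source is to delete numba's on-disk cache between sessions and see whether the second run then succeeds. The sketch below assumes numba's default cache location (index files ending in .nbi and data files ending in .nbc inside the __pycache__ directories next to the source files); if NUMBA_CACHE_DIR is set, the files live elsewhere. The function names and the dry_run flag are hypothetical.

```python
from pathlib import Path

NUMBA_CACHE_SUFFIXES = {".nbi", ".nbc"}


def find_numba_cache_files(package_dir):
    """List numba cache files under package_dir (default cache layout assumed)."""
    root = Path(package_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and path.suffix in NUMBA_CACHE_SUFFIXES
        and path.parent.name == "__pycache__"
    )


def clear_numba_cache(package_dir, dry_run=True):
    """Return the cache files found; delete them only when dry_run is False."""
    files = find_numba_cache_files(package_dir)
    if not dry_run:
        for path in files:
            path.unlink()
    return files
```

Pointing clear_numba_cache at the installed pynndescent package directory with dry_run=True first shows what would be removed. If clearing the cache makes a fresh session work again, that supports the stale-cache explanation.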
