Code Monkey home page Code Monkey logo

Comments (12)

directorscut82 avatar directorscut82 commented on July 19, 2024 1

Just for completeness,
after reviewing the link/tutorial you provided (within the codebase) and especially the original paper ("Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces"), i was (very) lucky to notice that the vp-tree author have already tested performance on two non-Euclidian cases where the first is a pseudometric angle based distance with a very positive performance report (yay).

from bhtsne.

directorscut82 avatar directorscut82 commented on July 19, 2024 1

@jlorince Maybe this can clear some things:: angular distance vs. metric

from bhtsne.

lvdmaaten avatar lvdmaaten commented on July 19, 2024

I think using a pairwise distance matrix as input does not make much sense: Barnes-Hut algorithms are intended to operate on large datasets for which you cannot keep such distance matrices in memory. If your distance matrix is small enough to keep it in memory, one may as well use a standard implementation of t-SNE.

Allowing the user to use any distance function would be a nice feature to have, and should be relatively straightforward to implement. My only concern here is that the distance function should implement a metric, or the vantage-point tree algorithm may not work correctly. As long as we make this clear in the documentation, I don't think it is necessary to explicitly check whether the metric assumptions are violated (as it would be expensive to do so). Feel free to send a PR implementing this.

from bhtsne.

directorscut82 avatar directorscut82 commented on July 19, 2024

In (hyper)spectral imaging we like our similarities to be insensitive to illumination, so an angle-based distance metric is much more preferred to a simple Euclidean.

+Thank you for the quick responce..

from bhtsne.

lvdmaaten avatar lvdmaaten commented on July 19, 2024

Angle-based metrics are generally not metric (they violate the identity of indiscernibles condition and the triangle inequality).

from bhtsne.

directorscut82 avatar directorscut82 commented on July 19, 2024

You are absolutely correct,
but for a physical quantity as a spectrum we can limit the possible angle range to 0-pi/2, violating only the triangle inequality.

The first three (non-negativity, identity of indiscernibles & symmetry) establish a positive-definite function (alas not a metric but i think its ok for kernel contraptions:).

from bhtsne.

jlorince avatar jlorince commented on July 19, 2024

Out of curiosity, has anyone implemented a version that allows for a user-defined distance metric? (for my use case I'm looking at cosine distance)

from bhtsne.

lvdmaaten avatar lvdmaaten commented on July 19, 2024

It should be straightforward to do as long as your distance is an actual metric, since the vantage-point tree is already takes a distance template. So you can replace this bit of code by your own distance metric: https://github.com/lvdmaaten/bhtsne/blob/master/vptree.h#L90-L104

Having said that, cosine distance violates the triangle inequality and identity of the indiscernibles, so you cannot use it with vantage-point trees (well you can, but there is no guarantee you'll get correct results).

from bhtsne.

jlorince avatar jlorince commented on July 19, 2024

Awesome - was hoping there was something simple to plug-and-play, as it were (my c++ is horribly weak). Perhaps the angular distance would be good to use here, as it's a proper distance metric and should behave similarly to cosine distance (I think...)

from bhtsne.

jlorince avatar jlorince commented on July 19, 2024

Are you certain that's the only thing that would need changing? As a hacked-together experiment, I tried modifying the euclidean distance function to a cosine function:

double euclidean_distance(const DataPoint &t1, const DataPoint &t2) {
    double dp = 0.0;
    double denom_a = 0.0;
    double denom_b = 0.0;
    double* x1 = t1._x;
    double* x2 = t2._x;
    for(int d = 0; d < t1._D; d++) {
        dp += x1[d] * x2[d];
        denom_a += x1[d] * x2[d];
        denom_b += x1[d] * x2[d];
    }
    return dp / (sqrt(denom_a) * sqrt(denom_b));
}

It compiles fine, but when I try running it I get AssertionError: ERROR: Call to bh_tsne exited with a non-zero return code exit status, please refer to the bh_tsne output for further details. But there's not output for me to refer to...

from bhtsne.

lvdmaaten avatar lvdmaaten commented on July 19, 2024

Yes I'm sure. The implementation you have now is not metric though, which means that apparently funky things can happen in the VP-tree search...?

I don't really understand why you'd want to use cosine distance anyway: squared Euclidean distance is a lot like cosine distance (in particular, when you normalize your inputs to have a L-2 norm of 1).

from bhtsne.

stonexjr avatar stonexjr commented on July 19, 2024

@jlorince Plus, your euclidean_distance implementation looks buggy.
why it was denom_a += x1[d] * x2[d] instead of denom_a += x1[d] * x1[d]?
Your denom_a and denom_b would have the same value after the for loop, and the function would always returns 1 disregard of the input t1 and t2

from bhtsne.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.