
Comments (6)

jamestwebber commented on September 25, 2024

Two variables being perfectly anti-correlated (r = -1) implies a relationship, but it definitely doesn't imply similarity.

The current implementation is the definition of Pearson's distance.

You could certainly implement 1 - |corr| as a custom metric if that makes sense for your application.

from pynndescent.
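For reference, a minimal sketch of such a metric in plain NumPy (the function name and exact form are illustrative, not part of pynndescent's API):

```python
import numpy as np

def abs_correlation_distance(x, y):
    # 1 - |Pearson r|: treats strong positive and strong negative
    # correlation as equally "close". Note this is NOT the standard
    # correlation distance, which is 1 - r.
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc * xc).sum() * (yc * yc).sum())
    return 1.0 - abs((xc * yc).sum() / denom)

a = np.arange(10.0)
print(abs_correlation_distance(a, a))        # 0.0 (perfectly correlated)
print(abs_correlation_distance(a, a[::-1]))  # 0.0 (perfectly anti-correlated)
```

To use it with pynndescent you would pass it as `metric=abs_correlation_distance`; pynndescent expects custom metrics to be numba-compilable, so in practice you would also decorate it with `numba.njit`.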

DipeshNiraula commented on September 25, 2024

Yes, it does. Both a Pearson correlation coefficient of 1 and of -1 correspond to a cosine similarity of 1 (or near 1), at least for the positive space, because the correlation coefficient is simply the cosine similarity in the sample space centered at the mean.

Yes, I understand the attempt to convert correlation into a distance by subtracting it from 1. However, this does not work well for the correlation coefficient.

Currently I am defining 1 - |corr|, but I think I will just stick with 1 - cos instead.


jamestwebber commented on September 25, 2024

> Yes, it does. Both a Pearson correlation coefficient of 1 and of -1 correspond to a cosine similarity of 1 (or near 1), at least for the positive space.

??? This just isn't true. Maybe I misunderstand the requirements for what you are describing, but e.g. np.arange(10) and np.arange(10)[::-1] are perfectly anti-correlated and certainly do not have a cosine similarity of 1.
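The counterexample is easy to verify numerically (a quick NumPy sketch; `np.corrcoef` and a hand-rolled cosine are used for illustration):

```python
import numpy as np

a = np.arange(10.0)
b = a[::-1]

r = np.corrcoef(a, b)[0, 1]                            # Pearson correlation
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # cosine similarity

# Pearson r equals the cosine similarity of the *mean-centered* vectors:
ac, bc = a - a.mean(), b - b.mean()
cos_centered = ac @ bc / (np.linalg.norm(ac) * np.linalg.norm(bc))

print(round(r, 6))    # -1.0   : perfectly anti-correlated
print(round(cos, 3))  # 0.421  : far from a cosine similarity of 1
print(cos_centered)   # -1.0   : matches r, since correlation = centered cosine
```

So for raw vectors the cosine similarity is ≈0.421 (cosine distance ≈0.579), even though the correlation is exactly -1.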


DipeshNiraula commented on September 25, 2024

Thanks James, I just checked: the correlation is -1 but the cosine distance is ≈0.579, while both the correlation and the cosine similarity between np.arange(10) and np.arange(10) were 1. So it looks like cosine distance does differentiate between positive and negative correlation, but it considers positive correlation more similar than negative correlation. Under the cosine metric, both positively and negatively correlated pairs land in [0, 1].

Anyway, the argument about the 1 - corr distance still holds. For negative correlation the distance is large, so when pynndescent finds nearest neighbors it will only consider neighbors with positive correlation and neglect negatively correlated ones. However, in reality finding a strong negative correlation can be just as important.
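The asymmetry is easy to see from the definitions (a small sketch comparing the standard 1 - r distance with the 1 - |r| variant discussed above):

```python
# How the two candidate distances treat the extremes of correlation:
for r in (1.0, 0.0, -1.0):
    print(f"r = {r:+.0f}: 1 - r = {1 - r:.0f}, 1 - |r| = {1 - abs(r):.0f}")

# r = +1: 1 - r = 0, 1 - |r| = 0   (nearest under both)
# r = +0: 1 - r = 1, 1 - |r| = 1
# r = -1: 1 - r = 2, 1 - |r| = 0   (standard correlation distance pushes
#         anti-correlated points maximally far apart; 1 - |r| keeps them close)
```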


jamestwebber commented on September 25, 2024

> Anyway, the argument about the 1 - corr distance still holds. For negative correlation the distance is large, so when pynndescent finds nearest neighbors it will only consider neighbors with positive correlation and neglect negatively correlated ones. However, in reality finding a strong negative correlation can be just as important.

This will always be application- and data-dependent, though. Sometimes it's important, sometimes it isn't, and sometimes it's important but does not imply the items are similar and should be linked in a kNN graph. It depends on context, which is beyond the scope of a generic kNN package.

This package can't redefine "correlation distance" to mean something different from how it is defined in the literature and other packages. It is simple to write your own metric if you want your distances to work that way, but it won't be correlation distance.


DipeshNiraula commented on September 25, 2024

Well, in that case it might be worth including one sentence on this issue: defining correlation "distance" as 1 - corr means negatively correlated points are not treated as nearest neighbours.

