Comments (6)
Two variables being perfectly anti-correlated (r = -1) implies a relationship, but it definitely doesn't imply similarity.
The current implementation is the definition of Pearson's distance. You could certainly implement 1 - |corr| as a custom metric if that makes sense for your use case.
from pynndescent.
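For anyone wanting to try that, here is a minimal sketch of such a metric in plain NumPy (the function name is hypothetical; for actual use as a pynndescent custom metric the function would typically also need to be compiled with numba's @njit, per the package's custom-metric conventions):

```python
import numpy as np

def abs_correlation_distance(x, y):
    """Hypothetical custom metric: 1 - |Pearson correlation|.

    Treats strong positive and strong negative correlation as
    equally "close" (distance near 0)."""
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc * xc).sum() * (yc * yc).sum())
    if denom == 0.0:
        return 1.0  # correlation undefined (constant vector): treat as maximally distant
    corr = (xc * yc).sum() / denom
    return 1.0 - abs(corr)

# Perfectly anti-correlated vectors are at distance 0 under this metric:
x = np.arange(10, dtype=np.float64)
print(abs_correlation_distance(x, x[::-1]))  # → 0.0
```

Under the standard correlation distance the same pair would be at the maximum distance of 2, which is exactly the behaviour being debated below.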
Yes, it does. Both a Pearson correlation coefficient of 1 and of -1 correspond to a cosine similarity of 1 (or near 1), at least for data in the positive space, because the correlation coefficient is simply the cosine similarity computed in the sample space after centering at the mean.
Yes, I understand the attempt to convert correlation into a distance by subtracting it from 1. However, this does not work for the correlation coefficient. Currently I am defining 1 - |corr|, but I think I will just stick with 1 - cos instead.
from pynndescent.
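The claim that the correlation coefficient is the cosine similarity of mean-centered data can be checked directly (a quick sanity check, not pynndescent code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

# Pearson correlation via NumPy
corr = np.corrcoef(x, y)[0, 1]

# Cosine similarity of the mean-centered vectors
xc, yc = x - x.mean(), y - y.mean()
cos_centered = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

print(np.isclose(corr, cos_centered))  # → True
```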
Yes, it does. Both a Pearson correlation coefficient of 1 and of -1 correspond to a cosine similarity of 1 (or near 1), at least for data in the positive space.
??? This just isn't true. Maybe I misunderstand the requirements for what you are describing, but e.g. np.arange(10) and np.arange(10)[::-1] are perfectly anti-correlated and certainly do not have a cosine similarity of 1.
from pynndescent.
Thanks James, I just checked: the corr is -1 but the cos is 0.571, while both the corr and cos between np.arange(10) and np.arange(10) were 1. So it looks like the cosine distance does differentiate between positive and negative correlation, but considers positive correlation to be more similar than negative correlation. Both positively and negatively correlated pairs still come out as somewhat similar (cosine in [0, 1]).
Anyway, the argument about the 1 - corr distance still holds. For negative correlation the distance is large, so when pynndescent finds nearest neighbours it will only consider neighbours with positive correlation and neglect neighbours with negative correlation. However, in reality, finding a strong negative correlation is just as important.
from pynndescent.
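For reference, the quantities discussed above can be computed directly (the cosine distance for the reversed pair comes out to roughly 0.58):

```python
import numpy as np

x = np.arange(10, dtype=np.float64)
y = x[::-1]

corr = np.corrcoef(x, y)[0, 1]                              # -1.0: perfectly anti-correlated
cos_sim = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))   # 120/285 ≈ 0.421
cos_dist = 1.0 - cos_sim                                    # ≈ 0.579

print(corr, cos_sim, cos_dist)
```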
Anyway, the argument about the 1 - corr distance still holds. For negative correlation the distance is large, so when pynndescent finds nearest neighbours it will only consider neighbours with positive correlation and neglect neighbours with negative correlation. However, in reality, finding a strong negative correlation is just as important.
This will always be application- and data-dependent, though. Sometimes it's important, sometimes it isn't, and sometimes it's important but does not imply the items are similar and should be linked in a kNN graph. It depends on context, which is beyond the scope of a generic kNN package.
This package can't redefine "correlation distance" to mean something different from how it is defined in the literature and other packages. It is simple to write your own metric if you want your distances to work that way, but it won't be correlation distance.
from pynndescent.
Well, in that case it might be worth including one sentence on this issue: defining correlation "distance" as 1 - corr does not treat negatively correlated items as nearest neighbours.
from pynndescent.
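A one-line illustration of that caveat, assuming the standard 1 - corr definition of correlation distance:

```python
import numpy as np

x = np.arange(10, dtype=np.float64)
y = x[::-1]
corr = np.corrcoef(x, y)[0, 1]

# Standard correlation distance: an anti-correlated pair gets the maximum
# possible value, so it is ranked as far apart in a kNN search.
print(1.0 - corr)  # → 2.0
```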
Related Issues (20)
- Sample identifiers for semantic search HOT 2
- uint8 as internal data HOT 1
- Cosine metric - error "Negative values in data passed to precomputed distance matrix" HOT 2
- Question about covariance matrix used when using Mahalanobis distance
- Tests fail: E SystemError: initialization of _internal failed without raising an exception
- Newest version breaks with UMAP HOT 3
- Slice error using mac M1-max ARM HOT 6
- Exceedingly large amount of memory usage
- Very high memory usage HOT 7
- Specifying threshold in distance metrics
- API to save and load index from disk
- Minor bias in split selection? HOT 1
- true_angular is not a distance?
- TSSS missing a factor of 2
- Reverse diversification is actually forward diversification again
- pynndescent might break with next numba release HOT 3
- How to navigate pyinstaller HOT 1
- Querying the training set: runtime tradeoff for large k HOT 2
- 1 test fails: ZeroDivisionError: division by zero
- `np.infty` replacement