Comments (3)
Hi Oskar!
If you want to run the code using UMAP, it will ignore the perplexity
parameter. I will make this clear in the documentation! Thanks for the question!
from embedr.
Thanks! I was a bit confused by this:
perplexity: float
Similar to the perplexity parameter from van der Maaten (2008); sets
the scale of the affinity kernel used to measure embedding quality.
NOTE: In the EMBEDR algorithm, this parameter is used EVEN WHEN NOT
USING t-SNE! Default is 30
in the EMBEDR class.
from embedr.
Oh, of course! The perplexity parameter does double duty: it is used to run t-SNE and also to assess embedding quality. That is, currently, the quality of an embedding is calculated as the similarity of two data-affinity matrices, one from the original data space and one from the embedded space. The high-dimensional affinity matrix depends on a perplexity parameter, perp_aff, which needs to be set somehow.
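To make the role of perp_aff concrete, here is a minimal sketch of a perplexity-calibrated affinity matrix. It assumes the t-SNE-style Gaussian kernel from van der Maaten (2008); the function names row_affinities and affinity_matrix are hypothetical and are not part of the EMBEDR API, which may compute its affinities differently.

```python
import numpy as np

def row_affinities(sq_dists, perplexity, n_iter=100, tol=1e-6):
    """Gaussian affinities from one sample to all OTHER samples.

    The kernel precision `beta` is tuned by bisection until the row's
    perplexity (2 ** Shannon entropy) matches the requested value.
    """
    target_entropy = np.log2(perplexity)
    beta, beta_lo, beta_hi = 1.0, 0.0, np.inf
    shifted = sq_dists - sq_dists.min()  # shift for numerical stability
    for _ in range(n_iter):
        p = np.exp(-beta * shifted)
        p /= p.sum()
        entropy = -np.sum(p * np.log2(p + 1e-12))
        if abs(entropy - target_entropy) < tol:
            break
        if entropy > target_entropy:
            # Kernel is too wide: raise the precision.
            beta_lo = beta
            beta = beta * 2 if np.isinf(beta_hi) else (beta + beta_hi) / 2
        else:
            # Kernel is too narrow: lower the precision.
            beta_hi = beta
            beta = (beta + beta_lo) / 2
    return p

def affinity_matrix(X, perplexity):
    """Row-normalized affinity matrix with a zero diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    n = len(X)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i
        P[i, others] = row_affinities(sq[i, others], perplexity)
    return P
```

Calling affinity_matrix(X, 30) and affinity_matrix(X, 100) on the same data gives two different matrices, which is why the quality score depends on perp_aff even when the embedding itself comes from UMAP.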
If you use the same value for perp_aff throughout a sweep of the UMAP n_neighbors parameter, you are examining how well neighborhoods of a size set by perp_aff are embedded by UMAP, while UMAP is allowed to use more or fewer neighbors to actually carry out the embedding. This is akin to fixing the resolution of your "quality ruler" and then examining the different conditions created by UMAP. I don't think there is anything wrong with this.
Alternatively, you can change perp_aff to correspond to the neighborhood size that t-SNE/UMAP is operating at. This is easy to do with t-SNE because perp_aff can be set to be the same as the canonical perplexity. However, to do this with UMAP, we need to map perp_aff to some sort of k_effective number of nearest neighbors. I am currently working on implementing this.
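One way to think about k_effective, offered as an illustration rather than the planned implementation: perplexity is 2 raised to the Shannon entropy of a row of affinities, so a row that is uniform over k neighbors has a perplexity of exactly k. The helper name effective_neighbors below is hypothetical.

```python
import numpy as np

def effective_neighbors(p):
    """Perplexity (2 ** Shannon entropy) of one row of affinities.

    Zero entries are dropped, since they contribute nothing to the entropy.
    """
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    return 2.0 ** (-np.sum(p * np.log2(p)))

# A row spread evenly over 15 neighbors behaves like k = 15.
uniform_row = np.full(15, 1.0 / 15)
k_uniform = effective_neighbors(uniform_row)

# A row dominated by one neighbor has far fewer "effective" neighbors,
# even though it still has 15 nonzero entries.
peaked_row = np.array([0.86] + [0.01] * 14)
k_peaked = effective_neighbors(peaked_row)
```

Under this reading, matching perp_aff to a UMAP run would mean choosing the perplexity whose effective neighborhood size is comparable to n_neighbors, although the actual mapping depends on how UMAP weights its neighbors.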
However, if you're concerned after you've run your sweep that you've chosen the wrong perp_aff for some reason, you don't have to re-run everything, but you will have to hack the methods a bit. What you can do is something like the following:
from embedr import EMBEDR
import matplotlib.pyplot as plt
import numpy as np
from openTSNE.affinity import PerplexityBasedNN
import utility as utl

X = np.loadtxt("./Data/mnist2500_X.txt")

old_perp = 30
new_perp = 100
n_neighbors = 15  ## Assumed value (UMAP's default); not set in the original snippet.
n_jobs = -1
seed = 1
verbose = 5
n_data_embed = 1
n_null_embed = 2

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(12, 5))

## Initialize and fit the data like normal
UMAP_embed = EMBEDR(perplexity=old_perp,
                    dimred_params={'n_neighbors': n_neighbors},
                    # cache_results=False,  ## Turn off file caching.
                    dimred_alg="UMAP",
                    n_jobs=n_jobs,
                    random_state=seed,
                    verbose=verbose,
                    n_data_embed=n_data_embed,
                    n_null_embed=n_null_embed,
                    project_name='changing_perplexity_test')
UMAP_embed.fit(X)

## Let's see the results!
UMAP_embed.plot(ax=ax1, show_cbar=False)

## Calculate a new affinity matrix at the new perplexity
new_aff_mat = PerplexityBasedNN(X,
                                perplexity=new_perp,
                                n_jobs=n_jobs,
                                random_state=seed,
                                verbose=verbose)

## Calculate null affinity matrices at the new perplexity
new_null_mat = {}
for nNo in range(n_null_embed):
    null_X = utl.generate_nulls(X, seed=seed + nNo).squeeze()
    nP = PerplexityBasedNN(null_X,
                           perplexity=new_perp,
                           n_jobs=n_jobs,
                           random_state=seed,
                           verbose=verbose)
    new_null_mat[nNo] = nP

## Reset the affinity matrices in the method
UMAP_embed._affmat = new_aff_mat
UMAP_embed._null_affmat = new_null_mat

## Recalculate the p-Values and quality scores.
## Need to turn off file caching to force the method to recalculate.
UMAP_embed.do_cache = False
UMAP_embed._calc_EES()

## Let's see the results!
UMAP_embed.plot(ax=ax2)

ax1.set_title(f"Affinity Perplexity = {old_perp}")
ax2.set_title(f"Affinity Perplexity = {new_perp}")

## Hide the tick labels on both panels
for ax in (ax1, ax2):
    ax.set_xticklabels([])
    ax.set_yticklabels([])

fig.tight_layout()
plt.show()
I'm going to leave this whole thing open as something to prioritize in the next version because this should be easier! Also, this really underscores how these parameters should be separated semantically in the code. In my reply, I invented perp_aff, but I'll actually make this a more obvious parameter in the code!
TLDR: You can probably leave perplexity fixed, but future methods will automatically update it depending on the DRA (dimensionality reduction algorithm).
from embedr.