Comments (3)

ejohnson643 avatar ejohnson643 commented on July 23, 2024

Hi Oskar!

If you want to run the code using UMAP, it will ignore the perplexity parameter. I will make this clear in the documentation! Thanks for the question!

from embedr.

ohickl avatar ohickl commented on July 23, 2024

Thanks! I was a bit confused by this:

perplexity: float
        Similar to the perplexity parameter from van der Maaten (2008); sets 
        the scale of the affinity kernel used to measure embedding quality.  
        NOTE: In the EMBEDR algorithm, this parameter is used EVEN WHEN NOT 
        USING t-SNE!  Default is 30

in the EMBEDR class.

ejohnson643 avatar ejohnson643 commented on July 23, 2024

Oh, of course! The perplexity parameter does double duty: it is used both to run t-SNE and to assess embedding quality. Currently, the quality of an embedding is calculated as the similarity of two data-affinity matrices, one from the original data space and one from the embedded space. The high-dimensional affinity matrix depends on a perplexity parameter, perp_aff, which needs to be set somehow.
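To make the role of perp_aff concrete, here is a minimal NumPy sketch of a perplexity-calibrated Gaussian affinity matrix and a per-point comparison between two such matrices. It is not the EMBEDR implementation: the function names `perplexity_affinities` and `per_point_kl` are hypothetical, the Kullback-Leibler divergence is just one common choice of similarity, and a plain Gaussian kernel is used in both spaces for simplicity.

```python
import numpy as np


def perplexity_affinities(X, perplexity, n_iter=50):
    """Row-normalized Gaussian affinities; each row is tuned by bisection
    so that exp(entropy of the row) matches `perplexity`."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    target_entropy = np.log(perplexity)
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)   # distances to all other points
        d = d - d.min()          # shift for numerical stability
        lo, hi, beta = 0.0, np.inf, 1.0   # beta = 1 / (2 * sigma^2)
        for _ in range(n_iter):
            w = np.exp(-beta * d)
            p = w / w.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if entropy > target_entropy:   # kernel too wide: raise beta
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else 0.5 * (lo + hi)
            else:                          # kernel too narrow: lower beta
                hi = beta
                beta = 0.5 * (lo + hi)
        P[i, np.arange(n) != i] = p
    return P


def per_point_kl(P, Q, eps=1e-12):
    """KL divergence of each row of P from the matching row of Q."""
    return np.sum(P * (np.log(P + eps) - np.log(Q + eps)), axis=1)


# Toy data: the first two coordinates stand in for an "embedding".
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
Y_demo = X_demo[:, :2]

perp_aff = 30
P_high = perplexity_affinities(X_demo, perp_aff)
Q_low = perplexity_affinities(Y_demo, perp_aff)
quality = per_point_kl(P_high, Q_low)   # smaller = better preserved
```

In this sketch, changing `perp_aff` changes the size of the neighborhood each row of `P_high` describes, which is why the value has to be chosen even when the embedding itself comes from UMAP.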

If you use the same value for perp_aff throughout a sweep of the UMAP n_neighbors parameter, you are examining how well UMAP embeds neighborhoods of the size set by perp_aff, while UMAP itself is allowed to use more or fewer neighbors to carry out the embedding. This is akin to fixing the resolution of your "quality ruler" and then examining the different conditions created by UMAP. I don't think there is anything wrong with this.

Alternatively, you can change perp_aff to correspond to the neighborhood size that t-SNE/UMAP is operating at. This is easy to do with t-SNE because perp_aff can be set to the same value as the canonical perplexity. However, to do this with UMAP, we need to map perp_aff to some sort of effective number of nearest neighbors, k_effective. I am currently working on implementing this.
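As background for why a perplexity can be read as a neighbor count at all, here is a small sketch. It is not the mapping described above (that one is still being implemented); it only shows the standard fact that the exponential of the Shannon entropy of a set of neighbor weights equals k when the weights are uniform over k neighbors, and is smaller when the weights decay. The helper name `effective_neighbors` is hypothetical.

```python
import numpy as np


def effective_neighbors(weights):
    """exp(Shannon entropy) of the normalized weights, i.e. the perplexity
    of the weight distribution, read as an effective neighbor count."""
    w = np.asarray(weights, dtype=float)
    p = w / w.sum()
    p = p[p > 0]   # treat 0 * log(0) as 0
    return float(np.exp(-np.sum(p * np.log(p))))


# Uniform weights over 15 neighbors: the effective count is 15.
k_uniform = effective_neighbors(np.ones(15))

# Weights that decay with neighbor rank: the effective count is below 15.
k_decaying = effective_neighbors(np.exp(-np.arange(15) / 3.0))
```

Because UMAP weights its n_neighbors nearest neighbors non-uniformly, n_neighbors and a perplexity of the same numerical value do not necessarily describe the same neighborhood size, which is why some mapping is needed.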

However, if you're concerned after you've run your sweep that you've chosen the wrong perp_aff for some reason, you don't have to re-run everything, but you will have to hack the methods a bit. What you can do is something like the following:

from embedr import EMBEDR
import matplotlib.pyplot as plt
import numpy as np
from openTSNE.affinity import PerplexityBasedNN
import utility as utl

X = np.loadtxt("./Data/mnist2500_X.txt")

old_perp = 30
new_perp = 100
n_neighbors = 15  ## Not defined in the original snippet; UMAP's default of 15 is assumed.

n_jobs = -1
seed = 1
verbose = 5

n_data_embed = 1
n_null_embed = 2

fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(12, 5))

## Initialize and fit the data like normal
UMAP_embed = EMBEDR(perplexity=old_perp,
                    dimred_params={'n_neighbors': n_neighbors},
                    # cache_results=False,  ## Turn off file caching.
                    dimred_alg="UMAP",
                    n_jobs=n_jobs,
                    random_state=seed,
                    verbose=verbose,
                    n_data_embed=n_data_embed,
                    n_null_embed=n_null_embed,
                    project_name='changing_perplexity_test')
UMAP_embed.fit(X)

## Let's see the results!
UMAP_embed.plot(ax=ax1, show_cbar=False)

## Calculate a new affinity matrix at the new perplexity
new_aff_mat = PerplexityBasedNN(X,
                                perplexity=new_perp,
                                n_jobs=n_jobs,
                                random_state=seed,
                                verbose=verbose)

## Calculate null affinity matrices at the new perplexity
new_null_mat = {}
for nNo in range(n_null_embed):

    null_X = utl.generate_nulls(X, seed=seed + nNo).squeeze()
    nP = PerplexityBasedNN(null_X,
                           perplexity=new_perp,
                           n_jobs=n_jobs,
                           random_state=seed,
                           verbose=verbose)

    new_null_mat[nNo] = nP

## Reset the affinity matrices in the method
UMAP_embed._affmat = new_aff_mat
UMAP_embed._null_affmat = new_null_mat

## Recalculate the p-Values and quality scores.
UMAP_embed.do_cache = False  ## Need to turn off file caching to force the
                             ## method to recalculate.
UMAP_embed._calc_EES()

## Let's see the results!
UMAP_embed.plot(ax=ax2)

ax1.set_title(f"Affinity Perplexity = {old_perp}")
ax2.set_title(f"Affinity Perplexity = {new_perp}")
ax1.set_xticklabels([])
ax1.set_yticklabels([])
ax2.set_xticklabels([])
ax2.set_yticklabels([])

fig.tight_layout()

plt.show()

I'm going to leave this whole thing open as something to prioritize in the next version because this should be easier! Also, this really underscores how these parameters should be separated semantically in the code. In my reply, I invented perp_aff, but I'll actually make this a more obvious parameter in the code!

TLDR: You can probably leave perplexity fixed, but future methods will automatically update it depending on the dimensionality reduction algorithm (DRA).
