embedr's Issues

Errors with reloading EMBEDR, Simes' method, question about figures

Hi,

I love the idea of EMBEDR and am very excited to try it out. I have a question and a couple of errors I am trying to deal with though.

  1. I was really hoping to be able to reproduce some of the figures from the paper's supplementary material, specifically the cell-wise ones (e.g. Figs. S13 and S14). The supplement says that all of these figures are available on the GitHub repository, but I cannot find them. Could you point me towards them, and the scripts used to produce them?

  2. When setting pVal_type='simes', embObj.fit returns the error:

 ~\AppData\Local\Temp/ipykernel_18496/4265488886.py in <module>
     24                     pVal_type='simes',
     25                     verbose=3)
---> 26     embObj.fit(datasets[dat]) ## Use 'fit' to generate the embeddings.
     27 
     28     embObjs[(alg, dat)] = embObj

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in fit(self, X)
    288 
    289         ## Get p-Values
--> 290         self.calculate_pValues()
    291 
    292     def _validate_with_data(self, X):

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in calculate_pValues(self)
   1840             simes_mult = n_embeds / np.arange(1, n_embeds + 1).reshape(-1, 1)
   1841             pVal_idx = np.argsort(pVals, axis=0)
-> 1842             summ_pVals = np.min(pVals[pVal_idx] * simes_mult, axis=0)
   1843 
   1844             self._pValues = pVals[:]
ValueError: operands could not be broadcast together with shapes (5,1000,1000) (5,1)

  3. When trying to run fit on a pre-run EMBEDR project to load it back in, I get the following error:
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18496/4265488886.py in <module>
     24                     pVal_type='simes',
     25                     verbose=3)
---> 26     embObj.fit(datasets[dat]) ## Use `fit` to generate the embeddings.
     27 
     28     embObjs[(alg, dat)] = embObj

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in fit(self, X)
    279 
    280         ## Finally, we can do the computations!
--> 281         self._fit(null_fit=False)
    282 
    283         #####################

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in _fit(self, null_fit)
    516 
    517             ## Basically, no matter what, we need the kNN graph
--> 518             self.data_kNN = self.get_kNN_graph(self.data_X)
    519 
    520             ## If we're using t-SNE to embed or DKL as the EES, we need an

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in get_kNN_graph(self, X, null_fit)
    620         ## If we're doing file caching, first we want to try and load the graph
    621         if self.do_cache:
--> 622             loaded_kNN = self.load_kNN_graph(X,
    623                                              kNNObj=kNNObj,
    624                                              seed=seed,

c:\users\uqewats6\miniconda3\envs\distance\lib\site-packages\EMBEDR\embedr.py in load_kNN_graph(self, X, kNNObj, seed, null_fit, raise_error)
    752 
    753         ## If a path has been found to a matching kNN graph load it!
--> 754         with open(kNN_path, 'rb') as f:
    755             kNNObj = pkl.load(f)
    756             kNNObj.verbose = self.verbose

FileNotFoundError: [Errno 2] No such file or directory: './EMBEDR/projects/Run_3_180422_simes/tSNE_D_Sim\\57393bff6c57e524d74111f146808481\\tSNE_D_Sim\\57393bff6c57e524d74111f146808481\\Data_kNN_0000.knn'
  4. Do you have notebooks/scripts for your implementation of the other DR methods? I would like to test a new method with EMBEDR and these would be super helpful.

If you could give me a hand with any of these it would be great!

Thanks!
Ebony
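
On the Simes error in the second item: the shapes in the traceback suggest that pVals is laid out as (n_embeds, n_samples) = (5, 1000). Indexing that array with the (5, 1000) result of argsort selects whole rows for every index, which would give the (5, 1000, 1000) array in the error message. Below is a minimal sketch of Simes' combination that sorts along the embedding axis instead. The function name simes_pvalues and the assumed array layout are illustrative assumptions, not taken from the EMBEDR source.

```python
import numpy as np


def simes_pvalues(p_vals):
    """Combine p-values across embeddings with Simes' method.

    Assumes p_vals has shape (n_embeds, n_samples); this layout is an
    assumption inferred from the shapes in the traceback.
    """
    p_vals = np.asarray(p_vals, dtype=float)
    n_embeds = p_vals.shape[0]

    # Sort each sample's p-values along the embedding axis (axis 0).
    sorted_p = np.sort(p_vals, axis=0)

    # Simes multipliers n / i for i = 1..n, as a column so that they
    # broadcast against the (n_embeds, n_samples) array.
    simes_mult = n_embeds / np.arange(1, n_embeds + 1).reshape(-1, 1)

    # Combined p-value per sample: min over i of n * p_(i) / i.
    return np.min(sorted_p * simes_mult, axis=0)


# Small demonstration of why fancy indexing with argsort changes the shape.
rng = np.random.default_rng(0)
demo = rng.uniform(size=(5, 8))
print(demo[np.argsort(demo, axis=0)].shape)  # three-dimensional result
print(simes_pvalues(demo).shape)             # one value per sample
```

If the original index array is needed for other purposes, np.take_along_axis(p_vals, idx, axis=0) gives the same sorted array as np.sort here.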

Curse of dimensionality with Nearest Neighbour approaches?

Hi Authors,

Sorry this is less of a technical issue than a conceptual question.

You have alluded to the curse of dimensionality in the original paper, one major component of which is the dilution of "distances" (e.g. On the Surprising Behavior of Distance Metric in High-Dimensional Space, or a more accessible summary here).

Step 1 of EMBEDR relies on calculating NNs, and the code appears to rely on default metrics in numpy, e.g. ["euclidean", "l2", "sqeuclidean", ..., "sokalsneath", "yule"].

Do you have recommendations for these distance metrics to mitigate the "curse"? Or are there other parts of the algorithm to help with that?
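
For a concrete sense of the dilution effect mentioned above, the sketch below measures the relative contrast (d_max - d_min) / d_min of Euclidean distances from a random query point to uniformly random points as the dimension grows. The helper name relative_contrast and the uniform test data are illustrative assumptions; nothing here uses EMBEDR itself.

```python
import numpy as np


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query to n_points random points in the unit hypercube."""
    rng = np.random.default_rng(seed)
    points = rng.uniform(size=(n_points, n_dims))
    query = rng.uniform(size=n_dims)

    # Distance from the query to every point.
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Compare the contrast across a range of dimensions.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, relative_contrast(500, n_dims))
```

For data like this the contrast tends to shrink as the dimension increases, which is the sense in which nearest and farthest neighbours become harder to tell apart. Real datasets with lower intrinsic dimension may behave quite differently.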

EMBEDR cannot be loaded from main folder.

Attempting to load EMBEDR after a clean install in the main directory of this repo:
from EMBEDR import EMBEDR
causes the following error:

Traceback (most recent call last): 
  File "<stdin>", line 1, in <module> 
  File "~/TestEMBEDRInstall/EMBEDR-master/EMBEDR/__init__.py", line 1, in <module>
    from EMBEDR.embedr import EMBEDR, EMBEDR_sweep
  File "~/TestEMBEDRInstall/EMBEDR-master/EMBEDR/embedr.py", line 9, in <module>
    from EMBEDR.tsne import tSNE_Embed
  File "~/TestEMBEDRInstall/EMBEDR-master/EMBEDR/tsne.py", line 32, in <module>
    from EMBEDR import _tsne
ImportError: cannot import name '_tsne' from partially initialized module 'EMBEDR' (most likely due to a circular import) (~/TestEMBEDRInstall/EMBEDR-master/EMBEDR/__init__.py)

This is because the folder and the module have the same name, so that when from EMBEDR import _tsne is called, it first looks in the folder named EMBEDR for the _tsne.py module, which doesn't exist. We should rename the source folder to a different name so that EMBEDR applies to the package/module and not to that folder.
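
If the explanation above holds, checking where Python would load a package from should confirm it: run from the repository root, the reported origin would point into the source folder rather than site-packages. The sketch below uses importlib.util.find_spec from the standard library; the helper names package_origin and imported_from_cwd are hypothetical, and "json" stands in for the package name only so that the example runs anywhere.

```python
import importlib.util
import os


def package_origin(name):
    """Return the file a top-level package would be imported from,
    or None if the package cannot be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


def imported_from_cwd(name):
    """Return True if the package resolves to a path under the current
    working directory, i.e. a local folder shadows any installed copy."""
    origin = package_origin(name)
    if origin is None:
        return False
    cwd = os.path.realpath(os.getcwd())
    return os.path.realpath(origin).startswith(cwd + os.sep)


# Replace "json" with "EMBEDR" to inspect the package discussed above.
print(package_origin("json"))
print(imported_from_cwd("json"))
```

Running the same check from a different working directory is a quick way to see whether the import only fails inside the repository folder.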

Error when installing EMBEDR - "error: unknown file type '.pyx' (from 'EMBEDR/quad_tree.pyx')"

Hello,

I hope you are doing well. I've been trying to clone the EMBEDR repository into a virtual environment but I seem to be running into a problem when I try to run the setup.py script. I've been getting this error:

./tmpd385kj9q/fftw3.c:1:10: fatal error: 'fftw3.h' file not found
#include <fftw3.h>
         ^~~~~~~~~
1 error generated.
FFTW3 header files couldn't be found. Using numpy for FFT.
running build
running build_py
running build_ext
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/aaronpresser/miniconda3/envs/data_analysis_practice/include -fPIC -O2 -isystem /Users/aaronpresser/miniconda3/envs/data_analysis_practice/include -I/Users/aaronpresser/miniconda3/envs/data_analysis_practice/include -I/Users/aaronpresser/miniconda3/envs/data_analysis_practice/Library/include -c ./tmpl63ikrc3/omp.c -o ./tmpl63ikrc3/omp.o
Found openMP.  Compiling with openmp flags...
error: unknown file type '.pyx' (from 'EMBEDR/quad_tree.pyx')

The most highly voted answer to this StackOverflow post (https://stackoverflow.com/questions/6846084/unknown-file-type-error-with-pyx-file) suggested it might be a bug - I don't know whether it is or isn't but I was wondering if you might be able to help me resolve this? The answer also suggested "hacking" the setup.py script to make it build the ".pyx" file, but I should say that I'm just a lowly biologist and my "programming abilities" don't extend much beyond installing Python packages! I know I would greatly appreciate any help you could provide.

I hope you had a nice Christmas!

Best,

Aaron

Trouble importing EMBEDR after installation

Hey Eric,

I hope you're doing well. I have a question similar to the one I asked you about a month or so ago - the instructions you provided then (#9) were very helpful for installing EMBEDR. I have been caught up with something for much of the last month and didn't have the chance to really use your package to realize it, but when I try to import the package into a Python script, it seems to be running into some dependency issues. I've got a few "ModuleNotFound" errors - one of them is for "pandas," another is for "matplotlib," and another is for "seaborn." I have resolved each of these by installing the packages into the virtual environment with conda, but then I am getting an "ImportError":

Traceback (most recent call last):
  File "/Users/aaronpresser/EMBEDR/test_embedr.py", line 1, in <module>
    import EMBEDR
  File "/Users/aaronpresser/EMBEDR/EMBEDR/__init__.py", line 1, in <module>
    from EMBEDR.embedr import EMBEDR, EMBEDR_sweep
  File "/Users/aaronpresser/EMBEDR/EMBEDR/embedr.py", line 9, in <module>
    from EMBEDR.tsne import tSNE_Embed
  File "/Users/aaronpresser/EMBEDR/EMBEDR/tsne.py", line 35, in <module>
    from EMBEDR import _tsne
ImportError: cannot import name '_tsne' from partially initialized module 'EMBEDR' (most likely due to a circular import) (/Users/aaronpresser/EMBEDR/EMBEDR/__init__.py)

For reference, after running:

conda create -n embedr python=3
git clone https://github.com/ejohnson643/EMBEDR.git
cd EMBEDR
conda install cython numba
pip install .

these are the packages that are installed:

(embedr) aaronpresser@Aarons-MacBook-Air EMBEDR % conda list
# packages in environment at /Users/aaronpresser/miniconda3/envs/embedr:
#
# Name                    Version                   Build  Channel
bzip2                     1.0.8                h0d85af4_4    conda-forge
ca-certificates           2021.10.8            h033912b_0    conda-forge
cython                    0.29.26         py310hba3363e_0    conda-forge
embedr                    2.1.1                    pypi_0    pypi
joblib                    1.1.0                    pypi_0    pypi
libblas                   3.9.0           13_osx64_openblas    conda-forge
libcblas                  3.9.0           13_osx64_openblas    conda-forge
libcxx                    12.0.1               habf9029_1    conda-forge
libffi                    3.4.2                h0d85af4_5    conda-forge
libgfortran               5.0.0           9_3_0_h6c81a4c_23    conda-forge
libgfortran5              9.3.0               h6c81a4c_23    conda-forge
liblapack                 3.9.0           13_osx64_openblas    conda-forge
libllvm11                 11.1.0               hd011deb_2    conda-forge
libopenblas               0.3.18          openmp_h3351f45_0    conda-forge
libzlib                   1.2.11            h9173be1_1013    conda-forge
llvm-openmp               12.0.1               hda6cdc1_1    conda-forge
llvmlite                  0.38.0          py310h003a20c_0    conda-forge
ncurses                   6.2                  h2e338ed_4    conda-forge
numba                     0.55.0          py310h3ca88a5_0    conda-forge
numpy                     1.21.5          py310ha69e199_0    conda-forge
openssl                   3.0.0                h0d85af4_2    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pynndescent               0.5.6                    pypi_0    pypi
python                    3.10.2          h38b4d05_0_cpython    conda-forge
python_abi                3.10                    2_cp310    conda-forge
readline                  8.1                  h05e3726_0    conda-forge
scikit-learn              1.0.2                    pypi_0    pypi
scipy                     1.7.3                    pypi_0    pypi
setuptools                60.5.0          py310h2ec42d9_0    conda-forge
sqlite                    3.37.0               h23a322b_0    conda-forge
threadpoolctl             3.0.0                    pypi_0    pypi
tk                        8.6.11               h5dbffcc_1    conda-forge
tqdm                      4.62.3                   pypi_0    pypi
tzdata                    2021e                he74cb21_0    conda-forge
umap-learn                0.5.2                    pypi_0    pypi
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                haf1e3a3_1    conda-forge
zlib                      1.2.11            h9173be1_1013    conda-forge

I have tried resolving this by installing "pandas," "matplotlib," and "seaborn" alongside "cython" and "numba," before I run pip install . but it returns the same error. Do you know how I might be able to resolve this "ImportError"?

Also for reference, here is the setup of my conda environment, if it is helpful:

(embedr) aaronpresser@Aarons-MacBook-Air EMBEDR % conda info

     active environment : embedr
    active env location : /Users/aaronpresser/miniconda3/envs/embedr
            shell level : 2
       user config file : /Users/aaronpresser/.condarc
 populated config files : /Users/aaronpresser/.condarc
          conda version : 4.11.0
    conda-build version : 3.21.4
         python version : 3.9.5.final.0
       virtual packages : __osx=10.16=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /Users/aaronpresser/miniconda3  (writable)
      conda av data dir : /Users/aaronpresser/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/osx-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/aaronpresser/miniconda3/pkgs
                          /Users/aaronpresser/.conda/pkgs
       envs directories : /Users/aaronpresser/miniconda3/envs
                          /Users/aaronpresser/.conda/envs
               platform : osx-64
             user-agent : conda/4.11.0 requests/2.26.0 CPython/3.9.5 Darwin/20.6.0 OSX/10.16
                UID:GID : 501:20
             netrc file : None
           offline mode : False

Best,

Aaron
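
Since the missing modules above surfaced one at a time, a quick way to list everything absent in one pass is to probe each name without importing it. This is only a sketch: missing_modules is a hypothetical helper, and the list of names is taken from the modules mentioned in this post, not from EMBEDR's declared requirements. It does not address the ImportError itself.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found
    in the current environment (nothing is actually imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Names mentioned in the post; adjust as needed.
wanted = ["pandas", "matplotlib", "seaborn", "cython", "numba"]
print(missing_modules(wanted))
```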

PCA Initialisation Error: ValueError: Cannot generate PCA initialization because input data, `X`, is an affinity matrix, not a samples x features matrix.

When trying to run EMBEDR.fit() with DRA_params:{'initialization':'pca'}, this error is returned:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [23], in <cell line: 2>()
      2 for alg,dat in DR_params: #for alg, param in DR_params:
      3 #     ## If we're doing t-SNE, then we use the `perplexity` parameter...
      4 #     if alg.lower() == 'tsne':
   (...)
     11 
     12     ## Initialize a new object at each loop.
     13     embObj = EMBEDR(DRA=alg,
     14                     perplexity=30,
     15                     n_jobs=n_jobs,
   (...)
     22                     project_name=f'{alg}_{dat}',
     23                     project_dir=project_dir)
---> 25     embObj.fit(input_datasets[dat]) ## Use `fit` to generate the embeddings.
     27     embObjs[(alg, dat)] = embObj

File ~\Miniconda3\envs\DRaim2\lib\site-packages\EMBEDR\embedr.py:280, in EMBEDR.fit(self, X)
277 self._validate_with_data(X)
279 ## Finally, we can do the computations!
--> 280 self._fit(null_fit=False)
282 #####################
283 ## Fit to the NULL ##
284 #####################
286 self._fit(null_fit=True)

File ~\Miniconda3\envs\DRaim2\lib\site-packages\EMBEDR\embedr.py:532, in EMBEDR._fit(self, null_fit)
530 ## We then need to get the requested embeddings.
531 if (self.DRA in ['tsne', 't-sne']):
--> 532 dY, dEES = self.get_tSNE_embedding(X=self.data_X,
533 kNN_graph=self.data_kNN,
534 aff_mat=self.data_P)
536 elif (self.DRA in ['umap']):
537 dY, dEES = self.get_UMAP_embedding(X=self.data_X,
538 kNN_graph=self.data_kNN,
539 aff_mat=self.data_P)

File ~\Miniconda3\envs\DRaim2\lib\site-packages\EMBEDR\embedr.py:1208, in EMBEDR.get_tSNE_embedding(self, X, kNN_graph, aff_mat, null_fit, return_tSNE_objects)
1198 seed_offset = n_embeds_made + ii
1200 embObj = tSNE_Embed(n_components=self.n_components,
1201 perplexity=self.perplexity,
1202 n_jobs=self.n_jobs,
1203 random_state=self._seed + seed_offset,
1204 verbose=self.verbose,
1205 **self.DRA_params)
-> 1208 embObj.fit(aff_mat)
1210 tmp_embed_arr[ii] = embObj.embedding[:]
1212 ## Calculate the EES

File ~\Miniconda3\envs\DRaim2\lib\site-packages\EMBEDR\tsne.py:200, in tSNE_Embed.fit(self, X, **aff_kwargs)
196 if self.verbose >= 1:
197 print(f"\nGenerating {self.n_components}-dimensional embedding"
198 f" with t-SNE!")
--> 200 P = self.initialize_embedding(X, **aff_kwargs)
202 ## Optimize Early Exaggeration Phase
203 try:

File ~\Miniconda3\envs\DRaim2\lib\site-packages\EMBEDR\tsne.py:327, in tSNE_Embed.initialize_embedding(self, X, **aff_kwargs)
325 err_str += f", X, is an affinity matrix, not a samples x"
326 err_str += f" features matrix."
--> 327 raise ValueError(err_str)
329 ## If we wanted a spectral initialization, do that here.
330 elif self.initialization == 'spectral':

ValueError: Cannot generate PCA initialization because input data, `X`, is an affinity matrix, not a samples x features matrix.

I have been trying to work through the code, but I am unclear on why initialisation of the t-SNE embeddings swaps from taking the input data to only taking the affinity matrix, and therefore producing this error.

Please help!!
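
The error message says a PCA initialization needs the samples x features matrix, while the traceback shows fit being called with the affinity matrix. One possible workaround is to compute the PCA coordinates from the raw data beforehand. The sketch below does that with a plain NumPy SVD. The function name pca_initialization and the rescaling to a standard deviation of 1e-4 are assumptions for illustration (small initial scales are a common convention in t-SNE implementations), and nothing in this thread confirms whether EMBEDR's 'initialization' option accepts a precomputed array.

```python
import numpy as np


def pca_initialization(X, n_components=2, target_std=1e-4):
    """Project X (samples x features) onto its leading principal
    components and rescale so the first component has target_std."""
    X = np.asarray(X, dtype=float)

    # Center the features, then take the thin SVD of the centered data.
    X_centered = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X_centered, full_matrices=False)

    # Principal component scores are the left singular vectors scaled
    # by the singular values.
    Y = U[:, :n_components] * S[:n_components]

    # Rescale so the optimization starts from a compact configuration.
    return Y * (target_std / Y[:, 0].std())


# Example with random data standing in for a real dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))
Y_init = pca_initialization(X_demo)
print(Y_init.shape)
```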
