
Scanpy use cases.

License: BSD 3-Clause "New" or "Revised" License

Jupyter Notebook 47.02% Python 0.08% HTML 52.84% R 0.05%

scanpy_usage's Introduction

Scanpy Usage

Selected usage examples for Scanpy releases - use the GitHub history button when viewing a notebook to switch between different releases.

To navigate the repository, see the examples in the documentation.


scanpy_usage's Issues

Problem with sc.pp.highly_variable_genes()

I'm going through the https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html#Finding-marker-genes tutorial with my own 10x data. I first used scanpy 1.3.7 and have now tried the same with 1.4.

I'm at the line "sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)"

I'm getting:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-ea8d9dc47463> in <module>
----> 1 sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)

~/anaconda3/lib/python3.6/site-packages/scanpy/preprocessing/highly_variable_genes.py in highly_variable_genes(adata, min_disp, max_disp, min_mean, max_mean, n_top_genes, n_bins, flavor, subset, inplace)
    115         # a normalized disperion of 1
    116         one_gene_per_bin = disp_std_bin.isnull()
--> 117         gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
    118         if len(gen_indices) > 0:
    119             logg.msg(

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
    909             key = check_bool_indexer(self.index, key)
    910 
--> 911         return self._get_with(key)
    912 
    913     def _get_with(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in _get_with(self, key)
    951                 return self.loc[key]
    952 
--> 953             return self.reindex(key)
    954         except Exception:
    955             # [slice(0, 5, None)] will break if you convert to ndarray,

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in reindex(self, index, **kwargs)
   3732     @Appender(generic.NDFrame.reindex.__doc__)
   3733     def reindex(self, index=None, **kwargs):
-> 3734         return super(Series, self).reindex(index=index, **kwargs)
   3735 
   3736     def drop(self, labels=None, axis=0, index=None, columns=None,

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in reindex(self, *args, **kwargs)
   4344         # perform the reindex on the axes
   4345         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4346                                   fill_value, copy).__finalize__(self)
   4347 
   4348     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4357             ax = self._get_axis(a)
   4358             new_index, indexer = ax.reindex(labels, level=level, limit=limit,
-> 4359                                             tolerance=tolerance, method=method)
   4360 
   4361             axis = self._get_axis_number(a)

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/category.py in reindex(self, target, method, level, limit, tolerance)
    501         else:
    502             if not target.is_unique:
--> 503                 raise ValueError("cannot reindex with a non-unique indexer")
    504 
    505             indexer, missing = self.get_indexer_non_unique(np.array(target))

ValueError: cannot reindex with a non-unique indexer

I'm not sure where the non-unique problem is coming from. My first guess was the gene list, but I ran "adata.var_names_make_unique()" as outlined in the tutorial, so I'm guessing it's something else.

Thank you.
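
The thread does not establish what causes the error, so the following is only a sketch of generic sanity checks one might run on the input before calling sc.pp.highly_variable_genes. The function name diagnose_hvg_input and the toy matrix are hypothetical; with real data one would pass a dense copy of adata.X together with adata.var_names. It counts duplicated gene names, non-finite values, and genes that are zero in every cell.

```python
import numpy as np
import pandas as pd


def diagnose_hvg_input(X, var_names):
    """Return simple sanity checks for a dense cells-by-genes matrix.

    Hypothetical helper for illustration; not part of scanpy.
    """
    X = np.asarray(X, dtype=float)
    names = pd.Index(var_names)
    return {
        # gene names that repeat an earlier name
        "duplicate_gene_names": int(names.duplicated().sum()),
        # NaN or +/-inf entries anywhere in the matrix
        "non_finite_values": int((~np.isfinite(X)).sum()),
        # genes (columns) that are exactly zero in every cell
        "all_zero_genes": int((X == 0).all(axis=0).sum()),
    }


# Toy data: 3 cells x 4 genes with one duplicated name,
# one NaN entry, and one gene that is zero everywhere.
X = np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, np.nan, 3.0],
    [4.0, 0.0, 1.0, 0.0],
])
report = diagnose_hvg_input(X, ["A", "B", "A", "C"])
print(report)
```

A non-zero count in any of the three fields would be worth resolving first, although none of them is confirmed here as the source of the ValueError.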

PBMC data link

Hi, sorry to bother you again.
It looks like the PBMC link at https://github.com/theislab/scanpy_usage/tree/master/170505_seurat
points to the brain dataset at 10x Genomics (1.3 million cells), whereas I was expecting the 2,700 Peripheral Blood Mononuclear Cells.
Is this a typo, or am I overlooking something?
Thanks,
Hashem

Incorrect reference doi in seurat.ipynb

Hi there,
Thank you for the nice tutorials.

I was just going through the seurat.ipynb tutorial and found that one of the reference links seems to be broken.

In the "Clustering the graph" Section, the first reference ("Levine et al. (2015)") should point to https://doi.org/10.1016/j.cell.2015.05.047

Currently it produces a "DOI not found" error (it seems the last character of the URL is missing in the notebook).

can't reproduce the tSNE plot of zheng17.

I tried to reproduce the tSNE plot of 170503_zheng17.

However, I can't reproduce it. I used the following script.

adata = sc.read(path + 'matrix.mtx', cache=True).T
adata.var_names = pd.read_csv(path + 'genes.tsv', header=None, sep='\t')[1]
adata.obs_names = pd.read_csv(path + 'barcodes.tsv', header=None)[0]
adata.var_names_make_unique()
adata.obs['bulk_labels'] = pd.read_csv('./zheng17_bulk_lables.txt', header=None)[0].values
adata = adata[:use_first_n_observations]

sc.pp.filter_genes(adata, min_counts=1)
sc.pp.normalize_per_cell(adata)
filter_result = sc.pp.filter_genes_dispersion(
	adata.X, flavor='cell_ranger', n_top_genes=1000, log=False)
sc.logging.print_memory_usage()
sc.pl.filter_genes_dispersion(filter_result, log=True, show=False, save="hoge.pdf")
adata = adata[:, filter_result.gene_subset]
sc.pp.normalize_per_cell(adata)
sc.pp.log1p(adata)
sc.pp.scale(adata)
sc.tl.pca(adata, n_comps=50, svd_solver='arpack')
sc.logging.print_memory_usage()
sc.pl.pca_loadings(adata,save="hoge.pdf")

sc.tl.tsne(adata)
sc.pl.tsne(adata, color='bulk_labels', show=False, save="hoge.pdf")

The package versions I used are scanpy==1.0.4 anndata==0.6.16 numpy==1.14.1 scipy==1.0.0 pandas==0.21.0 scikit-learn==0.19.1 statsmodels==0.8.0.

The following figure is a tSNE plot I produced. However, the plot is a little different from the original one.

How do I change the script to reproduce the original t-SNE plot?

Assign cell types to clusters

How can I match the barcodes from a text file to the barcodes from a CSV file, so that I can generate a figure with a cell type assigned to each cluster? Any suggestions would be appreciated. I am working with Scanpy and running all of the analysis in a Jupyter notebook.
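
One way to line up the two barcode sources is a lookup by barcode with pandas. This is a minimal sketch under assumptions: the column names barcode and cell_type, the helper align_labels, and the in-memory toy data are all hypothetical stand-ins for the two files in the question, and barcodes in the label table are assumed to be unique.

```python
import pandas as pd

# Hypothetical stand-in for the barcodes text file (one barcode per cell, in matrix order).
barcodes = pd.Series(["AAAC-1", "AAAG-1", "AACT-1"], name="barcode")

# Hypothetical stand-in for the CSV file with annotations, in a different order
# and with one barcode missing.
labels = pd.DataFrame({
    "barcode": ["AACT-1", "AAAC-1"],
    "cell_type": ["B cell", "T cell"],
})


def align_labels(barcodes, labels, key="barcode", value="cell_type"):
    """Return the labels reordered to follow `barcodes`; unmatched barcodes get NaN."""
    lookup = labels.set_index(key)[value]
    return lookup.reindex(barcodes.to_numpy())


aligned = align_labels(barcodes, labels)
print(aligned)
```

If the barcode order matches adata.obs_names, the aligned values could then be stored in a column of adata.obs and used as the color argument of a plotting function. Barcodes that come back as NaN usually point to a formatting difference between the two files, such as a missing "-1" suffix.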

score_genes_cell_cycle not running in Scanpy 1.1a1+15.g1570f7c

Hi,
I tested the provided cell cycle notebook (with the data provided) in the following scanpy setup:

Running Scanpy 1.1a1+15.g1570f7c on 2018-05-18 16:30.
Dependencies: anndata==0.6.1+1.ga489245 numpy==1.13.1 scipy==0.19.1 pandas==0.21.0 scikit-learn==0.19.0 statsmodels==0.8.0 python-igraph==0.7.1 louvain==0.6.0+20.g3de109d

running the line
sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)

results in

calculating cell cycle phase
computing score 'S_score'
--> could not add 
    'S_score', score of gene set (adata.obs)
computing score 'G2M_score'
--> could not add 
    'G2M_score', score of gene set (adata.obs)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-5874179be677> in <module>()
----> 1 sc.tl.score_genes_cell_cycle(adata, s_genes=s_genes, g2m_genes=g2m_genes)

~/Documents/Python/scanpy/scanpy/tools/score_genes.py in score_genes_cell_cycle(adata, s_genes, g2m_genes, copy, **kwargs)
    170     # add g2m-score
    171     score_genes(adata, gene_list=g2m_genes, score_name='G2M_score', ctrl_size=ctrl_size, **kwargs)
--> 172     scores = adata.obs[['S_score', 'G2M_score']]
    173 
    174     # default phase is S

~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2131         if isinstance(key, (Series, np.ndarray, Index, list)):
   2132             # either boolean or fancy integer index
-> 2133             return self._getitem_array(key)
   2134         elif isinstance(key, DataFrame):
   2135             return self._getitem_frame(key)

~/miniconda3/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_array(self, key)
   2175             return self._take(indexer, axis=0, convert=False)
   2176         else:
-> 2177             indexer = self.loc._convert_to_indexer(key, axis=1)
   2178             return self._take(indexer, axis=1, convert=True)
   2179 

~/miniconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1267                 if mask.any():
   1268                     raise KeyError('{mask} not in index'
-> 1269                                    .format(mask=objarr[mask]))
   1270 
   1271                 return _values_from_object(indexer)

KeyError: "['S_score' 'G2M_score'] not in index"
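
The log lines "could not add 'S_score'" suggest that neither score column was written to adata.obs, which would explain the KeyError that follows. The thread does not say why. One thing that could be checked, as an assumption rather than a diagnosis, is whether the gene lists overlap with the gene names in the data at all. The helper split_by_presence and the toy gene names below are hypothetical; with real data, var_names would be adata.var_names.

```python
def split_by_presence(gene_list, var_names):
    """Split `gene_list` into genes found in `var_names` and genes that are not."""
    known = set(var_names)
    present = [gene for gene in gene_list if gene in known]
    missing = [gene for gene in gene_list if gene not in known]
    return present, missing


# Toy data: a short list of gene names in the dataset and a short S-phase gene list.
var_names = ["MCM5", "PCNA", "TOP2A", "ACTB"]
s_genes = ["MCM5", "PCNA", "TYMS"]

present, missing = split_by_presence(s_genes, var_names)
print("present:", present)
print("missing:", missing)
```

The comparison is case-sensitive, so a list in upper case checked against lower-case or title-case gene names would report every gene as missing.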

Cannot open louvain.csv.gz

Hi,

I have downloaded the louvain.csv.gz file, but I cannot extract it. I tried gunzip and reading it through Python and R, but each time I get an error saying that it is not a gzipped file.

Can someone please point out what I am missing?

Thank you!
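
A quick way to see whether a downloaded file really is gzip-compressed is to look at its first two bytes, which are 0x1f 0x8b for every gzip stream. The sketch below uses only the Python standard library. The helpers looks_gzipped and peek, and the two temporary demo files, are hypothetical; on a real download one would pass the path to louvain.csv.gz.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def looks_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def peek(path, n=80):
    """Return the first `n` raw bytes, to see what the file actually contains."""
    with open(path, "rb") as handle:
        return handle.read(n)


# Demo with two temporary files: a real gzip file and a file that only
# carries a .gz extension (here it holds the start of an HTML page).
tmp_dir = tempfile.mkdtemp()
real_gz = os.path.join(tmp_dir, "real.csv.gz")
fake_gz = os.path.join(tmp_dir, "fake.csv.gz")
with gzip.open(real_gz, "wb") as handle:
    handle.write(b"barcode,louvain\n")
with open(fake_gz, "wb") as handle:
    handle.write(b"<!DOCTYPE html>")

print(looks_gzipped(real_gz))
print(looks_gzipped(fake_gz))
print(peek(fake_gz))
```

When the check fails, the output of peek often shows what was saved instead, for example an HTML page or plain, already uncompressed CSV text. Which of these applies to the file in this issue is not stated in the thread.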

unable to find genes when export to SPRING object

Hi scanpy team,
I converted a Seurat object to AnnData with:
mda_ad <- Convert(from = mda, to = "anndata",filename = 'mda_day27_ad.h5ad')

mda_ad
AnnData object with n_obs × n_vars = 6039 × 19331
obs: 'n_genes', 'n_counts', 'orig_ident', 'percent_mito', 'S_Score', 'G2M_Score', 'Phase', 'old_ident', 'res_1_2'
var: 'gene.mean', 'gene.dispersion', 'gene.dispersion.scaled'
obsm: 'X_pca', 'X_tsne'

Then I followed the Scanpy tutorial https://github.com/theislab/scanpy_usage/blob/master/171111_SPRING_export/SPRING_export.ipynb
to export to SPRING. However, I get no response when I enter a gene in the web browser.
I have all required files in the folder.

Any suggestions on how to solve it?

Thanks!

can't replicate 170505_seurat/seurat.ipynb notebook

I tried to replicate the .. notebook using scanpy but I got some different results (see notebook here):

The tsne plot looks different (although similar groups can be seen).
The clustering is different; in particular, the number of clusters is smaller.
More concerning, the results of sc.tl.rank_genes_groups(adata, 'louvain', method='logreg') seem quite different compared to the results from the default method (which are similar to the original notebook for some groups). For example, for louvain cluster '0', the top ranking genes in the original notebook are LDHB and CD3D. I see these two genes using the default ranking method. However, for the 'logreg' method the list of top genes is quite different.

Would it be possible for you to re-run the notebook to see if you get the same results that I get? Maybe the data that you are using is different from the data I use (I downloaded the pbmc3k data from 10x)?

Export to spring issue (AttributeError: 'numpy.ndarray' object has no attribute 'tocsc')

Hi,

I am having an issue exporting my h5ad file to spring for visualization. I have been following the spring export tutorial but every time I try to run sc.export_to.spring_project I get the following error:

sc.export_to.spring_project(adata1, '/home/seth/anaconda3/write', 'draw_graph')
Writing subplot to /home/seth/anaconda3
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-656bbf38658a> in <module>
----> 1 sc.export_to.spring_project(adata1, '/home/seth/anaconda3/write', 'draw_graph', cell_groupings='louvain_r0.8_sub')

~/anaconda3/envs/ddocent_env/lib/python3.7/site-packages/scanpy/_exporting.py in spring_project(adata, project_dir, embedding_method, subplot_name, cell_groupings, custom_color_tracks, total_counts_key, overwrite)
    101     # Ideally, all genes will be written from adata.raw
    102     if adata.raw is not None:
--> 103         E = adata.raw.X.tocsc()
    104         gene_list = list(adata.raw.var_names)
    105     else:

AttributeError: 'numpy.ndarray' object has no attribute 'tocsc'

I am assuming the error has to do with converting the sparse matrix adata1.X to csc format. However, I can't seem to get around this error. Any help would be greatly appreciated!

Thanks,
Seth
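
The traceback shows the call adata.raw.X.tocsc() failing because adata.raw.X is a dense numpy.ndarray, which has no tocsc method. The conversion itself can be sketched with scipy.sparse as below. The helper as_csc is hypothetical, and how to get a sparse matrix back into adata.raw before exporting is not covered in the thread, so that step is left out.

```python
import numpy as np
from scipy import sparse


def as_csc(X):
    """Return X as a SciPy CSC matrix, whether it starts out dense or sparse."""
    if sparse.issparse(X):
        return X.tocsc()
    # Dense input (e.g. numpy.ndarray) has no .tocsc(), so build the matrix directly.
    return sparse.csc_matrix(np.asarray(X))


dense = np.array([[0, 1], [2, 0]])
E = as_csc(dense)
print(type(E).__name__, E.shape, E.nnz)
```

The same helper accepts a matrix that is already sparse in another format, such as CSR, and returns it in CSC format.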

Can't download bulk labels in project example zheng17

Dear @falexwolf and Scanpy Team!
I can't download the bulk labels mentioned in the zheng17 project.

Could you re-upload the file bulk.txt?
I will be very grateful to you!

Louvain method raises segmentation fault

Hello,
I am running the "zheng17_pbmc68k_cellranger_Py.ipynb" notebook and I get a segmentation fault when I try to run the code below:
sc.tl.louvain(adata, resolution=1.2)

I performed the test cases provided by the louvain-igraph package. They both work fine.
My scanpy version is 0.4.2
My louvain version is 0.6.1

Do you know what the cause of this error might be?
Thanks in advance,
anilbey

memory

Hi,
I am trying to run the full 1.3M 10X mouse cell dataset (using the 1M_neurons_filtered_gene_bc_matrices_h5.h5 file from 10X website).
I have 126GB RAM and Intel® Xeon(R) W-2123 CPU @ 3.60GHz × 8 which is above the requirements you mention needed to run the full cluster.py method without subsampling.
I get a MemoryError at the filter_genes_dispersion stage. Should I modify the code in any way (without subsampling)?
Thanks, Shobi

adata = sc.read_10x_h5(filename)
adata.var_names_make_unique()
sc.pp.recipe_zheng17(adata)

running recipe zheng17
filtered out 3983 genes that are detected in less than 1 counts
Traceback (most recent call last):
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 61, in <module>
    main()
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 58, in main
    basic_analysis(DIR+'1M_neurons_filtered_gene_bc_matrices_h5.h5')
  File "/home/shobi/PycharmProjects/my_first_conda_project/10X_mousebrain.py", line 24, in basic_analysis
    sc.pp.recipe_zheng17(adata)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_recipes.py", line 108, in recipe_zheng17
    adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 109, in filter_genes_dispersion
    mean, var = materialize_as_ndarray(_get_mean_var(X))
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scanpy/preprocessing/_utils.py", line 10, in _get_mean_var
    mean = X.mean(axis=0)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/base.py", line 1077, in mean
    inter_self = self.astype(inter_dtype)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 74, in astype
    return self.copy()
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/data.py", line 91, in copy
    return self._with_data(self.data.copy(), copy=True)
  File "/home/shobi/anaconda3/envs/my_first_conda_project/lib/python3.6/site-packages/scipy/sparse/compressed.py", line 1124, in _with_data
    return self.__class__((data,self.indices.copy(),self.indptr.copy()),
MemoryError
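
The last frames of the traceback show X.mean(axis=0) going through astype, copy and _with_data, which duplicates the data, indices and indptr arrays of the sparse matrix. At that point a second full copy has to fit in memory next to the original. The size of such a copy can be estimated with simple arithmetic, sketched below. The helper csr_copy_bytes is hypothetical and the numbers in the example are toy values, not the dimensions of the 1.3M dataset.

```python
def csr_copy_bytes(n_rows, nnz, data_itemsize=8, index_itemsize=4):
    """Estimate the bytes needed for one full copy of a CSR matrix.

    A CSR matrix stores three arrays:
      data    - one value per stored entry
      indices - one column index per stored entry
      indptr  - one offset per row, plus one
    """
    data_bytes = nnz * data_itemsize
    indices_bytes = nnz * index_itemsize
    indptr_bytes = (n_rows + 1) * index_itemsize
    return data_bytes + indices_bytes + indptr_bytes


# Toy example: 1,000 rows and 50,000 stored values,
# 8-byte values and 4-byte indices.
example = csr_copy_bytes(n_rows=1000, nnz=50000)
print(example, "bytes")
```

With the real matrix, n_rows and nnz would come from adata.X.shape[0] and adata.X.nnz, and the item sizes from the dtypes of adata.X.data and adata.X.indices. SciPy switches to 8-byte indices for very large matrices, which raises the estimate accordingly.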

Export to SPRING error

I'm using this tutorial for exporting adata object to SPRING input.

First, a suggestion for updating this tutorial: because the command has changed, I had to modify the call slightly:
sc.external.exporting.spring_project(adata, './spring_adata',embedding_method='umap',overwrite=True)

However, my main issue is that even this command does not output all of the files required for a SPRING plot.
Specifically, the following files are not created: gene_colors/color_data_all_genes-*.csv
These are a required input for SPRING.
Can you please update the code so that it produces this output?
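
Whether an export directory contains the gene color files named above can be checked with a short glob over the project directory. This is a sketch using only the standard library. The helper find_gene_color_files is hypothetical, and the demo runs on a temporary directory with a made-up file name that matches the pattern; on a real export one would pass the project directory, such as './spring_adata'.

```python
import glob
import os
import tempfile

# Pattern taken from the issue text, relative to the SPRING project directory.
GENE_COLOR_PATTERN = os.path.join("gene_colors", "color_data_all_genes-*.csv")


def find_gene_color_files(project_dir):
    """Return the sorted gene color files found under `project_dir`."""
    return sorted(glob.glob(os.path.join(project_dir, GENE_COLOR_PATTERN)))


# Demo on a temporary directory: empty at first, then with one matching file.
project_dir = tempfile.mkdtemp()
before = find_gene_color_files(project_dir)

os.makedirs(os.path.join(project_dir, "gene_colors"))
# Hypothetical file name, chosen only to match the pattern.
demo_file = os.path.join(project_dir, "gene_colors", "color_data_all_genes-0.csv")
with open(demo_file, "w") as handle:
    handle.write("")
after = find_gene_color_files(project_dir)

print(len(before), len(after))
```

An empty result on a real export directory would confirm the missing output described in this issue.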