
stream's Introduction


STREAM (Latest version v1.1)

Latest News

Dec 17, 2021

Version 1.1 is now available.

  1. fixed incompatibility issues with the latest version of pandas
  2. fixed plotting issues with the latest versions of matplotlib and seaborn

Jun 1, 2020

Version 1.0 is now available. It adds a lot of new functionality:

  1. added QC metrics and plots
  2. added support for scATAC-seq analysis using peaks as features
  3. added support for interactive plots with plotly
  4. redesigned all plotting-related functions
  5. redesigned the mapping procedure
  6. removed support for the STREAM command line interface

See v1.0 for more details.

Jan 14, 2020

Version 0.4.1 is now available. We added support for the feature top_pcs in mapping.

Nov 26, 2019

Version 0.4.0 is now available. Numerous changes have been introduced. Please check v0.4.0 for details.

Introduction

STREAM (Single-cell Trajectories Reconstruction, Exploration And Mapping) is an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data.

STREAM is now published in Nature Communications! Please cite our paper Chen H, et al. Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM. Nature Communications, volume 10, Article number: 1903 (2019), if you find STREAM helpful for your research.

STREAM is built on the anndata class (Wolf et al., Genome Biology, 2018) and is available as user-friendly open-source software. It can be used interactively as a web application at stream.pinellolab.org, as a Bioconda package (https://bioconda.github.io/recipes/stream/README.html), or as a standalone command-line tool with Docker (https://github.com/pinellolab/STREAM).

Installation with Bioconda (Recommended)

$ conda install -c bioconda stream

If you are new to conda environments:

  1. If Anaconda (or Miniconda) with Python 3 is already installed, skip to step 2; otherwise, download and install the Python 3 Anaconda distribution from here: https://www.anaconda.com/download/

  2. Open a terminal and add the Bioconda channel with the following commands:

$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
  3. Create an environment named env_stream, install stream and jupyter, and activate it with the following commands:
  • For single cell RNA-seq analysis:
$ conda create -n env_stream python=3.7 stream=1.0 jupyter
$ conda activate env_stream
  • For single cell ATAC-seq analysis:
$ conda create -n env_stream python=3.7 stream=1.0 stream_atac jupyter
$ conda activate env_stream
  4. To perform STREAM analysis in Jupyter Notebook as shown in the Tutorial, type jupyter notebook within env_stream:
$ jupyter notebook

You should see the notebook open in your browser.
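
Once the notebook is open, a minimal sketch to check that the environment works might look like the following; the matrix file and working directory names are placeholders, not files shipped with STREAM:

import stream as st

# Load a tab-separated expression matrix and set a working directory for results.
# 'your_matrix.tsv' and './stream_result' are placeholder names; replace them with your own.
adata = st.read(file_name='your_matrix.tsv', workdir='./stream_result')
print(adata)  # AnnData summary: number of cells and genes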

Tutorial

Tutorials for v0.4.1 and earlier versions can be found here

Installation with Docker

With Docker, no installation is required: the only dependency is Docker itself. It spares users all installation and configuration issues.

Docker can be downloaded freely from here: https://store.docker.com/search?offering=community&type=edition

To get an image of STREAM, simply execute the following command:

$ docker pull pinellolab/stream

Basic usage of docker run

$ docker run [OPTIONS] IMAGE [COMMAND] [ARG...]

Options:

--publish , -p	Publish a container’s port(s) to the host  
--volume , -v	Bind mount a volume  
--workdir , -w	Working directory inside the container  

To use STREAM inside the docker container:

  • Mount your data folder and enter STREAM docker container:
$ docker run --entrypoint /bin/bash -v /your/data/file/path/:/data -w /data -p 8888:8888 -it pinellolab/stream:1.0
  • Inside the container, launch Jupyter notebook:
root@46e09702ce87:/data# jupyter notebook --ip 0.0.0.0 --port 8888 --no-browser --allow-root

Access the notebook through your desktop's browser at http://127.0.0.1:8888. The notebook will prompt you for the token that was generated when you created the notebook.

STREAM interactive website

In order to make STREAM user-friendly and accessible to non-bioinformaticians, we have created an interactive website: http://stream.pinellolab.org

The website can also run on a local machine. More details can be found at https://github.com/pinellolab/STREAM_web

Credits: H Chen, L Albergante, JY Hsu, CA Lareau, GL Bosco, J Guan, S Zhou, AN Gorban, DE Bauer, MJ Aryee, DM Langenau, A Zinovyev, JD Buenrostro, GC Yuan, L Pinello

stream's People

Contributors

huidongchen, kant, lucapinello, qrzhang


stream's Issues

Open a precomputed dataset on webapp for visualisation only

Hi,

First of all, thank you for this great tool! I managed to apply the analysis process to my dataset on your website and I downloaded the results zip file, including the DE and transitional gene analysis. I now have a folder containing my data files, all result figures and data, and the precomputed folder.

My problem is that I can't find a way to visualise my results again on a new web server on my side using the docker command. My guess was to run the STREAM_webapp command in the same folder as the results, but the data does not show up in the precomputed dataset list. Am I doing something wrong? Or does this require another type of command?

Best Wishes,

Louis Faure

Visual genes - issue

Hi,

I am running your Docker image with the STREAM web application.

There is apparently a problem with the Visualize Genes code.

It outputs the error:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/RESULTS_FOLDER/STREAM_89e7f264-cb7f-46a0-bccd-bf287d4a50dd/S5/subway_coord_Sox9.csv'

but the file is there and has content:

(base) root@90cc84aab7c2:/stream_web# head -3 /tmp/RESULTS_FOLDER/STREAM_89e7f264-cb7f-46a0-bccd-bf287d4a50dd/S5/subway_coord_sox9.csv
D0 D1 Sox9
AAAGCAAGTTGGTTTG 0.6992542726449995 -0.2282641959402088 0.9517325
AAAGCAAGTCAACATC 0.40922910959385095 -0.015521488506199835 0.4657253

When I change the gene name in the file name from lowercase sox9 to capitalized Sox9, it is able to show the plot.
Your code saves the gene name in lowercase letters but tries to read it with capital letters.

Additionally, only the subway plot shows up and no stream plot is displayed for "Visualize Genes of Interest".
The stream plot issue is the same lowercase vs. capital letter problem in the name of the PNG file.

best,

P.

cell_info.tsv not having all cells

Hi,

I was using the command line interface, and was looking through the cell_info.tsv file, but not all of the cells were there (I needed to look at this because I think that file has the true branches for each cell, right?). Is this because the tool is filtering cells (I started with 4000 cells, yet it was filtered to around 450)?

Issues adding labels and colours

I'm sorry, I'm having a very basic issue and for the life of me cannot resolve it. This is what I run:

adata = st.read(file_name= 'stem.txt', workdir= './Stream')
adata

The above is my count file.
Then I pass:

st.add_cell_colors(adata, file_name = 'simplecolour.txt')
st.add_cell_labels(adata, file_name = 'lineage.txt')

Where 'simplecolour' is a text file (no headers) where column 1 is the name of the cells as they appear in the count file, and column two is the colour that that cell should have (I've used hex codes).

'lineage' is again all the cells, as named in the count file in column 1 and in column 2 is their appropriate lineage. These two commands passed together throw the following error :

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'label'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-252-a08df3834e60> in <module>
----> 1 st.add_cell_colors(adata, file_name = 'simplecolour.txt')
      2 st.add_cell_labels(adata, file_name = 'lineage.txt')

~/anaconda3/lib/python3.7/site-packages/stream/core.py in add_cell_colors(adata, file_path, file_name)
    289 
    290     _fp = lambda f:  os.path.join(file_path,f)
--> 291     labels_unique = adata.obs['label'].unique()
    292     if(file_name!=None):
    293         df_colors = pd.read_csv(_fp(file_name),sep='\t',header=None,index_col=None,names=['label','color'],

~/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

~/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'label'

I'm sorry if I'm being dumb. I tried doing the obvious thing of adding in headers for the appropriate columns as 'label' and 'label_color', but that hasn't worked either.
Would anyone have a suggestion, or be able to share even a screenshot of how they present their data so I can replicate it?
Thanks
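
Not an official answer, but judging from the traceback above, add_cell_colors looks up adata.obs['label'] and reads its file with columns named ['label','color']; that suggests the labels must be added first and that the colour file maps each label (not each cell) to a colour. A hedged sketch of that ordering, reusing the file names from the question (the exact layout expected by add_cell_labels is an assumption here):

import stream as st

adata = st.read(file_name='stem.txt', workdir='./Stream')

# 1) Add labels first, so that adata.obs['label'] exists
#    (the traceback above fails because this column is missing).
st.add_cell_labels(adata, file_name='lineage.txt')

# 2) Then add colours. Based on the traceback (names=['label','color']),
#    this file is assumed to have one row per *label*, e.g. "lineageA<TAB>#1f77b4",
#    rather than one row per cell.
st.add_cell_colors(adata, file_name='simplecolour.txt')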

Too many branches after learning elastic principal graph

Hi, thank you for this nice package.
I use STREAM in Anaconda (Python 3, version 0.3.9), and when I follow the tutorial "Example for scRNA-seq: 1.STREAM_scRNA-seq.ipynb", everything is OK until the
st.elastic_principal_graph(adata) command. Here is the result:

st.elastic_principal_graph(adata)
Learning elastic principal graph...
[1]
"Constructing tree 1 of 1 / Subset 1 of 1"
[1]
"Computing EPG with 50 nodes on 1656 points and 3 dimensions"
[1]
"Using a single core"
Nodes = 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49

BARCODE     ENERGY     NNODES  NEDGES  NRIBS  NSTARS  NRAYS  NRAYS2  MSE        MSEP      FVE     FVEP    UE         UR         URN        URN2     URSD
1|1|11||50  0.0008962  50      49      19     11      0      0       0.0008341  0.001364  0.5398  0.2474  5.081e-05  1.131e-05  0.0005655  0.02827  0
139.597 sec elapsed
[[1]]
Number of branches after learning elastic principal graph: 30

could you please help me with this issue ?

Thanks

Charles

st.map_new_data "se" projection is not taken

The dimension_reduction method allows "se", but isn't taken by map_new_data. Only mlle, pca and umap are allowed.

In 1st tutorial,

Tips:

    by default n_components =3
    For biological process with simple bifurcation or linear trajectory, two components would be recommended
    e.g, st.dimension_reduction(adata,n_components =2)

    Several alternative dimension reduction methods are also supported, se(spectral embedding), umap, pca.
    by default, method ='mlle'.
        For large dataset, se(Spectral Embedding) works faster than MLLE while preserving the similar compact structure to MLLE.
        e.g. st.dimension_reduction(adata,method ='se')
        For large dataset, lowering the percentage of neighbors (nb_pct=0.1 by default) will speed up this step
        e.g, st.dimension_reduction(adata,nb_pct =0.01)

But in map_new_data (v0.3.9), there is no option for se.

st.map_new_data(
    adata,
    adata_new,
    feature='var_genes',
    method='mlle',
    use_radius=True,
)
Docstring:
Map new data to the inferred trajectories

Parameters
----------
adata: AnnData
    Annotated data matrix.
adata_new: AnnData
    Annotated data matrix for new data (to be mapped).
feature: `str`, optional (default: 'var_genes')
    Choose from {{'var_genes','all'}}
    Feature used for mapping. This should be consistent with the feature used for inferring trajectories
    'var_genes': most variable genes
    'all': all genes
method: `str`, optional (default: 'mlle')
    Choose from {{'mlle','umap','pca'}}
    Method used for mapping. This should be consistent with the dimension reduction method used for inferring trajectories
    'mlle': Modified locally linear embedding algorithm
    'umap': Uniform Manifold Approximation and Projection
    'pca': Principal component analysis
use_radius: `bool`, optional (default: True)
    If True, when searching for the neighbors for each cell in MLLE space, STREAM uses a fixed radius instead of a fixed number of cells.

Getting "MemoryError" when running large data sets

Hi STREAM team!--

I was very excited to see your new pipeline, and couldn't wait to try it!

I am running an analysis on ~37500 cells. When I tried running the docker image on my desktop, the program would create several empty files and fail without error. However, it would work if I ran a smaller dataset. I moved to our high performance cluster, but because it doesn't allow docker image files I copied the contents of STREAM-master and have been running STREAM.py. The program runs through dimensional reduction and stops.

Loading input data...
Input: 37501 cells, 17125 genes
remove mitochondrial genes:
['mt-Nd1', 'mt-Nd2', 'mt-Co1', 'mt-Co2', 'mt-Atp8', 'mt-Atp6', 'mt-Co3', 'mt-Nd3', 'mt-Nd4l', 'mt-Nd4', 'mt-Nd5', 'mt-Nd6', 'mt-Cytb']
Filtering genes...
After filtering out low-expressed genes:
37501 cells, 13754 genes
Selecting features...
688 variable genes are selected
Number of CPUs being used: 64
Reducing dimension...
Traceback (most recent call last):
File "STREAM.py", line 4035, in
main()
File "STREAM.py", line 3949, in main
X = Dimension_Reduction(df_sc_final,lle_n_component,lle_n_nb_percent,file_path,file_path_precomp,n_processes)
File "STREAM.py", line 388, in Dimension_Reduction
sklearn_lle = sklearn_lle.fit(DR_input_values)
File "/data/conda/envs/stream/lib/python2.7/site-packages/sklearn/manifold/locally_linear.py", line 661, in fit
self._fit_transform(X)
File "/data/conda/envs/stream/lib/python2.7/site-packages/sklearn/manifold/locally_linear.py", line 645, in _fit_transform
random_state=random_state, reg=self.reg, n_jobs=self.n_jobs)
File "/data/conda/envs/stream/lib/python2.7/site-packages/sklearn/manifold/locally_linear.py", line 385, in locally_linear_embedding
V = np.zeros((N, n_neighbors, n_neighbors))
MemoryError

I am currently running with 250g memory and 32 CPUs. Do I need to allocate more memory, or is there something else you would recommend?

Thanks for your advice!

E
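
Not an official fix, but the tutorial tips quoted elsewhere on this page note that for large datasets spectral embedding ('se') is faster than MLLE and that lowering nb_pct speeds up this step; a hedged sketch with the Python package (the parameter values below are purely illustrative):

import stream as st

# Spectral embedding instead of the default MLLE, with a smaller neighbour
# fraction; 'adata' is the AnnData object produced by st.read, and the values
# here are illustrative, not recommendations.
st.dimension_reduction(adata, method='se', nb_pct=0.01, n_components=3, n_jobs=8)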

Issue with elastic_principal_graph

When I run this function (following the single-cell tutorial), my environment seems to be missing the stringi.so library. I'm not sure whether this is an issue with the instructions in the tutorial, which I've followed to the letter, or with my own system. However, I've been unable to find any other solutions to this specific problem in other contexts, and was hoping some user or dev might have insight into a workaround.

No figure output when running STREAM on the cluster

Hi, thanks for developing this amazing tool! I ran into some problems when trying to run the customized python script on the cluster - there is no figure output. When I ran these same python commands on a python terminal, it generated the figure just fine. It seems like when running the script on the cluster, it stopped at the figure generating step, and wouldn't give any errors. I really don't know what's going on.

Here's my script test.py:

#!/usr/bin/env python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import stream as st
adata=st.read(file_name='./ExpressionMat_input.tsv')
st.filter_genes(adata,min_num_cells = max(5,int(round(adata.shape[0]*0.001))),
                min_pct_cells = None,expr_cutoff = 1)
fig = st.select_variable_genes(adata,save_fig=True,fig_name='test.png')
print("hello world")

Here's the error file STREAM.e1234567 :

/cluster/data/project
STREAM.e1234567 (END) 

Here's the output file STREAM.o1234567:

Using default working directory.
Saving results in: /cluster/data/project
Filter genes based on min_num_cells
After filtering out low-expressed genes: 
9490 cells, 11339 genes
STREAM.o1234567 (END) 

Here's how I submitted the script to the cluster:
qsub -N STREAM -l mem_free=10G,h_vmem=10G -cwd test.py

I tried to increase the memory but it still would not work. Please help. Thank you for your time! :)

problem on visualizing gene

Hello,
I am running STREAM on command line, here is the command:
stream -m 1_Fresh_anno2TR.tsv -l 1_Fresh_cell_lable.tsv -c 1_Fresh_cell_label_color.tsv -g INS,HADH,PCSK1,NKX6-1,G6PC2 -s all -o test2

Visulizing genes...
Traceback (most recent call last):
File "/STREAM/STREAM.py", line 4033, in
main()
File "/STREAM/STREAM.py", line 4029, in main
Stream_Plot_Gene(df_rooted_tree,df_sc,flat_tree,dict_branches,node_start,dict_node_state,gene_list,flag_stream_log_view,flag_atac,file_path,flag_web)
File "/STREAM/STREAM.py", line 2847, in Stream_Plot_Gene
max_gene_values = np.percentile(gene_values[gene_values>0],90)
File "/opt/conda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4291, in percentile
interpolation=interpolation)
File "/opt/conda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4033, in _ureduce
r = func(a, **kwargs)
File "/opt/conda/lib/python2.7/site-packages/numpy/lib/function_base.py", line 4405, in _percentile
x1 = take(ap, indices_below, axis=axis) * weights_below
File "/opt/conda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 159, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/opt/conda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 52, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

It will be greatly appreciated if anyone could give me a clue.
Best regards,
Ting

"Segmentation fault: 11" when trying to run elastic_principal_graph

Hello

I would like to use your package on my data, however, I'm not able to reproduce your tutorial, because I'm getting an error on the st.elastic_principal_graph(adata) step. I am using the provided data and the default parameters and up until that step everything works fine.

I was first running it in an ipython notebook through anaconda (either spyder or qtconsole) and then the kernel would just die and restart (just after I enter the st.elastic_principal_graph(adata) command).

I checked that it's the ElPiGraph.computeElasticPrincipalTree function that causes the kernel to die, but I couldn't troubleshoot much more than that (kernel was dying with no error message), so I tried to run the tutorial just in a python interpreter. There, running the command st.elastic_principal_graph(adata) gives me output of:

Learning elastic principal graph...
[1]
 "Constructing tree 1 of 1 / Subset 1 of 1"


[1]
 "Computing EPG with 50 nodes on 1656 points and 3 dimensions"


Segmentation fault: 11

Googling for 'Segmentation fault: 11' led me to this issue: https://github.com/deeptools/pyBigWig/issues/58. Since I'm working on macOS Mojave 10.14.6, I tried a solution posted there and ran the tutorial on different versions of Python: 3.6, 3.6.4, 3.6.9. This didn't help.

Could you help me with my issue?

Best
Gregor

Error running detect_transition_genes()

I am using the STREAM python package, and I am having trouble running the detect_transition_genes() function on my data. I can process everything just fine up to the streamplots and the subwaymaps without any issues.

The following is the steps I take to process my data with STREAM:

  • I installed STREAM using bioconda as per the instructions on the STREAM github page.
  • Since my snRNA-seq data was generated using the 10X platform, I load my data as the matrix.mtx file from cellranger aggr as outlined in the scRNA-seq tutorial for STREAM.
  • I add meta-data table with clustering etc from Seurat
  • I performed a batch correction method on my gene expression data (iNMF, implemented in the R package Liger). So I stuck my iNMF matrix into the adata.obsm['top_pcs'] slot so I could run STREAM using that matrix.
  • Subset my anndata object for a specific cluster. This leaves me with about ~35k cells.
  • Normalize per cell, log transform, remove mt genes, filter genes using STREAM.
  • call st.dimension_reduction(adata, method='se', nb_pct=0.01, n_jobs=16, feature='top_pcs')
  • call st.seed_elastic_principal_graph(adata)
  • call st.elastic_principal_graph(adata)
  • call st.optimize_branching(adata, epg_trimmingradius=0.1)
  • call st.extend_elastic_principal_graph(adata ,epg_trimmingradius=0.1)
  • Finally plot the flat tree, streamplot, subwaymap etc all looking great, with branches corresponding well on a UMAP.
  • call st.detect_transistion_genes(adata, root='S4')

This is where I get the error:

Minimum number of cells expressing genes: 39
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/users/smorabit/bin/software/miniconda3/envs/stream/lib/python3.6/site-packages/stream/core.py", line 3974, in detect_transistion_genes
    input_genes_expressed = np.array(input_genes)[np.where((df_sc[input_genes]>0).sum(axis=0)>min_num_cells)[0]].tolist()
IndexError: index 59148 is out of bounds for axis 0 with size 58721

Interestingly, I tried running through the entire stream tutorial using the provided sample data (Nestorowa), and I did not run into the same error. Any ideas what is going on? Also, great work with this tool!

Error with dimension_reduction

Hi Stream team:
Thank you for developing this tool.
Currently, I am using the latest version of stream (0.4.0). My data set has about 80,000 cells and 17,971 genes. When I run the dimension_reduction step with the following command: stream.dimension_reduction(wt, n_components=10, nb_pct=0.025, n_jobs=4), then after a long time an error happens.


SystemError Traceback (most recent call last)
in
----> 1 stream.dimension_reduction(wt, n_components=10, nb_pct=0.025, n_jobs=4)

/miniconda3/envs/stream/lib/python3.7/site-packages/stream/core.py in dimension_reduction(adata, n_neighbors, nb_pct, n_components, n_jobs, feature, method, eigen_solver)
736 n_jobs = n_jobs,
737 eigen_solver = eigen_solver,random_state=10)
--> 738 trans = reducer.fit(input_data)
739 adata.uns['trans_se'] = trans
740 adata.obsm['X_se'] = trans.embedding_

/miniconda3/envs/stream/lib/python3.7/site-packages/sklearn/manifold/_spectral_embedding.py in fit(self, X, y)
554 n_components=self.n_components,
555 eigen_solver=self.eigen_solver,
--> 556 random_state=random_state)
557 return self
558

/miniconda3/envs/stream/lib/python3.7/site-packages/sklearn/manifold/_spectral_embedding.py in spectral_embedding(adjacency, n_components, eigen_solver, random_state, eigen_tol, norm_laplacian, drop_first)
270 _, diffusion_map = eigsh(
271 laplacian, k=n_components, sigma=1.0, which='LM',
--> 272 tol=eigen_tol, v0=v0)
273 embedding = diffusion_map.T[n_components::-1]
274 if norm_laplacian:

/miniconda3/envs/stream/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
1626 if OPinv is None:
1627 Minv_matvec = get_OPinv_matvec(A, M, sigma,
-> 1628 hermitian=True, tol=tol)
1629 else:
1630 OPinv = _aslinearoperator_with_dtype(OPinv)

/miniconda3/envs/stream/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py in get_OPinv_matvec(A, M, sigma, hermitian, tol)
1063 A = A - sigma * eye(A.shape[0])
1064 A = _fast_spmatrix_to_csc(A, hermitian=hermitian)
-> 1065 return SpLuInv(A).matvec
1066 else:
1067 return IterOpInv(_aslinearoperator_with_dtype(A),

/miniconda3/envs/stream/lib/python3.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py in init(self, M)
906 """
907 def init(self, M):
--> 908 self.M_lu = splu(M)
909 self.shape = M.shape
910 self.dtype = M.dtype

/miniconda3/envs/stream/lib/python3.7/site-packages/scipy/sparse/linalg/dsolve/linsolve.py in splu(A, permc_spec, diag_pivot_thresh, relax, panel_size, options)
309 _options.update(options)
310 return _superlu.gstrf(N, A.nnz, A.data, A.indices, A.indptr,
--> 311 ilu=False, options=_options)
312
313

SystemError: gstrf was called with invalid arguments

Could you help me with my issue?

Best,
Peifeng

TODO List

  • Add warning info in CLI for no-branching event

  • Fix the error caused by ‘EPG_shift’

LOESS issue

When I am running stream using Python (after the bioconda install) or using Jupyter, I have an issue when I try to select variable genes:

st.select_variable_genes(adata,loess_frac=0.01)
/data/users/ateissan/.conda/envs/stream/lib/python3.6/site-packages/statsmodels/nonparametric/smoothers_lowess.py:165: RuntimeWarning: invalid value encountered in true_divide
res = _lowess(y, x, frac=frac, it=it, delta=delta)
/data/users/ateissan/.conda/envs/stream/lib/python3.6/site-packages/statsmodels/nonparametric/smoothers_lowess.py:165: RuntimeWarning: invalid value encountered in greater_equal
res = _lowess(y, x, frac=frac, it=it, delta=delta)
st.select_variable_genes?
/data/users/ateissan/.conda/envs/stream/lib/python3.6/site-packages/stream/core.py:452: RuntimeWarning: invalid value encountered in less
mat_sign[np.where(residuals<0)[0]] = -1
/data/users/ateissan/.conda/envs/stream/lib/python3.6/site-packages/stream/core.py:457: RuntimeWarning: invalid value encountered in less_equal
id_non_var_genes = np.where(residuals<=cutoff)[0]
0 variable genes are selected

Thanks for your help,
Aurélie

memory issue

Hello,
Thank you for your great tool.
I ran the example successfully, which has fewer than 200 cells and 200 genes. However, when I tried my sample, which has 6994 cells and 21029 genes, and I gave it 12 threads and 12G per thread, I got the following error message:

  • STREAM Single-cell Trajectory Reconstruction And Mapping -
    Version 0.3.9

Variable names are not unique. To make them unique, call .var_names_make_unique.
Observation names are not unique. To make them unique, call .obs_names_make_unique.
Saving results in: stream_result
Input: 6994 cells, 21029 genes
No cell label file is provided, 'unknown' is used as the default cell labels
No cell color file is provided, random color is generated for each cell label
Filtering genes...
Filter genes based on min_num_cells
After filtering out low-expressed genes:
6994 cells, 13960 genes
Removing mitochondrial genes...
remove mitochondrial genes:
['MT-ND6', 'MT-CO2', 'MT-CYB', 'MT-ND2', 'MT-ND5', 'MT-CO1', 'MT-ND3', 'MT-ND4', 'MT-ND1', 'MT-ATP6', 'MT-CO3', 'MT-ND4L', 'MT-ATP8']
Selecting most variable genes...
698 variable genes are selected
feature var_genes is being used ...
32 cpus are being used ...
OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint Try decreasing the value of OMP_NUM_THREADS.
Aborted

any help will be really appreciated.
Ting

Working with Batch Corrected Data and name 'dict_DE_greater' is not defined error

Hello,
Thank you for this amazing tool. I am using STREAM on Docker, and with some of my data sets I get the error "name 'dict_DE_greater' is not defined". I did not see any parameter related to this. How can I solve it?

Second, I have batch-corrected data from a tissue, and I want to subset a group of cells and use STREAM on them. Is there a possibility to provide a corrected expression matrix (Seurat batch-corrected data) in place of the normalized counts matrix?

Thank you in advance.

stream_plot issue

Hi,

Thank you for this very nice package. Everything is running well except for this command. Any help would be great.

Screen Shot 2019-11-12 at 17 47 48

STREAM working environment

I am trying to install STREAM in a Linux server. On the server, I have a /home/myhome/ directory which has limited space and memory. I also have a lab-share/ directory where I want to install STREAM.

Now, the command $ conda info gives the following information:

   package cache : lab-share/stream_2019/.conda/pkgs
   envs directories : /home/myhome/.conda/envs
                      /apps/software/gcc-6.2.0/miniconda3/4.7.10/envs
  platform : linux-64

My questions are:
Q1) Should I change the 'envs directories ?'
Q2) If yes, how?

Thanks,
Holly

st.plot_flat_tree error

We are trying the tutorial with version 0.3.9. STREAM is installed on the local machine in the suggested way.

plot_flat_tree does not work, with the following error:


st.plot_flat_tree(adata)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-f0350c6f5f3c> in <module>
----> 1 st.plot_flat_tree(adata)

~/miniconda3/envs/envst/lib/python3.6/site-packages/stream/core.py in plot_flat_tree(adata, adata_new, show_all_cells, save_fig, fig_path, fig_name, fig_size, fig_legend_ncol)
   1720 
   1721     flat_tree = adata.uns['flat_tree']
-> 1722     dict_nodes_pos = nx.spring_layout(flat_tree,random_state=10)
   1723     bfs_root = list(flat_tree.nodes())[0]
   1724     bfs_edges = list(nx.bfs_edges(flat_tree, bfs_root))

TypeError: fruchterman_reingold_layout() got an unexpected keyword argument 'random_state'

What could be the issues? Thanks!

Controlling the extension in extend_elastic_principal_graph

Dear Stream team,

Sorry to bother you again but I have some difficulties in tuning the parameter of st.extend_elastic_principal_graph.

The principal graph looks like this:

Screenshot from 2019-11-26 14-24-28

Meanwhile, it produces a needle structure in the stream plots:
Screenshot from 2019-11-26 16-15-13

You might have noticed that the extreme blue dots near S1 are far away from the leaf. I would like to control which cells are further included in the graph, or to make them a little bit closer to the leaf.

First, am I correct to use extend_elastic_principal_graph for that purpose? Or is it possible at all?

I tried a few options:

#- option 1
st.extend_elastic_principal_graph(adata)
st.plot_branches_with_cells(adata, n_components=2, save_fig=True, fig_name='extended_branches_cells.pdf')

#- option 2
st.extend_elastic_principal_graph(adata, epg_ext_par=0.01)
st.plot_branches_with_cells(adata, n_components=2, save_fig=True, fig_name='extended_branches_cells_2.pdf')

#- option 3
st.extend_elastic_principal_graph(adata, epg_ext_par=0.99)
st.plot_branches_with_cells(adata, n_components=2, save_fig=True, fig_name='extended_branches_cells_3.pdf')

The pseudotime and stream plots are nearly identical.

Furthermore, I have tried epg_trimmingradius and epg_ext_mode as well, but couldn't see a clear difference. I guess I might be misunderstanding the command. Could you kindly explain the function a little bit more?

Thanks very much!

detect_leaf_genes

Dear Developers,

Got this error running your example pipeline (STREAM_scRNA-seq.ipynb) on my data.
"detect_leaf_genes" complains - index out of bound
Any suggestion how to fix it.

best,
Pawel

Minimum number of cells expressing genes: 5


IndexError Traceback (most recent call last)
in
----> 1 st.detect_leaf_genes(adata,root='S5',preference=['S2','S1'])

~/anaconda3/envs/streamEnv/lib/python3.6/site-packages/stream/core.py in detect_leaf_genes(adata, cutoff_zscore, cutoff_pvalue, percentile_expr, n_jobs, use_precomputed, root, preference)
4409 min_num_cells = max(5,int(round(df_gene_detection.shape[0]*0.001)))
4410 print('Minimum number of cells expressing genes: '+ str(min_num_cells))
-> 4411 input_genes_expressed = np.array(input_genes)[np.where((df_sc[input_genes]>0).sum(axis=0)>min_num_cells)[0]].tolist()
4412 df_gene_detection[input_genes_expressed] = df_sc[input_genes_expressed].copy()
4413

IndexError: index 27998 is out of bounds for axis 0 with size 27998

Bioconda channels not available

I am creating a conda environment as suggested in the readme file:
conda create -n myenv python=3.6 stream jupyter

but I got the following errors:
Collecting package metadata: done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

  • stream

Current channels:

To search for alternate channels that may provide the conda package you're
looking for, navigate to

https://anaconda.org

and use the search bar at the top of the page.

dlopen: cannot load any more object with static TLS while import

Hi I had OSError: cannot load library '/home/xyang/.conda/envs/myenv/lib/R/lib/libR.so': dlopen: cannot load any more object with static TLS while importing the package

The full error message is below:
In [2]: import stream as st

OSError Traceback (most recent call last)
in
----> 1 import stream as st

~/.conda/envs/myenv/lib/python3.6/site-packages/stream/init.py in
----> 1 from .core import *
2
3 version = "0.3.9"

~/.conda/envs/myenv/lib/python3.6/site-packages/stream/core.py in
43
44
---> 45 from rpy2.robjects.packages import importr
46 from rpy2.robjects import r as R
47 import rpy2.robjects as robjects

~/.conda/envs/myenv/lib/python3.6/site-packages/rpy2/robjects/init.py in
12 import types
13 import array
---> 14 import rpy2.rinterface as rinterface
15 import rpy2.rlike.container as rlc
16

~/.conda/envs/myenv/lib/python3.6/site-packages/rpy2/rinterface.py in
4 import math
5 import typing
----> 6 from rpy2.rinterface_lib import openrlib
7 import rpy2.rinterface_lib._rinterface_capi as _rinterface
8 import rpy2.rinterface_lib.embedded as embedded

~/.conda/envs/myenv/lib/python3.6/site-packages/rpy2/rinterface_lib/openrlib.py in
21
22
---> 23 rlib = _dlopen_rlib(R_HOME)
24
25

~/.conda/envs/myenv/lib/python3.6/site-packages/rpy2/rinterface_lib/openrlib.py in _dlopen_rlib(r_home)
17 'Try python -m rpy2.situation')
18 lib_path = rpy2.situation.get_rlib_path(r_home, platform.system())
---> 19 rlib = ffi.dlopen(lib_path)
20 return rlib
21

OSError: cannot load library '/home/xyang/.conda/envs/myenv/lib/R/lib/libR.so': dlopen: cannot load any more object with static TLS

rpy2.rinterface.RRuntimeError

Hello,
Thanks for this amazing tool.
I use STREAM in Anaconda (v0.3.9).
My command is:
stream -m RD001_res0.6_T_scaled.tsv -l RD001_res0.6_T_label.tsv --lle_components 4 --EPG_shift --umap
RD001_res0.6_T_scaled.tsv contains the expression profile scaled by Seurat v3, and RD001_res0.6_T_label.tsv contains the cluster labels.

However, I got this error:
rpy2.rinterface.RRuntimeError: Error in igraph::add.edges(graph = Net, edges = as.vector(rbind(NewBR, :
At type_indexededgelist.c:269 : invalid (odd) length of edges vector, Invalid edge vector

And if I remove --EPG_shift, I get another error:
TypeError: fruchterman_reingold_layout() got an unexpected keyword argument 'random_state'

Do you know what I can do to fix it ?

Mengyan Zhu

st.detect_transistion_genes(adata,root='S0') # The payload is large so Nagle's algorithm won't be triggered

Hi, I ran into the following error message when trying to identify transition genes:
After filtering, there were 28827 cells, 19694 genes, and 985 variable genes were selected.
Please advise if there is any way I can fix/work around this issue. Thanks a lot!!

`st.detect_transistion_genes(adata,root='S0')
Minimum number of cells expressing genes: 29

error Traceback (most recent call last)
in
----> 1 st.detect_transistion_genes(adata,root='S0')

/scg/apps/software/stream_atac/0.3.4/envs/stream_atac_0.3.4/lib/python3.6/site-packages/stream/core.py in detect_transistion_genes(adata, cutoff_spearman, cutoff_logfc, percentile_expr, n_jobs, use_precomputed, root, preference)
3980 params = [(df_gene_detection,x,percentile_expr) for x in input_genes_expressed]
3981 pool = multiprocessing.Pool(processes=n_jobs)
-> 3982 results = pool.map(scale_gene_expr,params)
3983 pool.close()
3984 adata.uns['scaled_gene_expr'] = results

/scg/apps/software/stream_atac/0.3.4/envs/stream_atac_0.3.4/lib/python3.6/multiprocessing/pool.py in map(self, func, iterable, chunksize)
286 in a list that is returned.
287 '''
--> 288 return self._map_async(func, iterable, mapstar, chunksize).get()
289
290 def starmap(self, func, iterable, chunksize=None):

/scg/apps/software/stream_atac/0.3.4/envs/stream_atac_0.3.4/lib/python3.6/multiprocessing/pool.py in get(self, timeout)
668 return self._value
669 else:
--> 670 raise self._value
671
672 def _set(self, i, obj):

/scg/apps/software/stream_atac/0.3.4/envs/stream_atac_0.3.4/lib/python3.6/multiprocessing/pool.py in _handle_tasks(taskqueue, put, outqueue, pool, cache)
448 break
449 try:
--> 450 put(task)
451 except Exception as e:
452 job, idx = task[:2]

/scg/apps/software/stream_atac/0.3.4/envs/stream_atac_0.3.4/lib/python3.6/multiprocessing/connection.py in send(self, obj)
204 self._check_closed()
205 self._check_writable()
--> 206 self._send_bytes(_ForkingPickler.dumps(obj))
207
208 def recv_bytes(self, maxlength=None):

/scg/apps/software/stream_atac/0.3.4/envs/stream_atac_0.3.4/lib/python3.6/multiprocessing/connection.py in _send_bytes(self, buf)
391 n = len(buf)
392 # For wire compatibility with 3.2 and lower
--> 393 header = struct.pack("!i", n)
394 if n > 16384:
395 # The payload is large so Nagle's algorithm won't be triggered

error: 'i' format requires -2147483648 <= number <= 2147483647
`

branches are not shown by plot_branches with umap as the dimension reduction method

I am using the latest stream 0.4.

With umap as the dimension reduction method,
st.dimension_reduction(adata, n_components=2, method ='umap')

Obtain the branches,

st.elastic_principal_graph(adata, epg_trimmingradius=0.1, epg_alpha=0.05)
st.plot_branches(adata, n_components=2, save_fig=True, fig_name='elastic_branches.pdf')
st.plot_branches_with_cells(adata, n_components=2, save_fig=True, fig_name='elastic_branches_cells.pdf')

Strangely, the elastic_branches.pdf and the elastic_branches_cells.pdf do not show branches.
See the screenshots below.

elastic_branches.pdf
Screenshot from 2019-12-10 09-39-17

elastic_branches_cells.pdf
Screenshot from 2019-12-10 09-39-23

The plots after the seed and extension steps are fine:

st.seed_elastic_principal_graph(adata)
st.plot_branches(adata, n_components=2, save_fig=True, fig_name='seed_branches.pdf')
st.plot_branches_with_cells(adata, n_components=2, save_fig=True, fig_name='seed_branches_cells.pdf')

After seed step
Screenshot from 2019-12-10 09-39-43

There is no issue with mlle as the dimension reduction method (at least in 0.3.9).

rpy2 error with stream_atac.preprocess_atac()

Hi,

Thank you for this tool! I think I successfully installed stream and stream_atac with anaconda3 in its own environment, but I am running into problems with stream_atac.preprocess_atac(). It gives me this error:

Traceback (most recent call last):
File "", line 1, in
File "/home/jingyaq/anaconda3-new/envs/STREAM/lib/python3.6/site-packages/stream_atac/core.py", line 142, in preprocess_atac
r_regions_dataframe = pandas2ri.py2ri(df_regions)
AttributeError: module 'rpy2.robjects.pandas2ri' has no attribute 'py2ri'

Do you know what the issue might be? My rpy2 version is 3.1.0 and pandas version is 0.25.3. Thanks so much!

Directory error

Hi there,

Thank you for this wonderful resource! I am an undergraduate working in a lab over the summer. I want to analyze scRNA data from two experiments, both with DMSO and RO48 runs and each run having a genes.tsv, barcodes.tsv, and matrix.mtx file.

Right now, I am trying to run this code: https://github.com/pinellolab/STREAM/blob/master/tutorial/1.STREAM_scRNA-seq.ipynb... (changing the file names to match mine).

I get stuck once I try to run the read function: I get a directory error. I have made sure I am using the right file path and I have even tried to use the absolute path of the file, but I'm having no luck. I also tried changing the working directory, but that was unsuccessful as well.

Any help or advice would be greatly appreciated! Thank you in advance!

exchange head and tail

Hello,

I have a linear trajectory, and I would like to exchange the head and tail of the trajectory. How can I do that?
Thanks,
Aurélie
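
Not an authoritative answer, but the plotting and gene-detection functions shown elsewhere on this page all take a root argument (e.g. root='S0', root='S5'), so for a linear trajectory, passing the node at the other end of the flat tree as root should reverse the head and tail of the pseudotime. A minimal sketch; 'S1' is a placeholder for the node id at the other end of your trajectory:

import stream as st

# Inspect the node ids (S0, S1, ...) of the inferred flat tree.
st.plot_flat_tree(adata)

# Re-root the stream plot at the opposite end node to flip head and tail.
st.stream_plot(adata, root='S1', fig_size=(8, 8), save_fig=True)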

ERROR with ValueError: min() arg is an empty sequence when I detect transition gene for each branch

Hi,
Thanks for your great tool for single-cell trajectory analysis.
But I got the error "ValueError: min() arg is an empty sequence" when detecting transition genes for each branch. All my commands are as follows:
stream -m exp.csv -l cell_label.tsv -c label_color.tsv --disable_EPG_optimize
stream -m exp.csv -l cell_label.tsv -c label_color.tsv --DE --TG --LG -p

I want to know why this happens, and how I can solve this problem.
[screenshot attached]

Best regards!

Select_variable_genes error "ValueError: exog must be a vector"

Hi,

I've been playing around with STREAM (Bioconda version, through Jupyter) with my dataset for a few days now, and the first time I tried it, it worked without any issues. Since then, I scaled up (at first I tried it with only around 5000 cells), and now I'm running into errors when selecting variable genes.

First, I got an error message saying that "IndexError: tuple index out of range". After some troubleshooting, I found that the "np.std"-function was the cause, as my adata matrix (i.e. adata.X) was sparse. For some reason, np.mean was compatible with a sparse matrix, but np.std wasn't, although that's a different problem.

In order to bypass this, I changed my matrix into a dense one (i.e. "adata.X=adata.X.todense()"), however now I am running into "ValueError: exog must be a vector". Seemingly, this is caused by mean_genes and std_genes being 2-dimensional (i.e. they both have the dimensions 1 x n(genes)), but I do not really know how to fix this, without "manually" running the lowess function, i.e. running the code outside of the function after flattening both arrays. In doing so, the returned "loess_fitted" itself is 2-dimensional and I cannot calculate the "residuals" as the std_genes array is flattened.

I find it strange that this wouldn't occur every time the function is called. I'm using version 0.4.0.

Any help is appreciated.
Thanks!

Full traceback:

ValueError Traceback (most recent call last)
in
----> 1 st.select_variable_genes(adata,loess_frac=0.01, percentile=95)

~/anaconda3/envs/myenv/lib/python3.7/site-packages/stream/core.py in select_variable_genes(adata, loess_frac, percentile, n_genes, n_jobs, save_fig, fig_name, fig_path, fig_size)
514 mean_genes = np.mean(adata.X,axis=0)
515 std_genes = np.std(adata.X,ddof=1,axis=0)
--> 516 loess_fitted = lowess(std_genes,mean_genes,return_sorted=False,frac=loess_frac)
517 residuals = std_genes - loess_fitted
518 XP = np.column_stack((np.sort(mean_genes),loess_fitted[np.argsort(mean_genes)]))

~/anaconda3/envs/myenv/lib/python3.7/site-packages/statsmodels/nonparametric/smoothers_lowess.py in lowess(endog, exog, frac, it, delta, is_sorted, missing, return_sorted)
131 # same length.
132 if exog.ndim != 1:
--> 133 raise ValueError('exog must be a vector')
134 if endog.ndim != 1:
135 raise ValueError('endog must be a vector')

ValueError: exog must be a vector

Issue when saving output from jupyterlab

Hello,

When using functions such as :

st.stream_plot_gene(adata,root='S0',fig_size=(8,8),genes=['Notch1', 'Notch2', 'Notch3'])

to plot a list of our favorite genes, is there a way for the plot to be saved directly into the working directory by amending the above code?
Alternatively, would you know a way to save the image as an .svg or any other format? When "Save image as" is used, it only saves the first gene in the plot.
Thanks
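
Based on the stream_plot_gene signature quoted further down this page (save_fig, fig_path and fig_format parameters), a sketch that writes the plots straight to disk instead of using the browser's "Save image as" might look like the following; './figures' is a placeholder directory, and 'svg' assumes fig_format accepts the usual matplotlib formats:

import stream as st

# save_fig / fig_path / fig_format come from the stream_plot_gene signature
# shown later on this page; the output directory is a placeholder.
st.stream_plot_gene(adata, root='S0',
                    genes=['Notch1', 'Notch2', 'Notch3'],
                    fig_size=(8, 8),
                    save_fig=True,
                    fig_path='./figures',
                    fig_format='svg')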

stream plot of mapped cells and related gene expression

Many thanks for the nice tool and we get some interesting results!

One issue that we are not sure is how to explain the width of mapped data in the stream plot, both on mapping and gene expression.

For example, the stream plot of the reference data looks like the one below. One branch is highlighted.
f0

Then we mapped the perturbation data and plotted it. Now the highlighted branch is stretched along the pseudotime; see the gray area in the highlighted regions.

st.map_new_data(adata1, adata2, method="umap")
st.stream_plot(adata1, adata_new=adata2, show_all_colors=False, root='S5', fig_legend_ncol=6, fig_size=(8,8), save_fig=True)

f1

To get the stream plot of adata2, we used

st.stream_plot(adata2, root='S5', fig_legend_ncol=6, fig_size=(8,8), save_fig=True, fig_name='stream2.pdf')

f2

The width of the stream plot of adata2 is much narrower than that of the reference (see the first figure).

Next we plot the gene expression

st.stream_plot_gene(adata1, root='S5', genes=genes, save_fig=True)
st.stream_plot_gene(adata2, root='S5', genes=genes, save_fig=True)

f4

The width of the stream plot of adata2 is clearly narrower than that of adata1. Actually, adata2 has around 10% more cells than adata1, but this is not reflected in the width of the stream plot.

We would like to show that the cells are distributed differently in adata1 and adata2, and so are the gene profiles. With the commands used above, the stream plots do not seem comparable between these datasets.

My questions are:

  1. how to get a nice stream plot of adata2 (mapped data) while the width is comparable to adata1 (ref data)
  2. related, how to make the gene plot comparable between adata1 and adata2

Interestingly, stream_plot_gene documents an "adata_new" argument, but it does not work:

Signature:
st.stream_plot_gene(
    adata,
    genes=None,
    percentile_expr=95,
    root='S0',
    factor_num_win=10,
    factor_min_win=2.0,
    factor_width=2.5,
    flag_log_view=False,
    factor_zoomin=100.0,
    preference=None,
    save_fig=False,
    fig_path=None,
    fig_size=(12, 8),
    fig_format='pdf',
    tick_fontsize=20,
    label_fontsize=25,
)
Docstring:
Generate stream plots of genes

Parameters
----------
adata: AnnData
    Annotated data matrix.
adata_new: AnnData
    Annotated data matrix for new data (to be mapped).

Impossible to upload cell_label.tsv and cell_label_colors.tsv

Hi,

As a user of ElPiGraph, I wanted to try your online tool because I was curious to see how well it would work with your MLLE dimension-reduction method and with your LOESS feature selection. And the answer is that it works amazingly well, and the stream output was consistent with my previous findings. However, I found that it was impossible for me to upload the cell_label and cell_label_colors files in the online application. I have of course checked that the format of these files satisfies all the conditions you have written in the README. Perhaps it's just a small omission in the code of your wrapper...

Anyway, let me know when these uploads will be available, so that I can visualize the distribution of my clusters on your beautiful stream plots ;)

Best,

Charles

no output for TG, DE, LG

Hello Huidong,
I am trying to get TG (or DE, or LG), but although the folder "transition_genes" is generated, there is nothing in it. Here is the command:
/bin/stream -m matrix.tsv.gz -l cell_label.tsv.gz -c cell_label_Color.tsv.gz -g genelist106.txt --TG -r S0 -p -o output_stream
The run messages are below; the run seems not to have finished.
Importing the precomputed pkl file...
Saving results in: output_stream
Identifying transition genes...
Minimum number of cells expressing genes: 31

it will be greatly appreciated if you could help me out.
Thanks,
Ting

program quits when the matrix is big

Hi all,
Thank you for developing "STREAM". I have used STREAM for our data. It works very well when the gene-expression matrix is small.
Now I want to use STREAM for new data containing 7465 cells and 8858 genes. I have allocated 8 cores and 16G of memory to the Docker container. But the program quits after printing "Selecting features...", without any error message.
Please help me to solve this problem.
Thank you so much!

AttributeError: 'SpectralEmbedding' object has no attribute 'transform'

Hi:
Sorry to bother you again. When mapping the new data set to the reference, an error happened.

Command lines:

  1. Generating reference
    stream.select_variable_genes(wt)
    stream.dimension_reduction(wt, n_jobs=4)
    stream.seed_elastic_principal_graph(wt)
    stream.elastic_principal_graph(wt,incr_n_nodes=10, epg_alpha=0.03, epg_trimmingradius=0.1)
    stream.optimize_branching(wt)
    stream.prune_elastic_principal_graph(wt)
    stream.shift_branching(wt)
    stream.extend_elastic_principal_graph(wt)

This works well.

  2. Mapping step
    ko = sc.read_h5ad('ko.h5ad')
    stream.set_workdir(ko, './stream')
    stream.remove_mt_genes(ko)
    ko.obs.rename(columns={'type':'label'}, inplace=True)
    stream.add_cell_colors(ko)
    sc.pp.subsample(ko, n_obs=10000)
    stream.map_new_data(wt, ko, method='se')
    Error:

AttributeError Traceback (most recent call last)
in
----> 1 stream.map_new_data(wt, ko, method='se')

/miniconda3/envs/stream/lib/python3.7/site-packages/stream/core.py in map_new_data(adata, adata_new, feature, method, use_radius)
4830 if(method == 'se'):
4831 trans = adata.uns['trans_se']
-> 4832 adata_new.obsm['X_se_mapping'] = trans.transform(input_data)
4833 adata_new.obsm['X_dr'] = adata_new.obsm['X_se_mapping'].copy()
4834 if(method == 'mlle'):

AttributeError: 'SpectralEmbedding' object has no attribute 'transform'

Thanks for you help!

Best,

Peifeng
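
Not an official workaround, but the map_new_data docstring quoted earlier on this page only lists 'mlle', 'umap' and 'pca' (and an earlier issue notes that 'se' is not accepted), which matches the error here: scikit-learn's SpectralEmbedding has no transform() for new points. A hedged sketch is to build the reference with one of the supported methods and map with that same method (values are illustrative):

import stream

# Reference built with a mapping-compatible method ('mlle', 'umap' or 'pca').
stream.dimension_reduction(wt, method='umap', n_jobs=4)
stream.seed_elastic_principal_graph(wt)
stream.elastic_principal_graph(wt)

# Map the new data with the same method used for the reference.
stream.map_new_data(wt, ko, feature='var_genes', method='umap', use_radius=True)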

2500 cell limit

Hi all,

I'm really enjoying STREAM and have started pointing people at HMS who are interested in trajectory analysis to the website. One issue that keeps popping up is the 2500 cell limit. Most projects at HMS are using inDrops or 10X Genomics, with at least ~3000 cells per sample, and many projects are analyzing multiple samples. Is there a way to use STREAM on larger datasets, say by running locally or with Docker in the cloud?

Best,
Mike

Perpetual uploading

I'm very interested to try your pipeline for inferring cell trajectories. However, when I upload my .tsv file, I get a perpetual "UPLOADING". Any suggestions?

I've tried 3 different browsers and several different computers with no success.

STREAM stops after "Selecting most variable genes" part with data more than 2500 cells

Hello,
I am using STREAM on Docker. Everything works well with data sets of up to 2500 cells, but when I try to use it on data sets with more than 2500 cells, it stops without any error or warning message while selecting the most variable genes.

The code I am using:
docker run -v ${PWD}:/data -w /data pinellolab/stream -m IM_exp_data.txt -l IM_cell_labels.tsv --clustering sc --LG --TG --DE --umap -o IM

Thank you in advance.

No 'se' method in the 'map_new_data' function

Hi:
Thank you for developing this powerful tool. I'm using the 'map_new_data' function; the reference data set was reduced with 'se' (it is a large-scale data set). But when mapping the new data set to the reference, I found that no 'se' option is provided. Thank you!

Quantified density for each cell type

Hi:

I see the subway or stream plot can draw the change of every cell type along the pseudotime.
I'm wondering if it can draw the quantified density changes of each cell type, like the results below.

Thank you!
result

Error in rpy2.rinterface.RRuntimeError

Hello,
I am trying to run STREAM on my computer. I use Ubuntu 18.04.2 LTS and Docker from my terminal.
I got this error:
$ sudo docker run -v ${PWD}:/data -w /data pinellolab/stream -m pbmc13.tsv -l pbmc13_cell_labels.tsv --lle_components 4 --EPG_shift
rpy2.rinterface.RRuntimeError: Error in NodePositionArrayAll[[i]] : subscript out of bounds

Do you know what I can do to fix it ?
Thank you very much !

Eternal Upload

Hello there, STREAM Team!

I've found your tool most interesting! However, I'm still struggling to use it - files seem to keep uploading forever.

This happens for both your web tool and the docker image which I run on our local computing cluster.

Besides, I get this error on my server Linux command line interface when the docker image starts running:

`[2019-04-08 21:33:52,331] ERROR in app: Exception on /compute/_dash-update-component [POST]
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/opt/conda/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/conda/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/conda/lib/python3.7/site-packages/dash/dash.py", line 556, in dispatch
    return self.callback_map[target_id]['callback'](*args)
  File "/opt/conda/lib/python3.7/site-packages/dash/dash.py", line 513, in add_context
    output_value = func(*args, **kwargs)
  File "/stream_web/app.py", line 2145, in update_matrix_log
    with open(UPLOADS_FOLDER + '/params.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/UPLOADS_FOLDER/STREAM_fff4a25f-f5f4-4fb7-98e3-6adbd84f6a0b/params.json'

I'm trying to upload a raw counts dataset of about 1500 cells (which really isn't much).

Is there any way of getting around this? Am I doing anything wrong?

Thanks for the help in advance!

Best regards,

Davi

how to prune a branch

[screenshot attached]

Hello, thanks for the software. I want to collapse a trivial branch, and I set the parameters --EPG_collapse --EPG_collapse_mode PointNumber --EPG_collapse_par 5, but the branch still exists. I also used --EPG_collapse --EPG_collapse_mode EdgesLength --EPG_collapse_par 2, but this seems to be too strict. What should I do to solve the problem?
