hemberg-lab / scmap

A tool for unsupervised projection of single cell RNA-seq data

Home Page: http://bioconductor.org/packages/scmap

License: GNU General Public License v3.0

R 81.90% C++ 18.10%

bioconductor-package human-cell-atlas projection-mapping r single-cell-rna-seq

scmap's Issues

Are SVM/RF implementations available through scmap?

Hi there,

The help page of scmapCluster mentions that we can potentially give the probability values for the SVM/RF classifiers through the "threshold" argument. There are also functions for SVM and RF in the "Utils.R" script in scmap's repository: https://github.com/hemberg-lab/scmap/blob/92566d3db92d5595eedbdf20c7f03aa2a1a053ac/R/Utils.R

So, can we actually get scmap to implement the SVM and RF methods of classifications that you actually use in the paper for any given dataset?

Just wondering, and would appreciate your answer!

Thanks!

Heat map missing data

When I print my heat map, several rows are empty and I am not sure why. I have tried it with several datasets and still get the same issue.

How should I import data from seurat to scmap?

Hello Vladimir Kiselev and Martin Hemberg,

I get an error with this code:

SAMPLE <- Seurat::Read10X("./00_Data/Sample_matrix_10X/")
sce <- SingleCellExperiment::SingleCellExperiment(assays=list(counts= SAMPLE ))
seur <- Seurat::CreateSeuratObject(SAMPLE )
sce <- Seurat::as.SingleCellExperiment(seur)
sce <- scater::logNormCounts(sce)
rowData(sce)$feature_symbol <- rownames(sce)

#Feature selection
sce <- scmap::selectFeatures(sce, suppress_plot = FALSE)

##error
sce <- indexCluster(sce)
Error in scmap::indexCluster(sce) :
Please define an existing cluster column of the colData slot of the input object using the cluster_col parameter!

Thanks a lot!
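
The error message says that indexCluster could not find the requested cluster column in colData(sce). A minimal sketch of a check to run before indexing is given below. It uses a plain data frame as a stand-in for colData(sce); the column names (orig.ident, ident) and the helper check_cluster_col are assumptions for illustration, not part of scmap. Note that the code above runs no clustering, so a meaningful label column may have to be added first.

```r
# Stand-in for as.data.frame(colData(sce)); real column names may differ.
cell_meta <- data.frame(
  orig.ident = rep("SAMPLE", 4),
  ident      = factor(c("0", "0", "1", "1")),
  row.names  = paste0("cell", 1:4)
)

# Hypothetical helper: return the column name if present, else stop with a hint.
check_cluster_col <- function(meta, cluster_col) {
  if (!cluster_col %in% colnames(meta)) {
    stop("Column '", cluster_col, "' not found; available: ",
         paste(colnames(meta), collapse = ", "))
  }
  cluster_col
}

cluster_col <- check_cluster_col(cell_meta, "ident")

# With scmap loaded, the call would then be (not run here):
# sce <- scmap::indexCluster(sce, cluster_col = cluster_col)
```

If the check fails, the message lists the columns that are actually available, which is usually enough to pick the right value for cluster_col.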

Questions about confidence scores

Hi, thanks for the insightful work! Now I am using scmap to annotate query data based on reference data. I wonder how to get the confidence score of the predicted cell type.

Following is my core code:

# scmap Cluster
    ref_sce <- scmap::indexCluster(ref_sce)


    scmapCluster_results <- scmapCluster(projection = query_sce,
                                         index_list = list(yan = metadata(ref_sce)$scmap_cluster_index),
                                         threshold=0)
    pred <- scmapCluster_results$scmap_cluster_labs
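
Another issue on this page reads the similarity values from scmapCluster_results$scmap_cluster_siml, so that element is a likely candidate for a per-cell confidence score. The sketch below uses a mock results list with made-up values rather than real scmap output. The matrix layout (one column per reference, here "yan"), the cell names and the 0.7 cutoff are assumptions for illustration.

```r
# Mock of the list returned by scmapCluster(); all values are made up.
scmapCluster_results <- list(
  scmap_cluster_labs = matrix(c("zygote", "2cell", "blast"),
                              ncol = 1, dimnames = list(NULL, "yan")),
  scmap_cluster_siml = matrix(c(0.91, 0.42, 0.78),
                              ncol = 1, dimnames = list(NULL, "yan"))
)
query_cells <- c("cell1", "cell2", "cell3")  # stand-in for colnames(query_sce)

# One row per query cell: predicted label plus its similarity score.
pred <- data.frame(
  cell       = query_cells,
  label      = scmapCluster_results$scmap_cluster_labs[, "yan"],
  similarity = scmapCluster_results$scmap_cluster_siml[, "yan"],
  stringsAsFactors = FALSE
)

# Keep only predictions above an illustrative cutoff.
confident <- pred[pred$similarity >= 0.7, ]
```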

error in selectFeatures

Hi, scmap team. I have trouble with the selectFeatures function.

When I run this code,
sce <- selectFeatures(sce, n_features = 500, suppress_plot = FALSE)

I get this error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

I looked up this command in your code and found that this error came from 'LinearModel' function.
One reason for this error probably is that 'isSpike' is deprecated.
Besides isSpike, my input data is already processed so all values in 'dropouts' are '0', therefore all genes are filtered out.
I have no idea how to solve this problem.

My script is:

sce <- SingleCellExperiment(assays = list(normcounts = as.matrix(refmatrix)), colData = annot)
logcounts(sce) <- log2(normcounts(sce) + 1)

rowData(sce)$feature_symbol <- rownames(sce)

sce <- sce[!duplicated(rownames(sce)), ]
sce

sce <- selectFeatures(sce, n_features = 500, suppress_plot = FALSE)

Thanks for your help.
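
The poster's diagnosis (no zeros left in processed data, so every gene is filtered out) can be checked before calling selectFeatures. The sketch below is base R on toy matrices. It assumes, as the post describes, that the feature selection keeps only genes whose dropout rate is strictly between 0% and 100%. The helpers dropout_pct and usable are hypothetical names, not scmap functions.

```r
# Toy expression matrices (genes x cells); values are made up.
raw_like <- matrix(c(0, 3, 0, 5,
                     2, 0, 0, 1,
                     4, 6, 7, 9),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))
processed <- raw_like + 0.5  # e.g. shifted or imputed data: no exact zeros left

# Percentage of cells in which each gene is exactly zero.
dropout_pct <- function(mat) rowSums(mat == 0) / ncol(mat) * 100

# Genes usable for a dropout-based fit: zero in some, but not all, cells.
usable <- function(mat) {
  d <- dropout_pct(mat)
  names(d)[d > 0 & d < 100]
}

usable(raw_like)   # genes with intermediate dropout rates
usable(processed)  # empty: nothing left to fit a linear model on
```

If the second call returns no genes on the real data, supplying the original counts (with their zeros) is the more likely fix than changing the transformation.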

Poor self-mapping with SMART-seq data

Hello,

Thank you for developing scmap.

I was running a self-projection of Tabula Muris dataset to test the method and found that on the SMART-seq version of the dataset the proportion of 'unassigned' cells is very high - up to 50%. This is apparently caused by similarity metrics falling below the specified threshold. The same analysis on the Tabula Muris UMI-based dataset yields much higher assignment rate, over 90% (data from this publication: https://doi.org/10.1186/s13059-019-1795-z).
I was wondering what could be a reason for this difference between technologies? Any suggestions would be appreciated.

Why are the clusters of the reference dataset changed?

Hi,

I like this package very much, but I have a small problem with scmap-cluster.

When I ran the whole projection workflow, the results were confusing. The clusters changed and the proportions of the different clusters are quite different. Also, there are some unassigned cells. As far as I know, the clusters should not change after the process, so I don't know what the problem is.

Thanks very much; I hope to get a reply as soon as possible.

not able to plot sankey diagram

Hi,

When I followed the tutorial, I could not plot the Sankey diagram in my Firefox browser (Ubuntu 16.04). I suspect this is due to our internet connection to Google (from China).
Here is what I got:

Is there anything I can change inside the function to make the plot work?

Thanks!

large part of the heatmap of select features is blank

When I draw a heatmap after indexCluster, a large part of it is blank. Is this normal?
Here is my code:
proj.sce <- SingleCellExperiment(assays = list(normcounts = proj.mat), colData = proj.cellname)
logcounts(proj.sce) <- log2(normcounts(proj.sce)+1)
rowData(proj.sce)$feature_symbol <- rownames(proj.sce)
proj.sce <- selectFeatures(proj.sce, suppress_plot=F)
proj.sce <- indexCluster(proj.sce)
heatmap(as.matrix(metadata(proj.sce)$scmap_cluster_index))

Issue with scmap-cluster function output/ Variable inconsistent assignment

Dear Dr. Kiselev, Dr. Hemberg,

I have been encountering an issue when running the scmap-cluster pipeline and would like to ask you for some input on this matter. It seems that independent runs result in variable and inconsistent assignments. There seem to be two main sets of results, one of which seems fair based on prior knowledge about the composition of the query dataset. The second set of results comprises assignments that seem to reflect an imperfect match rather than a complete mess. While it might still be to some degree possible that the assignment that I think is problematic is actually correct, the issue remains of the variability in the mapping results. I should stress that the "wrong" mapping occurs much more frequently than the one I feel would be correct, if I run scmap from scratch repeatedly, suggesting it may be right after all. Still, the variability troubles me.

I have possibly narrowed down the steps resulting in variable outcomes to the reference normalization (see script below). I tried increasing the number of selected features thinking that this may mitigate the impact of the previous steps, but that didn't seem to be the case.

sf2 <- 2^rnorm(ncol(sce2))
sf2 <- sf2/mean(sf2)
normcounts(sce2) <- t(t(counts(sce2))/sf2)

counts(sce2) <- normcounts(sce2)
logcounts(sce2) <- log2(normcounts(sce2)+1)
rowData(sce2)$feature_symbol<-rownames(sce2)

I would be happy to receive some suggestions from your side.
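
One thing visible in the script itself: the size factors come from rnorm(), so the normalized values differ on every run unless the random number generator is seeded. Whether this accounts for the variable assignments is an assumption, not something the post confirms. A minimal base R sketch (n_cells and make_sf are illustrative names):

```r
n_cells <- 5  # stand-in for ncol(sce2)

# Two unseeded draws give different size factors on every run.
sf_a <- 2^rnorm(n_cells)
sf_b <- 2^rnorm(n_cells)

# Seeding the generator first makes the draw repeatable.
make_sf <- function(n, seed) {
  set.seed(seed)
  sf <- 2^rnorm(n)
  sf / mean(sf)  # centre the size factors at 1, as in the script above
}
sf_1 <- make_sf(n_cells, seed = 1)
sf_2 <- make_sf(n_cells, seed = 1)

identical(sf_1, sf_2)  # TRUE: same seed, same size factors
```

Random size factors are only a placeholder; for a real analysis, size factors estimated from the data would be the more defensible choice.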

Similarity values are not between 0 and 1

Hi there, I have been using scmap cell2cluster to annotate both human and mouse data sets. The cell type annotation results that we get seem to make sense but the similarity values are not in the expected range of 0 to 1. This seems to be a bug in scmap-cell.
When running a test where the reference dataset cells are split into test and train data, the values are in the correct range for all 3 settings (cluster, cell, cell2cluster). However, when applying our own query data the problem occurs with cell (but not cluster) and is then propagated to cell2cluster. We have experienced this issue with 2 unique datasets using 3 different reference datasets.
I would appreciate your help to address this issue!

For reference:
scmap version 1.8.0
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

trouble installing scmap

I want to run scmap on some single cell datasets but ran into the following problems when installing the package. Any help would be much appreciated. Thx!

Installing from Bioconductor (OS X El Capitan 10.11.6 )

source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.5 (BiocInstaller 1.26.1), ?biocLite for help
biocLite("scmap")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.5 (BiocInstaller 1.26.1), R 3.4.2 (2017-09-28).
Installing package(s) ‘scmap’
Warning message:
package ‘scmap’ is not available (for R version 3.4.2)

Installing from github (OS X El Capitan 10.11.6 )

devtools::install_github("hemberg-lab/scmap")
** testing if installed package can be loaded
Creating a generic function for ‘toJSON’ from package ‘jsonlite’ in package ‘googleVis’
Error: package or namespace load failed for ‘scmap’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/Users/dat4git/Library/R/3.4/library/scmap/libs/scmap.so':
dlopen(/Users/dat4git/Library/R/3.4/library/scmap/libs/scmap.so, 6): Library not loaded: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
Referenced from: /Users/dat4git/Library/R/3.4/library/scmap/libs/scmap.so
Reason: Incompatible library version: scmap.so requires version 3.5.0 or later, but libRlapack.dylib provides version 3.4.0
Error: loading failed
Execution halted

Note: Looks like the libRlapack.dylib version is causing the problem. Is there a simple way to fix this on a Mac?

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
...

Also tried installing on Linux ..

devtools::install_github("hemberg-lab/scmap")
Downloading GitHub repo hemberg-lab/scmap@master
from URL https://api.github.com/repos/hemberg-lab/scmap/zipball/master
Installing scmap
'/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.4.0/lib64/R/bin/R'
--no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
'/tmp/15148918.1.interactive/Rtmp7l66Tu/devtools718c38bee8c4/hemberg-lab-scmap-b9cdf52'
--library='/home/unix/dat4git/R/x86_64-pc-linux-gnu-library/3.4'
--install-tests

installing source package 'scmap' ...
** libs
make: Nothing to be done for `all'.
installing to /home/unix/dat4git/R/x86_64-pc-linux-gnu-library/3.4/scmap/libs
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
Creating a generic function for 'toJSON' from package 'jsonlite' in package 'googleVis'
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Creating a generic function for 'toJSON' from package 'jsonlite' in package 'googleVis'
Error: package or namespace load failed for 'scmap' in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/unix/dat4git/R/x86_64-pc-linux-gnu-library/3.4/scmap/libs/scmap.so':
/home/unix/dat4git/R/x86_64-pc-linux-gnu-library/3.4/scmap/libs/scmap.so: invalid ELF header
Error: loading failed
Execution halted
ERROR: loading failed
removing '/home/unix/dat4git/R/x86_64-pc-linux-gnu-library/3.4/scmap'
Installation failed: Command failed (1)

sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)

Matrix products: default
BLAS: /broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.4.0/lib64/R/lib/libRblas.so
LAPACK: /broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.4.0/lib64/R/lib/libRlapack.so
...

error on selectFeatures

Hello,

I use scmap to annotate cell types based on a reference annotation dataset. The reference annotation dataset was downloaded from celldex. However, I encounter an error when I choose HumanPrimaryCellAtlasData. The selectFeatures function runs properly when I choose DatabaseImmuneCellExpressionData. Unfortunately, I want to use HumanPrimaryCellAtlasData for future analysis.

Here is the code that does not work properly (HumanPrimaryCellAtlasData):

ref<-celldex::HumanPrimaryCellAtlasData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)),
colData=colData(ref), rowData=rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)

The error is

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In linearModel(object, n_features) :
Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...

The same code runs correctly if I use DatabaseImmuneCellExpressionData:

ref <- celldex::DatabaseImmuneCellExpressionData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)), colData=colData(ref), rowData=rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)

I have changed logcounts to counts and expanded value by power 10, but that did not work.

I speculate that the error is due to the values of logcounts.
These are the logcounts values for HumanPrimaryCellAtlasData, which reports an error:

These are the logcounts values for DatabaseImmuneCellExpressionData, which runs properly:

Could you please help me solve this problem?

Thanks

Error on scmapCell

Hi,
I am trying to compare the results of scmapCluster and scmapCell on the same query dataset. The scmapCluster function runs smoothly, but I get the following error when I run scmapCell.

Error in NN(w, ncol(subcentroids[[1]]), dists$subcentroids, subclusters, :
Expecting a single value: [extent=0].
Calls: scmapCell -> scmapCell -> NN
Execution halted

I have no idea what is wrong here and I would appreciate it if someone could help!
Here's my code:

pg <- c('Seurat','SummarizedExperiment','scmap','readr','data.table','SingleCellExperiment', 'scater','patchwork')
for(i in pg){
    suppressMessages(library(i, character.only = T))
}
# GSE117988_aftercluster.rds is a Seurat object
HCC <- read_rds("./GSE117988_aftercluster.rds") # Have both cell index and cluster index
query_sce <- SingleCellExperiment(assays = list(counts = as.matrix(HCC@assays$RNA@counts)), colData = HCC@meta.data)
query_sce <- logNormCounts(query_sce)
rowData(query_sce)$feature_symbol <- rownames(query_sce)

a <- list.files("./SingleR_data/")
b <- sapply(strsplit(a , split = "[.]"),"[",1)
ref_list <- list()
for(i in 1:length(b)){
    x = paste("./SingleR_data/",a[i], sep = "")
    # metadata in cell_index is a list ,can't use as.matrix
    ref_list[[i]] <- metadata(read_rds(x))$scmap_cell_index
    names(ref_list)[i] <- b[i]
    print(paste(b[i] , "is load into the ref_list !", sep=" "))
}

scmapCell_results <- scmapCell(query_sce, list(GSE115678 = ref_list$GSE115678), w=10)

Problem with selectFeatures

Dear scmap team,

I am trying to project my scRNA-seq dataset against the Tabula Muris droplet-based reference (which I download from here). This is my code

# Load the reference matrix and metadata
metadata_url <- "https://raw.githubusercontent.com/czbiohub/tabula-muris-vignettes/master/data/TM_droplet_metadata.csv"
tm_mat <- readRDS("data/TM_droplet_mat.rds")
tm_metadata <- read_csv(metadata_url, col_names = TRUE)

# Create SingleCellExperiment object
tm_metadata <- as.data.frame(tm_metadata)
rownames(tm_metadata) <- tm_metadata$cell
tm <- SingleCellExperiment(
  assays = list(counts = tm_mat),
  colData = tm_metadata
)

# Index reference dataset
logcounts(tm) <- log((counts(tm) / colSums(counts(tm)) * 10000) + 1)
rowData(tm)$feature_symbol <- rownames(tm)
tm <- selectFeatures(tm, suppress_plot = FALSE)

However, I get this error: Error in rowSums(count == 0) :
'x' must be an array of at least two dimensions

I looked for this command in your code but couldn't find it. If you could help me with this it would be very helpful. Thanks in advance for that and for your packages, they are amazing!

Best

Ramon
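
Independently of the reported error, the normalization line above deserves a second look. In R, dividing a matrix by a vector recycles the vector down the columns, element by element, so counts(tm) / colSums(counts(tm)) does not divide each cell (column) by its own total. A minimal base R sketch on a toy matrix follows; the names counts_toy, recycled and per_cell are illustrative, and sweep() is one of several ways to do the per-column division.

```r
counts_toy <- matrix(c(1, 3,
                       1, 5),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("geneA", "geneB"), c("cell1", "cell2")))
totals <- colSums(counts_toy)  # per-cell totals: 2 and 8

# Vector recycling: here row 1 is divided by 2 and row 2 by 8.
recycled <- counts_toy / totals

# Per-cell scaling: column j is divided by the total of column j.
per_cell <- sweep(counts_toy, 2, totals, "/")

colSums(recycled)  # not all equal to 1
colSums(per_cell)  # each column sums to 1

# Log-normalized values in the spirit of the original line.
logcounts_toy <- log(per_cell * 10000 + 1)
```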

Big dataset issue

Hi,

Many thanks for creating this tool.

May I ask: when I work with a big dataset (about 100k cells), can I divide it into several smaller subsets, work on those subsets, and then merge the annotated cell metadata of the subsets and assign it to the original dataset?

I ask because I literally cannot run scmap on the big dataset. The error message is as follows:

"Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: run_scmap_seurat ... scmapCell -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted"

Or please could you advise me any other solutions?

Many thanks.

Regards,
Dien
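
The chunking idea can be sketched in base R as below. It rests on an assumption not confirmed on this page: that each query cell is projected independently of the other query cells, so splitting the query by columns should not change the per-cell assignments, while the reference index is built once and reused. annotate_chunk is a hypothetical stand-in for a wrapper around scmapCell() or scmapCluster() that returns one label per cell.

```r
# Hypothetical per-chunk annotator; a real one would call scmap on the chunk.
annotate_chunk <- function(mat) {
  ifelse(colSums(mat) > 10, "typeA", "typeB")
}

# Split the columns (cells) into chunks, annotate each, and merge in order.
annotate_in_chunks <- function(mat, chunk_size, annotate) {
  idx <- seq_len(ncol(mat))
  chunks <- split(idx, ceiling(idx / chunk_size))
  labels <- unlist(lapply(chunks, function(cols) {
    annotate(mat[, cols, drop = FALSE])  # only this chunk is processed at once
  }), use.names = FALSE)
  names(labels) <- colnames(mat)
  labels
}

set.seed(42)
big <- matrix(rpois(4 * 7, lambda = 3), nrow = 4,
              dimnames = list(paste0("gene", 1:4), paste0("cell", 1:7)))
labels <- annotate_in_chunks(big, chunk_size = 3, annotate = annotate_chunk)
```

Because the chunks are consecutive column ranges, the merged labels come back in the original cell order and can be assigned straight to the metadata of the full dataset.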

Some general questions about scmap

Hi there,

The more I use your package, the more I'm liking your method! However, to help me (and maybe others) understand the package better, I thought I will post the following general questions I had in my mind:

What does it mean when the similarity score is NA?
When I run scmapCluster on the same object as the reference (as shown in your example code), I get back a whole SingleCellExperiment object, with just the relevant slots filled with the results, correct? But when I run it on a different object than the reference, then the output is just a list with three elements. Correct? This was just for my understanding, and if there are specific reasons for this, it will be great if you can please clarify!
What exactly is in scmap_scores column in the rowData(object) slot?
And finally, the order of the cells in the output is the same as the order of the cells in the input that we give in the "projection" argument of scmapCluster. Correct? So, essentially, to identify the cells, I can simply assign the names of the cells in the input projection object to the rownames of the output "results" object. Correct?

I would very much appreciate the answers!

Thanks.

Feature selection

Hi there! Thank you for the very useful package. I have a question on selectFeatures and whether other methods, such as using highly variable genes, will be implemented. I realise that the M3Drop preprint suggests that it outperforms other feature selection methods, at least in full-transcript scRNA-seq protocols (I have UMI data, by the way). If I want to implement something myself, is it simply a matter of assigning a logical vector to scmap_features for the genes I want to use?

My problem is that the M3Drop method doesn't identify representative genes that define one of my clusters and hence even when I project the data onto itself, only half of the cells in that cluster are assigned back to that cluster. I've tried increasing the number of features/genes up to 2,000 but it doesn't improve.
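
A custom selection along the lines asked about could look like the base R sketch below, which picks the most variable genes from a toy matrix and builds the logical vector. Whether scmap accepts such a vector in rowData(sce)$scmap_features exactly as shown is the poster's assumption, not something confirmed here. The commented-out lines are untested suggestions, and variance ranking is only one of many possible criteria.

```r
set.seed(1)
# Toy log-expression matrix (genes x cells); values are made up.
expr <- matrix(rnorm(6 * 5), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("cell", 1:5)))

n_features <- 3
gene_var  <- apply(expr, 1, var)  # per-gene variance across cells
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_features)]

# Logical vector in row order: TRUE for the selected genes.
scmap_features <- rownames(expr) %in% top_genes

# With SingleCellExperiment and scmap loaded (not run here):
# rowData(sce)$scmap_features <- scmap_features
# or: sce <- scmap::setFeatures(sce, top_genes)
```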

Saving the Sankey Diagram as a PDF file

Hello!

Just another question, this time about the Sankey diagram function of scmap. I like this very much, as it gives a good overview of how many cells go into which clusters and such. I see that the HTML version of the diagram that you have is also cool, as it is interactive. However, for other purposes like presentations and publications, PDFs would be convenient. So, is there an easy way to save the output of the "getSankey" function as a PDF file?

That will be great! Thanks a lot!

Question about using sorted cell bulk microarray data (ImmGen) as reference

Hello,

Thanks for this helpful package. I have a question about mapping my scRNAseq data to the Immgen microarray data as the reference. As you probably already know, Immgen contains microarray data from hundreds of sorted immune cells from mice. I was able to get something to work, but I'd like to double-check to make sure I'm using this appropriately. Please see my code below:

Setup scRNAseq data

combined is a Seurat object containing 4 separate 10x runs aggregated. There are 15 clusters in the experiments named as numbers from 1 to 15.

sce <- SingleCellExperiment(assays = list(counts = as.matrix(combined@assays$RNA@counts),
                                          logcounts = as.matrix(combined@assays$RNA@data)), 
                            colData = combined@meta.data)

rowData(sce)$feature_symbol <- rownames(sce)

sce <- selectFeatures(sce, suppress_plot = FALSE)

table(rowData(sce)$scmap_features)

sce <- indexCluster(sce, cluster_col = "numeric_clusters")

head(metadata(sce)$scmap_cluster_index)

heatmap(as.matrix(metadata(sce)$scmap_cluster_index))

Setup Reference dataset

I prepared Immgen microarray data by using RMA normalization for microarrays previously and saved as immgen.rds. In this dataset I collapsed the biological replicates (in most of the cells there are 3 replicates) to one datapoint by taking average after normalizations. This is a data frame containing normalized and log2 transformed expression values. I'm reading this data frame as the reference for scmap. immannot provides long names and metadata for the individual samples.

immgen <- readRDS("immgen.rds")
immannot <- readRDS("immannot.rds")

ref_sce <- SingleCellExperiment(assay=list(logcounts = immgen[, 2:dim(immgen)[2]]),
                                colData = immannot)

rownames(ref_sce) <-immgen$GeneName
rowData(ref_sce)$feature_symbol <- immgen$GeneName

Map my own unknown data to immgen reference

scmapCluster_results <- scmapCluster(
  projection = ref_sce, 
  index_list = list(
  oconnell = metadata(sce)$scmap_cluster_index),
  threshold = 0     # I'm keeping this very low to see even the weakly associated cell types
)

# Prepare an informative and easy-to-explore data frame of the results
res_df <- data.frame(label = as.character(scmapCluster_results$scmap_cluster_labs),
                     similarity = as.character(scmapCluster_results$scmap_cluster_siml),
                     immgen_short = immannot$short_name,
                     immgen_type= immannot$reference_cell_type)

head(res_df)

##  label        similarity   immgen_short immgen_type
##1    11 0.871871179824524 MF.11c-11b+.Lu  Macrophage
##2     8 0.491127572913492      MF.Alv.Lu  Macrophage
##3     9 0.666813607696411     Mo.6+2+.BL    Monocyte
##4     9 0.591040179641838    Mo.6+2+.MLN    Monocyte
##5     7 0.882301399545328    Mo.6+2+.SLN    Monocyte
##6     9 0.667127563564599     Mo.6+2-.BL    Monocyte

Can you see anything wrong here? Is it a problem to project between scRNAseq data and bulk reference?

I also cross-posted this on Bioconductor. Feel free to close the issue if this is not the appropriate place to seek help for my question.

Best,
Atakan
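
One detail in the reference setup that may be worth checking: immgen is described as a data frame, and a slice of it is passed directly as the logcounts assay. Converting it to a numeric matrix with gene names as row names first removes one possible source of trouble. The sketch below uses a toy data frame whose values and column names are made up; whether the conversion changes the result in this case is an assumption.

```r
# Toy stand-in for the immgen data frame: first column holds gene names.
immgen_toy <- data.frame(
  GeneName  = c("Cd3e", "Cd19", "Itgam"),
  MF.Alv.Lu = c(1.2, 0.4, 9.8),
  Mo.6.2.BL = c(0.9, 0.7, 8.1),
  stringsAsFactors = FALSE
)

# Drop the gene-name column and convert the rest to a numeric matrix.
expr_mat <- as.matrix(immgen_toy[, -1])
rownames(expr_mat) <- immgen_toy$GeneName

# expr_mat could then be supplied as the logcounts assay (not run here):
# ref_sce <- SingleCellExperiment(assays = list(logcounts = expr_mat),
#                                 colData = immannot)
```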

problem with scmapCluster

Dear scmap team,
I am trying to project my scRNA-seq dataset (pyKid) against the Mus musculus kidney single-cell data (MCA). Below is my code and the error it reports:

Then I used the source code to check my data as follows:

I am confused and looking forward to your help!

Best

ninggongzi

What kind of reference dataset can be used?

Hi,
I have questions about the reference data. In the tutorial, the reference dataset yan is normalized by FPKM. So, must the input be FPKM?
What about raw counts of bulk RNA-seq data, scRNA-seq data, or TPM data from TCGA?
Thank you in advance.
Fiona

getSankey function

The current version of the output is very limited regarding customization. Are there any plans on upgrading that to use more flexible libraries, such as Plotly? https://plot.ly/r/sankey-diagram/

Error using scmapCluster

Hi,

I am trying to project my data to an index I have built, but it yields an error and I do not know how to move on. Based on my search, this function seems to be associated with Seurat, but I do not see where it would be employed since I have transformed the objects into SCE.


scmapCluster_results <- scmapCluster(
  projection = mir128_28H_SCE,
  index_list = list(metadata(zebrafish_Atlas_SCE)$scmap_cluster_index))

Warning: The following arguments are not used: drop
Error in CellsByIdentities(object = object, cells = cells, ...) :
unused argument (drop = FALSE)
In addition: Warning message:
In setFeatures(projection, rownames(index)) :
Features ATP5MD, cct2.1, CR318588.4, EIF1B, h3f3b.1, h3f3b.1.1, h3f3b.1.2, hbae1.3.1, her15.1.1, her4.2.1, hnrnpa0l.1, krt18, mt-co3, rrm2.1, SERP1, sinhcafl.1, zgc:92066 are not present in the 'SCESet' object and therefore were not set.

Here are the objects I am using:

> mir128_28H_SCE

class: SingleCellExperiment
dim: 27310 16186
metadata(0):
assays(1): ''
rownames(27310): aaas aacs ... LOC101886033 LOC100536144
rowData names(1): feature_symbol
colnames(16186): MT28_AAACCCAAGAGTAACT-1 MT28_AAACCCAAGATGCTAA-1 ... WT28_TTTGTTGTCGCCTTGT-1 WT28_TTTGTTGTCTATCGGA-1
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

> zebrafish_Atlas_SCE

class: SingleCellExperiment
dim: 32520 44020
metadata(1): scmap_cluster_index
assays(3): counts logcounts scaledata
rownames(32520): ptpn12 phtf2 ... LAMP5 CABZ01109058.1
rowData names(3): feature_symbol scmap_features scmap_scores
colnames(44020): AAACCTGAGACAGGCT-1 AAACCTGAGAGCAATT-1 ... TTTGTCATCCGGCACA-6 TTTGTCATCCTTGCCA-6
colData names(16): orig.ident percent.mito ... nFeature_RNA ident
reducedDimNames(3): PCA TSNE UMAP
mainExpName: RNA
altExpNames(0):

This is my session info in case it helps:
sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS/LAPACK: /gpfs/ycga/project/nicoli/gb587/conda_envs/r_env/lib/libopenblasp-r0.3.18.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] scmap_1.16.0 anndata_0.7.5.3 reticulate_1.23 basilisk_1.6.0 SingleCellExperiment_1.16.0
[6] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0
[11] S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0 matrixStats_0.61.0 SeuratObject_4.0.4
[16] Seurat_4.1.0

loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-2 deldir_1.0-6 class_7.3-20 ellipsis_0.3.2 ggridges_0.5.3
[7] XVector_0.34.0 proxy_0.4-26 spatstat.data_2.1-2 farver_2.1.0 leiden_0.3.9 listenv_0.8.0
[13] ggrepel_0.9.1 fansi_1.0.2 codetools_0.2-18 splines_4.1.2 polyclip_1.10-0 jsonlite_1.7.3
[19] ica_1.0-2 cluster_2.1.2 png_0.1-7 uwot_0.1.11 shiny_1.7.1 sctransform_0.3.3
[25] spatstat.sparse_2.1-0 compiler_4.1.2 httr_1.4.2 googleVis_0.6.11 assertthat_0.2.1 Matrix_1.4-0
[31] fastmap_1.1.0 lazyeval_0.2.2 later_1.3.0 htmltools_0.5.2 tools_4.1.2 igraph_1.2.11
[37] gtable_0.3.0 glue_1.6.0 GenomeInfoDbData_1.2.7 RANN_2.6.1 reshape2_1.4.4 dplyr_1.0.7
[43] Rcpp_1.0.8 scattermore_0.7 vctrs_0.3.8 nlme_3.1-155 lmtest_0.9-39 stringr_1.4.0
[49] globals_0.14.0 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.1 irlba_2.3.5 goftest_1.2-3
[55] future_1.23.0 basilisk.utils_1.6.0 MASS_7.3-55 zlibbioc_1.40.0 zoo_1.8-9 scales_1.1.1
[61] spatstat.core_2.3-2 promises_1.2.0.1 spatstat.utils_2.3-0 parallel_4.1.2 RColorBrewer_1.1-2 yaml_2.2.1
[67] pbapply_1.5-0 gridExtra_2.3 ggplot2_3.3.5 rpart_4.1-15 stringi_1.7.6 randomForest_4.6-14
[73] e1071_1.7-9 filelock_1.0.2 rlang_0.4.12 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45
[79] ROCR_1.0-11 purrr_0.3.4 tensor_1.5 labeling_0.4.2 patchwork_1.1.1 htmlwidgets_1.5.4
[85] cowplot_1.1.1 tidyselect_1.1.1 parallelly_1.30.0 RcppAnnoy_0.0.19 plyr_1.8.6 magrittr_2.0.1
[91] R6_2.5.1 generics_0.1.1 DelayedArray_0.20.0 DBI_1.1.2 pillar_1.6.4 mgcv_1.8-38
[97] fitdistrplus_1.1-6 survival_3.2-13 abind_1.4-5 RCurl_1.98-1.5 dir.expiry_1.2.0 tibble_3.1.6
[103] future.apply_1.8.1 crayon_1.4.2 KernSmooth_2.23-20 utf8_1.2.2 spatstat.geom_2.3-1 plotly_4.10.0
[109] grid_4.1.2 data.table_1.14.2 digest_0.6.29 xtable_1.8-4 tidyr_1.1.4 httpuv_1.6.5
[115] munsell_0.5.0 viridisLite_0.4.0

Thanks for your help!

Best,
Gabriel

Add 'codetools' to Suggests:

Hello,

your package vignette relies indirectly(*) on the codetools package, which is not guaranteed to be installed on all systems; it's only a recommended package. Because of this, R CMD check on your package fails on such systems (see below).

(*) Searching the code of your package for codetools gives zero hits, which suggests that codetools is used by another package that your package depends on, but only as a soft dependency. Because of this, you need to declare codetools as a dependency in your package.

To fix this, please add codetools to Suggests:, i.e.

Suggests: knitr,
    rmarkdown,
    codetools

Example what happens when codetools is not installed:

* using R version 4.3.1 (2023-06-16)
* using platform: x86_64-pc-linux-gnu (64-bit)
...* checking re-building of vignette outputs ... ERROR
Error(s) in re-building vignettes:
  ...
  --- re-building ‘scmap.Rmd’ using rmarkdown
  Quitting from lines 15-17 [knitr-options] (scmap.Rmd)
  Error: processing vignette 'scmap.Rmd' failed with diagnostics:
  there is no package called 'codetools'
  --- failed re-building ‘scmap.Rmd’    
  SUMMARY: processing the following file failed:
  ‘scmap.Rmd’
    
Error: Vignette re-building failed.
Execution halted

Encountering problems with selectFeatures

Hello,

I am trying to use SCTransformed counts instead of log-transformed counts for training and predicting cell types. I created a Seurat object, ran SCTransform, and then created a SingleCellExperiment object from the SCTransformed counts. The execution fails at this line:
sce_with_sctransformed_counts <- selectFeatures(sce_with_sctransformed_counts, suppress_plot = TRUE), producing the following error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases
In addition: Warning message:
In linearModel(object, n_features) :
  Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...

It does not seem like there are any NA values in the logcounts

any(is.na(logcounts(sce_with_sctransformed_counts)))
[1] FALSE

I suspected that since the sctransformed values are negative, it could be a potential source of error. Hence, I executed the following:

logcounts(sce_with_sctransformed_counts) <- log2(normcounts(sce_with_sctransformed_counts) + 5)

But I still keep getting the same error.

Could you please help me with this?

This is a portion of the code I am using.

sce_seurat <- CreateSeuratObject(counts = Data[,Train_Idx[[i]]])
sce_seurat <- SCTransform(object = sce_seurat)
sce_with_sctransformed_counts = SingleCellExperiment(assays = list(normcounts = GetAssayData(object = sce_seurat, slot = "scale.data")), 
colData = data.frame(cell_type1 = Labels[Train_Idx[[i]]]))
rowData(sce_with_sctransformed_counts)$feature_symbol <- rownames(sce_with_sctransformed_counts)
logcounts(sce_with_sctransformed_counts) <- log2(normcounts(sce_with_sctransformed_counts) + 5)			
sce_with_sctransformed_counts <- selectFeatures(sce_with_sctransformed_counts, suppress_plot = TRUE)

Thank you.

scmapCell error

Hi,

I am running into a nearest neighbor computing error when running scmapCell function. Have you seen similar errors? What are some steps that I can take to resolve it?

Best
Yuqi

sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] SingleCellExperiment_1.6.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1 matrixStats_0.55.0
[6] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3 S4Vectors_0.22.1
[11] BiocGenerics_0.30.0 scmap_1.6.0 singleCellNet_0.1.0 cowplot_1.0.0 reshape2_1.4.3
[16] pheatmap_1.0.12 dplyr_0.8.3 ggplot2_3.2.1

loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 purrr_0.3.3 lattice_0.20-38 expm_0.999-4 colorspace_1.4-1 yaml_2.2.0 rlang_0.4.2
[8] e1071_1.7-3 pillar_1.4.3 glue_1.3.1 withr_2.1.2 RColorBrewer_1.1-2 GenomeInfoDbData_1.2.1 lifecycle_0.1.0
[15] plyr_1.8.5 stringr_1.4.0 zlibbioc_1.30.0 munsell_0.5.0 gtable_0.3.0 mvtnorm_1.0-11 codetools_0.2-16
[22] class_7.3-15 Rcpp_1.0.3 scales_1.1.0 jsonlite_1.6 XVector_0.24.0 googleVis_0.6.4 stringi_1.4.3
[29] grid_3.6.1 tools_3.6.1 bitops_1.0-6 DescTools_0.99.31 magrittr_1.5 proxy_0.4-23 lazyeval_0.2.2
[36] RCurl_1.95-4.12 tibble_2.1.3 randomForest_4.6-14 crayon_1.3.4 pkgconfig_2.0.3 MASS_7.3-51.4 Matrix_1.2-18
[43] assertthat_0.2.1 rstudioapi_0.10 boot_1.3-23 R6_2.4.1 compiler_3.6.1 `

scmap and RF/SVM comparisons

Hi there,

I just came across your scmap R package. Pretty cool, and I am about to try it for my 10X data.

While going through your preprint, I saw that in your comparisons scmap slightly under-performed compared to the RF and SVM classifiers. Could you shed some more light on this? scmap readily offers a threshold with which I can control the "unclassified" cases, and I really like this! Building an RF classifier myself would take more time, so I am planning to use scmap. However, I would like to understand fully how scmap compares to RF classifiers, so that I can answer any such questions from the biologist whose data I am analyzing. Any input from your group would be appreciated!

By the way, when do you expect the full paper to be published?

Investigating bad self-mapping of a dataset

Hello. I'm trying to use scmap to analyse several related datasets and have encountered a problem: my reference dataset maps badly to itself (scmap-cluster method, kappa score of 0.017 and an unassigned fraction of 52%). I've tried initialising logcounts to log2(rawcounts + 1), to Seurat's normalised counts and to Seurat's scaled counts. The first two strategies yield almost the same result, and the last one results in this error: “scmap index is empty because the median expression in the selected features is 0 in every cell cluster!”. I think I'm doing something wrong when initialising the SingleCellExperiment, or possibly when using scmap, but I don't know how to investigate that. I would be grateful for any suggestions.
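
The error message itself points at per-cluster medians of the selected features being zero, so one way to investigate is to compute those medians by hand. The sketch below does this in base R on a made-up matrix; `expr`, `clusters` and `cluster_medians` are hypothetical names for illustration, not scmap objects.

```r
# Toy log-expression matrix: 4 genes x 6 cells, two clusters of 3 cells each.
expr <- matrix(c(0, 0, 3,  0, 0, 0,   # geneA: mostly zero
                 2, 3, 4,  0, 0, 1,   # geneB: expressed in cluster c1
                 0, 0, 0,  5, 6, 4,   # geneC: expressed in cluster c2
                 0, 1, 0,  0, 0, 2),  # geneD: sparse everywhere
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("gene", LETTERS[1:4]), paste0("cell", 1:6)))
clusters <- c("c1", "c1", "c1", "c2", "c2", "c2")

# Median expression of every gene within each cluster.
cluster_medians <- sapply(split(seq_along(clusters), clusters), function(idx) {
  apply(expr[, idx, drop = FALSE], 1, median)
})

# Genes whose median is zero in every cluster carry no signal in a median-based index.
all_zero <- rownames(cluster_medians)[rowSums(cluster_medians != 0) == 0]
print(cluster_medians)
print(all_zero)
```

With very sparse data, a gene's median within a cluster can be zero even when its mean is not, so running a check like this on the selected features and the cluster labels actually used may show whether the index has anything to work with.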

Order of cells in "scmapCluster_results$scmap_cluster_labs"

Hi there again,

I have another question for you. Is the order of the cells in the "scmapCluster_results$scmap_cluster_labs" column exactly the same as the order of the cells in the input to the "projection" argument to the "scmapCluster" function? Since there are no rownames in "scmapCluster_results", I was a bit confused about this. Adding rownames to the output would be a great addition! This order is very important so that I can identify the original cells and trace them back to the new labels.

Many thanks for your response!
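
The thread does not confirm the ordering, so the following is only a sketch of how labels could be tied back to cell names if the rows of the result do follow the column order of the projection object. It runs on mock stand-ins in base R; `query_cells` and `labs` are hypothetical substitutes for `colnames(projection)` and `scmapCluster_results$scmap_cluster_labs`.

```r
# Mock stand-ins (hypothetical): 'query_cells' plays the role of colnames(projection),
# 'labs' the role of the label matrix, with one column per reference.
query_cells <- c("AAAC-1", "AAAG-1", "AACT-1")
labs <- matrix(c("B cell", "unassigned", "T cell"), ncol = 1,
               dimnames = list(NULL, "ref"))

# Guard against a silent length mismatch before pairing names with labels.
stopifnot(nrow(labs) == length(query_cells))

# Assumption: row i of the labels corresponds to column i of the query object.
labelled <- data.frame(cell = query_cells, label = labs[, "ref"],
                       stringsAsFactors = FALSE)
print(labelled)
```

The length check only catches dropped or duplicated cells; it cannot prove that the order is preserved, which is why the assumption is stated explicitly.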

broken URL in README

Hi there @wikiselev

I think the URL in the README directing users to the Hemberg Lab cloud service is broken. I was redirected to all sorts of wild stuff.

Here is the issue:
"""
A Cloud implementation of scmap can be used for free without any restriction here, i.e. http://www.hemberg-lab.cloud/scmap
"""

Might be best to just remove the URL until the domain issues (or related problems) are solved.

Thanks, Evan

Cancer tumor reference dataset

Hello. Do you have a recommended list or source of cancer tumor datasets that can be used as a reference for projecting samples in scmap?

How close to a cluster is an unassigned cell?

Is there a way to find out which cluster a cell classified as unassigned is closest to?
Is there a way to output a confusion matrix?
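
On the confusion matrix: once known and assigned labels are available as two vectors of equal length, base R's table() can cross-tabulate them. The example below is a minimal sketch with made-up labels; `truth` and `predicted` are hypothetical vectors, not scmap output.

```r
# Made-up labels for six cells; 'truth' and 'predicted' are hypothetical vectors.
truth     <- c("alpha", "alpha", "beta", "beta", "delta", "delta")
predicted <- c("alpha", "unassigned", "beta", "beta", "beta", "delta")

# Cross-tabulate known labels (rows) against assigned labels (columns).
confusion <- table(truth = truth, predicted = predicted)
print(confusion)

# Fraction of cells left unassigned, and accuracy over all cells.
unassigned_frac <- mean(predicted == "unassigned")
accuracy <- mean(truth == predicted)
```

Keeping "unassigned" as its own column makes it visible which true cell types most often fall below the threshold.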

Different number of elements passed to dot() function by scmapCell

I am getting the following error when I try to run the scmapCell function:

library(scmap)
library(SingleCellExperiment)

source <- readRDS("source.rds")
target <- readRDS("target.rds")

counts(target) <- normcounts(target) # normcounts have the same zeros as raw counts

target <- selectFeatures(target, n_features = 1000)
target <- indexCell(target)
# Parameter M was not provided, will use M = n_features / 10 (if n_features <= 1000), where n_features is the number of selected features, and M = 100 otherwise.
# Parameter k was not provided, will use k = sqrt(number_of_cells)

result <- scmapCell(source, index_list = list(metadata(target)$scmap_cell_index))
# error: dot(): objects must have the same number of elements
# Error in NN(w, ncol(subcentroids[[1]]), dists$subcentroids, subclusters,  : 
#  dot(): objects must have the same number of elements

I have uploaded the two SCE objects to Dropbox so you can have a reproducible example:

https://www.dropbox.com/sh/tzu8v67raw7g7bd/AADAZzyeNc0iU88ZdeaVgvWza?dl=0

Session information is listed below:

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggalluvial_0.12.1           scmap_1.9.3                 scran_1.16.0                scater_1.16.2              
 [5] ggplot2_3.3.2               SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.2 DelayedArray_0.14.1        
 [9] matrixStats_0.56.0          Biobase_2.48.0              GenomicRanges_1.40.0        GenomeInfoDb_1.24.2        
[13] IRanges_2.22.2              S4Vectors_0.26.1            BiocGenerics_0.34.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5                rsvd_1.0.3                locfit_1.5-9.4            lattice_0.20-41          
 [5] class_7.3-17              digest_0.6.25             packrat_0.5.0             plyr_1.8.6               
 [9] R6_2.4.1                  e1071_1.7-3               pillar_1.4.6              zlibbioc_1.34.0          
[13] rlang_0.4.7               rstudioapi_0.11           irlba_2.3.3               googleVis_0.6.6          
[17] Matrix_1.2-18             labeling_0.3              BiocNeighbors_1.6.0       BiocParallel_1.22.0      
[21] statmod_1.4.34            stringr_1.4.0             igraph_1.2.5              RCurl_1.98-1.2           
[25] munsell_0.5.0             proxy_0.4-24              compiler_4.0.2            vipor_0.4.5              
[29] BiocSingular_1.4.0        xfun_0.16                 pkgconfig_2.0.3           ggbeeswarm_0.6.0         
[33] tidyselect_1.1.0          tibble_3.0.3              gridExtra_2.3             GenomeInfoDbData_1.2.3   
[37] edgeR_3.30.3              codetools_0.2-16          randomForest_4.6-14       viridisLite_0.3.0        
[41] crayon_1.3.4              dplyr_1.0.2               withr_2.2.0               bitops_1.0-6             
[45] grid_4.0.2                jsonlite_1.7.0            gtable_0.3.0              lifecycle_0.2.0          
[49] magrittr_1.5              scales_1.1.1              dqrng_0.2.1               stringi_1.4.6            
[53] farver_2.0.3              reshape2_1.4.4            XVector_0.28.0            viridis_0.5.1            
[57] limma_3.44.3              DelayedMatrixStats_1.10.1 ellipsis_0.3.1            generics_0.0.2           
[61] vctrs_0.3.2               tools_4.0.2               glue_1.4.1                beeswarm_0.2.3           
[65] purrr_0.3.4               colorspace_1.4-1          knitr_1.29

Quite low accuracy using index cluster

Hi, I am trying to use scmap to predict labels for an unlabeled dataset (let's call it the query) based on a reference dataset. I use the scmap cluster index, following this vignette. The prediction accuracy is very low, only about 0.007. However, the accuracy produced by scmapCellindex is quite high, about 0.91. I checked other papers which also used scmap to predict labels, and it seems the indexCluster function worked well in their experiments. I don't know whether there is a mistake in my code.

I checked the prediction results, and there are lots of "unassigned" labels in them. Can you give an example of how to use the scmap cluster index to predict labels by projecting a query dataset onto a reference dataset?

Outdated References

Q: Is scmap published?
A: Not yet, but a copy of scmap manuscript is available on bioRxiv.

That is no longer the case. Also, the CITATION file requires updating.

Normalisation best practices

Hi!
I'm following the tutorial here https://github.com/hemberg-lab/tutorials/blob/master/scmap/scmap.Rmd and the vignettes here http://bioconductor.org/packages/release/bioc/manuals/scmap/man/scmap.pdf and trying to project one of my 10X experiments onto another 10X experiment.

I suspect the datasets in the tutorial and the yan dataset in the vignettes are already normalised, so you can just initialise the SCE as SingleCellExperiment(assays = list(normcounts = as.matrix(yan)), colData = ann), directly providing normcounts. My datasets are just counts, and I tried to normalise them using the normalise function from scmap itself, but it is not exported when I call require(scmap).

I would appreciate any suggestions. Perhaps I can use Seurat's normalisation with an SCE object?
Thank you
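
One simple option, sketched here in base R as an assumption rather than the authors' recommendation, is library-size scaling followed by log2(x + 1), the same transform mentioned in another issue on this page. The matrix and the names `raw`, `norm` and `log_norm` are made up for illustration.

```r
# Toy raw count matrix: 3 genes x 4 cells (made-up numbers).
raw <- matrix(c(10, 0, 5, 20,
                 0, 2, 5,  0,
                30, 8, 0, 20),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Scale every cell to the median library size, then log-transform.
lib_size <- colSums(raw)
size_factor <- lib_size / median(lib_size)
norm <- sweep(raw, 2, size_factor, "/")
log_norm <- log2(norm + 1)

# Zeros in the raw counts stay zeros after this transform.
print(round(log_norm, 2))
```

In an SCE workflow the raw and transformed matrices would go into the counts and logcounts assays respectively. An earlier issue on this page uses scater::logNormCounts(sce) for this step.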

selectFeatures() Error

Hi,

I'm attempting to use scmap on an object that I have converted from Seurat to SCE with Seurat's Convert() function.

When attempting to call selectFeatures(), I receive the following error:

Error in rowSums(count == 0) : 'x' must be an array of at least two dimensions

I looked through the code on the following pages:
Core Methods
Utils

It looks like these lines from the linearModel() function are causing the error:

count <- as.matrix(counts(object))
dropouts <- rowSums(count == 0)/cols * 100

I am confused, though: if I run these lines in isolation, I receive no errors, and my 'count == 0' matrix is a normal matrix (i.e. it has at least two dimensions). I also don't see anything else in the code that might change the object in a way that causes this error. As I understand it, the object I pass to selectFeatures() is passed on to linearModel(), which then pulls out the counts, so shouldn't running the two lines in isolation cause the error as well?

Any advice would be greatly appreciated,

Liam
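
The message itself comes from base R's rowSums(), which raises it whenever its argument has fewer than two dimensions. One guess, not verified against scmap's code, is that the converted object stores counts in a form that does not become an ordinary matrix when as.matrix() is called inside the package, even though it does in an interactive session. The base-R sketch below reproduces the message on toy data and shows a cheap check; all names are hypothetical.

```r
# A plain matrix works: percentage of zero entries per gene.
count <- matrix(c(0, 3, 0, 1,
                  2, 0, 0, 0), nrow = 2, byrow = TRUE,
                dimnames = list(c("geneA", "geneB"), NULL))
dropouts <- rowSums(count == 0) / ncol(count) * 100

# Anything that has lost its dim attribute triggers the reported error.
flat <- as.vector(count)
msg <- tryCatch(rowSums(flat == 0), error = function(e) conditionMessage(e))

# A cheap check to run on the real counts before calling selectFeatures().
is_plain_matrix <- is.matrix(count) && length(dim(count)) == 2
print(dropouts)
print(msg)
```

If the check fails on the real data, storing a dense base matrix in the counts assay before calling selectFeatures() is one thing to try, with the caveat that dense conversion can be memory-hungry for large datasets.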
