hemberg-lab / scmap
A tool for unsupervised projection of single cell RNA-seq data
Home Page: http://bioconductor.org/packages/scmap
License: GNU General Public License v3.0
Hi there,
The help page of scmapCluster mentions that we can potentially give the probability values for the SVM/RF classifiers through the "threshold" argument. There are also such functions for SVM and RF in the "Utils.R" script on scmap's repository here: https://github.com/hemberg-lab/scmap/blob/92566d3db92d5595eedbdf20c7f03aa2a1a053ac/R/Utils.R
So, can we actually get scmap to run the SVM and RF classification methods that you use in the paper on any given dataset?
Just wondering, and would appreciate your answer!
Thanks!
When I print my heat map, several rows are empty and I am not sure why. I have tried it with several datasets and still get the same issue.
Hello Vladimir Kiselev and Martin Hemberg,
I got an error with this code:
SAMPLE <- Seurat::Read10X("./00_Data/Sample_matrix_10X/")
sce <- SingleCellExperiment::SingleCellExperiment(assays=list(counts= SAMPLE ))
seur <- Seurat::CreateSeuratObject(SAMPLE )
sce <- Seurat::as.SingleCellExperiment(seur)
sce <- scater::logNormCounts(sce)
rowData(sce)$feature_symbol <- rownames(sce)
#Feature selection
sce <- scmap::selectFeatures(sce, suppress_plot = FALSE)
##error
sce <- indexCluster(sce)
Error in scmap::indexCluster(sce) :
Please define an existing cluster column of the colData
slot of the input object using the cluster_col
parameter!
Thanks a lot!
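The error message above asks for a `cluster_col` that exists in `colData`. A minimal sketch of how one might check for a usable column before indexing is shown here. It uses a plain data frame as a stand-in for the cell metadata, and the helper `find_cluster_col` and the candidate column names are hypothetical, chosen for illustration only.

```r
# Stand-in for as.data.frame(colData(sce)) after converting a Seurat object;
# the column names here are illustrative assumptions.
cell_meta <- data.frame(
  orig.ident = c("S1", "S1", "S2"),
  nCount_RNA = c(1200, 980, 1500),
  ident      = factor(c("0", "1", "0"))
)

# Hypothetical helper: return the first candidate column that is present.
find_cluster_col <- function(meta,
                             candidates = c("cell_type1", "ident", "seurat_clusters")) {
  hit <- candidates[candidates %in% colnames(meta)]
  if (length(hit) == 0) {
    stop("No cluster column found; add one to colData before indexing.")
  }
  hit[1]
}

cluster_col <- find_cluster_col(cell_meta)
cluster_col

# With a real object, the call would then look something like:
# sce <- scmap::indexCluster(sce, cluster_col = cluster_col)
```

If no clustering has been run yet, there is nothing for the index to summarise, so a cluster or cell-type column would have to be added to the metadata first.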
Hi, thanks for the insightful work! Now I am using scmap to annotate query data based on reference data. I wonder how to get the confidence score of the predicted cell type.
Following is my core code:
# scmap Cluster
ref_sce <- scmap::indexCluster(ref_sce)
scmapCluster_results <- scmapCluster(projection = query_sce,
index_list = list(yan = metadata(ref_sce)$scmap_cluster_index),
threshold=0)
pred <- scmapCluster_results$scmap_cluster_labs
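Other snippets on this page read a similarity value from `scmapCluster_results$scmap_cluster_siml` alongside the labels. The sketch below pairs the two using a mock result list. The field names follow those snippets, but the values, the one-column-per-reference layout, and the 0.7 cutoff are assumptions for illustration.

```r
# Mock of the list returned by scmapCluster(); field names follow the
# snippets quoted on this page, the values are made up.
scmapCluster_results <- list(
  scmap_cluster_labs = matrix(c("zygote", "2cell", "4cell"), ncol = 1,
                              dimnames = list(NULL, "yan")),
  scmap_cluster_siml = matrix(c(0.91, 0.78, 0.42), ncol = 1,
                              dimnames = list(NULL, "yan"))
)

# Pair each predicted label with its similarity value.
pred <- data.frame(
  label      = scmapCluster_results$scmap_cluster_labs[, "yan"],
  similarity = scmapCluster_results$scmap_cluster_siml[, "yan"],
  stringsAsFactors = FALSE
)

# Flag low-similarity calls after the fact; 0.7 is an arbitrary cutoff.
pred$confident <- pred$similarity >= 0.7
pred
```

Running with `threshold = 0` and filtering afterwards, as sketched, keeps the closest label for every cell while still letting weak matches be inspected.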
Hi, scmap team. I have trouble in selectFeatures function.
When I run this code,
sce <- selectFeatures(sce, n_features = 500, suppress_plot = FALSE)
I get this error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
I looked up this command in your code and found that this error came from 'LinearModel' function.
One reason for this error probably is that 'isSpike' is deprecated.
Besides isSpike, my input data is already processed so all values in 'dropouts' are '0', therefore all genes are filtered out.
I have no idea how to solve this problem.
My script is:
sce <- SingleCellExperiment(assays = list(normcounts = as.matrix(refmatrix)), colData = annot)
logcounts(sce) <- log2(normcounts(sce) + 1)
rowData(sce)$feature_symbol <- rownames(sce)
sce <- sce[!duplicated(rownames(sce)), ]
sce
sce <- selectFeatures(sce, n_features = 500, suppress_plot = FALSE)
Thanks for your help.
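The poster's reading is that every gene is filtered out because all dropout values are 0. A small base R diagnostic can confirm that before calling `selectFeatures`. This is a sketch that assumes, as the post suggests, that the selection step only uses genes whose dropout rate is strictly between 0% and 100%; the toy matrix is made up.

```r
# Toy expression matrix (genes x cells). gene_a has no zeros, gene_b is zero
# everywhere, gene_c has zeros in some cells only.
expr <- rbind(
  gene_a = c(5, 3, 8, 2),
  gene_b = c(0, 0, 0, 0),
  gene_c = c(0, 4, 0, 7)
)

# Percentage of cells in which each gene is exactly zero.
dropout_pct <- rowSums(expr == 0) / ncol(expr) * 100

# Genes with 0% or 100% dropout carry no information for a dropout-based fit.
usable <- dropout_pct > 0 & dropout_pct < 100
sum(usable)
```

If `sum(usable)` is 0 on the real matrix, a regression on those genes has no cases to fit, which is consistent with the `0 (non-NA) cases` error reported here.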
Hello,
Thank you for developing scmap.
I was running a self-projection of Tabula Muris dataset to test the method and found that on the SMART-seq version of the dataset the proportion of 'unassigned' cells is very high - up to 50%. This is apparently caused by similarity metrics falling below the specified threshold. The same analysis on the Tabula Muris UMI-based dataset yields much higher assignment rate, over 90% (data from this publication: https://doi.org/10.1186/s13059-019-1795-z).
I was wondering what could be a reason for this difference between technologies? Any suggestions would be appreciated.
Hi,
I like this package very much, but I have a small problem with scmap-cluster.
After running the whole projection workflow, the results were a little confusing. The clusters changed and the proportions of the different clusters are quite different. Also, there are some unassigned cells. As far as I know, the clusters should not change after the process, so I don't know what the problem is.
Thanks very much; I hope to get a reply as soon as possible.
BJ
When I draw a heatmap after indexing the clusters, it comes out as a big blank. Is this normal?
Here is my code:
proj.sce <- SingleCellExperiment(assays = list(normcounts = proj.mat), colData = proj.cellname)
logcounts(proj.sce) <- log2(normcounts(proj.sce)+1)
rowData(proj.sce)$feature_symbol <- rownames(proj.sce)
proj.sce <- selectFeatures(proj.sce, suppress_plot=F)
proj.sce <- indexCluster(proj.sce)
heatmap(as.matrix(metadata(proj.sce)$scmap_cluster_index))
Dear Dr. Kiselev, Dr. Hemberg,
I have been encountering an issue when running the scmap-cluster pipeline and would like to ask you for some input on this matter. It seems that independent runs result in variable and inconsistent assignments. There seem to be two main sets of results, one of which seems fair based on prior knowledge about the composition of the query dataset. The second set of results comprises assignments that seem to reflect an imperfect match rather than a complete mess. While it might still be to some degree possible that the assignment that I think is problematic is actually correct, the issue remains of the variability in the mapping results. I should stress that the "wrong" mapping occurs much more frequently than the one I feel would be correct, if I run scmap from scratch repeatedly, suggesting it may be right after all. Still, the variability troubles me.
I have possibly narrowed down the steps resulting in variable outcomes to the reference normalization (see script below). I tried increasing the number of selected features thinking that this may mitigate the impact of the previous steps, but that didn't seem to be the case.
sf2 <- 2^rnorm(ncol(sce2))
sf2 <- sf2/mean(sf2)
normcounts(sce2) <- t(t(counts(sce2))/sf2)
counts(sce2) <- normcounts(sce2)
logcounts(sce2) <- log2(normcounts(sce2)+1)
rowData(sce2)$feature_symbol<-rownames(sce2)
I would be happy to receive some suggestions from your side.
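The script above draws its size factors from `rnorm`, so each fresh run normalizes the reference with different random numbers. The base R sketch below shows that fixing the seed makes the draw repeatable, and shows a deterministic alternative based on library size. The toy matrix and variable names are illustrative, and the library-size factors are one common choice rather than a recommendation from the scmap authors.

```r
raw_counts <- matrix(c(10, 0, 5, 20, 3, 7, 0, 12), nrow = 2,
                     dimnames = list(c("g1", "g2"), paste0("cell", 1:4)))

# Random size factors, as in the posted script: different on every call
# unless the RNG seed is fixed first.
set.seed(1)
sf_a <- 2^rnorm(ncol(raw_counts))
set.seed(1)
sf_b <- 2^rnorm(ncol(raw_counts))
identical(sf_a, sf_b)

# Deterministic alternative: library-size factors scaled to mean 1.
lib_sf <- colSums(raw_counts) / mean(colSums(raw_counts))
norm_counts <- t(t(raw_counts) / lib_sf)
log_counts <- log2(norm_counts + 1)
```

With random factors, the selected features and cluster centroids can shift between runs, which would be one plausible source of the run-to-run variability described.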
Hi there, I have been using scmap cell2cluster to annotate both human and mouse data sets. The cell type annotation results that we get seem to make sense but the similarity values are not in the expected range of 0 to 1. This seems to be a bug in scmap-cell.
When running a test where the reference dataset cells are split into test and train data, the values are in the correct range for all 3 settings (cluster, cell, cell2cluster). However, when applying our own query data the problem occurs with cell (but not cluster) and is then propagated to cell2cluster. We have experienced this issue with 2 unique datasets using 3 different reference datasets.
I would appreciate your help to address this issue!
For reference:
scmap version 1.8.0
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
I want to run scmap on some single cell datasets but ran into the following problems when installing the package. Any help would be much appreciated. Thx!
source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.5 (BiocInstaller 1.26.1), ?biocLite for help
biocLite("scmap")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.5 (BiocInstaller 1.26.1), R 3.4.2 (2017-09-28).
Installing package(s) ‘scmap’
Warning message:
package ‘scmap’ is not available (for R version 3.4.2)
devtools::install_github("hemberg-lab/scmap")
** testing if installed package can be loaded
Creating a generic function for ‘toJSON’ from package ‘jsonlite’ in package ‘googleVis’
Error: package or namespace load failed for ‘scmap’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/Users/dat4git/Library/R/3.4/library/scmap/libs/scmap.so':
dlopen(/Users/dat4git/Library/R/3.4/library/scmap/libs/scmap.so, 6): Library not loaded: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
Referenced from: /Users/dat4git/Library/R/3.4/library/scmap/libs/scmap.so
Reason: Incompatible library version: scmap.so requires version 3.5.0 or later, but libRlapack.dylib provides version 3.4.0
Error: loading failed
Execution halted
Note: Looks like the libRlapack.dylib version is causing the problem. Is there a simple way to fix this on a Mac?
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
...
devtools::install_github("hemberg-lab/scmap")
Downloading GitHub repo hemberg-lab/scmap@master
from URL https://api.github.com/repos/hemberg-lab/scmap/zipball/master
Installing scmap
'/broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.4.0/lib64/R/bin/R'
--no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL
'/tmp/15148918.1.interactive/Rtmp7l66Tu/devtools718c38bee8c4/hemberg-lab-scmap-b9cdf52'
--library='/home/unix/dat4git/R/x86_64-pc-linux-gnu-library/3.4'
--install-tests
sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.9 (Santiago)
Matrix products: default
BLAS: /broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.4.0/lib64/R/lib/libRblas.so
LAPACK: /broad/software/free/Linux/redhat_6_x86_64/pkgs/r_3.4.0/lib64/R/lib/libRlapack.so
...
Hello,
I use scmap to annotate cell types based on a reference annotation dataset downloaded from celldex. However, I encounter an error when I choose HumanPrimaryCellAtlasData, whereas selectFeatures runs properly when I choose DatabaseImmuneCellExpressionData. Unfortunately, I want to use HumanPrimaryCellAtlasData for future analysis.
Here is the code that does not work properly (HumanPrimaryCellAtlasData):
ref<-celldex::HumanPrimaryCellAtlasData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)),
colData=colData(ref), rowData=rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)
The error is
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In linearModel(object, n_features) :
Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...
The same code runs correctly if I use DatabaseImmuneCellExpressionData:
ref <- celldex::DatabaseImmuneCellExpressionData()
colData(ref)$cell_type1 <- colData(ref)$label.fine
rowData(ref)$feature_symbol <- rownames(ref)
ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)), colData=colData(ref), rowData=rowData(ref))
ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)
I have changed logcounts to counts and expanded value by power 10, but that did not work.
I speculate the error is due to the value of logcounts.
This is the logcounts value on HumanPrimaryCellAtlasData which report an error
This is the logcounts value on DatabaseImmuneCellExpressionData which can run properly.
Could you please help me solve this problem?
Thanks
Hi,
I am trying to compare the results of scmapCluster and scmapCell on the same query dataset. The scmapCluster function runs smoothly, but I get the following error when I run scmapCell:
Error in NN(w, ncol(subcentroids[[1]]), dists$subcentroids, subclusters, :
Expecting a single value: [extent=0].
Calls: scmapCell -> scmapCell -> NN
Execution halted
I have no idea what is wrong here and would appreciate any help!
Here's my code:
pg <- c('Seurat','SummarizedExperiment','scmap','readr','data.table','SingleCellExperiment', 'scater','patchwork')
for(i in pg){
suppressMessages(library(i, character.only = T))
}
# GSE117988_aftercluster.rds is a Seurat object
HCC <- read_rds("./GSE117988_aftercluster.rds") # Have both cell index and cluster index
query_sce <- SingleCellExperiment(assays = list(counts = as.matrix(HCC@assays$RNA@counts)), colData = HCC@meta.data)
query_sce <- logNormCounts(query_sce)
rowData(query_sce)$feature_symbol <- rownames(query_sce)
a <- list.files("./SingleR_data/")
b <- sapply(strsplit(a , split = "[.]"),"[",1)
ref_list <- list()
for(i in 1:length(b)){
x = paste("./SingleR_data/",a[i], sep = "")
# metadata in cell_index is a list ,can't use as.matrix
ref_list[[i]] <- metadata(read_rds(x))$scmap_cell_index
names(ref_list)[i] <- b[i]
print(paste(b[i] , "is load into the ref_list !", sep=" "))
}
scmapCell_results <- scmapCell(query_sce, list(GSE115678 = ref_list$GSE115678), w=10)
Dear scmap team,
I am trying to project my scRNA-seq dataset against the Tabula Muris droplet-based reference (which I download from here). This is my code
# Load the reference matrix and metadata
metadata_url <- "https://raw.githubusercontent.com/czbiohub/tabula-muris-vignettes/master/data/TM_droplet_metadata.csv"
tm_mat <- readRDS("data/TM_droplet_mat.rds")
tm_metadata <- read_csv(metadata_url, col_names = TRUE)
# Create SingleCellExperiment object
tm_metadata <- as.data.frame(tm_metadata)
rownames(tm_metadata) <- tm_metadata$cell
tm <- SingleCellExperiment(
assays = list(counts = tm_mat),
colData = tm_metadata
)
# Index reference dataset
logcounts(tm) <- log((counts(tm) / colSums(counts(tm)) * 10000) + 1)
rowData(tm)$feature_symbol <- rownames(tm)
tm <- selectFeatures(tm, suppress_plot = FALSE)
However, I get this error: Error in rowSums(count == 0) :
'x' must be an array of at least two dimensions
I looked for this command in your code but couldn't find it. If you could help me with this it would be very helpful. Thanks in advance for that and for your packages, they are amazing!
Best
Ramon
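Separately from the reported error, the normalization line in the script above divides a matrix by `colSums(...)` directly. In R that recycles the vector down the rows, so columns are not divided by their own totals. The sketch below shows the difference on a toy matrix; it may or may not be related to the `rowSums` error, which is not diagnosed here.

```r
counts <- matrix(c(1, 3, 10, 30), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("cellA", "cellB")))
lib_size <- colSums(counts)  # one total per cell (column)

# Dividing a matrix by a vector recycles the vector down the rows,
# so this does NOT divide each column by its own library size.
wrong <- counts / lib_size

# Either of these divides column j by lib_size[j].
right <- sweep(counts, 2, lib_size, "/")
same  <- t(t(counts) / lib_size)

colSums(wrong)
colSums(right)
```

After a correct per-cell normalization every column sums to 1 before scaling, which is a quick sanity check to run on the real matrix.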
Hi,
Many thanks for creating this tool.
Please may I ask: when I work with a big dataset (say about 100k cells), can I divide the dataset into several smaller subsets and work on those subsets, then merge the annotated cell metadata of those subsets and assign it to the original dataset?
I ask because I literally cannot run scmap on the big dataset. The error message is as follows:
"Error in asMethod(object) :
Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
Calls: run_scmap_seurat ... scmapCell -> as.matrix -> as.matrix.Matrix -> as -> asMethod
Execution halted"
Or please could you advise me any other solutions?
Many thanks.
Regards,
Dien
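The split-annotate-merge idea in this question can be sketched in base R as follows. The function `annotate_cells` is a hypothetical stand-in for a per-cell projection against a fixed reference index, and the sketch assumes each query cell is labelled independently of the other query cells, which should be verified for the real workflow.

```r
# Hypothetical stand-in for a per-cell annotation step; it labels each cell
# by whichever of two toy genes is higher.
annotate_cells <- function(mat) {
  data.frame(cell  = colnames(mat),
             label = unname(ifelse(mat["g1", ] > mat["g2", ], "typeA", "typeB")),
             stringsAsFactors = FALSE)
}

set.seed(42)
query <- matrix(rpois(2 * 10, lambda = 5), nrow = 2,
                dimnames = list(c("g1", "g2"), paste0("cell", 1:10)))

# Split column indices into chunks of at most `chunk_size` cells.
chunk_size <- 4
chunks <- split(seq_len(ncol(query)),
                ceiling(seq_len(ncol(query)) / chunk_size))

# Annotate each chunk, then stack the results in the original cell order.
per_chunk <- lapply(chunks, function(idx) {
  annotate_cells(query[, idx, drop = FALSE])
})
merged <- do.call(rbind, per_chunk)
rownames(merged) <- NULL
```

Because only a chunk of columns is converted to a dense matrix at a time, this pattern keeps memory use bounded, which is where the Cholmod error in the trace above arises.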
Hi there,
The more I use your package, the more I'm liking your method! However, to help me (and maybe others) understand the package better, I thought I will post the following general questions I had in my mind:
What does it mean when the similarity score is NA?
When I run scmapCluster on the same object as the reference (as shown in your example code), I get back a whole SingleCellExperiment object, with just the relevant slots filled with the results, correct? But when I run it on a different object than the reference, then the output is just a list with three elements. Correct? This was just for my understanding, and if there are specific reasons for this, it will be great if you can please clarify!
What exactly is in scmap_scores column in the rowData(object) slot?
And finally, the order of the cells in the output is the same as the order of the cells in the input that we give in the "projection" argument of scmapCluster. Correct? So, essentially, to identify the cells, I can simply assign the names of the cells in the input projection object to the rownames of the output "results" object. Correct?
I would very much appreciate the answers!
Thanks.
Hi there! Thank you for the very useful package. I have a question on selectFeatures and whether other methods, such as using highly variable genes, will be implemented? I realise that the M3Drop preprint suggests that it outperforms other feature selection methods, at least in full-transcript scRNA-seq protocols (I have UMI data, by the way). If I want to implement something myself, is it simply a matter of assigning a logical vector to scmap_features for the genes I want to use?
My problem is that the M3Drop method doesn't identify representative genes that define one of my clusters and hence even when I project the data onto itself, only half of the cells in that cluster are assigned back to that cluster. I've tried increasing the number of features/genes up to 2,000 but it doesn't improve.
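As a rough illustration of the logical-vector idea raised in this question, the sketch below ranks genes by variance on a toy matrix and builds a row-ordered logical vector. Variance ranking is a simple stand-in for a highly-variable-gene method, not what scmap itself does, and whether assigning such a vector to `scmap_features` is sufficient is the poster's open question rather than something confirmed here.

```r
set.seed(7)
# Toy log-expression matrix: 6 genes x 20 cells; two genes are made variable.
logexpr <- matrix(rnorm(6 * 20, mean = 2, sd = 0.1), nrow = 6,
                  dimnames = list(paste0("gene", 1:6), NULL))
logexpr["gene2", ] <- logexpr["gene2", ] + rep(c(0, 3), each = 10)
logexpr["gene5", ] <- logexpr["gene5", ] + rep(c(2, 0), each = 10)

# Rank genes by variance and keep the top n as a simple HVG stand-in.
n_features <- 2
gene_var <- apply(logexpr, 1, var)
top_genes <- names(sort(gene_var, decreasing = TRUE))[seq_len(n_features)]

# Logical vector in row order, the shape the poster proposes to store in
# rowData(sce)$scmap_features.
custom_features <- rownames(logexpr) %in% top_genes
rownames(logexpr)[custom_features]
```

A `setFeatures` call appears in a warning quoted elsewhere on this page, so the package documentation for that function is worth checking before writing to `rowData` by hand.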
Hello!
Just another question, this time about the Sankey diagram function of scmap. I like this very much, as it gives a good overview of how many cells go into which clusters and such. I see that the HTML version of the diagram that you have is also cool, as it is interactive. However, for other purposes like presentations and publications, PDFs would be convenient. So, is there an easy way to save the output of the "getSankey" function as a PDF file?
That will be great! Thanks a lot!
Hello,
Thanks for this helpful package. I have a question about mapping my scRNAseq data to the Immgen microarray data as the reference. As you probably already know, Immgen contains microarray data from hundreds of sorted immune cells from mice. I was able to get something to work, but I'd like to double-check to make sure I'm using this appropriately. Please see my code below:
combined is a Seurat object containing 4 separate 10x runs aggregated. There are 15 clusters in the experiment, numbered from 1 to 15.
sce <- SingleCellExperiment(assays = list(counts = as.matrix(combined@assays$RNA@counts),
logcounts = as.matrix(combined@assays$RNA@data)),
colData = combined@meta.data)
rowData(sce)$feature_symbol <- rownames(sce)
sce <- selectFeatures(sce, suppress_plot = FALSE)
table(rowData(sce)$scmap_features)
sce <- indexCluster(sce, cluster_col = "numeric_clusters")
head(metadata(sce)$scmap_cluster_index)
heatmap(as.matrix(metadata(sce)$scmap_cluster_index))
I previously prepared the Immgen microarray data using RMA normalization and saved it as immgen.rds. In this dataset I collapsed the biological replicates (most cell types have 3 replicates) to one data point by averaging after normalization. This is a data frame containing normalized and log2 transformed expression values, which I read in as the reference for scmap. immannot provides long names and metadata for the individual samples.
immgen <- readRDS("immgen.rds")
immannot <- readRDS("immannot.rds")
ref_sce <- SingleCellExperiment(assay=list(logcounts = immgen[, 2:dim(immgen)[2]]),
colData = immannot)
rownames(ref_sce) <-immgen$GeneName
rowData(ref_sce)$feature_symbol <- immgen$GeneName
scmapCluster_results <- scmapCluster(
projection = ref_sce,
index_list = list(
oconnell = metadata(sce)$scmap_cluster_index),
threshold = 0 # I'm keeping this very low to see even the weakly associated cell types
)
# Prepare an informative and easy-to-explore data frame of the results
res_df <- data.frame(label = as.character(scmapCluster_results$scmap_cluster_labs),
similarity = as.character(scmapCluster_results$scmap_cluster_siml),
immgen_short = immannot$short_name,
immgen_type= immannot$reference_cell_type)
head(res_df)
## label similarity immgen_short immgen_type
##1 11 0.871871179824524 MF.11c-11b+.Lu Macrophage
##2 8 0.491127572913492 MF.Alv.Lu Macrophage
##3 9 0.666813607696411 Mo.6+2+.BL Monocyte
##4 9 0.591040179641838 Mo.6+2+.MLN Monocyte
##5 7 0.882301399545328 Mo.6+2+.SLN Monocyte
##6 9 0.667127563564599 Mo.6+2-.BL Monocyte
Can you see anything wrong here? Is it a problem to project between scRNAseq data and bulk reference?
I also cross-posted this on Bioconductor. Feel free to close the issue if this is not the appropriate place to seek help for my question.
Best,
Atakan
Hi,
I have questions about reference data. In the tutorial, the reference dataset yan is normalized by FPKM. So, must the input be FPKM?
What about raw counts of bulk RNA-seq data, scRNA-seq data or TPM data from TCGA?
Thank you in advance.
Fiona
The current version of the output is very limited regarding customization. Are there any plans on upgrading that to use more flexible libraries, such as Plotly? https://plot.ly/r/sankey-diagram/
Hi,
I am trying to project my data to an index I have built, but it yields an error and I do not know how to move on. Based on my search, this function seems to be associated with Seurat, but I do not see where it would be employed since I have transformed the objects into SCE.
scmapCluster_results <- scmapCluster(
projection = mir128_28H_SCE,
index_list = list(metadata(zebrafish_Atlas_SCE)$scmap_cluster_index))
Warning: The following arguments are not used: drop
Error in CellsByIdentities(object = object, cells = cells, ...) :
unused argument (drop = FALSE)
In addition: Warning message:
In setFeatures(projection, rownames(index)) :
Features ATP5MD, cct2.1, CR318588.4, EIF1B, h3f3b.1, h3f3b.1.1, h3f3b.1.2, hbae1.3.1, her15.1.1, her4.2.1, hnrnpa0l.1, krt18, mt-co3, rrm2.1, SERP1, sinhcafl.1, zgc:92066 are not present in the 'SCESet' object and therefore were not set.
Here are the objects I am using:
> mir128_28H_SCE
class: SingleCellExperiment
dim: 27310 16186
metadata(0):
assays(1): ''
rownames(27310): aaas aacs ... LOC101886033 LOC100536144
rowData names(1): feature_symbol
colnames(16186): MT28_AAACCCAAGAGTAACT-1 MT28_AAACCCAAGATGCTAA-1 ... WT28_TTTGTTGTCGCCTTGT-1 WT28_TTTGTTGTCTATCGGA-1
colData names(0):
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):
> zebrafish_Atlas_SCE
class: SingleCellExperiment
dim: 32520 44020
metadata(1): scmap_cluster_index
assays(3): counts logcounts scaledata
rownames(32520): ptpn12 phtf2 ... LAMP5 CABZ01109058.1
rowData names(3): feature_symbol scmap_features scmap_scores
colnames(44020): AAACCTGAGACAGGCT-1 AAACCTGAGAGCAATT-1 ... TTTGTCATCCGGCACA-6 TTTGTCATCCTTGCCA-6
colData names(16): orig.ident percent.mito ... nFeature_RNA ident
reducedDimNames(3): PCA TSNE UMAP
mainExpName: RNA
altExpNames(0):
This is my session info in case it helps:
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
Matrix products: default
BLAS/LAPACK: /gpfs/ycga/project/nicoli/gb587/conda_envs/r_env/lib/libopenblasp-r0.3.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] scmap_1.16.0 anndata_0.7.5.3 reticulate_1.23 basilisk_1.6.0 SingleCellExperiment_1.16.0
[6] SummarizedExperiment_1.24.0 Biobase_2.54.0 GenomicRanges_1.46.1 GenomeInfoDb_1.30.0 IRanges_2.28.0
[11] S4Vectors_0.32.3 BiocGenerics_0.40.0 MatrixGenerics_1.6.0 matrixStats_0.61.0 SeuratObject_4.0.4
[16] Seurat_4.1.0
loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-2 deldir_1.0-6 class_7.3-20 ellipsis_0.3.2 ggridges_0.5.3
[7] XVector_0.34.0 proxy_0.4-26 spatstat.data_2.1-2 farver_2.1.0 leiden_0.3.9 listenv_0.8.0
[13] ggrepel_0.9.1 fansi_1.0.2 codetools_0.2-18 splines_4.1.2 polyclip_1.10-0 jsonlite_1.7.3
[19] ica_1.0-2 cluster_2.1.2 png_0.1-7 uwot_0.1.11 shiny_1.7.1 sctransform_0.3.3
[25] spatstat.sparse_2.1-0 compiler_4.1.2 httr_1.4.2 googleVis_0.6.11 assertthat_0.2.1 Matrix_1.4-0
[31] fastmap_1.1.0 lazyeval_0.2.2 later_1.3.0 htmltools_0.5.2 tools_4.1.2 igraph_1.2.11
[37] gtable_0.3.0 glue_1.6.0 GenomeInfoDbData_1.2.7 RANN_2.6.1 reshape2_1.4.4 dplyr_1.0.7
[43] Rcpp_1.0.8 scattermore_0.7 vctrs_0.3.8 nlme_3.1-155 lmtest_0.9-39 stringr_1.4.0
[49] globals_0.14.0 mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.1 irlba_2.3.5 goftest_1.2-3
[55] future_1.23.0 basilisk.utils_1.6.0 MASS_7.3-55 zlibbioc_1.40.0 zoo_1.8-9 scales_1.1.1
[61] spatstat.core_2.3-2 promises_1.2.0.1 spatstat.utils_2.3-0 parallel_4.1.2 RColorBrewer_1.1-2 yaml_2.2.1
[67] pbapply_1.5-0 gridExtra_2.3 ggplot2_3.3.5 rpart_4.1-15 stringi_1.7.6 randomForest_4.6-14
[73] e1071_1.7-9 filelock_1.0.2 rlang_0.4.12 pkgconfig_2.0.3 bitops_1.0-7 lattice_0.20-45
[79] ROCR_1.0-11 purrr_0.3.4 tensor_1.5 labeling_0.4.2 patchwork_1.1.1 htmlwidgets_1.5.4
[85] cowplot_1.1.1 tidyselect_1.1.1 parallelly_1.30.0 RcppAnnoy_0.0.19 plyr_1.8.6 magrittr_2.0.1
[91] R6_2.5.1 generics_0.1.1 DelayedArray_0.20.0 DBI_1.1.2 pillar_1.6.4 mgcv_1.8-38
[97] fitdistrplus_1.1-6 survival_3.2-13 abind_1.4-5 RCurl_1.98-1.5 dir.expiry_1.2.0 tibble_3.1.6
[103] future.apply_1.8.1 crayon_1.4.2 KernSmooth_2.23-20 utf8_1.2.2 spatstat.geom_2.3-1 plotly_4.10.0
[109] grid_4.1.2 data.table_1.14.2 digest_0.6.29 xtable_1.8-4 tidyr_1.1.4 httpuv_1.6.5
[115] munsell_0.5.0 viridisLite_0.4.0
Thanks for your help!
Best,
Gabriel
Hello,
your package vignette relies indirectly(*) on the codetools package, which is not guaranteed to be installed on all systems; it's only a recommended package. Because of this, R CMD check on your package fails on such systems (see below).
(*) Searching the code of your package for codetools gives zero hits, which suggests that codetools is used by another package that your package depends on, but only as a soft dependency. Because of this, you need to declare codetools as a dependency in your package.
To fix this, please add codetools to Suggests:, i.e.
Suggests: knitr,
rmarkdown,
codetools
Example what happens when codetools is not installed:
* using R version 4.3.1 (2023-06-16)
* using platform: x86_64-pc-linux-gnu (64-bit)
...
* checking re-building of vignette outputs ... ERROR
Error(s) in re-building vignettes:
...
--- re-building ‘scmap.Rmd’ using rmarkdown
Quitting from lines 15-17 [knitr-options] (scmap.Rmd)
Error: processing vignette 'scmap.Rmd' failed with diagnostics:
there is no package called 'codetools'
--- failed re-building ‘scmap.Rmd’
SUMMARY: processing the following file failed:
‘scmap.Rmd’
Error: Vignette re-building failed.
Execution halted
Hello,
I am trying to use SCTransformed counts instead of log-transformed counts for training and predicting cell types. I created a Seurat object, ran SCTransform and then created a SingleCellExperiment object from the SCTransformed counts. The execution fails at this line:
sce_with_sctransformed_counts <- selectFeatures(sce_with_sctransformed_counts, suppress_plot = TRUE)
, producing the following error:
Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
0 (non-NA) cases
In addition: Warning message:
In linearModel(object, n_features) :
Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...
It does not seem like there are any NA values in the logcounts
any(is.na(logcounts(sce_with_sctransformed_counts)))
[1] FALSE
I suspected that since the sctransformed values are negative, it could be a potential source of error. Hence, I executed the following:
logcounts(sce_with_sctransformed_counts) <- log2(normcounts(sce_with_sctransformed_counts) + 5)
But I still keep getting the same error.
Could you please help me with this?
This is a portion of the code I am using.
sce_seurat <- CreateSeuratObject(counts = Data[,Train_Idx[[i]]])
sce_seurat <- SCTransform(object = sce_seurat)
sce_with_sctransformed_counts = SingleCellExperiment(assays = list(normcounts = GetAssayData(object = sce_seurat, slot = "scale.data")),
colData = data.frame(cell_type1 = Labels[Train_Idx[[i]]]))
rowData(sce_with_sctransformed_counts)$feature_symbol <- rownames(sce_with_sctransformed_counts)
logcounts(sce_with_sctransformed_counts) <- log2(normcounts(sce_with_sctransformed_counts) + 5)
sce_with_sctransformed_counts <- selectFeatures(sce_with_sctransformed_counts, suppress_plot = TRUE)
Thank you.
I am running into a nearest neighbor computing error when running scmapCell function. Have you seen similar errors? What are some steps that I can take to resolve it?
Best
Yuqi
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] SingleCellExperiment_1.6.0 SummarizedExperiment_1.14.1 DelayedArray_0.10.0 BiocParallel_1.18.1 matrixStats_0.55.0
[6] Biobase_2.44.0 GenomicRanges_1.36.1 GenomeInfoDb_1.20.0 IRanges_2.18.3 S4Vectors_0.22.1
[11] BiocGenerics_0.30.0 scmap_1.6.0 singleCellNet_0.1.0 cowplot_1.0.0 reshape2_1.4.3
[16] pheatmap_1.0.12 dplyr_0.8.3 ggplot2_3.2.1
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 purrr_0.3.3 lattice_0.20-38 expm_0.999-4 colorspace_1.4-1 yaml_2.2.0 rlang_0.4.2
[8] e1071_1.7-3 pillar_1.4.3 glue_1.3.1 withr_2.1.2 RColorBrewer_1.1-2 GenomeInfoDbData_1.2.1 lifecycle_0.1.0
[15] plyr_1.8.5 stringr_1.4.0 zlibbioc_1.30.0 munsell_0.5.0 gtable_0.3.0 mvtnorm_1.0-11 codetools_0.2-16
[22] class_7.3-15 Rcpp_1.0.3 scales_1.1.0 jsonlite_1.6 XVector_0.24.0 googleVis_0.6.4 stringi_1.4.3
[29] grid_3.6.1 tools_3.6.1 bitops_1.0-6 DescTools_0.99.31 magrittr_1.5 proxy_0.4-23 lazyeval_0.2.2
[36] RCurl_1.95-4.12 tibble_2.1.3 randomForest_4.6-14 crayon_1.3.4 pkgconfig_2.0.3 MASS_7.3-51.4 Matrix_1.2-18
[43] assertthat_0.2.1 rstudioapi_0.10 boot_1.3-23 R6_2.4.1 compiler_3.6.1
Hi there,
I just came across your scmap R package. Pretty cool, and I am about to try it for my 10X data.
While I was going through your preprint, I saw that in your comparisons, you had said that scmap slightly under-performed compared to RF and SVM classifiers. Could you throw some more light onto this? The thing is, scmap readily seems to offer a threshold using which I can control the "unclassified" cases. I really like this! If I were to build an RF classifier myself, then that might take more time. So, I am planning to use scmap. However, I would like to know fully well as to how scmap compares to RF classifiers, so that I can answer any such questions to the biologist whose data I am analyzing. So, any inputs from your group would be appreciated!
By the way, when do you expect the full paper to be published?
Hello. I'm trying to use scmap to analyse several related datasets that I have and encountered a problem: my reference dataset maps badly to itself (scmap-cluster method, kappa score 0.017 and unassigned fraction of 52%). I've tried to initialise logcounts to log2(rawcounts + 1), to Seurat's normalised counts and to Seurat's scaled counts. The first two strategies yield almost the same result, and the last one results in this error: “scmap index is empty because the median expression in the selected features is 0 in every cell cluster!”. I think I'm doing something wrong with the initialisation of SingleCellExperiment, or possibly with using scmap, but I don't know how to investigate that. I will be glad for any suggestions.
Hi there again,
I have another question for you. Is the order of the cells in the "scmapCluster_results$scmap_cluster_labs" column exactly the same as the order of the cells in the input to the "projection" argument to the "scmapCluster" function? Since there are no rownames in "scmapCluster_results", I was a bit confused about this. Adding rownames to the output would be a great addition! This order is very important so that I can identify the original cells and trace them back to the new labels.
Many thanks for your response!
Hi there @wikiselev
I think the URL in the README directing users to the Hemberg Lab cloud service is broken---I was re-directed to all sorts of wild stuff.
Here is the issue:
"""
A Cloud implementation of scmap can be used for free without any restriction here, i.e. http://www.hemberg-lab.cloud/scmap
"""
Might be best to just remove the URL until the domain issues (or related problems) are solved.
Thanks, Evan
Hello. Do you have a recommended list or source of cancer tumor datasets that can be used as a reference for projecting samples in scmap?
Is there a way to know which clusters a cell classified as unassigned is closest to?
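As a rough sketch of the idea, the similarity of a cell to every cluster centroid can be computed and ranked by hand, ignoring any assignment threshold. This example uses cosine similarity only, as a simplification, and is not the scmap implementation. The `centroids` matrix and the `cell` vector are hypothetical toy values.

```r
# Hypothetical cluster centroids (genes x clusters) and one query cell.
centroids <- cbind(
  alpha = c(5, 0, 1),
  beta  = c(0, 4, 1),
  gamma = c(1, 1, 6)
)
rownames(centroids) <- c("gene1", "gene2", "gene3")
cell <- c(gene1 = 4, gene2 = 1, gene3 = 2)

# Cosine similarity between the cell and each centroid.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
sims <- apply(centroids, 2, cosine, y = cell)

# Rank clusters by similarity, regardless of any assignment threshold.
ranked <- sort(sims, decreasing = TRUE)
names(ranked)[1]
```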
Is there a way to output a confusion matrix?
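Independently of scmap, a confusion matrix can be tabulated in base R once the true and predicted labels are available as two vectors in the same cell order. A minimal sketch with hypothetical labels:

```r
# Hypothetical true and predicted labels for six cells.
truth     <- c("alpha", "alpha", "beta", "beta", "gamma", "gamma")
predicted <- c("alpha", "unassigned", "beta", "beta", "alpha", "gamma")

# Rows are true labels, columns are predicted labels.
confusion <- table(truth = truth, predicted = predicted)
print(confusion)

# Fraction of cells whose predicted label matches the true one.
accuracy <- mean(truth == predicted)
```

Keeping "unassigned" as its own column makes it easy to see which true cell types are most often left unclassified.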
I am getting the following error when I try to run the scmapCell function:
library(scmap)
library(SingleCellExperiment)
source <- readRDS("source.rds")
target <- readRDS("target.rds")
counts(target) <- normcounts(target) # normcounts have the same zeros as raw counts
target <- selectFeatures(target, n_features = 1000)
target <- indexCell(target)
# Parameter M was not provided, will use M = n_features / 10 (if n_features <= 1000), where n_features is the number of selected features, and M = 100 otherwise.
# Parameter k was not provided, will use k = sqrt(number_of_cells)
result <- scmapCell(source, index_list = list(metadata(target)$scmap_cell_index))
# error: dot(): objects must have the same number of elements
# Error in NN(w, ncol(subcentroids[[1]]), dists$subcentroids, subclusters, :
# dot(): objects must have the same number of elements
I have uploaded the two SCE objects to Dropbox so you can have a reproducible example:
https://www.dropbox.com/sh/tzu8v67raw7g7bd/AADAZzyeNc0iU88ZdeaVgvWza?dl=0
Session information is listed below:
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] ggalluvial_0.12.1 scmap_1.9.3 scran_1.16.0 scater_1.16.2
[5] ggplot2_3.3.2 SingleCellExperiment_1.10.1 SummarizedExperiment_1.18.2 DelayedArray_0.14.1
[9] matrixStats_0.56.0 Biobase_2.48.0 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2
[13] IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rsvd_1.0.3 locfit_1.5-9.4 lattice_0.20-41
[5] class_7.3-17 digest_0.6.25 packrat_0.5.0 plyr_1.8.6
[9] R6_2.4.1 e1071_1.7-3 pillar_1.4.6 zlibbioc_1.34.0
[13] rlang_0.4.7 rstudioapi_0.11 irlba_2.3.3 googleVis_0.6.6
[17] Matrix_1.2-18 labeling_0.3 BiocNeighbors_1.6.0 BiocParallel_1.22.0
[21] statmod_1.4.34 stringr_1.4.0 igraph_1.2.5 RCurl_1.98-1.2
[25] munsell_0.5.0 proxy_0.4-24 compiler_4.0.2 vipor_0.4.5
[29] BiocSingular_1.4.0 xfun_0.16 pkgconfig_2.0.3 ggbeeswarm_0.6.0
[33] tidyselect_1.1.0 tibble_3.0.3 gridExtra_2.3 GenomeInfoDbData_1.2.3
[37] edgeR_3.30.3 codetools_0.2-16 randomForest_4.6-14 viridisLite_0.3.0
[41] crayon_1.3.4 dplyr_1.0.2 withr_2.2.0 bitops_1.0-6
[45] grid_4.0.2 jsonlite_1.7.0 gtable_0.3.0 lifecycle_0.2.0
[49] magrittr_1.5 scales_1.1.1 dqrng_0.2.1 stringi_1.4.6
[53] farver_2.0.3 reshape2_1.4.4 XVector_0.28.0 viridis_0.5.1
[57] limma_3.44.3 DelayedMatrixStats_1.10.1 ellipsis_0.3.1 generics_0.0.2
[61] vctrs_0.3.2 tools_4.0.2 glue_1.4.1 beeswarm_0.2.3
[65] purrr_0.3.4 colorspace_1.4-1 knitr_1.29
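A possible first diagnostic for a dimension mismatch like this is to compare the feature names of the indexed dataset with those of the dataset being projected. Whether a feature mismatch is the actual cause of the dot() error is not confirmed here, so this is only a sketch under that assumption. The vectors below are hypothetical; in practice they might be taken from the feature_symbol column of rowData for `target` and `source`.

```r
# Hypothetical feature names; in practice these might come from
# rowData(target)$feature_symbol and rowData(source)$feature_symbol.
index_features <- c("GeneA", "GeneB", "GeneC", "GeneD")
query_features <- c("GeneB", "GeneC", "GeneE")

shared  <- intersect(index_features, query_features)
missing <- setdiff(index_features, query_features)

cat("shared:", length(shared), "of", length(index_features), "\n")
cat("missing from query:", paste(missing, collapse = ", "), "\n")
```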
Hi, I am trying to use scmap to predict labels for an unlabeled dataset (let's call it the query) based on a reference dataset. I use scmap index cluster, following this vignette. The prediction accuracy is quite low, only about 0.007. However, the accuracy produced by scmapCell with indexCell is quite high, about 0.91. I checked other papers which also used scmap to predict labels, and it seems the indexCluster function worked well in their experiments. I don't know whether there is a mistake in my code.
I checked the prediction results, and there are lots of "unassigned" labels in them. Can you give an example of how to use scmap index cluster to predict labels by projecting a query dataset onto a reference dataset?
Q: Is scmap published?
A: Not yet, but a copy of scmap manuscript is available on bioRxiv.
That is no longer the case. Also, the CITATION file requires updating.
Hi!
I'm following the tutorial here https://github.com/hemberg-lab/tutorials/blob/master/scmap/scmap.Rmd and the vignettes here http://bioconductor.org/packages/release/bioc/manuals/scmap/man/scmap.pdf and trying to project my one 10X experiment on another 10X experiment.
I suspect the datasets in the tutorial and the yan dataset in vignettes are already normalised, so you can just initialise SCE as SingleCellExperiment(assays = list(normcounts = as.matrix(yan)), colData = ann), directly providing normcounts. My datasets are just counts, and I tried to normalise them by using normalise function from scmap itself, but it is not exported for me when I do require(scmap).
I would appreciate any suggestions. Perhaps, I can use Seurat's normalisation with SCE object.
Thank you
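For raw counts, one simple option that does not depend on scmap internals is to normalise by library size and log-transform by hand. The sketch below does this in base R on a hypothetical toy matrix. Scaling to the mean library size is an illustrative choice, not a scmap recommendation.

```r
# Hypothetical raw count matrix: 3 genes x 4 cells.
counts <- matrix(
  c(10, 0, 3, 7,
     0, 5, 0, 2,
    90, 45, 27, 11),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# Scale each cell to the mean library size, then log2-transform.
lib_size <- colSums(counts)
norm <- sweep(counts, 2, lib_size / mean(lib_size), "/")
log_norm <- log2(norm + 1)

# With SingleCellExperiment loaded, the matrices could then be stored as
# assays, e.g. SingleCellExperiment(assays = list(counts = counts, logcounts = log_norm)).
```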
Hi,
I'm attempting to use scmap on an object that I have converted from Seurat to SCE with Seurat's Convert() function.
When attempting to call selectFeatures(), I receive the following error:
Error in rowSums(count == 0) : 'x' must be an array of at least two dimensions
I looked through the code on the following pages:
Core Methods
Utils
It looks like these lines from the linearModel() function are causing the error:
count <- as.matrix(counts(object))
dropouts <- rowSums(count == 0)/cols * 100
I am confused though: if I simply run these lines in isolation, I receive no errors, and my 'count==0' matrix is a normal-sized matrix (i.e. at least two dimensions). I don't see anything else in the code that might have changed it in a way to cause this error. From my understanding, the object I'm passing to selectFeatures is passed to linearModel, which then pulls out the counts, so shouldn't running the two lines in isolation also cause the error?
Any advice would be greatly appreciated,
Liam