carmonalab / stacas

R package for semi-supervised single-cell data integration

License: GNU General Public License v3.0

R 100.00%

stacas's Issues

option to only use intersection of features between datasets

It is common that users integrate datasets that have inconsistent gene symbols (e.g. different versions of genome annotation were used). Include check on gene symbol overlaps across input datasets. Default conservative behaviour might be to consider only the intersection (and stop if number is below a min. threshold).
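A minimal sketch of the check proposed above, in base R. The helper name `shared_features` and the `min.shared` argument are hypothetical and not part of STACAS; the function only illustrates "take the intersection, stop if it is too small".

```r
# Hypothetical helper (not a STACAS function): intersect gene symbols across
# datasets and stop if fewer than `min.shared` genes are common to all of them.
shared_features <- function(gene.lists, min.shared = 1000) {
  stopifnot(is.list(gene.lists), length(gene.lists) >= 1)
  shared <- Reduce(intersect, gene.lists)
  all.genes <- unique(unlist(gene.lists))
  message(sprintf("%.1f%% of genes found across all datasets",
                  100 * length(shared) / length(all.genes)))
  if (length(shared) < min.shared) {
    stop(sprintf("Only %d shared genes (minimum required: %d)",
                 length(shared), min.shared))
  }
  shared
}

# Toy example with three small gene sets
toy <- list(a = c("Cd8a", "Cd4", "Gzmb", "Pdcd1"),
            b = c("Cd8a", "Cd4", "Gzmb"),
            c = c("Cd4", "Gzmb", "Foxp3"))
common <- shared_features(toy, min.shared = 2)
```

With a list of Seurat objects, the input could presumably be built with `lapply(obj.list, rownames)`; that usage is an assumption and is not tested here.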

Process bar?

Hi team,
I used FindAnchors.STACAS() to handle my 48 sample matrices (over 120,000 cells) and after receiving this message

Finding neighborhoods
Finding anchors
Found 233 anchors
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=56m 57s

After that, I saw neither a success nor an error message on the dashboard for a very long time, so I terminated the apparently "crashed" run several times, even though the task manager showed the CPU was still busy.

Can I run STACAS on Seurat 4?

I have recently installed the latest Seurat 4. Is it recommended to run STACAS on Seurat 4, or should I stick to Seurat 3?
Thanks

error in FindAnchors.STACAS

Hello,
thank you for the tutorial! I am trying to integrate my data (I have 10 samples with around 20,000 cells in total) with STACAS, but I get an error right at the beginning when running FindAnchors.STACAS:

 Error in new(Class = "AnchorSet", object.list = object.list, reference.objects = reference %||%  : 
   trying to generate an object from a virtual class ("AnchorSet")

Do you know why I could get this?

thank you very much in advance,
Javiera.

CheckGC() not imported from SeuratObject

Hi Massimo,

this is a quick issue I ran into when calling Run.STACAS() from within a function when Seurat is not loaded in the namespace.

It's easily overcome by loading SeuratObject, but I thought you may want to consider importing it?

Thanks!

How to tune the STACAS guide tree

Hi,

How do I go about tuning the STACAS guide tree? I couldn't see instructions for this here: https://carmonalab.github.io/STACAS.demo/STACAS.demo.html#stacas-integration-guide-trees. And how would I decide on the best order for integration? I am correcting for "donor".

Best wishes,
Lucy

Add support for SCT normalization

Hi,
I have been having trouble running IntegrateData.STACAS since the latest update. I'm running it with the SCT normalisation method and I keep getting the following error:
Error in SampleIntegrationOrder(tree = slot(object = reference.integrated, : could not find function "SampleIntegrationOrder"
This happens regardless of the standard or the semi-supervised approach. Any ideas of what might be causing this?

Thanks
Devika

implement option for automatic sketching of input datasets

Integration error (due to small number of cells?)

I've been trying to integrate some single-cell RNA-seq datasets (the smallest has 544 cells) following the 3rd tutorial, but when I run IntegrateData I get the following error:

Integrating data
Merging dataset 15 into 5
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
Error in nn2(data = c(5.73634646392675, -24.8149986124873, -6.20325662067301,  :
                     Cannot find more nearest neighbours than there are points

When googling the error, I mainly find references suggesting the k.filter argument of Seurat::FindIntegrationAnchors().

What is the equivalent filtering step when using the STACAS approach?

Cheers
Kristoffer
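The nn2 message says that more neighbours were requested than there are points. One heuristic, sketched below in base R, is to cap the `k.weight` value passed to the integration step by the size of the smallest dataset. The helper `cap_k_weight` is hypothetical and not part of STACAS or Seurat. The number of anchor cells, which can be much smaller than the number of cells, is probably the real limit, so this gives only an upper bound.

```r
# Heuristic sketch (an assumption, not STACAS code): never request more
# neighbours than the smallest dataset could possibly provide.
cap_k_weight <- function(n.cells, k.default = 100) {
  stopifnot(is.numeric(n.cells), all(n.cells > 1))
  as.integer(min(k.default, min(n.cells) - 1))
}

k.large <- cap_k_weight(c(544, 2300, 8958))  # smallest dataset is large enough
k.small <- cap_k_weight(c(60, 2300))         # smallest dataset limits k
```

If the error persists with a capped value, the number of anchors found for the smallest dataset would be the next thing to inspect.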

Dataset not integrating when increasing features.

Dear Stacas developers,

The integration method works great, at least on a dataset that isn't too big.

I am trying to integrate a large dataset with 450K cells. The problem with this dataset is the expression of lowly expressed genes in a smaller disease-specific population. With 5000 anchor features it works, but I don't capture the genes that are lowly expressed. I would like to go up to 15K features to make sure I capture them. But when integrating like this:

Idents(object = obj) <- "integration_col2"
#DefaultAssay(object = obj) <- "RNA"
obj_integrated <- obj %>% SplitObject(split.by = "Dataset") %>%
Run.STACAS(dims = 1:20, anchor.features = 7500, cell.labels = "integration_col2") %>%
RunUMAP(dims = 1:10)

I get the following error message:

Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for function 't': Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89
In addition: Warning messages:
1: In asMethod(object) :
sparse->dense coercion: allocating vector of size 8.4 GiB
2: In asMethod(object) :
sparse->dense coercion: allocating vector of size 3.3 GiB
3: In asMethod(object) :
sparse->dense coercion: allocating vector of size 1.4 GiB
4: In asMethod(object) :
sparse->dense coercion: allocating vector of size 3.7 GiB
5: In asMethod(object) :
sparse->dense coercion: allocating vector of size 2.0 GiB
6: In asMethod(object) :
sparse->dense coercion: allocating vector of size 1.3 GiB
7: In asMethod(object) :
sparse->dense coercion: allocating vector of size 1.4 GiB
8: In asMethod(object) :
sparse->dense coercion: allocating vector of size 2.3 GiB

It's a memory issue. I am using an instance like this one:

unibi highmem 2xlarge: 56 VCPUs - 933 GB RAM - 50 GB root disk
Image: RStudio-ubuntu20.04 de.NBI (2023-08-17)

This is the max it can go to.

Is there a way to process the data further in order to save RAM? I noticed that the new integrated counts matrix holds numeric values with many decimal places. I wonder if it's possible to do something at this step, or at any other step, to save RAM. Basically, any tips would be great that get me to the point where those rarer genes appear in the corrected integrated counts matrix.

Thanks very much.
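As a rough back-of-the-envelope check of the "sparse->dense coercion" warnings above, the size of a dense matrix of doubles can be estimated in base R. This is only a sketch: the helper `dense_gib` is hypothetical, and it assumes 8 bytes per value and a fully dense features x cells matrix, which may overestimate what an integration run actually allocates at any one step.

```r
# Estimated size in GiB of a dense numeric matrix (8 bytes per double).
dense_gib <- function(n.features, n.cells, bytes.per.value = 8) {
  n.features * n.cells * bytes.per.value / 2^30
}

size.7500  <- dense_gib(7500, 450000)   # anchor.features used in the call above
size.15000 <- dense_gib(15000, 450000)  # the 15K features mentioned above
```

Doubling the number of features doubles the estimate, so the feature count is a direct lever on memory.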

IntegrateData.STACAS function not found

Hi,

I have loaded STACAS v1.1.0, however I can't find the IntegrateData.STACAS function as shown in the "Standard integration" section of this tutorial.
https://carmonalab.github.io/STACAS.demo/STACAS.demo.html

Best wishes,
Lucy

working with Seurat v5 layers?

Hello,

I am working with Seurat v5 with my samples in different layers to test the different integration methods. Unfortunately, the version of STACAS I am working with does not work with v5 layers. I get the following error when I try to run FindAnchors.STACAS:


stacas_anchors <- FindAnchors.STACAS(obj.list, 
+                                      anchor.features = nfeatures,
+                                      dims = 1:ndim)

Computing 2000 integration features
Error in `GetAssayData()`:
! GetAssayData doesn't work for multiple layers in v5 assay.
Run `rlang::last_trace()` to see where the error occurred.

rlang::last_trace()

<error/ You can run 'object <- JoinLayers(object = object, layers = layer)'.>
Error in `GetAssayData()`:
! GetAssayData doesn't work for multiple layers in v5 assay.
---
Backtrace:
     ▆
  1. ├─... %>% RunUMAP(dims = 1:ndim)
  2. ├─Seurat::RunUMAP(., dims = 1:ndim)
  3. └─STACAS::Run.STACAS(., dims = 1:ndim, anchor.features = nfeatures)
  4.   └─STACAS::FindAnchors.STACAS(...)
  5.     └─base::lapply(...)
  6.       └─STACAS (local) FUN(X[[i]], ...)
  7.         └─STACAS::FindVariableFeatures.STACAS(x, nfeat = n.this, genesBlockList = genesBlockList)
  8.           ├─base::apply(...)
  9.           ├─SeuratObject::GetAssayData(obj, assay = assay, slot = "data")
 10.           └─SeuratObject:::GetAssayData.Seurat(obj, assay = assay, slot = "data")
 11.             ├─SeuratObject::GetAssayData(object = object[[assay]], layer = layer)
 12.             └─SeuratObject:::GetAssayData.StdAssay(object = object[[assay]], layer = layer)
Run rlang::last_trace(drop = FALSE) to see 1 hidden frame.

Is there any chance of adding support for layers in STACAS?

Thank you

Errors with small datasets

Hi there,

Have you ever encountered issues with integrating datasets that contain some samples with small cell numbers (<500)? I am applying STACAS to a dataset like this, and keep running into the following error during the IntegrateData() step:

Error in idx[i, ] <- res[[i]][[1]] : number of items to replace is not a multiple of replacement length Calls: IntegrateData ... FindWeights -> NNHelper -> do.call -> AnnoyNN -> AnnoySearch Execution halted

Some Seurat forums have suggested reducing the k.weight parameter to overcome this error. I have tried k.weight values from 100 (default) down to 5 in steps of 5, with no success. Also, I do not get this error on the same dataset when performing RPCA integration with Seurat, which uses the IntegrateData() function.

Do you have any advice on how to overcome this issue? Perhaps in some of the steps upstream of IntegrateData().

Thank you!

Laura

Integrating using STACAS and Seurat

Hi,
firstly, I want to say what an excellent tool this is, thank you!

I have a question regarding the Integrating scRNA-seq data using STACAS and Seurat3 vignette. You remove mito/ribo/cell cycle genes from the actual expression matrix. I wanted to know if you had tried just regressing it out as shown via SCT in seurat.

Do you recommend when working with T cell populations to remove mito/ribo/cell cycle genes?

Best,
M
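For readers who want to see what excluding these genes can look like in practice, here is a minimal base R sketch. It drops mitochondrial, ribosomal and block-listed genes from a vector of variable genes rather than from the expression matrix itself. The helper `filter_variable_genes`, the mouse-style patterns `^mt-` and `^Rp[ls]`, and the toy gene names are illustrative assumptions, not the package's own implementation.

```r
# Hypothetical helper: remove mito/ribo/block-listed genes from a vector of
# variable genes, leaving the expression matrix untouched.
filter_variable_genes <- function(var.genes, all.genes,
                                  block.list = character(0),
                                  mito.pattern = "^mt-",
                                  ribo.pattern = "^Rp[ls]") {
  mito.genes <- grep(mito.pattern, all.genes, value = TRUE)
  ribo.genes <- grep(ribo.pattern, all.genes, value = TRUE)
  setdiff(var.genes, c(block.list, mito.genes, ribo.genes))
}

# Toy example: "Mki67" stands in for a cell cycle block list
all.genes <- c("Cd8a", "mt-Co1", "Rpl13", "Rps6", "Mki67", "Gzmb")
kept <- filter_variable_genes(all.genes, all.genes, block.list = "Mki67")
```

Filtering only the feature list keeps the genes available for later analysis while preventing them from driving anchor finding; regressing them out, as asked above, is a different approach that this sketch does not cover.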

new.assay.name in Run.STACAS

Hi,
Would it be possible to add the new.assay.name option to the Run.STACAS function (as in IntegrateData.STACAS)?
Thanks,

Léonard

Cholmod error 'problem too large'

Hi there,

I am trying to run STACAS integration on a dataset comprised of ~270,000 cells from 52 samples. I keep running into the error below on the IntegrateData() step. Seurat forums recommend using their new "rpca" integration method to overcome this - but it hasn't performed as well as STACAS on my dataset, and I'm not sure how RPCA would integrate into the STACAS workflow. Do you have any workarounds to overcome this error on large datasets? I am running

Error in .cbind2Csp(x, y) : Cholmod error 'problem too large' at file ../Core/cholmod_sparse.c, line 89 Calls: ... cbind -> cbind2 -> cbind2 -> cbind2sparse -> .cbind2Csp Execution halted
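As far as I know, this Cholmod error appears when a sparse matrix would need more than 2^31 - 1 stored entries, the limit of the 32-bit indices these sparse matrices use. Under the assumption that the corrected matrix is nearly dense (almost every entry non-zero), the limit translates into a maximum number of features for a given number of cells. The helpers below are hypothetical and only do that arithmetic.

```r
# Worst-case bound, assuming every entry of the features x cells matrix is stored.
max_dense_features <- function(n.cells) {
  floor(.Machine$integer.max / n.cells)
}

exceeds_sparse_limit <- function(n.features, n.cells) {
  n.features * n.cells > .Machine$integer.max
}

limit.270k <- max_dense_features(270000)  # cell count taken from the issue above
```

Under this assumption, reducing the number of features to integrate, or integrating a subset of cells first, are the two obvious levers. The bound is only indicative, since real matrices may retain some zeros.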

Error when running STACAS on spatial data

Hi! I am trying to use STACAS on spatial snRNA-seq data. The following error occurs:

> Run.STACAS(ob.list)
94.8 % of genes found across all datasets
Error in FindVariableFeatures.STACAS(x, nfeat = n.this, genesBlockList = genesBlockList) :
  trying to get slot "var.features" from an object of a basic class ("NULL") with no slots

Is it somehow connected to the low number of genes in the protocol (n=100)? Or is it a Seurat version incompatibility? The following versions are used: STACAS_2.0.5, Seurat_4.3.0, R 4.2.2

file and link not accessible

Hi team,
The online file https://gitlab.unil.ch/carmona/STACAS.demo/blob/master/aux/cellCycle.symbol.DE.specific.170120.csv mentioned at the STACAS tutorial page (https://carmonalab.github.io/STACAS.demo/tutorial.html) is not accessible along with this GitLab Repo link https://gitlab.unil.ch/groups/carmona mentioned at https://carmonalab.github.io/. How could I get that csv file?
Thanks
DUAN

Run.STACAS Error in GetAssayData

Hi,

I am trying STACAS on my 12-sample Seurat data object, scdata. Each sample is preprocessed (i.e. QC'd, filtered and normalized).
The range for the N of cells in the samples is [245, 8958].
The range for the N of features in the RNA assay in the samples is [16900, 27554]

When I try to integrate using
scdata.combined <- Run.STACAS(scdata, dims = 1:nPCs, anchor.features = intfeatures, new.assay.name = "stacas")
where intfeatures results from SelectIntegrationFeatures and is a vector of ~3000 elements

I get the following error :
Integrating dataset 4 with reference dataset
Error in GetAssayData(object = object, assay = umi.assay, slot = "counts")[features_to_compute, :
no 'dimnames' attribute for array

Same error if I set anchor.features = 3000

As far as I understand this probably is because the AssayData is an empty matrix, which I assume it is because features_to_compute may be an empty vector.

I am not sure why this is happening. Do you have any suggestions?

Thanks

Discrepancy in genes included in block list between Human and Mouse

The mouse block list contains Heatshock and Ifn-response genes, while the human block list does not.

Finding markers after integration

Dear team,
Thanks for developing a great package! After the last update, I cannot find any markers with FindMarkers / FindAllMarkers on the integrated object. I'm getting the following warning message:

“When testing 0 versus all:
	normalization.method is not a valid parameter for FindIntegrationAnchors.wdist”

Here's a reproducible workflow that generates the issue.

library(SeuratData)
InstallData("cbmc")
InstallData("pbmc3k")
pbmc  <- SeuratData::LoadData(ds = "pbmc3k")
cbmc  <- SeuratData::LoadData(ds = "cbmc")
merged.list  <- list(pbmc, cbmc)

merged.list <- lapply(X = merged.list, FUN = function(x) {
    DefaultAssay(x)  <- "RNA"
    x <- NormalizeData(x)
    x <- FindVariableFeatures(x, nfeatures = 2000)
})

integrated_object <- Run.STACAS(merged.list, dims = 1:12)
integrated_object <- RunUMAP(integrated_object, dims = 1:12) 
integrated_object  <- FindNeighbors(integrated_object)
integrated_object  <- FindClusters(integrated_object, res = 0.1)

mrk  <- FindAllMarkers(integrated_object)

Thanks a lot for your help!
Best,
Veronika

Error in idx[i, ] <- res[[i]][[1]] : number of items to replace is not a multiple of replacement length

Hello again!

Ran into the following error with traceback when trying to integrate my datasets:

Error in idx[i, ] <- res[[i]][[1]] : number of items to replace is not a multiple of replacement length

8. | AnnoySearch(index = idx, query = query, k = k, search.k = search.k, include.distance = include.distance)
7. | AnnoyNN(data = structure(c(3.13718055738747, -17.5297419820542, -39.4198942048947, -39.9469173324642, -55.9331075524452, -11.8060751948174, -47.9033112759149, -53.0559687221709, -56.4878981446388, 3.69744529001283, -14.5937055643128, -50.08776244703, -22.8133583697235, -42.0250376486113, ...
6. | do.call(what = "AnnoyNN", args = args)
5. | NNHelper(data = data.use[anchors.cells2, ], query = data.use, k = k, method = nn.method, n.trees = n.trees, eps = eps)
4. | 	FindWeights(object = merged.obj, integration.name = integration.name, reduction = dr.weights, dims = dims, k = k.weight, sd.weight = sd.weight, eps = eps, verbose = verbose)
3. | RunIntegration(filtered.anchors = filtered.anchors, normalization.method = normalization.method, reference = object.1, query = object.2, cellnames.list = cellnames.list, new.assay.name = new.assay.name, features.to.integrate = features.to.integrate, features = features, dims = dims, weight.reduction = weight.reduction, ...
2. | PairwiseIntegrateReference(anchorset = anchorset, new.assay.name = new.assay.name, normalization.method = normalization.method, features = features, features.to.integrate = features.to.integrate, dims = dims, k.weight = k.weight, weight.reduction = weight.reduction, ...
1. | IntegrateData(anchorset = ref.anchors.filtered, dims = 1:ndim, features.to.integrate = all.genes, sample.tree = mySampleTree, preserve.order = T)

Here is the code I used to generate it:
Note: I used the tutorial to form the basis of most of this code. My list of seurat objects to merge is slist

# cell cycle markers
cellCycle.symbol <- read.csv('.../data/stacas/cellCycle.symbol.DE.specific.170120.csv', as.is = T)$x

# normalization and variable feature identification
var.genes.n <- 800
var.genes.integrated.n <- 500

for (i in 1:length(slist)) {
    slist[[i]] <- NormalizeData(slist[[i]], verbose = FALSE)
    
    slist[[i]] <- FindVariableFeatures(slist[[i]], selection.method = "vst", 
        nfeatures = var.genes.n*2, verbose = FALSE)
    
    mito.genes <- grep(pattern = "^mt-", rownames(slist[[i]]), value = TRUE)
    ribo.genes <- grep(pattern = "^Rp[ls]", rownames(slist[[i]]), value = TRUE)
    
    slist[[i]]@assays$RNA@var.features <- setdiff(slist[[i]]@assays$RNA@var.features, cellCycle.symbol)
    slist[[i]]@assays$RNA@var.features <- setdiff(slist[[i]]@assays$RNA@var.features, mito.genes)
    slist[[i]]@assays$RNA@var.features <- setdiff(slist[[i]]@assays$RNA@var.features, ribo.genes)
    slist[[i]]@assays$RNA@var.features <- head( slist[[i]]@assays$RNA@var.features, var.genes.n)
}

# Find integration anchors w/ pairwise distances
ndim=10
ref.anchors <- FindAnchors.STACAS(slist, dims=1:ndim, anchor.features=var.genes.integrated.n)

# Plot the distance distribution between the anchors calculated in the previous step
names <- names(slist)
plots <- PlotAnchors.STACAS(ref.anchors, obj.names=names, dist.thr = .9)

g.cols <- 2
g.rows <- as.integer((length(plots)+2)/g.cols)
g <- do.call("arrangeGrob", c(plots, ncol=g.cols, nrow=g.rows))
plot(g)

# filter anchors
ref.anchors.filtered <- FilterAnchors.STACAS(ref.anchors)

# determine optimal integration tree
all.genes <- row.names(slist[[1]])
for (i in 2:length(slist)) {
   all.genes <- intersect(all.genes, row.names(slist[[i]]))
}
mySampleTree <- SampleTree.STACAS(ref.anchors.filtered)
print(mySampleTree)

# integrate - This is where the error is thrown
ref.integrated <- IntegrateData(anchorset=ref.anchors.filtered, dims=1:ndim, features.to.integrate=all.genes,
                                sample.tree=mySampleTree, preserve.order=T)

Here is my session info:

R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas-r0.3.3.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reshape2_1.4.4              ggplot2_3.3.5               AUCell_1.12.0               gridExtra_2.3               TILPRED_1.0.2               STACAS_1.1.0               
 [7] fastICA_1.2-2               velocyto.R_0.6              UCell_1.0.0                 Matrix_1.3-4                remotes_2.4.0               openxlsx_4.2.4             
[13] SeuratWrappers_0.3.0        SingleCellExperiment_1.12.0 SummarizedExperiment_1.20.0 Biobase_2.50.0              GenomicRanges_1.42.0        GenomeInfoDb_1.26.7        
[19] IRanges_2.24.1              S4Vectors_0.28.1            BiocGenerics_0.36.1         MatrixGenerics_1.2.1        matrixStats_0.60.0          magrittr_2.0.1             
[25] data.table_1.14.0           stringr_1.4.0               dplyr_1.0.7                 plyr_1.8.6                  SeuratObject_4.0.2          Seurat_4.0.3               

loaded via a namespace (and not attached):
  [1] igraph_1.2.6           lazyeval_0.2.2         GSEABase_1.52.1        splines_4.0.0          listenv_0.8.0          scattermore_0.7        digest_0.6.27          htmltools_0.5.1.1     
  [9] fansi_0.5.0            memoise_2.0.0          tensor_1.5             cluster_2.1.0          ROCR_1.0-11            annotate_1.68.0        globals_0.14.0         R.utils_2.10.1        
 [17] spatstat.sparse_2.0-0  prettyunits_1.1.1      colorspace_2.0-2       blob_1.2.2             ggrepel_0.9.1          xfun_0.25              callr_3.7.0            crayon_1.4.1          
 [25] RCurl_1.98-1.3         jsonlite_1.7.2         graph_1.68.0           spatstat.data_2.1-0    survival_3.1-12        zoo_1.8-9              glue_1.4.2             polyclip_1.10-0       
 [33] gtable_0.3.0           zlibbioc_1.36.0        XVector_0.30.0         leiden_0.3.9           DelayedArray_0.16.3    pkgbuild_1.2.0         future.apply_1.7.0     abind_1.4-5           
 [41] scales_1.1.1           DBI_1.1.1              miniUI_0.1.1.1         Rcpp_1.0.7             viridisLite_0.4.0      xtable_1.8-4           reticulate_1.20        spatstat.core_2.3-0   
 [49] bit_4.0.4              rsvd_1.0.5             htmlwidgets_1.5.3      httr_1.4.2             RColorBrewer_1.1-2     ellipsis_0.3.2         ica_1.0-2              XML_3.99-0.6          
 [57] farver_2.1.0           pkgconfig_2.0.3        R.methodsS3_1.8.1      uwot_0.1.10            deldir_0.2-10          utf8_1.2.2             AnnotationDbi_1.52.0   tidyselect_1.1.1      
 [65] labeling_0.4.2         rlang_0.4.11           later_1.2.0            cachem_1.0.5           munsell_0.5.0          tools_4.0.0            cli_3.0.1              RSQLite_2.2.7         
 [73] generics_0.1.0         ggridges_0.5.3         evaluate_0.14          fastmap_1.1.0          yaml_2.2.1             goftest_1.2-2          bit64_4.0.5            processx_3.5.2        
 [81] knitr_1.33             fitdistrplus_1.1-5     zip_2.2.0              purrr_0.3.4            RANN_2.6.1             pbapply_1.4-3          future_1.21.0          nlme_3.1-147          
 [89] mime_0.11              R.oo_1.24.0            compiler_4.0.0         rstudioapi_0.13        curl_4.3.2             plotly_4.9.4.1         png_0.1-7              spatstat.utils_2.2-0  
 [97] tibble_3.1.3           stringi_1.7.3          ps_1.6.0               lattice_0.20-41        vctrs_0.3.8            pillar_1.6.2           lifecycle_1.0.0        BiocManager_1.30.16   
[105] spatstat.geom_2.2-2    lmtest_0.9-38          RcppAnnoy_0.0.19       cowplot_1.1.1          bitops_1.0-7           irlba_2.3.3            httpuv_1.6.1           patchwork_1.1.1       
[113] R6_2.5.0               pcaMethods_1.82.0      promises_1.2.0.1       KernSmooth_2.23-16     parallelly_1.27.0      codetools_0.2-16       MASS_7.3-51.5          assertthat_0.2.1      
[121] rprojroot_2.0.2        withr_2.4.2            sctransform_0.3.2      GenomeInfoDbData_1.2.4 mgcv_1.8-31            rpart_4.1-15           tidyr_1.1.3            rmarkdown_2.10        
[129] Rtsne_0.15             shiny_1.6.0

Please let me know what other information to provide and I will promptly do so. Thanks again for your work on this great tool.

Ryan

seurat v3

Hi, is the previous STACAS version that is compatible with Seurat v3 still available to install, or has it been deprecated? Thanks!

Deletion of FilterAnchors.STACAS

Thank you for providing a great package.

Now, I am attempting to reproduce the prior study, which uses the STACAS package.
However, I can not find FilterAnchors.STACAS in the namespace. Has the FilterAnchors.STACAS function been removed in the latest version?

Seurat Normalization

Thanks for the nice tutorials. In them you use NormalizeData(). Do you have experience with using SCTransform() instead, as suggested in the SCTransform workflow here, which provides "improved pre-processing and normalization"?

package installation suddenly stopped working

Hi,

I have been regularly building Docker images with STACAS installed, however it has just started to throw the error below (which has never happened previously).

The installation was attempted using the following command:

remotes::install_github('carmonalab/STACAS')

Error:

Error: Failed to install 'STACAS' from GitHub:
  System command 'R' failed, exit status: 1, stdout + stderr:
E> * checking for file ‘/private/var/folders/pg/nxjtjhms3k55ll9nb3wmpfnm0000gn/T/RtmpqtJBGm/remotes32eb7dd6b0ec/carmonalab-STACAS-f535645/DESCRIPTION’ ... OK
E> cp: carmonalab-STACAS-f535645/docs/index.html: No such file or directory
E>  ERROR
E> copying to build directory failed

The installation works when building from 1.1.0

remotes::install_github('carmonalab/STACAS@1.1.0')

For now I will use one of the previous tagged versions, but I thought I would raise this in case anyone else is having the same issue.

Thanks,
Alex

Preparing PCA embeddings for objects... 1/5 2/5 3/5 4/5 5/5 Finding integration anchors... Error in Embeddings(object = object.list[[x]][[nn.reduction]])[, dims] : subscript out of bounds

Hi,

I am integrating datasets from 4 distantly related species. We thought STACAS would be a great fit for our purposes as we want to keep the biological differences across datasets and default Seurat seems too harsh on it. However, when running the FindAnchors.STACAS function, I am unable to return the anchor object and I get the Embeddings error. I initially thought it was because I am having datasets with different genes, but after subsetting the data to keep only the genes that are shared across all 4 datasets, I am still getting this error.

In my pipeline I am grabbing the raw counts from each dataset, merging them and running the default Seurat pipeline with FindVariableFeatures %>% NormalizeData() %>% ScaleData() %>% RunPCA() %>% RunUMAP() %>% and then splitting this merged object in a list object split by dataset (I am not using the SCT assay since it seems STACAS is not yet able to handle them). After this, I run stacas_anchors <- FindAnchors.STACAS(comb.list, anchor.features = ngenes, dims = 1:nDims), where ngenes = 5000 and the same VariableGenes calculated and nDims = 80 and the same ndims that were used for RunPCA and RunUMAP... How can I circumvent this problem?
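One possible cause of a "subscript out of bounds" error at `Embeddings(...)[, dims]` is that `dims` asks for more components than the reduction actually contains, for example `dims = 1:80` against a PCA with 50 components. Whether that is what happens here is an assumption. The base R sketch below only reproduces the mechanism on a plain matrix and adds a hypothetical pre-check, `check_dims`.

```r
# Hypothetical pre-check: make sure the requested dims exist in the embedding.
check_dims <- function(embeddings, dims) {
  n.avail <- ncol(embeddings)
  if (max(dims) > n.avail) {
    stop(sprintf("Requested dims up to %d, but only %d components are available",
                 max(dims), n.avail))
  }
  invisible(TRUE)
}

set.seed(1)
emb <- matrix(rnorm(30 * 50), nrow = 30, ncol = 50)  # 30 cells x 50 components

# Indexing past the last column reproduces the error message from the title
res <- tryCatch(emb[, 1:80], error = function(e) conditionMessage(e))
ok  <- check_dims(emb, 1:50)
```

If this is the cause, lowering `dims` to the number of components available in the smallest dataset should be enough to get past this step.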

R v4.0 / Seurat v3.2.1

Hello,

Thanks for the package. I made a script and it worked out using R v3.6.1, but I needed to upgrade to R v4.0.2, and now using the same script I get an error Error: $ operator not defined for this S4 class. Indeed, I get the same error using the STACAS demo (see below). I tried also in a Linux with R v4.0.0 and got the same error.

Have you made STACAS work in R v4.0.X?

Thanks.

> library(STACAS)
> data(STACAS.sampledata)
> STACAS.anchors <- Run.STACAS(STACAS.sampledata)
Computing 500 integration features
Preparing PCA embeddings for objects...
 1/3 2/3 3/3
Computing within dataset neighborhoods
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01s  
Finding all pairwise anchors
  |                                                  | 0 % ~calculating  Finding neighborhoods
Finding anchors
	Found 1785 anchors
Error: $ operator not defined for this S4 class

My sessionInfo() is:

[1] "R version 4.0.2 (2020-06-22)"                                                                                           
 [2] "Platform: x86_64-apple-darwin17.0 (64-bit)"                                                                             
 [3] "Running under: macOS High Sierra 10.13.6"                                                                               
 [4] ""                                                                                                                       
 [5] "Matrix products: default"                                                                                               
 [6] "BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib"
 [7] "LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib"                                    
 [8] ""                                                                                                                       
 [9] "locale:"                                                                                                                
[10] "[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8"                                                      
[11] ""                                                                                                                       
[12] "attached base packages:"                                                                                                
[13] "[1] stats     graphics  grDevices utils     datasets  methods   base     "                                              
[14] ""                                                                                                                       
[15] "other attached packages:"                                                                                               
[16] "[1] STACAS_1.0.0"

Interpreting plot of distance distribution between the anchors

Hello,

Thanks for the great tool! I'm trying to implement this for my dataset, where I'm integrating ~20 cancer samples. However, I'm having trouble interpreting the plot of the distance distribution between the anchors. Can you provide some more explanation of what the plot means? I've attached the one referenced in your tutorial as well as one from my data.

For example some questions of mine:

what does each graph within the overall image represent? Can you more clearly explain the axes and title?
where should you aim to have your dashed line threshold within the distribution?
what does it mean if one sample seems to have a very different distribution?
what is the end goal of this process?

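[Attached images not preserved: the anchor distance plot from the tutorial ("Yours") and the corresponding plot from my data ("Mine").]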

Thanks again!
Ryan

Error in StandardizeGeneSymbols()

Hi,
Thank you for developing this helpful integration tool!
I tried to start from standardizing gene symbols across datasets, but I encountered the following error:

> library('Seurat')
> library('STACAS')
> data(EnsemblGeneTable.Hs)
> seu
An object of class Seurat 
23309 features across 51952 samples within 1 assay 
Active assay: RNA (23309 features, 0 variable features)
 1 layer present: counts
> seu <- StandardizeGeneSymbols(seu, EnsemblGeneTable=EnsemblGeneTable.Hs)
Number of genes in input object: 23309
Number of genes with standard symbols: 17561 (75.34%)
Examples of non-standard Gene.names:
FO538757.2,AP006222.2,RP11-206L10.9,CPSF3L,RP5-832C2.5,C1orf233
Additional number of genes with accepted Gene name synonym:  965
Number of duplicated Gene.name: 21 (0.09%)
Final number of genes: 18505 (79.39%)
Error in `rownames<-`(`*tmp*`, value = unname(genesAllowList[rows.select])) : 
  attempt to set 'rownames' on an object with no dimensions
In addition: Warning message:
Layer ‘data’ is empty

I would appreciate suggestions regarding the issue.
Thanks

FindAnchors.STACAS ignoring the "reference" list

Error while finding anchors with semi supervised approach

Hello!

Thanks a lot for releasing this new method. It looks promising, and I have been trying to test it with my datasets, which have more than 500K cells in total. As you did with the CD8 human atlas T cells, I used the 20 biggest datasets as a reference. However, when finding anchors between the integrated reference and the rest of the datasets, I obtained this error:
Error in flag[anchors$Consistent == FALSE] <- accept :
NAs are not allowed in subscripted assignments
Calls: FindAnchors.STACAS -> inconsistent_anchors -> probabilistic_reject
Execution halted

Do you know why? Could it be that some of my datasets do not contain all the cell types I am using in the cell.labels argument?
Thanks!
Dayme
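The message itself is a base R error: a logical index that contains NA cannot be used in an assignment when the replacement has more than one element. The sketch below reproduces it with toy vectors named after the ones in the traceback. That the `Consistent` column contains NA in the real run (for instance from cells without a label) is an assumption, not something the issue confirms.

```r
# Toy data: one anchor has an undefined (NA) consistency flag
consistent <- c(TRUE, NA, FALSE, FALSE)
flag   <- rep(TRUE, 4)
accept <- c(FALSE, TRUE)  # one decision per inconsistent anchor

# Logical indexing with NA fails when the replacement has length > 1
err <- tryCatch({
  flag[consistent == FALSE] <- accept
  NULL
}, error = function(e) conditionMessage(e))

# which() drops the NA positions, so the same assignment succeeds
flag.ok <- rep(TRUE, 4)
flag.ok[which(consistent == FALSE)] <- accept
```

If this is the mechanism, checking the label column for NA values before calling FindAnchors.STACAS would be a reasonable first diagnostic.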
