
scgsva's People

Contributors

egeulgen, guokai8

scgsva's Issues

numbers of columns of arguments do not match

When running the following, I get this error:

res<-scgsva(pbmc,hsko,method="ssgsea")

[1] "Normalizing..."
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
In addition: Warning message:

How can I solve this?
Thank you.

no slot of name "counts" for this object of class "Assay5"

res<-scgsva(pbmc,df,method="ssgsea",useTerm=F)
Error in scgsva(pbmc, df, method = "ssgsea", useTerm = F) :
no slot of name "counts" for this object of class "Assay5"

But if I check:

pbmc
An object of class Seurat
26577 features across 2400 samples within 1 assay
Active assay: RNA (26577 features, 2000 variable features)
3 layers present: counts, data, scale.data
2 dimensional reductions calculated: pca, umap

MSIGDB annotation

Please use the newest version 0.0.16 with the following command:

hsmg<-buildMSIGDB(species="human",keytype = "SYMBOL",anntype = "HALLMARK")

By the way, I copied this question into the issues on my GitHub.
Best,
Kai

On Oct 4, 2023, at 03:16, xavier tekpli wrote:

Dear guokai,

Thanks for developing the scGSVA package.

I see that there are functions to build a database from MSigDB.

For example: buildMSIGDB(species = "human", keytype = "SYMBOL", anntype = "GO")

I want to use specifically the 50 HALLMARK pathways from MSigDB,
but I cannot find out how to do it.
Is it possible?

Thanks a lot in advance for your answer.
Have a nice day.
Xavier

GSVA calculation takes extremely long

Dear @guokai8,

thanks for your great package. I am currently struggling a little to use it on my dataset, as the GSVA calculation takes extremely long.
I am using a custom gene set in this structure:

GeneID | Annot
PTGS2 | Ferroptosis

And I am running these commands:

gene_set <- read.csv("gene_set.csv")
res<-scgsva(nft_ad,annot=gene_set,method="gsva",useTerm = F)

This produces the following console messages (which look fine in my opinion):

Setting parallel calculations through a MulticoreParam back-end
with workers=4 and tasks=100.
Estimating GSVA scores for 1 gene sets.
Estimating ECDFs with Poisson kernels
Estimating ECDFs in parallel on 4 cores

About 21 iterations (cells, I assume) took around 12 hours. I am running this on an M1 Pro MacBook with 32 GB RAM. Do you think it will be faster once I switch to a computer with better specifications? I want to run GSVA on around 100,000 cells, which would take ages at this rate.

I am keen to get your recommendations!
Thanks and best regards,
Jonas

Memory issue

Hi!

I'm trying to run the function on 100756 cells for the GO genesets and I'm always running into the same memory issue:

In asMethod(object) :
sparse->dense coercion: allocating vector of size 17.8 GiB

I'm running this on a cluster with plenty of capacity, so I decided to split the cells into fractions to be able to run it. I was wondering whether the results from different runs would then be comparable, because from what I understand of regular GSVA, results are only comparable within a single dataset.

Thank you!
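
As a rough illustration of the numbers in this report (a sketch, not part of scGSVA): if the warning comes from converting a sparse matrix with one column per cell into a dense matrix of 8-byte doubles, which is an assumption here, the allocation size implies how many features are involved. The base R snippet also shows one way to split cell indices into fixed-size chunks. `chunk_size` is a hypothetical value, and whether scores from separate chunks are comparable remains the open question asked above.

```r
# Assumption: the 17.8 GiB allocation is one dense matrix of doubles
# (8 bytes per value) with one column per cell.
n_cells <- 100756
alloc_gib <- 17.8
bytes_per_value <- 8

# Number of rows (features) implied by the allocation size.
implied_features <- alloc_gib * 2^30 / (bytes_per_value * n_cells)
print(round(implied_features, -2))  # rounded to the nearest hundred

# Split cell indices into chunks of a hypothetical size.
chunk_size <- 10000
cell_idx <- seq_len(n_cells)
chunks <- split(cell_idx, ceiling(cell_idx / chunk_size))

# Every cell lands in exactly one chunk.
print(length(chunks))
print(sum(lengths(chunks)) == n_cells)
```

Each element of `chunks` could then be used to subset the object by column before scoring.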

More pathways and databases?

Hi,

Thanks for developing an amazing package.

I was wondering whether we can use databases other than KEGG, and pathways other than the Wnt signaling pathway.

Thank you!

Heatmap issue: input is not a matrix, it should be a simple vector

`hsamsi<-buildMSIGDB(species="human",keytype = "SYMBOL",anntype = "REACTOME")

res_reactome <- scgsva(groupA_DCC,hsamsi,method="ssgsea", maxRank=2000, cores = 6)

featurePlot(res_reactome,features = c("EUKARYOTIC_TRANSLATION_INITIATION"), group_by = "sample")

Heatmap(res_reactome)
Error: If input is not a matrix, it should be a simple vector.`

After building the Reactome database and running scgsva, I get results that work with featurePlot and the ridge plot, but Heatmap fails with the error above.

Any suggestions?

`> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8 LC_CTYPE=English_Canada.utf8 LC_MONETARY=English_Canada.utf8
[4] LC_NUMERIC=C LC_TIME=English_Canada.utf8

time zone: America/Vancouver
tzcode source: internal

attached base packages:
[1] stats4 grid stats graphics grDevices datasets utils methods base

other attached packages:
[1] data.table_1.15.4 DT_0.33 pheatmap_1.0.12 EnhancedVolcano_1.20.0
[5] ggrepel_0.9.5 devtools_2.4.5 usethis_2.2.3 remotes_2.5.0
[9] BiocManager_1.30.22 cowplot_1.1.3 monocle3_1.3.7 SingleCellExperiment_1.24.0
[13] SummarizedExperiment_1.32.0 GenomicRanges_1.54.1 GenomeInfoDb_1.38.8 IRanges_2.36.0
[17] S4Vectors_0.40.2 MatrixGenerics_1.14.0 matrixStats_1.3.0 Biobase_2.62.0
[21] BiocGenerics_0.48.1 SeuratWrappers_0.3.5 fgsea_1.28.0 UCell_2.6.2
[25] jsonlite_1.8.8 writexl_1.5.0 here_1.0.1 BiocParallel_1.36.0
[29] knitr_1.46 glue_1.7.0 limma_3.58.1 igraph_2.0.3
[33] viridis_0.6.5 viridisLite_0.4.2 clustree_0.5.1 ggraph_2.2.1
[37] RColorBrewer_1.1-3 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[41] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[45] ggplot2_3.5.1 tidyverse_2.0.0 dplyr_1.1.4 Seurat_5.0.3
[49] msigdbr_7.5.1 ComplexHeatmap_2.15.4 SeuratObject_5.0.1 scGSVA_0.0.22

loaded via a namespace (and not attached):
[1] fs_1.6.4 GSVA_1.50.5 spatstat.sparse_3.0-3 bitops_1.0-7
[5] httr_1.4.7 doParallel_1.0.17 profvis_0.3.8 tools_4.3.1
[9] sctransform_0.4.1 backports_1.4.1 utf8_1.2.4 R6_2.5.1
[13] HDF5Array_1.30.1 lazyeval_0.2.2 uwot_0.2.2 rhdf5filters_1.14.1
[17] GetoptLong_1.0.5 urlchecker_1.0.1 withr_3.0.0 sp_2.1-3
[21] gridExtra_2.3 progressr_0.14.0 cli_3.6.2 spatstat.explore_3.2-7
[25] fastDummies_1.7.3 labeling_0.4.3 spatstat.data_3.0-4 ggridges_0.5.6
[29] pbapply_1.7-2 R.utils_2.12.3 sessioninfo_1.2.2 parallelly_1.37.1
[33] rstudioapi_0.16.0 RSQLite_2.3.6 generics_0.1.3 shape_1.4.6.1
[37] ica_1.0-3 spatstat.random_3.2-3 car_3.1-2 Matrix_1.6-5
[41] fansi_1.0.6 abind_1.4-5 R.methodsS3_1.8.2 lifecycle_1.0.4
[45] carData_3.0-5 rhdf5_2.46.1 SparseArray_1.2.4 Rtsne_0.17
[49] blob_1.2.4 promises_1.3.0 crayon_1.5.2 miniUI_0.1.1.1
[53] lattice_0.21-8 beachmat_2.18.1 annotate_1.80.0 KEGGREST_1.42.0
[57] pillar_1.9.0 boot_1.3-28.1 rjson_0.2.21 future.apply_1.11.2
[61] codetools_0.2-19 fastmatch_1.1-4 leiden_0.4.3.1 vctrs_0.6.5
[65] png_0.1-8 spam_2.10-0 gtable_0.3.5 cachem_1.0.8
[69] xfun_0.43 S4Arrays_1.2.1 mime_0.12 tidygraph_1.3.1
[73] survival_3.5-5 iterators_1.0.14 statmod_1.5.0 ellipsis_0.3.2
[77] fitdistrplus_1.1-11 ROCR_1.0-11 nlme_3.1-162 bit64_4.0.5
[81] RcppAnnoy_0.0.22 rprojroot_2.0.4 irlba_2.3.5.1 KernSmooth_2.23-21
[85] colorspace_2.1-0 DBI_1.2.2 processx_3.8.4 tidyselect_1.2.1
[89] curl_5.2.1 bit_4.0.5 compiler_4.3.1 graph_1.80.0
[93] BiocNeighbors_1.20.2 desc_1.4.3 DelayedArray_0.28.0 plotly_4.10.4
[97] scales_1.3.0 lmtest_0.9-40 callr_3.7.6 digest_0.6.35
[101] goftest_1.2-3 minqa_1.2.6 spatstat.utils_3.0-4 XVector_0.42.0
[105] htmltools_0.5.8.1 pkgconfig_2.0.3 lme4_1.1-35.3 sparseMatrixStats_1.14.0
[109] fastmap_1.1.1 rlang_1.1.3 GlobalOptions_0.1.2 htmlwidgets_1.6.4
[113] shiny_1.8.1.1 DelayedMatrixStats_1.24.0 farver_2.1.1 zoo_1.8-12
[117] R.oo_1.26.0 BiocSingular_1.18.0 RCurl_1.98-1.14 magrittr_2.0.3
[121] GenomeInfoDbData_1.2.11 dotCall64_1.1-1 patchwork_1.2.0 Rhdf5lib_1.24.2
[125] munsell_0.5.1 Rcpp_1.0.12 babelgene_22.9 reticulate_1.36.1
[129] stringi_1.8.3 zlibbioc_1.48.2 MASS_7.3-60 pkgbuild_1.4.4
[133] plyr_1.8.9 parallel_4.3.1 listenv_0.9.1 deldir_2.0-4
[137] Biostrings_2.70.3 graphlayouts_1.1.1 splines_4.3.1 tensor_1.5
[141] hms_1.1.3 circlize_0.4.16 ps_1.7.6 spatstat.geom_3.2-9
[145] RcppHNSW_0.6.0 pkgload_1.3.4 reshape2_1.4.4 ScaledMatrix_1.10.0
[149] XML_3.99-0.16.1 renv_1.0.7 nloptr_2.0.3 tzdb_0.4.0
[153] foreach_1.5.2 tweenr_2.0.3 httpuv_1.6.15 RANN_2.6.1
[157] polyclip_1.10-6 future_1.33.2 clue_0.3-65 scattermore_1.2
[161] ggforce_0.4.2 rsvd_1.0.5 broom_1.0.5 xtable_1.8-4
[165] RSpectra_0.16-1 rstatix_0.7.2 later_1.3.2 memoise_2.0.1
[169] AnnotationDbi_1.64.1 cluster_2.1.4 timechange_0.3.0 globals_0.16.3
[173] GSEABase_1.64.0 `

'group' column in scgsva result@obj

I am new to GSVA and want to find significant KEGG pathways in my scRNA-seq data. In the demo scripts, there is no 'group' column in the metadata of pbmc. How does the 'group' column appear in the results, and what does 'group' actually mean?

Error when calculating GSVA based on UCell method

Hi,

Thank you for making this amazing package! It has been a life saver for me.
However, I ran into a problem when trying to calculate gsva results using the method "UCell".
` #annotation
hsko<-buildAnnot(species="mouse",keytype="SYMBOL",anntype="GO")

#result based on UCell
res<-scgsva(scRNA_all_int,hsko,method="UCell") ## or use UCell`

The error message is as follows:

"Error in calculate_Uscore(m, features = features, maxRank = maxRank, chunk.size = chunk.size, :
One or more signatures contain more genes than maxRank parameter.
Increase maxRank parameter or make shorter signatures
Calls: scgsva".

I tried changing the anntype to "KEGG" and the function works smoothly, so I guess it's because GO is a richer dataset than KEGG.
I understand that this might be more related to the UCell package, but I wonder if there's any way we could set the maxRank parameter in the scgsva() function.
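
One way to see which gene sets trigger this error is to count genes per set and compare the counts against the rank cutoff. The sketch below uses a toy data frame with `GeneID` and `Annot` columns (the layout shown in another issue on this page), not the object returned by `buildAnnot`, whose internal structure is not shown here. The gene and set names and the `max_rank` value are made up for illustration. Another report on this page passes `maxRank=2000` directly to `scgsva()`, which suggests the parameter is accepted in at least some versions.

```r
# Toy annotation in a GeneID/Annot layout (values are made up).
annot <- data.frame(
  GeneID = c("G1", "G2", "G3", "G4", "G5"),
  Annot  = c("setA", "setA", "setA", "setB", "setB"),
  stringsAsFactors = FALSE
)

# Hypothetical rank cutoff, deliberately tiny so the toy data exceeds it.
max_rank <- 2

# Count genes per set and list the sets larger than the cutoff.
set_sizes <- table(annot$Annot)
too_large <- names(set_sizes)[set_sizes > max_rank]

print(set_sizes)
print(too_large)
```

Sets listed in `too_large` would need either a higher cutoff or a shorter signature.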

Discrepancy in results from v0.0.16 to 0.0.17

First of all, thank you for releasing this package.

Results from these two versions differ slightly. Commands used:

# gene set
hsmg <-
  buildMSIGDB(species = "human",
              keytype = "SYMBOL",
              anntype = "HALLMARK")

# v0.0.16
res_HALLMARK_1 <-
  scgsva(obj = seurat_object, hsmg, assay = "RNA")

# v0.0.17
res_HALLMARK_2 <-
  scgsva(obj = seurat_object, hsmg, assay = "RNA")

# correlation plots
plot(x = res_HALLMARK_1@gsva$TNFA_SIGNALING_VIA_NFKB,y = res_HALLMARK_2@gsva$TNFA_SIGNALING_VIA_NFKB)

I checked: both results have identical contents in the annot slot and were built from the same Seurat object.

Correlations between both results are shown in this plot:

image
image

Is there an explanation for this?

Thank you
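
Alongside a scatter plot, a numeric summary can make "small changes" concrete. The sketch below uses made-up score vectors standing in for `res_HALLMARK_1@gsva$TNFA_SIGNALING_VIA_NFKB` and its v0.0.17 counterpart. Only base R functions are used, and the values are illustrative rather than taken from the report.

```r
# Made-up scores standing in for the same pathway from two package versions.
scores_v16 <- c(0.10, 0.25, -0.05, 0.40, 0.15)
scores_v17 <- c(0.11, 0.24, -0.04, 0.42, 0.15)

# Pearson correlation: close to 1 when the two versions mostly agree.
r <- cor(scores_v16, scores_v17)

# Largest per-cell absolute difference.
max_abs_diff <- max(abs(scores_v16 - scores_v17))

# TRUE only if the vectors match within numerical tolerance.
same <- isTRUE(all.equal(scores_v16, scores_v17))

print(c(correlation = r, max_abs_diff = max_abs_diff))
print(same)
```

A high correlation combined with a nonzero maximum difference would point to a systematic change between versions rather than random noise.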

Error after running parallel on HPC

Hi, I am trying to run GSVA on a Seurat object with a database assembled from multiple sources. I already tested it with a subset of the Seurat object and a subset of the database.
For instance, the following works on my local machine:

> head(gene_annodf)

  gene_symbol gene_symbol               gs_name
1       ABCA1       ABCA1 HALLMARK_ADIPOGENESIS
2       ABCB8       ABCB8 HALLMARK_ADIPOGENESIS
3       ACAA2       ACAA2 HALLMARK_ADIPOGENESIS
4       ACADL       ACADL HALLMARK_ADIPOGENESIS
5       ACADM       ACADM HALLMARK_ADIPOGENESIS
6       ACADS       ACADS HALLMARK_ADIPOGENESIS
## subset of annotations
> table(gene_annodf[1:300,]$gs_name)

       HALLMARK_ADIPOGENESIS HALLMARK_ALLOGRAFT_REJECTION 
                         210                           90 

>gsva_res <- scgsva(
    fib[,1:50], ## SUBSET OF SEUOBJECT
    annot = gene_annodf[1:300,], ## SUBSET OF ANNOTATIONS
    kcdf = "Poisson",
    abs.ranking = FALSE,
    min.sz = 1,
    max.sz = 500,
    mx.diff = TRUE,
    method = "gsva",
    useTerm = TRUE,
    cores = 10,
    verbose = TRUE
)
> dim(gsva_res@gsva)
[1] 50  2
> head(gsva_res@gsva)
                             HALLMARK_ADIPOGENESIS HALLMARK_ALLOGRAFT_REJECTION
CC.C_1_Prox_AAAGAACGTGGATCGA           -0.18195536                   -0.2752013
CC.C_1_Prox_AAAGTCCAGAGTTGAT           -0.14447515                   -0.3477315
CC.C_1_Prox_AAAGTCCGTATCGAGG           -0.17582418                   -0.4758325
CC.C_1_Prox_AACAACCCAATTGAGA           -0.24332880                   -0.3421558
CC.C_1_Prox_AACACACCACAGTCGC           -0.05431398                   -0.4594946
CC.C_1_Prox_AACCACATCCAGCACG           -0.04031415                   -0.3908996




(Apparently, two columns do not work for the annotation data frame, contrary to what was suggested in #6.) When I then ran this on the full Seurat object and the full annotation data frame on the HPC with 35 cores (21 GB of memory per core), it ran for 23 hours and threw an error which I do not understand. Help is greatly appreciated.

Input details

> dim(gene_annodf)
[1] 1823001       3
> length(unique(gene_annodf$gs_name))
[1] 18134
> head(gene_annodf)

  gene_symbol gene_symbol               gs_name
1       ABCA1       ABCA1 HALLMARK_ADIPOGENESIS
2       ABCB8       ABCB8 HALLMARK_ADIPOGENESIS
3       ACAA2       ACAA2 HALLMARK_ADIPOGENESIS
4       ACADL       ACADL HALLMARK_ADIPOGENESIS
5       ACADM       ACADM HALLMARK_ADIPOGENESIS
6       ACADS       ACADS HALLMARK_ADIPOGENESIS


> fib

An object of class Seurat 
33538 features across 20140 samples within 1 assay 
Active assay: RNA (33538 features, 0 variable features)
 1 dimensional reduction calculated: umap

> gsva_res <- scgsva(
    fib,
    annot = gene_annodf,
    kcdf = "Poisson",
    abs.ranking = FALSE,
    min.sz = 1,
    max.sz = 500,
    mx.diff = TRUE,
    method = "gsva",
    useTerm = TRUE,
    cores = 35,
    verbose = TRUE
)

Error!!

Attaching SeuratObject
iteration:  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
Error in reducer$value.cache[[as.character(idx)]] <- values : 
  wrong args for environment subassignment
Calls: scgsva -> .sgsva
In addition: Warning message:
In asMethod(object) :
  sparse->dense coercion: allocating vector of size 4.1 GiB

Details from error (from the traceback I called to the output file from HPC)

Setting parallel calculations through a MulticoreParam back-end
with workers=35 and tasks=100.
Estimating GSVA scores for 18127 gene sets.
Estimating ECDFs with Poisson kernels
Estimating ECDFs in parallel on 35 cores

  |                                                                      |   0%
19: (function () 
    {
        traceback(2, max.lines = 100)
        if (!interactive()) 
            quit(save = "no", status = 1, runLast = T)
    })()
18: .reducer_add(reducer, njob, value)
17: .reducer_add(reducer, njob, value)
16: .collect_result(manager, reducer, progress, BPPARAM)
15: .bploop_impl(ITER = ITER, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, 
        BPOPTIONS = BPOPTIONS, BPREDO = BPREDO, reducer = reducer, 
        progress.length = length(redo_index))
14: bploop.lapply(manager, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS, 
        ...)
13: bploop(manager, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS, ...)
12: .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, 
        BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)
11: bplapply(gset.idx.list, ks_test_m, gene.density = rank.scores, 
        sort.idxs = sort.sgn.idxs, mx.diff = mx.diff, abs.ranking = abs.ranking, 
        tau = tau, verbose = verbose, BPPARAM = BPPARAM)
10: bplapply(gset.idx.list, ks_test_m, gene.density = rank.scores, 
        sort.idxs = sort.sgn.idxs, mx.diff = mx.diff, abs.ranking = abs.ranking, 
        tau = tau, verbose = verbose, BPPARAM = BPPARAM)
9: compute.geneset.es(expr, gset.idx.list, 1:n.samples, rnaseq = rnaseq, 
       abs.ranking = abs.ranking, parallel.sz = parallel.sz, mx.diff = mx.diff, 
       tau = tau, kernel = kernel, verbose = verbose, BPPARAM = BPPARAM)
8: .gsva(expr, mapped.gset.idx.list, method, kcdf, rnaseq, abs.ranking, 
       parallel.sz, mx.diff, tau, kernel, ssgsea.norm, verbose, 
       BPPARAM)
7: .local(expr, gset.idx.list, ...)
6: gsva(input, annotation, method = method, kcdf = kcdf, tau = tau, 
       ssgsea.norm = ssgsea.norm, parallel.sz = cores, BPPARAM = SerialParam(progressbar = verbose))
5: gsva(input, annotation, method = method, kcdf = kcdf, tau = tau, 
       ssgsea.norm = ssgsea.norm, parallel.sz = cores, BPPARAM = SerialParam(progressbar = verbose))
4: withCallingHandlers(expr, warning = function(w) if (inherits(w, 
       classes)) tryInvokeRestart("muffleWarning"))
3: suppressWarnings(gsva(input, annotation, method = method, kcdf = kcdf, 
       tau = tau, ssgsea.norm = ssgsea.norm, parallel.sz = cores, 
       BPPARAM = SerialParam(progressbar = verbose)))
2: .sgsva(input = input, annotation = annotation, method = method, 
       kcdf = kcdf, abs.ranking = abs.ranking, min.sz = min.sz, 
       max.sz = max.sz, cores = cores, tau = tau, ssgsea.norm = ssgsea.norm, 
       verbose = verbose)
1: scgsva(fib, annot = gene_annodf, kcdf = "Poisson", abs.ranking = FALSE, 
       min.sz = 1, max.sz = 500, mx.diff = TRUE, method = "gsva", 
       useTerm = TRUE, cores = 35, verbose = TRUE)

Input issue

Hi, I have an issue with trying to load the SingleCellExperiment objects. Please check the report that uses 5k PBMC dataset from 10X:

scGSVA input issue.pdf
(skip to page 22 for stuff that did not work for me)

I've tried loading the SCE both from HDF5 and from filtered_feature_bc_matrix, and both resulted in errors. I've also tried converting the counts and logcounts matrices to dgCMatrix, which did not help. Conversion to Seurat also did not work.

Cannot locate raw NES

First off, thank you so much for releasing this extremely helpful tool! The majority of the functions have worked well for me, and have been instrumental in my analysis of scRNA-seq data.

In the interest of further downstream analyses, I would love to be able to access the raw NES values clearly used for figure generation. However, I have been unable to locate them anywhere in the created object. Would you be able to point me in the direction of these raw scores?

Additionally, I encountered issues running these commands:

genes(res, features = mk)
Error in cbind.Matrix(x, y, deparse.level = 0L) :
invalid type "character" in 'cbind.Matrix'

Heatmap(res,group_by="cluster")
No error, but it hangs and crashes RStudio

Any help would be greatly appreciated!

'ssgseaParam' is not an exported object from 'namespace:GSVA'

I used annotated single-cell data in a Seurat object, but the following error occurred when running scgsva:
'ssgseaParam' is not an exported object from 'namespace:GSVA'
My code: `set.seed(123)
library(scGSVA)

anno <- buildAnnot(species="human", keytype="SYMBOL", anntype="KEGG")
res <- scgsva(seurat_object.filt.filt, anno,cores = 60)
`
