nathanskene / ewce

Expression Weighted Celltype Enrichment. See the package website for up-to-date instructions on usage.

Home Page: https://nathanskene.github.io/EWCE/index.html

R 99.61% TeX 0.39%

transcriptomics single-cell single-cell-rna-seq deconvolution

ewce's Issues

How much does parallelisation increase speed?

`generate_bootstrap_plots`: address warning

ctd <- ewceData::ctd()
example_genelist <- ewceData::example_genelist()
level <- 1
reps <- 10

full_results <- EWCE::bootstrap_enrichment_test(
    sct_data = ctd,
    sctSpecies = "mouse",
    hits = example_genelist,
    genelistSpecies = "human",
    reps = reps,
    annotLevel = level
)

boot_plot_dir1 <- EWCE::generate_bootstrap_plots(
    sct_data = ctd,
    sctSpecies = "mouse",
    hits = example_genelist,
    genelistSpecies = "human",
    annotLevel = level,
    full_results = full_results,
    listFileName = "VignetteGraphs",
    savePath = tempdir()
)

Warning messages: 1: Transformation introduced infinite values in continuous y-axis
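
For context, ggplot2 usually emits this warning when a log-scaled axis receives zeros (or negative values). Whether that is what happens inside `generate_bootstrap_plots` is an assumption here, not something confirmed above. A minimal base-R sketch of the mechanism, with toy values and two common workarounds:

```r
# Toy expression values; the zero is what a log scale cannot represent.
expr <- c(0, 1, 10, 100)

# log10(0) is -Inf, which is what triggers the "infinite values" warning.
log_raw <- log10(expr)
has_infinite <- any(is.infinite(log_raw))

# Two common workarounds (neither is confirmed as the EWCE fix):
log_pseudo <- log10(expr + 1)          # add a pseudocount before transforming
log_dropped <- log10(expr[expr > 0])   # or drop non-positive values first
```

The pseudocount changes the plotted values slightly, while dropping zeros changes which points appear, so the choice depends on what the plot is meant to show.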

Enable offline runs of EWCE with ExperimentHub

Enable EWCE to run offline by using the local cache of ExperimentHub. The issue was highlighted by a user.

This will require:

Updating ewceData so that the get_ExperimentHub function in neurogenomics/ewceData/R/utils.R, along with all dataset functions, handles a localHub=TRUE parameter
Updating all EWCE functions that call and load datasets from ewceData to accept the parameter. The parameter will need to propagate up through the functions.
Adding a test case
Pushing to Bioconductor

Add check bootstrap_enrichment_test gene list not a factor

Add check bootstrap_enrichment_test gene list not a factor, if so convert to character
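
A minimal sketch of what such a check could look like. The helper name `check_hits_sketch` is hypothetical and this is not the EWCE implementation:

```r
# Hypothetical helper, not part of EWCE: normalise a gene list before testing.
check_hits_sketch <- function(hits) {
  if (is.factor(hits)) {
    message("Converting gene list from factor to character.")
    # as.character() returns the labels, not the underlying integer codes.
    hits <- as.character(hits)
  }
  if (!is.character(hits)) {
    stop("hits must be a character vector of gene symbols.")
  }
  hits
}

hits <- check_hits_sketch(factor(c("Apoe", "Gfap", "Apoe")))
```

Converting early matters because a factor passed to code that expects strings can silently be treated as integer codes.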

bin.specificity.into.quantiles NOT in namespace

> ctd = prepare.quantile.groups(ctd,specificity_species="human",numberOfBins=40)
Error: 'bin.specificity.into.quantiles' is not an exported object from 'namespace:EWCE'

This hasn't happened before. It must be a consequence of updating to the latest version, as the error is still there after reinstalling:

> devtools::install_github("nathanskene/ewce")
Skipping install of 'EWCE' from a github remote, the SHA1 (50b9fe0c) has not changed since last install.
  Use `force = TRUE` to force installation
> devtools::install_github("nathanskene/ewce", force=TRUE)
[...]
Installing package into *************************
(as ‘lib’ is unspecified)
* installing *source* package ‘EWCE’ ...
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
** help
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (EWCE)
> library(EWCE)
> ctd = prepare.quantile.groups(ctd,specificity_species="human",numberOfBins=40)
Error: 'bin.specificity.into.quantiles' is not an exported object from 'namespace:EWCE'

can not install EWCE

Hello,
I tried to install EWCE following this page (https://nathanskene.github.io/EWCE/articles/EWCE.html), but I got the following errors:
Error: package or namespace load failed for ‘RNOmni’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-RNOmni/00new/RNOmni/libs/RNOmni.so':
/nas/longleaf/apps/gcc/6.3.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-RNOmni/00new/RNOmni/libs/RNOmni.so)
Error: loading failed
Execution halted
ERROR: loading failed

removing ‘/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/RNOmni’

The downloaded source packages are in
‘/tmp/RtmpdPeehs/downloaded_packages’
✔ checking for file ‘/tmp/RtmpdPeehs/remotes74cba85791/neurogenomics-EWCE-793daf3/DESCRIPTION’ ...
─ preparing ‘EWCE’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘EWCE_0.99.2.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'

Installing package into ‘/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
ERROR: dependency ‘RNOmni’ is not available for package ‘EWCE’

removing ‘/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/EWCE’
Warning messages:
1: In i.p(...) : installation of package ‘RNOmni’ had non-zero exit status
2: In i.p(...) :
installation of package ‘/tmp/RtmpdPeehs/file74cb713b194/EWCE_0.99.2.tar.gz’ had non-zero exit status

Does significance of an association always increase as the complexity of the scRNA-seq dataset increases

Does significance of an association always increase as the complexity of the scRNA-seq dataset increases? E.g. if you start with the Zeisel2018 dataset, and drop 20/40/60/80% of it’s

problems running docker image of ewce

Dear EWCE-team,

I am trying to use the Docker version of EWCE. The initial steps work without errors. However, when I try to execute:
ggplot(cellExpDist) + geom_boxplot(aes(x=l1,y=e)) + xlab("Cell type") + ylab("Unique Molecule Count") + theme(axis.text.x = element_text(angle = 90, hjust = 1))

I get thrown out and no graphics appear. I guess at some stage a graphics window should appear, but being quite new to R and stuff, I do not know where or how to activate it.
Is there a way to run the docker image in RStudio?

Thanks for any help in advance.
Best
Matthias

bootstrap.enrichment.test returns error: cannot recognize genes

Hi EWCE developers, I used EWCE to analyse my data:
full_results = bootstrap.enrichment.test(sct_data=merged_KI,hits=list,bg=bg,sctSpecies = "human", genelistSpecies = "human",reps=10000,annotLevel=1)
and I got this error:

ERROR: At least four genes which are present in the single cell dataset & background gene set are required to test for enrichment
But I don't think this is true in my case:
length(intersect(rownames(merged_KI$exp),bg))

10622
So I can't understand what this error actually means. I would appreciate your help. Thanks!
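
One possible reading, offered as an assumption rather than a confirmed diagnosis: the check counts the hit genes that survive in both the single cell dataset and the background, not the overlap between dataset and background. The toy sketch below (all gene vectors are made up) shows how the two counts can differ, for example when the hit list uses different capitalisation:

```r
# Toy data; names are illustrative only.
sct_genes <- c("APOE", "GFAP", "SNAP25", "MBP", "AQP4")
bg <- c("APOE", "GFAP", "SNAP25", "MBP", "TP53")
hits <- c("apoe", "Gfap", "SNAP25", "NOTAGENE")

# The overlap checked in the post: dataset vs background.
n_bg_in_sct <- length(intersect(sct_genes, bg))

# The overlap that may matter: hits present in both dataset and background.
usable_hits <- Reduce(intersect, list(hits, sct_genes, bg))
n_usable <- length(usable_hits)
```

Here the dataset and background share four genes, yet only one hit gene matches exactly, so a "four genes" requirement on the hit list would fail.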

How does 1:1 ortholog definition affect results?

We’ve altered how 1:1 orthologs are defined. Does this affect results? How does the stringency of the ortholog definition affect results? (Brian can explain in more detail)

Give error if generate_celltype_data is called with Ensembl IDs as gene names
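
A rough sketch of how such a check might work, assuming Ensembl gene IDs follow the usual pattern of `ENS`, an optional species prefix, `G`, and 11 digits (for example `ENSG00000139618` or `ENSMUSG00000017167`). The function names and the 50% threshold are hypothetical choices for illustration:

```r
# Hypothetical helpers, not part of EWCE.
looks_like_ensembl <- function(ids) {
  # Optional version suffix such as ".12" is tolerated.
  grepl("^ENS[A-Z]*G[0-9]{11}(\\.[0-9]+)?$", ids)
}

check_gene_names_sketch <- function(exp, max_fraction = 0.5) {
  frac <- mean(looks_like_ensembl(rownames(exp)))
  if (frac > max_fraction) {
    stop("Row names look like Ensembl IDs; please convert them to gene symbols.")
  }
  invisible(TRUE)
}

id_flags <- looks_like_ensembl(c("ENSG00000139618", "ENSMUSG00000017167", "Gfap"))
```

A fraction-based threshold avoids failing on the occasional unmapped ID left in an otherwise symbol-named matrix.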

Enable meta-data (such as that generated by `get.celltype.table` and Brian's `plot_gene_metrics`) to be stored in the CTD

Will probably break most existing functions that expect ctd[[1]] to have $specificity, so might want to adapt so ctd[[1]] has meta-data, and ctd[[2]] has the meanexp/specificity data.

Would need to cascade any changes to magma_celltyping as well

Provide default background genes

If users don't provide a background gene list, an appropriate one will now be created by default.

https://github.com/bschilder/EWCE/blob/8e9ca154c98f2afb4c01a0a0e1ade92a690a2cce/R/bootstrap_enrichment_test.R#L111

Build tests must confirm that EWCE docker image works as expected

E.g. library(EWCE) doesn't work in the current docker image

We need the standard vignette to work inside the docker image

The docker image is used for undergraduate + masters teaching so it is vital that it always works as expected

Calculating cell type specificity scores

Hi! Thanks for the great package.

I'm interested in running the cell type specificity score calculation for some 10x snRNA-seq datasets. Is there a clear reason to use either mean or median for zero-inflated datasets?

Also, I'm curious about the rationale for dividing the median scores by the colSums of the mean per cell type/subtype as opposed to the colSums of the median ("generate.celltype.data.R", lines 89/91)?
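
To make the mean versus median question concrete, here is a toy sketch. It assumes specificity is each cell type's share of a gene's summed per-cell-type average expression, with genes as rows. The exact normalisation and matrix orientation in generate.celltype.data.R are not reproduced here, and `summarise_by_celltype` is a hypothetical helper:

```r
# Toy zero-inflated counts: rows are genes, columns are cells.
exp <- matrix(
  c(0, 0, 3, 0, 0, 0,
    5, 4, 6, 1, 0, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:6))
)
celltype <- c("astro", "astro", "astro", "neuron", "neuron", "neuron")

# Hypothetical helper: summarise each gene within each cell type.
summarise_by_celltype <- function(exp, celltype, fun) {
  sapply(split(seq_along(celltype), celltype), function(idx) {
    apply(exp[, idx, drop = FALSE], 1, fun)
  })
}

mean_exp <- summarise_by_celltype(exp, celltype, mean)
median_exp <- summarise_by_celltype(exp, celltype, median)

# Assumed definition: proportion of a gene's total (mean) expression per cell type.
specificity <- mean_exp / rowSums(mean_exp)
```

In this toy data geneA is detected in only one of three astrocytes, so its median is zero in every cell type while its mean still separates the two. That is the main practical risk of medians on sparse 10x data: genes with more than half zeros in a cell type lose all signal there.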

Setup unittesting with testthat

Enable `merge_two_expfiles` to handle various kinds of (sparse) matrices

Implemented here:
https://github.com/bschilder/EWCE/blob/8e9ca154c98f2afb4c01a0a0e1ade92a690a2cce/R/merge_two_expfiles.r#L75

Improve GHA workflow checks

Travis checks, and even the R-CMD checks we've been using for MungeSumstats, tend to be a bit buggy and randomly fail sometimes (unrelated to the pushed changes to our packages).

biocthis::use_bioc_github_action() automatically creates a yaml file for a GHA workflow that is both more robust (in part thanks to using Docker containers on the fly) and comprehensive (currently working on MacOS, Windows, and Linux).

As a bonus, this workflow ensures the devel version of Bioc is being used (3.14 currently), meaning we can easily install recently added Bioc submissions (e.g. orthogene)

I've now implemented this on the bschilder_dev branch. The only thing in the yaml I had to modify was the name of one of my GH Secrets variables: GITHUB_TOKEN-->PAT_GITHUB

"drop.uninformative.genes" uses functions for mainly microarray data?

Hi,
I was looking into the source code for the drop.uninformative.genes function and saw that it uses an ANOVA test implemented in limma. limma, together with the eBayes() function, was developed mainly for microarray data and also for RNAseq data, which makes me wonder whether it is applicable to scRNAseq data.

Compute q-values during `bootstrap_enrichment_test`

Compute multiple-testing corrected q-values during bootstrap_enrichment_test rather than making it a separate step.
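
As a sketch of the step being folded in, base R's `p.adjust` can attach corrected values to a results table in one call. The p-values and column names below are made up, and Benjamini-Hochberg is only one possible choice of method:

```r
# Toy per-cell-type p-values, as a bootstrap test might return them.
p_values <- c(astrocytes = 0.001, microglia = 0.03,
              neurons = 0.2, oligodendrocytes = 0.04)

# Benjamini-Hochberg correction across all cell types tested.
q_values <- p.adjust(p_values, method = "BH")

results <- data.frame(
  CellType = names(p_values),
  p = p_values,
  q = q_values,
  row.names = NULL
)
```

Doing the correction inside the function guarantees it is applied over exactly the set of cell types that were tested.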

Remove the docker push from the travis commands

Shouldn't be necessary as docker hub should build it automatically

EWCE should use One2One instead of data("mouse_to_human_homologs")

fix HGNC names exits with error

Running the fix hgnc script errors out.

> fix.bad.hgnc.symbols(ctd_darmanis, dropNonHGNC = F)
[1] "2841 of 22088 are not proper HGNC symbols"
[1] "Possible corruption of gene names by excel: SEP15, SEPN1, SEPP1, SEPT5-GP1BB, SEPT7L, SEPW1, SEPX1"
[1] "1597 of 22088 gene symbols corrected"
[1] "1365 of 22088 gene symbols cannot be mapped"
           astrocytes-GSM1657885 astrocytes-GSM1657932 astrocytes-GSM1657938 astrocytes-GSM1657965
SEC24B-AS1                     0                     0                     0                     0
A1BG                           0                     0                     0                     0
A1BG-AS1                       0                     0                     0                     0
.
.
.
Warning messages:
1: In fix.bad.hgnc.symbols(ctd_darmanis, dropNonHGNC = F) :
  Possible corruption of gene names by excel: SEP15, SEPN1, SEPP1, SEPT5-GP1BB, SEPT7L, SEPW1, SEPX1
2: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = TRUE) :
  Some lower-case letters were found and converted to upper-case.
                 HGNChelper is intended for human symbols only, which should be all
                 upper-case except for open reading frames (orf).
3: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = TRUE) :
  x contains non-approved gene symbols
4: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = FALSE) :
  Some lower-case letters were found and converted to upper-case.
                 HGNChelper is intended for human symbols only, which should be all
                 upper-case except for open reading frames (orf).
5: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = FALSE) :
  x contains non-approved gene symbols

Get it working with large input matrices.

understanding the inputs for EWCE

Hi,

I have a few questions regarding EWCE. It's a very useful tool. I have gone through the paper quickly and tried EWCE.

Do we need to normalize the scRNA data prior to running EWCE?
There will be many genes with zero counts in scRNA datasets. The drop.uninformative.genes function is meant to take care of noise, but I don't think it's doing the job well. When I input all genes (~19,000), it still retains 16,000 after running, which is a huge number given the low capture rate of scRNA. How does this function take care of genes with 0 counts across many cells?
Is bootstrap.enrichment.test testing whether the gene expression distribution is "higher" than the background set or "different" from the background set?

Also, the vignette is more tailored to mouse/human conversions, so this is the code I am using as I have human data. I would like to know if I am missing anything.

annot <- read.table("metadata_for_EWCE_v2.txt", header = T, sep ="\t")

SCT <- read.table("scRNA_for_EWCE_GeneNames.txt", header = T, row.names = 1, sep = "\t")

annotLevels = list(level1class=annot$level1class,level2class=annot$level2class)

exp_DROPPED = EWCE::drop.uninformative.genes(exp=SCT,level2annot = annot$level2class)

fNames = EWCE::generate.celltype.data(exp=exp_DROPPED, annotLevels=annotLevels, groupName="Foo", no_cores=10)

load(fNames[1])

GWAS_named_genes  <- as.vector(read.table("GWAS_Named_Loci.txt")$V1)

full_results = bootstrap.enrichment.test(sct_data=ctd, sctSpecies="human",
                                         genelistSpecies="human",
                                         hits=GWAS_named_genes,
                                         bg=rownames(exp_DROPPED),
                                         reps=10000, annotLevel=2)

Add arg checks to ensure `bootstrap_enrichment_test` won't fail later

Best to have these checks early so users don't waste time on functions destined to fail.

Implemented here:

https://github.com/bschilder/EWCE/blob/8e9ca154c98f2afb4c01a0a0e1ade92a690a2cce/R/bootstrap_enrichment_test.R#L89

How do the various methods of filtering low-expressed genes now implemented affect the results?

Automatic DockerHub builds

Automatically build and upload new Docker container to DockerHub every time a push is made (via GHA).

Get EWCE back onto bioconductor

Setup codecov and add badge to the repo

Cannot connect to ExperimentHub server, using 'localHub=TRUE' instead

The latest version seems to use ExperimentHub when I run bootstrap_enrichment_test.
However, in some cases we are not connected to the internet or cannot access the bioconductor.org web service.
Even if I set ExperimentHub(localHub=T), I still do not have a local cache.
So I cannot use this package anymore.

Docker issues

Docker container of EWCE works fine until you try to print some plots.

Just noticed this warning message:

R graphics engine version 15 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.

Apparently this happens when the version of R is too far ahead of the version of RStudio. For some reason there is no backward compatibility?
https://community.rstudio.com/t/warning-message-r-graphics-engine-version-14-is-not-supported-by-this-version-of-rstudio-the-plots-tab-will-be-disabled-until-a-newer-version-of-rstudio-is-installed/110386

Source

Our Dockerfile currently uses bioconductor/bioconductor_docker:devel as the base. Which means that this is an issue with how Bioconductor is setting up their docker container, not ours.

Potential solutions

We could try changing our Dockerfile to one of the following:

bioconductor/bioconductor_docker:latest: May not always be accurate. The maintainer doesn't seem keen on keeping this up to date, though a verdict on this was never communicated.
bioconductor/bioconductor_docker:RELEASE_3_14: Should work, but means we will have to remember to manually update the Dockerfile any time there is a new release. My intent with using the latest tag was to avoid this and make the file usable in the long term as is.

Or we could simply wait until the bioconductor_docker maintainer fixes the issue, and then rebuild any containers that used that version. I've posted the Issue here:
Bioconductor/bioconductor_docker#39

Installation error: package 'ewceData' requires R >= 4.1

Hi,
Thanks for writing this great tool. However, I came across an error on all operating systems (Windows, Mac and Ubuntu 20.4) when I installed the package.

ERROR: this R is version 4.0.4, package 'ewceData' requires R >= 4.1
Error: Failed to install 'ewceData' from GitHub:
  (converted from warning) installation of package ‘/tmp/Rtmp9xG1JD/file26f92ae904c2/ewceData_0.99.6.tar.gz’ had non-zero exit status

Could you tell me whether this is a bug or whether I am missing something?
Many thanks!
Zhang

Setup one initial unit test with testthat and understand how it works with travis

A good initial test would be to confirm that when a simple CTD is created, the mean is calculated correctly. (A while back I made an edit intended to speed up generation of the CTD, and it stopped the mean from being calculated properly.)

parallel computing issues during generate.celltype.data

Before calculating specificity data for my own dataset, I was following the example code on my system (Ubuntu 18.04, R v3.6.0):

download.file("goo.gl/r5Y24y",
    destfile="expression_mRNA_17-Aug-2014.txt") 
path = "expression_mRNA_17-Aug-2014.txt"
cortex_mrna  = load.linnarsson.sct.data(path)
exp_CortexOnly_DROPPED = drop.uninformative.genes(exp=cortex_mrna$exp,level2annot = cortex_mrna$annot$level2class)
annotLevels = list(level1class=cortex_mrna$annot$level1class,level2class=cortex_mrna$annot$level2class)
fNames_CortexOnly = generate.celltype.data(exp=exp_CortexOnly_DROPPED,annotLevels=annotLevels,groupName="kiCortexOnly")

Which returns the following error:

> fNames_CortexOnly = generate.celltype.data(exp=exp_CortexOnly_DROPPED,annotLevels=annotLevels,groupName="kiCortexOnly")
Loading required package: parallel
Error in makePSOCKcluster(names = spec, ...) : 
  numeric 'names' must be >= 1

Apparently, determining the number of cores (source code) returns NA, which in turn causes makeCluster() to fail. On what OS was the code developed / tested? Perhaps this is an Ubuntu-specific issue?
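
`parallel::detectCores()` is documented to return NA when the core count cannot be determined, so a guard along these lines would avoid passing NA on to `makeCluster()`. This is a sketch with a hypothetical helper name, not the fix applied in EWCE:

```r
library(parallel)

# Hypothetical helper: fall back to a single core when detection fails.
safe_core_count <- function(detected = parallel::detectCores()) {
  if (length(detected) != 1 || is.na(detected) || detected < 1) {
    return(1L)
  }
  as.integer(detected)
}

no_cores <- safe_core_count()
```

The result can then be passed to `parallel::makeCluster(no_cores)`. Exposing `detected` as an argument also makes the NA case easy to exercise in a unit test.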

invalid number of intervals error when running generate.celltype.data()

> celltype_rda_file = generate.celltype.data(exp=exp, annotLevels = annotLevels, "Project name")
Error in cut.default(matrixIn[matrixIn > 0], breaks = unique(quantile(matrixIn[matrixIn >  : 
  invalid number of intervals

Two compressed RDS files for exp and annotLevels are attached: RDS_files_for_exp_and_annotLevels.zip
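
One way this error can arise, offered as a guess since the attached data was not inspected: if a column has no values above zero, `quantile()` on the empty vector returns NA, `unique()` collapses that to a single NA, and `cut()` then treats it as an invalid interval count. The sketch below reproduces that and shows a guarded version. `bin_positive_values` is a hypothetical helper, not the EWCE function:

```r
# Reproduce the failure mode with an all-zero input.
x_zero <- c(0, 0, 0)
pos <- x_zero[x_zero > 0]
failure <- tryCatch(
  cut(pos, breaks = unique(quantile(pos, probs = seq(0, 1, by = 0.25)))),
  error = function(e) conditionMessage(e)
)

# Hypothetical guarded binning: zeros stay in bin 0, positives get quantile bins.
bin_positive_values <- function(x, n_bins = 10) {
  is_pos <- x > 0
  bins <- integer(length(x))
  if (!any(is_pos)) return(bins)
  breaks <- unique(quantile(x[is_pos], probs = seq(0, 1, length.out = n_bins + 1)))
  if (length(breaks) < 2) {
    # Only one distinct positive value: put all positives in a single bin.
    bins[is_pos] <- 1L
    return(bins)
  }
  bins[is_pos] <- as.integer(cut(x[is_pos], breaks = breaks, include.lowest = TRUE))
  bins
}

binned <- bin_positive_values(c(0, 1, 2, 3, 4), n_bins = 2)
```

If this is the cause, checking `colSums(exp > 0)` (or the per-cell-type equivalent) for zeros before calling generate.celltype.data would confirm it.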

Split documentation into more manageable vignettes

Includes a quick minimal example on the Getting started page.

https://bschilder.github.io/EWCE/

Don't run tests on 32-bit Windows

Avoid running tests twice on Windows by detecting 32-bit Windows OS.

Made new function to make this a bit easier, used like so:

if(!is_32bit()){
<test functions>
}
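
The body of `is_32bit()` is not shown above. A minimal sketch of how such a check could be written in base R, assuming pointer size is an acceptable proxy for a 32-bit build (the function name here is hypothetical):

```r
# Hypothetical stand-in for is_32bit(); the real implementation may differ.
is_32bit_windows_sketch <- function() {
  # sizeof.pointer is 4 on 32-bit builds of R and 8 on 64-bit builds.
  .Platform$OS.type == "windows" && .Machine$sizeof.pointer == 4
}

ran_tests <- FALSE
if (!is_32bit_windows_sketch()) {
  # Test functions would go here.
  ran_tests <- TRUE
}
```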

EWCE:::is_celltypedataset(MAGMA.Celltyping::ctd_Zeisel2018)

[1] FALSE

Add functions for running lots of lists in parallel + graphing results

Imputed data for EWCE object

Is it possible to use an imputed matrix (eg: MAGIC) to construct the EWCE object?
