ewce's People

Contributors

al-murphy, bobgsmith, bschilder, eturkes, jwokaty, nathanskene, nfancy, nturaga

ewce's Issues

`generate_bootstrap_plots`: address warning

ctd <- ewceData::ctd()
example_genelist <- ewceData::example_genelist()
level <- 1
reps <- 10

full_results <- EWCE::bootstrap_enrichment_test(
    sct_data = ctd,
    sctSpecies = "mouse",
    hits = example_genelist,
    genelistSpecies = "human",
    reps = reps,
    annotLevel = level
)

boot_plot_dir1 <- EWCE::generate_bootstrap_plots(
    sct_data = ctd,
    sctSpecies = "mouse",
    hits = example_genelist,
    genelistSpecies = "human",
    annotLevel = level,
    full_results = full_results,
    listFileName = "VignetteGraphs",
    savePath = tempdir()
)
Warning messages: 1: Transformation introduced infinite values in continuous y-axis
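
This warning usually appears when a log-type axis transformation is applied to values that include zero, because the logarithm of zero is negative infinity. The sketch below is not taken from EWCE; the values and the pseudocount are hypothetical and only illustrate the mechanism and two common ways around it.

```r
# Hypothetical values standing in for whatever is drawn on the log-scaled y-axis.
values <- c(0, 0.5, 2, 10)

# log10(0) is -Inf, which is what a log-transformed axis complains about.
log_values <- log10(values)
has_infinite <- any(is.infinite(log_values))

# Option 1: drop non-positive values before plotting.
positive_only <- values[values > 0]

# Option 2: add a small pseudocount (its size is an arbitrary choice here).
pseudocount <- 1e-3
shifted <- log10(values + pseudocount)
```

Which option is appropriate depends on whether the zeros carry meaning in the plot; dropping them hides them, while a pseudocount keeps them visible at the bottom of the axis.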

Enable offline runs of EWCE with ExperimentHub

Enable EWCE to run offline by using a local cache of ExperimentHub. The issue was highlighted here by a user.

This will require:

  • Updating ewceData so that neurogenomics/ewceData/R/utils.R, the get_ExperimentHub function, and all dataset functions handle a localHub=T parameter
  • Updating all EWCE functions that call and load datasets from ewceData to accept the parameter. The parameter will need to propagate up through the functions.
  • Adding a test case
  • Pushing to Bioconductor
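
The propagation described in the list above could look roughly like the following. This is only a sketch: `get_hub`, `load_dataset`, `run_analysis` and the `hub_id` argument are hypothetical names, not the actual ewceData or EWCE functions. The only real call assumed is the `ExperimentHub()` constructor with its `localHub` argument.

```r
# Hypothetical hub accessor: forwards localHub to the ExperimentHub constructor.
get_hub <- function(localHub = FALSE) {
    if (!requireNamespace("ExperimentHub", quietly = TRUE)) {
        stop("The ExperimentHub package is required.")
    }
    ExperimentHub::ExperimentHub(localHub = localHub)
}

# Hypothetical dataset function: 'hub_id' is an illustrative resource ID.
load_dataset <- function(hub_id, localHub = FALSE) {
    hub <- get_hub(localHub = localHub)
    hub[[hub_id]]
}

# Hypothetical top-level function: it only passes the flag down.
run_analysis <- function(hub_id, localHub = FALSE) {
    dataset <- load_dataset(hub_id, localHub = localHub)
    dataset
}
```

With this shape, a user working offline would call the top-level function with `localHub = TRUE`, and every function below it would read from the local cache. The resource still has to be in the cache from an earlier online session.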

bin.specificity.into.quantiles NOT in namespace

> ctd = prepare.quantile.groups(ctd,specificity_species="human",numberOfBins=40)
Error: 'bin.specificity.into.quantiles' is not an exported object from 'namespace:EWCE'

This hasn't happened before. It must be a consequence of updating to the latest version, as the error is still there after reinstalling:

> devtools::install_github("nathanskene/ewce")
Skipping install of 'EWCE' from a github remote, the SHA1 (50b9fe0c) has not changed since last install.
  Use `force = TRUE` to force installation
> devtools::install_github("nathanskene/ewce", force=TRUE)
[...]
Installing package into *************************
(as ‘lib’ is unspecified)
* installing *source* package ‘EWCE’ ...
** R
** data
*** moving datasets to lazyload DB
** byte-compile and prepare package for lazy loading
** help
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
* DONE (EWCE)
> library(EWCE)
> ctd = prepare.quantile.groups(ctd,specificity_species="human",numberOfBins=40)
Error: 'bin.specificity.into.quantiles' is not an exported object from 'namespace:EWCE'

can not install EWCE

Hello,
I tried to install EWCE following this page (https://nathanskene.github.io/EWCE/articles/EWCE.html), but I got the following errors:
Error: package or namespace load failed for ‘RNOmni’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-RNOmni/00new/RNOmni/libs/RNOmni.so':
/nas/longleaf/apps/gcc/6.3.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-RNOmni/00new/RNOmni/libs/RNOmni.so)
Error: loading failed
Execution halted
ERROR: loading failed

* removing ‘/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/RNOmni’

The downloaded source packages are in
‘/tmp/RtmpdPeehs/downloaded_packages’
✔ checking for file ‘/tmp/RtmpdPeehs/remotes74cba85791/neurogenomics-EWCE-793daf3/DESCRIPTION’ ...
─ preparing ‘EWCE’:
✔ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘EWCE_0.99.2.tar.gz’
Warning: invalid uid value replaced by that for user 'nobody'

Installing package into ‘/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
ERROR: dependency ‘RNOmni’ is not available for package ‘EWCE’

* removing ‘/nas/longleaf/home/R/x86_64-pc-linux-gnu-library/4.0/EWCE’
Warning messages:
1: In i.p(...) : installation of package ‘RNOmni’ had non-zero exit status
2: In i.p(...) :
  installation of package ‘/tmp/RtmpdPeehs/file74cb713b194/EWCE_0.99.2.tar.gz’ had non-zero exit status

problems running docker image of ewce

Dear EWCE-team,

I am trying to use the Docker version of EWCE. The initial steps work without errors. However, when I try to execute:
ggplot(cellExpDist) + geom_boxplot(aes(x=l1,y=e)) + xlab("Cell type") + ylab("Unique Molecule Count") + theme(axis.text.x = element_text(angle = 90, hjust = 1))

I get thrown out and no graphics appear. I guess at some stage a graphics window should appear, but being quite new to R and stuff, I do not know where or how to activate it.
Is there a way to run the docker image in RStudio?

Thanks for any help in advance.
Best
Matthias

bootstrap.enrichment.test returns error about unrecognized genes

Hi EWCE developers, I used EWCE to analyse my data:
full_results = bootstrap.enrichment.test(sct_data=merged_KI,hits=list,bg=bg,sctSpecies = "human", genelistSpecies = "human",reps=10000,annotLevel=1)
and I got this error:

ERROR: At least four genes which are present in the single cell dataset & background gene set are required to test for enrichment
But I don't think this is true for my data:
length(intersect(rownames(merged_KI$exp),bg))

10622
So I can't understand what this error actually means. I hope you can help. Thanks!
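
The check above compares the single cell dataset with the background only; it does not involve the hit list. Assuming the error refers to hit genes that are also present in both the dataset and the background (an assumption based on the wording of the message, not on the package source), a three-way overlap is the more informative number. A minimal sketch with hypothetical toy gene names:

```r
# Toy stand-ins for the real objects; all names and values are hypothetical.
sc_genes <- c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE6")
bg <- c("GENE1", "GENE2", "GENE3", "GENE4", "GENE5", "GENE7")
hits <- c("GENE1", "GENE2", "GENE8")

# Count hit genes that appear in both the dataset and the background.
count_usable_hits <- function(hits, sc_genes, bg) {
    length(Reduce(intersect, list(hits, sc_genes, bg)))
}

n_usable <- count_usable_hits(hits, sc_genes, bg)
```

If this count is below four for the real data, the hit list is the place to look, for example for a different gene identifier type or a different species than the dataset uses.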

Calculating cell type specificity scores

Hi! Thanks for the great package.

I'm interested in running the cell type specificity score calculation for some 10x snRNA-seq datasets. Is there a clear reason to use either mean or median for zero-inflated datasets?

Also, I'm curious about the rationale for dividing the median scores by colSums of the mean per cell type/subtype as opposed to the colSums of the median ("generate.celltype.data.R", lines 89/91)?
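
To make the mean versus median question concrete, here is a small sketch on a hypothetical toy matrix. It assumes specificity is the per-gene proportion of expression across cell types, and it is not the code from generate.celltype.data.R. The helper names `summarise_by_type` and `to_specificity` are made up for illustration.

```r
# Toy matrix: genes in rows, cells in columns. GeneA is zero-inflated.
expr <- rbind(
    GeneA = c(0, 0, 6, 0, 0, 0),
    GeneB = c(2, 4, 3, 1, 1, 1),
    GeneC = c(0, 0, 0, 5, 7, 9)
)
cell_type <- c("astro", "astro", "astro", "neuron", "neuron", "neuron")

# Summarise each gene within each cell type using 'fun' (mean or median).
summarise_by_type <- function(expr, cell_type, fun) {
    sapply(unique(cell_type), function(ct) {
        apply(expr[, cell_type == ct, drop = FALSE], 1, fun)
    })
}

# Specificity as the share of a gene's summarised expression per cell type.
to_specificity <- function(m) m / rowSums(m)

spec_mean <- to_specificity(summarise_by_type(expr, cell_type, mean))
spec_median <- to_specificity(summarise_by_type(expr, cell_type, median))
```

In this toy case the median of GeneA is zero in both cell types, so its median-based specificity is 0/0 and therefore undefined, while the mean-based value still assigns it to astrocytes. Sparse data make this situation common, which is one practical argument for the mean.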

Improve GHA workflow checks

Travis checks, and even the R-CMD checks we've been using for MungeSumstats, tend to be a bit buggy and randomly fail sometimes (unrelated to the pushed changes to our packages).

biocthis::use_bioc_github_action() automatically creates a yaml file for a GHA workflow that is both more robust (in part thanks to using Docker containers on the fly) and comprehensive (currently working on MacOS, Windows, and Linux).

As a bonus, this workflow ensures the devel version of Bioc is being used (3.14 currently), meaning we can easily install recently added Bioc submissions (e.g. orthogene)

I've now implemented this on the bschilder_dev branch. The only thing in the yaml I had to modify was the name of one of my GH Secrets variables: GITHUB_TOKEN-->PAT_GITHUB

"drop.uninformative.genes" uses functions for mainly microarray data?

Hi,
I was looking into the source code for the drop.uninformative.genes function and I saw that it uses an ANOVA test implemented in limma. Limma, together with the eBayes() function, was developed mainly for microarray data and also RNA-seq data, which makes me wonder whether it is applicable to scRNA-seq data?

fix HGNC names exits with error

Running the fix hgnc script errors out.

> fix.bad.hgnc.symbols(ctd_darmanis, dropNonHGNC = F)
[1] "2841 of 22088 are not proper HGNC symbols"
[1] "Possible corruption of gene names by excel: SEP15, SEPN1, SEPP1, SEPT5-GP1BB, SEPT7L, SEPW1, SEPX1"
[1] "1597 of 22088 gene symbols corrected"
[1] "1365 of 22088 gene symbols cannot be mapped"
           astrocytes-GSM1657885 astrocytes-GSM1657932 astrocytes-GSM1657938 astrocytes-GSM1657965
SEC24B-AS1                     0                     0                     0                     0
A1BG                           0                     0                     0                     0
A1BG-AS1                       0                     0                     0                     0
.
.
.
Warning messages:
1: In fix.bad.hgnc.symbols(ctd_darmanis, dropNonHGNC = F) :
  Possible corruption of gene names by excel: SEP15, SEPN1, SEPP1, SEPT5-GP1BB, SEPT7L, SEPW1, SEPX1
2: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = TRUE) :
  Some lower-case letters were found and converted to upper-case.
                 HGNChelper is intended for human symbols only, which should be all
                 upper-case except for open reading frames (orf).
3: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = TRUE) :
  x contains non-approved gene symbols
4: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = FALSE) :
  Some lower-case letters were found and converted to upper-case.
                 HGNChelper is intended for human symbols only, which should be all
                 upper-case except for open reading frames (orf).
5: In checkGeneSymbols(rownames(exp_CORRECTED), unmapped.as.na = FALSE) :
  x contains non-approved gene symbols

understanding the inputs for EWCE

Hi,

I have a few questions regarding EWCE. It's a very useful tool. I have gone through the paper quickly and tried EWCE.

  1. I would like to know if we need to normalize the scRNA data prior to running EWCE?

  2. There will be many genes with zero counts in scRNA datasets. The drop.uninformative.genes function is meant to take care of noise, but I don't think it's doing the job well. When I input all genes (~19,000), it still retains 16,000 after running drop.uninformative.genes, which is a huge number given the low capture rate of scRNA. How does this function take care of genes with 0 counts across many cells?

  3. Is bootstrap.enrichment.test testing whether the gene expression distribution is "higher" than the background set or "different" from the background set?

Also the Vignette is more tailored for mouse/human conversions, so this is the code I am using, as I have human data. I would like to know if I am missing anything.

annot <- read.table("metadata_for_EWCE_v2.txt", header = T, sep ="\t")

SCT <- read.table("scRNA_for_EWCE_GeneNames.txt", header = T, row.names = 1, sep = "\t")

annotLevels = list(level1class=annot$level1class,level2class=annot$level2class)

exp_DROPPED = EWCE::drop.uninformative.genes(exp=SCT,level2annot = annot$level2class)

fNames = EWCE::generate.celltype.data(exp=exp_DROPPED, annotLevels=annotLevels, groupName="Foo", no_cores=10)

load(fNames[1])

GWAS_named_genes  <- as.vector(read.table("GWAS_Named_Loci.txt")$V1)

full_results = bootstrap.enrichment.test(sct_data=ctd, sctSpecies="human",
                                         genelistSpecies="human",
                                         hits=GWAS_named_genes,
                                         bg=rownames(exp_DROPPED),
                                         reps=10000, annotLevel=2)
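
On question 3, the difference between a "higher" and a "different" test can be sketched with a generic bootstrap on hypothetical data. This is not EWCE's implementation, and which variant bootstrap.enrichment.test uses should be confirmed against the package documentation. All values and names below are made up.

```r
set.seed(1)

# Hypothetical specificity values for 1,000 background genes in one cell type.
specificity <- setNames(runif(1000), paste0("gene", seq_len(1000)))

# Hypothetical hit list: the 20 most specific genes, so enrichment is expected.
hits <- names(sort(specificity, decreasing = TRUE))[1:20]
observed <- mean(specificity[hits])

# Null distribution: mean specificity of random gene sets of the same size.
reps <- 1000
boot_means <- replicate(reps, mean(sample(specificity, length(hits))))

# "Higher" (one-sided): how often a random set scores at least as high.
p_one_sided <- mean(boot_means >= observed)

# "Different" (two-sided): how often a random set is at least as far from
# the centre of the null distribution, in either direction.
null_centre <- mean(boot_means)
p_two_sided <- mean(abs(boot_means - null_centre) >= abs(observed - null_centre))
```

A one-sided test of this kind only flags cell types where the hit genes are more specific than random sets; depleted cell types would simply get p-values close to 1.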

Automatic DockerHub builds

Automatically build and upload new Docker container to DockerHub every time a push is made (via GHA).

Cannot connect to ExperimentHub server, using 'localHub=TRUE' instead

This latest version seems to use ExperimentHub when I run bootstrap_enrichment_test.
However, in some cases we are not connected to the internet or cannot access the bioconductor.org web service.
Even when I set ExperimentHub(localHub=T), I still do not have a local cache.
So, I cannot use this package anymore.

Docker issues

Docker container of EWCE works fine until you try to print some plots.

Just noticed this warning message:

R graphics engine version 15 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed. 

Apparently this happens when the version of R is too far ahead of the version of Rstudio. For some reason there is no back compatibility?
https://community.rstudio.com/t/warning-message-r-graphics-engine-version-14-is-not-supported-by-this-version-of-rstudio-the-plots-tab-will-be-disabled-until-a-newer-version-of-rstudio-is-installed/110386

Source

Our Dockerfile currently uses bioconductor/bioconductor_docker:devel as the base, which means that this is an issue with how Bioconductor sets up their Docker container, not ours.

Potential solutions

We could try changing our Dockerfile to one of the following:

  • bioconductor/bioconductor_docker:latest: May not always be accurate. The maintainer doesn't seem keen on keeping this up to date, though a verdict on this was never communicated.
  • bioconductor/bioconductor_docker:RELEASE_3_14: Should work, but means we will have to remember to manually update the Dockerfile any time there is a new release. My intent with using the latest tag was to avoid this and make the file usable in the long term as is.

Or we could simply wait until the bioconductor_docker maintainer fixes the issue, and then rebuild any containers that used that version. I've posted the Issue here:
Bioconductor/bioconductor_docker#39

Installation error: package 'ewceData' requires R >= 4.1

Hi,
Thanks for writing this great tool. However, I came across an error on all operating systems (Windows, Mac and Ubuntu 20.04) when I installed the package.

ERROR: this R is version 4.0.4, package 'ewceData' requires R >= 4.1
Error: Failed to install 'ewceData' from GitHub:
  (converted from warning) installation of package ‘/tmp/Rtmp9xG1JD/file26f92ae904c2/ewceData_0.99.6.tar.gz’ had non-zero exit status

Could you tell me if this is a bug or if I am missing something?
Many thanks!
Zhang

parallel computing issues during generate.celltype.data

Before calculating specificity data for my own dataset, I was following the example code on my system (Ubuntu 18.04, R v3.6.0)

download.file("goo.gl/r5Y24y",
    destfile="expression_mRNA_17-Aug-2014.txt") 
path = "expression_mRNA_17-Aug-2014.txt"
cortex_mrna  = load.linnarsson.sct.data(path)
exp_CortexOnly_DROPPED = drop.uninformative.genes(exp=cortex_mrna$exp,level2annot = cortex_mrna$annot$level2class)
annotLevels = list(level1class=cortex_mrna$annot$level1class,level2class=cortex_mrna$annot$level2class)
fNames_CortexOnly = generate.celltype.data(exp=exp_CortexOnly_DROPPED,annotLevels=annotLevels,groupName="kiCortexOnly")

Which returns the following error:

> fNames_CortexOnly = generate.celltype.data(exp=exp_CortexOnly_DROPPED,annotLevels=annotLevels,groupName="kiCortexOnly")
Loading required package: parallel
Error in makePSOCKcluster(names = spec, ...) : 
  numeric 'names' must be >= 1

Apparently, determining the number of cores (source code) returns NA, which in turn causes makeCluster() to fail. On what OS was the code developed / tested? Perhaps this is an Ubuntu-specific issue?
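
parallel::detectCores() is documented as possibly returning NA when the number of cores cannot be determined, so a guard around it avoids passing an invalid value to makeCluster(). The helper below is a hypothetical sketch, not the code used in EWCE; the `reserve` argument is an illustrative choice.

```r
# Hypothetical helper: never return fewer than one core, even if detection fails.
resolve_cores <- function(detected = parallel::detectCores(), reserve = 1L) {
    if (is.na(detected) || detected < 1) {
        return(1L)
    }
    # Leave 'reserve' cores free for the rest of the system, but keep at least one.
    max(1L, as.integer(detected) - as.integer(reserve))
}

no_cores <- resolve_cores()
# The result can then be passed on safely, e.g. parallel::makeCluster(no_cores).
```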

Don't run tests on 32-bit Windows

Avoid running tests twice on Windows by detecting 32-bit Windows OS.

Made new function to make this a bit easier, used like so:

if (!is_32bit()) {
    # test functions go here
}
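
The body of is_32bit() is not shown here. One way such a check could be written in base R is sketched below; the function name `is_32bit_windows` is hypothetical, and whether EWCE's helper works this way is an assumption.

```r
# Hypothetical check: TRUE only on a 32-bit R build running on Windows.
# .Machine$sizeof.pointer is 4 on 32-bit builds and 8 on 64-bit builds.
is_32bit_windows <- function() {
    .Platform$OS.type == "windows" && .Machine$sizeof.pointer == 4L
}

skip_heavy_tests <- is_32bit_windows()
```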

Progress bar

Hi, can generate.celltype.data() get a progress bar? There are some huge expression matrices that we're working with and it'd be really useful.
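
For reference, base R already ships a text progress bar in the utils package. The loop below is a generic sketch on a hypothetical workload; it does not show how generate.celltype.data() is structured internally.

```r
# Hypothetical workload: one unit of work per cell type group.
n_groups <- 5
results <- vector("list", n_groups)

pb <- utils::txtProgressBar(min = 0, max = n_groups, style = 3)
for (i in seq_len(n_groups)) {
    results[[i]] <- sum(seq_len(i))  # placeholder for the real computation
    utils::setTxtProgressBar(pb, i)  # advance the bar after each group
}
close(pb)
```

A sequential loop updates the bar naturally; with forked parallel workers the updates would need to come from the parent process instead.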

generate.celltype.data()

I got an error when running the function generate.celltype.data():
Error in mclapply(ctd, calculate.meanexp.for.level, exp, mc.cores = no_cores) : 'mc.cores' > 1 is not supported on Windows
Does it only run on Linux systems?
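
The error comes from parallel::mclapply(), which relies on forking and therefore accepts only mc.cores = 1 on Windows. A common workaround is to force a single core there. The wrapper below is a hypothetical sketch of that idea, not the fix adopted in EWCE.

```r
# Hypothetical wrapper: fall back to a single core where forking is unavailable.
portable_mclapply <- function(X, FUN, ..., no_cores = 1L) {
    if (.Platform$OS.type == "windows") {
        no_cores <- 1L
    }
    parallel::mclapply(X, FUN, ..., mc.cores = no_cores)
}

squares <- portable_mclapply(1:4, function(x) x^2, no_cores = 2L)
```

On Windows this runs sequentially, so it is slower but gives the same result; true parallelism there would need a socket cluster such as parallel::parLapply().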
