rjmcmcnucleosomes's Introduction

Genome-Wide Nucleosome Positioning using High-Throughput Short-Read Data (MNase-Seq)

This package uses an informative Multinomial-Dirichlet prior in a t-mixture model, with reversible jump estimation of nucleosome positions, for genome-wide profiling.

The package is written in R, with an optimized section implemented in C++.

Citing

If you use this package for a publication, we would ask you to cite the following:

Samb R, Khadraoui K, Belleau P, et al. (2015). "Using informative Multinomial-Dirichlet prior in a t-mixture with reversible jump estimation of nucleosome positions for genome-wide profiling." Statistical Applications in Genetics and Molecular Biology. Volume 14, Issue 6, Pages 517-532, ISSN (Online) 1544-6115, ISSN (Print) 2194-6302, December 2015, doi: 10.1515/sagmb-2014-0098

Bioconductor Package

RJMCMCNucleosomes is now an official Bioconductor package. The current release can be downloaded directly from the Bioconductor website.

Notes

This version of RJMCMCNucleosomes requires the Rcpp package. This implies having the GSL library installed.

Authors

Pascal Belleau, Rawane Samb, Astrid Deschênes, Khader Khadraoui, Lajmi Lakhal-Chaieb and Arnaud Droit.

See AD Lab website.

Maintainer

Astrid Deschênes

License

This package and the underlying RJMCMCNucleosomes code are distributed under the Artistic License 2.0. You are free to use and redistribute this software.

For more information on the Artistic License 2.0, see http://opensource.org/licenses/Artistic-2.0

Bugs/Feature requests

If you have any bugs or feature requests, let us know.

Thanks!

rjmcmcnucleosomes's People

Contributors

adeschen, belleau, rawsamb

Forkers

belleau

rjmcmcnucleosomes's Issues

C++ src indent

Consistent use of spaces (rather than mixing tabs and spaces, e.g., in rjmcmcNucleo.cpp) and consistent formatting would help readability.

NEWS file: remove or populate

Please either populate the NEWS file in a format that can be parsed
by news() (see, e.g., the NEWS file in GenomicRanges) or remove
this file.

Improve postMerge() function

postMerge(): avoid 1:nbOverlap in for loops, which has
surprising results when nbOverlap == 0; use seq_len(nbOverlap)
instead. It seems like (parts of?) this iteration could be avoided
by vectorization?
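
The pitfall can be reproduced in a few lines of base R. This is only an illustration: nbOverlap is set to zero by hand, and visitedColon and visitedSeqLen are names invented for this sketch, not variables from postMerge().

```r
# Hypothetical overlap count; zero is the problematic case.
nbOverlap <- 0L

# 1:0 evaluates to c(1, 0), so the loop body runs twice.
visitedColon <- integer(0)
for (i in 1:nbOverlap) {
    visitedColon <- c(visitedColon, i)
}

# seq_len(0) is integer(0), so the loop body never runs.
visitedSeqLen <- integer(0)
for (i in seq_len(nbOverlap)) {
    visitedSeqLen <- c(visitedSeqLen, i)
}
```

With the colon form, the loop visits indices 1 and 0 even though there is nothing to process; with seq_len() it is skipped entirely.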

Remove Author: from DESCRIPTION

It's sufficient to include Authors@R, without the Author: or
Maintainer: fields; these are generated when the package is built
(R CMD build RJMCMCNucleosomes).

Typo in vignettes

Minor typos in the documentation, e.g., RJMCMCNucleosomes.R:48 "froward".

Modify segmentation() function

1- segmentation(). The 'dataIP' argument is described as 'reads that
need to be segmented'. Reads are represented most readily as
GAlignments() or GAlignmentPairs(), from the
GenomicAlignments package; should these be supported as input?

2- segmentation(). The example might more easily coerce to GRanges
with
as(syntheticNucleosomeReads$dataIP, "GRanges")

3- segmentation() returns a simple list, but would be more convenient
as a GRangesList(), the equivalent of
GRangesList(segmentation(...))

4- segmentation() seems to assume that
length(unique(seqnames(dataIP))) == 1L?

5- segmentation(). It seems like you're splitting based on overlap, so
a much more efficient implementation is
starts = seq(posMin, posMax, by = (maxLength - (zeta + delta)))
subject = GRanges(seqlevels(dataIP), IRanges(starts, width=maxLength))
hits = findOverlaps(dataIP, subject)
splitAsList(dataIP[queryHits(hits)], subjectHits(hits))
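
The windowing arithmetic in point 5 can be tried without Bioconductor. The base-R sketch below uses made-up values for posMin, posMax, maxLength, zeta and delta, and a hypothetical vector readStarts of read positions on a single chromosome. It only mimics the grouping that findOverlaps() followed by splitAsList() would produce; it is not the package's implementation.

```r
# Hypothetical parameters, chosen only to make the arithmetic easy to follow.
posMin <- 1
posMax <- 1000
maxLength <- 400
zeta <- 147
delta <- 53

# Hypothetical read start positions on a single chromosome.
readStarts <- c(50, 250, 390, 450, 999)

# Consecutive windows start (maxLength - (zeta + delta)) apart, so
# neighbouring windows share zeta + delta positions.
starts <- seq(posMin, posMax, by = maxLength - (zeta + delta))

# Assign each read to every window that contains it.
segments <- lapply(starts, function(s) {
    readStarts[readStarts >= s & readStarts <= s + maxLength - 1]
})
```

A read that falls in the region shared by two windows, such as the one at position 250, appears in both corresponding segments.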

Improve mergeAllRDSFiles() function

mergeAllRDSFiles(): the pattern used here is 'copy-and-append',
which scales quadratically with the size of the data (all existing data
is copied each time a new data element is added). If the data is
large, consider a better strategy such as 'pre-allocate-and-fill',
e.g.,

mu <- setNames(vector("list", length(arrayOfFiles)), arrayOfFiles)
for (fileName in arrayOfFiles) {
    ...
    mu[[fileName]] <- data$mu
    ...
}
mu <- unlist(mu, use.names=FALSE)
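
A runnable version of the same idea, as a sketch under stated assumptions: the three temporary RDS files, their names, and their contents (a list with a numeric 'mu' element) are invented here for illustration and do not reflect the actual layout of the files that mergeAllRDSFiles() reads.

```r
# Write three small hypothetical result files, each a list with a 'mu' vector.
tmpDir <- tempfile("rjmcmc_demo_")
dir.create(tmpDir)
arrayOfFiles <- file.path(tmpDir, paste0("segment_", 1:3, ".rds"))
for (i in seq_along(arrayOfFiles)) {
    saveRDS(list(mu = c(100, 250) + 1000 * i), arrayOfFiles[i])
}

# Pre-allocate one list slot per file, then fill each slot in place.
mu <- setNames(vector("list", length(arrayOfFiles)), arrayOfFiles)
for (fileName in arrayOfFiles) {
    data <- readRDS(fileName)
    mu[[fileName]] <- data$mu
}

# Flatten once at the end instead of growing a vector inside the loop.
mu <- unlist(mu, use.names = FALSE)

# Remove the temporary files created for this demonstration.
unlink(tmpDir, recursive = TRUE)
```

Each file's values are stored once in their slot, and the single unlist() call performs the only full copy.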

Inter-operate with the GenomicRanges package

1 - It seems like the major functions should inter-operate with the
GenomicRanges package and in particular the GRanges() class. The first
two arguments of rjmcm() should be (or, minimally, should support) GRanges
instances. The return value should be a GRanges, perhaps with
additional information in the metadata().

2 - This leads to use of S4 classes that extend GRanges, rather than ad
hoc S3 classes like rjmcmNucleosomes. The classes do not have to be
complicated, e.g.,
setClass("RJMCMNucleosomes", contains="GRanges")
with minimal additional methods, e.g., perhaps show().

Improve rjmcmCHR() function

rjmcmCHR(): I suggest you use BiocParallel::bplapply() rather
than mclapply(), so that Windows users also get parallel
evaluation.

a <- bplapply(1:nbSeg, FUN = runCHR, seg, niter = nbrIterations,
kmax = kMax, lambda = lambda, ecartmin = minInterval,
ecartmax = maxInterval, minReads = minReads,
adaptNbrIterations = adaptIterationsToReads,
vSeed = vSeed, saveAsRDS = saveAsRDS)
