
icetea's Introduction

icetea


An R package for analysis of data produced by transcript 5' profiling methods like RAMPAGE and MAPCap.


The icetea R package for analysis of TSS profiling data allows users to process data from multiplexed 5'-profiling techniques such as RAMPAGE and the recently developed MAPCap protocol. TSS detection and differential TSS analysis can be performed using replicates on any 5'-profiling dataset. Left panel: typical analysis steps for MAPCap data that can be performed using icetea. Right panel: some of the quality control and visualization outputs from the package: the proportion of sequencing reads used at each step (top), a comparison of TSS accuracy (w.r.t. annotated TSS) between samples (middle), and MA-plots from differential TSS analysis (bottom).

Additionally, analysis of RNA-binding protein locations via the FLASH protocol can also be performed with icetea.

Installing icetea

Stable version

The stable release of icetea is available via Bioconductor.

## first install BiocManager
install.packages("BiocManager")

## then install icetea
BiocManager::install("icetea")

Latest version

The latest (development) version of icetea can be installed using devtools, which can be installed from CRAN.

## first install devtools
install.packages("devtools")

## then install icetea
devtools::install_github("vivekbhr/icetea")

Documentation

Please visit the icetea website for the package documentation and vignette.

Citation

If you use icetea in your analysis, please cite:

Bhardwaj, V., Semplicio, G., Erdogdu, N. U., Manke, T. & Akhtar, A. MAPCap allows high-resolution detection and differential expression analysis of transcription start sites. Nat. Commun. 10, 3219 (2019)

icetea's People

Contributors

kayla-morrell, nturaga, vivekbhr, vobencha


icetea's Issues

Use summarizeOverlaps

Instead of using regionCounts and windowCounts from csaw, use summarizeOverlaps so that strand can be considered.

selective input of reads into duplicate removal

This would be a big change from the beginning of the workflow. In principle, it's possible to do the PCR duplicate removal and demultiplexing at the same time. The idea is:

  • During mapping, add the readGroupID and readGroup tags to the bam file, based on the fastq sample index (I don't know how).
  • Directly use the filterDuplicates function, but input the reads from the bam file selectively by tag, then output one filtered file per tag, using the tagFilter arg in ScanBamParam.
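
The second bullet could look roughly like the sketch below, assuming each alignment carries an RG tag holding the sample index. The tag name and sample names here are assumptions, and the filterDuplicates call is left as a comment since its exact interface may differ.

```r
## Sketch: select reads per sample via the RG tag using Rsamtools'
## tagFilter, then deduplicate each subset separately.
## Tag name, sample names and the filterDuplicates call are assumptions.
library(Rsamtools)

samples <- c("sample1", "sample2")
for (s in samples) {
    ## read only the alignments carrying this sample's tag
    param <- ScanBamParam(tagFilter = list(RG = s))
    aln <- scanBam("multiplexed.bam", param = param)
    ## ...then run duplicate filtering on this subset and write one
    ## filtered BAM per tag (pseudo-call):
    # filterDuplicates(aln, outfile = paste0(s, ".filtered.bam"))
}
```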

Travis

Following changes are pending for unit testing.

  • Activate travis builds
  • Write testthat checks.
  • Make all examples work (if possible).

filterDuplicates with `UMI` option

filterDuplicates assumes that a UMI is present and throws an error for CAGE data (no UMI). Add an option called UMI (default = TRUE) so that users can set it to FALSE to remove duplicates using only the 5' end.
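
A possible shape for that option (hypothetical signature; the real function's arguments may differ):

```r
## Hypothetical sketch of the proposed UMI argument.
filterDuplicates <- function(CSobject, outdir, UMI = TRUE) {
    if (UMI) {
        ## current behaviour: collapse reads sharing the same
        ## 5' end AND the same UMI
    } else {
        ## proposed behaviour for CAGE data: collapse reads sharing
        ## only the same 5' position and strand
    }
}
```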

taking into consideration strandedness

Looking into the detectTSS.R file, and specifically the strandBinCounts function (line 28), reads are counted as single-end regardless of the paired-end info stored in the object (CSobject@paired_end).

fdata <-
        GenomicAlignments::summarizeOverlaps(
            features = windows$gr.plus,
            reads = bam.files,
            mode = "IntersectionStrict",
            ignore.strand = FALSE,
            inter.feature = ignoreMultiMap,
            singleEnd = TRUE,
            fragments = FALSE,
            preprocess.reads = func,
            param = bam_param,
            BPPARAM = bp_param)

So I think you don't want to count reads twice across the plus (fdata) and minus (rdata) strands, but what if we have a stranded paired-end library? For example:

fdata <-
        GenomicAlignments::summarizeOverlaps(
            features = windows$gr.plus,
            reads = BamFileList(bam.files, asMates = TRUE),
            mode = "IntersectionStrict",
            ignore.strand = FALSE,
            inter.feature = ignoreMultiMap,
            singleEnd = FALSE,
            fragments = FALSE,
            strandMode = 2,
            preprocess.reads = func,
            param = bam_param,
            BPPARAM = bp_param)

And same thing in line 203:

wider <-
                suppressWarnings({
                GenomicAlignments::summarizeOverlaps(
                    features = neighbors,
                    reads = BamFileList(bam.files, asMates = TRUE),
                    mode = "IntersectionStrict",
                    ignore.strand = FALSE,
                    inter.feature = FALSE,
                    singleEnd = FALSE,
                    fragments = FALSE,
                    strandMode = 2,
                    preprocess.reads = ppfunc,
                    param = bamParams,
                    BPPARAM = bpParams)
                  })

Could you tell me if this is OK, or whether counting reads as paired-end will have an effect on the analysis?
Thanks in advance.

reduce deps

Try to reduce dependencies: plyr and reshape2 can go away.

count paired-end

In order to avoid filtering the files for single-end reads, paired-end reads should be counted.

  • Find a way to consider both reads of a fragment during PCR duplicate filtering.
  • During TSS calling, the first base of read 1 (for MAPCap/RAMPAGE) or read 2 (for TruSeq) should be overlapped and counted within the windows; the second read should be ignored.
  • Since bamFiltering would not be needed after implementing this, we can turn on the counting of fragments for gene-level summarization of counts later (for DE, correlation with RNA-seq etc.).
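
One possible sketch for the second bullet: a preprocess.reads function that reduces each pair to the 5' base of read 1 before counting. This assumes the reads arrive as GAlignmentPairs (i.e. singleEnd = FALSE) and is untested.

```r
## Sketch (assumption): reduce each fragment to the 5' base of read 1
## so only that base is overlapped with the TSS windows.
library(GenomicAlignments)

firstBaseOfRead1 <- function(reads, ...) {
    r1 <- GenomicAlignments::first(reads)            # read 1 of each pair
    GenomicRanges::resize(granges(r1), width = 1, fix = "start")
}
## could then be passed as preprocess.reads to summarizeOverlaps()
```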

Move to subjunc for alignment

Move to subjunc for alignment, since subread performs poorly at splice anchors, leading to false mapping of spliced reads.
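
The switch itself would presumably be a small change in the mapping step, along these lines (the index basename and file paths are placeholders):

```r
## Sketch: align with Rsubread's splice-aware subjunc aligner.
library(Rsubread)
subjunc(index = "genome_index",          # placeholder index basename
        readfile1 = "sample_R1.fastq.gz",
        readfile2 = "sample_R2.fastq.gz",
        output_file = "sample.bam")
```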

multicore improvements

  • Initiate the parallel backend early when multiple calls to bplapply are made within a function
  • Add BPPARAM to fitdiffTSS
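
A sketch of the intended pattern, assuming a BiocParallel backend created once and passed down (the BPPARAM argument to fitdiffTSS is the proposed addition, not the current API):

```r
## Sketch: create the backend once, reuse it across bplapply calls
## instead of letting each call spin up its own workers.
library(BiocParallel)

bp <- MulticoreParam(workers = 4)
register(bp)
## hypothetical calls illustrating the intent:
# cs  <- detectTSS(cs, ..., BPPARAM = bp)
# fit <- fitdiffTSS(cs, ..., BPPARAM = bp)   # proposed new argument
```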

plot read metrics

Provide a function to plot the read processing metrics from the CapSet object.

S4 implementation

For ease of use, and to go along with the Bioconductor infrastructure, I am implementing S4 classes to handle the analysis.

  • S4 class CapSet constructed with raw fastq files and barcode info.

  • function trimfastqIndex should use this and trim the index, and should attach the path to the trimmed fastq to the object.

  • function demultiplex_fastq can use the trimmed output to demultiplex the fastq and add the demult_reads info + path to the demultiplexed files into the sampleInfo slot.

  • function mapCaps should be able to execute in a multiplexed and a demultiplexed mode, depending upon whether or not the sampleInfo slot has multiplexed info. It should then attach the proportion of mapped reads to the object (either in sampleInfo or somewhere else).

  • function splitBAM_byIndex may or may not be needed, depending on pre-/post-mapping demultiplexing. In case of post-mapping demultiplexing, it can still use the CapSet object and fill the demult_reads info in the sampleInfo slot.

  • function filterDuplicates should be able to use the CapSet object. In this case the function would run on all the bam files and append the stats of duplicate removal to the sampleInfo slot as dedup_reads. If not given the CapSet object, it would work as a single-sample function (current status).

  • function detectTSS should also be able to use the CapSet object; in this case it would append the design matrix to the object, and the sample names should match the existing sample names in the sampleInfo slot.

  • To push it further, the function fit_diffTSS can also use the CapSet object, replacing the args bam.files and design and taking them from the object. detect_diffTSS would stay the same.

New plotting functions can then be added: taking info from the sampleInfo slot, one can plot the mapped reads, demult_reads, dedup_reads and reads_inTSS (i.e. the number of reads falling in the detected TSSs) for each sample.
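
A minimal sketch of what such a CapSet class could look like (slot names taken from the issue text above; the actual implementation may differ):

```r
## Sketch only: slots follow the issue text, not the released package.
library(methods)

setClass("CapSet",
         slots = c(
             fastq_R1   = "character",   # raw fastq file(s)
             fastq_R2   = "character",
             sampleInfo = "data.frame"   # barcodes + per-step read stats
         ))
```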

calculate FRIT score

During TSS detection, also get the number of reads per sample within peaks and save it within the S4 object.
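
The FRIT score itself is just the per-sample fraction of reads falling within detected TSSs; e.g. with hypothetical counts:

```r
## Sketch with made-up numbers: FRIT = reads within detected TSSs
## divided by total usable reads, per sample.
reads_inTSS <- c(ctrl = 8.0e5, treat = 9.5e5)   # hypothetical counts
total_reads <- c(ctrl = 1.0e6, treat = 1.2e6)   # hypothetical counts
frit <- reads_inTSS / total_reads
```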
