Dynamic Mapping Simulator

Introduction

This repository contains the Dynamic Mapping Simulator and supporting information for the paper K. Břinda, V. Boeva, G. Kucherov: Dynamic read mapping and online consensus calling for better variant detection (arXiv:1605.09070).

Dynamic mapping is read mapping to a reference that is continuously corrected according to the alignments computed so far. Dynamic Mapping Simulator is a pipeline that simulates dynamic mapping using existing software in order to evaluate its benefits in comparison with standard static mapping and iterative referencing. For more details, see the paper.
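To make the idea of a continuously corrected reference more concrete, here is a minimal toy sketch in Python. It is not the pipeline's implementation: the function name dynamic_consensus, the min_votes threshold and the majority-vote rule are assumptions made for illustration, and the mapping step itself is assumed to be already done (reads arrive with gap-free, 0-based positions).

```python
from collections import Counter, defaultdict


def dynamic_consensus(reference, alignments, min_votes=2):
    """Toy online consensus: correct the reference as alignments arrive.

    `alignments` is an iterable of (position, read) pairs with 0-based,
    gap-free positions. Hypothetical helper, for illustration only.
    """
    ref = list(reference)
    votes = defaultdict(Counter)  # position -> counts of observed bases
    for pos, read in alignments:
        for offset, base in enumerate(read):
            i = pos + offset
            if i >= len(ref):
                break  # ignore the part of a read hanging over the end
            votes[i][base] += 1
            best, count = votes[i].most_common(1)[0]
            # Update the reference once another base has enough support
            # and is strictly more frequent than the current reference base.
            if best != ref[i] and count >= min_votes and count > votes[i][ref[i]]:
                ref[i] = best
    return "".join(ref)


if __name__ == "__main__":
    reads = [(0, "ACCT"), (1, "CCTA"), (2, "CTAC")]
    print(dynamic_consensus("ACGTACGT", reads))  # prints ACCTACGT
```

In a real dynamic mapper, every subsequent read would be aligned against the updated reference; in static mapping the reference would stay fixed for all reads.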

Simulation algorithm

Scheme of the simulation pipeline:

Reads are taken in the following way:

SM = static mapping, DM = dynamic mapping without remapping, DM-remap = dynamic mapping with remapping, IR = iterative referencing

Structure of this repository

  • docs - supplementary materials (S1 and S2 files)
  • dymas - Dynamic Mapping Simulator (Python package)
  • experiments - all runs of all experiments
  • reports - generated reports

Reports

Replication of results

Prerequisites

Experiments

Additional software for reports

  • GNU Parallel
  • LaTeX
  • Inkscape
  • Gnuplot 5

Recommended way of installation using Anaconda

Environment installation:

	conda create -y --name dymas \
		-c bioconda \
		python==3.4 \
		snakemake samtools git cmake gnuplot ococo numpy biopython pysam==0.8.3

Environment activation:

	source activate dymas

Installation of Python packages (in the activated environment):

	pip install -r requirements.txt

Replication steps

  1. Install all required software and activate the corresponding Conda environment:
     source activate dymas
  2. Remove computed data:
     make clean
  3. Download reference genomes:
     make -C experiments/exp0*
  4. Run experiments (this step will take several hours):
     make -C experiments -j 10
  5. Generate reports:
     make -C reports -j 10

dymas's People

Contributors

karel-brinda


dymas's Issues

Format of VCF

The VCF output of call_variants needs to be corrected:

[W::vcf_parse] INFO 'mt' is not defined in the header, assuming Type=String
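The warning indicates that the records use an INFO key mt that has no matching ##INFO line in the VCF header. One possible fix, sketched below, is to insert the missing meta-information line just before the #CHROM line. The helper name add_info_header and the default Number, Type and Description values are assumptions for illustration; the real meaning of mt would have to come from call_variants itself.

```python
def add_info_header(vcf_text, info_id="mt", number=".", type_="String",
                    description="Undocumented tag"):
    """Return VCF text with an ##INFO header line for `info_id`.

    Hypothetical helper: the default Number/Type/Description values are
    placeholders, not the definition used by the pipeline.
    """
    lines = vcf_text.splitlines()
    tag = "##INFO=<ID={},".format(info_id)
    if any(line.startswith(tag) for line in lines):
        return vcf_text  # already declared, nothing to do

    new_line = '##INFO=<ID={},Number={},Type={},Description="{}">'.format(
        info_id, number, type_, description)

    out = []
    inserted = False
    for line in lines:
        # Meta-information lines must precede the #CHROM column header.
        if line.startswith("#CHROM") and not inserted:
            out.append(new_line)
            inserted = True
        out.append(line)
    if not inserted:
        raise ValueError("no #CHROM header line found")
    return "\n".join(out) + "\n"
```

Applying the function to a file that already declares the tag returns the text unchanged, so it is safe to call repeatedly.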

Overlaps with another variant

Why do variants overlap? This should not happen.

29 of 47 steps (62%) done
rule 54:
abix, 2_alignments.itref/1_reference/00002.fa, 2_alignments.itref/1_reference/00002.fa.fai, 2_alignments.itref/5_vcf/00002.vcf.gz  
The site gi|386853410|ref|NC_017717.1|:27511 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:27512 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:56819 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:79988 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:122891 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:208363 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:225832 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:256017 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:286078 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:293988 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:307690 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:325324 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:421297 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:505903 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:548364 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:548367 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:676833 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:710469 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:755389 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:873401 overlaps with another variant, skipping...
The site gi|386853410|ref|NC_017717.1|:888498 overlaps with another variant, skipping...
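To investigate which records collide, a small check such as the following sketch could be run on the variant list. It assumes the usual VCF convention that a record covers positions POS to POS + len(REF) - 1, and that records are sorted by chromosome and position; the function name find_overlaps and the tuple layout are hypothetical. It does not explain why the pipeline produces such records.

```python
def find_overlaps(records):
    """Return (chrom, pos) of records that overlap an earlier record.

    `records` is an iterable of (chrom, pos, ref) tuples, sorted by chrom
    and 1-based pos. Hypothetical helper, for illustration only.
    """
    overlaps = []
    prev_chrom, prev_end = None, 0
    for chrom, pos, ref in records:
        end = pos + len(ref) - 1  # last reference position covered
        if chrom == prev_chrom and pos <= prev_end:
            overlaps.append((chrom, pos))
            prev_end = max(prev_end, end)
        else:
            prev_chrom, prev_end = chrom, end
    return overlaps
```

For example, a record with a two-base REF at position 27510 followed by a record at 27511 would be reported, which matches the pattern of adjacent positions in the log above.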

BgZip blocking Snakemake

       input: /home/karel/.smbl/bin/bcftools, /home/karel/.smbl/bin/samtools, /home/karel/.smbl/bin/bgzip, /home/karel/.smbl/bin/tabix, 2_alignments.dyn/1_reference/00000.fa, 2_alignments.dyn/4_pileup/00000.pileup.gz
        output: 2_alignments.dyn/5_vcf/00000.vcf.gz
"/home/karel/.smbl/bin/samtools" mpileup -uf "2_alignments.dyn/1_reference/00000.fa" "2_alignments.dyn/3.2_sorted_bam/00000.bam" | "/home/karel/.smbl/bin/bcftools" call -c | "../../dymas/dymas/filter_bcftools_consensus.pl" > "2_alignments.dyn/5_vcf/00000.vcf"
Note: Neither --ploidy nor --ploidy-file given, assuming all sites are diploid
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
"/home/karel/.smbl/bin/bgzip" "2_alignments.dyn/5_vcf/00000.vcf"
[bgzip] 2_alignments.dyn/5_vcf/00000.vcf.gz already exists; do you wish to overwrite (y or n)?
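The log ends with bgzip waiting for interactive confirmation because the compressed output already exists, which stalls a non-interactive Snakemake run. Two possible workarounds are sketched below: passing the -f (force) option to bgzip, or deleting the stale .gz file before compression. The helper names bgzip_command and remove_stale_output are hypothetical and not part of the dymas package.

```python
import os


def bgzip_command(vcf_path, bgzip="bgzip", force=True):
    """Build the argument list for compressing `vcf_path` with bgzip.

    With force=True the -f option is added, so bgzip overwrites an
    existing .gz file instead of prompting.
    """
    cmd = [bgzip]
    if force:
        cmd.append("-f")
    cmd.append(vcf_path)
    return cmd


def remove_stale_output(vcf_path):
    """Delete a leftover `vcf_path`.gz; return True if one was removed."""
    gz_path = vcf_path + ".gz"
    if os.path.exists(gz_path):
        os.remove(gz_path)
        return True
    return False
```

The resulting list can be passed to subprocess.check_call, or the -f option can be added directly to the bgzip call in the corresponding rule.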
