
rdscan's Introduction

Pipeline for discovering putative regions of difference in MTBC genomes


Description

RDscan is a Snakemake workflow that finds deletions and putative regions of difference (RDs) in Mycobacterium tuberculosis complex (MTBC) genomes. It can also determine already known or user-defined RDs.

Installation

The usage of this workflow is described in the Snakemake Workflow Catalog. Alternatively, it can be installed as described below.

Use the Conda package manager and BioConda channel to install RDscan.

If you do not have Conda installed, do the following:

# Download Conda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
# Set permissions
chmod +x Miniconda3-latest-Linux-x86_64.sh
# Install
bash Miniconda3-latest-Linux-x86_64.sh

Set up channels:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

Get RDscan snakemake workflow:

git clone https://github.com/dbespiatykh/RDscan.git

Install all required dependencies:

cd RDscan
conda install -c conda-forge mamba
mamba env create --file environment.yml

Usage

Rulegraph of the pipeline


Activate RDscan environment:

conda activate RDscan

Run pipeline:

snakemake --conda-frontend mamba --use-conda -j {Number of cores}

If you are running the pipeline for the first time, a dry run is recommended to check that everything is in working order. Use the -n flag for this:

snakemake -n

Output

The results directory will contain four tables: RD_putative.tsv, RD_known.tsv, RD_known.xlsx, and RD_known.bin.tsv.

Example of the RD_putative.tsv: Table containing all discovered putative RDs.

RD - Known RDs that intersect with deletion breakpoints; SIZE - Estimated size of the predicted deletion.

Values in cells represent deletion length in the sample.

CHROM START END SIZE RD TYPE ERR015582 ERR017778 ERR017782 ERR019852
NC_000962 333828 338580 5800 DEL 7113 7084 7050
NC_000962 340400 340645 245 DEL
NC_000962 350935 351175 238 DEL 300 204 240
NC_000962 361769 362988 1391 DEL 1833 1392 1833 1390

Example of the RD_known.tsv:

Table containing proportion of coverage in particular RDs.

Sample N-RD25_tbA N-RD25_tbB N-RD25bov/cap N-RD25das
ERR015582 0.883562 0.856164 0.856164 0.808219
ERR017778 0 0 0 0.41791
ERR017782 1.021277 1.042553 1.106383 0.978723
ERR019852 0 0 0 0.386364

Example of the RD_known.xlsx:

Same as RD_known.tsv, but in XLSX format with conditional formatting applied.
The conditional formatting corresponds to the threshold value in the config.yml file.

Example of the RD_known.bin.tsv (binary version of RD_known.tsv):

Sample N-RD25_tbA N-RD25_tbB N-RD25bov/cap N-RD25das
ERR015582 0 0 0 0
ERR017778 1 1 1 0
ERR017782 0 0 0 0
ERR019852 1 1 1 0
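
The relationship between the two example tables above can be sketched in Python. This is only an illustration, not RDscan's own code: the cutoff `THRESHOLD = 0.05` and the strict `<` comparison are assumptions chosen so that the example values reproduce the binary table shown here. The real threshold is the one set in config.yml.

```python
import csv
import io

# Excerpt of RD_known.tsv from the example above (tab-separated).
RD_KNOWN_TSV = (
    "Sample\tN-RD25_tbA\tN-RD25_tbB\tN-RD25bov/cap\tN-RD25das\n"
    "ERR015582\t0.883562\t0.856164\t0.856164\t0.808219\n"
    "ERR017778\t0\t0\t0\t0.41791\n"
    "ERR017782\t1.021277\t1.042553\t1.106383\t0.978723\n"
    "ERR019852\t0\t0\t0\t0.386364\n"
)

# ASSUMPTION: illustrative cutoff only; the real value lives in config.yml.
THRESHOLD = 0.05


def binarize(tsv_text, threshold):
    """Return table rows where 1 marks a coverage proportion below the threshold."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [next(reader)]  # keep the header row unchanged
    for sample, *values in reader:
        # ASSUMPTION: strict "<" comparison; RDscan may compare differently.
        flags = [1 if float(value) < threshold else 0 for value in values]
        rows.append([sample, *flags])
    return rows


binary_rows = binarize(RD_KNOWN_TSV, THRESHOLD)
for row in binary_rows:
    print("\t".join(str(cell) for cell in row))
```

In this reading, 1 marks an RD whose coverage proportion falls below the cutoff (the region is likely deleted in that sample) and 0 marks an RD that is still covered.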

Citation

If you use RDscan for your research, please cite the pipeline:

D. Bespiatykh, J. Bespyatykh, I. Mokrousov, and E. Shitikov, A Comprehensive Map of Mycobacterium tuberculosis Complex Regions of Difference, mSphere, Volume 6, Issue 4, 21 July 2021, Page e00535-21, https://doi.org/10.1128/mSphere.00535-21

References for all tools used by RDscan can be found in the CITATIONS.md file.

License

MIT


rdscan's Issues

Error in rule compute_coverage:

Error in rule compute_coverage:
jobid: 16
output: bed/ERR386816.bed
conda-env: /mnt/e/Karthisir_mtb/mtb/RDscan/.snakemake/conda/f1da7f6336cc754e535647fdb7c37a30
shell:

    parallel bedtools genomecov -bga -ibam ::: mapped/ERR386816.bam         | awk '$4<'$(samtools depth  -aa mapped/ERR386816.bam |  awk '{sum+=$3} END {print sum/4411532*0.1}')''            | bedtools merge -d 1500 | bedtools subtract -f 0.30 -a stdin -b config/IS6110.bed -A > bed/ERR386816.bed

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
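
For readers trying to follow the failing command, its filtering step can be sketched in Python. This is a reading of the shell pipeline quoted above, not a fix for the error: the constants `4411532` and `0.1` come from the command itself, while the function names and the toy data are hypothetical. The sketch covers only the cutoff calculation and the `awk '$4<...'` filter; the later `bedtools merge` and `bedtools subtract` steps are not reproduced.

```python
# Constants copied from the shell command above.
GENOME_LENGTH = 4411532
FRACTION = 0.1


def low_coverage_cutoff(depths, genome_length=GENOME_LENGTH, fraction=FRACTION):
    """Mirror `sum/4411532*0.1`: a fraction of the mean per-base depth."""
    return sum(depths) / genome_length * fraction


def below_cutoff(bedgraph, cutoff):
    """Mirror `awk '$4 < cutoff'`: keep intervals whose coverage is below the cutoff."""
    return [(chrom, start, end) for chrom, start, end, cov in bedgraph if cov < cutoff]


# Toy data (hypothetical): a 10 bp "genome" with 8 covered and 2 uncovered bases.
toy_depths = [50] * 8 + [0] * 2
toy_bedgraph = [("toy", 0, 8, 50), ("toy", 8, 10, 0)]

cutoff = low_coverage_cutoff(toy_depths, genome_length=10)
candidates = below_cutoff(toy_bedgraph, cutoff)
print(cutoff, candidates)
```

Under this reading, an interval becomes a deletion candidate when its depth is below 10% of the sample's mean depth, so the cutoff scales with each sample's sequencing depth.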

Unable to execute RDscan..

Looking for: ['snakemake', 'snakedeploy', 'biopython']

Encountered problems while solving:

  • package snakedeploy-0.1.1-pyhdfd78af_1 requires python >=3.8, but none of the providers can be installed

How to execute properly without any error?

Input when running RDscan

What is the exact input to provide to the command below when trying to determine the RD regions of a sample? Should the working directory contain the files to be analyzed? Please provide examples if possible. Thanks,

snakemake --conda-frontend mamba --use-conda -j {Number of cores}

AmbiguousRuleException: Rules calculate_proportion and compute_coverage are ambiguous for the file results/bed/28 /test/reads/28_1.fastq.gz.proportion.bed.

RDscan ends with this error on our data, while the dry run appears to complete normally.

Dry run
/mnt/f/RD/RDscan/workflow/rules/novel_discovery.smk:52: SyntaxWarning: invalid escape sequence '\w'
input:
Building DAG of jobs...
Nothing to be done (all requested files are present and up to date).

Our data

AmbiguousRuleException:
Rules calculate_proportion and compute_coverage are ambiguous for the file results/bed/28 /test/reads/28_1.fastq.gz.proportion.bed.
Consider starting rule output with a unique prefix, constrain your wildcards, or use the ruleorder directive.
Wildcards:
calculate_proportion: sample=28 /test/reads/28_1.fastq.gz
compute_coverage: sample=28 /test/reads/28_1.fastq.gz.proportion
Expected input files:
calculate_proportion: results/mosdepth_bed/28 /test/reads/28_1.fastq.gz.regions.bed.gz
compute_coverage: results/mapped/28 /test/reads/28_1.fastq.gz.proportion.bam resources/IS6110.bed
Expected output files:
calculate_proportion: results/bed/28 /test/reads/28_1.fastq.gz.proportion.bed
compute_coverage: results/bed/28 /test/reads/28_1.fastq.gz.proportion.bed
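
The ambiguity reported above can be reproduced outside Snakemake with plain regular expressions. The sketch below assumes the two output patterns are `results/bed/{sample}.proportion.bed` and `results/bed/{sample}.bed` (inferred from the wildcard values in the message, not read from the workflow source) and that an unconstrained wildcard behaves like the regex `.+`. The helper names and the constraint `[^/.]+` are hypothetical.

```python
import re

# Output patterns inferred from the exception message (ASSUMPTION).
PATTERNS = {
    "calculate_proportion": "results/bed/{sample}.proportion.bed",
    "compute_coverage": "results/bed/{sample}.bed",
}


def pattern_to_regex(pattern, constraint):
    """Turn a '{sample}' output pattern into a regex with the given wildcard constraint."""
    prefix, suffix = pattern.split("{sample}")
    return re.compile(
        re.escape(prefix) + f"(?P<sample>{constraint})" + re.escape(suffix)
    )


def matching_rules(target, constraint=".+"):
    """Map each rule whose output pattern matches the target to its wildcard value."""
    matches = {}
    for rule, pattern in PATTERNS.items():
        match = pattern_to_regex(pattern, constraint).fullmatch(target)
        if match:
            matches[rule] = match.group("sample")
    return matches


# The file name from the exception: both rules match, hence the ambiguity.
target = "results/bed/28 /test/reads/28_1.fastq.gz.proportion.bed"
unconstrained = matching_rules(target)

# A clean sample ID with a constraint that forbids "/" and ".": only one rule matches.
constrained = matching_rules("results/bed/28.proportion.bed", constraint="[^/.]+")

print(unconstrained)
print(constrained)
```

The wildcard value `28 /test/reads/28_1.fastq.gz` contains a space and a file path, which may point to a malformed sample sheet rather than to the rules themselves. Independently of that, constraining the `sample` wildcard or adding a `ruleorder` directive, as the message suggests, removes the double match.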
