trimnami's Introduction


Trim lots of metagenomics samples all at once.

Motivation

We keep writing pipelines that start with read trimming. Rather than copy-pasting code each time, this standalone Snaketool handles our trimming needs. The tool will collect sample names and files from a directory or TSV file, optionally remove host reads, and trim with your favourite read trimmer. Read trimming methods supported so far:

  • Fastp
  • Prinseq++
  • BBtools for Round A/B viral metagenomics
  • Filtlong + Rasusa for long reads

Install

Trimnami is still in development but can be easily installed with pip:

Easy install

pip install trimnami

Developer install

git clone https://github.com/beardymcjohnface/Trimnami.git
cd Trimnami/
pip install -e .

Test

Trimnami comes with built-in tests that you can run to check that everything works.

# test fastp only (default method)
trimnami test

# test all short-read (SR) methods
trimnami test fastp prinseq roundAB

# test all short-read methods with host removal
trimnami testhost fastp prinseq roundAB

# test nanopore method (with host removal)
trimnami testnp

Usage

Trim reads with Fastp or Prinseq++

# Fastp (default)
trimnami run --reads reads/

# Prinseq++
trimnami run --reads reads/ prinseq

# Why not both!
trimnami run --reads reads/ fastp prinseq

Include host removal

trimnami run --reads reads/ --host host_genome.fasta

Long reads with host removal: specify 'nanopore' as the target and pass the appropriate minimap2 preset with --minimap (map-ont for Nanopore reads).

trimnami run \
    --reads reads/ \
    --host host_genome.fasta \
    --minimap map-ont \
    nanopore

Parsing samples with --reads

You can pass either a directory of reads or a TSV file to --reads.

  • Directory: Trimnami will infer sample names and _R1/_R2 pairs from the filenames.
  • TSV file: Trimnami expects 2 or 3 columns: column 1 is the sample name, and columns 2 and 3 are the read files (column 3 only for paired-end samples).
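If you'd rather pass a TSV, you could generate one from a reads directory. The sketch below uses a hypothetical helper, make_reads_tsv (not part of Trimnami). It assumes paired files differ only by _R1/_R2, treats any other file as single-end, and writes tab-separated rows with no header row. Trimnami's own filename parsing may differ, so check the TSV before using it.

```sh
# Hypothetical helper (not part of Trimnami): build a reads TSV from a
# directory of gzipped FASTQ files.
# Output rows: sample <TAB> R1_or_single_file [<TAB> R2_file]
make_reads_tsv() {
    local dir="$1" f base sample r2
    for f in "$dir"/*.fastq.gz "$dir"/*.fq.gz; do
        [ -e "$f" ] || continue              # skip glob patterns that matched nothing
        base=$(basename "$f")
        case "$base" in
            *_R2*)
                continue ;;                  # R2 files are emitted with their R1 mate (orphan R2s are skipped)
            *_R1*)
                sample=${base%%_R1*}         # sample name = everything before the first _R1
                r2="$dir/${sample}_R2${base#*_R1}"
                if [ -e "$r2" ]; then
                    printf '%s\t%s\t%s\n' "$sample" "$f" "$r2"
                else
                    printf '%s\t%s\n' "$sample" "$f"
                fi
                ;;
            *)
                sample=${base%%.*}           # single-end: strip everything from the first dot
                printf '%s\t%s\n' "$sample" "$f"
                ;;
        esac
    done
}
```

For example, make_reads_tsv reads/ > reads.tsv, then trimnami run --reads reads.tsv.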

More information and examples here

Configure trimming parameters

You can customise the trimming parameters via the config file. First, copy the default config file:

trimnami config

Then edit trimnami.out/trimnami.config.yaml in your favourite text editor. Run Trimnami as normal, or, if you've moved the config file, point to it with --configfile:

trimnami run ... --configfile /my/awesome/config.yaml

Outputs

Trimmed reads are saved in subfolders of the output directory named after the trimming method. For example, reads trimmed with Fastp or Prinseq++ will be in trimnami.out/fastp/ or trimnami.out/prinseq/. Each paired-end sample yields three files: the R1 and R2 paired reads, plus an S file of singletons from trimming or host removal. Subsampling produces an extra set of *.subsampled.fastq.gz files alongside the trimmed reads. MultiQC FastQC reports for each run will be in trimnami.out/reports/.
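A quick sanity check on paired outputs is to confirm that every *.paired.R1.fastq.gz has matching R2 and S files. This is a minimal sketch based only on the file naming in the example trees; check_paired_outputs is a hypothetical helper, not something Trimnami ships.

```sh
# Hypothetical helper: for each <prefix>.paired.R1.fastq.gz in a Trimnami
# output subfolder, check that <prefix>.paired.R2.fastq.gz and
# <prefix>.paired.S.fastq.gz also exist. Subsampled files
# (*.R1.subsampled.fastq.gz) do not match the glob and are ignored.
# Returns 0 if nothing is missing, non-zero otherwise.
check_paired_outputs() {
    local outdir="$1" r1 prefix suffix missing=0
    for r1 in "$outdir"/*.paired.R1.fastq.gz; do
        [ -e "$r1" ] || continue             # no paired outputs in this folder
        prefix=${r1%.R1.fastq.gz}            # e.g. .../sample.host_rm.paired
        for suffix in R2 S; do
            if [ ! -e "$prefix.$suffix.fastq.gz" ]; then
                echo "missing: $prefix.$suffix.fastq.gz" >&2
                missing=$((missing + 1))
            fi
        done
    done
    [ "$missing" -eq 0 ]
}
```

For example, check_paired_outputs trimnami.out/prinseq && echo "all paired outputs present".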

Example outputs


prinseq

trimnami.out/
└── prinseq
    ├── A13-04-182-06_TAGCTT.paired.R1.fastq.gz
    ├── A13-04-182-06_TAGCTT.paired.R2.fastq.gz
    ├── A13-04-182-06_TAGCTT.paired.S.fastq.gz
    ├── A13-12-250-06_GGCTAC.paired.R1.fastq.gz
    ├── A13-12-250-06_GGCTAC.paired.R2.fastq.gz
    ├── A13-12-250-06_GGCTAC.paired.S.fastq.gz
    └── A13-135-177-06_AGTTCC.single.fastq.gz

prinseq with fastqc reports

trimnami.out/
├── prinseq
│   ├── A13-04-182-06_TAGCTT.paired.R1.fastq.gz
│   ├── A13-04-182-06_TAGCTT.paired.R2.fastq.gz
│   ├── A13-04-182-06_TAGCTT.paired.S.fastq.gz
│   ├── A13-12-250-06_GGCTAC.paired.R1.fastq.gz
│   ├── A13-12-250-06_GGCTAC.paired.R2.fastq.gz
│   ├── A13-12-250-06_GGCTAC.paired.S.fastq.gz
│   └── A13-135-177-06_AGTTCC.single.fastq.gz
└── reports
    ├── prinseq.fastqc.html
    └── untrimmed.fastqc.html

prinseq with host removal

trimnami.out/
└── prinseq
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R1.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R2.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.S.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R1.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R2.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.S.fastq.gz
    └── A13-135-177-06_AGTTCC.host_rm.single.fastq.gz

prinseq with host removal and subsampling

trimnami.out/
└── prinseq
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R1.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R1.subsampled.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R2.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.R2.subsampled.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.S.fastq.gz
    ├── A13-04-182-06_TAGCTT.host_rm.paired.S.subsampled.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R1.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R1.subsampled.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R2.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.R2.subsampled.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.S.fastq.gz
    ├── A13-12-250-06_GGCTAC.host_rm.paired.S.subsampled.fastq.gz
    ├── A13-135-177-06_AGTTCC.host_rm.single.fastq.gz
    └── A13-135-177-06_AGTTCC.host_rm.single.subsampled.fastq.gz

trimnami's Issues

New Rasusa version has new command-line options

Trimnami installs the latest version of rasusa, v1.0.0, which changed its command-line parameters, so the rasusa command in Trimnami fails with the error below.

Command in trimnami:
rasusa -i sphae.out/PROCESSING/temp/fastp/test_R2.fastq.gz -o sphae.out/PROCESSING/temp/fastp/test_R2.subsampled.fastq.gz -O g --bases 1000M 2> sphae.out/PROCESSING/logs/rasusa_single.fastp.test_R2.log

Error:

error: unexpected argument '-i' found

Usage: rasusa [OPTIONS] <COMMAND>

For more information, try '--help'.

It looks like the command should change to the following, with the input file passed as a positional argument instead of with -i:

rasusa reads -o {output} -O g --bases 1000M {input}
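Until the rule is updated, one option is to choose the syntax from the installed rasusa version. This is a sketch only: rasusa_cmd is hypothetical, and it assumes that rasusa 1.0.0+ puts read subsampling under the reads subcommand with the input as a positional argument, while earlier releases take the input via -i. Check rasusa reads --help for your version.

```sh
# Hypothetical helper: print a rasusa subsampling command for the given
# rasusa version string. Only the major version is inspected.
rasusa_cmd() {
    local version="$1" input="$2" output="$3" bases="$4"
    local major=${version%%.*}               # "1.0.0" -> "1", "0.8.0" -> "0"
    if [ "$major" -ge 1 ]; then
        # v1+: 'reads' subcommand, input file as a positional argument
        echo "rasusa reads -o $output -O g --bases $bases $input"
    else
        # pre-1.0: input file given with -i
        echo "rasusa -i $input -o $output -O g --bases $bases"
    fi
}
```

Assuming rasusa --version prints "rasusa <version>", you could call rasusa_cmd "$(rasusa --version | awk '{print $2}')" in.fastq.gz out.fastq.gz 1000M.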

Add cutadapt for fasta processing

Fastp only handles FASTQ files, whereas cutadapt handles both FASTQ and FASTA. Sometimes you may want to trim adapters from a FASTA file (e.g. when downloading FASTA from the SRA).
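To route inputs to the right tool, it helps to know which files fastp can't handle. This minimal sketch uses a hypothetical read_format helper that peeks at the first character of a possibly gzipped file: '>' starts a FASTA record and '@' starts a FASTQ record. It assumes GNU-style gzip -cdf, which passes uncompressed input through unchanged.

```sh
# Hypothetical helper: print "fasta", "fastq" or "unknown" for a reads
# file, gzipped or not, based on its first character.
read_format() {
    local first
    # -c: write to stdout, -d: decompress, -f: pass non-gzip input through as-is
    first=$(gzip -cdf "$1" | head -c 1)
    case "$first" in
        '>') echo fasta ;;
        '@') echo fastq ;;
        *)   echo unknown ;;                 # empty file or unrecognised format
    esac
}
```

Files reported as fasta would need a FASTA-capable trimmer such as cutadapt, which Trimnami doesn't support yet.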
