NoHuman

License: MIT | DOI: 10.1093/gigascience/giae010

👤➡️🚫 Remove human reads from a sequencing run 👤➡️🚫

nohuman removes human reads from sequencing data by classifying each read with kraken2 against a custom database built from all of the genomes in the Human Pangenome Reference Consortium's (HPRC) first draft human pangenome reference. It accepts reads from any sequencing technology. The development of this method is described in the paper listed under Cite.

Install

Conda (recommended)

$ conda install -c bioconda nohuman

Precompiled binary

Note: you will need to install kraken2 yourself using this install method.

curl -sSL nohuman.mbh.sh | sh
# or with wget
wget -nv -O - nohuman.mbh.sh | sh

You can also pass options to the script like so

$ curl -sSL nohuman.mbh.sh | sh -s -- --help
install.sh [option]

Fetch and install the latest version of nohuman, if nohuman is already
installed it will be updated to the latest version.

Options
        -V, --verbose
                Enable verbose output for the installer

        -f, -y, --force, --yes
                Skip the confirmation prompt during installation

        -p, --platform
                Override the platform identified by the installer [default: apple-darwin]

        -b, --bin-dir
                Override the bin installation directory [default: /usr/local/bin]

        -a, --arch
                Override the architecture identified by the installer [default: x86_64]

        -B, --base-url
                Override the base URL used for downloading releases [default: https://github.com/mbhall88/nohuman/releases]

        -h, --help
                Display this help message

Cargo

Note: you will need to install kraken2 yourself using this install method.

$ cargo install nohuman

Container

Docker images are hosted at quay.io.

singularity

Prerequisite: singularity

$ URI="docker://quay.io/mbhall88/nohuman"
$ singularity exec "$URI" nohuman --help

The above will use the latest version. If you want to specify a version then use a tag (or commit) like so.

$ VERSION="0.1.0"
$ URI="docker://quay.io/mbhall88/nohuman:${VERSION}"
$ singularity exec "$URI" nohuman --help

docker

Prerequisite: docker

$ docker pull quay.io/mbhall88/nohuman
$ docker run quay.io/mbhall88/nohuman nohuman --help
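
A container cannot see files on the host unless a directory is mounted into it. The wrapper below is a minimal sketch of one way to do that. The function name run_nohuman_docker and the /data mount point are assumptions made for illustration, not part of nohuman. It is also not stated here whether the image ships with the database, so you may need to download it to a mounted path first.

```shell
# run_nohuman_docker is a hypothetical wrapper, not part of nohuman.
# It mounts the current directory at /data (an arbitrary mount point)
# and runs nohuman there, so inputs and outputs live on the host.
run_nohuman_docker() {
    docker run --rm \
        -v "$PWD":/data \
        -w /data \
        quay.io/mbhall88/nohuman \
        nohuman "$@"
}
```

With this in place, something like run_nohuman_docker -d --db /data/db followed by run_nohuman_docker --db /data/db -t 4 in.fq should keep both the database and the output in the current directory.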

You can find all the available tags on the quay.io repository.

Build from source

Note: you will need to install kraken2 yourself using this install method.

$ git clone https://github.com/mbhall88/nohuman.git
$ cd nohuman
$ cargo build --release
$ target/release/nohuman -h

Usage

Download the database

$ nohuman -d

By default, this will place the database in $HOME/.nohuman/db. If you want to download it somewhere else, use the --db option.
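
As a sketch of the non-default case, the helpers below download the database to a custom directory and then point later runs at the same place. The path /data/nohuman/db, the NOHUMAN_DB variable and both function names are assumptions chosen for illustration. Only the -d, --db and -t options come from nohuman itself.

```shell
# Assumed location; any writable directory should work.
NOHUMAN_DB="${NOHUMAN_DB:-/data/nohuman/db}"

# Download the database to the custom location (-d combined with --db).
download_db() {
    nohuman -d --db "$NOHUMAN_DB"
}

# Later runs need the same --db value, because the default
# location ($HOME/.nohuman/db) will not contain the database.
clean_with_custom_db() {
    nohuman --db "$NOHUMAN_DB" -t 4 "$@"
}
```

For example, run download_db once, then clean_with_custom_db in.fq for each input.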

Check dependencies are available

$ nohuman -c
[2023-12-14T04:10:46Z INFO ] All dependencies are available

Remove human reads

$ nohuman -t 4 in.fq

This will pass 4 threads to kraken2 and write the clean reads to in.nohuman.fq.

You can specify where to write the output file with -o

$ nohuman -t 4 -o clean.fq in.fq

If you have paired-end Illumina reads

$ nohuman -t 4 in_1.fq in_2.fq

or to specify a different path for the output

$ nohuman -t 4 --out1 clean_1.fq --out2 clean_2.fq in_1.fq in_2.fq
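
When there are many paired-end samples, the same command can be wrapped in a loop. The following is a minimal sketch, assuming files are named <sample>_1.fq and <sample>_2.fq in a single directory. That naming scheme and the function name clean_all_pairs are assumptions, not something nohuman requires.

```shell
# clean_all_pairs is a hypothetical helper, not part of nohuman.
# Usage: clean_all_pairs <input_dir> <output_dir>
clean_all_pairs() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for r1 in "$in_dir"/*_1.fq; do
        [ -e "$r1" ] || continue            # glob matched nothing
        r2="${r1%_1.fq}_2.fq"               # mate file for this sample
        sample="$(basename "$r1" _1.fq)"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: missing $r2" >&2
            continue
        fi
        nohuman -t 4 \
            --out1 "$out_dir/${sample}_1.nohuman.fq" \
            --out2 "$out_dir/${sample}_2.nohuman.fq" \
            "$r1" "$r2"
    done
}
```

Samples without a mate file are skipped with a warning rather than run in single-end mode.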

Note: output will always be uncompressed, even if compressed input is provided.
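
If you want compressed output, one option is to compress the file yourself once nohuman has finished. The helper below is a sketch of that idea. The name clean_and_compress is hypothetical, and gzip is just one possible compressor.

```shell
# clean_and_compress is a hypothetical helper, not part of nohuman.
# Usage: clean_and_compress <input> <uncompressed_output>
clean_and_compress() {
    input="$1"
    output="$2"     # e.g. clean.fq; the final file will be clean.fq.gz
    # Only compress if nohuman exited successfully.
    nohuman -t 4 -o "$output" "$input" && gzip -f "$output"
}
```

For example, clean_and_compress in.fq.gz clean.fq should leave clean.fq.gz in place of clean.fq.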

$ nohuman -h
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...  Input file(s) to remove human reads from

Options:
  -o, --out1 <OUTPUT_1>  First output file
  -O, --out2 <OUTPUT_2>  Second output file - if two input files given
  -c, --check            Check that all required dependencies are available
  -d, --download         Download the database
  -D, --db <PATH>        Path to the database [default: /home/mihall/.nohuman/db]
  -t, --threads <INT>    Number of threads to use in kraken2 [default: 1]
  -v, --verbose          Set the logging level to verbose
  -h, --help             Print help (see more with '--help')
  -V, --version          Print version

Full usage

$ nohuman --help
Remove human reads from a sequencing run

Usage: nohuman [OPTIONS] [INPUT]...

Arguments:
  [INPUT]...
          Input file(s) to remove human reads from

Options:
  -o, --out1 <OUTPUT_1>
          First output file.

          Defaults to the name of the first input file with the suffix "nohuman" appended. e.g. "input_1.fastq.gz" -> "input_1.nohuman.fq". NOTE: kraken2 output cannot be compressed, so the output will always be uncompressed.

  -O, --out2 <OUTPUT_2>
          Second output file - if two input files given.

          Defaults to the name of the first input file with the suffix "nohuman" appended. e.g. "input_2.fastq.gz" -> "input_2.nohuman.fq". NOTE: kraken2 output cannot be compressed, so the output will always be uncompressed.

  -c, --check
          Check that all required dependencies are available

  -d, --download
          Download the database

  -D, --db <PATH>
          Path to the database

          [default: /home/mihall/.nohuman/db]

  -t, --threads <INT>
          Number of threads to use in kraken2

          [default: 1]

  -v, --verbose
          Set the logging level to verbose

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Alternates

Hostile is an alignment-based approach that performs well. It takes longer and uses more memory than nohuman's kraken2-based approach, but has slightly better accuracy for Illumina data. See the paper for more details and for other alternative approaches.

Cite

Hall, Michael B., and Lachlan J. M. Coin. “Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data” GigaScience, April 4, 2024. https://doi.org/10.1093/gigascience/giae010

@article{hall_pangenome_2024,
	title = {Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data},
	volume = {13},
	issn = {2047-217X},
	url = {https://doi.org/10.1093/gigascience/giae010},
	doi = {10.1093/gigascience/giae010},
	urldate = {2024-04-07},
	journal = {GigaScience},
	author = {Hall, Michael B and Coin, Lachlan J M},
	month = jan,
	year = {2024},
	pages = {giae010},
}
