
aesop_metagenomics_read_length

Optimizing Next-Generation Sequencing Efficiency in Clinical Settings: Analysis of Read Length Impact on Cost and Performance

Abstract

Background: The expansion of sequencing technologies driven by the response to the COVID-19 pandemic enabled pathogen (meta)genomics to be deployed as a routine component of surveillance in many countries. Scaling genomic surveillance, however, comes with associated costs in both equipment and sequencing reagents, which should be optimized. Here, we evaluate the cost efficiency and performance of different read lengths in identifying pathogens in metagenomic samples, comparing performance metrics, costs, and time requirements for 75 bp, 150 bp, and 300 bp reads.

Results: Moving from 75 bp to 150 bp reads approximately doubles both the cost and the sequencing time. Opting for 300 bp reads leads to four- and three-fold increases in cost and sequencing time, respectively, compared to 75 bp reads. For viral pathogen detection, median sensitivity ranged from 97.9% with 75 bp reads to 100% with 150 bp or 300 bp reads. Bacterial pathogen detection, however, was less effective with shorter reads: 76% with 75 bp, 90% with 150 bp, and 94.3% with 300 bp reads. These findings were consistent across different levels of taxa abundance.

Conclusions: During disease outbreaks, when pathogens must be identified swiftly, we suggest prioritizing 75 bp read lengths. Shorter reads enable quicker sequencing (approximately three times faster) and reduce costs (approximately two times lower). Despite the shorter read length, precision is comparable to that of longer reads across most viral and bacterial taxa, while sensitivity can be more variable, especially when bacterial identification is the goal. This practical approach makes better use of resources, enabling more samples to be sequenced with streamlined workflows while maintaining a reliable response capability.

Methods

This work comprised the following steps:

  1. Generation of Synthetic Metagenomes
    1. Defining the metagenome composition
    2. Defining each taxon's abundance
    3. Collecting the synthetic sample taxonomic data
    4. Downloading the genomes of these taxa
    5. Generation of the synthetic metagenomes
  2. Execution of Analysis Pipeline
    1. Adapter Trimming and Quality Filtering
    2. Taxa Annotation
    3. Species-Level Taxa Abundance Retrieval
    4. Calculating Each Taxon's Confusion Matrix
  3. Creating and Plotting Results
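
The confusion-matrix step can be illustrated with a minimal Python sketch. It assumes a presence/absence comparison per sample: the species placed in each synthetic metagenome (ground truth) are compared with the species reported after annotation. The function and variable names are hypothetical, and the exact rules used in this repository (for example, any abundance threshold for calling a taxon present) are not stated here, so treat this as one plausible formulation rather than the project's implementation.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Per-taxon counts accumulated over all samples."""
    tp: int = 0
    fp: int = 0
    fn: int = 0
    tn: int = 0

    @property
    def sensitivity(self):
        # TP / (TP + FN); None when the taxon is never truly present.
        denom = self.tp + self.fn
        return self.tp / denom if denom else None

    @property
    def precision(self):
        # TP / (TP + FP); None when the taxon is never reported.
        denom = self.tp + self.fp
        return self.tp / denom if denom else None


def per_taxon_confusion(expected, detected):
    """Build one confusion matrix per taxon.

    expected / detected: dict mapping sample id -> set of species names
    (hypothetical input layout, assumed for this sketch).
    """
    taxa = set().union(*expected.values(), *detected.values())
    result = {taxon: Confusion() for taxon in taxa}
    for sample, truth in expected.items():
        called = detected.get(sample, set())
        for taxon in taxa:
            counts = result[taxon]
            if taxon in truth and taxon in called:
                counts.tp += 1
            elif taxon in called:
                counts.fp += 1
            elif taxon in truth:
                counts.fn += 1
            else:
                counts.tn += 1
    return result


# Toy data, invented purely for illustration.
expected = {"s1": {"Escherichia coli", "Zika virus"}, "s2": {"Zika virus"}}
detected = {"s1": {"Escherichia coli", "Zika virus"}, "s2": {"Escherichia coli"}}

for taxon, counts in sorted(per_taxon_confusion(expected, detected).items()):
    print(taxon, counts, counts.sensitivity, counts.precision)
```

Running the same comparison once per read length (75 bp, 150 bp, 300 bp) would give per-taxon sensitivity and precision values that can be summarized and plotted in step 3.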

Installation

Install the necessary software using the following commands:

# Install Fastp
conda install -c bioconda fastp

# Install Kraken2
conda install -c bioconda kraken2

# Install Bracken
conda install -c bioconda bracken
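
After installing, it can help to confirm that the three executables are visible on the `PATH`. The short Python check below is a sketch, not part of the repository; the helper name `missing_tools` is hypothetical, and it assumes the conda packages expose the executables `fastp`, `kraken2`, and `bracken`.

```python
import shutil

# Executable names assumed to be provided by the bioconda packages above.
REQUIRED_TOOLS = ("fastp", "kraken2", "bracken")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools are available.")
```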

Usage

  1. Clone the repository
git clone https://github.com/your_username/your_repository.git
cd your_repository
  2. Execute the pipeline

Follow the steps detailed in our METHODS.
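
As a rough orientation, the three tools installed above could be chained as in the following Python sketch, which only builds and prints the command lines. It assumes paired-end FASTQ input, a pre-built Kraken2 database that also contains the Bracken k-mer distribution file for the chosen read length, and hypothetical file names and paths. The repository's own scripts may use different options, so check them before running anything.

```python
import shlex


def fastp_cmd(r1, r2, out1, out2, json_report, html_report, threads=4):
    """Adapter trimming and quality filtering for one paired-end sample."""
    return ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2,
            "-j", json_report, "-h", html_report, "-w", str(threads)]


def kraken2_cmd(db, r1, r2, report, output, threads=4):
    """Taxonomic annotation of the trimmed reads."""
    return ["kraken2", "--db", db, "--threads", str(threads), "--paired",
            "--report", report, "--output", output, r1, r2]


def bracken_cmd(db, kraken_report, out, read_length, level="S"):
    """Species-level ("S") abundance re-estimation from the Kraken2 report."""
    return ["bracken", "-d", db, "-i", kraken_report, "-o", out,
            "-r", str(read_length), "-l", level]


def build_pipeline(sample, read_length, db):
    """Return the three commands for one sample (file names are hypothetical)."""
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trimmed.fastq.gz", f"{sample}_R2.trimmed.fastq.gz"
    report = f"{sample}.kreport"
    return [
        fastp_cmd(r1, r2, t1, t2, f"{sample}.fastp.json", f"{sample}.fastp.html"),
        kraken2_cmd(db, t1, t2, report, f"{sample}.kraken"),
        bracken_cmd(db, report, f"{sample}.bracken", read_length),
    ]


# Print the commands for a 150 bp run; the database path is a placeholder.
for cmd in build_pipeline("sample01", 150, "/path/to/kraken2_db"):
    print(shlex.join(cmd))
```

To execute rather than print, each list can be passed to `subprocess.run(cmd, check=True)`. Changing the `read_length` argument to 75 or 300 adjusts Bracken's `-r` option to match the sequencing run being analysed.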

Citation

If you use this pipeline in your research, please cite the following paper:

Meirelles, P. M.; Viana, P. A. B.; Tschoeke, D. A.; de Moraes, L.; Amorim, L.; Barral-Netto, M.; Khouri, R.; Ramos, P. I. P. (2024). Optimizing Next-Generation Sequencing Efficiency in Clinical Settings: Analysis of Read Length Impact on Cost and Performance.

