Code Monkey home page Code Monkey logo

poresippr_code's Introduction

Poresippr

ALT Poresippr

Poresippr is a bioinformatics pipeline that automates the process of running Guppy for basecalling, barcoding and performs downstream analysis such as mapping with Minimap2 and Samtools. The script takes a CSV input file containing the necessary parameters and runs in a loop, continuously processing new data every 30 minutes.

Requirements

Python 3.x
Guppy basecaller ( This uses nvidia gpu nodes) 6.5.7
Minimap2 2.24-r1122
samtools 1.13
Singularity (optional, when running with singularity container) 4.1.0

Installation requires ubuntu > 22.0

Author

Mathu Malar C [email protected]

Introduction

Poresippr is a Python workflow designed to run while nanopore sequencing is in progress. This program performs GPU-accelerated Guppy basecalling, barcoding, and mapping with Minimap2 using the stx database. It calculates coverage and detects stx gene targets, facilitating the rapid detection of virulence genes. The database used is curated for stx genes to classify STEC types. Additionally, this program runs every 30 minutes, processing new nanopore data as it becomes available. The Poresippr database is not available for public usage yet.

  • guppy basecalling: Currently, this program uses the Nanopore Flowcell version R10.4.1 and the barcode kit SQK-RBK114.24. Running Guppy requires NVIDIA GPU nodes. The program also uses the dna_r10.4.1_e8.2_260bps_fast.cfg settings, which typically come with the Guppy installation.

  • Read Mapping: Minimap2 is used for mapping nanopore reads with the Poresippr database and detects gene targets using samtools. It outputs a CSV file every 30 minutes containing gene targets and genome coverage for each of the sample. .

Input requirements

An input CSV file containing the following information is needed. It should include:

  • The location of the Poresippr database (reference)
  • The path for nanopore FAST5 reads
  • The path where output files, such as CSV and barcoded reads, will be saved
  • The appropriate model/config required for basecalling
  • The barcode kit used during sequencing
  • The barcode numbers

input.csv:

reference,fast5_dir,output_dir,config,barcode,barcode_values
/home/olcbio/PoreSippRDB_240509.fasta,/home/olcbio/240125_MC26299/test,/home/olcbio/240125_MC26299/test_out,dna_r10.4.1_e8.2_260bps_fast.cfg,SQK-RBK114-24,"06,07,08,09,10,11"

Note

Ensure that the barcode numbers are enclosed in double quotes (") as shown in the barcode_values field. This is necessary to correctly parse multiple barcode values. Metadata CSV file containing the following information is needed. It should include:

  • The sequence Id information
  • Barcode numbers
  • OLC strain ids

metadata.csv:

Barcode,SEQID,OLNID
06,2022-MIN-0023,OLCX1
07,2023-MIN-0024,OLC2
08,2023-MIN-0025,OLC3
09,2024-MIN-0026,OLC4
10,2024-MIN-0028,OLC5

Instructions to Run the Code

To run the Python script, use the following command:

git clone https://github.com/madhubioinfo/poresippr_code.git
cd poresippr_code
python poresippr_basecall_scheduler.py input.csv Metadata.csv

To build the singularity container and run from here use the below command [ need to have sudo access and GPU]
sudo singularity build poresippr_container.sif cuda_singularity.def
singularity run --nv poresippr_container.sif input.csv Metadata.csv

To download and run the script as a Singularity container, use the following command still need GPU:
click and download the zipped folder containing code https://zenodo.org/records/12628468
singularity run --nv poresippr_container.sif input.csv Metadata.csv

Output

The pipeline will output the following files:

test_out/
├── fail
└── pass
    ├── barcode01
    ├── barcode02
    ├── barcode..
    └── unclassified
├── 2024-MIN-0001_iteration1.csv
├── 2024-MIN-0002_iteration2.csv
└── 2024-MIN..XXX.csv

citations

If you use the code for your research please use https://doi.org/10.5281/zenodo.12628468 for citation

poresippr_code's People

Contributors

madhubioinfo avatar largekowalski888 avatar

Stargazers

 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.