This is for development purposes only. Please do not use this for research, public health, or diagnostic purposes because we are still developing and testing this software and it may have errors or change at any time.

StxTyper

StxTyper determines the stx type from nucleotide sequence. Stx (Shiga toxin) genes are found in some strains of Escherichia coli and code for powerful toxins that can cause severe illness. StxTyper classifies these genes from assembled sequence using a standard algorithm.

Installation

Prerequisites

C compiler and make

These generally come standard on Unix systems; if not, you will need to install make and GCC. macOS users will need to go to the App Store and install Xcode.

NCBI BLAST

StxTyper needs NCBI BLAST binaries in your path (specifically tblastn). If you don't already have BLAST installed see https://www.ncbi.nlm.nih.gov/books/NBK569861/ for instructions to install BLAST binaries.

BLAST can also be installed with bioconda. After installing bioconda, create and activate an environment that contains BLAST:

source ~/miniconda3/bin/activate
conda create -y -c conda-forge -c bioconda -n blast blast
conda activate blast

If you install BLAST via conda in this way, you will need to run conda activate blast before you can run StxTyper.
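
Before compiling or running StxTyper, it can help to confirm that tblastn is actually reachable through your PATH. The following is a minimal sketch; have_cmd is a hypothetical helper name used only for illustration and is not part of StxTyper or BLAST.

```shell
# have_cmd is a hypothetical helper, not part of StxTyper.
# It succeeds (exit status 0) if the named command can be found via PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd tblastn; then
    echo "tblastn found at: $(command -v tblastn)"
else
    echo "tblastn not found; install BLAST or run 'conda activate blast'" >&2
fi
```

If tblastn is installed somewhere outside your PATH, the --blast_bin option described under Parameters is the documented way to point StxTyper at it.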

Compiling

git clone https://github.com/evolarjun/stxtyper.git
cd stxtyper
make
make test

Usage

stxtyper -n <assembled_nucleotide.fa> [<options>]

Example

stxtyper -n nucleotide.fa

Parameters

  • --nucleotide <nucleotide_fasta> or -n <nucleotide_fasta> Assembled nucleotide sequence to search in FASTA format.

  • --name <assembly_identifier> Add an identifier as the first column in each row of the report. This is useful when combining results for many assemblies.

  • --output <output_file> or -o <output_file> Write the output to <output_file> instead of STDOUT.

  • --blast_bin <path> Directory to search for tblastn binary. Overrides environment variable $BLAST_BIN and the default PATH.

  • -q or --quiet Suppress the status messages normally written to STDERR.

  • --log <log_file> Error log file, opened when the application starts and appended to rather than overwritten. This is used for debugging.
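
As a rough illustration of how --name, -o, and -q can be used together for many assemblies, the sketch below runs StxTyper once per FASTA file and then merges the reports. It rests on assumptions not stated in this README: that each report begins with exactly one header line, that stxtyper is on your PATH, and that the FASTA files live in a directory called assemblies/. combine_reports is a hypothetical helper, not part of StxTyper.

```shell
# combine_reports is a hypothetical helper, not part of StxTyper.
# It concatenates tab-delimited reports, keeping the first line of the first
# file and dropping the first line of every later file.
# ASSUMPTION: each report begins with exactly one header line.
combine_reports() {
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}

# Example use (assumes stxtyper is on PATH and assemblies/ holds FASTA files):
# for fasta in assemblies/*.fa; do
#     id=$(basename "$fasta" .fa)
#     stxtyper -n "$fasta" --name "$id" -o "$id.stxtyper.tsv" -q
# done
# combine_reports ./*.stxtyper.tsv > all_assemblies.tsv
```

Because --name puts the assembly identifier in the first column of every row, the rows in the merged file remain attributable to their source assembly.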

Output

The output of StxTyper is a tab-delimited file with the following fields. All percent identity and coverage metrics are measured as a proportion of amino acids.

  1. target_contig: The contig identifier from the input FASTA file
  2. stx_type: The stx type called by the algorithm. For operon = COMPLETE it will be stx plus two characters (e.g., stx1a); for other values of operon, stx_type will be stx1, stx2, or just stx if the type can't be resolved at all.
  3. operon: The status of the operon as found. It can be
    • COMPLETE for complete and fully typeable known stx types
    • PARTIAL for partial operons that are internal to contigs and not terminating at contig boundaries
    • PARTIAL_CONTIG_END for partial operons that could be split by contig boundaries due to sequencing or assembly artifacts
    • EXTENDED where the coding sequence extends beyond the reference stop codon for one or both of the reference proteins
    • INTERNAL_STOP for Stx operons where one of the subunits has a nonsense mutation
    • FRAMESHIFT where StxTyper detected an indel in the coding sequence that would cause a frame shift in one or more of the subunits
    • COMPLETE_NOVEL for a full-length stx operon that is not typeable using the current scheme
  4. identity: The combined percent identity for both A and B subunits
  5. target_start: The detected start of the alignments
  6. target_stop: The detected end of the alignments
  7. target_strand: The strand the target is on
  8. A_reference: The closest reference protein for the A subunit, empty if none aligned
  9. A_identity: The percent identity to the reference for the A subunit, empty if none aligned
  10. A_coverage: The percentage of the reference for the A subunit that is covered by the alignment, empty if none aligned
  11. B_reference: The closest reference protein for the B subunit, empty if none aligned
  12. B_identity: The percent identity to the reference for the B subunit, empty if none aligned
  13. B_coverage: The percentage of the reference for the B subunit that is covered by the alignment, empty if none aligned
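
Since the report is tab-delimited, rows can be selected by operon status with standard tools. The sketch below is one possible way to do that with awk; filter_operon is a hypothetical helper, not part of StxTyper, and the file name report.tsv in the usage comment is a placeholder. The column number defaults to 3, the position of operon in the field list above, and should be 4 when the report was produced with --name.

```shell
# filter_operon is a hypothetical helper, not part of StxTyper.
# Usage: filter_operon STATUS REPORT [COLUMN]
# Prints the rows of REPORT whose operon field equals STATUS exactly.
# COLUMN defaults to 3 (operon in the field list above); pass 4 if the
# report was made with --name, which adds a leading identifier column.
filter_operon() {
    awk -F '\t' -v status="$1" -v col="${3:-3}" '$col == status' "$2"
}

# Example use (report.tsv is a placeholder file name):
# filter_operon COMPLETE report.tsv
# filter_operon PARTIAL_CONTIG_END report_with_names.tsv 4
```

The comparison is an exact string match, so PARTIAL does not also select PARTIAL_CONTIG_END rows.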
