tnseq-GI

Tool for analysis of genetic interactions (GI) from TnSeq data.

Version 1.1.0 changes

  • Uses empirical priors
  • Added type classification in output
  • More command-line options

Example files are provided below to test the execution of the script and help verify that input files are in the appropriate format:

Before running the Python script, please make sure you have installed all the necessary prerequisites listed above and downloaded the latest version of the script. Once the prerequisite software and libraries are installed, run the script with a command such as:

python tnseq_GI.py -wt1 H37Rv_d0_r1.wig,H37Rv_d0_r2.wig -wt2 H37Rv_d32_r1.wig,H37Rv_d32_r2.wig -ko1 Rv2680_KO_d0_r1.wig,Rv2680_KO_d0_r2.wig -ko2 Rv2680_KO_d32_r1.wig,Rv2680_KO_d32_r2.wig -pt H37Rv.prot_table

Below, the input file formats are described, followed by a description of the available input flags and, finally, a description of the output format for the results.

Read data must be provided as text files in WIG format. The format has two space-delimited columns: the position of every insertion site (including sites where no reads were mapped) and the read count at that site, as shown below:

# Generated by tpp from U19_75_R1.fastq and U19_75_R2.fastq
variableStep chrom=H37Rv
60 0
72 0
102 0
188 0
246 0
333 0
360 0
426 0
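As a rough illustration of the format, the following Python sketch reads WIG data into (position, read count) pairs. The names `parse_wig` and `read_wig` are hypothetical helpers, not part of tnseq_GI.py, and the sketch assumes a single `variableStep` track as in the example above.

```python
from typing import Iterable, List, Tuple


def parse_wig(lines: Iterable[str]) -> List[Tuple[int, float]]:
    """Return (insertion-site position, read count) pairs from WIG-formatted lines.

    Comment lines ('#') and the 'variableStep' header are skipped; every other
    non-blank line is expected to hold two space-delimited columns.
    """
    sites = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("variableStep"):
            continue
        position, count = line.split()[:2]
        # Counts are parsed as floats so that normalized WIG files also load.
        sites.append((int(position), float(count)))
    return sites


def read_wig(path: str) -> List[Tuple[int, float]]:
    """Convenience wrapper (hypothetical) that parses a WIG file on disk."""
    with open(path) as handle:
        return parse_wig(handle)
```

With this sketch, `read_wig("H37Rv_d0_r1.wig")` would return one pair per insertion site, including the sites with zero reads.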

The table below outlines the flags accepted by the python script:

Flag      Value      Definition
-wt1      [String]   Comma-separated list of paths to WIG-formatted datasets for the reference strain under the first condition. Example: -wt1 H37Rv_d0_r1.wig,H37Rv_d0_r2.wig
-wt2      [String]   Comma-separated list of paths to WIG-formatted datasets for the reference strain under the second condition. Example: -wt2 H37Rv_d32_r1.wig,H37Rv_d32_r2.wig
-ko1      [String]   Comma-separated list of paths to WIG-formatted datasets for the knockout strain under the first condition. Example: -ko1 Rv2680_KO_d0_r1.wig,Rv2680_KO_d0_r2.wig
-ko2      [String]   Comma-separated list of paths to WIG-formatted datasets for the knockout strain under the second condition. Example: -ko2 Rv2680_KO_d32_r1.wig,Rv2680_KO_d32_r2.wig
-pt       [String]   Path to the annotation file in prot_table or GFF3 format. Example: -pt H37Rv.prot_table
-s        [Integer]  Number of samples to draw when estimating the posterior distributions. Default: -s 20000
-rope     [Float]    +/- window defining the Region of Practical Equivalence (ROPE), i.e. the region that corresponds to no interaction. Default: -rope 0.5
--debug   [String]   Comma-separated list of ORF IDs. Limits the analysis to those genes and outputs more information. Useful for debugging.
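To make the flag conventions concrete (single-dash multi-letter flags, comma-separated file lists), here is a hypothetical `argparse` sketch that mirrors the table above. It is not the parser used inside tnseq_GI.py; `build_parser` and `comma_list` are illustrative names, and the defaults are copied from the table.

```python
import argparse
from typing import List


def comma_list(value: str) -> List[str]:
    """Split a value such as 'a.wig,b.wig' into ['a.wig', 'b.wig']."""
    return [item for item in value.split(",") if item]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented tnseq_GI.py flags."""
    parser = argparse.ArgumentParser(description="Genetic interactions from TnSeq data (sketch)")
    # One comma-separated list of replicate WIG files per strain/condition pair.
    for flag in ("-wt1", "-wt2", "-ko1", "-ko2"):
        parser.add_argument(flag, type=comma_list, required=True)
    parser.add_argument("-pt", required=True, help="annotation in prot_table or GFF3 format")
    parser.add_argument("-s", type=int, default=20000, help="number of posterior samples")
    parser.add_argument("-rope", type=float, default=0.5, help="+/- half-width of the ROPE")
    parser.add_argument("--debug", type=comma_list, default=None, help="ORF IDs to restrict to")
    return parser
```

With this sketch, parsing the example command from the top of this page would give `args.wt1 == ['H37Rv_d0_r1.wig', 'H37Rv_d0_r2.wig']`, while `args.s` and `args.rope` keep their defaults of 20000 and 0.5.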

Results are printed to screen in tab-separated format:

# Copyright 2016. Michael A. DeJesus & Thomas R. Ioerger
# Version 1.00; http://saclab.tamu.edu/essentiality/GI
#
# python tnseq_GI.py -wt1 H37Rv_day0_rep1.wig -wt2 H37Rv_day32_rep1.wig -ko1 Rv2680_day0_rep1.wig -ko2 Rv2680_day32_rep1.wig -pt H37Rv.prot_table
# mu0=18.02, S=20000, s20=1.0, k0=1.0, nu0=2.0
# ROPE: 0.5
#orf    Name    Description N   Mean WT-1   Mean WT-2   Mean KO-1   Mean KO-2   Mean logFC WT-2/WT-1    Mean log FC KO-2/KO-1   Mean delta logFC    L. Bound    U. Bound    Outside of HDI?
Rv1473  -   PROBABLE MACROLIDE-TRANSPORT ATP-BINDING PROTEIN ABC TRANSPORTER    25  4.40    0.08    1.56    11.41   -3.95   2.43    6.38    1.67    16.04   True
Rv1780  -   hypothetical protein Rv1780     13  129.62  2.43    9.65    27.63   -5.22   0.79    6.00    1.85    10.46   True
Rv3821  -   PROBABLE CONSERVED INTEGRAL MEMBRANE PROTEIN    18  43.56   7.89    21.25   377.20  -2.28   3.69    5.97    1.66    19.45   True
Rv2632c -   hypothetical protein Rv2632c    3   21.33   0.00    7.41    44.97   -3.64   1.77    5.42    1.04    17.46   True
Rv3823c mmpL8   PROBABLE CONSERVED INTEGRAL MEMBRANE TRANSPORT PROTEIN MMPL8    78  11.63   2.53    4.70    51.32   -2.38   2.97    5.35    1.59    9.40    True
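Because the results are tab-separated with "#"-prefixed header lines, they are straightforward to post-process. The sketch below loads the rows and keeps the genes flagged as interactions. The helper names `parse_gi_output` and `significant_interactions` are hypothetical, and the sketch assumes the 14 columns appear in the order of the "#orf" header line above.

```python
from typing import Dict, Iterable, List, Union

# Column order assumed from the '#orf ...' header line of the output.
COLUMNS = [
    "orf", "name", "description", "n",
    "mean_wt1", "mean_wt2", "mean_ko1", "mean_ko2",
    "logfc_wt", "logfc_ko", "delta_logfc",
    "hdi_lower", "hdi_upper", "outside_hdi",
]

Row = Dict[str, Union[str, int, float, bool]]


def parse_gi_output(lines: Iterable[str]) -> List[Row]:
    """Parse tab-separated result lines, skipping '#' header/comment lines."""
    rows: List[Row] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        row: Row = {"orf": fields[0], "name": fields[1], "description": fields[2]}
        row["n"] = int(fields[3])
        # Means, logFCs and HDI bounds are numeric (columns 5 to 13).
        for idx in range(4, 13):
            row[COLUMNS[idx]] = float(fields[idx])
        row["outside_hdi"] = fields[13] == "True"
        rows.append(row)
    return rows


def significant_interactions(rows: List[Row]) -> List[Row]:
    """Rows flagged True in the last column, largest |delta logFC| first."""
    hits = [row for row in rows if row["outside_hdi"]]
    return sorted(hits, key=lambda row: abs(row["delta_logfc"]), reverse=True)
```

One possible use is to redirect the script's output to a file and pass the opened file to `parse_gi_output`.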

Header and comment lines are preceded by "#" and are followed by one row for each ORF found in the annotation file. Each column is defined below:

Column Header           Column Definition
ORF                     ORF ID found in the annotation.
Name                    Name of the gene found in the annotation.
Description             Description of the gene found in the annotation.
N                       Total number of TA dinucleotides within the ORF.
Mean WT-1               Mean read count for the reference strain in condition 1.
Mean WT-2               Mean read count for the reference strain in condition 2.
Mean KO-1               Mean read count for the knockout strain in condition 1.
Mean KO-2               Mean read count for the knockout strain in condition 2.
Mean logFC WT-2/WT-1    Average logFC for the reference strain between conditions 2 and 1.
Mean logFC KO-2/KO-1    Average logFC for the knockout strain between conditions 2 and 1.
Mean delta logFC        Average difference between the two logFCs (KO minus WT).
L. Bound                Lower bound of the 95% Highest Density Interval (HDI) of the delta logFC.
U. Bound                Upper bound of the 95% Highest Density Interval (HDI) of the delta logFC.
Outside of HDI?         True if the HDI lies entirely outside the ROPE (i.e., significantly different logFCs, indicating an interaction).
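The HDI and ROPE columns can be illustrated with a short sketch. It assumes you already have posterior samples of the delta logFC for one gene (how tnseq_GI.py draws them is not shown here) and uses the common "shortest interval over sorted samples" estimate of the HDI, which may differ in detail from the script's own computation. The names `hdi` and `is_interaction` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def hdi(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Estimate the Highest Density Interval as the narrowest interval
    spanning `mass` of the sorted posterior samples."""
    if not samples or not 0.0 < mass < 1.0:
        raise ValueError("need at least one sample and 0 < mass < 1")
    ordered = sorted(samples)
    n = len(ordered)
    span = int(math.floor(mass * n))  # index distance between interval ends
    # Slide a window of fixed sample count and keep the narrowest one.
    start = min(range(n - span), key=lambda i: ordered[i + span] - ordered[i])
    return ordered[start], ordered[start + span]


def is_interaction(lower: float, upper: float, rope: float = 0.5) -> bool:
    """True when the interval [lower, upper] lies entirely outside [-rope, +rope]."""
    return lower > rope or upper < -rope
```

For instance, the first example row above has bounds 1.67 and 16.04; with the default ROPE of 0.5 the whole interval lies above +0.5, so `is_interaction(1.67, 16.04)` returns True, matching the "True" in its last column.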

Running the software on the representative dataset with the default values takes roughly 10 minutes on a Linux server. In general, the run time depends on the number of genes in the genome being analyzed, the number of samples drawn, and the number of replicate datasets. The number of samples can be controlled with the "-s" flag.

License

GNU General Public License v3.0 © Copyright 2012. Michael A. DeJesus & Thomas R. Ioerger.
