Tool for analysis of genetic interactions (GI) from TnSeq data.
Version 1.1.0 changes
- Using empirical priors
- Added type classification in output
- More options in arguments
Prerequisites
- Python 2.7+ www.python.org
- Scipy 0.6.0+ www.scipy.org/Download
- Numpy 1.2.1+ www.scipy.org/Download
Example files are provided below to test the execution of the script and help verify that input files are in the appropriate format:
- Reference strain
- Knockout strain
- H37Rv annotation
Before running the python script, please make sure you have installed all the necessary prerequisites listed above, and have downloaded the latest version of the script. Once the prerequisite software and libraries are installed, to run the script simply type the following command:
python tnseq_GI.py -wt1 H37Rv_d0_r1.wig,H37Rv_d0_r2.wig -wt2 H37Rv_d32_r1.wig,H37Rv_d32_r2.wig -ko1 Rv2680_KO_d0_r1.wig,Rv2680_KO_d0_r2.wig -ko2 Rv2680_KO_d32_r1.wig,Rv2680_KO_d32_r2.wig -pt H37Rv.prot_table
The sections below describe the input file formats, then the available flags, and finally the output format of the results.
Read data must be contained in a text file in WIG format. This format has two space-delimited columns: the position of each insertion site (including sites where no reads were mapped) and the read count at that site, as shown below:
    # Generated by tpp from U19_75_R1.fastq and U19_75_R2.fastq
    variableStep chrom=H37Rv
    60 0
    72 0
    102 0
    188 0
    246 0
    333 0
    360 0
    426 0
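As an illustration of the two-column layout described above, the following is a minimal sketch of a reader for such a file. The function name `parse_wig` is hypothetical and is not part of tnseq_GI.py; the sketch assumes that comment lines start with "#" and that declaration lines start with "track" or "variableStep".

```python
def parse_wig(lines):
    """Parse WIG-style lines into (positions, counts) lists.

    Assumes the layout shown above: optional '#' comment lines, a
    'variableStep' declaration line, then 'position count' pairs.
    This is an illustrative helper, not code from tnseq_GI.py.
    """
    positions, counts = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/step declaration lines.
        if not line or line.startswith(("#", "track", "variableStep")):
            continue
        fields = line.split()
        positions.append(int(fields[0]))   # coordinate of the insertion site
        counts.append(float(fields[1]))    # read count observed at that site
    return positions, counts


example = """# Generated by tpp from U19_75_R1.fastq and U19_75_R2.fastq
variableStep chrom=H37Rv
60 0
72 0
102 0
188 0
""".splitlines()

positions, counts = parse_wig(example)
print(positions)
print(counts)
```

Running a check like this on your own files before starting an analysis is a quick way to confirm that every data line has exactly a position and a count.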
The table below outlines the flags accepted by the python script:
Flag | Value | Definition |
---|---|---|
-wt1 | [String] | Comma-separated list of paths to WIG formatted datasets for the reference strain under the first condition. Example: -wt1 H37Rv_d0_r1.wig,H37Rv_d0_r2.wig |
-wt2 | [String] | Comma-separated list of paths to WIG formatted datasets for the reference strain under the second condition. Example: -wt2 H37Rv_d32_r1.wig,H37Rv_d32_r2.wig |
-ko1 | [String] | Comma-separated list of paths to WIG formatted datasets for the knockout strain under the first condition. Example: -ko1 Rv2680_KO_d0_r1.wig,Rv2680_KO_d0_r2.wig |
-ko2 | [String] | Comma-separated list of paths to WIG formatted datasets for the knockout strain under the second condition. Example: -ko2 Rv2680_KO_d32_r1.wig,Rv2680_KO_d32_r2.wig |
-pt | [String] | Path to the annotation file in prot_table format or GFF3 format. Example: -pt H37Rv.prot_table |
-s | [Integer] | Number of samples to take when estimating the posterior distributions. Default: -s 20000 |
-rope | [Float] | +/- window that defines the Region of Practical Equivalence (ROPE), i.e. the region that defines non-interaction. Default: -rope 0.5 |
--debug | [String] | Comma-separated list of ORF IDs. Limits analysis to those genes and outputs more information. Useful for debugging. |
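When several replicates are involved, it can be convenient to assemble the command from lists of file names. The sketch below is one way to do so; `build_gi_command` is a hypothetical helper, not something shipped with the script, and it uses only the flags listed in the table above.

```python
def build_gi_command(wt1, wt2, ko1, ko2, annotation,
                     samples=None, rope=None, script="tnseq_GI.py"):
    """Return the argument list for one tnseq_GI.py run.

    Hypothetical convenience helper: each replicate list is joined with
    commas, as the -wt1/-wt2/-ko1/-ko2 flags expect.
    """
    cmd = ["python", script,
           "-wt1", ",".join(wt1),
           "-wt2", ",".join(wt2),
           "-ko1", ",".join(ko1),
           "-ko2", ",".join(ko2),
           "-pt", annotation]
    # Optional flags are added only when a value is given, so the
    # script's own defaults apply otherwise.
    if samples is not None:
        cmd += ["-s", str(samples)]
    if rope is not None:
        cmd += ["-rope", str(rope)]
    return cmd


cmd = build_gi_command(
    wt1=["H37Rv_d0_r1.wig", "H37Rv_d0_r2.wig"],
    wt2=["H37Rv_d32_r1.wig", "H37Rv_d32_r2.wig"],
    ko1=["Rv2680_KO_d0_r1.wig", "Rv2680_KO_d0_r2.wig"],
    ko2=["Rv2680_KO_d32_r1.wig", "Rv2680_KO_d32_r2.wig"],
    annotation="H37Rv.prot_table",
    samples=20000,
    rope=0.5,
)
print(" ".join(cmd))
# To run the command and capture what it prints, one option is:
#   import subprocess
#   results = subprocess.check_output(cmd)
```

Passing the arguments as a list avoids shell quoting problems if a path contains spaces.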
Results are printed to screen in tab-separated format:
    # Copyright 2016. Michael A. DeJesus & Thomas R. Ioerger
    # Version 1.00; http://saclab.tamu.edu/essentiality/GI
    #
    # python tnseq_GI.py -wt1 H37Rv_day0_rep1.wig -wt2 H37Rv_day32_rep1.wig -ko1 Rv2680_day0_rep1.wig -ko2 Rv2680_day32_rep1.wig -pt H37Rv.prot_table
    # mu0=18.02, S=20000, s20=1.0, k0=1.0, nu0=2.0
    # ROPE: 0.5
    #orf Name Description N Mean WT-1 Mean WT-2 Mean KO-1 Mean KO-2 Mean logFC WT-2/WT-1 Mean log FC KO-2/KO-1 Mean delta logFC L. Bound U. Bound Outside of HDI?
    Rv1473 - PROBABLE MACROLIDE-TRANSPORT ATP-BINDING PROTEIN ABC TRANSPORTER 25 4.40 0.08 1.56 11.41 -3.95 2.43 6.38 1.67 16.04 True
    Rv1780 - hypothetical protein Rv1780 13 129.62 2.43 9.65 27.63 -5.22 0.79 6.00 1.85 10.46 True
    Rv3821 - PROBABLE CONSERVED INTEGRAL MEMBRANE PROTEIN 18 43.56 7.89 21.25 377.20 -2.28 3.69 5.97 1.66 19.45 True
    Rv2632c - hypothetical protein Rv2632c 3 21.33 0.00 7.41 44.97 -3.64 1.77 5.42 1.04 17.46 True
    Rv3823c mmpL8 PROBABLE CONSERVED INTEGRAL MEMBRANE TRANSPORT PROTEIN MMPL8 78 11.63 2.53 4.70 51.32 -2.38 2.97 5.35 1.59 9.40 True
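Because the results go to the screen, a common next step is to redirect them to a file and load them for filtering. The following is a minimal sketch of such a loader. `parse_gi_results` is a hypothetical name, and the sketch assumes that fields are separated by tabs and that the column header is the line starting with "#orf", as in the sample output above.

```python
def parse_gi_results(lines):
    """Split tnseq_GI.py output into comment lines and per-ORF records.

    Illustrative helper, not part of the script. Assumes tab-separated
    fields and a column header line that starts with '#orf'.
    """
    comments, header, rows = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#orf"):
            header = line[1:].split("\t")   # drop the leading '#'
        elif line.startswith("#"):
            comments.append(line)
        else:
            rows.append(line.split("\t"))
    if header is None:
        raise ValueError("no '#orf' header line found")
    return comments, [dict(zip(header, row)) for row in rows]


columns = ["#orf", "Name", "Description", "N", "Mean WT-1", "Mean WT-2",
           "Mean KO-1", "Mean KO-2", "Mean logFC WT-2/WT-1",
           "Mean log FC KO-2/KO-1", "Mean delta logFC", "L. Bound",
           "U. Bound", "Outside of HDI?"]
example = [
    "# ROPE: 0.5",
    "\t".join(columns),
    "\t".join(["Rv1473", "-", "PROBABLE MACROLIDE-TRANSPORT ATP-BINDING PROTEIN ABC TRANSPORTER",
               "25", "4.40", "0.08", "1.56", "11.41", "-3.95", "2.43",
               "6.38", "1.67", "16.04", "True"]),
    "\t".join(["Rv1780", "-", "hypothetical protein Rv1780",
               "13", "129.62", "2.43", "9.65", "27.63", "-5.22", "0.79",
               "6.00", "1.85", "10.46", "True"]),
]

comments, records = parse_gi_results(example)
# Keep only the ORFs flagged as interacting.
hits = [r["orf"] for r in records if r["Outside of HDI?"] == "True"]
print(hits)
```

All values are kept as strings here; convert the numeric columns with `float()` as needed.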
Header and comment lines are preceded by "#" and are followed by one row for each ORF found in the input file. Each of the columns is defined below:
Column Header | Column Definition |
---|---|
ORF | ORF ID found in the annotation. |
Name | Name of the gene found in the annotation. |
Description | Description of the gene found in the annotation. |
N | Total number of TA dinucleotides within the ORF. |
Mean WT-1 | Mean read count for the reference strain in condition 1. |
Mean WT-2 | Mean read count for the reference strain in condition 2. |
Mean KO-1 | Mean read count for the knockout strain in condition 1. |
Mean KO-2 | Mean read count for the knockout strain in condition 2. |
Mean logFC WT-2/WT-1 | Average logFC for the reference strain between conditions 2 and 1. |
Mean logFC KO-2/KO-1 | Average logFC for the knockout strain between conditions 2 and 1. |
Mean delta logFC | Average difference between the logFCs. |
L. Bound | Lower bound of the 95% Highest Density Interval (HDI). |
U. Bound | Upper bound of the 95% Highest Density Interval (HDI). |
Outside of HDI? | True if the HDI is significantly different from the ROPE (i.e. significantly different logFCs, indicating an interaction). |
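To make the last few columns concrete, the sketch below reworks the numbers from three of the sample rows. It rests on two assumptions that the text above does not spell out: that "different from the ROPE" means the interval [L. Bound, U. Bound] has no overlap with [-rope, +rope], and that the delta logFC is the knockout logFC minus the reference logFC. In the sample rows the reported delta agrees with that difference to within rounding. The function name `hdi_outside_rope` is hypothetical.

```python
def hdi_outside_rope(lower, upper, rope=0.5):
    """True if [lower, upper] has no overlap with [-rope, +rope].

    Assumed reading of the 'Outside of HDI?' column, for illustration
    only; the script itself may apply a different rule.
    """
    return lower > rope or upper < -rope


# (orf, mean logFC WT, mean logFC KO, mean delta logFC, L. Bound, U. Bound)
rows = [
    ("Rv1473", -3.95, 2.43, 6.38, 1.67, 16.04),
    ("Rv1780", -5.22, 0.79, 6.00, 1.85, 10.46),
    ("Rv3821", -2.28, 3.69, 5.97, 1.66, 19.45),
]

for orf, fc_wt, fc_ko, delta, lower, upper in rows:
    # The reported delta is a mean over samples, so it matches the
    # difference of the two mean logFCs only up to rounding.
    approx_delta = fc_ko - fc_wt
    print(orf, round(approx_delta, 2), delta, hdi_outside_rope(lower, upper))
```

An interval such as [-0.2, 0.3] would fall entirely inside the default window of 0.5 and would therefore not be flagged under this reading.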
Running the software on the representative dataset takes roughly 10 minutes on a Linux server with the default values. In general, the run time depends on the number of genes in the genome being analyzed, the number of samples requested, and the number of replicate datasets. The number of samples can be set with the "-s" flag.
GNU General Public License v3.0 © Copyright 2012. Michael A. DeJesus & Thomas R. Ioerger.