tnseq-GI

Tool for analysis of genetic interactions (GI) from TnSeq data.

Version History

Version 1.1.0 changes

Using empirical priors
Added type classification in output
More options in arguments

Requirements:

Python 2.7+ www.python.org
Scipy 0.6.0+ www.scipy.org/Download
Numpy 1.2.1+ www.scipy.org/Download

Data

Example files are provided below to test the execution of the script and help verify that input files are in the appropriate format:

Instructions

Before running the Python script, make sure you have installed all the prerequisites listed above and have downloaded the latest version of the script. Then run the script with the following command:

python tnseq_GI.py -wt1 H37Rv_d0_r1.wig,H37Rv_d0_r2.wig -wt2 H37Rv_d32_r1.wig,H37Rv_d32_r2.wig -ko1 Rv2680_KO_d0_r1.wig,Rv2680_KO_d0_r2.wig -ko2 Rv2680_KO_d32_r1.wig,Rv2680_KO_d32_r2.wig -pt H37Rv.prot_table

The sections below describe the input file formats, the available command-line flags, and the format of the results.

Input File Formats

Read data must be provided as a text file in WIG format. This format has two space-delimited columns: the position of each insertion site (including sites where no reads were mapped) and the read count at that site, as shown below:

# Generated by tpp from U19_75_R1.fastq and U19_75_R2.fastq
variableStep chrom=H37Rv
60 0
72 0
102 0
188 0
246 0
333 0
360 0
426 0
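As an illustration of the two-column layout above, here is a minimal sketch of a WIG reader. It is not taken from tnseq_GI.py; the function name `read_wig` is hypothetical, and the sketch assumes that every data line holds exactly a position and a read count, and that lines starting with "#" or "variableStep" are headers. It is written for Python 3 as a standalone helper.

```python
def read_wig(lines):
    """Parse WIG-style lines into parallel lists of positions and read counts.

    Hypothetical helper for checking input files; not part of tnseq_GI.py.
    """
    positions, counts = [], []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comment lines, and the variableStep header.
        if not line or line.startswith("#") or line.startswith("variableStep"):
            continue
        fields = line.split()
        positions.append(int(fields[0]))   # coordinate of the insertion site
        counts.append(float(fields[1]))    # read count observed at that site
    return positions, counts


example = """# Generated by tpp from U19_75_R1.fastq and U19_75_R2.fastq
variableStep chrom=H37Rv
60 0
72 0
102 0
""".splitlines()

positions, counts = read_wig(example)
print(positions, counts)
```

To check a real file, the same function can be given an open file handle, for example `read_wig(open("H37Rv_d0_r1.wig"))`.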

Flags

The table below outlines the flags accepted by the python script:

Flag	Value	Definition
-wt1	[String]	Comma separated list of paths to WIG formatted datasets for the reference strain under the first condition. Example: -wt1 H37Rv_d0_r1.wig,H37Rv_d0_r2.wig
-wt2	[String]	Comma separated list of paths to WIG formatted datasets for the reference strain under the second condition. Example: -wt2 H37Rv_d32_r1.wig,H37Rv_d32_r2.wig
-ko1	[String]	Comma separated list of paths to WIG formatted datasets for the knockout strain under the first condition. Example: -ko1 Rv2680_KO_d0_r1.wig,Rv2680_KO_d0_r2.wig
-ko2	[String]	Comma separated list of paths to WIG formatted datasets for the knockout strain under the second condition. Example: -ko2 Rv2680_KO_d32_r1.wig,Rv2680_KO_d32_r2.wig
-pt	[String]	Path to the annotation file in prot_table format or GFF3 format. Example: -pt H37Rv.prot_table
-s	[Integer]	Number of samples to take for estimate of posterior distributions. Default: -s 20000
-rope	[Float]	+/- Window to define Region of Potential Equivalency (ROPE) i.e. the region that defines non-interaction. Default: -rope 0.5
--debug	[String]	Comma-separated list of ORF IDs. Limits analysis to those genes and outputs more information. Useful for debugging.
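When there are many replicates, it can be convenient to assemble the command from lists of file names rather than typing it by hand. The sketch below is one possible way to do that. The helper `build_gi_command` is hypothetical and only uses the flags listed in the table above; it is a Python 3 convenience wrapper, separate from the script itself.

```python
def build_gi_command(wt1, wt2, ko1, ko2, annotation, samples=None, rope=None):
    """Build the argument list for tnseq_GI.py from lists of replicate files.

    Hypothetical wrapper; it only joins file names into the flags documented above.
    """
    cmd = [
        "python", "tnseq_GI.py",
        "-wt1", ",".join(wt1),   # reference strain, condition 1
        "-wt2", ",".join(wt2),   # reference strain, condition 2
        "-ko1", ",".join(ko1),   # knockout strain, condition 1
        "-ko2", ",".join(ko2),   # knockout strain, condition 2
        "-pt", annotation,       # annotation file
    ]
    # Optional flags are added only when a value is given,
    # so the script's own defaults apply otherwise.
    if samples is not None:
        cmd += ["-s", str(samples)]
    if rope is not None:
        cmd += ["-rope", str(rope)]
    return cmd


cmd = build_gi_command(
    wt1=["H37Rv_d0_r1.wig", "H37Rv_d0_r2.wig"],
    wt2=["H37Rv_d32_r1.wig", "H37Rv_d32_r2.wig"],
    ko1=["Rv2680_KO_d0_r1.wig", "Rv2680_KO_d0_r2.wig"],
    ko2=["Rv2680_KO_d32_r1.wig", "Rv2680_KO_d32_r2.wig"],
    annotation="H37Rv.prot_table",
    samples=20000,
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run`, with standard output redirected to a file, since the results are printed to the screen.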

Output Format

Results are printed to screen in tab-separated format:

# Copyright 2016. Michael A. DeJesus & Thomas R. Ioerger
# Version 1.00; http://saclab.tamu.edu/essentiality/GI
#
# python tnseq_GI.py -wt1 H37Rv_day0_rep1.wig -wt2 H37Rv_day32_rep1.wig -ko1 Rv2680_day0_rep1.wig -ko2 Rv2680_day32_rep1.wig -pt H37Rv.prot_table
# mu0=18.02, S=20000, s20=1.0, k0=1.0, nu0=2.0
# ROPE: 0.5
#orf    Name    Description N   Mean WT-1   Mean WT-2   Mean KO-1   Mean KO-2   Mean logFC WT-2/WT-1    Mean log FC KO-2/KO-1   Mean delta logFC    L. Bound    U. Bound    Outside of HDI?
Rv1473  -   PROBABLE MACROLIDE-TRANSPORT ATP-BINDING PROTEIN ABC TRANSPORTER    25  4.40    0.08    1.56    11.41   -3.95   2.43    6.38    1.67    16.04   True
Rv1780  -   hypothetical protein Rv1780     13  129.62  2.43    9.65    27.63   -5.22   0.79    6.00    1.85    10.46   True
Rv3821  -   PROBABLE CONSERVED INTEGRAL MEMBRANE PROTEIN    18  43.56   7.89    21.25   377.20  -2.28   3.69    5.97    1.66    19.45   True
Rv2632c -   hypothetical protein Rv2632c    3   21.33   0.00    7.41    44.97   -3.64   1.77    5.42    1.04    17.46   True
Rv3823c mmpL8   PROBABLE CONSERVED INTEGRAL MEMBRANE TRANSPORT PROTEIN MMPL8    78  11.63   2.53    4.70    51.32   -2.38   2.97    5.35    1.59    9.40    True

Header and comment lines are preceded by "#" and are followed by one row for each ORF found in the input file. Each column is defined below:

Column Header	Column Definition
ORF	ORF ID found in annotation.
Name	Name of the gene found in the annotation.
Description	Description of the gene found in the annotation.
N	Total Number of TA dinucleotides within the ORF.
Mean WT-1	Mean read-count for reference strain in condition 1
Mean WT-2	Mean read-count for reference strain in condition 2
Mean KO-1	Mean read-count for knockout strain in condition 1
Mean KO-2	Mean read-count for knockout strain in condition 2
Mean logFC WT-2/WT-1	Average logFC for reference strain between condition 2 and 1
Mean logFC KO-2/KO-1	Average logFC for knockout strain between condition 2 and 1
Mean delta logFC	Average difference between the logFCs
L. Bound	Lower Bound of the 95% Highest Density Interval
U. Bound	Upper Bound of the 95% Highest Density Interval
Outside of HDI?	True if the HDI is significantly different from the ROPE (i.e., significantly different logFCs, indicating an interaction)
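To make the column definitions concrete, the sketch below reads saved output and recomputes two of the columns. It is an illustration for Python 3, not the script's own code. The names `parse_gi_output` and `hdi_outside_rope` are hypothetical, the column positions follow the Version 1.00 example shown above (newer versions may add columns), and the rule "the HDI does not overlap the interval from -ROPE to +ROPE" is an assumption about how the last column is decided.

```python
def parse_gi_output(lines):
    """Turn tab-separated result rows into dictionaries, skipping '#' lines.

    Hypothetical helper; column positions follow the example output above.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        rows.append({
            "orf": f[0],
            "name": f[1],
            "description": f[2],
            "n_ta": int(f[3]),
            "logfc_wt": float(f[8]),      # Mean logFC WT-2/WT-1
            "logfc_ko": float(f[9]),      # Mean logFC KO-2/KO-1
            "delta_logfc": float(f[10]),  # Mean delta logFC
            "hdi_low": float(f[11]),      # L. Bound
            "hdi_high": float(f[12]),     # U. Bound
            "interaction": f[13].strip() == "True",
        })
    return rows


def hdi_outside_rope(low, high, rope=0.5):
    """Assumed rule: the HDI lies entirely above +rope or entirely below -rope."""
    return low > rope or high < -rope


# First data row of the example output, rebuilt here with tab separators.
row = "\t".join([
    "Rv1473", "-",
    "PROBABLE MACROLIDE-TRANSPORT ATP-BINDING PROTEIN ABC TRANSPORTER",
    "25", "4.40", "0.08", "1.56", "11.41",
    "-3.95", "2.43", "6.38", "1.67", "16.04", "True",
])
rows = parse_gi_output(["# ROPE: 0.5", row])
hit = rows[0]

# The delta is the knockout logFC minus the reference logFC, up to rounding.
recomputed = hit["logfc_ko"] - hit["logfc_wt"]
print(hit["orf"], round(recomputed, 2), hdi_outside_rope(hit["hdi_low"], hit["hdi_high"]))
```

For this row, 2.43 - (-3.95) = 6.38 matches the reported mean delta logFC. In other rows the recomputed value can differ in the last digit because the printed columns are rounded.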

Running Time

Running the software on the representative dataset takes roughly 10 minutes on a Linux server with the default values. In general, the running time depends on the number of genes in the genome being analyzed, the number of samples desired, and the number of replicate datasets. The number of samples to be taken can be controlled using the flag "-s".
