
tn-seq's Introduction

Manual for Tn-seq: A pipeline for processing next-generation DNA sequence files (fastq files) generated by Tn-seq methods (transposon-insertion sequencing), a powerful technique for quantitatively profiling complex populations of transposon mutant bacteria (e.g., Gallagher et al. 2011. mBio.00315-10)

Overview

The Tn-seq pipeline is run using two master scripts. The first step generates lists of reads per location and can be run separately for multiple Tn-seq runs. The second step annotates the locations and tabulates hits per gene, and can incorporate input files from multiple Tn-seq runs for comparison.

Software requirements

Python - we use version 2.6.6, but any later 2.x version should work.

Read mapping software: BWA (tested with version 0.7.4) or Bowtie (tested with version 0.12.7)

Running the scripts

Step 1: mapping (process_map.py)
Usage: process_map.py [options] firstend_fastq_1 index_fastq_1 secondend_fastq_1 ...
Options:
  -h, --help                show this help message and exit
  -r, --reffile             path to reference genome fasta
  -j, --tn_verify_by_read1  use 1st-end read to verify transposon end (default: False)
  -i, --tn_verify_by_index  use index read to verify transposon end (default: False)
  -t, --tn_end              expected transposon end sequence
  -d, --demux_index         use index read to demultiplex (default: False)
  -e, --demux_read2         use 2nd end read to demultiplex (default: False)
  -b, --barcodefile         path to file listing expected barcode sequences
  -c, --chastity            run chastity filter (default: False)
  -n, --normfactor          read count normalization factor (default: 10,000,000)
                            (0 = don't normalize)
  -s, --merge_slipped       merge slipped reads (default: False)
  -u, --use_bowtie          map reads using Bowtie (default: use BWA)
  -w, --workingdir          working directory for input and output files (default: work)
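The manual does not spell out how --normfactor is applied. As a rough illustration only, the sketch below assumes the common convention of rescaling per-location read counts so that they sum to the normalization factor. The function name normalize_counts and the dictionary layout (genome position mapped to read count) are hypothetical and are not taken from process_map.py.

```python
def normalize_counts(counts, normfactor=10000000):
    """Scale per-location read counts so that they sum to normfactor.

    ASSUMPTION: this is one plausible reading of --normfactor, not the
    pipeline's confirmed behavior. `counts` maps a genome position to a
    raw read count. A normfactor of 0 leaves the counts unchanged,
    following the documented "0 = don't normalize".
    """
    total = sum(counts.values())
    if normfactor == 0 or total == 0:
        # Nothing to scale: return an unmodified copy.
        return dict(counts)
    scale = float(normfactor) / total
    return dict((pos, n * scale) for pos, n in counts.items())
```

Under this assumption, raw counts of 30 and 10 with a factor of 100 become 75.0 and 25.0, so that samples sequenced to different depths can be compared on the same scale.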

Step 2: annotating (process_annotate_tabulate.py)
Usage: process_annotate_tabulate.py [options] reads_list_1 reads_list_2 ...
Options:
  -h, --help            show this help message and exit
  -r, --reffile         path to reference genome fasta
  -a, --annofiles       path to reference .ptt annotation file(s) (comma-separated list if
                        using more than one; order must match sequences in reference fasta)
  -o, --outfile_anno    path to final annotated output file (default: work/AnnotatedHits.txt)
  -p, --outfile_tab     path to final counts tabulated by gene (default: work/HitsPerGene.txt)
  -w, --workingdir      working directory for input and output files (default: work)

Examples

python process_map.py --barcodefile barcodes.txt --chastity --demux_read2 --tn_verify_by_index --reffile combined.fna --tn_end AGACAG --workingdir work r1.fq ind.fq r2.fq

python process_annotate_tabulate.py --annofiles CP000086.ptt,CP000085.ptt --reffile combined.fna --workingdir work work/r1_ch_iPass_ACGTGA_sum_norm.txt work/r1_ch_iPass_CTAGTG_sum_norm.txt work/r1_ch_iPass_GATCAC_sum_norm.txt work/r1_ch_iPass_TGCACT_sum_norm.txt
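When step 1 produces many read lists, typing each one for step 2 gets tedious. The sketch below assembles the step 2 command from the files in the working directory. It assumes that the read lists you want end in _sum_norm.txt, as in the example above; that pattern is inferred from the example file names, not documented, and the helper name build_annotate_command is hypothetical.

```python
import glob
import os


def build_annotate_command(workdir, reffile, annofiles, pattern="*_sum_norm.txt"):
    """Return the step 2 command line as a list of arguments.

    ASSUMPTION: step 1 read lists live in `workdir` and match `pattern`.
    Only flags listed in the step 2 usage text are used.
    """
    reads_lists = sorted(glob.glob(os.path.join(workdir, pattern)))
    if not reads_lists:
        raise ValueError("no read lists matching %s in %s" % (pattern, workdir))
    return [
        "python", "process_annotate_tabulate.py",
        "--annofiles", ",".join(annofiles),  # comma only, no spaces
        "--reffile", reffile,
        "--workingdir", workdir,
    ] + reads_lists
```

The returned list can be passed to subprocess.check_call, for example subprocess.check_call(build_annotate_command("work", "combined.fna", ["CP000086.ptt", "CP000085.ptt"])).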

Additional notes

Before running the scripts for the first time, check common.py to make sure paths and constants are correct for your environment. For example, you may need to change the path to the BWA executable.

The comma-separated list of .ptt annotation files should not have spaces between the files (only a comma).
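If the list of annotation files is assembled by a script, a small helper can guard against stray spaces. This is a minimal sketch; the function name join_annofiles is hypothetical and not part of the pipeline.

```python
def join_annofiles(paths):
    """Join .ptt paths into the comma-only form expected by --annofiles."""
    cleaned = [p.strip() for p in paths]
    for p in cleaned:
        if not p:
            raise ValueError("empty annotation path")
        if "," in p:
            # A comma inside a path would be read as a list separator.
            raise ValueError("path contains a comma: %r" % p)
    return ",".join(cleaned)


print(join_annofiles(["CP000086.ptt", " CP000085.ptt"]))
# prints CP000086.ptt,CP000085.ptt
```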

If your reference genome contains multiple replicons, combine their fasta files into a single fasta before running this software. The order of the sequences should be the same as the order of the comma-separated list of .ptt files.
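Combining the replicon files can be done with any tool that concatenates text files. The sketch below does it in Python and keeps the order explicit, so the same ordering can be reused for the .ptt list. The function name combine_fastas and the file names in the usage note are hypothetical.

```python
def combine_fastas(fasta_paths, out_path):
    """Concatenate FASTA files into out_path, in the order given."""
    with open(out_path, "w") as out:
        for path in fasta_paths:
            last = ""
            with open(path) as src:
                for line in src:
                    out.write(line)
                    last = line
            # Add a newline if a file lacks one, so that the next header
            # does not get glued onto the last sequence line.
            if last and not last.endswith("\n"):
                out.write("\n")
```

For example, combine_fastas(["CP000086.fna", "CP000085.fna"], "combined.fna") pairs with --annofiles CP000086.ptt,CP000085.ptt.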

For best results, the header lines of your combined fasta file should have simple names, such as the accession number of the replicon. BWA parses these headers and a simple header will be the most compatible.
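Headers can be simplified by hand in a text editor, or with a short script. The sketch below assumes that the accession is either the first whitespace-delimited token of the header or the last field of a pipe-delimited identifier; check the result against your own files, since header layouts vary. The function names and the sample header in the usage note are made up for illustration.

```python
def simplify_header(line):
    """Reduce a FASTA header line to '>' plus a bare identifier.

    ASSUMPTION: the identifier is the first token after '>', or the last
    non-empty field if that token is pipe-delimited. Non-header lines
    and headers with no usable token are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    tokens = line[1:].split()
    if not tokens:
        return line
    fields = [f for f in tokens[0].split("|") if f]
    if not fields:
        return line
    return ">" + fields[-1] + "\n"


def simplify_fasta_headers(in_path, out_path):
    """Copy a FASTA file, rewriting only its header lines."""
    with open(in_path) as src:
        with open(out_path, "w") as out:
            for line in src:
                out.write(simplify_header(line))
```

With this sketch, a header such as ">gi|123|gb|CP000086.1| example description" becomes ">CP000086.1", and sequence lines pass through untouched.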
