PEMapper/PECaller

Citation: Johnston HR, Chopra P, Wingo T, et al. PEMapper / PECaller: A simplified approach to whole-genome sequencing. Proc Natl Acad Sci U S A. 2017 Feb 21. doi: 10.1073/pnas.1618065114

Build genome index

  • Download the genome fa files.

For example, to download all of mm10 to the current directory from UCSC:

rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/mm10/chromosomes/ ./
  • Merge all of the fa files into one large fa file.

Use merge_dir_fa.pl to merge the files into a single fa file with the chromosomes sorted in natural order (e.g., 1-19, M, X, and Y) and any unmapped chromosomes placed after chrY.

./merge_dir_fa.pl -d ../mm10_2015-01-25/ -c '1-19,M,X,Y' -o mm10
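
To illustrate what "natural order" means for a spec such as '1-19,M,X,Y', here is a minimal Python sketch. It is not the logic of merge_dir_fa.pl; the function names, the "chr" prefix, and the alphabetical ordering of unlisted chromosomes are assumptions made for the example.

```python
def expand_chrom_spec(spec):
    """Expand a spec such as '1-19,M,X,Y' into ['1', ..., '19', 'M', 'X', 'Y']."""
    names = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            if low.isdigit() and high.isdigit():
                names.extend(str(i) for i in range(int(low), int(high) + 1))
                continue
        names.append(part)
    return names


def order_chromosomes(found, spec):
    """Listed chromosomes first, in spec order; all others afterwards.

    Assumption: names carry a 'chr' prefix and unlisted (e.g. unmapped)
    chromosomes are simply sorted alphabetically after the listed ones.
    """
    rank = {"chr" + name: i for i, name in enumerate(expand_chrom_spec(spec))}
    listed = sorted((c for c in found if c in rank), key=rank.get)
    unlisted = sorted(c for c in found if c not in rank)
    return listed + unlisted
```

With this ordering chr2 sorts before chr10, which a plain string sort would not do.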
  • Index the merged fa file.

Use index_genome either interactively or by redirecting input from a file. For example, index_genome < in.cmds, where in.cmds contains:

d
1000
mm10.fa
mm10
n
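
The redirected-input approach can also be scripted. The sketch below, in Python, builds the same answer text as in.cmds and shows one way to feed it to the program on standard input. The helper names are hypothetical, and it assumes index_genome is on the PATH and reads its answers line by line from stdin, as the redirect example implies.

```python
import subprocess


def build_index_commands(answers):
    """Join the interactive answers into newline-terminated text."""
    return "".join(f"{answer}\n" for answer in answers)


def run_index_genome(answers, exe="index_genome"):
    """Feed the answers to index_genome on stdin (equivalent to `< in.cmds`).

    Assumption: `exe` resolves to the index_genome binary on this machine.
    """
    return subprocess.run(
        [exe], input=build_index_commands(answers), text=True, check=True
    )


# The answers from the in.cmds example above, in the same order.
MM10_ANSWERS = ["d", "1000", "mm10.fa", "mm10", "n"]
```

Calling run_index_genome(MM10_ANSWERS) would then be equivalent to the redirect shown above.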

Map Fastq files

  • Single-ended or paired-ended mapping of fastq files is supported.
  • Mapping a collection of files is also supported and is made easier by map_directory_array.pl. The script makes some assumptions about file naming; in particular, it assumes that paired-ended files are distinguished by either _1_ vs _2_ or _R1_ vs _R2_.
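
As an illustration of the naming convention, the following Python sketch groups filenames into read pairs using the _1_/_2_ and _R1_/_R2_ tags. It is not the logic of map_directory_array.pl; the function names and the rule of preferring the _R1_/_R2_ tag when both forms occur are assumptions.

```python
import re
from collections import defaultdict

# Assumption: the _R1_/_R2_ form is checked first, then the bare _1_/_2_ form.
_TAGS = (re.compile(r"_R([12])_"), re.compile(r"_([12])_"))


def _find_tag(name):
    for pattern in _TAGS:
        match = pattern.search(name)
        if match:
            return match
    return None


def pair_fastq(filenames):
    """Return (pairs, singles); pairs are (read1, read2) filename tuples."""
    groups = defaultdict(dict)
    singles = []
    for name in sorted(filenames):
        match = _find_tag(name)
        if match is None:
            singles.append(name)
            continue
        # Mask the read number so both mates share the same key.
        key = name[:match.start(1)] + "#" + name[match.end(1):]
        groups[key][match.group(1)] = name
    pairs = []
    for key in sorted(groups):
        reads = groups[key]
        if "1" in reads and "2" in reads:
            pairs.append((reads["1"], reads["2"]))
        else:
            singles.extend(reads.values())
    return pairs, sorted(singles)
```

Files without a mate, or without either tag, are returned as single-ended.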

Basecalling

  • Place all pileup files into a single directory and launch pecaller from that directory.
  • A user-specified bed file can restrict calling to the sites within the file. The chromosome order must be exactly the same as in the sdx file of the indexed genome.
  • Note: if you experience a segmentation fault just after running the command, it is likely that you have not supplied the command correctly. It is easy to miss a parameter, and an argtable3 or getopt interface is on our wish list.
  • pecaller will make a base and snp file for all sites in the user-supplied region (or for every site covered in the pileup file if a region is not provided).
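
Because a bed file with a mismatched chromosome order is an easy mistake to make, a quick pre-check can help. The Python sketch below assumes you already have the chromosome names in genome-index order as a list (reading them from the sdx file is not shown, since its layout is not described here). The function name is hypothetical, and it only checks that the bed chromosomes are known and appear in the same relative order.

```python
def check_bed_order(bed_lines, genome_order):
    """Return None if the bed chromosomes follow genome_order, else a message.

    Assumption: genome_order lists chromosome names in the order used by
    the indexed genome; the first whitespace-separated bed field is the
    chromosome name.
    """
    rank = {chrom: i for i, chrom in enumerate(genome_order)}
    last = -1
    for number, line in enumerate(bed_lines, 1):
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments, and bed header lines
        chrom = line.split()[0]
        if chrom not in rank:
            return f"line {number}: {chrom} is not in the genome index"
        if rank[chrom] < last:
            return f"line {number}: {chrom} appears out of genome order"
        last = rank[chrom]
    return None
```

For a file on disk, pass the open file handle as bed_lines.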

Merging basecall files together

  • To merge multiple base and snp files from different pemapper runs, place all of the called files (i.e., snp, base, and indel files) into the same directory (or symlink them) and run make_snplist_formerge.pl, which examines the snp files and creates a "good" list and a "bad" list of sites.
  • Provide pecall_merger with the base files and sites that should be merged.
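
Gathering the called files by symlink can be scripted along these lines. This Python sketch is only an illustration: the function name and the .snp, .base, and .indel suffixes are assumptions, so adjust them to match how your output files are actually named.

```python
from pathlib import Path


def gather_called_files(run_dirs, merge_dir, suffixes=(".snp", ".base", ".indel")):
    """Symlink called files from several run directories into merge_dir.

    Assumption: called files can be recognised by the given suffixes.
    Raises FileExistsError rather than overwriting a clashing name.
    """
    merge_dir = Path(merge_dir)
    merge_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for run_dir in run_dirs:
        for src in sorted(Path(run_dir).iterdir()):
            if not (src.is_file() and src.name.endswith(tuple(suffixes))):
                continue
            dest = merge_dir / src.name
            if dest.exists() or dest.is_symlink():
                raise FileExistsError(f"{dest} already exists")
            dest.symlink_to(src.resolve())
            linked.append(dest.name)
    return linked
```

Refusing to overwrite makes name clashes between runs visible before the merge step starts.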

Adding Indels and sorting variant sites in a snpfile

  • pecaller calls each base independently, unlike a haplotype caller. The multithreaded nature of pecaller also means that small deletions (or SNPs) may not be called contiguously. Also, insertions are stored in a separate indel file.
  • To sort the variants and place all final indels into the snpfile use merge_indel_snp.pl.
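
The sorting step amounts to ordering sites by chromosome (in genome order) and then by position, with indels interleaved among the SNPs. The Python sketch below shows that idea on in-memory tuples. It is not the logic of merge_indel_snp.pl; the (chromosome, position, kind) record layout and the function name are assumptions for the example.

```python
def sort_sites(snp_sites, indel_sites, genome_order):
    """Merge SNP and indel records and sort by genome order, then position.

    Assumption: each record is a (chromosome, position, kind) tuple and
    every chromosome appears in genome_order.
    """
    rank = {chrom: i for i, chrom in enumerate(genome_order)}
    combined = list(snp_sites) + list(indel_sites)
    return sorted(combined, key=lambda site: (rank[site[0]], site[1]))
```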

Q/C

  • snp_tran_counter.pl and snp_tran_silent_rep.pl give transition to transversion counts for the variants observed. The latter provides counts specific to different classes of variants (i.e., replacement sites).
  • snp_tran_silent_rep.pl expects the annotation to come from SeqAnt.
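
For reference, a transition is a change between A and G or between C and T; every other single-base change is a transversion. The Python sketch below counts both for a list of (reference, alternate) base pairs. It is a stand-alone illustration, not the logic of either script, and the function name and input layout are assumptions.

```python
BASES = set("ACGT")
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ts_tv_counts(variants):
    """Count transitions and transversions among (ref, alt) base pairs.

    Pairs that are not single-base substitutions are skipped.
    """
    transitions = transversions = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue
        if frozenset((ref, alt)) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions
```

Dividing the first count by the second gives the transition to transversion ratio.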

Contributing / Folding in Dave's changes

  1. Change EOL characters to Unix style; for example, :set ff=unix in vim.
  2. Indent things in a consistent way: indent -bli0 -l120.
