AAF (Alignment and Assembly Free)

This is a package for constructing phylogenies without alignment or assembly. For usage instructions, see aafUserManual.doc.

If you need to cite AAF: Fan H, Ives A, Surget-Groba Y, Cannon C (2015). An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data, BMC Genomics 16:522

Installation

Prerequisites

AAF runs on UNIX systems (Linux, OS X, ...) with Python 2.7 or later (including Python 3.x) and the g++/gcc compilers. Biopython (http://biopython.org/wiki/Main_Page) is required for the non-parametric bootstrap, and R (http://cran.r-project.org/) with the R package 'ape' is required for the parametric bootstrap.

Install

  1. Get the source code

     wget https://github.com/fanhuan/AAF/AAF20160831.zip
    
  2. Compile kmer_count(x) and kmer_merge as follows. "path_to_AAF" stands for the path to the AAF folder created by decompressing the downloaded archive.

     a. path_to_AAF/AAF$ cd phylokmer
    
     b. path_to_AAF/AAF/phylokmer$ make
    
     c. Add kmer_count(x) and kmer_merge to your PATH or working directory
    
  3. Compile fitch_kmerX, consense and treedist

     a. path_to_AAF/AAF$ cd phylip_src
    
     b. path_to_AAF/AAF/phylip_src$ make all
    
     c. Add fitch_kmerX and consense to your PATH or working directory  
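
After the compile steps above, a quick check that the executables are reachable can save a failed run later. The following is a minimal sketch, not part of AAF: the helper name `missing_tools` is hypothetical, and the list holds only the names spelled out in the steps above ("kmer_count(x)" may stand for more than one binary, so extend the list to match what `make` produced).

```python
import shutil

# Executable names as written in the install steps; adjust to match
# the binaries that `make` actually produced on your system.
AAF_TOOLS = ["kmer_count", "kmer_merge", "fitch_kmerX", "consense"]


def missing_tools(names=AAF_TOOLS):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH: " + ", ".join(missing))
    else:
        print("All listed AAF executables were found on PATH.")
```

Note that `shutil.which` searches PATH only. If you copied the binaries into your working directory instead, they will be reported as missing here even though AAF may still find them.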
    

Bootstrap

Most of the feedback I have received about AAF concerns the bootstrap. The two-step nonparametric bootstrap is very computationally intensive. If your coverage is high (>8X), we assume that the incomplete-coverage problem is minor. To reduce the computational load, you can carry out only the second step of the bootstrap (nonparametric_bootstrap_s2only.py), which resamples the kmer table with replacement, drawing 1/k of the table's rows.

To reduce the computation further, there is a version that samples from the shared kmer table (nonparametric_bootstrap_s2only_skt.py). Singletons from each sample (i.e. kmers that appear in only one sample) are calculated from the difference between the total diversity file and the shared kmer table. Those singletons are then added back during the calculation of pairwise distances, following a Poisson distribution with a mean of 1/k of each sample's singleton count.
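
The two sampling ideas described above can be illustrated with a toy sketch. This is not the code in the AAF scripts: the function names, the in-memory list standing in for the kmer table, and the simple Poisson sampler (Knuth's method, applied in chunks so large means do not underflow) are all assumptions made for illustration.

```python
import math
import random


def resample_rows(rows, k, rng):
    """Draw len(rows) // k rows from the kmer table, with replacement."""
    return rng.choices(rows, k=len(rows) // k)


def poisson(mean, rng):
    """Draw one Poisson(mean) value; large means are split into chunks."""
    total = 0
    while mean > 0:
        step = min(mean, 500.0)          # keeps exp(-step) above underflow
        limit = math.exp(-step)
        count, product = 0, rng.random()
        while product > limit:           # Knuth's multiplication method
            count += 1
            product *= rng.random()
        total += count
        mean -= step
    return total


def singleton_draws(singleton_counts, k, rng):
    """For each sample, draw a singleton count from Poisson(count / k)."""
    return {name: poisson(count / k, rng)
            for name, count in singleton_counts.items()}


rng = random.Random(1)
toy_table = [("kmer%d" % i, i % 3, i % 2) for i in range(50)]  # made-up rows
print(len(resample_rows(toy_table, 5, rng)))   # 50 rows, k = 5 -> 10 draws
print(singleton_draws({"sp1": 1000, "sp2": 400}, 25, rng))
```

The toy value of k is kept small so the example draws a visible number of rows; with real data k is the kmer length passed to AAF.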

BetaVersion/nonparametric_bootstrap_s2only.py

BetaVersion/nonparametric_bootstrap_s2only_skt.py

Each script runs only ONE bootstrap replicate. It is designed this way because some users work on high-throughput facilities. If you use a high-performance facility, increase the RAM and threads so that each bootstrap takes less time. You can wrap the script in a shell script; be sure not to overwrite the bootstrap tree generated by each run.

Example:

python singletonCalculator.py phylokmer.dat.gz kmer_diversity.wc 25 -t 10
# 25 is k (compulsory); -t is the number of threads to use (optional, default = 1).
# This produces a file containing the number of singletons in each sample, in this case phylokmer_singleton.wc.
for ((i=1;i<=100;i++)) # bootstrap 100 times
do
	python nonparametric_bootstrap_s2only_skt.py -i phylokmer.dat.gz --fs phylokmer_singleton.wc -t 10
	cat phylokmer_bootstrap.tre >> phylokmer_bootstrap_trees
done
consense # use phylokmer_bootstrap_trees as infile

Note that the syntax for looping from 1 to 100 differs between shells. The loop above works in bash on Ubuntu 16.04. If you have problems with it, check out this post.
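
If the shell loop syntax is a problem on your system, the same wrapper can be written in Python. The sketch below is one possible approach, not a script shipped with AAF. It reuses the command and the output name phylokmer_bootstrap.tre from the example above; the numbered file names and the function names are hypothetical. Each replicate is moved to its own file so nothing is overwritten, and the trees are then joined into a single file for consense.

```python
import shutil
import subprocess
from pathlib import Path

# Command and output file name copied from the example above.
BOOTSTRAP_CMD = ["python", "nonparametric_bootstrap_s2only_skt.py",
                 "-i", "phylokmer.dat.gz", "--fs", "phylokmer_singleton.wc",
                 "-t", "10"]
SCRIPT_OUTPUT = "phylokmer_bootstrap.tre"


def replicate_name(i):
    """Hypothetical naming scheme: one numbered tree file per replicate."""
    return "phylokmer_bootstrap_{:03d}.tre".format(i)


def run_replicates(n, workdir=".", runner=subprocess.run):
    """Run the bootstrap script n times in workdir, keeping every tree."""
    workdir = Path(workdir)
    kept = []
    for i in range(1, n + 1):
        runner(BOOTSTRAP_CMD, cwd=str(workdir), check=True)
        target = workdir / replicate_name(i)
        # Move the fresh tree aside before the next run replaces it.
        shutil.move(str(workdir / SCRIPT_OUTPUT), str(target))
        kept.append(target)
    return kept


def concatenate(trees, combined):
    """Join the replicate trees into one file to use as the consense infile."""
    with open(str(combined), "w") as out:
        for tree in trees:
            out.write(Path(tree).read_text())
```

A typical call would be `concatenate(run_replicates(100), "phylokmer_bootstrap_trees")`, run from the directory that holds the input files. The `runner` argument exists only so the loop can be tried out without launching the real script.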

FAQ

  1. Dear User: If I have paired-end files (sample.1.fq, sample.2.fq) for each sample, should I merge them as input for AAF or should I keep them separate in the ./data/ folder?

    Huan: If you have multiple files for one sample, please put them in the same folder. AAF treats everything in one folder as one sample and takes the name of the folder as the sample name. Unfortunately, AAF does not deal with a mixture of folders and files in the data directory. Therefore, if one sample has multiple input files, the rest need to be in folders as well, even if some of them have only one sequence file. Of course, you could instead merge the input files from each sample into one, so that there are only files in the data directory and no subdirectories need to be made. Either way should work. Just do not mix files and folders. I hope I'm not making this sound more complicated than it needs to be.

  2. Dear User: Should I use the BetaVersion?

    Huan: Like any beta version, it might not work on your machine and, most importantly, it might not be consistent with the user manual. But let's be reckless and give it a try! Please email me or report an issue if it does not work. Thanks for your help!
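
The "no mix of files and folders" rule from question 1 can be checked before starting a run. The following is a small sketch under stated assumptions: the function `data_layout` is hypothetical and not part of AAF, and skipping hidden entries (names starting with a dot) is a choice made here, not documented AAF behavior.

```python
import os


def data_layout(data_dir):
    """Classify a data directory as 'files', 'folders', 'mixed' or 'empty'."""
    # Assumption: hidden entries such as .DS_Store are ignored.
    entries = [e for e in os.listdir(data_dir) if not e.startswith(".")]
    has_dirs = any(os.path.isdir(os.path.join(data_dir, e)) for e in entries)
    has_files = any(os.path.isfile(os.path.join(data_dir, e)) for e in entries)
    if has_dirs and has_files:
        return "mixed"
    if has_dirs:
        return "folders"
    if has_files:
        return "files"
    return "empty"
```

Calling `data_layout("./data")` should return "files" or "folders" for a layout that follows the answer above. A result of "mixed" means some samples still need to be moved into their own folders, or merged into single files.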
