ARGweaver

Sampling and manipulating genome-wide ancestral recombination graphs (ARGs).

The ARGweaver software package contains programs and libraries for sampling and manipulating ancestral recombination graphs (ARGs). An ARG is a rich data structure for representing the ancestry of DNA sequences undergoing coalescence and recombination.

ARGweaver citation: Matthew D. Rasmussen, Adam Siepel. Genome-wide inference of ancestral recombination graphs. 2013. arXiv:1306.5110 [q-bio.PE]

Download

ARGweaver can be downloaded or forked from GitHub.

Documentation

See the manual for documentation on the programs and file formats associated with ARGweaver.

Requirements

The following dependencies must be installed to compile and run ARGweaver:

Install

ARGweaver can be installed using any of the normal Python mechanisms. For example, to install from PyPI you can use pip with the following command:

pip install argweaver

Alternatively, ARGweaver can be installed using the setup.py file:

python setup.py install

Lastly, ARGweaver can be installed using the Makefile:

make

Once compiled, install the ARGweaver programs using:

make install

By default, this installs all files into /usr, which may require superuser permissions. To specify your own installation path, use:

make install prefix=$HOME/local

If you use this option, make sure $HOME/local/bin is in your PATH and $HOME/local/lib/python2.X/site-packages is in your PYTHONPATH.

ARGweaver can also be run directly from the source directory. Simply add the bin/ directory to your PATH environment variable, or create symlinks to the scripts within bin/ in any directory on your PATH. Also add the argweaver source directory to your PYTHONPATH. See examples/ for details.
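As a quick sanity check of the PATH and PYTHONPATH setup described above, a short script along these lines may help. It is only a sketch: it assumes Python 3 (for shutil.which and importlib.util.find_spec), it assumes the Python package is importable under the name argweaver, and it reports on the interpreter that runs the script, which may not be the one the ARGweaver scripts use. The program names are the ones used in the Quick Start section.

```python
import importlib.util
import shutil

# Program names taken from the Quick Start section of this README.
PROGRAMS = ["arg-sim", "arg-sample", "arg-extract-tmrca"]


def check_setup(programs=PROGRAMS, module="argweaver"):
    """Report which programs are on PATH and whether the module is importable.

    The module name "argweaver" is an assumption; adjust it if your
    installation uses a different package name.
    """
    report = {name: shutil.which(name) is not None for name in programs}
    # find_spec returns None when a top-level module cannot be located.
    report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, found in check_setup().items():
        print("%-20s %s" % (name, "found" if found else "MISSING"))
```

Anything reported as MISSING suggests that the corresponding directory still needs to be added to PATH or PYTHONPATH.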

Quick Start

Here is a brief example of an ARG simulation and analysis. To generate simulated data containing a set of DNA sequences and an ARG describing their ancestry, the following command can be used:

arg-sim \
    -k 8 -L 100000 \
    -N 10000 -r 1.6e-8 -m 1.8e-8 \
    -o test1/test1

This will create an ARG with 8 sequences, each 100 kb in length, evolving in a population of effective size 10,000 (diploid), with recombination rate 1.6e-8 recombinations/site/generation and mutation rate 1.8e-8 mutations/site/generation. The output will be stored in the following files:
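To get a feel for the scale of these parameters, the population-scaled rates can be worked out by hand. The sketch below uses the textbook neutral coalescent (theta = 4*N*mu*L, rho = 4*N*r*L, and Watterson's expectation for segregating sites under the infinite-sites model). These formulas are a general approximation and are not taken from the arg-sim implementation, so counts in an actual simulation will vary around these values.

```python
# Parameters copied from the arg-sim command above.
k = 8        # number of sequences (-k)
L = 100000   # sequence length in bp (-L)
N = 10000    # effective population size, diploid (-N)
r = 1.6e-8   # recombinations/site/generation (-r)
m = 1.8e-8   # mutations/site/generation (-m)

# Population-scaled rates for the whole 100 kb region.
theta = 4 * N * m * L   # scaled mutation rate, about 72
rho = 4 * N * r * L     # scaled recombination rate, about 64

# Watterson's expectation: E[S] = theta * sum_{i=1}^{k-1} 1/i.
harmonic = sum(1.0 / i for i in range(1, k))
expected_segsites = theta * harmonic   # roughly 187 variable sites

if __name__ == "__main__":
    print("theta = %.1f, rho = %.1f" % (theta, rho))
    print("expected segregating sites = %.1f" % expected_segsites)
```

So, under these assumptions, test1.sites should contain on the order of a couple of hundred variable sites.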

test1/test1.arg   -- an ARG stored in *.arg format
test1/test1.sites -- sequences stored in *.sites format

To infer an ARG from the simulated sequences, the following command can be used:

arg-sample \
    -s test1/test1.sites \
    -N 10000 -r 1.6e-8 -m 1.8e-8 \
    --ntimes 20 --maxtime 200e3 -c 10 -n 100 \
    -o test1/test1.sample/out

This will use the sequences in test1/test1.sites and assume the same population parameters as the simulation (i.e. -N 10000 -r 1.6e-8 -m 1.8e-8). Several sampling-specific options are also given: 20 discretized time steps, a maximum time of 200,000 generations, a compression of 10 bp for the sequences, and 100 sampling iterations. After sampling, the following files will be generated:

test1/test1.sample/out.log
test1/test1.sample/out.stats
test1/test1.sample/out.0.smc.gz
test1/test1.sample/out.10.smc.gz
test1/test1.sample/out.20.smc.gz
...
test1/test1.sample/out.100.smc.gz

The file out.log contains a log of the sampling procedure, out.stats contains various ARG statistics (e.g. number of recombinations, ARG posterior probability, etc.), and out.0.smc.gz through out.100.smc.gz contain 11 samples of an ARG in *.smc file format.
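When post-processing the samples in a script, it can be handy to list the sample files in iteration order. The sketch below relies only on the file naming shown above; the helper names are hypothetical, and the step of 10 between saved iterations is inferred from the listing rather than from a documented option.

```python
import glob
import re


def expected_samples(prefix, niters=100, step=10):
    """File names expected after sampling, assuming every `step`-th
    iteration from 0 to `niters` is written (inferred from the listing)."""
    return ["%s.%d.smc.gz" % (prefix, i) for i in range(0, niters + 1, step)]


def find_samples(prefix):
    """Return (iteration, path) pairs for existing sample files,
    sorted numerically by iteration."""
    pattern = re.compile(re.escape(prefix) + r"\.(\d+)\.smc\.gz$")
    found = []
    for path in glob.glob(glob.escape(prefix) + ".*.smc.gz"):
        match = pattern.match(path)
        if match:
            found.append((int(match.group(1)), path))
    return sorted(found)


if __name__ == "__main__":
    for name in expected_samples("test1/test1.sample/out"):
        print(name)
```

Sorting on the parsed iteration number matters here: a plain string sort would place out.100.smc.gz before out.20.smc.gz.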

To estimate the time to most recent common ancestor (TMRCA) across these samples, the following command can be used:

arg-extract-tmrca test1/test1.sample/out.%d.smc.gz \
    > test1/test1.tmrca.txt

This will create a tab-delimited text file containing six columns: chromosome, start, end, posterior mean TMRCA (generations), lower 2.5 percentile TMRCA, and upper 97.5 percentile TMRCA. The first four columns define a track of TMRCA across the genomic region in BED file format.
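The six-column output is easy to read with the standard library. The following sketch parses the columns in the order described above and computes a length-weighted mean TMRCA over the region. The field names, the absence of a header line, the half-open BED-style intervals, and the example rows are all assumptions made for illustration; they are not real ARGweaver output.

```python
import csv
import io
from collections import namedtuple

# Field names are illustrative; the order follows the description above.
TmrcaRow = namedtuple("TmrcaRow", "chrom start end mean lower upper")


def read_tmrca(handle):
    """Parse tab-delimited TMRCA rows from an open text file."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        rows.append(TmrcaRow(fields[0], int(fields[1]), int(fields[2]),
                             float(fields[3]), float(fields[4]),
                             float(fields[5])))
    return rows


def weighted_mean_tmrca(rows):
    """Mean TMRCA weighted by interval length (end - start)."""
    total = sum(row.end - row.start for row in rows)
    return sum((row.end - row.start) * row.mean for row in rows) / total


# Hypothetical rows, made up for this example.
EXAMPLE = ("chr\t0\t1000\t12000.0\t8000.0\t20000.0\n"
           "chr\t1000\t4000\t20000.0\t15000.0\t30000.0\n")

if __name__ == "__main__":
    rows = read_tmrca(io.StringIO(EXAMPLE))
    print(weighted_mean_tmrca(rows))  # prints 18000.0
```

To read a real file, pass open("test1/test1.tmrca.txt") to read_tmrca instead of the in-memory example.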

Many other statistics can be extracted from sampled ARGs. For more details see examples/.

Development

The following Python libraries are needed for developing ARGweaver:

nose
pyflakes
pep8

These dependencies can be installed using

pip install -r requirements-dev.txt

The Python tests can be run either with nose or make:

# Run tests with nose
nosetests test

# Run tests with make
make test

There are also C++ tests written using googletest, Google's unit-testing framework. Googletest can either be installed system-wide or within the ARGweaver source tree. For convenience, googletest can be installed in the source tree using

make gtest

Once installed, the C++ unit tests can be run using

make ctest
