UGAT

About

UGAT implements a method to fit an exponential decay to the frequency of C-to-T conversions over the first 20 bases of reads, a pattern typically observed in ancient DNA (aDNA) sequencing data.

UGAT requires a BAM file with reads aligned to a suitable reference genome. From these alignments, substitution patterns are extracted with the help of htslib, and the frequency of C-to-T conversions is computed at each of the first 20 bases from the 5'-end.
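The per-position counting described above can be sketched in a few lines of Python. This is only an illustration, not UGAT's C/htslib code: the function name `c_to_t_frequencies` is hypothetical, the input is assumed to be pairs of already-aligned reference and read strings oriented from the 5'-end, and the denominator (number of reference C bases at each position) is an assumption about how the frequency is normalized.

```python
def c_to_t_frequencies(pairs, n_pos=20):
    """Return the C-to-T frequency at each of the first n_pos positions.

    pairs: iterable of (reference, read) strings of equal length,
           both oriented so that index 0 is the 5'-end of the read.
    """
    ref_c = [0] * n_pos    # reference C bases seen at each position
    c_to_t = [0] * n_pos   # of those, how many the read shows as T

    for ref, read in pairs:
        for i, (r, q) in enumerate(zip(ref[:n_pos], read[:n_pos])):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1

    # Positions without any reference C are reported as 0.0 (assumption).
    return [t / c if c else 0.0 for t, c in zip(c_to_t, ref_c)]
```

For example, `c_to_t_frequencies([("CCA", "TCA"), ("CGA", "CGA")], n_pos=3)` sees two reference C bases at position 0, one of which is read as T.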

This frequency pattern is then used to fit an exponential decay with a non-linear least squares regression using the Levenberg-Marquardt algorithm implemented in the GNU Scientific Library. From the best fit parameters, a one-sided p-value of the rate parameter is reported to test for significant exponential decay.
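To make the fitting step more concrete, here is a minimal pure-Python sketch of a Levenberg-Marquardt fit. It is not the GSL-based code used by UGAT. The two-parameter model `f(x) = a * exp(-b * x)`, the starting values, the damping schedule, and the function names are all assumptions made for illustration; the README does not state the exact model form.

```python
import math

def normal_eq(y, a, b):
    """Entries of J^T J and J^T r for the model f(x) = a * exp(-b * x)."""
    jaa = jab = jbb = ga = gb = 0.0
    for x, yi in enumerate(y):
        e = math.exp(-b * x)
        r = yi - a * e              # residual at position x
        da, db = e, -a * x * e      # df/da and df/db
        jaa += da * da
        jab += da * db
        jbb += db * db
        ga += da * r
        gb += db * r
    return jaa, jab, jbb, ga, gb

def sse(y, a, b):
    """Sum of squared residuals; infinite if the model overflows."""
    try:
        return sum((yi - a * math.exp(-b * x)) ** 2 for x, yi in enumerate(y))
    except OverflowError:
        return math.inf

def fit_exp_decay(y, iters=200):
    """Fit y[x] ~ a * exp(-b * x); return (a, b, standard error of b)."""
    a = y[0]                                              # assumed start values
    b = math.log(y[0] / y[1]) if y[0] > y[1] > 0 else 0.1
    lam = 1e-3                                            # damping parameter
    cost = sse(y, a, b)
    for _ in range(iters):
        jaa, jab, jbb, ga, gb = normal_eq(y, a, b)
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = J^T r
        haa, hbb = jaa * (1 + lam), jbb * (1 + lam)
        det = haa * hbb - jab * jab
        if det <= 0 or lam > 1e12:
            break
        sa = (hbb * ga - jab * gb) / det
        sb = (haa * gb - jab * ga) / det
        new_cost = sse(y, a + sa, b + sb)
        if new_cost < cost:         # accept the step, trust the model more
            a, b, cost = a + sa, b + sb, new_cost
            lam /= 10
        else:                       # reject the step, increase damping
            lam *= 10
    # Covariance estimate: s^2 * (J^T J)^-1 with s^2 = SSE / (n - 2).
    jaa, jab, jbb, _, _ = normal_eq(y, a, b)
    se_b = math.sqrt(cost / (len(y) - 2) * jaa / (jaa * jbb - jab * jab))
    return a, b, se_b
```

Under these assumptions, the ratio `b / se_b` is a t-like statistic for the rate parameter. A one-sided p-value could then be taken from a t-distribution with `len(y) - 2` degrees of freedom, which is one plausible reading of the test described above rather than a statement of what UGAT computes.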

To test how this p-value behaves for specified numbers of sequences, UGAT also implements two ways of subsampling alignments from the BAM file: by specifying either a target number or a target fraction of alignments to sample.
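The two subsampling modes can be illustrated with a short Python sketch. The README does not say which sampling scheme UGAT uses internally, so the choices here (reservoir sampling for a target number, independent coin flips for a target fraction) and both function names are assumptions.

```python
import random

def subsample_by_number(alignments, n, seed=0):
    """Reservoir sampling: keep exactly n items (or all, if fewer) in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, aln in enumerate(alignments):
        if i < n:
            reservoir.append(aln)
        else:
            j = rng.randint(0, i)   # inclusive on both ends
            if j < n:
                reservoir[j] = aln  # replace a previously kept item
    return reservoir

def subsample_by_fraction(alignments, fraction, seed=0):
    """Keep each item independently with probability `fraction`."""
    rng = random.Random(seed)
    return [aln for aln in alignments if rng.random() < fraction]
```

With a target number the sample size is exact, while with a target fraction it only matches the requested fraction on average. Fixing the seed makes either sample reproducible.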

For more background, see also the publication in eLife, which describes an R-based version of the method re-implemented here.

Get it

UGAT uses two external libraries, the GNU Scientific Library (GSL) and htslib, which are included as submodules in this repository.

To get the UGAT source code, as well as the required library code, clone the repository recursively:

git clone --recursive https://github.com/clwgg/ugat

Please note that you will need libtool installed to compile GSL, along with the regular GNU toolchain.

After cloning, first compile the submodules, and then the UGAT code:

cd ugat
make submodules
make

This will create the static ugat binary, which you can copy or move anywhere for subsequent use.

Updating

When updating to the current version, please make sure to also update the submodules:

git pull origin master
git submodule update
make submodules
make

Usage

Usage: ./ugat [options]

Options:
	-b	-	BAM file

	-n	-	Number to subsample (in conflict with -f)
	-s	-	Seed for random subsample
	-f	-	Fraction to subsample (in conflict with -n)

	-t	-	Show C to T fraction at first 20 bases instead of exp-fit p-value
	-c	-	Just count alignments in BAM file, no subsampling is done (conflict with -f, -n, -t and -s)
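When calling UGAT from a script, the conflicts listed above can be checked before the binary is started. The helper below is a hypothetical sketch: `build_ugat_command` is not part of UGAT, and it assumes that `-b`, `-n`, `-s` and `-f` each take their value as a separate argument and that `-t` and `-c` are plain switches.

```python
def build_ugat_command(bam, number=None, fraction=None, seed=None,
                       show_fractions=False, count_only=False,
                       binary="./ugat"):
    """Build an argument list for ugat, rejecting documented conflicts."""
    if number is not None and fraction is not None:
        raise ValueError("-n and -f are in conflict")
    if count_only and (number is not None or fraction is not None
                       or seed is not None or show_fractions):
        raise ValueError("-c is in conflict with -f, -n, -t and -s")

    cmd = [binary, "-b", bam]
    if number is not None:
        cmd += ["-n", str(number)]      # target number of alignments
    if fraction is not None:
        cmd += ["-f", str(fraction)]    # target fraction of alignments
    if seed is not None:
        cmd += ["-s", str(seed)]        # seed for the random subsample
    if show_fractions:
        cmd.append("-t")                # report C-to-T fractions
    if count_only:
        cmd.append("-c")                # only count alignments
    return cmd
```

The returned list can be passed to `subprocess.run`. The format of UGAT's output is not described here, so the sketch does not attempt to parse it.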


ugat's Issues

wrong link?

Hi, the proper link to GSL is git://git.savannah.gnu.org/git/gsl.git
instead of git://git.savannah.gnu.org/gsl.git

Can you confirm it and update?

Cheers,
N.
