hfufs

Improved Fu's Fs calculators for large data sets

Fu's Fs is a population genetics statistic that may be useful for detecting population expansion due to the appearance of a recent mutation conferring higher fitness. Calculating Fu's Fs can involve Stirling numbers of the first kind, which grow so quickly that they can overflow floating point arithmetic. It also involves a logit transformation, which can cause floating point underflow.

hfufs accounts for floating point overflow and underflow issues in the calculation of Fu's Fs, enabling calculation on very large alignments, even those that have extreme positive or negative values of Fu's Fs.
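
To illustrate why working in log space avoids both problems, here is a minimal base R sketch. It is not the package's implementation (hfufs and afufs rely on asymptotic estimators). It assumes the usual definition Fs = log(S' / (1 - S')), where S' is the probability of observing at least k alleles given n and theta, and it builds the logarithms of the unsigned Stirling numbers with the exact recurrence. The function names log_add, log_stirling1_row, and fufs_logspace_sketch are hypothetical, and the O(n^2) recurrence is only practical for moderate n.

```r
# log(exp(a) + exp(b)) without overflow; -Inf stands for log(0)
log_add <- function(a, b) {
  m <- pmax(a, b)
  ifelse(m == -Inf, -Inf, m + log(exp(a - m) + exp(b - m)))
}

# Returns log|s(n, j)| for j = 0..n (unsigned Stirling numbers, first kind)
log_stirling1_row <- function(n) {
  row <- c(0, rep(-Inf, n))                 # row for m = 0: |s(0, 0)| = 1
  for (m in seq_len(n)) {
    # |s(m, j)| = (m - 1) * |s(m - 1, j)| + |s(m - 1, j - 1)|
    shifted <- c(-Inf, row[-length(row)])
    row <- log_add(log(m - 1) + row, shifted)
  }
  row
}

# Hypothetical reference calculation, kept entirely in log space
fufs_logspace_sketch <- function(n, k, theta) {
  stopifnot(k >= 1, k <= n, theta > 0)
  j <- 0:n
  terms <- log_stirling1_row(n) + j * log(theta)
  lse <- function(x) {                      # log-sum-exp of a vector
    m <- max(x)
    if (m == -Inf) -Inf else m + log(sum(exp(x - m)))
  }
  # log(S') - log(1 - S'); the common normalizing product cancels
  lse(terms[j >= k]) - lse(terms[j < k])
}

fufs_logspace_sketch(4, 2, 1)   # equals log(18 / 6) = log(3)
```

Because the logit is taken as a difference of two log-sums, neither S' nor 1 - S' is ever formed directly, so extreme values of Fs stay representable.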

A further improvement is a single-term estimate. The hfufs function uses a Stirling number estimator for each term of the sum in Fu's Fs. Nico Temme developed a single asymptotic estimator that calculates the whole sum directly, improving both accuracy and speed. This is implemented in the afufs function, which is the recommended way to calculate Fu's Fs.

Fu's Fs requires three parameters:

  1. n - number of individuals sampled
  2. k - number of different alleles/haplotypes observed
  3. theta - average pairwise differences between different individuals

These parameters can be calculated from an alignment or other representation of sequence alleles using many different software programs, such as PopGenome (https://CRAN.R-project.org/package=PopGenome).
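
For a small alignment already held in memory, the three parameters can also be sketched in base R. This is only an illustration: the function name summarize_alignment and the toy sequences are hypothetical, and the sketch assumes equal-length aligned sequences in which every character (including gaps or ambiguity codes) is compared literally.

```r
# Hypothetical helper: derive n, k, and theta from aligned sequences
summarize_alignment <- function(seqs) {
  stopifnot(length(seqs) >= 2, length(unique(nchar(seqs))) == 1)
  n <- length(seqs)                          # individuals sampled
  k <- length(unique(seqs))                  # distinct alleles/haplotypes
  chars <- do.call(rbind, strsplit(seqs, ""))  # one row per sequence
  pairs <- combn(n, 2)                       # all unordered pairs
  diffs <- apply(pairs, 2, function(p) sum(chars[p[1], ] != chars[p[2], ]))
  list(n = n, k = k, theta = mean(diffs))    # mean pairwise differences
}

# Toy data, made up for illustration
seqs <- c("ACGTACGT", "ACGTACGA", "ACGTACGT", "ACCTACGA")
summarize_alignment(seqs)
# n = 4, k = 3, theta = 7/6
```

The pairwise loop is quadratic in n, so a dedicated package is a better fit for large alignments.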

Installation (in R)

Make sure your library paths are all set for R, and that you have write access to them.

library(devtools)
devtools::install_github("swainechen/hfufs")

Basic usage

library(hfufs)
n <- 100
k <- 30
theta <- 12.345
afufs(n, k, theta)
# -0.7368616

More practical usage

The hfufs/afufs functions require you to calculate n, k, and theta from your data yourself. There are quite a few other packages to do this from an alignment. One of these is PopGenome (https://cran.r-project.org/web/packages/PopGenome/index.html). If you have that package installed, you can simply do:

library(PopGenome)
library(devtools)
devtools::install_github("swainechen/hfufs")
library(hfufs)
fasta_file <- "/path/to/aligned.fasta"
pg.object <- hf.readData(fasta_file)
pg.dataframe <- hf.alignment.stats(pg.object, slide=TRUE, window=1000, step=500)

This interface makes it a bit easier to read in single fasta files. The pg.object variable still holds all the PopGenome information, and parts of it are extracted into pg.dataframe for convenience. If requested, the hf.alignment.stats function applies a sliding window to your data with the specified window and step size.
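
To make the window and step parameters concrete, the following base R sketch lists the column ranges that a sliding window of 1000 with step 500 would cover. The function window_ranges is hypothetical, and the exact window boundaries used by hf.alignment.stats or PopGenome may differ (for example in how a final partial window is handled).

```r
# Hypothetical helper: start/end columns for sliding windows over an alignment
window_ranges <- function(aln_length, window = 1000, step = 500) {
  # The last start is chosen so that a full window still fits; an alignment
  # shorter than one window yields a single truncated window
  starts <- seq(1, max(1, aln_length - window + 1), by = step)
  data.frame(start = starts, end = pmin(starts + window - 1, aln_length))
}

window_ranges(2500)
# windows: 1-1000, 501-1500, 1001-2000, 1501-2500
```

Each window overlaps the next by window - step columns, so with these settings every interior position is covered by two windows.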

In addition, the Stirling number estimator and the actual hfufs/afufs functions are independent and should work well for non-population genetics applications also!
