
imkt's Introduction


iMKT: integrative McDonald and Kreitman Test

Overview

iMKT is an R package to compute the McDonald and Kreitman test (McDonald and Kreitman 1991 Nature) on polymorphism and divergence genomic data provided by the user or automatically downloaded from PopFly (Hervas et al. 2017 Bioinformatics) or PopHuman (Casillas et al. 2018 Nucleic Acids Res.). It includes five MK-derived methodologies, which allow inferring the rate of adaptive evolution (α) as well as the fractions of strongly deleterious (d), weakly deleterious (b), and neutral (f) sites.

Installation

The package is deposited in GitHub and can be installed using the devtools library.

## Install devtools package if necessary
install.packages("devtools")

## Install iMKT package from GitHub
devtools::install_github("sergihervas/iMKT")

## Load iMKT library
library(iMKT)

Usage

iMKT performs diverse MK-derived tests using the number of polymorphic (P, classified into Derived Allele Frequency (DAF) categories), divergent (D) and analyzed (m) sites for the neutral (0) and selected (i) classes. Most functions require two input parameters, daf (a data frame containing DAF, Pi and P0) and divergence (a data frame containing mi, Di, m0 and D0), and return the estimate of α together with method-specific details.
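
As a minimal sketch of the two inputs described above, the data frames can be built by hand in base R. The values are the first six rows of myDafData and the single row of myDivergenceData shown in the example further down this README. The object names daf and divergence simply mirror the parameter names, and the final column check is an illustrative assumption, not the package's own validation.

```r
# First six DAF categories of the sample data (values copied from myDafData)
daf <- data.frame(
  daf = c(0.025, 0.075, 0.125, 0.175, 0.225, 0.275),
  Pi  = c(22490, 3217, 1616, 999, 754, 679),   # selected-class polymorphism
  P0  = c(17189, 4780, 2874, 2088, 1685, 1443) # neutral-class polymorphism
)

# One-row divergence table (values copied from myDivergenceData)
divergence <- data.frame(mi = 2598805, Di = 54641, m0 = 620019, D0 = 52537)

# Illustrative sanity check on the expected column layout
stopifnot(
  identical(names(daf), c("daf", "Pi", "P0")),
  identical(names(divergence), c("mi", "Di", "m0", "D0")),
  nrow(divergence) == 1
)
```

Data frames shaped like these are what functions such as standardMKT() take as their first two arguments.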

The package includes two sample data frames (myDafData, myDivergenceData). The vignettes and manual documentation contain detailed descriptions and examples of each function and type of analysis, and instructions on how to use PopFly and PopHuman genomic data.

The following example, which shows how to perform a standard MK test using sample data, is aimed at illustrating how the iMKT package works.

## Sample daf data included in the package
head(myDafData)
#>     daf    Pi    P0
#> 1 0.025 22490 17189
#> 2 0.075  3217  4780
#> 3 0.125  1616  2874
#> 4 0.175   999  2088
#> 5 0.225   754  1685
#> 6 0.275   679  1443

## Sample divergence data included in the package
myDivergenceData
#>        mi    Di     m0    D0
#> 1 2598805 54641 620019 52537

## Perform standard MKT
standardMKT(myDafData, myDivergenceData)
#> $alpha.symbol
#> [1] 0.2364499

#> $`Fishers exact test P-value`
#> [1] 1.480943e-183

#> $`MKT table`
#> |               | Polymorphism| Divergence|
#> |:--------------|------------:|----------:|
#> |Neutral class  |        45101|      52537|
#> |Selected class |        35816|      54641|

#> $`Divergence metrics`
#> |        Ka|        Ks|     omega|   omegaA|   omegaD|
#> |---------:|---------:|---------:|--------:|--------:|
#> | 0.0210254| 0.0847345| 0.2481331| 0.058671| 0.189462|
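
The numbers in this output can be checked by hand. The sketch below uses only base R and the counts printed above. It assumes the usual MK estimator α = 1 − (Pi/P0)(D0/Di), together with Ka = Di/mi, Ks = D0/m0, omega = Ka/Ks, omegaA = omega·α and omegaD = omega − omegaA. These relations reproduce the printed values, but this is an illustration and not the package's internal code.

```r
# Counts copied from the `MKT table` and myDivergenceData outputs above
P0 <- 45101;  Pi <- 35816     # neutral / selected polymorphism
D0 <- 52537;  Di <- 54641     # neutral / selected divergence
m0 <- 620019; mi <- 2598805   # neutral / selected analyzed sites

# Standard MK estimate of the proportion of adaptive substitutions
alpha <- 1 - (Pi / P0) * (D0 / Di)

# Divergence metrics
Ka     <- Di / mi
Ks     <- D0 / m0
omega  <- Ka / Ks
omegaA <- omega * alpha
omegaD <- omega - omegaA

# Fisher's exact test on the 2x2 MKT table
mktTable <- matrix(
  c(P0, Pi, D0, Di), nrow = 2,
  dimnames = list(c("Neutral", "Selected"), c("Polymorphism", "Divergence"))
)
pValue <- fisher.test(mktTable)$p.value

round(c(alpha = alpha, Ka = Ka, Ks = Ks,
        omega = omega, omegaA = omegaA, omegaD = omegaD), 7)
```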

Citation

Citation to paper

Licence

Licence of package

Development & Contact

iMKT has been developed by Sergi Hervas ([email protected]), Marta Coronado ([email protected]) and Jesús Murga ([email protected]), of the Bioinformatics of Genome Diversity group at the Universitat Autònoma de Barcelona (UAB) and the Institut de Biotecnologia i Biomedicina (IBB).

If you have any feedback or feature requests regarding iMKT, please contact [email protected] or [email protected].

imkt's People

Contributors

sergihervas, jmurga, marta-coronado, velasgui83


imkt's Issues

PopFlyAnalysis() producing empty list as output

Hello-

I am having an issue with the function PopFlyAnalysis(). It runs without error, but just produces an empty list as output. For example, for the gene FBgn0000008 in population RAL, I'm trying to run:

PopFlyAnalysis(genes = c("FBgn0000008"), pops = "RAL", recomb = FALSE, test = "FWW", plot = FALSE)

And here's what the output looks like:
[image]

If I try to assign the output to a variable instead:
output = PopFlyAnalysis(genes = c("FBgn0000008"), pops = "RAL", recomb = FALSE, test = "FWW", plot = FALSE)
I get:
[image]
[image]

I'm not sure what the issue is; any insight/advice would be much appreciated.

Thanks!
Alissa

Problems running iMKT R code

Hi,

I was trying to use iMKT example data and I keep getting errors when using the PopHumanAnalysis() or PopFlyAnalysis() functions. For example, if I run the following example data in the vignette:

> loadPopFly()
Loading PopFly data into your workspace.
This process may take several seconds to complete, please be patient.
> ls()
[1] "data"         "genes"        "my_pops"      "mygenes"      "PopFlyData"   "PopHumanData"
> names(PopFlyData)
 [1] "Name"  "Pop"   "DAF0f" "DAF4f" "p0"    "pi"    "di"    "d0"    "Chr"   "mi"    "m0"    "cM_Mb"
> PopFlyAnalysis(genes=c("FBgn0000055","FBgn0003016"), pops=c("RAL","ZI"), recomb=F, test="DGRP", plot=TRUE)

list()
Warning messages:
1: In length(genes) == 0 || genes == "" :
  'length(x) = 2 > 1' in coercion to 'logical(1)'
2: In length(pops) == 0 || pops == "" :
  'length(x) = 2 > 1' in coercion to 'logical(1)'

Running the examples from the functions also triggers the same error:

> mygenes <- c("FBgn0053196", "FBgn0086906", "FBgn0261836", "FBgn0031617", 
+                           "FBgn0260965", "FBgn0028899", "FBgn0052580", "FBgn0036181",
+                           "FBgn0263077", "FBgn0013733", "FBgn0031857", "FBgn0037836")
> ## Perform analyses
> PopFlyAnalysis(genes=mygenes, pops="RAL", recomb=FALSE, test="iMKT", xlow=0, xhigh=0.9, plot=TRUE)

list()
Warning message:
In length(genes) == 0 || genes == "" :
  'length(x) = 12 > 1' in coercion to 'logical(1)'

The PopHumanData has the same issue, and doesn't even recognize the populations specified in the example itself:

> loadPopHuman()
Loading PopHuman data into your workspace.
This process may take several seconds to complete, please be patient.
> ls()
[1] "data"         "genes"        "my_pops"      "mygenes"      "PopFlyData"   "PopHumanData"
> names(PopHumanData)
 [1] "ID"       "chr"      "pop"      "pi"       "p0"       "DAF0f"    "DAF4f"    "mi"       "m0"       "di"       "d0"      
[12] "symbol"   "recomb"   "globalID"
> mygenes <- c("ENSG00000011021.21_3","ENSG00000091483.6_3","ENSG00000116191.17_3",
+              "ENSG00000116337.15_4","ENSG00000116584.17_3","ENSG00000116745.6_3",
+              "ENSG00000116852.14_3","ENSG00000116898.11_3","ENSG00000117010.15_3",
+              "ENSG00000117090.14_3","ENSG00000117222.13_3","ENSG00000117394.20_3")
> ## Perform analyses
> PopHumanAnalysis(genes=mygenes , pops=c("CEU","YRI"), recomb=FALSE, test="standardMKT")
Error in PopHumanAnalysis(genes = mygenes, pops = c("CEU", "YRI"), recomb = FALSE,  : 
  MKT data is not available for the sequested populations(s).
Select among the following populations:
ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN, FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU, TSI, YRI!.
The populations that caused the error are: .
In addition: Warning messages:
1: In length(genes) == 0 || genes == "" :
  'length(x) = 12 > 1' in coercion to 'logical(1)'
2: In length(pops) == 0 || pops == "" :
  'length(x) = 2 > 1' in coercion to 'logical(1)'

I believe part of the issue is related to some change in R where you can no longer combine && or || with length arguments as explained here.
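
A minimal sketch of an argument check that avoids this warning. The helper name isBlankArg is hypothetical and this is not the package's actual code. It only assumes, from the warning text above, that the failing condition has the form length(genes) == 0 || genes == "". Reducing the element-wise comparison with all() keeps both operands of || at length one, which newer R versions require.

```r
# Hypothetical helper: TRUE when an argument is missing in practice,
# i.e. zero-length or made up only of empty/NA strings.
isBlankArg <- function(x) {
  # all() collapses the element-wise test to a single logical,
  # so `||` never sees a vector longer than one.
  length(x) == 0L || all(is.na(x) | !nzchar(x))
}

isBlankArg(c("RAL", "ZI"))   # two valid population names
isBlankArg("")               # empty string
isBlankArg(character(0))     # zero-length vector
```

With a check of this shape, a vector of several gene or population identifiers passes through without triggering the coercion warning.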

I also noticed that none of the tutorials in the iMKT website work, since I always get the following warning:
A minimum number of segregating sites is needed to run an asymptotic MKT. Please add more genes to the analysis or change the MKT extension.

Not sure if this is related, but thought I would also point it out.

Thank you,
Mafalda

check_input P0

Warning in check_input(daf, divergence, 0, 1): Input daf file contains P0 values = 0.
This can bias the function fitting and the estimation of alpha.

This warning should appear only in the asymptotic and iMK tests, not in the DGRP, standard and FWW tests.
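
One way the requested behaviour could look, as a sketch only. The function warnZeroP0 and the test labels "asymptotic" and "iMKT" are assumptions made for illustration and are not taken from the package source. The idea is to emit the P0 warning only for the tests that fit a function to the DAF spectrum.

```r
# Hypothetical helper: warn about P0 == 0 only for curve-fitting tests.
warnZeroP0 <- function(daf, test) {
  stopifnot(length(test) == 1L)
  fittingTests <- c("asymptotic", "iMKT")       # assumed labels
  hasZero <- any(daf$P0 == 0, na.rm = TRUE)
  shouldWarn <- hasZero && test %in% fittingTests
  if (shouldWarn) {
    warning("Input daf file contains P0 values = 0.\n",
            "This can bias the function fitting and the estimation of alpha.",
            call. = FALSE)
  }
  invisible(shouldWarn)
}

# Toy input with one zero in P0 (illustrative values, not real data)
toyDaf <- data.frame(daf = c(0.1, 0.2), Pi = c(5, 3), P0 = c(0, 4))
warnZeroP0(toyDaf, "FWW")   # silent: FWW is not a fitting test here
```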

DGRP plot_fractions()

The plot fails when list_cutoffs does not contain exactly 3 values: it works only when 3 cutoffs are passed, not 1, 2, 4, etc.

PopFlyAnalysis

  • Recombination from Comeron et al. (reference)
  • Include a report of the recomb bins (number of genes, mean values, etc.)

Plots iMK

  • Change order: 1) DAF 2) alpha 3) Fractions

rename functions

check_input --> checkInput
theme_publication --> themePublication

consistency and vignettes pdf output.
