
Comments (3)

uqrmaie1 commented on July 28, 2024

Can you share the parameter file you used for Admixtools 1, the exact commands you used in Admixtools 2, and the outputs you get?

If the results are substantially different, the most likely reason is that the settings don't match. If the settings match, there can still be a small difference; I never tracked down the reason for these very small differences. They seemed too small to be of much relevance. It's possible that it has to do with how the error matrix is estimated. If the boundaries of the jackknife blocks don't match perfectly between Admixtools 1 and 2, that can be enough to make the estimated variances slightly different.
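
For reference, a minimal qpAdm parameter file would look something like the sketch below (file names are placeholders; by qpAdm convention the popleft file starts with the target population, followed by the left populations, and the popright file lists the right populations). Options such as allsnps have to correspond to how the f2 statistics are computed on the Admixtools 2 side, otherwise the results won't be comparable.

genotypename: test.geno
snpname:      test.snp
indivname:    test.ind
popleft:      left_pops.txt
popright:     right_pops.txt
details:      YES
allsnps:      NO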


dongkeeeee commented on July 28, 2024

Thank you for the quick answer. Here are the commands and results.
I don't understand why the chisq values are substantially different while the weight values are not.

Admixtools 1
Command
library(admixtools)

fn1 <- "test"

left = c('Muturu', 'Sahiwal')
right = c('EuropeanBison', 'AmericanBison', 'Simmental', 'Hanwoo')
target = 'Afar'

a <- qpadm_wrapper(fn1, left, right, target, '/opt/ohpc/pub/apps/AdmixTools/bin/qpAdm', parfile = "./parfile", outdir = "./")

Results
weights
target left weight mean se z
1 Afar Muturu 0.273 0.273 0.005 54.5
2 Afar Sahiwal 0.727 0.727 0.005 145.

popdrop
pat wt dof chisq p Muturu Sahiwal feasible
1 00 0 2 2.65 0.266 0.273 0.727 TRUE
2 01 1 3 8971. 0 1 0 TRUE
3 10 1 3 2751. 0 0 1 TRUE

Admixtools 2
Commands
library(admixtools)

fn1 <- "test"

left = c('Muturu', 'Sahiwal')
right = c('EuropeanBison', 'AmericanBison', 'Simmental', 'Hanwoo')
target = 'Afar'

mypops <- c(left,right,target)

dt <- f2_from_geno(fn1, pops = mypops, auto_only = FALSE, maxmem = 10000, poly_only = FALSE, blgsize = 5000000)

b <- qpadm(dt,left,right,target)

Results
weights
target left weight se z
1 Afar Muturu 0.273 0.00488 55.8
2 Afar Sahiwal 0.727 0.00488 149.

popdrop
pat wt dof chisq p f4rank Muturu Sahiwal feasible best dofdiff
1 00 0 2 4.19 0.123 1 0.273 0.727 TRUE NA NA
2 01 1 3 10135. 0 0 1 NA TRUE TRUE 0
3 10 1 3 3098. 0 0 NA 1 TRUE TRUE NA


uqrmaie1 commented on July 28, 2024

Thank you for posting the commands!

I had another look at this, and I found a few reasons why qpadm chi-squared and p-values can differ between Admixtools 1 and Admixtools 2.

  1. SNP block boundaries

In Admixtools 1, missing SNPs are not excluded when SNP block boundaries are calculated, but in Admixtools 2 they are excluded first when running f2_from_geno() or extract_f2() (though not when reading the data from genotype files directly). This can result in slightly different SNP blocks, which can affect the p-values a bit. I could change this so that SNP blocks are always computed based on all SNPs, which would make things more consistent, but it's not straightforward to change, and the risk of introducing new bugs is probably greater than the benefit.
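
One quick, rough check for whether two runs ended up with the same SNP blocks is to compare the number of jackknife blocks. This sketch assumes, as in current releases, that f2_from_geno() returns a population x population x block array:

# number of SNP blocks underlying the jackknife; if this differs between runs
# (e.g. with different auto_only/poly_only filtering), the block boundaries differ too
dim(dt)[3]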

  2. SNP block lengths

Admixtools 1 calculates SNP block boundaries based on all SNPs, but the resulting block lengths do not include missing SNPs. Admixtools 2, when reading the data from genotype files directly, used to calculate the block lengths based on all SNPs, which is wrong; I changed that in the latest version. So with the latest version, if you rerun your example reading the data from the genotype files directly instead of using f2_from_geno(), you should see a smaller difference in p-values compared to Admixtools 1.
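
For example, a call along these lines should exercise the corrected code path, since qpadm() also accepts the genotype file prefix directly in place of a precomputed f2 array (a sketch; fn1, left, right and target are as in the commands you posted):

# compute everything from the genotype files in one step, without f2_from_geno()
b_direct <- qpadm(fn1, left, right, target)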

  3. Covariance matrix regularization

Before the covariance matrix is inverted, a regularization term is added to the diagonal elements, which is proportional to the fudge parameter times the trace of the matrix. In Admixtools 1, this is done twice. I now added a parameter fudge_twice to the qpadm function so that this behavior can be imitated.
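
To make that concrete, the regularization amounts to something like the following (an illustrative sketch, not the package's actual code; the fudge value is a placeholder):

# add a term proportional to the fudge parameter times the trace of the matrix
# to the diagonal before inversion; fudge_twice = TRUE repeats this step to
# mimic Admixtools 1
regularize <- function(V, fudge = 1e-4) {
  V + diag(fudge * sum(diag(V)), nrow(V))
}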

  4. Very large chi-squared values

After fixing the bug in 2., and setting fudge_twice = TRUE, I now get very close to identical chi-squared and p-values in my tests for qpadm in Admixtools 1 and 2, except sometimes when the chi-squared values are very large. I might get 100 vs 1000 in Admixtools 1 vs 2, for example. I haven't looked into this further, because I don't think differences in that range matter for qpadm.

Please let me know if you still see differences after setting fudge_twice = TRUE and reading the data from genotype files directly. If yes, the best way for me to look into this would be to replicate this on your data, if you can share it!
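
In other words, a rerun along these lines should be directly comparable to your Admixtools 1 output (a sketch; everything except fudge_twice is as in the commands you posted):

# read the data from the genotype prefix directly and mimic the double regularization
b_check <- qpadm(fn1, left, right, target, fudge_twice = TRUE)
b_check$popdrop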

