
Comments (3)

uqrmaie1 commented on July 28, 2024

Can you share the parameter file you used for Admixtools 1, the exact commands you used in Admixtools 2, and the outputs you get?

If the results are substantially different, the most likely reason is that the settings don't match. If the settings match, there can still be a small difference; I never tracked down the reason for these very small differences. They seemed too small to be of much relevance. It's possible that it has to do with how the error matrix is estimated. If the boundaries of the jackknife blocks don't match perfectly between Admixtools 1 and 2, that can be enough to make the estimated variances slightly different.
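
For reference, a minimal qpAdm parameter file would look something like the sketch below (file names are placeholders; by qpAdm convention the popleft file starts with the target population, followed by the left populations, and the popright file lists the right populations). Options such as allsnps have to correspond to how the f2 statistics are computed on the Admixtools 2 side, otherwise the results won't be comparable.

genotypename: test.geno
snpname:      test.snp
indivname:    test.ind
popleft:      left_pops.txt
popright:     right_pops.txt
details:      YES
allsnps:      NO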


dongkeeeee commented on July 28, 2024

Thank you for the quick answer. Here are the commands and results.
I don't understand why the chisq values are substantially different while the weight values are not.

Admixtools 1
Command
library(admixtools)

fn1 <- "test"

left = c('Muturu', 'Sahiwal')
right = c('EuropeanBison', 'AmericanBison', 'Simmental', 'Hanwoo')
target = 'Afar'

a <- qpadm_wrapper(fn1, left, right, target, '/opt/ohpc/pub/apps/AdmixTools/bin/qpAdm', parfile = "./parfile", outdir = "./")

Results
weights
target left weight mean se z
1 Afar Muturu 0.273 0.273 0.005 54.5
2 Afar Sahiwal 0.727 0.727 0.005 145.

popdrop
pat wt dof chisq p Muturu Sahiwal feasible
1 00 0 2 2.65 0.266 0.273 0.727 TRUE
2 01 1 3 8971. 0 1 0 TRUE
3 10 1 3 2751. 0 0 1 TRUE

Admixtools 2
Commands
library(admixtools)

fn1 <- "test"

left = c('Muturu', 'Sahiwal')
right = c('EuropeanBison', 'AmericanBison', 'Simmental', 'Hanwoo')
target = 'Afar'

mypops <- c(left,right,target)

dt <- f2_from_geno(fn1, pops = mypops, auto_only = FALSE, maxmem = 10000, poly_only = FALSE, blgsize = 5000000)

b <- qpadm(dt,left,right,target)

Results
weights
target left weight se z
1 Afar Muturu 0.273 0.00488 55.8
2 Afar Sahiwal 0.727 0.00488 149.

popdrop
pat wt dof chisq p f4rank Muturu Sahiwal feasible best dofdiff
1 00 0 2 4.19 0.123 1 0.273 0.727 TRUE NA NA
2 01 1 3 10135. 0 0 1 NA TRUE TRUE 0
3 10 1 3 3098. 0 0 NA 1 TRUE TRUE NA


uqrmaie1 commented on July 28, 2024

Thank you for posting the commands!

I had another look at this, and I found a few reasons why qpadm chi-squared and p-values can differ between Admixtools 1 and Admixtools 2.

  1. SNP block boundaries

In Admixtools 1, missing SNPs are not excluded when SNP block boundaries are calculated, but in Admixtools 2 they are excluded first when running f2_from_geno() or extract_f2() (though not when reading the data from genotype files directly). This can result in slightly different SNP blocks, which can affect the p-values a bit. I could change this so that SNP blocks are always computed based on all SNPs, which would make things more consistent, but it's not straightforward to change, and the risk of introducing new bugs is probably greater than the benefit.
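
One quick, rough check for whether two runs ended up with the same SNP blocks is to compare the number of jackknife blocks. This sketch assumes, as in current releases, that f2_from_geno() returns a population x population x block array:

# number of SNP blocks underlying the jackknife; if this differs between runs
# (e.g. with different auto_only/poly_only filtering), the block boundaries differ too
dim(dt)[3]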

  2. SNP block lengths

Admixtools 1 calculates SNP block boundaries based on all SNPs, but the resulting block lengths do not include missing SNPs. Admixtools 2, when reading the data from genotype files directly, used to calculate the block lengths based on all SNPs, which is wrong; I changed that in the latest version. So with the latest version, if you rerun your example reading the data from the genotype files directly instead of using f2_from_geno(), you should see a smaller difference in p-values compared to Admixtools 1.
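
For example, a call along these lines should exercise the corrected code path, since qpadm() also accepts the genotype file prefix directly in place of a precomputed f2 array (a sketch; fn1, left, right and target are as in the commands you posted):

# compute everything from the genotype files in one step, without f2_from_geno()
b_direct <- qpadm(fn1, left, right, target)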

  3. Covariance matrix regularization

Before the covariance matrix is inverted, a regularization term is added to the diagonal elements, which is proportional to the fudge parameter times the trace of the matrix. In Admixtools 1, this is done twice. I now added a parameter fudge_twice to the qpadm function so that this behavior can be imitated.
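
To make that concrete, the regularization amounts to something like the following (an illustrative sketch, not the package's actual code; the fudge value is a placeholder):

# add a term proportional to the fudge parameter times the trace of the matrix
# to the diagonal before inversion; fudge_twice = TRUE repeats this step to
# mimic Admixtools 1
regularize <- function(V, fudge = 1e-4) {
  V + diag(fudge * sum(diag(V)), nrow(V))
}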

  4. Very large chi-squared values

After fixing the bug in 2., and setting fudge_twice = TRUE, I now get very close to identical chi-squared and p-values in my tests for qpadm in Admixtools 1 and 2, except sometimes when the chi-squared values are very large. I might get 100 vs 1000 in Admixtools 1 vs 2, for example. I haven't looked into this further, because I don't think differences in that range matter for qpadm.

Please let me know if you still see differences after setting fudge_twice = TRUE and reading the data from genotype files directly. If yes, the best way for me to look into this would be to replicate this on your data, if you can share it!
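
In other words, a rerun along these lines should be directly comparable to your Admixtools 1 output (a sketch; everything except fudge_twice is as in the commands you posted):

# read the data from the genotype prefix directly and mimic the double regularization
b_check <- qpadm(fn1, left, right, target, fudge_twice = TRUE)
b_check$popdrop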

