Hi! I'm running some Outgroup f3 comparisons on a dataset of modern a

Outgroup f3 statistics not working with pseudohaploidized data. about admixtools HOT 2 OPEN

uqrmaie1 commented on September 2, 2024

Outgroup f3 statistics not working with pseudohaploidized data.

from admixtools.

Comments (2)

uqrmaie1 commented on September 2, 2024

I suspect that the problem is that the first population (out) consists of only a single pseudohaploidized sample. This makes it impossible to get unbiased estimates of f3, which is why you get NAs.

f4(out,pop1;out,pop2) and f3(out;pop1,pop2) will be identical in the limit as the number of samples in each population increases. With small sample counts, unbiased estimates of f3 depend on a bias correction term for the first population. That term shrinks as the sample size grows (its denominator is the number of sampled haplotypes minus one), and it can't be computed when the first population has only a single pseudohaploidized sample.
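
To make the sample-size dependence concrete, here is a minimal Python sketch of a per-SNP f3 estimate. It is not admixtools code: the function name `f3_per_snp` and its arguments are hypothetical, and the correction used is the textbook form a(1 - a) / (n - 1), which I am assuming matches what the package does in spirit.

```python
def f3_per_snp(a, b, c, n_a, apply_corr=True):
    """Per-SNP f3(A; B, C) from allele frequencies a, b, c.

    n_a is the number of haplotypes sampled in population A:
    1 for a single pseudohaploid sample, 2 for one diploid sample.
    Returns None where the corrected estimate is undefined.
    """
    raw = (a - b) * (a - c)
    if not apply_corr:
        # Uncorrected estimate: always defined, but biased upward.
        return raw
    if n_a < 2:
        # The correction divides by (n_a - 1), so a single haplotype
        # gives 0/0 and no unbiased estimate exists.
        return None
    correction = a * (1 - a) / (n_a - 1)
    return raw - correction


# One pseudohaploid sample in A: the corrected estimate is undefined.
single = f3_per_snp(1.0, 0.4, 0.6, n_a=1)
# One diploid (heterozygous) sample in A: the corrected estimate exists.
diploid = f3_per_snp(0.5, 0.4, 0.6, n_a=2)
```

With a single pseudohaploid sample the frequency in A is always 0 or 1, so the numerator a(1 - a) is 0 as well, which is why the result is missing rather than merely noisy.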

In addition, when f3 is computed from genotype files, the default is to normalize the base estimate by the heterozygosity of the first population. Like the bias correction term, the heterozygosity can only be estimated if the first population has at least two haplotypes (one diploid sample or two pseudohaploid samples).

There are options in the f3 function to get around that limitation. The resulting f3 estimates may be biased, but that bias may be small, and sometimes the bias doesn't matter even if it is large. An example of a case where a large bias doesn't matter is qpGraph, where multiple f3-statistics with the same out population are compared to each other, and it's only the relative difference between f3-statistics that matters.

The heterozygosity normalization can be turned off with outgroupmode = TRUE.
The bias correction term can effectively be set to 0 with apply_corr = FALSE. The bias correction term is positive and is normally subtracted from the uncorrected f3-estimate, so setting this to FALSE will result in an upward biased f3-estimate.

When both options are passed to the f3 function, it should give the same result as the f4 function.
So f3('prefix', 'out', 'pop1', 'pop2', outgroupmode = TRUE, apply_corr = FALSE) should give the same result as f4('prefix', 'out', 'pop1', 'out', 'pop2').
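
The reason the two calls should agree can be checked numerically. The Python sketch below is only an illustration under simple assumptions: allele frequencies are simulated, the estimate is a plain mean over SNPs, and the helper names are hypothetical rather than admixtools functions. It shows that f3 without correction or normalization is term by term the same product as f4(out,pop1;out,pop2).

```python
import random


def f4_per_snp(a, b, c, d):
    """Per-SNP f4(A, B; C, D) from allele frequencies."""
    return (a - b) * (c - d)


def f3_raw_per_snp(a, b, c):
    """Per-SNP f3(A; B, C) with no bias correction and no normalization."""
    return (a - b) * (a - c)


def mean(values):
    return sum(values) / len(values)


random.seed(0)
# Simulated data: 'out' is a single pseudohaploid sample, so its allele
# frequency at each SNP is either 0 or 1; pop1 and pop2 are arbitrary.
snps = [
    (random.choice([0.0, 1.0]), random.random(), random.random())
    for _ in range(1000)
]

f3_est = mean([f3_raw_per_snp(o, p1, p2) for o, p1, p2 in snps])
f4_est = mean([f4_per_snp(o, p1, o, p2) for o, p1, p2 in snps])
```

Here `f3_est` and `f4_est` come out equal, because (out - pop1)(out - pop2) is the same expression in both statistics once the correction term and the heterozygosity denominator are dropped.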

Hjorvik commented on September 2, 2024

Thank you very much for the explanation. Knowing that f3() expects a diploid outgroup or more than one sample, we can plan our dataset accordingly. I'll save the commands, though, in case that's not feasible in the future.

Best,
Pedro.
