Fst calculation inconsistency about admixtools HOT 1 CLOSED

uqrmaie1 commented on September 2, 2024

Fst calculation inconsistency

from admixtools.

Comments (1)

uqrmaie1 commented on September 2, 2024

Thanks for bringing this to my attention!

The likely reason why you see the warning about the discarded blocks is that in these blocks both the numerator and the denominator of FST are 0 for one or more population pairs. This can happen if all SNPs in a block have the same genotype in two populations. It will happen more often if you read data for many populations, since every block where at least one pair is missing will be discarded. If you don't want those blocks to be discarded, you can pass the option remove_na = FALSE to fst(). This will still trigger a warning about blocks having missing data, but now they shouldn't be discarded.
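To make the discarding behaviour concrete, here is a minimal sketch in plain Python. It is not admixtools code: the block sums, the pair labels, and the helpers `block_fst` and `complete_blocks` are made up for illustration, and it only assumes that a per-block estimate is a ratio of a numerator to a denominator.

```python
import math

def block_fst(num, den):
    """Ratio estimate for one block and one population pair; 0/0 gives NaN."""
    if den == 0:
        return math.nan
    return num / den

# Hypothetical per-block (numerator, denominator) sums for three population pairs.
blocks = [
    {"A-B": (0.02, 0.20), "A-C": (0.03, 0.30), "B-C": (0.01, 0.25)},
    # In this block all SNPs have the same genotype in A and B, so both sums are 0.
    {"A-B": (0.00, 0.00), "A-C": (0.04, 0.32), "B-C": (0.02, 0.21)},
    {"A-B": (0.03, 0.22), "A-C": (0.02, 0.28), "B-C": (0.02, 0.24)},
]

per_block = [{pair: block_fst(n, d) for pair, (n, d) in blk.items()} for blk in blocks]

def complete_blocks(per_block):
    """Indices of blocks in which no population pair is NaN."""
    return [i for i, blk in enumerate(per_block)
            if not any(math.isnan(v) for v in blk.values())]

# With all three pairs, the block with the 0/0 pair is dropped for every pair.
kept_all_pairs = complete_blocks(per_block)

# With only the A-C pair, nothing is missing and every block is kept.
kept_ac_only = complete_blocks([{"A-C": blk["A-C"]} for blk in per_block])
```

This is why reading more populations at once tends to discard more blocks: one missing pair is enough to drop the block for all pairs.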

I added the option to compute FST relatively late when making the package, and because of that some FST-related functions don't always behave as expected. In particular, fst(f2_blocks) doesn't compute FST; it just turns a 3d-array of per-block f2-statistics into a data frame of f2-statistics with standard errors. I changed the documentation of the fst() function to make this clearer. FST can't actually be computed from f2 alone: f2 and FST are calculated separately and are stored in different files.

You can still pass a 3d-array of per-block numbers to the fst() function, because those numbers could be per-block FST estimates, which can be turned into FST estimates plus standard errors. You can get per-block FST estimates by running f2_from_precomp() with fst = TRUE. Nothing in that 3d-array of numbers records whether f2 or FST was read by f2_from_precomp() (the function defaults to reading f2), so the fst() function doesn't complain about getting the wrong input when you pass it an array of per-block f2 estimates. It also doesn't complain about missing data, because in places where the FST estimates are 0/0 = NaN, the f2 estimates are just 0.
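The following sketch shows, in plain Python, why a summary routine cannot tell per-block f2 values from per-block FST values. It uses a simple unweighted delete-one block jackknife, which is an assumption made for illustration; the package's actual procedure may differ (for example, by weighting blocks). The function name `jackknife_mean_se` and all numbers are hypothetical.

```python
import math

def jackknife_mean_se(block_values):
    """Mean of per-block values and its delete-one jackknife standard error."""
    n = len(block_values)
    total = sum(block_values)
    estimate = total / n
    # Leave-one-out means: the estimate recomputed with each block removed.
    loo = [(total - x) / (n - 1) for x in block_values]
    loo_mean = sum(loo) / n
    var = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return estimate, math.sqrt(var)

# Hypothetical per-block values for one population pair.
# In the second block the two populations are identical:
# the f2 estimate is 0, while the FST estimate is 0/0 = NaN.
per_block_f2 = [0.020, 0.000, 0.030, 0.025]
per_block_fst = [0.10, math.nan, 0.14, 0.12]

# The same code accepts both inputs; only the FST input carries missing data.
f2_est, f2_se = jackknife_mean_se(per_block_f2)
fst_est, fst_se = jackknife_mean_se(per_block_fst)
```

The f2 input yields an ordinary estimate and standard error, whereas the single NaN block propagates into the FST summary unless that block is removed first.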

Another thing I noticed is that you use the option apply_corr = FALSE. This option only affects f2-statistics, not FST (where this correction is always applied), but I don't know if there are any cases where you want to not apply the correction factor. Without it, your estimates might be biased upwards. I just included this option to make debugging easier for myself!
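To illustrate the upward bias, here is a small sketch that computes the exact expectation of an uncorrected and a corrected f2 estimator under binomial sampling. It assumes the correction is the usual sample-size term p(1-p)/(n-1) per population, with n the number of sampled chromosomes; whether apply_corr corresponds exactly to this formula is an assumption, and the function names are hypothetical.

```python
from math import comb

def f2_naive(p1, n1, p2, n2):
    """Squared difference of sample allele frequencies, no correction."""
    return (p1 - p2) ** 2

def f2_corrected(p1, n1, p2, n2):
    """Same, minus a sample-size term per population (assumed correction)."""
    return (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def expected(estimator, p1, n1, p2, n2):
    """Exact expectation of an estimator over all possible allele counts."""
    total = 0.0
    for k1 in range(n1 + 1):
        for k2 in range(n2 + 1):
            weight = binom_pmf(k1, n1, p1) * binom_pmf(k2, n2, p2)
            total += weight * estimator(k1 / n1, n1, k2 / n2, n2)
    return total

# True allele frequencies 0.2 and 0.6, so the true f2 is (0.2 - 0.6)^2 = 0.16.
# Ten chromosomes are sampled from each population.
true_f2 = (0.2 - 0.6) ** 2
naive_mean = expected(f2_naive, 0.2, 10, 0.6, 10)
corrected_mean = expected(f2_corrected, 0.2, 10, 0.6, 10)
```

In this setup the uncorrected estimator averages 0.16 + 0.016 + 0.024 = 0.20, while the corrected one averages the true value of 0.16. The gap shrinks as sample sizes grow.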

Hope this makes a bit more sense now, and sorry that the documentation was misleading here!
