running find_graphs with the initgraph parameter about admixtools HOT 3 OPEN

fangdm commented on July 28, 2024

running find_graphs with the initgraph parameter

from admixtools.

Comments (3)

uqrmaie1 commented on July 28, 2024

many thanks for developing and maintaining this software

Happy to take credit for developing it

I have noticed significant score differences between repeated runs of find_graphs

To get identical results for repeated runs of find_graphs, you can run set.seed() just before find_graphs()

it appears that some results with lower scores align better with my expectations

Lower scores correspond to graphs with better fits (the scores represent log likelihoods and should be negative, but the minus sign is omitted, so lower is better)

I recently discovered the initgraph parameter

This is how the initgraph argument can be used:

run1 = find_graphs(example_f2_blocks)
bestgraph = run1 %>% slice_min(score) %>% pluck('graph', 1)
run2 = find_graphs(example_f2_blocks, initgraph = bestgraph)

The graph should be passed as an R igraph object, and the initgraph argument has to be named (the documentation has an outdated example where initgraph is a positional argument).

Functions for reading/writing graphs from/to disk are described here.

... align better with my expectations
... which I would like to use when adding additional populations to the previous best graph result

When graph fitting depends a lot on prior expectations, and when it is used in this kind of supervised, stepwise manner, it's very easy to overfit, and it becomes hard to tell whether any conclusions can be drawn from the fitting models.
I realize that not constraining the models or using prior expectations tends to lead to useless results, but often this just means that there is not enough information in the data to draw any strong conclusions.

from admixtools.

fangdm commented on July 28, 2024

Thank you very much.

I had a question: based on the following command line, we did add some new groups in add_data, why don't the newly added groups in add_data appear in the new graph? Should the individuals/groups in the initgraph object (referred to as bestgraph) had to be consistent with add_data?

run2 = find_graphs(add_data, initgraph = bestgraph)

In our project, we had a lot of groups with complex relations. To address this, we propose a gradual augmentation of the previous bestgraph by incorporating new groups, and determined the final graph, whether this approach is feasible?

Thanks again.

from admixtools.

uqrmaie1 commented on July 28, 2024

Yes, the groups in initgraph should have the same groups as the groups in add_data. I think it's a good idea to let find_graphs() add additional groups to the initgraph model, but it's not set up in that way, and I probably won't have time to make that change.

What you could do instead is to add a new group to all possible positions in bestgraph using graph_addleaf() and evaluate all resulting graphs:

bestgraph %>%
  graph_addleaf('newgroup') %>%
  rowwise %>%
  mutate(score = qpgraph(add_data, graph)$score) %>%
  ungroup

To address this, we propose a gradual augmentation of the previous bestgraph by incorporating new groups, and determined the final graph, whether this approach is feasible?

It's possible in principle, but there are things to look out for:

The order in which the groups are added can have a big impact on the final best topologies, so it might be a good idea to repeat the analysis while changing the order in which the groups are added.
It's very easy to overfit. One simple way to guard against it is to only use even chromosomes for the whole analysis. Then at the very end, repeat the analysis with only odd chromosomes. If the results for even and odd chromosomes are different, it's probably due to overfitting. This is overly conservative because it only uses half the data each time. On the other hand, if the results are the same or very similar for even and odd chromosomes, it's strong evidence that the resulting models are not overfitted.

In my experience, it is rarely possible to fit models with more than 6 populations and more than 2 admixture events without some degree of overfitting.

from admixtools.

running find_graphs with the initgraph parameter about admixtools HOT 3 OPEN

Comments (3)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent