uqrmaie1 commented on July 28, 2024

many thanks for developing and maintaining this software

Happy to take credit for developing it

I have noticed significant score differences between repeated runs of find_graphs

To get identical results for repeated runs of find_graphs, you can run set.seed() just before find_graphs()
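A minimal sketch of that advice, assuming the admixtools and dplyr packages are installed and using the bundled example_f2_blocks data. The seed value 42 and the names run_a and run_b are arbitrary choices for illustration:

```r
library(admixtools)
library(dplyr)

# Fix the random seed right before the first run (42 is an arbitrary value)
set.seed(42)
run_a = find_graphs(example_f2_blocks)

# Reset to the same seed right before the second run
set.seed(42)
run_b = find_graphs(example_f2_blocks)

# Compare the best (lowest) score of each run; with the same seed
# the two runs are expected to agree
all.equal(min(run_a$score), min(run_b$score))
```

Running with several different seeds and keeping the lowest-scoring graph is one way to see how much the search varies between runs.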

it appears that some results with lower scores align better with my expectations

Lower scores correspond to graphs with better fits (the scores represent log likelihoods and should be negative, but the minus sign is omitted, so lower is better)

I recently discovered the initgraph parameter

This is how the initgraph argument can be used:

run1 = find_graphs(example_f2_blocks)
bestgraph = run1 %>% slice_min(score) %>% pluck('graph', 1)
run2 = find_graphs(example_f2_blocks, initgraph = bestgraph)

The graph should be passed as an R igraph object, and the initgraph argument has to be named (the documentation has an outdated example where initgraph is a positional argument).

Functions for reading/writing graphs from/to disk are described here.

... align better with my expectations
... which I would like to use when adding additional populations to the previous best graph result

When graph fitting depends a lot on prior expectations, and when it is used in this kind of supervised, stepwise manner, it's very easy to overfit, and it becomes hard to tell whether any conclusions can be drawn from the fitting models.
I realize that not constraining the models or using prior expectations tends to lead to useless results, but often this just means that there is not enough information in the data to draw any strong conclusions.

from admixtools.

fangdm commented on July 28, 2024

Thank you very much.

I have a question about the command below. We added some new groups to add_data, so why don't the newly added groups appear in the new graph? Do the individuals/groups in the initgraph object (bestgraph) have to be consistent with those in add_data?

run2 = find_graphs(add_data, initgraph = bestgraph)

In our project, we have many groups with complex relationships. To address this, we propose gradually augmenting the previous bestgraph by incorporating new groups, and then determining the final graph. Is this approach feasible?

Thanks again.


uqrmaie1 commented on July 28, 2024

Yes, initgraph should contain the same groups as add_data. I think it's a good idea to let find_graphs() add additional groups to the initgraph model, but it's not set up that way, and I probably won't have time to make that change.

What you could do instead is to add a new group to all possible positions in bestgraph using graph_addleaf() and evaluate all resulting graphs:

# Attach 'newgroup' at every possible position in bestgraph, then score each candidate
bestgraph %>%
  graph_addleaf('newgroup') %>%
  rowwise() %>%
  mutate(score = qpgraph(add_data, graph)$score) %>%
  ungroup()

To address this, we propose gradually augmenting the previous bestgraph by incorporating new groups, and then determining the final graph. Is this approach feasible?

It's possible in principle, but there are things to look out for:

  1. The order in which the groups are added can have a big impact on the final best topologies, so it might be a good idea to repeat the analysis while changing the order in which the groups are added.

  2. It's very easy to overfit. One simple way to guard against it is to only use even chromosomes for the whole analysis. Then at the very end, repeat the analysis with only odd chromosomes. If the results for even and odd chromosomes are different, it's probably due to overfitting. This is overly conservative because it only uses half the data each time. On the other hand, if the results are the same or very similar for even and odd chromosomes, it's strong evidence that the resulting models are not overfitted.
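The even/odd split in point 2 could start from something like the sketch below. The snps table, its column names, and the SNP IDs are hypothetical stand-ins for a real SNP annotation file, and how the two ID sets are then passed into the f2 computation is not shown here:

```r
# Hypothetical SNP annotation table (made-up IDs and chromosomes);
# in practice this would be read from the data set's SNP file
snps = data.frame(
  id  = c('snp1', 'snp2', 'snp3', 'snp4', 'snp5', 'snp6'),
  chr = c(1, 2, 3, 4, 21, 22),
  stringsAsFactors = FALSE
)

# Discovery set: SNPs on even chromosomes, used for the whole stepwise analysis
even_ids = snps$id[snps$chr %% 2 == 0]

# Validation set: SNPs on odd chromosomes, used only for the final repeat
odd_ids = snps$id[snps$chr %% 2 == 1]
```

Keeping the two sets disjoint from the start means the odd-chromosome repeat is an independent check rather than a refit on data that already shaped the model.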

In my experience, it is rarely possible to fit models with more than 6 populations and more than 2 admixture events without some degree of overfitting.

