Comments (3)
many thanks for developing and maintaining this software
Happy to take credit for developing it
I have noticed significant score differences between repeated runs of find_graphs
To get identical results for repeated runs of find_graphs, you can run set.seed() just before find_graphs().
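A minimal sketch of that seeding pattern, assuming the example_f2_blocks data that ships with the package; the seed value 1234 is arbitrary, and the comparison at the end is only an illustrative check:

```r
library(admixtools)

set.seed(1234)  # arbitrary seed; any fixed integer works
run_a = find_graphs(example_f2_blocks)

set.seed(1234)  # reset to the same seed right before the second call
run_b = find_graphs(example_f2_blocks)

# If seeding works as described, both runs should report the same best score
same_best = isTRUE(all.equal(min(run_a$score), min(run_b$score)))
same_best
```

Without the second set.seed() call, the random number stream continues from where the first run left off, so the two runs would generally explore different graphs.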
it appears that some results with lower scores align better with my expectations
Lower scores correspond to graphs with better fits (the scores represent log likelihoods and should be negative, but the minus sign is omitted, so lower is better)
I recently discovered the initgraph parameter
This is how the initgraph argument can be used:
library(admixtools)
library(tidyverse)  # provides %>%, slice_min() and pluck()
run1 = find_graphs(example_f2_blocks)
bestgraph = run1 %>% slice_min(score) %>% pluck('graph', 1)
run2 = find_graphs(example_f2_blocks, initgraph = bestgraph)
The graph should be passed as an R igraph object, and the initgraph argument has to be named (the documentation has an outdated example where initgraph is a positional argument).
Functions for reading/writing graphs from/to disk are described here.
... align better with my expectations
... which I would like to use when adding additional populations to the previous best graph result
When graph fitting depends a lot on prior expectations, and when it is used in this kind of supervised, stepwise manner, it's very easy to overfit, and it becomes hard to tell whether any conclusions can be drawn from the fitted models.
I realize that not constraining the models or using prior expectations tends to lead to useless results, but often this just means that there is not enough information in the data to draw any strong conclusions.
from admixtools.
Thank you very much.
I have a question about the following command: we added some new groups in add_data, so why don't the newly added groups appear in the new graph? Do the individuals/groups in the initgraph object (referred to as bestgraph) have to be consistent with add_data?
run2 = find_graphs(add_data, initgraph = bestgraph)
In our project, we have a lot of groups with complex relations. To address this, we propose gradually augmenting the previous bestgraph by incorporating new groups, and then determining the final graph. Is this approach feasible?
Thanks again.
from admixtools.
Yes, the graph passed as initgraph should contain the same groups as add_data. I think it's a good idea to let find_graphs() add additional groups to the initgraph model, but it's not set up in that way, and I probably won't have time to make that change.
What you could do instead is to add a new group to all possible positions in bestgraph using graph_addleaf() and evaluate all resulting graphs:
bestgraph %>%
graph_addleaf('newgroup') %>%
rowwise %>%
mutate(score = qpgraph(add_data, graph)$score) %>%
ungroup
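To carry on from that table, one option is to keep only the best-scoring candidate. A minimal sketch, assuming graph_addleaf() returns a data frame with a graph column as in the snippet above; add_best_leaf is a hypothetical helper name, not part of admixtools:

```r
library(admixtools)
library(dplyr)
library(purrr)

# Hypothetical helper (not part of admixtools): try a new group at every
# position in `graph`, fit each candidate, and return the best-fitting graph.
add_best_leaf = function(graph, f2_data, newgroup) {
  graph %>%
    graph_addleaf(newgroup) %>%
    rowwise() %>%
    mutate(score = qpgraph(f2_data, graph)$score) %>%
    ungroup() %>%
    slice_min(score, with_ties = FALSE) %>%  # lower score = better fit
    pluck('graph', 1)
}

# Usage, with the object names from this thread:
# nextgraph = add_best_leaf(bestgraph, add_data, 'newgroup')
```

Keeping only the single best candidate is a greedy choice; looking at several of the top-scoring rows before committing may be safer when the scores are close.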
To address this, we propose gradually augmenting the previous bestgraph by incorporating new groups, and then determining the final graph. Is this approach feasible?
It's possible in principle, but there are things to look out for:
- The order in which the groups are added can have a big impact on the final best topologies, so it might be a good idea to repeat the analysis while changing the order in which the groups are added.
- It's very easy to overfit. One simple way to guard against it is to only use even chromosomes for the whole analysis. Then at the very end, repeat the analysis with only odd chromosomes. If the results for even and odd chromosomes are different, it's probably due to overfitting. This is overly conservative because it only uses half the data each time. On the other hand, if the results are the same or very similar for even and odd chromosomes, it's strong evidence that the resulting models are not overfitted.
In my experience, it is rarely possible to fit models with more than 6 populations and more than 2 admixture events without some degree of overfitting.
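The even/odd chromosome split suggested above can be prepared from the SNP table. A minimal base R sketch, assuming a PLINK-style .bim table with numeric chromosome labels in one column and SNP IDs in another; the toy table is made up for illustration, and how the two SNP lists are passed on to the f2 computation is not shown here:

```r
# Toy stand-in for a .bim table (chromosome, SNP ID); the values are made up.
bim = data.frame(
  chr = c(1, 1, 2, 2, 3, 4, 5, 6),
  snp = paste0('rs', 1:8),
  stringsAsFactors = FALSE
)

# SNPs on even chromosomes: use these for the whole model search.
even_snps = bim$snp[bim$chr %% 2 == 0]

# SNPs on odd chromosomes: hold these back and repeat the analysis at the end.
odd_snps = bim$snp[bim$chr %% 2 == 1]
```

The two sets are disjoint by construction, so agreement between the two analyses is not driven by shared SNPs.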
from admixtools.