The problem TreeTime has currently a very rudimentary way of resol

As <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

We have time number of mutations that happened on bra

Thanks, <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-

Improve polytomy resolution about treetime HOT 8 OPEN

neherlab commented on July 26, 2024

Improve polytomy resolution

from treetime.

Comments (8)

matsen commented on July 26, 2024

My first thought was to throw geographic information in a second data partition. IQTree allows this: http://www.iqtree.org/doc/Complex-Models#partition-models . There aren't currently any completely custom models, but you can supply a custom amino acid transition matrix, and so as a proof of concept we could use 20 states.

If it gives reasonable results, I very much suspect that the IQTree folks would accept a PR to implement more general custom models, given the importance of your application.

@afmagee points out, correctly, that this has an inherent difference to the types of discrete-trait migration models we are using here, versus those that BEAST does. Namely, in BEAST they are using a clock proportional to calendar time on the time tree, whereas here migration would happen in units of mutation-time.

My response to that issue is that you're re-doing all the branch lengths in TreeTime anyway, and this would just be a way to get a branch for cases where we know things should cluster together.

We had a short meeting with @trvrb about this. He thought it was not-crazy, but I'd like to hear your thoughts.
I acknowledge that this doesn't solve your larger problem of polytomy resolution, but we're thinking about that too.

from treetime.

afmagee commented on July 26, 2024

As @matsen pointed out to me, one possible way to resolve polytomies is to simply run neighbor joining. I came up with a back of the envelope calculation that might allow one to combine neighbor joining distances between two different data sources, such as DNA and geography, and then tried it out on some simulated datasets.

The attached PDF has some more details on the experiments. I think the key takeaways are:

Plain NJ might be able to resolve polytomies.
Combining nucleotide and geography data at the level of unrooted trees could work (that is, the suggestion to add geography at the IQTree level could work despite my previous objections).

One could also consider using bootstrapping to generate a set of possible resolutions to the polytomy and considering those as well.

unrooted_trees_with_nucleotides_and_mugration.pdf

from treetime.

rneher commented on July 26, 2024

The main puzzle I always faced here is how to combine temporal and geographical constraints. With genetic constraints, we assume a strict hierarchy (genetics first, temporal constraints affect topology only in absence of mutations). But geography and time don't have a clear hierarchy as far as I can tell. So somehow I think we need an objective function that listens to time and geo constraints...

from treetime.

afmagee commented on July 26, 2024

@matsen and I were thinking about geography as data in the same way that the genetic sequences are data. In a full BEAST analysis, with nucleotide data and a geographic character, and not using a structured coalescent model, the geographic character is treated another site with its own substitution model and clock rate. The clock rate helps determine its weight in the likelihood against the nucleotide sites.

If geography incorporated in the initial tree inference, before running TreeTime, then it might reduce the number/size of polytomies that exist. Otherwise, it would have to be incorporated later, probably at the polytomy resolution stage.

If geography evolves enough faster than the nucleotides, then given both a single mutation and a single change in geography, the mutation should be weighted more highly by the likelihood to resolve a relationship. In this case, using geography as data only the polytomy-breaking stage, to favor some relationships over others, might be close to equivalent to having used it to infer the tree as well as during a TreeTime analysis. I think in general the rate of geographic change (the mugration rate) should be higher, so this might be viable as an approximation.

from treetime.

matsen commented on July 26, 2024

We have

time
number of mutations that happened on branches (these are, practically-speaking, terminal mutations)
sampling location

from treetime.

afmagee commented on July 26, 2024

I think that there may be a relatively fast way to pull out a subtree from a polytomy, and in doing so increase the likelihood. This is a partial solution to the overall problem, but it may be a useful starting point.

I'm imagining that we have a polytomy somewhere in the tree with n children. If we can identify that m of these children are identical to the parent node (which is an O(n) comparison), then what we essentially want to do is minimize the sum of the branch lengths involving these nodes. The likelihood for a single branch with identical mother and daughter sequences is maximized when the branch is length 0, and should decrease monotonically. There are some number of identical sequences with currently non-zero branch lengths between them, so if we shorten those branches we should have a better likelihood.

I think that an agglomerative approach would work to make a new subtree for the m identical sequences. One would need the time t_i of each node. The algorithm proceeds as follows, where node p is the parent of the polytomy, time increases backwards from the present, and l_i is the length of the branch subtending node i:

Initialize a set of active nodes with the m nodes.

while ( size(active) > 1 ) { // this would be if( size(active) > 2 ) if m == n
    choose the pair of nodes i,j with the most similar ages
    define d = t_p - max(t_i,t_j)
    remove i,j as children of p
    add a new node k that is a child of p with branch length l_p = d
    give node k children i,j with branch lengths l_i - d and l_k - d
    remove i,j from the active set
    add k to the active set
}

Each iteration of the algorithm reduces the sum of branch lengths with no mutations (by an amount d), and in doing so should increase the likelihood. When done, it is possible that one would need to account for the possible placements of the n-m non-identical nodes into the subtree. I don't know that there is a guarantee that they would not best be placed somewhere in the new subtree.

Here's a cartoon of the process with m=4 and n=6, where the branches with tick-marks indicate that the child node is not identical to the parent (there are mutations) and the bare branches indicate that the child node is identical (there are no notations).

from treetime.

matsen commented on July 26, 2024

Thanks, @afmagee ! I might also suggest two things:

this could merely supply a ranking of pairs of nodes to try, and how far down that list would be up to TreeTime
such a list could also incorporate geography in a very simple-minded way, just biasing us towards parsimonious geographic transitions in the absence of any signal to the contrary.

from treetime.

rneher commented on July 26, 2024

Thanks! I am just trying to think under what conditions we might actually want to group the nodes with >0 mutations with some that have none. But the case of subset of identical nodes is definitely a super useful one to explore.

from treetime.

Improve polytomy resolution about treetime HOT 8 OPEN

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent