Hello, thanks for the excellent software! This is a request for a feature rather than

`Chromosome painting' with tree sequences about tskit HOT 11 CLOSED

tskit-dev commented on August 12, 2024

`Chromosome painting' with tree sequences

from tskit.

Comments (11)

gtsambos commented on August 12, 2024

Hi @jeromekelleher and @gdurif - thanks for your comments and thoughtful engagement with this! I am about to fly to New Zealand and will be computer-less for a few days, but will engage with this properly next week when my brain is back in work mode.

Just quickly though:

> Thanks for the feature request @gtsambos --- we're actually thinking about something pretty similar at the moment: we want to compute, for every node in the tree sequence, the mean fraction of the samples descending from that node that come from each population. So, suppose we had a node in a tree with 100 samples from p1 and 100 samples from p2 below it (and we only have these two populations, p1 and p2). Then the fractions would be 0.5 and 0.5.
>
> Is this similar to what you're thinking about?

I was thinking about this at the more de-aggregated level of individual leaf nodes, like @gdurif is describing. Say the two admixing populations are labelled X and Y, and you're using msprime to generate sequences from admixed descendants of these populations. Then for any specific tree in a TreeSequence, every leaf node should descend from exactly one of those populations, and thus have ancestry with that population. Say your TreeSequence has 3 trees spanning the intervals [0,1], [1,2] and [2,3]; then a sample might have ancestry X on [0,1], Y on [1,2] and X on [2,3] for example.
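To make the example concrete, a sample's painting can be pictured as a list of (left, right, population) segments, one per tree, with neighbouring segments merged whenever the label does not change. The sketch below is illustrative only: `merge_segments` and the tuple layout are assumptions made for this example, not part of tskit.

```python
def merge_segments(segments):
    """Merge adjacent (left, right, label) segments that share a label.

    Assumes the segments are sorted by position and do not overlap.
    """
    merged = []
    for left, right, label in segments:
        if merged and merged[-1][2] == label and merged[-1][1] == left:
            # Same ancestry continues across the breakpoint: extend the segment.
            merged[-1] = (merged[-1][0], right, label)
        else:
            merged.append((left, right, label))
    return merged


# The example from the text: ancestry X, then Y, then X across three trees.
painting = merge_segments([(0, 1, "X"), (1, 2, "Y"), (2, 3, "X")])
```

Here nothing is merged, because the label changes at every breakpoint. Had the first two trees both carried ancestry X, they would collapse into a single segment from 0 to 2.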

Will write more when I'm back from holidays!


gtsambos commented on August 12, 2024

Blast from the past! 🕰


jeromekelleher commented on August 12, 2024

Thanks for the feature request @gtsambos --- we're actually thinking about something pretty similar at the moment: we want to compute, for every node in the tree sequence, the mean fraction of the samples descending from that node that come from each population. So, suppose we had a node in a tree with 100 samples from p1 and 100 samples from p2 below it (and we only have these two populations, p1 and p2). Then the fractions would be 0.5 and 0.5.

Is this similar to what you're thinking about?
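For a single tree, the per-node fractions described above could be sketched roughly as follows. This is only an illustration: `population_fractions` is a hypothetical helper, `tree` is assumed to behave like a tskit Tree (`tree.samples(node)` and `tree.population(u)`), and the averaging across trees that the "mean" implies is left out.

```python
from collections import Counter


def population_fractions(tree, node):
    """Fraction of the samples below `node` that come from each population.

    `tree` is assumed to behave like a tskit Tree: `tree.samples(node)` yields
    the sample nodes below `node`, and `tree.population(u)` gives a node's
    population ID.
    """
    counts = Counter(tree.population(s) for s in tree.samples(node))
    total = sum(counts.values())
    if total == 0:
        return {}  # no samples below this node
    return {pop: n / total for pop, n in counts.items()}
```

With 100 samples from each of two populations below the node, each population gets a fraction of 0.5, matching the example above.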


gdurif commented on August 12, 2024

Hi, and thanks too for your great software. I have the same request: it would be very useful to be able to get the population of origin for each SNP in a genotype. For instance, given an admixture between two populations, we would like to know for each locus whether it was inherited from the first or the second population.

In the meantime, do you think it would be possible to get this local ancestry information manually, by combining the breakpoints introduced by recombination with an upward traversal of the tree structure?

Thanks again


jeromekelleher commented on August 12, 2024

Good to know, thanks @gdurif. We thought about this sort of thing a bit for the book chapter on msprime we wrote this year. There's a notebook here that explores it a little bit. Would you mind looking through this and seeing if it answers your questions please?


gdurif commented on August 12, 2024

@jeromekelleher thanks for this example. I am still working through it, but I am hopeful that it will answer this question.


gdurif commented on August 12, 2024

I think I was able to solve this problem. @gtsambos I hope it helps; let me know whether it corresponds to what you had in mind.

For instance, let's say we have a simulated tree sequence ts with mutations, recombination and an admixture event. We can get the recombination breakpoints with:

import numpy as np
breakpoints = np.array(list(ts.breakpoints()))

We can get mutation sites with:

mutations = [variant.site.position for variant in ts.variants()]

Then, for each mutation site within a recombination chunk (i.e. between two breakpoints) and for an admixed individual sampled at time 0, we just have to climb the corresponding tree until a given time (just before the admixture happens) to get the population ancestry of the SNP:

import tskit

u0 = 0  # sample node of the admixed individual
adm_time = 50  # admixture happened 50 generations ago
tree = ts.first()  # tree for the first recombination chunk
u = u0  # start at u0 and climb towards the root
# Stop at the first ancestor at least as old as the admixture, or at the
# root if the tree coalesces earlier (tree.parent returns tskit.NULL there).
while tree.time(u) < adm_time and tree.parent(u) != tskit.NULL:
    u = tree.parent(u)
print("ancestral pop = {}".format(tree.population(u)))

So, repeating this for each mutation (and thus each recombination chunk) and each individual gives the "chromosome painting".
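Put together, the per-chunk repetition might look like the sketch below. It is an illustration under assumptions, not a tested recipe: `ancestral_population` and `paint_sample` are hypothetical names, `ts` is assumed to behave like a tskit TreeSequence, and each segment is labelled with the population of the first ancestor at least as old as the admixture, which may not be the right definition for every demographic model.

```python
NULL = -1  # same value as tskit.NULL (marks "no parent")


def ancestral_population(tree, sample, adm_time):
    """Climb from `sample` to its first ancestor at least `adm_time` old."""
    u = sample
    while tree.time(u) < adm_time:
        parent = tree.parent(u)
        if parent == NULL:
            return None  # reached the root before the admixture time
        u = parent
    return tree.population(u)


def paint_sample(ts, sample, adm_time):
    """Return one (left, right, population) tuple per tree for `sample`."""
    painting = []
    for tree in ts.trees():
        left, right = tree.interval
        # Extract plain values inside the loop; do not keep `tree` itself.
        population = ancestral_population(tree, sample, adm_time)
        painting.append((left, right, population))
    return painting
```

Calling `paint_sample(ts, 0, 50)` would then give the painting of sample node 0 for an admixture 50 generations ago, with `None` marking any chunk whose tree coalesces more recently than that.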

@jeromekelleher, thanks again for your help. I was just wondering whether all individuals share the same recombination breakpoints, or whether it is possible to know when recombinations happen?


jeromekelleher commented on August 12, 2024

Great, I'm glad this helped @gdurif. I haven't thought about your solution here and what it does, but there's one important gotcha in there I want to point out:

trees = [tree for tree in ts.trees()] # list of tree

This doesn't work, see the warning here for why.
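The reason, as the tskit documentation for `ts.trees()` explains, is that the iterator reuses one underlying Tree object for every step, so a list built from it ends up holding many references to the same object. A minimal sketch of the safe pattern is to copy plain values out inside the loop; `tree_intervals` is a hypothetical helper used only for illustration. When independent Tree objects are really needed, `ts.aslist()` returns copies, at a higher memory cost.

```python
def tree_intervals(ts):
    """Collect (left, right) for every tree without storing Tree objects."""
    intervals = []
    for tree in ts.trees():
        left, right = tree.interval
        intervals.append((left, right))  # plain numbers, safe to keep
    return intervals
```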

I think we've probably addressed some of your questions in the book chapter @gdurif, and it's probably worth going through it. If you send me an email ([email protected]) I'll send you the PDF.


gdurif commented on August 12, 2024

I'll send you an e-mail about the book chapter. Regarding your comment: I have edited my post. I missed this point in the docs, thanks for the reminder.


gtsambos commented on August 12, 2024

Hi all, sorry for the late response.

@gdurif: I think your idea will work, but it might potentially be quite slow because you are extracting the ancestral node for each sample individual (and each site/mutation?) separately. There will be a lot of repeated information for consecutive sites on the same 'edge' of the tree, and for closely related sample nodes.

I think it would be more efficient to make use of the edge information captured in TreeSequences to do this. I've put a sketch of an idea of how to do this here. As mentioned in #616, I'm going to keep working on this and can let you know when I have something in a more share-able state!
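Short of a full edge-based approach, part of the repeated work can already be avoided by grouping sites by the tree they fall in, so that the ancestor lookup happens once per tree rather than once per site. The sketch below is not the edge-based idea itself, only a simpler step in the same direction; `group_sites_by_tree` is a hypothetical helper, and it assumes `breakpoints` is sorted and runs from 0 to the sequence length, as `ts.breakpoints()` returns it.

```python
from bisect import bisect_right
from collections import defaultdict


def group_sites_by_tree(breakpoints, positions):
    """Map tree index -> site positions that fall in that tree's interval.

    `breakpoints` is assumed sorted, starting at 0 and ending at the sequence
    length, so tree i covers the half-open interval
    [breakpoints[i], breakpoints[i + 1]).
    """
    groups = defaultdict(list)
    for pos in positions:
        index = bisect_right(breakpoints, pos) - 1
        groups[index].append(pos)
    return dict(groups)
```

All sites in one group share the same tree, so the ancestral population found for a sample in that tree applies to every one of them.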

Copying my supervisor @dvukcevic into this convo too.


jeromekelleher commented on August 12, 2024

I'm going to close this as I think it's been covered by more recent developments. Feel free to reopen if you disagree.
