
Comments (11)

gtsambos commented on August 12, 2024

Hi @jeromekelleher and @gdurif - thanks for your comments and thoughtful engagement with this! I am about to fly to New Zealand and will be computer-less for a few days, but will engage with this properly next week when my brain is back in work mode.

Just quickly though:

> Thanks for the feature request @gtsambos. We're actually thinking about something pretty similar at the moment, where we want to compute, for every node in the tree sequence, the mean fraction of samples descending from that node that are from each population. So, suppose we had a node in a tree that had 100 samples from p1 below it and 100 samples from p2 below it (and we only have two populations, p1 and p2). Then the fractions would be 0.5 and 0.5.
>
> Is this similar to what you're thinking about?

I was thinking about this at the more de-aggregated level of individual leaf nodes, like @gdurif is describing. Say the two admixing populations are labelled X and Y, and you're using msprime to generate sequences from admixed descendants of these populations. Then for any specific tree in a TreeSequence, every leaf node should descend from exactly one of those populations, and thus have ancestry from that population on that tree's interval. Say your TreeSequence has 3 trees spanning the intervals [0,1], [1,2] and [2,3]; then a sample might have ancestry X on [0,1], Y on [1,2] and X on [2,3], for example.
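As a rough illustration of the output described above, a sample's local ancestry can be stored as a sorted list of segments and queried by position. This is only a sketch: the `(left, right, population)` tuple layout and the `ancestry_at` helper are hypothetical, not part of tskit or msprime, and the segments are treated as half-open intervals so that each breakpoint belongs to exactly one segment.

```python
from bisect import bisect_right

# Hypothetical representation: one sample's local ancestry as sorted,
# non-overlapping half-open segments (left, right, population).
segments = [(0.0, 1.0, "X"), (1.0, 2.0, "Y"), (2.0, 3.0, "X")]


def ancestry_at(segments, position):
    """Return the population label of the segment containing `position`."""
    lefts = [left for left, _, _ in segments]
    # Index of the last segment whose left end is <= position.
    i = bisect_right(lefts, position) - 1
    if i < 0:
        raise ValueError("position is left of the first segment")
    left, right, pop = segments[i]
    if not (left <= position < right):
        raise ValueError("position is not covered by any segment")
    return pop


print(ancestry_at(segments, 0.5))  # X
print(ancestry_at(segments, 1.0))  # Y
```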

Will write more when I'm back from holidays!

from tskit.

gtsambos commented on August 12, 2024

Blast from the past! 🕰

jeromekelleher commented on August 12, 2024

Thanks for the feature request @gtsambos. We're actually thinking about something pretty similar at the moment, where we want to compute, for every node in the tree sequence, the mean fraction of samples descending from that node that are from each population. So, suppose we had a node in a tree that had 100 samples from p1 below it and 100 samples from p2 below it (and we only have two populations, p1 and p2). Then the fractions would be 0.5 and 0.5.

Is this similar to what you're thinking about?
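A minimal sketch of the per-node fractions for a single tree might look like the following. It uses a plain child-to-parent dictionary as a toy stand-in for a tree, not the tskit API, and the node IDs, population labels and the `population_fractions` helper are all made up for illustration. Averaging over the trees of a tree sequence, as described above, is not shown.

```python
from collections import Counter

# Toy tree, not a tskit object: child -> parent, with node 6 as the root.
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
# Hypothetical population label for each sample (leaf) node.
sample_population = {0: "p1", 1: "p1", 2: "p1", 3: "p2"}


def population_fractions(parent, sample_population):
    """Map each node to {population: fraction of the samples below it}."""
    counts = {}
    for sample, pop in sample_population.items():
        u = sample
        # Credit this sample to itself and to every ancestor up to the root.
        while u is not None:
            counts.setdefault(u, Counter())[pop] += 1
            u = parent.get(u)
    return {
        u: {pop: n / sum(c.values()) for pop, n in c.items()}
        for u, c in counts.items()
    }


fractions = population_fractions(parent, sample_population)
print(fractions[5])  # {'p1': 0.5, 'p2': 0.5}
print(fractions[6])  # {'p1': 0.75, 'p2': 0.25}
```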

gdurif commented on August 12, 2024

Hi, and thanks for your great software. I have the same request: it would be very useful to be able to get the population of origin of each SNP in a genotype. For instance, given an admixture between two populations, we would like to know, for each locus, whether it was inherited from the first or the second population.

In the meantime, do you think it would be possible to get this local ancestry information manually, by combining the breakpoints introduced by recombination with an upward traversal of the tree structure?

Thanks again

jeromekelleher commented on August 12, 2024

Good to know, thanks @gdurif. We thought about this sort of thing a bit for the book chapter on msprime we wrote this year. There's a notebook here that explores it a little bit. Would you mind looking through this and seeing if it answers your questions please?

gdurif commented on August 12, 2024

@jeromekelleher thanks for this example. I am still working through it, but I am hopeful that it will answer my question.

gdurif commented on August 12, 2024

I think I was able to solve this problem. @gtsambos I hope it will help you, tell me if you think it corresponds to what you had in mind.

For instance, let's say we have a simulated tree sequence ts with mutations, recombination and an admixture event. We can get the recombination breakpoints with:

import numpy as np
breakpoints = np.array(list(ts.breakpoints()))

We can get mutation sites with:

mutations = [variant.site.position for variant in ts.variants()]

Then, for each mutation site within a recombination chunk (i.e. between two breakpoints) and for an admixed individual at time 0, we just have to climb the corresponding tree until a given time (just before the admixture happens, looking backwards) to get the ancestral population of the SNP:

u0 = 0  # admixed sample node
adm_time = 50  # admixture happened 50 generations ago
tree = ts.first()  # tree for the first recombination chunk
u = u0  # starting from u0, climb the tree
# stop at the first ancestor at least as old as the admixture, or at the root
# (tree.parent returns -1, i.e. tskit.NULL, for a root)
while tree.time(u) < adm_time and tree.parent(u) != -1:
    u = tree.parent(u)
print("ancestral pop = {}".format(tree.population(u)))

So for each mutation (and thus each recombination chunk) and each individual, we just have to repeat this to get the "chromosome painting".
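Repeating the climb for every tree and every sample could be sketched as below. To keep the example self-contained, it runs on toy stand-ins (an interval plus a child-to-parent dictionary per tree) rather than on a real tree sequence; the `paint` helper, node IDs, times and population labels are all hypothetical. With tskit, the outer loop would presumably iterate over `ts.trees()` and use `tree.parent`, `tree.time` and `tree.population` as in the snippet above.

```python
# Toy stand-ins for the marginal trees of a tree sequence (not tskit objects):
# each has a genomic interval and a child -> parent map.
trees = [
    {"interval": (0, 10), "parent": {0: 2, 1: 3, 2: 4, 3: 4}},
    {"interval": (10, 25), "parent": {0: 3, 1: 3, 2: 4, 3: 4}},
]
# Hypothetical node times (generations ago) and population labels.
node_time = {0: 0, 1: 0, 2: 60, 3: 70, 4: 200}
node_population = {0: "ADMIX", 1: "ADMIX", 2: "X", 3: "Y", 4: "ANC"}


def paint(trees, samples, node_time, node_population, adm_time):
    """Return {sample: [(left, right, population), ...]}, one entry per tree."""
    painting = {u: [] for u in samples}
    for tree in trees:
        left, right = tree["interval"]
        for u0 in samples:
            u = u0
            # Climb until the first ancestor at least as old as the admixture,
            # stopping at the root if no such ancestor exists.
            while node_time[u] < adm_time and u in tree["parent"]:
                u = tree["parent"][u]
            painting[u0].append((left, right, node_population[u]))
    return painting


painting = paint(trees, [0, 1], node_time, node_population, adm_time=50)
print(painting[0])  # [(0, 10, 'X'), (10, 25, 'Y')]
print(painting[1])  # [(0, 10, 'Y'), (10, 25, 'Y')]
```

Each sample is processed independently in every tree, so the cost grows with the number of trees times the number of samples times the tree height.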

@jeromekelleher, thanks again for your help. I was just wondering whether all individuals share the same recombination breakpoints, and whether it is possible to know when each recombination happened?

jeromekelleher commented on August 12, 2024

Great, I'm glad this helped @gdurif. I haven't thought about your solution here and what it does, but there's one important gotcha in there I want to point out:

trees = [tree for tree in ts.trees()] # list of tree

This doesn't work, see the warning here for why.
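The linked warning is not reproduced in this thread, but the tskit documentation for `TreeSequence.trees()` cautions that the iterator reuses one underlying tree object, so storing its results in a list leaves every entry pointing at the same tree. The toy generator below mimics that behaviour with plain Python objects (`ToyTree` and `toy_trees` are made up for illustration and are not tskit classes):

```python
class ToyTree:
    """Hypothetical stand-in for a tree object that is updated in place."""

    def __init__(self):
        self.index = -1


def toy_trees(num_trees):
    tree = ToyTree()  # a single object, reused for every iteration
    for i in range(num_trees):
        tree.index = i  # "move" the same object to the next tree
        yield tree


# Storing the yielded objects keeps three references to one object,
# which ends up in its final state.
stored = [tree for tree in toy_trees(3)]
print([tree.index for tree in stored])  # [2, 2, 2]

# Extracting what is needed inside the loop behaves as expected.
seen = [tree.index for tree in toy_trees(3)]
print(seen)  # [0, 1, 2]
```

When an actual list of independent trees is needed, the tskit documentation points to `ts.aslist()` for that purpose.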

I think we've probably addressed some of your questions in the book chapter @gdurif, and it's probably worth going through it. If you send me an email ([email protected]) I'll send you the PDF.

gdurif commented on August 12, 2024

I'll send you an e-mail about the book chapter. Regarding your comment: I have edited my post. I had missed this point in the docs, thanks for the reminder.

gtsambos commented on August 12, 2024

Hi all, sorry for the late response.

@gdurif: I think your idea will work, but it might potentially be quite slow because you are extracting the ancestral node for each sample individual (and each site/mutation?) separately. There will be a lot of repeated information for consecutive sites on the same 'edge' of the tree, and for closely related sample nodes.

I think it would be more efficient to make use of the edge information captured in TreeSequences to do this. I've put a sketch of an idea of how to do this here. As mentioned in #616, I'm going to keep working on this and can let you know when I have something in a more shareable state!
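The redundancy mentioned above can be seen in a small example: if a sample keeps the same ancestry across consecutive trees, the per-tree results collapse into far fewer segments. The sketch below is not the edge-based method from the linked notes, only an illustration of that collapsing step; the `squash` helper and the segment layout are hypothetical.

```python
def squash(segments):
    """Merge adjacent (left, right, population) segments with equal labels."""
    merged = []
    for left, right, pop in sorted(segments):
        if merged and merged[-1][2] == pop and merged[-1][1] == left:
            # Same ancestry and touching intervals: extend the last segment.
            merged[-1] = (merged[-1][0], right, pop)
        else:
            merged.append((left, right, pop))
    return merged


# Hypothetical per-tree ancestry for one sample across five trees.
per_tree = [(0, 10, "X"), (10, 25, "X"), (25, 40, "Y"), (40, 55, "Y"), (55, 60, "X")]
print(squash(per_tree))  # [(0, 25, 'X'), (25, 55, 'Y'), (55, 60, 'X')]
```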

Copying my supervisor @dvukcevic into this convo too.

jeromekelleher commented on August 12, 2024

I'm going to close this as I think it's been covered by more recent developments. Feel free to reopen if you disagree.
