Comments (11)
Hi @jeromekelleher and @gdurif - thanks for your comments and thoughtful engagement with this! I am about to fly to New Zealand and will be computer-less for a few days, but will engage with this properly next week when my brain is back in work mode.
Just quickly though:
Thanks for the feature request @gtsambos --- we're actually thinking about something pretty similar at the moment, where we want to compute, for every node in the tree sequence, the mean fraction of samples descending from that node that are from each population. So, suppose we had a node in a tree that had 100 samples from p1 below it and 100 samples from p2 below it (and we only have two populations, p1 and p2). Then the fractions would be 0.5 and 0.5.
Is this similar to what you're thinking about?
I was thinking about this at the more de-aggregated level of individual leaf nodes, like @gdurif is describing. Say the two admixing populations are labelled X and Y, and you're using msprime to generate sequences from admixed descendants of these populations. Then for any specific tree in a TreeSequence, every leaf node should descend from exactly one of those populations, and thus have ancestry with that population. Say your TreeSequence has 3 trees spanning the intervals [0,1], [1,2] and [2,3]; then a sample might have ancestry X on [0,1], Y on [1,2] and X on [2,3] for example.
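To make the per-interval idea concrete, here is a minimal, library-free sketch of how one sample's local ancestry could be stored and queried. The `breakpoints` and `ancestry` lists and the `ancestry_at` helper are hypothetical names for illustration, not part of tskit or msprime, and intervals are treated as half-open `[left, right)` so that each position belongs to exactly one tree.

```python
from bisect import bisect_right

# Hypothetical local-ancestry record for one sample: tree breakpoints and the
# ancestral population on each interval [breakpoints[i], breakpoints[i + 1]).
breakpoints = [0, 1, 2, 3]
ancestry = ["X", "Y", "X"]


def ancestry_at(position, breakpoints, ancestry):
    """Return the ancestry label of the interval containing `position`."""
    if not breakpoints[0] <= position < breakpoints[-1]:
        raise ValueError("position outside the sequence")
    # bisect_right counts the breakpoints <= position; subtracting 1 gives
    # the index of the interval that starts at or before `position`.
    return ancestry[bisect_right(breakpoints, position) - 1]


labels = [ancestry_at(x, breakpoints, ancestry) for x in (0.5, 1.0, 2.7)]
print(labels)
```

With this layout, looking up the ancestry of any SNP position is a binary search over the breakpoints.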
Will write more when I'm back from holidays!
from tskit.
Blast from the past! 🕰
from tskit.
Thanks for the feature request @gtsambos --- we're actually thinking about something pretty similar at the moment, where we want to compute, for every node in the tree sequence, the mean fraction of samples descending from that node that are from each population. So, suppose we had a node in a tree that had 100 samples from p1 below it and 100 samples from p2 below it (and we only have two populations, p1 and p2). Then the fractions would be 0.5 and 0.5.
Is this similar to what you're thinking about?
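As a rough illustration of the per-node fractions described above, here is a library-free sketch for a single tree. The `parent` and `sample_pop` dictionaries and the `population_fractions` helper are hypothetical stand-ins, not tskit API. The sketch counts each sample as descending from itself and does not cover averaging across the trees of a tree sequence.

```python
from collections import Counter


def population_fractions(parent, sample_pop):
    """Return {node: {population: fraction}} over the samples below each node.

    parent: dict mapping child -> parent for one tree (the root has no entry).
    sample_pop: dict mapping sample node -> population label.
    """
    counts = {}
    for sample, pop in sample_pop.items():
        u = sample
        # Walk from the sample to the root, crediting every node on the path.
        while u is not None:
            counts.setdefault(u, Counter())[pop] += 1
            u = parent.get(u)
    return {
        u: {pop: n / sum(c.values()) for pop, n in c.items()}
        for u, c in counts.items()
    }


# Toy tree: samples 0 and 1 are from p1, samples 2 and 3 are from p2.
# Node 4 is the parent of (0, 1), node 5 of (2, 3), and node 6 is the root.
parent = {0: 4, 1: 4, 2: 5, 3: 5, 4: 6, 5: 6}
sample_pop = {0: "p1", 1: "p1", 2: "p2", 3: "p2"}
fractions = population_fractions(parent, sample_pop)
print(fractions[6])
```

In this toy tree the root has two samples from each population below it, so its fractions are 0.5 and 0.5, matching the example in the comment.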
from tskit.
Hi, and thanks as well for your great software. I have the same request: it would be very useful to be able to get the population of origin for each SNP in a genotype. For instance, if we have an admixture between two populations, we would like to know for each locus whether it was inherited from the first or the second population.
In the meantime, do you think it would be possible to get this local ancestry information manually, by combining the breakpoints introduced by recombination with an upward traversal of the tree structure?
Thanks again
from tskit.
Good to know, thanks @gdurif. We thought about this sort of thing a bit for the book chapter on msprime we wrote this year. There's a notebook here that explores it a little bit. Would you mind looking through this and seeing if it answers your questions please?
from tskit.
@jeromekelleher thanks for this example. I am still working through it, but I am hopeful that it will answer my question.
from tskit.
I think I was able to solve this problem. @gtsambos I hope it will help you, tell me if you think it corresponds to what you had in mind.
For instance, let's say we have a simulated tree sequence ts with mutations, recombinations and an admixture event. We can get the recombination breakpoints with:
breakpoints = np.array(list(ts.breakpoints()))
We can get mutation sites with:
mutations = [variant.site.position for variant in ts.variants()]
Then, for each mutation site within a recombination chunk (i.e. between two breakpoints) and for an admixed individual at time 0, we just have to climb the corresponding tree up to a given time (just before the admixture happens) to get the population ancestry of the SNP:
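u0 = 0  # sample node of an admixed individual
adm_time = 50  # admixture happened 50 generations ago
tree = ts.first()  # tree for the first recombination chunk
u = u0
# Starting from u0, climb the tree until we reach a node at least as old as
# the admixture event; stop at the root (-1, i.e. tskit.NULL, means no parent).
while tree.time(u) < adm_time and tree.parent(u) != -1:
    u = tree.parent(u)
print("ancestral pop = {}".format(tree.population(u)))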
So for each mutation (and thus each recombination chunk) and each individual, we just have to repeat this to get the "chromosome painting".
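The repetition over recombination chunks could be sketched as follows. This is a library-free toy, not tskit code: each tree is a plain dictionary, and the `parent`, `time` and `population` mappings are hypothetical stand-ins for `tree.parent(u)`, `tree.time(u)` and `tree.population(u)`. The `paint` helper is likewise an illustrative name.

```python
def paint(trees, sample, adm_time):
    """Return [(left, right, population)] for `sample`, one entry per tree."""
    painting = []
    for t in trees:
        u = sample
        # Climb until the node is at least as old as the admixture event,
        # stopping at the root if no such node exists.
        while t["time"][u] < adm_time and u in t["parent"]:
            u = t["parent"][u]
        painting.append((t["left"], t["right"], t["population"][u]))
    return painting


# Node times and populations are shared by both toy trees.
time = {0: 0, 1: 20, 2: 60, 3: 70}
population = {0: "admixed", 1: "admixed", 2: "X", 3: "Y"}
trees = [
    {"left": 0, "right": 1, "parent": {0: 1, 1: 2},
     "time": time, "population": population},
    {"left": 1, "right": 2, "parent": {0: 1, 1: 3},
     "time": time, "population": population},
]
result = paint(trees, sample=0, adm_time=50)
print(result)
```

Every mutation site that falls inside a chunk inherits that chunk's label, so the climb only needs to run once per tree and sample rather than once per site.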
@jeromekelleher, thanks again for your help. I was just wondering whether all individuals share the same recombination breakpoints, and whether it is possible to know when the recombinations happened?
from tskit.
Great, I'm glad this helped @gdurif. I haven't thought about your solution here and what it does, but there's one important gotcha in there I want to point out:
trees = [tree for tree in ts.trees()] # list of tree
This doesn't work, see the warning here for why.
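For readers without the linked warning to hand: as I understand the tskit documentation, `ts.trees()` reuses a single Tree object and updates it in place, so a list built from the iterator holds many references to the same object. The documentation suggests `ts.aslist()` when a list is really needed. The library-free analogy below uses a hypothetical `ToyTree` class and `toy_trees` generator to show the effect.

```python
class ToyTree:
    """Hypothetical stand-in for a tree object that is updated in place."""

    def __init__(self):
        self.index = -1


def toy_trees(num_trees):
    tree = ToyTree()
    for i in range(num_trees):
        tree.index = i  # mutate the single shared object
        yield tree  # every iteration yields the same object


# Storing the yielded objects keeps three references to one object,
# which is left in its final state once the iteration has finished.
stored = [tree for tree in toy_trees(3)]
print([t.index for t in stored])  # [2, 2, 2]

# Extracting what is needed during the iteration works as expected.
seen = [tree.index for tree in toy_trees(3)]
print(seen)  # [0, 1, 2]
```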
I think we've probably addressed some of your questions in the book chapter @gdurif, and it's probably worth going through it. If you send me an email ([email protected]) I'll send you the PDF.
from tskit.
I'll send you an e-mail about the book chapter. Regarding your comment, I have edited my post. I missed this point in the docs, thanks for the reminder.
from tskit.
Hi all, sorry for the late response.
@gdurif: I think your idea will work, but it might potentially be quite slow because you are extracting the ancestral node for each sample individual (and each site/mutation?) separately. There will be a lot of repeated information for consecutive sites on the same 'edge' of the tree, and for closely related sample nodes.
I think it would be more efficient to make use of the edge information captured in TreeSequences to do this. I've put a sketch of an idea of how to do this here. As mentioned in #616, I'm going to keep working on this and can let you know when I have something in a more shareable state!
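The redundancy point can be illustrated with a small, library-free sketch: if consecutive trees give the same ancestry for a sample, the per-tree results collapse into longer segments. This is not the edge-based method from the linked sketch, only a toy showing why adjacent intervals often carry repeated information. The `squash` helper and the `(left, right, label)` tuples are hypothetical.

```python
def squash(segments):
    """Merge adjacent (left, right, label) segments that share a label."""
    merged = []
    for left, right, label in sorted(segments):
        if merged and merged[-1][2] == label and merged[-1][1] == left:
            # Same label and contiguous: extend the previous segment.
            merged[-1] = (merged[-1][0], right, label)
        else:
            merged.append((left, right, label))
    return merged


per_tree = [(0, 1, "X"), (1, 2, "X"), (2, 3, "Y"), (3, 4, "X")]
squashed = squash(per_tree)
print(squashed)
```

Here four per-tree records reduce to three segments. On real data with many trees and few ancestry switches the saving would be much larger.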
Copying my supervisor @dvukcevic into this convo too.
from tskit.
I'm going to close this as I think it's been covered by more recent developments. Feel free to reopen if you disagree.
from tskit.