Good afternoon! Sorry to leave another response, I have been trying for several days t

Multiple Trees in File about bad_mutations HOT 6 OPEN

mmckibben commented on August 16, 2024

Multiple Trees in File

from bad_mutations.

Comments (6)

mmckibben commented on August 16, 2024

Update: I changed the Biopython call from read() to parse() and was able to run the tree; however, I am now getting p-values of 0 for every SNP in the test files.
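
For context on the read() to parse() change: Biopython's Bio.Phylo.read() expects exactly one tree and raises a ValueError otherwise, while Bio.Phylo.parse() iterates over every tree in the file. Below is a minimal, dependency-free sketch for checking how many trees a file actually holds. It assumes the file is in Newick format, where each tree ends with a semicolon. The function name and the sample string are made up for illustration and are not part of BAD_Mutations.

```python
def split_newick_trees(text):
    """Split Newick text into one string per tree.

    Assumption: every tree ends with ';'. Semicolons inside
    single-quoted labels or [bracketed comments] are not treated as
    terminators. Trailing text without a closing ';' is dropped.
    """
    trees = []
    current = []
    in_quote = False      # inside a 'quoted label'
    comment_depth = 0     # nesting depth of [comments]
    for ch in text:
        current.append(ch)
        if in_quote:
            if ch == "'":
                in_quote = False
        elif comment_depth:
            if ch == "[":
                comment_depth += 1
            elif ch == "]":
                comment_depth -= 1
        elif ch == "'":
            in_quote = True
        elif ch == "[":
            comment_depth = 1
        elif ch == ";":
            # End of one tree: store it and start collecting the next.
            trees.append("".join(current).strip())
            current = []
    return trees


# Hypothetical two-tree input, similar in shape to the failing case.
sample = "(A:0.1,B:0.2);\n((C:0.1,D:0.1):0.3,'E;1':0.2);\n"
trees = split_newick_trees(sample)
print(f"{len(trees)} tree(s) found")
```

If this reports more than one tree, a single-tree reader will fail on the file, and either parse() or a cleanup step that keeps only the intended tree is needed.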


TomJKono commented on August 16, 2024

Hello,

The first issue (multiple trees in file) looks like a real bug. I will start working on the tree sanitization script to address that. The second issue, regarding P-values of 0, seems like a more complicated case. I will look into that once the tree sanitizing is in place and I can reliably reproduce the problematic files. Thank you, and apologies for the inconvenience!


mmckibben commented on August 16, 2024

Thank you for your help! Please let me know if you would like more information or need any help. As an update, I tried running the tree again with 42 species instead of all 90 (what currently downloads) and now get proper p-values. However, all of them are significant, even for SNPs in the test data that should not be.


TomJKono commented on August 16, 2024

Hello,

Thank you for your patience! I have pushed some edits to the dev branch that should address the tree parsing issue. With regard to the P-value issue, can you share the output of the predict and compile commands? I can make sure that the columns of the HYPHY output are being parsed and handled properly. Thank you!


mmckibben commented on August 16, 2024

Thanks again for the help! I have attached all the associated files and logs. Please note that if I run the predict command on the provided tree and MSA FASTA file, I get p-values similar to those in the sample output files. I also tried running both commands on some of my own data, and they give highly significant p-values (e^-10) for all possible SNPs, which seems somewhat suspicious to me.

CBF3.tree.txt
CBF3_MSA.fasta.txt
CBF3_Alignment.log.txt
CBF3_Predictions.log.txt
messages.log.txt


TomJKono commented on August 16, 2024

Hello,

Thank you for sharing the files! It looks like the alignments and tree files are generating properly, and the HYPHY log does not show any problematic output. The issue might be that the logistic regression model for generating P-values does not really translate to analyses that use a different set of query genomes than the ones for which it was developed. My hunch is that the total substitution rate observed depends on the number of sequences in the alignment.

I think the easiest way to get a usable prediction from the output of BAD_Mutations when using a different set of sequences is to apply the heuristics that were originally used in the Chun and Fay 2009 paper (https://genome.cshlp.org/content/19/9/1553.abstract):

Deleterious mutations were predicted by nonsynonymous SNPs that disrupt significantly constrained codons defined by the LRT (P < 0.001) and a number of subsequent filters (Supplemental Table S1). First, positions with low power, <10 eutherian mammals, were eliminated. Second, a small number of sites with dN significantly greater than dS were discarded. Finally, positions where the derived deleterious allele occurred in another eutherian species were eliminated.

I'm sorry that the best solution for now is to go back to heuristic approaches for prediction. We applied similar heuristics in our 2016 paper, which you can see implemented in this old script: https://github.com/MorrellLAB/Deleterious_Mutations/blob/master/Analysis_Scripts/Count_Deleterious_By_Sample.py (function defined on lines 25-63). I'll look into getting a new logistic regression model with a more modern set of genomes for the next release of BAD_Mutations. My apologies!
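
As a rough illustration, the filters from the quoted Chun and Fay passage could be sketched as follows. This is not the code from the linked script, and the class and field names are hypothetical. How each field would be derived from the BAD_Mutations output is not shown and would need to be worked out separately. The cutoffs (P < 0.001, at least 10 species) are taken from the quoted passage and may need adjusting for a different set of genomes.

```python
from dataclasses import dataclass


@dataclass
class CodonSite:
    """One nonsynonymous SNP with the values the filters need.

    All field names are hypothetical placeholders.
    """
    lrt_p_value: float              # P-value from the likelihood ratio test
    n_species: int                  # sequences aligned at this position
    dn_exceeds_ds: bool             # True if dN is significantly greater than dS
    derived_in_other_species: bool  # derived allele seen in another species


def passes_heuristics(site, p_cutoff=0.001, min_species=10):
    """Return True if the site would be called deleterious under the filters."""
    if site.lrt_p_value >= p_cutoff:
        return False  # codon is not significantly constrained
    if site.n_species < min_species:
        return False  # low power: too few sequences in the alignment
    if site.dn_exceeds_ds:
        return False  # dN significantly greater than dS
    if site.derived_in_other_species:
        return False  # derived allele occurs in another species
    return True


# Made-up values for illustration only.
example = CodonSite(lrt_p_value=1e-5, n_species=25,
                    dn_exceeds_ds=False, derived_in_other_species=False)
print(passes_heuristics(example))
```

Because every filter is an independent early return, individual criteria can be relaxed or dropped without touching the others.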

