
Comments (6)

mmckibben commented on August 16, 2024

Update: I changed the Biopython call from read() to parse() and was able to run the tree; however, I am now getting P-values of 0 for every SNP in the test files.

from bad_mutations.

TomJKono commented on August 16, 2024

Hello,

The first issue (multiple trees in one file) looks like a real bug; I will start working on a tree sanitization script to address it. The second issue, P-values of 0, seems more complicated. I will look into it once the tree sanitizing is in place and I can reliably reproduce the problematic files. Thank you, and apologies for the inconvenience!
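For context on the first issue: in Biopython, Bio.Phylo.read() expects exactly one tree per file and raises an error otherwise, while Bio.Phylo.parse() iterates over every tree, which is why switching to parse() got past the failure. The thread does not show what the sanitization script does, so the following is only a minimal sketch of one possible approach, not the BAD_Mutations implementation: split a Newick file on the semicolons that terminate each tree (ignoring semicolons inside single-quoted labels or [bracketed comments]) and keep the first tree. The function names split_newick_trees and keep_first_tree are hypothetical, and keeping the first tree is an assumption about the desired behavior.

```python
from __future__ import annotations


def split_newick_trees(text: str) -> list[str]:
    """Split text that may hold several Newick trees into one string per tree.

    A semicolon ends a tree only when it is outside a single-quoted label
    and outside a [bracketed comment].
    """
    trees: list[str] = []
    current: list[str] = []
    in_quote = False
    comment_depth = 0
    for ch in text:
        current.append(ch)
        if in_quote:
            # Inside a 'quoted label' only a closing quote matters;
            # a doubled '' escape simply closes and reopens the quote.
            if ch == "'":
                in_quote = False
            continue
        if ch == "[":
            comment_depth += 1
        elif ch == "]" and comment_depth > 0:
            comment_depth -= 1
        elif comment_depth == 0:
            if ch == "'":
                in_quote = True
            elif ch == ";":
                trees.append("".join(current).strip())
                current = []
    # Non-whitespace text after the last semicolon is kept as a final,
    # unterminated tree and given the missing semicolon.
    leftover = "".join(current).strip()
    if leftover:
        trees.append(leftover + ";")
    return trees


def keep_first_tree(text: str) -> str:
    """Hypothetical sanitizer: return only the first tree, newline-terminated."""
    trees = split_newick_trees(text)
    if not trees:
        raise ValueError("no Newick tree found")
    return trees[0] + "\n"


if __name__ == "__main__":
    multi = "((A:0.1,B:0.2):0.05,C:0.3);\n((A:0.1,C:0.2):0.05,B:0.3);\n"
    print(len(split_newick_trees(multi)))  # 2
    print(keep_first_tree(multi), end="")  # ((A:0.1,B:0.2):0.05,C:0.3);
```

Writing the output of keep_first_tree back to disk would give a single-tree file that read() should accept; whether the first tree is the right one to keep depends on how the multi-tree file was produced.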

mmckibben commented on August 16, 2024

Thank you for your help; please let me know if you would like more information or need any help! As an update, I tried running the tree again with 42 species instead of all 90 (the number that currently downloads) and now get proper P-values. However, all of them are significant, even for SNPs in the test data that should not be.

TomJKono commented on August 16, 2024

Hello,

Thank you for your patience! I have pushed some edits to the dev branch that should address the tree parsing issue. Regarding the P-value issue, could you share the output of the predict and compile commands? Then I can check that the columns of the HYPHY output are being parsed and handled properly. Thank you!

mmckibben commented on August 16, 2024

Thanks again for the help! I have attached all the associated files and logs. Note that if I run the predict command on the provided tree and MSA FASTA file, I get P-values similar to those in the sample output files. I also tried running both commands on some of my own data, and they give highly significant P-values (on the order of 1e-10) for every possible SNP, which seems suspicious to me.

CBF3.tree.txt
CBF3_MSA.fasta.txt
CBF3_Alignment.log.txt
CBF3_Predictions.log.txt
messages.log.txt

TomJKono commented on August 16, 2024

Hello,

Thank you for sharing the files! It looks like the alignment and tree files are being generated properly, and the HYPHY log does not show any problematic output. The issue might be that the logistic regression model used to generate P-values does not translate well to analyses that use a different set of query genomes than the one it was developed with. My hunch is that the total observed substitution rate depends on the number of sequences in the alignment.

When using a different set of sequences, I think the easiest way to get a usable prediction from the BAD_Mutations output is to use the heuristics originally applied in the Chun and Fay 2009 paper (https://genome.cshlp.org/content/19/9/1553.abstract):

Deleterious mutations were predicted by nonsynonymous SNPs that disrupt significantly constrained codons defined by the LRT (P < 0.001) and a number of subsequent filters (Supplemental Table S1). First, positions with low power, <10 eutherian mammals, were eliminated. Second, a small number of sites with dN significantly greater than dS were discarded. Finally, positions where the derived deleterious allele occurred in another eutherian species were eliminated.
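As a rough illustration of how the quoted filters could be chained for each SNP, here is a minimal sketch. It assumes you have already collected, per SNP, whether it is nonsynonymous, its LRT P-value, how many species have aligned data at that codon, whether dN is significantly greater than dS there, and whether the derived allele appears in another species. The SiteResult record, its field names, and is_predicted_deleterious are hypothetical and do not correspond to BAD_Mutations output columns or to the 2016 script. The defaults mirror the quoted P < 0.001 and at-least-10-species cutoffs (the paper counts eutherian mammals; here "species" means whatever set was aligned), and would likely need adjusting for a different set of genomes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SiteResult:
    """Hypothetical per-SNP summary; field names are illustrative only."""
    is_nonsynonymous: bool
    lrt_pvalue: float                      # LRT P-value for codon constraint
    n_species_with_data: int               # species with aligned data at this codon
    dn_significantly_gt_ds: bool           # dN significantly greater than dS here
    derived_allele_in_other_species: bool  # derived allele seen in another species


def is_predicted_deleterious(
    site: SiteResult, p_threshold: float = 1e-3, min_species: int = 10
) -> bool:
    """Apply Chun-and-Fay-style filters to one SNP (a sketch, not their code)."""
    if not site.is_nonsynonymous:
        return False  # only nonsynonymous SNPs are candidates
    if site.lrt_pvalue >= p_threshold:
        return False  # codon not significantly constrained (needs P < threshold)
    if site.n_species_with_data < min_species:
        return False  # low power: too few species aligned at this position
    if site.dn_significantly_gt_ds:
        return False  # dN > dS: site is not constrained, so discard it
    if site.derived_allele_in_other_species:
        return False  # derived allele is tolerated elsewhere in the phylogeny
    return True


if __name__ == "__main__":
    sites = [
        SiteResult(True, 1e-5, 25, False, False),
        SiteResult(True, 0.02, 25, False, False),
        SiteResult(True, 1e-5, 6, False, False),
    ]
    print([is_predicted_deleterious(s) for s in sites])  # [True, False, False]
```

The filters are written as early returns so that each rejection reason is easy to log or count separately, which can help when checking why a given SNP was dropped.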

I'm sorry that the best solution for now is to go back to heuristic approaches for prediction. We applied similar heuristics in our 2016 paper, which you can see implemented in this old script: https://github.com/MorrellLAB/Deleterious_Mutations/blob/master/Analysis_Scripts/Count_Deleterious_By_Sample.py (function defined on lines 25-63). I'll look into getting a new logistic regression model with a more modern set of genomes for the next release of BAD_Mutations. My apologies!