
Comments (5)

btmartin721 commented on June 8, 2024

RAxML treats those as invariant sites because the N could be an A, T, G, or C. My script should remove those two sites, and it has to, because RAxML will crash otherwise. That's not the usual definition of monomorphic sites, but it's how RAxML handles them.
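To make that rule concrete, here is a minimal sketch of the check as described above: a column counts as "possibly invariant" if every taxon could share at least one base once ambiguous characters are resolved. This is not the code from raxml_ascbias; the function names are hypothetical, and treating gaps, `?`, and the other IUPAC codes the same way as N is an assumption.

```python
# Bases that each IUPAC nucleotide code may stand for.
# Assumption: gaps ("-") and "?" are treated like N (fully ambiguous).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": set("ACGT"), "-": set("ACGT"), "?": set("ACGT"),
}


def possibly_invariant(column):
    """Return True if all characters in the column could be the same base."""
    shared = set("ACGT")
    for char in column:
        # Raises KeyError on characters outside the table above.
        shared &= IUPAC[char.upper()]
    return bool(shared)


def drop_possibly_invariant(sequences):
    """Remove every possibly-invariant column from equal-length sequences."""
    kept = [col for col in zip(*sequences) if not possibly_invariant(col)]
    if not kept:
        return ["" for _ in sequences]
    return ["".join(chars) for chars in zip(*kept)]
```

For example, a column reading A, A, N is flagged because the N could be an A, while a column reading A, A, T is kept as variable.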

from raxml_ascbias.

slvrshot commented on June 8, 2024

I'm trying to think of what else could be causing the inflated counts, because their sum (the total of the A/C/G/T counts) is basically the number of sites in the entire bacterial genome: roughly 2.2M. If all the sites were invariant, that would imply the isolates are clonal, which they are not.


btmartin721 commented on June 8, 2024

It could be gaps. I'm not sure about that, but if you want I can take a look this weekend. I seem to remember that this script might not be perfect on all datasets, and I think gaps may throw it off. I'm normally very thorough with my coding and testing practices, but this was one of my early scripts from when I was learning Python, and I've noticed it sometimes has issues. I also wrote it as a quick and dirty script for a specific need that I had.

If it's the same dataset you sent me the other day, I'll go ahead and look at it. I haven't been super speedy about getting to this stuff lately because I have a newborn baby at home, but I'll try to work on it this weekend.


slvrshot commented on June 8, 2024

Congrats on the baby! I totally understand. I can already tell you that the whole-genome SNP alignment was generated in snippy and that it contains gaps and Ns. I'm awful with Python, so kudos to you. I might need to send you a Dropbox link to another file, though. I'll probably send that later tonight. Thanks!


slvrshot commented on June 8, 2024

Okay, I am closing this for real now. You have been a big help. After thinking about this more, I found the solution. I used the snp-sites -cbp outputprefix input.phylip option to extract only the monomorphic A/C/G/T sites and export them to a PHYLIP file. Then I used that PHYLIP file as input to your script. The results are exactly the same as the internal script's, so everything is fine. Enjoy your time with the new baby. You can ignore the .aln file I sent. Take care!
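For anyone comparing counts the same way, here is a rough sketch of the stricter definition used in that check: a column is counted only when every sequence shows the same unambiguous base, so columns containing gaps or Ns are skipped. The function name is hypothetical, and this is an illustration rather than the logic of snp-sites or raxml_ascbias.

```python
def count_strict_monomorphic(sequences):
    """Count columns where all sequences carry the same A, C, G, or T."""
    counts = {base: 0 for base in "ACGT"}
    for column in zip(*sequences):
        states = {char.upper() for char in column}
        # Exactly one state, and it must be an unambiguous base.
        if len(states) == 1:
            (base,) = states
            if base in counts:
                counts[base] += 1
    return counts
```

If the per-base totals from this strict count match what another tool reports, the difference from a larger total is likely explained by columns with gaps or ambiguous characters.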

