
Comments (4)

singing-scientist commented on May 20, 2024

Greetings, Sergio! Thanks very much for the question and for using SNPGenie.

First, I am a little unsure about the nature of your analysis: it seems you may have sequenced 138 individual bacterial genomes and want to compare each one to a single reference genome. If that's the case, then snpgenie.pl is overkill and will not be appropriate, because it assumes that each sample is a POOL of multiple individuals, and that SNPs are called for each sample, which contains multiple genotypes representative of your population. You'd instead want to simply create FASTA files of the genomes and use either SNPGenie's within-group or between-group scripts. Other programs can also do standard one-to-one comparisons for dN/dS.

However, if your samples DO represent deep sequencing to detect SNPs among multiple genomes (e.g., each sample is a bunch of E. coli pooled together), and you're interested in the variation WITHIN each sample, then snpgenie.pl might be appropriate. For the reference genome, you could simply use the reference sequence against which sequencing reads were aligned. This is in fact necessary, because the variants in the SNP report (e.g., VCF file) must be numbered according to the positions within your reference genome. In other words, if I understand your question, the reference genome which made the most sense for read mapping also makes the most sense for SNP calling.
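To illustrate the point about numbering, here is a minimal Python sketch that checks whether the REF allele of each VCF record matches the reference sequence at the stated position. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, ...) with 1-based POS. The function name `check_vcf_against_reference` and the toy data are hypothetical, and this is not part of SNPGenie itself.

```python
def check_vcf_against_reference(vcf_lines, reference):
    """Return (pos, ref_allele_in_vcf, bases_in_reference) for each mismatch."""
    mismatches = []
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])      # POS is 1-based in VCF
        ref_allele = fields[3]    # REF column
        found = reference[pos - 1 : pos - 1 + len(ref_allele)]
        if found.upper() != ref_allele.upper():
            mismatches.append((pos, ref_allele, found))
    return mismatches


# Toy data (hypothetical): the second record does not match the reference,
# as would happen if SNPs were called against a different sequence.
reference = "ATGGCCTAA"
vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t4\t.\tG\tA\t.\tPASS\t.",
    "ref\t7\t.\tC\tT\t.\tPASS\t.",
]
print(check_vcf_against_reference(vcf_lines, reference))
```

Any mismatch reported this way suggests the SNP report and the reference FASTA are not in the same coordinate system.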

Finally, if you're sequencing 138 distinct E. coli isolates, calling one genotype for each isolate, and want to summarize the variation among those isolates, then it might make sense to summarize the variation in a VCF file, depending on your goal.

Please let me know if that helps, and if you have any further questions as a result!

Yours,
Chase

from snpgenie.

arredondo23 commented on May 20, 2024

Hi @cwnelson88,

Thanks for the quick feedback, much appreciated.

You guessed right! My samples do not represent a pool of multiple individuals in which I would expect multiple genotypes. I will follow your advice and use SNPGenie's within-group script.

One quick question: since my genomes differ in size, should I align orthologous genes and pass those alignments to the script, i.e., one FASTA alignment per orthologous gene? Or can I pass a core genome alignment instead? In both cases, how should I set up the GTF file?

Once again thanks for all the help provided!

Sergio


singing-scientist commented on May 20, 2024

Hello @arredondo23! The explanation of the within-group script is here: https://github.com/chasewnelson/SNPGenie#snpgenie-within

In short, yes: the sequences within the FASTA file need to be aligned. If the FASTA contains one gene, the GTF file will have one line corresponding to that gene, with the start and end coordinates corresponding to the start and end positions of the alignment. Alternatively, if the alignment is a full genome (perhaps a virus) or chromosome, the GTF may contain multiple records for many genes. You'll just want to make sure that your alignment method was codon-aware, i.e., gaps in coding regions are multiples of 3 and do not interrupt the reading frame.
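As a rough illustration of the single-gene case, the Python sketch below checks an aligned FASTA for gaps that would break the reading frame and builds a one-line GTF record spanning the whole alignment. It makes several assumptions: the alignment starts in frame at position 1, the check is deliberately strict (gaps must begin on a codon boundary and be a multiple of 3 long), and the GTF line uses the standard nine tab-separated columns with a `CDS` feature and a `gene_id` attribute. The helper names, the `manual` source field, and the toy sequences are hypothetical; consult the SNPGenie README for the exact GTF requirements.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a dict of {name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def frame_problems(records):
    """List reasons the alignment does not look codon-aware (strict check)."""
    problems = []
    if len({len(seq) for seq in records.values()}) > 1:
        problems.append("sequences differ in length; not an alignment")
    for name, seq in records.items():
        if len(seq) % 3 != 0:
            problems.append(f"{name}: length {len(seq)} is not a multiple of 3")
        for gap in re.finditer(r"-+", seq):
            # Gap must start on a codon boundary and span whole codons.
            if gap.start() % 3 != 0 or len(gap.group()) % 3 != 0:
                problems.append(
                    f"{name}: gap at {gap.start() + 1}-{gap.end()} "
                    "breaks the reading frame"
                )
    return problems


def single_gene_gtf_line(seqname, gene_id, alignment_length):
    """Build one tab-delimited GTF CDS record covering the whole alignment."""
    fields = [seqname, "manual", "CDS", "1", str(alignment_length),
              ".", "+", "0", f'gene_id "{gene_id}";']
    return "\t".join(fields)


# Toy alignment (hypothetical): isolate3 has a gap that splits a codon.
fasta_text = """>isolate1
ATGGCC---TAA
>isolate2
ATGGCCAAATAA
>isolate3
ATGG---CCTAA
"""
records = read_fasta(fasta_text)
print(frame_problems(records))
print(single_gene_gtf_line("aln", "geneA", len(records["isolate1"])))
```

For a whole-genome or chromosome alignment, the same idea would apply per gene, with one GTF record per CDS and coordinates given relative to the alignment.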

Let me know if that helps!
Chase


arredondo23 commented on May 20, 2024

Hi @cwnelson88,

Thanks for the explanation, the way forward is very clear now :)

I'll close the issue. Thanks for the quick feedback!

Sergio

