Hi @cwnelson88 , Thanks for the fantastic software and the detailed

SNPGenie for bacterial samples from the same lineage about snpgenie HOT 4 CLOSED

chasewnelson commented on May 20, 2024 1

SNPGenie for bacterial samples from the same lineage

from snpgenie.

Comments (4)

singing-scientist commented on May 20, 2024 1

Greetings, Sergio! Thanks very much for the question and for using SNPGenie.

First, I am a little confused about the nature of your analysis. It seems as if you may have sequenced 138 individual bacterial genomes and want to compare each one to a single reference genome. If that's the case, then snpgenie.pl is overkill and will not be appropriate, because it assumes that each sample is a POOL of multiple individuals, and that SNPs are called for a sample containing multiple genotypes representative of your population. You'd instead want to simply create FASTA files of each genome and use either SNPGenie's within-group or between-group scripts. Other programs can also do standard one-to-one comparisons for dN/dS.

However, if your samples DO represent deep sequencing to detect SNPs among multiple genomes (e.g., each sample is a bunch of E. coli pooled together), and you're interested in the variation WITHIN each sample, then snpgenie.pl might be appropriate. For the reference genome, you could simply use the reference sequence against which sequencing reads were aligned. This is in fact necessary, because the variants in the SNP report (e.g., VCF file) must be numbered according to the positions within your reference genome. In other words, if I understand your question, the reference genome which made the most sense for read mapping also makes the most sense for SNP calling.
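To illustrate the point about coordinates, here is a minimal sketch (not part of SNPGenie) of a sanity check that a VCF is numbered against the same reference used for read mapping. The function names `parse_fasta` and `vcf_reference_mismatches` are hypothetical, and the check assumes a standard tab-delimited VCF whose first four columns are CHROM, POS, ID, and REF.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the first word of the header."""
    seqs = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line.upper())
    return {k: "".join(v) for k, v in seqs.items()}


def vcf_reference_mismatches(vcf_text, reference):
    """Return (chrom, pos, reason) for VCF records that do not fit the reference.

    Hypothetical helper: assumes 1-based POS and a REF allele in column 4.
    """
    mismatches = []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref = line.split("\t")[:4]
        pos = int(pos)
        seq = reference.get(chrom)
        if seq is None:
            mismatches.append((chrom, pos, "unknown sequence name"))
        elif pos < 1 or pos + len(ref) - 1 > len(seq):
            mismatches.append((chrom, pos, "position outside reference"))
        elif seq[pos - 1:pos - 1 + len(ref)] != ref.upper():
            mismatches.append((chrom, pos, "REF allele differs from reference"))
    return mismatches
```

An empty result suggests the VCF and the FASTA share a coordinate system; any mismatch hints that the variants were called against a different reference.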

Finally, if you're sequencing 138 distinct E. coli isolates, calling one genotype for each isolate, and want to summarize the variation among those isolates, then it might make sense to summarize the variation in a VCF file, depending on your goal.

Please let me know if that helps, and if you have any further questions as a result!

Yours,
Chase

arredondo23 commented on May 20, 2024

Hi @cwnelson88,

Thanks for the quick feedback, much appreciated.

You guessed right! My samples do not represent a pool of multiple individuals in which I would expect multiple genotypes. I will follow the advice to use SNPGenie's within-group script.

One quick question: since my genomes differ in size, should I align the orthologous genes and pass those alignments to the script, i.e., one FASTA alignment file per orthologous gene? Or can I pass a core genome alignment instead? In either case, how should I handle the GTF file?

Once again thanks for all the help provided!

Sergio

singing-scientist commented on May 20, 2024

Hello @arredondo23! The explanation of the within-group script is here: https://github.com/chasewnelson/SNPGenie#snpgenie-within

In short, yes: the sequences within the FASTA file will need to be aligned. If the FASTA file contains one gene, the GTF file will have one line corresponding to that one gene, with the start and end coordinates corresponding to the start and end positions of the alignment. Alternatively, if the alignment is a full genome (perhaps a virus) or chromosome, the GTF may contain multiple records for many genes. You'll just want to make sure that your alignment method was codon-aware, i.e., gaps in coding regions are multiples of 3 and do not interrupt the reading frame.
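As a rough illustration of the two requirements above (not part of SNPGenie), the following sketch checks a single-gene alignment for codon-awareness and builds a one-record GTF line spanning the alignment. All function names are hypothetical. It assumes the alignment starts in frame at column 1 and, as a stricter reading of "do not interrupt the reading frame", requires each gap run to begin on a codon boundary. The GTF layout (a `CDS` feature with a `gene_id` attribute, `custom` as the source) is an assumption about what the within-group script expects, so verify it against the SNPGenie documentation linked above.

```python
def gap_runs(seq, gap="-"):
    """Yield (start, length) for each maximal run of gap characters (0-based start)."""
    start = None
    for i, ch in enumerate(seq):
        if ch == gap:
            if start is None:
                start = i
        elif start is not None:
            yield start, i - start
            start = None
    if start is not None:
        yield start, len(seq) - start


def codon_alignment_problems(records):
    """Return a list of problems; an empty list means the checks passed.

    records: dict mapping sequence name -> aligned sequence of one in-frame CDS.
    """
    problems = []
    lengths = {len(s) for s in records.values()}
    if len(lengths) != 1:
        return ["sequences differ in length, so they are not aligned"]
    (length,) = lengths
    if length % 3 != 0:
        problems.append(f"alignment length {length} is not a multiple of 3")
    for name, seq in records.items():
        for start, run in gap_runs(seq):
            # Assumption: gaps must start on a codon boundary and span whole codons.
            if start % 3 != 0 or run % 3 != 0:
                problems.append(
                    f"{name}: gap at column {start + 1} of length {run} breaks the reading frame"
                )
    return problems


def single_gene_gtf_line(seq_name, gene_id, aln_length):
    """Build one tab-delimited GTF record covering columns 1..aln_length (assumed layout)."""
    fields = [seq_name, "custom", "CDS", "1", str(aln_length), ".", "+", "0",
              f'gene_id "{gene_id}";']
    return "\t".join(fields)
```

For a per-gene workflow, one would run the check on each orthologous-gene alignment and write the matching one-line GTF next to it; a whole-chromosome alignment would instead need one record per gene with that gene's coordinates in the alignment.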

Let me know if that helps!
Chase

arredondo23 commented on May 20, 2024

Hi @cwnelson88 ,

Thanks for the explanation; the way forward is very clear now :)

I'll close the issue. Thanks for the quick feedback!

Sergio
