Comments (4)
Greetings, Sergio! Thanks very much for the question and for using SNPGenie.
First, I am a little confused about the nature of your analysis. It seems you may have sequenced 138 individual bacterial genomes and want to compare each one to a single reference genome. If that's the case, then snpgenie.pl is overkill and will not be appropriate, because it assumes that each sample is a POOL of multiple individuals, and that SNPs are called for a sample containing multiple genotypes representative of your population. You'd instead want to simply create FASTA files of each genome and use either SNPGenie's within-group or between-group scripts. Other programs can also do standard one-to-one comparisons for dN/dS.
However, if your samples DO represent deep sequencing to detect SNPs among multiple genomes (e.g., each sample is a bunch of E. coli pooled together), and you're interested in the variation WITHIN each sample, then snpgenie.pl might be appropriate. For the reference genome, you could simply use the reference sequence against which sequencing reads were aligned. This is in fact necessary, because the variants in the SNP report (e.g., VCF file) must be numbered according to the positions within your reference genome. In other words, if I understand your question, the reference genome which made the most sense for read mapping also makes the most sense for SNP calling.
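As a rough illustration of the point about variant numbering, here is a minimal sketch (not part of SNPGenie) that flags VCF records whose position does not fit the reference used for read mapping. The function name `out_of_range_variants` and the `ref_lengths` dictionary are hypothetical; the only format facts assumed are that VCF data lines are tab-delimited, with the sequence name in column 1 and a 1-based position in column 2.

```python
def out_of_range_variants(vcf_lines, ref_lengths):
    """Return (chrom, pos) for records that do not fit the reference.

    vcf_lines   -- iterable of VCF text lines (header lines start with '#')
    ref_lengths -- hypothetical dict mapping reference sequence name -> length
    """
    problems = []
    for line in vcf_lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])  # POS is 1-based in VCF
        ref_len = ref_lengths.get(chrom)
        # Flag unknown sequence names and positions beyond the reference end.
        if ref_len is None or not (1 <= pos <= ref_len):
            problems.append((chrom, pos))
    return problems


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t5\t.\tA\tG",
    "chr1\t11\t.\tC\tT",
    "chr2\t1\t.\tG\tA",
]
print(out_of_range_variants(vcf, {"chr1": 10}))
```

An empty result is only a necessary condition: it shows the positions are compatible with this reference, not that the VCF was actually called against it.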
Finally, if you're sequencing 138 distinct E. coli isolates, calling one genotype for each isolate, and want to summarize the variation among those isolates, then it might make sense to summarize the variation in a VCF file, depending on your goal.
Please let me know if that helps, and if you have any further questions as a result!
Yours,
Chase
from snpgenie.
Hi @cwnelson88,
Thanks for the quick feedback, much appreciated.
You guessed right! My samples do not represent pools of multiple individuals in which I expect multiple genotypes. I will follow your advice and use SNPGenie's within-group script.
One quick question: since my genomes differ in size, should I align the orthologous genes and pass those alignments to the script, i.e., one FASTA alignment file per orthologous gene? Or can I pass a core genome alignment instead? In both cases, how should I set up the GTF file?
Once again thanks for all the help provided!
Sergio
Hello @arredondo23! The explanation of the within-group script is here: https://github.com/chasewnelson/SNPGenie#snpgenie-within
In short, yes: the sequences within the FASTA file will need to be aligned. If the FASTA contains one gene, the GTF file will have one line corresponding to that one gene, with the start and end coordinates corresponding to the start and end positions of the alignment. Alternatively, if the alignment is a full genome (perhaps a virus) or chromosome, the GTF may contain multiple records for many genes. You'll just want to make sure that your alignment method was codon-aware, i.e., gaps in coding regions are multiples of 3 and do not interrupt the reading frame.
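For the single-gene case, the two requirements above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not part of SNPGenie: the function names are hypothetical, the alignment is assumed to start at the first codon position of the gene, and the check is deliberately strict (each gap run must start on a codon boundary and span whole codons). The GTF line follows the general nine-column, tab-delimited GTF layout with a `gene_id` attribute; the `alignment` source value is a placeholder, and the SNPGenie README should be consulted for the exact fields it expects.

```python
def gap_runs(seq, gap="-"):
    """Return (start, length) for each run of gap characters (0-based start)."""
    runs, start = [], None
    for i, ch in enumerate(seq):
        if ch == gap:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(seq) - start))
    return runs


def is_codon_aware(aligned_seqs, gap="-"):
    """Strict check: equal lengths divisible by 3, gaps made of whole codons."""
    lengths = {len(s) for s in aligned_seqs.values()}
    if len(lengths) != 1:
        return False  # sequences are not aligned to a common length
    (length,) = lengths
    if length % 3 != 0:
        return False
    for seq in aligned_seqs.values():
        for start, run_len in gap_runs(seq, gap):
            if start % 3 != 0 or run_len % 3 != 0:
                return False
    return True


def single_gene_gtf_line(seqname, gene_id, alignment_length, strand="+"):
    """Build one CDS record spanning the whole alignment (1-based, inclusive)."""
    fields = [seqname, "alignment", "CDS", "1", str(alignment_length),
              ".", strand, "0", f'gene_id "{gene_id}";']
    return "\t".join(fields)


aln = {"isolate_a": "ATGAAA---TAA", "isolate_b": "ATGAAACCCTAA"}
if is_codon_aware(aln):
    print(single_gene_gtf_line("gene1_aln", "gene1", 12))
```

A gap of length 3 that begins mid-codon leaves the downstream frame intact but splits a codon, so this sketch rejects it; relax the `start % 3` test if that is acceptable for your data.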
Let me know if that helps!
Chase
Hi @cwnelson88,
Thanks for the explanation, the way forward is very clear now :)
I'll close the issue. Thanks for the quick feedback!
Sergio