Comments (5)
Greetings, Wanting! Thanks for this question. In fact, this is the desired behavior. For example, suppose there is a polymorphic site with C=9999 and T=1. The frequency of T is 1 / (9999 + 1) = 0.0001, or 0.01%, and it doesn't matter whether it's the REF or ALT allele. If C is the REF allele, then the VCF will report ALT as T with AF=0.0001; alternatively, if T is the REF allele, then the VCF will report ALT as C with AF=0.9999. However, in both cases there is an allele (T) with a frequency below your cut-off, and it will be eliminated. Put another way, the frequency filter will eliminate an allele regardless of whether it is REF or ALT. Note that which allele is REF is often arbitrary (e.g., it is simply the allele in a standard reference sequence against which variants were called). Let me know if this makes sense.
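To make the arithmetic concrete, here is a minimal Python sketch of the idea. It is not SNPGenie's actual code (SNPGenie is written in Perl); the function name `filter_by_minfreq` is made up for illustration, and it assumes an allele is dropped when its frequency is strictly below the cutoff. Note that the function never needs to know which allele is REF.

```python
def filter_by_minfreq(allele_counts, minfreq):
    """Keep only alleles whose frequency is at least minfreq.

    allele_counts maps nucleotide -> read count at one site.
    Hypothetical helper for illustration; REF/ALT status is never consulted.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return {}
    return {
        allele: count
        for allele, count in allele_counts.items()
        if count / total >= minfreq
    }


# The site from the example: C=9999, T=1, so freq(T) = 1 / 10000 = 0.0001
site = {"C": 9999, "T": 1}
kept = filter_by_minfreq(site, minfreq=0.01)
print(kept)  # {'C': 9999}
```

Because only the counts are used, the result is the same whether the VCF lists T as ALT (AF=0.0001) or C as ALT (AF=0.9999).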
If there's a reason you wanted to filter out low-frequency ALT alleles only (but not low-frequency REF alleles), you could filter your VCF file on your own before submitting it to SNPGenie, and then simply not use --minfreq. However, in most circumstances I don't think it makes sense to give low-frequency REF alleles a 'free pass' but eliminate low-frequency ALT alleles. But, it's possible your situation is an exception!
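If you did want to pre-filter on ALT frequency only, a rough sketch could look like the following. It rests on assumptions that may not match your file: each record has a single ALT allele, and the INFO column carries an `AF=` entry. The function name `drop_low_freq_alt` is hypothetical, and records without an `AF` entry are passed through untouched.

```python
def drop_low_freq_alt(vcf_lines, min_alt_freq):
    """Drop VCF records whose ALT allele frequency is below min_alt_freq.

    Assumes one ALT allele per record and an AF=<float> entry in INFO.
    Header lines and records lacking AF are kept unchanged.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header or column-name line
            continue
        fields = line.rstrip("\n").split("\t")
        # INFO is the 8th column: semicolon-separated key=value pairs
        info = dict(
            item.split("=", 1) for item in fields[7].split(";") if "=" in item
        )
        af = info.get("AF")
        if af is None or float(af) >= min_alt_freq:
            kept.append(line)
    return kept


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tC\tT\t.\tPASS\tDP=10000;AF=0.0001\n",  # low-frequency ALT
    "chr1\t200\t.\tT\tC\t.\tPASS\tDP=10000;AF=0.9999\n",  # low-frequency REF
]
filtered = drop_low_freq_alt(example, min_alt_freq=0.01)
```

Here the record at position 100 is dropped, while the record at position 200 is kept even though its REF allele is at 0.01%. That is exactly the 'free pass' described above.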
Hope this helps, let me know!
Chase
from snpgenie.
Thank you for the fast response. Just want to confirm that I understand correctly: if C is the REF at 99.99% and T is the ALT at 0.01%, then T will be filtered out but C will remain in the calculation.
My pleasure! It's correct that, in this example, T will be filtered out and C will not be filtered out. However, regarding calculation, it depends what you mean. If you mean π, there is no longer a 'calculation', i.e., the site will no longer be polymorphic so its π = 0. Let me know!
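As a simplified illustration of why π drops to 0, the sketch below computes per-site nucleotide diversity as the proportion of pairwise read comparisons that differ. This is a single-site simplification with a made-up function name (`site_pi`), not SNPGenie's implementation, which works codon by codon and splits differences into nonsynonymous and synonymous.

```python
def site_pi(allele_counts):
    """Proportion of pairwise comparisons between reads that differ at a site.

    allele_counts maps nucleotide -> read count. Simplified per-site sketch.
    """
    n = sum(allele_counts.values())
    if n < 2:
        return 0.0
    total_pairs = n * (n - 1) / 2
    # Pairs of reads carrying the same nucleotide contribute no difference
    identical_pairs = sum(c * (c - 1) / 2 for c in allele_counts.values())
    return (total_pairs - identical_pairs) / total_pairs


before = site_pi({"C": 9999, "T": 1})  # about 0.0002
after = site_pi({"C": 9999})           # T removed by the filter
print(before, after)
```

Once T is removed, every pair of reads is identical at the site, so its contribution to π is exactly 0.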
Gotcha. I am trying to compare piN/piS to assess selective pressure in my samples. I am sequencing Omicron samples, with variants called against Wuhan-1. In that case, should I consider those "fixed or 99%" mutations? Or would it be more reasonable to filter the low-frequency variants out of the VCF myself and use the default minfreq in the program?
Understood! If you're interested in constraint then πn/πs is the way to go. To me, your question is more about how to make sure to quality filter variants — using minfreq is just one overly simple way to do that. Indeed, when you use a minfreq cutoff of 0.01, you are effectively saying you believe that anything with AF < 1% isn't real, i.e., is a sequencing error.
I see you're working with Tom — if you have questions about study design and QC that's probably beyond the ambit of SNPGenie and GitHub, but feel free to send an email to us! Of interest, we just published a program called VCFgenie to perform QC filtering of iSNVs that you may wish to use:
- paper: https://academic.oup.com/ve/advance-article/doi/10.1093/ve/veae013/7606455
- github page: https://github.com/chasewnelson/VCFgenie
Let me know!
C