
Comments (5)

singing-scientist commented on June 11, 2024

Greetings, Wanting! Thanks for this question. In fact, this is the desired behavior. For example, suppose there is a polymorphic site with C=9999 and T=1. The frequency of T is 1 / (9999 + 1) = 0.0001, or 0.01%, and it doesn't matter whether it's the REF or ALT allele. If C is the REF allele, then the VCF will report ALT as T with AF=0.0001; alternatively, if T is the REF allele, then the VCF will report ALT as C with AF=0.9999. However, in both cases there is an allele (T) with a frequency below your cut-off, and it will be eliminated. Put another way, the frequency filter will eliminate an allele regardless of whether it is REF or ALT. Note that which allele is REF is often arbitrary (e.g., it depends on which standard reference sequence the variants were called against). Let me know if this makes sense.
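
To make the example concrete, here is a minimal Python sketch of a frequency filter that treats REF and ALT alike. It is only an illustration of the behavior described above, not SNPGenie's actual code; the function name `filter_alleles` and the choice of a strict "below the cut-off" comparison are assumptions.

```python
def filter_alleles(counts, min_freq):
    """Keep only alleles whose frequency is at least min_freq.

    counts maps each allele to its read count at one site. REF/ALT
    status is deliberately not an input: it plays no role in the filter.
    (Hypothetical helper; the exact boundary handling is an assumption.)
    """
    total = sum(counts.values())
    return {allele: n for allele, n in counts.items() if n / total >= min_freq}


site = {"C": 9999, "T": 1}

# The VCF would report AF=0.0001 if C is REF, or AF=0.9999 if T is REF,
# but the underlying allele frequencies are the same either way.
freq_t = site["T"] / sum(site.values())  # 1 / 10000
freq_c = site["C"] / sum(site.values())  # 9999 / 10000

kept = filter_alleles(site, min_freq=0.01)
print(kept)  # {'C': 9999}
```

Only C survives, whichever allele the VCF happens to label as REF.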

If there's a reason you wanted to filter out low-frequency ALT alleles only (but not low-frequency REF alleles), you could filter your VCF file on your own before submitting it to SNPGenie, and then simply not use --minfreq. However, in most circumstances I don't think it makes sense to give low-frequency REF alleles a 'free pass' but eliminate low-frequency ALT alleles. But, it's possible your situation is an exception!
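
If you did want to pre-filter on ALT frequency only, something along these lines could work. This is a rough sketch under several assumptions: the AF value is stored in the INFO column as `AF=...`, every record has a single ALT allele, and the name `drop_low_freq_alt` is made up for illustration. Real VCF files differ in where they store frequency information, so adapt it to your format.

```python
def drop_low_freq_alt(vcf_lines, min_af):
    """Yield header lines plus records whose ALT frequency is >= min_af.

    Assumes INFO (column 8) contains a single-valued AF tag. A record
    without AF raises KeyError rather than being silently kept or dropped.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # pass header lines through unchanged
            continue
        info = line.rstrip("\n").split("\t")[7]
        tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        if float(tags["AF"]) >= min_af:
            yield line


records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "ref\t100\t.\tC\tT\t.\tPASS\tDP=10000;AF=0.0001\n",  # rare ALT: dropped
    "ref\t200\t.\tT\tC\t.\tPASS\tDP=10000;AF=0.9999\n",  # rare REF: kept
]
kept_records = list(drop_low_freq_alt(records, min_af=0.01))
```

Note that the record at position 200 is kept even though its REF allele (T) is at 0.01%. That is exactly the 'free pass' for low-frequency REF alleles mentioned above.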

Hope this helps, let me know!
Chase

from snpgenie.

wantingwei commented on June 11, 2024

Thank you for the fast response. Just want to confirm that I understand correctly: if C is the REF at 99.99% and T is the ALT at 0.01%, then T will be filtered out but C will remain in the calculation.


singing-scientist commented on June 11, 2024

My pleasure! It's correct that, in this example, T will be filtered out and C will not be filtered out. However, regarding calculation, it depends what you mean. If you mean π, there is no longer a 'calculation', i.e., the site will no longer be polymorphic so its π = 0. Let me know!
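
As a rough illustration of why π drops to 0, here is a small Python sketch that computes the proportion of pairwise differences at a single site from allele counts. It is a simplified per-site quantity, not SNPGenie's πN/πS calculation, and the function name `site_pi` is hypothetical.

```python
from itertools import combinations


def site_pi(counts):
    """Proportion of sequence pairs that differ at one site.

    counts maps each allele to its read count. Pairs that differ are
    those drawn from two different alleles; the denominator is the
    total number of pairs, n choose 2.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0
    differing_pairs = sum(a * b for a, b in combinations(counts.values(), 2))
    return differing_pairs / (n * (n - 1) / 2)


pi_before = site_pi({"C": 9999, "T": 1})  # about 0.0002
pi_after = site_pi({"C": 9999})           # T removed by the filter: 0.0
```

Once T is filtered out, only one allele remains, no pair of sequences differs at the site, and π is 0.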


wantingwei commented on June 11, 2024

Gotcha. I am trying to compare piN/piS to assess selective pressure in my samples. I am sequencing Omicron samples, with variants called against Wuhan-1. In that case, should I consider those "fixed or 99%" mutations? Or would it be more reasonable to filter the VCF myself to remove low-frequency variants and use the default minfreq in the program?


singing-scientist commented on June 11, 2024

Understood! If you're interested in constraint then πn/πs is the way to go. To me, your question is more about how to make sure to quality filter variants — using minfreq is just one overly simple way to do that. Indeed, when you use a minfreq cutoff of 0.01, you are effectively saying you believe that anything with AF < 1% isn't real, i.e., is a sequencing error.

I see you're working with Tom — if you have questions about study design and QC that's probably beyond the ambit of SNPGenie and GitHub, but feel free to send an email to us! Of interest, we just published a program called VCFgenie to perform QC filtering of iSNVs that you may wish to use:

Let me know!
C

