Code Monkey home page Code Monkey logo

Comments (5)

singing-scientist avatar singing-scientist commented on June 11, 2024

Greetings, Pablo! Thanks very much for the question. That all sounds correct; the requirement to run twice (once for each strand) is a bit of a hack, unfortunately necessary because SNPGenie was originally written for one strand only. The genes of each strand can simply be combined as you say -- but be aware that, if there are any overlapping antisense genes (i.e., genes in reverse complement reading frames that overlap the same genome positions), then some genome positions/regions will be represented more than once. Also, some of the individual SITES results will be redundant. Thus, the answer to how the data should be combined depends in part on how you want to treat overlapping genes from separate strands (if they exist). Certainly for non-protein-coding sites, you'll only want to include them once, and results will be redundant between the strands.

For the whole (coding) genome, the question is whether you want the mean value for all sites (codon is the unit) or of all genes (gene is the unit). For the first, you could simply sum numerators and denominators, i.e., mean_dN = sum(N_diffs)/sum(N_sites) and mean_dS = sum(S_diffs)/sum(S_sites). This would reflect the "typical protein-coding site". Contrarily, if you're looking for an average of all genes, you could simply take the means of genes, e.g., mean_dN = mean(vector of all genes' dN). This would reflect the "typical gene". Let me know if that makes sense!

Yours,
Chase

from snpgenie.

pabloangulo7 avatar pabloangulo7 commented on June 11, 2024

Thank you very much, sorry for the late reply but my data is very large and it took me a long time to finish the analysis. I already have the results for each of the chromosomes and for the minus and plus strand. Then following the instructions you gave me in the previous answer, I was about to calculate the dN/dS ratio for each chromosome and for the whole genome, but to make sure I do it correctly, to do the calculation, for example, for all the sites, I would have to use the file called "codon_results.txt" and use the columns of N_diffs, S_diffs, N_Sites and S_Sites, right? And to do the calculation by genes, would it be the same but with the file "product_results.txt"?

Thanks in advance

from snpgenie.

singing-scientist avatar singing-scientist commented on June 11, 2024

Hello! Both codon_results and product_results should yield the same results, i.e., product_results are just the sum of codon_results, for each gene. The question is whether you want to add all the N_diffs, S_diffs, N_Sites and S_Sites together, then calculate dN/dS (mean of all coding sites); or first calculate dN/dS for each product, and take the mean of those values (mean of all genes). Let me know if that helps!

C

from snpgenie.

pabloangulo7 avatar pabloangulo7 commented on June 11, 2024

Thank you very much, I think I understand it better now. On the other hand, I was reviewing the results and there is one thing that doesn't add up. In one of the "site_results.txt" files, one of the SNPs is missing, which for some reason the program has filtered out, but I don't quite know why. In the temporary file of the sample (generated by SNPGenie) "temp_snp_report_VY02.txt" this variant does appear. So, there would be a total of 37 SNPs although only 36 SNPs are analyzed in the results. I attach the files, and specifically it is the variant at site 1580099 (line 27 in the file). Let me know if you need more info.

Thanks in advance
temp_snp_report_VY02.txt
site_results.txt

from snpgenie.

singing-scientist avatar singing-scientist commented on June 11, 2024

Greetings, Pablo! Indeed I see what you mean and confirm the SNP at 1580099 is present in temp but not the sites file. It is not really possible for me to troubleshoot this without the input; if you'd like to supply the 3 files and command that give rise to this result I'd be happy to check it out and determine what's happening.

Regarding dN or dS equal to 0, ideally one would perform a statistical test to determine if dN=dS can be rejected with significance. Normally this is done with bootstrapping, but when dS=0 there's no point. Thus as a rule of thumb, I generally consider that if dS (which reflects mutation rate) is 0, there's not enough variation to have statistical power to detect a difference between dN and dS. I'd report the ratio as undefined (certainly don't infer dN>dS or positive selection usually). If dN is 0 that's a different story (again because dS reflects mutation rate) and one can infer strong purifying selection given a proper statistical test. Let me know if that helps!

from snpgenie.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.