Comments (5)
Greetings, Pablo! Thanks very much for the question. That all sounds correct; the requirement to run twice (once for each strand) is a bit of a hack, unfortunately necessary because SNPGenie was originally written for one strand only. The genes of each strand can simply be combined as you say -- but be aware that, if there are any overlapping antisense genes (i.e., genes in reverse complement reading frames that overlap the same genome positions), then some genome positions/regions will be represented more than once. Also, some of the individual SITES results will be redundant. Thus, the answer to how the data should be combined depends in part on how you want to treat overlapping genes from separate strands (if they exist). Certainly for non-protein-coding sites, you'll only want to include them once, and results will be redundant between the strands.
For the whole (coding) genome, the question is whether you want the mean value for all sites (codon is the unit) or of all genes (gene is the unit). For the first, you could simply sum numerators and denominators, i.e., mean_dN = sum(N_diffs)/sum(N_sites) and mean_dS = sum(S_diffs)/sum(S_sites). This would reflect the "typical protein-coding site". Contrarily, if you're looking for an average of all genes, you could simply take the means of genes, e.g., mean_dN = mean(vector of all genes' dN). This would reflect the "typical gene". Let me know if that makes sense!
Yours,
Chase
from snpgenie.
Thank you very much, sorry for the late reply but my data is very large and it took me a long time to finish the analysis. I already have the results for each of the chromosomes and for the minus and plus strand. Then following the instructions you gave me in the previous answer, I was about to calculate the dN/dS ratio for each chromosome and for the whole genome, but to make sure I do it correctly, to do the calculation, for example, for all the sites, I would have to use the file called "codon_results.txt" and use the columns of N_diffs, S_diffs, N_Sites and S_Sites, right? And to do the calculation by genes, would it be the same but with the file "product_results.txt"?
Thanks in advance
from snpgenie.
Hello! Both codon_results and product_results should yield the same results, i.e., product_results are just the sum of codon_results, for each gene. The question is whether you want to add all the N_diffs, S_diffs, N_Sites and S_Sites together, then calculate dN/dS (mean of all coding sites); or first calculate dN/dS for each product, and take the mean of those values (mean of all genes). Let me know if that helps!
C
from snpgenie.
Thank you very much, I think I understand it better now. On the other hand, I was reviewing the results and there is one thing that doesn't add up. In one of the "site_results.txt" files, one of the SNPs is missing, which for some reason the program has filtered out, but I don't quite know why. In the temporary file of the sample (generated by SNPGenie) "temp_snp_report_VY02.txt" this variant does appear. So, there would be a total of 37 SNPs although only 36 SNPs are analyzed in the results. I attach the files, and specifically it is the variant at site 1580099 (line 27 in the file). Let me know if you need more info.
Thanks in advance
temp_snp_report_VY02.txt
site_results.txt
from snpgenie.
Greetings, Pablo! Indeed I see what you mean and confirm the SNP at 1580099 is present in temp but not the sites file. It is not really possible for me to troubleshoot this without the input; if you'd like to supply the 3 files and command that give rise to this result I'd be happy to check it out and determine what's happening.
Regarding dN or dS equal to 0, ideally one would perform a statistical test to determine if dN=dS can be rejected with significance. Normally this is done with bootstrapping, but when dS=0 there's no point. Thus as a rule of thumb, I generally consider that if dS (which reflects mutation rate) is 0, there's not enough variation to have statistical power to detect a difference between dN and dS. I'd report the ratio as undefined (certainly don't infer dN>dS or positive selection usually). If dN is 0 that's a different story (again because dS reflects mutation rate) and one can infer strong purifying selection given a proper statistical test. Let me know if that helps!
from snpgenie.
Related Issues (20)
- triplet error for spliced proteins HOT 2
- Negative values for mean_gdiv_polymorphic HOT 1
- What is the best option? HOT 1
- Empty output HOT 8
- SNP genie
- Coverage warning HOT 4
- within-host diversity influenza whole genome HOT 11
- within-host diversity analysis : one individual, different time-points HOT 4
- All classified as synonymous HOT 3
- GTF file does not contain any sense (+) strand products HOT 12
- Need help to determine method for inference of convergent evolution HOT 1
- CDS annotation(s) does not have a gene_id HOT 2
- Using SNPGenie on VCF from RAD-seq HOT 1
- gtf2revcom.pl script issue HOT 2
- VCF has no header
- No snps problems HOT 4
- Issue with SNPGenie_sliding_windows.R HOT 4
- Warning for coverage and nucleotide sums HOT 3
- problem with minfreq HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from snpgenie.