by Michel Heiniger and Patrick Pfeifer
Various research papers published over the past three decades have focused on this particular problem. One widely cited paper is Frank Wright's "The ‘effective number of codons’ used in a gene", published on 1 March 1990 in Elsevier's journal "Gene", Volume 87, Issue 1, Pages 23–29. It is available at http://dx.doi.org/10.1016/0378-1119(90)90491-9.
The Student's
"The codon usage table of a gene can be subdivided according to the number of synonymous codons belonging to each aa. Thus for a gene using the 'universal' code, there are 2 aa with only one codon choice, 9 with two, 1 with three, 5 with four, and 3 with six. These represent five SF types, designated SF types 1, 2, 3, 4, and 6 according to their respective number of synonymous codons." (Wright, 1990) (aa = amino acid, SF = synonymous family)
Then, for each aa, calculate:
\begin{align}\hat{F} &= \frac{n \displaystyle\sum_{i = 1}^{k}{p_i^2} - 1}{n - 1}\end{align}
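The per-amino-acid step can be sketched in a few lines of Python. This is only an illustration, assuming Wright's (1990) form of the estimator, (n Σ p_i² − 1) / (n − 1), where n is the number of times the amino acid occurs in the gene and p_i is the observed frequency of its i-th synonymous codon. The function name `codon_homozygosity` and the list-of-counts input format are our own choices, not part of the paper.

```python
def codon_homozygosity(counts):
    """Estimate F-hat for one amino acid.

    counts: observed counts of each synonymous codon for this amino acid
            (hypothetical input format, e.g. [12, 3, 0, 5] for a 4-codon family).
    """
    n = sum(counts)
    if n < 2:
        # The estimator divides by (n - 1), so it is undefined for n < 2.
        raise ValueError("need at least two codons to estimate F")
    # Sum of squared codon frequencies p_i = n_i / n.
    sum_p_squared = sum((c / n) ** 2 for c in counts)
    return (n * sum_p_squared - 1) / (n - 1)


# Extreme bias: a single codon is used every time, so F-hat is 1.
print(codon_homozygosity([40, 0, 0, 0]))
# Perfectly even usage of four codons gives a value slightly below 1/4.
print(codon_homozygosity([10, 10, 10, 10]))
```

With even usage the estimate approaches 1/k as n grows, which is why 1 / F-hat can be read as an "effective number" of codons for that amino acid.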
And for a complete gene, calculate:
\begin{align}\hat{N}_c &= 2 + 9 / \overline{\hat{F}_2} + 1 / \hat{F}_3 + 5 / \overline{\hat{F}_4} + 3 / \overline{\hat{F}_6} \end{align}
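As a rough sketch, the whole-gene formula can be evaluated as follows. It assumes the F-hat values have already been computed per amino acid and grouped by SF type. The function name and the dictionary layout are hypothetical, and the sketch does not handle amino acids that are absent from the gene.

```python
from statistics import mean


def effective_number_of_codons(f_by_sf_type):
    """Combine per-amino-acid F-hat values into N_c.

    f_by_sf_type: dict mapping SF type (2, 3, 4, 6) to the list of F-hat
                  values of the amino acids in that class (hypothetical layout).
    """
    return (
        2                                   # the two single-codon amino acids
        + 9 / mean(f_by_sf_type[2])         # nine two-fold amino acids
        + 1 / mean(f_by_sf_type[3])         # the single three-fold amino acid
        + 5 / mean(f_by_sf_type[4])         # five four-fold amino acids
        + 3 / mean(f_by_sf_type[6])         # three six-fold amino acids
    )


# Extreme bias: every amino acid uses one codon only (all F-hat = 1).
biased = {2: [1.0] * 9, 3: [1.0], 4: [1.0] * 5, 6: [1.0] * 3}
# Even usage in the large-sample limit: F-hat = 1/k for a k-fold family.
even = {2: [1 / 2] * 9, 3: [1 / 3], 4: [1 / 4] * 5, 6: [1 / 6] * 3}

print(effective_number_of_codons(biased))
print(effective_number_of_codons(even))
```

The two examples mark the range of the measure: 20 when each amino acid is encoded by a single codon, and about 61 when all synonymous codons are used equally.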
"where
We will (a) carry out
Assuming there are
We examine the actual codon usage in E. coli and S. solfataricus. For a given amino acid, e.g. glycine (GGA, GGC, GGG, GGT), we examine whether its codon usage satisfies the null hypothesis.
e.g.
To test this, assuming the null hypothesis, we can calculate the probability that out
of