What is the difference between synonymous and nonsynonymous substitution
Ina, Y. Pattern of synonymous and nonsynonymous substitutions: An indicator of mechanisms of molecular evolution.
Abstract. Comparison of numbers of synonymous and nonsynonymous substitutions is useful for understanding mechanisms of molecular evolution.
However, a good approximate method should not deviate too far from the truth with an infinite amount of data. The second approach we take is computer simulation.
Finite data sets are generated by simulation and then analyzed by different methods to examine their biases and sampling variances. Three sets of base frequencies at codon positions are used. The first set has equal codon frequencies. The second set is from primate mitochondrial protein-coding genes and has very biased base frequencies. The third set is from HIV env genes (Table 2). The universal genetic code is used in both the consistency analysis and the computer simulation.
Consistency is the property that the estimate converges to the true value of the parameter as the amount of data approaches infinity. While consistency is a weak requirement, the approximate methods examined here are all inconsistent. It is nevertheless interesting to examine which steps of the approximate methods introduce the bias. This is relatively easy since infinite data do not involve sampling errors, and estimates of sites (S and N) and rates (dS and dN) can be directly compared with the correct values.
Results for the three sets of codon frequencies are shown in Figure 2A-I. We consider the NG method first. In this case, NG indeed gives estimates close to the true values. This bias is mainly generated in the step of counting sites. However, compared with the bias in counting sites, the bias in counting differences is much less important because there may not be many pairs of codons that are different at two or three positions and because different pathways may involve the same numbers of synonymous and nonsynonymous changes.
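The site-counting step discussed above can be illustrated with a minimal sketch of unweighted, NG-style counting for a single codon. The function name `syn_sites` is hypothetical, and the handling of stop codons is an assumption: here a change to a stop codon is simply counted as nonsynonymous, whereas published implementations differ on this point.

```python
from itertools import product

# Universal genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Unweighted count of synonymous sites in one sense codon.

    For each of the three positions, take the fraction of the three
    possible nucleotide changes that leave the amino acid unchanged.
    Assumption: changes to stop codons count as nonsynonymous.
    """
    s = 0.0
    for pos in range(3):
        syn = 0
        for b in BASES:
            if b == codon[pos]:
                continue
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                syn += 1
        s += syn / 3
    return s

print(syn_sites("TTT"), syn_sites("GGT"), syn_sites("ATG"))
```

Summing such per-codon counts over a sequence gives S, and N follows as three times the number of codons minus S. Because every change is weighted equally, this count ignores both the transition bias and unequal base frequencies.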
As mentioned above, NG underestimates S considerably. This pattern appears to be due to the use of equal weighting of pathways in counting differences. Results obtained using base frequencies at the three codon positions from the primate mitochondrial genes (see Table 2) are shown in Figure 2D-F.
The results are very different from those of Figure 2A-C under equal codon frequencies. The two methods give very different counts of sites S. While transition bias always leads to more synonymous sites, the effect of base frequency bias is more complicated. There tend to be more synonymous sites if the two most frequent nucleotides at third positions are both purines or both pyrimidines.
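To see how a transition bias can change the count of synonymous sites, the following sketch weights each possible single-nucleotide change by kappa if it is a transition and by 1 if it is a transversion. This is an illustration under stated assumptions, not the exact procedure of any method compared here: the function name `weighted_syn_sites` is hypothetical, base frequencies are taken as equal, and changes to stop codons are counted as nonsynonymous.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (a != b)."""
    return (a in PURINES) == (b in PURINES)

def weighted_syn_sites(codon, kappa):
    """Synonymous sites of one sense codon with transitions weighted by kappa.

    Assumption: changes to stop codons are counted as nonsynonymous.
    """
    syn_weight = 0.0
    total_weight = 0.0
    for pos in range(3):
        for b in BASES:
            if b == codon[pos]:
                continue
            w = kappa if is_transition(codon[pos], b) else 1.0
            total_weight += w
            mutant = codon[:pos] + b + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                syn_weight += w
    # Scale the synonymous proportion to the three sites of the codon.
    return 3.0 * syn_weight / total_weight

for kappa in (1.0, 2.0, 5.0):
    print(kappa, weighted_syn_sites("TTT", kappa))
```

For the codon TTT the only synonymous change (TTT to TTC) is a transition, so the count rises with kappa. Individual codons can move in the opposite direction (for example ATA, whose synonymous changes are transversions); the statement in the text concerns the total over a sequence.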
For the mitochondrial genes, S is much smaller than expected under equal base frequencies, causing NG to overestimate rather than underestimate S. The overestimation of S caused by ignoring the base frequency bias more than compensates for the underestimation caused by ignoring the transition bias.
Nevertheless, it should be noted that the observed pattern depends on the particular set of codon frequencies. Since the new method counts sites correctly (see Table 3), the bias must be due to counting of differences and correction for multiple hits.
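As a concrete example of a multiple-hit correction, the NG method applies the Jukes-Cantor formula to the proportions of synonymous and nonsynonymous differences. The sketch below shows that step only; the function names and the argument names (`sd`, `nd` for counted differences, `S`, `N` for counted sites) are illustrative choices.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def ng_rates(sd, nd, S, N):
    """Return (dS, dN) from counted differences (sd, nd) and sites (S, N)."""
    return jukes_cantor(sd / S), jukes_cantor(nd / N)

print(ng_rates(sd=30.0, nd=20.0, S=100.0, N=300.0))
```

The correction is nonlinear, so any error in the counted sites or differences is amplified as the proportion of differences grows.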
Base frequencies in this gene are less biased than are those in the mitochondrial genes, and the effect of ignoring the base frequency bias is minor. Patterns in Figure 2G-I are quite similar to those for equal codon frequencies (Fig. 2A-C). The estimates are biased toward 1, mainly due to the use of equal weighting in counting differences. The bias is not as extreme as it is for mitochondrial genes. The data consist of a pair of codon sequences and are simulated by sampling codon site patterns from the multinomial distribution specified by the site pattern probabilities f_ij.
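The sampling step described above can be sketched as follows. The site pattern probabilities in the example are made-up toy values, not the f_ij of the study, and the function name `simulate_site_patterns` is hypothetical. Drawing each codon site independently from the pattern probabilities yields counts that follow the multinomial distribution.

```python
import random
from collections import Counter

def simulate_site_patterns(pattern_probs, n_codons, seed=None):
    """Sample n_codons site patterns; return a Counter of pattern counts.

    pattern_probs maps (codon_in_seq1, codon_in_seq2) to its probability.
    """
    rng = random.Random(seed)
    patterns = list(pattern_probs)
    weights = [pattern_probs[p] for p in patterns]
    draws = rng.choices(patterns, weights=weights, k=n_codons)
    return Counter(draws)

# Toy probabilities for illustration only (assumed values).
toy_probs = {
    ("TTT", "TTT"): 0.5,
    ("TTT", "TTC"): 0.3,
    ("TTT", "CTT"): 0.2,
}
counts = simulate_site_patterns(toy_probs, n_codons=500, seed=1)
print(counts)
```

Each simulated data set is then analyzed by every method, and repeating this over many replicates gives the averages and sampling variances that the comparison relies on.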
Estimates from mitochondrial genes vary considerably among data sets, from 2 or 3 upward. Standard errors for the ML estimates are also presented, while those for other methods (not shown) are very small due to the use of many more replicates. Averages of dS and dN are calculated for all methods but not shown. We note that the simulation results are highly consistent with those found for infinite data, discussed above.
For example, if a method gives estimates smaller than the true value in infinite data, it tends to have negative biases in finite samples as well.
ML estimates are known to be often biased in small samples. The bias is nevertheless small. These results agree well with previous simulations by Ota and Nei and by Muse, who used similar simple models to examine the performance of NG.
However, NG is biased in most other parameter combinations. The biases in general agree with findings of the consistency analysis (Fig. 2). In almost all other cases, ML has smaller biases than NG. This is the same pattern as that found in infinite data (Fig. 2). The new method appears to have little bias over most of the parameter space examined (Table 4) and appears to provide a close approximation of ML over the range of parameter values examined.
Since all methods are biased for at least some parameter combinations, the mean squared error (MSE) may be an appropriate criterion by which to compare methods. The square root of the MSE is plotted in Figure 3 against the sequence length (number of codons). Two parameter combinations are considered. In both cases, the new method lies between NG and ML. We also performed a small-scale simulation to examine the effect of sequence divergence level t.
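The MSE criterion combines bias and sampling variance, since MSE equals the squared bias plus the variance of the estimates. A minimal sketch of computing these summaries from replicate estimates is given below; the replicate values are invented for illustration and the function name is hypothetical.

```python
import math

def bias_variance_rmse(estimates, true_value):
    """Return (bias, variance, root MSE) of replicate estimates."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    # Variance around the mean of the estimates (divisor n).
    variance = sum((x - mean) ** 2 for x in estimates) / n
    # Mean squared deviation from the true value.
    mse = sum((x - true_value) ** 2 for x in estimates) / n
    return bias, variance, math.sqrt(mse)

# Invented replicate estimates of a parameter whose true value is 1.0.
bias, variance, rmse = bias_variance_rmse([0.9, 1.1, 1.2, 1.0], true_value=1.0)
print(bias, variance, rmse)
```

A method with a small bias but a large variance can therefore have a higher MSE than a slightly biased method with a small variance, which is why the comparison depends on sequence length.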
The results are shown in Figure 4. Other methods are insensitive to sequence divergence level. In this case, ML and the new method have little bias over the whole range of the sequence divergence level. Muse discussed the fact that at high sequence divergences, NG does not produce distance estimates linear with time. The concatenated sequences of the 12 protein-coding genes on the H-strand of the mitochondrial genome from the human Homo sapiens, GenBank accession number D and the orangutan Pongo pygmaeus p.
The results are shown in Table 5. We also included the method of Li in the comparison, implemented in X. ML is applied with different assumptions concerning the transition bias and the codon frequency bias. The pattern is especially revealing for ML estimates under different models. The pattern is the same as that in the consistency analysis and the computer simulation discussed above. Some minor differences in implementation between ML and the corresponding approximate methods were discussed by Yang and Nielsen. Those results suggest that the ad hoc treatments involved in the approximate methods may not have introduced too much bias and that failure to account for the transition bias and base frequency bias appears to be more important.
However, Muse points out that at high sequence divergence levels, ad hoc treatments such as those used in multiple-hit correction in approximate methods may become a more serious problem (see also Fig. 4).
The method may thus be useful for large-scale screening, when ML may be too time-consuming. The ML method for pairwise comparison is less biased and has a lower MSE than the approximate methods for almost all parameter combinations.
We suggest that, in general, the ML method, which accounts for both the transition bias and the codon usage bias, should be the preferred method for estimating dS and dN between two sequences. Only in the case of very short sequences may it be advantageous to use simpler models.
In the course of this study, we realized that correcting for biases involved in the NG method is extremely complicated, despite the fact that the method is well known for its simplicity. In contrast, ML is conceptually much simpler, mainly because the probability theory employed by the method takes care of the difficult tasks of weighting evolutionary pathways and correcting for multiple hits, with no need for ad hoc approximations.
Specifically, the Chapman-Kolmogorov theorem states that the probability of change from codon i to codon j over time t is a sum over all possible intermediate codons k at any intermediate time s: p_ij(t) = sum over k of p_ik(s) p_kj(t - s). This obvious result ensures that the likelihood calculation accounts for all possible pathways of change between two codons. The major advantage of ML appears to lie in its flexibility in simultaneous comparison of multiple sequences, taking into account their phylogenetic relationship. The ML model can easily be extended to include important features of DNA sequence evolution such as the dependence of nonsynonymous rates on the chemical properties of the amino acids (Yang, Nielsen, and Hasegawa). On a fast Pentium II, each pairwise comparison takes about 10-20 s by ML and a few seconds by the method of this paper.
If pathways are weighted equally in counting differences in the new method, iteration will not be needed, and the method will be about as fast as other approximate methods such as NG, which seem to finish instantaneously.
Caro-Beth Stewart, Reviewing Editor.
Keywords: synonymous rate, nonsynonymous rate, approximate methods, maximum likelihood, molecular evolution, adaptive evolution, positive selection.
Probabilities are calculated using parameter estimates for the human and orangutan mitochondrial genes. The universal genetic code is used. Each average was obtained by simulating 2, replicates for NG and YN (the method of this paper), for the method of Ina, and for ML.
We thank X. Xia for the analysis using the method of Li.
Moreover, the patterns observed in nonsense mutations between cancer-related genes versus other genes might be related to codon usage bias (CUB). These issues are untested at this stage and are regarded as limitations in this study.
Genomes of extant species are relics shaped by natural selection and tell us about the outcome after selection has acted. If extinct species were resurrected, it would help evolutionary biologists infer positive and purifying selection more accurately. However, analyzing the available data is so far the most practical way to study evolution and infer natural selection.
Similarly, investigating the genomic data of healthy individuals offers another way to assess the evolutionary constraint and selection events in cancers and cancer-related genes, even without patient data from numerous cancer types. The constraint observed in the cancer-related genes of healthy individuals suggests that deleterious mutations in these genes might occur in some cancer types without having been observed by researchers.
Nonsynonymous mutations are thought to be largely deleterious because they change amino acids. The same goes for nonsense mutations, which produce truncated proteins. Synonymous mutations, originally thought to be neutral, are now recognized to affect codon usage bias and to be subject to natural selection as well. In this study, we retrieved the human SNP data and the list of human cancer-related genes.
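The three classes of coding mutation named above can be told apart by translating the codon before and after the change. The following sketch uses the universal genetic code; the function name `classify_substitution` is hypothetical, and the reference codon is assumed to be a sense codon (a change from a stop codon to a sense codon is not treated separately here).

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as synonymous, nonsense, or nonsynonymous."""
    ref_aa, alt_aa = CODE[ref_codon], CODE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid encoded
    if alt_aa == "*":
        return "nonsense"        # premature stop codon, truncated protein
    return "nonsynonymous"       # different amino acid encoded

print(classify_substitution("TTT", "TTC"))  # Phe -> Phe
print(classify_substitution("TTT", "TTA"))  # Phe -> Leu
print(classify_substitution("TAT", "TAA"))  # Tyr -> stop
```

In practice the codon context of each SNP comes from the annotated coding sequence of the chosen transcript, so the classification depends on which transcript is used.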
We removed the SNPs in splicing regions, which potentially excludes the effect of mutations on splicing patterns. We found very clear evidence for purifying selection on nonsynonymous, synonymous and nonsense mutations in cancer-related genes compared to the expected level inferred from other genes.
Among the synonymous SNPs, the codons after mutation in cancer-related genes tend to be preferred and have higher frequency in the genome compared to those in other genes. It is interesting that optimized codon usage could facilitate the cellular translation elongation process and is beneficial during rapid cell growth [24, 25]. It remains an open question whether cancer-related genes exploit this advantage to achieve rapid tumor growth.
The nonsense mutations in cancer-related genes are less frequent and are located closer to the end of CDSs. Interestingly, these nonsense mutations could be caused by either a single mutation or double nonsynonymous mutations within the same codon. At the whole-CDS level, we found that in cancer-related genes, the mutations towards mouse are suppressed and the mutations towards monkey are favored.
We also observed that the nonsynonymous or synonymous SNP sites in cancer-related genes are more conserved at the DNA level and less polymorphic than those in other genes. As for the limitations of this study, we noted above that poorly annotated gene sets, GC content and the choice of canonical transcripts might bias our results. However, we tested our main findings with these confounding factors controlled.
Thus, our results hold at this stage. Our study reported signals of purifying selection on nonsynonymous, synonymous and nonsense SNPs in human cancer-related genes from multiple aspects. Although the deleterious effects of nonsynonymous and nonsense mutations are obvious, we demonstrate these effects in different ways. Moreover, in addition to the known effect of synonymous mutations on mRNA splicing [6], we showed selection on codon usage bias for synonymous SNPs in cancer-related genes.
Importantly, in cancer diagnosis, attention has been paid to the detection of nonsynonymous or nonsense driver mutations. Our work should be of interest to the cancer research community and the field of evolutionary biology.
Our study demonstrated the evolutionary constraint on mutations in the CDSs of cancer-related genes. We reached this conclusion without requiring data from cancer tissues or patients.
The optimized codon usage in cancer-related genes might contribute to rapid cell growth and could be a potential mechanism related to oncogenesis. Edwards AW. The genetical theory of natural selection. Functional transition of Pak proto-oncogene during early evolution of metazoans. Lynch M. Evolution of the mutation rate.
Trends Genet. Rate, molecular spectrum, and consequences of human mutation. The evolution of mutation rates: separating causes from consequences. Synonymous mutations frequently act as driver mutations in human cancers. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet. Alternative splicing in oncogenic kinases: from physiological functions to cancer.
J Nucleic Acids. Dana A, Tuller T. Nucleic Acids Res. Solving the riddle of codon usage preferences: a test for translational selection. Codon bias and heterologous protein expression. Trends Biotechnol. Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis.
J Mol Evol. Codon usage determines translation rate in Escherichia coli. J Mol Biol. Translation is a non-uniform process.
Effect of tRNA availability on the rate of elongation of nascent polypeptide chains. Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol Cell.
Comeron JM. Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. Plotkin JB, Kudla G. Synonymous but not the same: the causes and consequences of codon bias. Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes.
Akashi H. Genome-wide changes in protein translation efficiency are associated with autism. Genome Biol Evol. Single synonymous mutation in factor IX alters protein properties and underlies haemophilia B. J Med Genet. Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet. Codon-by-codon modulation of translational speed and accuracy via mRNA folding. PLoS Biol. Evolution and heterogeneity of non-hereditary colorectal cancer revealed by single-cell exome sequencing.
A system for detecting high impact-low frequency mutations in primary tumors and metastases. Genomic analysis of genetic heterogeneity and evolution in high-grade serous ovarian carcinoma. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med. Somatic mutations affect key pathways in lung adenocarcinoma.
Positive natural selection in the human lineage. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w; iso-2; iso Fly Austin. The UCSC genome browser database: update BEDTools: a flexible suite of utilities for comparing genomic features. N6-methyladenosine modulates messenger RNA translation efficiency. HTSeq--a Python framework to work with high-throughput sequencing data.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. LW designed and supervised this research. Both DC and LW analyzed the data. DC calculated the parameters for codon usage bias. DC and LW wrote this article. Both authors read and approved the final manuscript.
Correspondence to Lai Wei. All datasets used in this study were downloaded from publicly available websites as described in the Methods section. Figure S1. Profiling the SNPs in cancer-related genes and other genes. P-value was calculated using the Wilcoxon rank sum test.