Gene Ther Mol Biol Vol 4, 313-322. December 1999.

 

Glycine clock: Eubacteria first, Archaea next, Protoctista, Fungi, Planta and Animalia at last

Research Article

 

Edward N. Trifonov

Department of Structural Biology, The Weizmann Institute of Science, Rehovot 76100, Israel

______________________________________________________________________________________

Correspondence: E. N. Trifonov, Department of Structural Biology, The Weizmann Institute of Science, Rehovot 76100, Israel. Fax: +972 8 934 2653; E-mail: edward.trifonov@weizmann.ac.il

Key words: evolutionary trees, triplet code, earliest proteins, amino-acid chronology, amino-acid composition, molecular evolution, codon chronology, primordial soup, multiple alignments

Received: 15 April 1999; accepted 25 April 1999

 

Summary

Twenty-five different single-factor criteria and hypotheses about the chronological order in which amino acids appeared in early evolution are summarized in a consensus ranking. All available knowledge and ideas about the origin and evolution of the genetic code are thus combined in a single list, in which the amino acids are ranked in descending order of age, starting with the earliest:

G, A, D, V, P, S, E, L, T, I, N, F, H, K, R, Q, C, M, Y, W

One may expect the earliest amino acids to dominate the composition of ancient proteins. Indeed, when homologous prokaryotic and eukaryotic protein sequences are aligned, the most frequent residue among the matching amino acids (presumably what remains of the common ancestral sequence) is glycine, which makes up about 14% of them, versus a glycine content of 6-7% in modern proteins. The glycine content of the matching residues may, then, serve as a measure of the time elapsed (glycine clock) since the separation of the compared species. This approach is applied to 370 pairwise alignments of protein sequences from over 100 species of 6 major kingdoms. In the derived evolutionary tree the kingdoms separate consecutively from the central stem in the order Eubacteria (13.5% G at the moment of separation), Archaea (11.5%), Protoctista (10.5%), Fungi (9%), Planta/Animalia (8%), largely consistent with common knowledge on the evolution of the kingdoms. The glycine content may thus serve as a time label that allows the separation of any two species to be traced back, with a potential accuracy of the order of 50 to 100 million years, all the way to the very origin of species.

 

 


I. Introduction

Molecular clocks, of which many sophisticated versions have been developed since the original suggestion by Zuckerkandl and Pauling (1962), suffer from numerous drawbacks (see, e.g., Doolittle, 1997; Ayala et al., 1998), especially when applied to very early molecular events. In particular, evolutionary rates are not constant, distance estimates are influenced by horizontal transfer, and double (multiple) replacements are difficult to account for. Quantitative evaluations of similarity in sequence comparisons become unreliable when too little of the common ancestor is left in the sequences. Moreover, sequence dissimilarity indicates the evolutionary distance between sequences, but the direction of time remains uncertain, resulting in so-called unrooted evolutionary trees. It would be highly desirable to find some internal property (or properties) of the sequences that would indicate their evolutionary age. One such property is suggested by the recently derived chronological ranking of amino acids, that is, the order of their appearance on the early evolutionary scene (Trifonov and Bettecken, 1997; Trifonov, 1999). The earliest amino acids should have been overrepresented in the earliest proteins, in which case amino-acid composition alone could serve as an indicator of the age of a protein. This approach, however, cannot be used in such a straightforward way, since all extant proteins are of the same age, if one assumes that proteins originate from their immediate and distant ancestors rather than being formed de novo (Zuckerkandl, 1976). One way to evaluate the amino-acid composition of the proteins of the distant past is to compare (align) related sequences from evolutionarily distant species and take the composition of the shared residues. As described below, the "common" composition of eukaryotic and prokaryotic sequences (which have evolved separately for about 3 Gyr) is indeed strongly biased towards the earliest amino acids, in particular glycine.
This suggests using the glycine content as a measure of the time elapsed since the separation of species (glycine clock), and constructing a rooted evolutionary tree on that basis.

 

II. Results and discussion

A. Amino-acid composition of early proteins

The earliest form of the triplet code has recently been reconstructed, consisting of 10 codons and 7 respective amino acids: ala, asp, gly, pro, ser, thr and val (Trifonov and Bettecken, 1997). The reconstruction was based on the natural expandability of (GCT)n sequences and on the universal (GCU)n pattern hidden in mRNA sequences (Lagunez-Otero and Trifonov, 1992). This suggested that the very first triplets were GCU and its 9 point-change derivatives. The list of the earliest amino acids above was reconstructed from the experiments of S. L. Miller (1987), from the chemical simplicity of the amino acids, and from their association with the more ancient class II aminoacyl-tRNA synthetases. Inspection of the triplet code table revealed a striking correspondence between all these residues and the GCU-derived codons (Trifonov and Bettecken, 1997). This gives reason to believe that the earliest proteins, perhaps long before the separation of eukaryotes from prokaryotes, were built from the above 7 ancient residues. At later stages, with the appearance of other amino acids, the dominance of the seven was surely compromised. However, one could expect that even at the stage of the eukaryote-prokaryote separation some of the ancient residues still prevailed. Further insight into the amino-acid chronology is provided by adding four more criteria of the amino acids' evolutionary age to the above three: the frequency of occurrence of the various amino acids in modern proteins, the stability of codon-anticodon interactions, the chemical inertness of the amino acids, and the GCU triplet-based list of amino acids as an independent criterion. Ranking analysis of the seven "chronologies" suggested by these criteria (Trifonov, 1999) resulted in the following list of amino acids, in descending order of their appearance on the evolutionary scene: ala, gly, ser, pro, val, thr, leu, asp, ile, glu, asn, phe, lys, arg, gln, cys, his, met, trp and tyr.
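As a quick check of the count above (GCU plus 9 point-change derivatives, encoding 7 amino acids), the sketch below enumerates every single-base substitution of GCU and translates the resulting codons with the standard genetic code. The code table is written as the usual 64-letter string of NCBI translation table 1 with bases ordered U, C, A, G; the names `CODE`, `point_neighbors` and `earliest_code` are illustrative choices, not part of the original analysis.

```python
BASES = "UCAG"
# Standard genetic code (NCBI translation table 1), indexed 16*i + 4*j + k
# for first, second and third bases taken in the order U, C, A, G.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def point_neighbors(codon):
    """All codons that differ from `codon` by exactly one base substitution."""
    return [
        codon[:pos] + base + codon[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != codon[pos]
    ]


def earliest_code(seed="GCU"):
    """Seed codon plus its point-change derivatives, and the amino acids they encode."""
    codons = [seed] + point_neighbors(seed)
    return codons, sorted({CODE[c] for c in codons})


codons, amino_acids = earliest_code("GCU")
print(len(codons), amino_acids)  # 10 ['A', 'D', 'G', 'P', 'S', 'T', 'V']
```

The seven one-letter codes are exactly ala, asp, gly, pro, ser, thr and val; none of the ten codons is a stop codon.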
The earliest proteins would therefore be expected to contain less of the latest amino acids, say, gln, cys, his, met, trp and tyr. As a matter of fact, these residues are indeed the least frequent even in extant proteins (see Figure 1), but the early proteins perhaps had even less of them. This is checked by aligning prokaryotic and eukaryotic sequences and comparing the amino-acid composition of their common parts (matching positions) with the composition of modern eukaryotic and prokaryotic proteins. Extending an earlier work (Trifonov, 1998), this analysis is performed on 70 arbitrarily chosen, functionally different aligned sequence pairs (Table 1), scoring a total of 5551 matching residues. The actual scores and amino-acid compositions in % are presented in Table 2 and in Figure 1 (under "common"). In this figure the composition values for prokaryotic and eukaryotic proteins (two upper plots) are taken from Arques and Michel (1996). The histograms in Figure 1 show, first of all, that in the common (about 3 billion years old) eukaryotic-prokaryotic material gly residues are significantly more frequent (about twice as frequent) than in modern proteins. This major bias is observed even when only 10 sequence pairs are taken for the analysis. Table 2 presents the amino-acid compositions for 7 different sets of 10 sequence pairs each (sequence Nos. 1-10, 11-20, 21-30, ... 61-70 of Table 1). In all cases the dominance of glycine is obvious: 12.7 to 15.5% versus 6-7% in modern proteins. To be sure that the bias is not due to the overrepresentation of some species, E. coli in particular (33 sequence pairs), two sets were assembled, one dominated by E. coli sequences (set 7) and another with E. coli sequences underrepresented (set 6). The gly content is high in both cases. In total, 27 different prokaryotic species and 32 eukaryotic species are represented in the 70 sequence pairs analyzed (Table 1).
The effect is therefore general, apparently reflecting the amino-acid composition of proteins at the moment of separation between prokaryotes and eukaryotes.
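The central quantity here, the composition of residues shared at aligned positions pooled over many sequence pairs, can be sketched as follows. The sketch assumes each alignment is given as two equal-length strings with '-' for gaps, and counts only columns where both residues are identical. The function name and the two toy alignments are hypothetical and are not taken from Table 1.

```python
from collections import Counter


def pooled_matching_composition(pairs):
    """Pool residues identical at the same column over several pairwise alignments.

    Each pair is two equal-length gapped strings ('-' marks a gap).
    Returns (total matching residues, {residue: percent of matches}).
    """
    counts = Counter()
    for seq_a, seq_b in pairs:
        if len(seq_a) != len(seq_b):
            raise ValueError("aligned sequences must have equal length")
        # Keep a residue only where both sequences carry it; gap-gap columns are skipped.
        counts.update(
            a for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
        )
    total = sum(counts.values())
    if total == 0:
        return 0, {}
    return total, {aa: 100.0 * n / total for aa, n in sorted(counts.items())}


# Two toy alignments (hypothetical, for illustration only)
toy_pairs = [("GAD-VPGS", "GSDKVAGS"), ("MKGL", "MRGL")]
total, composition = pooled_matching_composition(toy_pairs)
print(total, composition["G"])  # 8 37.5
```

Pooling the counts before dividing, rather than averaging per-pair percentages, weights each alignment by its number of matching residues, as a total such as 5551 matching residues implies.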

If the ratios of the occurrences in "common" to those in prokaryotes and in eukaryotes are considered, two more amino acids appear at the top: asp and pro (about 20% excess). All three, including glycine, belong to the earliest alphabet. That is, the earliest amino acids were still overrepresented at the time of the eukaryote-prokaryote separation. Glycine, aspartic acid and proline are known to be the residues most specific for the turns of folded polypeptide chains (Kwasigroch et al., 1996). Their unusual conservation thus indicates that turns are no less important than alpha-helices and beta-sheets in maintaining conserved protein structure.
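Ranking residues by this ratio amounts to dividing each residue's "common" percentage by its percentage in modern proteins. The sketch below is illustrated with glycine only, using the 14.3% reported for the Table 2 set and the 6-7% range quoted for modern proteins; the function name is hypothetical.

```python
def enrichment_ranking(common_pct, modern_pct):
    """Residues ranked by (% among shared positions) / (% in modern proteins), highest first."""
    ratios = {
        aa: common_pct[aa] / modern_pct[aa]
        for aa in common_pct
        if modern_pct.get(aa)  # skip residues without a nonzero modern reference value
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)


# Glycine only: 14.3% among shared residues vs. the 6-7% range in modern proteins
for modern in (7.0, 6.0):
    ((aa, ratio),) = enrichment_ranking({"G": 14.3}, {"G": modern})
    print(aa, round(ratio, 2))  # G 2.04, then G 2.38
```

A ratio of about 2 corresponds to the "about twice" excess of glycine; the roughly 20% excess quoted for asp and pro would appear as ratios near 1.2.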

Another conspicuous feature of the "common" distribution (Figure 1) is an abrupt drop in the composition values for the amino acids tyr, asn, his, gln, met, trp and cys. Five of them are among the latest in the amino-acid chronology (Trifonov, 1999, and manuscript in preparation).

It appears, thus, that about 3 billion years ago these "young" residues were just entering the scene and were therefore substantially less numerous than the "older" residues. Their share of the total, according to our data, was 10.7%, versus 30% for an even distribution of amino acids. No such step in the amino-acid composition is observed in modern proteins (Figure 1, upper plots), though the "young" residues are underrepresented there as well. It appears, thus, that since the time of the eukaryote-prokaryote separation the proportion of the "young" residues has increased, apparently in the process of their gradual accommodation and of the optimization of protein composition. The proportion of the latest residues, as well as the excess of the earliest (glycine) residues, may thus potentially serve for timing evolutionary bifurcations.

 


Figure 1. Amino-acid composition of matching residues in alignments of related prokaryotic and eukaryotic protein sequences ("common") as compared to modern proteins of prokaryotes and eukaryotes.

 

 


The exceptional status of glycine in molecular evolution was indicated earlier in a study of the correlation between evolutionary rate and amino-acid composition (Graur, 1985). "Almost uninterchangeable" glycine was found to be "one of the most conserved amino acids". This also suggests a higher content of glycine in older, conserved proteins. Being the smallest amino acid, glycine serves very much as a hinge in the polypeptide chain, providing it with high flexibility. Such conformational versatility would have been of high importance in the early stages of protein evolution. Later on, perhaps, as protein structure grew more sophisticated, the stability of the evolved conformations became more important, and the glycine content eventually came down to the modest present level.

B. The amino-acid and codon chronology

A more extended analysis involving 25 different amino-acid age criteria (manuscript in preparation) arrives at a chronology very similar to the one listed above. The vertical column on the left of Figure 2 represents the order in which the amino acids presumably appeared on the evolutionary scene. All available knowledge and ideas about the origin and evolution of the genetic code are combined in this single list, where the amino acids are ranked in descending order, starting with the earliest ones. The ranking is inevitably of rather poor accuracy: compared with the earlier 7-criteria list, the calculated ranks typically differ by 1-2 positions.
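The consensus step, ordering amino acids by their average rank across criteria (as stated in the caption of Figure 2), can be sketched as below. The three input rankings are hypothetical toy criteria over four residues, not the actual 25 criteria, and breaking ties alphabetically is an arbitrary assumption of this sketch.

```python
def consensus_ranking(rankings):
    """Order items by mean rank (1 = earliest) across several complete rankings."""
    items = set(rankings[0])
    if any(set(r) != items or len(r) != len(items) for r in rankings):
        raise ValueError("every ranking must list the same items exactly once")
    mean_rank = {
        item: sum(r.index(item) + 1 for r in rankings) / len(rankings)
        for item in items
    }
    # Ties are broken alphabetically (an arbitrary choice for this sketch).
    order = sorted(items, key=lambda item: (mean_rank[item], item))
    return order, mean_rank


# Three hypothetical criteria, each ranking four residues from earliest to latest
criteria = [
    ["A", "G", "S", "P"],
    ["G", "A", "P", "S"],
    ["G", "S", "A", "P"],
]
order, mean_rank = consensus_ranking(criteria)
print(order)  # ['G', 'A', 'S', 'P']
```

Here glycine wins the consensus (mean rank 4/3) even though one criterion puts alanine first, which illustrates how a consensus can differ from any single input ranking.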

 


Table 1. Aligned prokaryotic-eukaryotic protein sequence pairs.

("same" indicates the same protein name in both species.)

| No. | Prokaryote | Eukaryote | Prokaryotic protein (gene) | Eukaryotic protein (gene) | Reference |
|---|---|---|---|---|---|
| 1 | Escherichia coli | human | thymidylate synthase | same | Gene 150, 221, 1994 |
| 2 | Halobact. cutirubrum | C. elegans | hypothetical G-protein | same | Gene 151, 153, 1994 |
| 3 | Bacteroides fragilis | maize | pyruvate dikinase | same | Gene 151, 173, 1994 |
| 4 | Flavobact. meningosepticum | pig | prolyl endopeptidase | same | Gene 152, 103, 1995 |
| 5 | Escherichia coli | rabbit | phosphofructokinase | ATP-dep. phosphofructokinase | Gene 152, 181, 1995 |
| 6 | Bacillus circulans | Brugia malayi (nematode) | chitinase A3 | chitinase | Gene 153, 147, 1995 |
| 7 | Enterococcus faecium | carrot | dihydrofolate reductase | same | Gene 154, 7, 1995 |
| 8 | Agrobact. tumefaciens | X. laevis | arginase | same | Gene 154, 115, 1995 |
| 9 | Escherichia coli | human | ribosomal protein S1 | same, repeat 2 | Gene 155, 231, 1995 |
| 10 | Escherichia coli | human | glutathione reductase | same | Gene 156, 123, 1995 |
| 11 | Escherichia coli | mouse | ribose 5-phosphate isomerase | same | Gene 156, 191, 1995 |
| 12 | Escherichia coli | tomato | RNase I | RNase LE | Gene 158, 203, 1995 |
| 13 | Clostridium acetobutylicum | C. elegans | 3-hydroxyacyl CoA dehydrogenase | same (F54C8.6) | Gene 160, 309, 1995 |
| 14 | Alcaligenes | Arabidopsis thaliana | nitrilase | same | Gene 161, 15, 1995 |
| 15 | Escherichia coli | Arabidopsis thaliana | adenine phosphoribosyltransferase | same | Gene 161, 81, 1995 |
| 16 | Pseudomonas | Aspergillus nidulans | NAD-dep. formate dehydrogenase | same | Gene 162, 99, 1995 |
| 17 | Escherichia coli | rat | arginyl-tRNA synthetase | same | Gene 164, 347, 1995 |
| 18 | Escherichia coli | mouse | RNA polymerase subunit α | RNA polymerase I/III AC40 | Gene 167, 203, 1995 |
| 19 | Escherichia coli | C. elegans | RNA polymerase subunit α | RNA polymerase III AC16 | Gene 172, 211, 1996 |
| 20 | B. stearothermophilus | Plasmodium knowlesi | valine-tRNA synthetase | same | Gene 173, 137, 1996 |
| 21 | B. cereus | rabbit | thermolysin | microsomal endopeptidase | Gene 174, 135, 1996 |
| 22 | B. subtilis | mouse | inosine monophosphate dehydrogenase | same | Gene 174, 209, 1996 |
| 23 | B. subtilis | human | methylenomycin A resistance protein | glucose transporter type I | Gene 175, 223, 1996 |
| 24 | Escherichia coli | Aspergillus nidulans | NARK nitrate transporter | CRNA nitrate transporter | Gene 175, 223, 1996 |
| 25 | Lactobacillus sake | Chinese hamster | SapT (sakacin synthesis) | multidrug resistance protein | Gene 176, 55, 1996 |
| 26 | Rhodobacter capsulatus | Triticum aestivum | S-adenosylhomocysteine hydrolase | same | Gene 177, 17, 1996 |
| 27 | B. subtilis | rat | 3-methyladenine DNA glycosylase | same | Gene 177, 229, 1996 |
| 28 | Escherichia coli | rice | Mrp (ATPase) | EST D25016 (ATPase) | Gene 178, 97, 1996 |
| 29 | Escherichia coli | red alga | 3-ketoacyl-acyl carrier prot. synthase | same | Gene 182, 45, 1996 |
| 30 | B. subtilis | Arabidopsis thaliana | protoporphyrinogen oxidase | same | Gene 182, 169, 1996 |
| 31 | Pseudomonas putida | human | glyoxalase I | same | Gene 186, 103, 1997 |
| 32 | Escherichia coli | mouse | spermidine synthase | same | Gene 187, 35, 1997 |
| 33 | Escherichia coli | rabbit | glutaredoxin | same | Gene 188, 23, 1997 |
| 34 | Zymomonas mobilis | human | glyceraldehyde-3-phosphate DH | same | Gene 188, 221, 1997 |
| 35 | Zymomonas mobilis | human | phosphoglycerate kinase | same | Gene 188, 221, 1997 |
| 36 | B. megaterium | human | triosephosphate isomerase | same | Gene 188, 221, 1997 |
| 37 | Escherichia coli | Brassica napus | phosphoenolpyruvate carboxykinase | same | Gene 192, 235, 1997 |
| 38 | B. subtilis | rat | peptidylprolyl cis-trans isomerase | same | Gene 193, 65, 1997 |
| 39 | B. subtilis | X. laevis | arginase | same | Gene 193, 157, 1997 |
| 40 | Escherichia coli | dog | signal peptidase I | same | Gene 194, 249, 1997 |
| 41 | Rhizobium leguminosarum | D. discoideum | orotate phosphoribosyltransferase | same | Gene 195, 329, 1997 |
| 42 | B. subtilis | human | myo-inositol 2-dehydrogenase | biliverdin reductase | Gene 196, 209, 1997 |
| 43 | Escherichia coli | Schistosoma mansoni | cold-shock protein CSPA | Y-box binding protein | Gene 198, 5, 1997 |
| 44 | Streptococcus mutans | tobacco | non-phosphorylating GAPN | same | Gene 198, 237, 1997 |
| 45 | Staphylococcus xylosus | human | histone deacetylase (acuC) | same (HDm) | Gene 198, 275, 1997 |
| 46 | Escherichia coli | human | heat-shock protein HSP 60 | same | Gene 199, 83, 1997 |
| 47 | Escherichia coli | human | porphobilinogen deaminase | same | Gene 199, 231, 1997 |
| 48 | Synechococcus | barley | HemL protein | same | Gene 199, 231, 1997 |
| 49 | Escherichia coli | D. melanogaster | RNA helicase | same | Gene 199, 241, 1997 |
| 50 | P. aeruginosa | T. brucei | mercuric reductase | trypanothione reductase | Gene 200, 163, 1997 |
| 51 | B. subtilis | Geodia cydonium | alcohol dehydrogenase | AidB-like protein | J. Mol. Evol. 47, 343, 1998 |
| 52 | Legionella pneumophila | bovine | Cu,Zn superoxide dismutase | same | J. Mol. Biol. 274, 408, 1997 |
| 53 | Thermus aquaticus | mouse | DNA polymerase (5'-3' exonuclease domain) | flap endonuclease (FEN-1) | J. Biol. Chem. 272, 28531, 1997 |
| 54 | M. genitalium | tobacco | uracil phosphoribosyltransferase | same | EMBO J. 17, 3219, 1998 |
| 55 | Synechococcus elongatus | Chlamydomonas reinhardtii | photosystem II RC domain | same | J. Mol. Biol. 280, 1998 |
| 56 | Rhodobacter capsulatus | tobacco | uroporphyrinogen decarboxylase | same | EMBO J. 17, 2463, 1998 |
| 57 | Streptomyces hydrogenans | Drosophila lebanonensis | 3α,20β-hydroxysteroid dehydrogenase | alcohol dehydrogenase | J. Mol. Biol. 282, 383, 1998 |
| 58 | Escherichia coli | C. elegans | transition metal transporter | same | J. Biol. Chem. 272, 28485, 1997 |
| 59 | T. thermophilus | human | histidyl-tRNA synthetase | same | J. Mol. Biol. 280, 847, 1998 |
| 60 | Escherichia coli | D. melanogaster | pspE | HSP67Bb | J. Mol. Biol. 282, 195, 1998 |
| 61 | Escherichia coli | human | GTP-binding protein (FtsY) | same (SRα) | Gene 201, 37, 1997 |
| 62 | Escherichia coli | rabbit | trehalase | same | Gene 202, 69, 1997 |
| 63 | Escherichia coli | D. melanogaster | parvulin | Dodo protein | Gene 203, 89, 1997 |
| 64 | Escherichia coli | rat | aminopeptidase N | same | Biochemistry 37, 686, 1998 |
| 65 | Escherichia coli | Brugia malayi | asparaginyl-tRNA synthetase | same | EMBO J. 17, 2947, 1998 |
| 66 | Escherichia coli | human | glutathione S-transferase | same | J. Mol. Biol. 271, 135, 1998 |
| 67 | Escherichia coli | rice | thioredoxin | glutaredoxin | J. Mol. Biol. 281, 949, 1998 |
| 68 | Escherichia coli | human | glutaredoxin | thioredoxin | J. Mol. Biol. 281, 949, 1998 |
| 69 | Escherichia coli | Flaveria trinervia | phosphoenolpyruvate carboxylase | same | J. Mol. Evol. 46, 107, 1998 |
| 70 | Escherichia coli | human | periplasmic cyclophilin | cyclophilin A1 | EMBO J. 17, 2463, 1998 |

 

 


Despite this uncertainty, owing to its consensus nature the chronology has several important properties not visible in the individual rankings by any of the initial criteria. The conclusion of the earlier GCU-based theory on the structure of the earliest code is confirmed: all 7 earliest amino acids are indeed found at the top of the consensus chronology (G, A, D, V, P, S and T). The ten amino acids of Miller's imitation of the primordial soup are all ranked topmost (G, A, D, V, P, S, E, L, T, I). This result is especially important, since it confirms that the experimental conditions chosen by Miller are indeed close to the primordial ones, and that the first amino acids acquired by emerging life were synthesized abiotically.

The consensus order of appearance of the 20 amino acids on the evolutionary scene also reveals a unique and simple chronological organization of the 64 codons that could not be figured out from the individual criteria: new codons appear in complementary pairs, with the complement recruited from the codon repertoire of earlier or simultaneously appearing amino acids. The resulting codon chronology also reveals that, of the alternative codon-anticodon pairs, the most stable ones appear first, if not all together.
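To see why the 64 codons fall naturally into the 32 codon pairs of Figure 2, the sketch below pairs each triplet with its complement. It assumes that "complementary" means antiparallel Watson-Crick pairing (the reverse complement), under which GCC pairs with GGC; the helper names are hypothetical.

```python
from itertools import product

PAIRING = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(codon):
    """Triplet that pairs antiparallel with `codon`, read 5'->3'."""
    return "".join(PAIRING[base] for base in reversed(codon))


def complementary_codon_pairs():
    """Partition all 64 RNA triplets into unordered complementary pairs."""
    codons = ("".join(t) for t in product("UCAG", repeat=3))
    return sorted({tuple(sorted((c, reverse_complement(c)))) for c in codons})


pairs = complementary_codon_pairs()
print(len(pairs))                 # 32
print(reverse_complement("GCC"))  # GGC
```

The partition is exact because no triplet is its own reverse complement: its middle base would have to pair with itself, which no base does.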

Contrary to the GCU-based theory of the origin of the code, it is glycine rather than alanine that appears at the top of the list. Actually, they appear simultaneously, within the accuracy of the ranking (manuscript in preparation). The apparent contradiction, however, rather suggests a correction to the GCU model. As indicated in the paper on the GCU theory (Trifonov and Bettecken, 1997), the GCC triplet and its point-change derivatives correspond to the same seven earliest amino acids. The first codons could thus indeed have been GCC and GGC, for alanine and glycine respectively, in accordance with the chronology displayed in Figure 2. This pair of codons was suggested as the earliest 20 years ago by Eigen and Schuster (1978). What matters for the next section is that glycine is one of the earliest amino acids. It apparently took over at some time in early evolution, becoming a dominant residue (see Figure 1).
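The statement that GCC and its point-change derivatives encode the same seven amino acids as GCU can be checked with a short sketch. It uses the standard genetic code (NCBI translation table 1 written as a 64-letter string, bases ordered U, C, A, G); the function name is illustrative.

```python
BASES = "UCAG"
# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def neighborhood_amino_acids(seed):
    """Amino acids encoded by `seed` and all of its single-base substitutions."""
    codons = {seed} | {
        seed[:pos] + base + seed[pos + 1:]
        for pos in range(3)
        for base in BASES
        if base != seed[pos]
    }
    return {CODE[c] for c in codons}


print(sorted(neighborhood_amino_acids("GCC")))  # ['A', 'D', 'G', 'P', 'S', 'T', 'V']
print(neighborhood_amino_acids("GCC") == neighborhood_amino_acids("GCU"))  # True
```

GGC (glycine) is itself one of the nine derivatives of GCC (second-position C to G), so the proposed first pair lies inside the same seven-amino-acid neighborhood.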

 

C. Glycine clock and evolutionary tree for six major kingdoms

Calculations similar to those made for prokaryotes and eukaryotes, as presented in Tables 1 and 2, are performed for sequence pairs from 6 major kingdoms: eukaryotes (Protoctista, Fungi, Planta and Animalia) and prokaryotes (Eubacteria and Archaea). In total, 370 sequence pairs are analyzed, and the average glycine content among the shared residues is calculated for each of the 15 groups of kingdom-to-kingdom sequence comparisons. The functionally diverse sequences are taken from the literature, basically at random. They represent as large a variety of species as exemplified by Table 1. Table 3 presents the derived values together with the actual scores (in brackets, glycine/total). The number of sequence pairs used for the analysis is indicated as well (after the comma). The errors are calculated on the assumption that the scatter in the actual glycine scores follows a normal distribution with STD equal to the square root of the score.
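The error model stated here (STD of a glycine score equal to its square root) gives a simple formula for each cell of Table 3: % GLY = 100·g/N and STD = 100·√g/N for g glycines among N shared residues. The sketch below reproduces the Eubacteria-Animalia cell from its raw scores; the function name is hypothetical.

```python
import math


def percent_gly(gly_score, total_shared):
    """% glycine among shared residues and its STD, taking STD(score) = sqrt(score)."""
    pct = 100.0 * gly_score / total_shared
    std = 100.0 * math.sqrt(gly_score) / total_shared
    return pct, std


pct, std = percent_gly(584, 3935)  # Eubacteria vs. Animalia scores from Table 3
print(f"{pct:.1f} ± {std:.1f}%")   # 14.8 ± 0.6%
```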

The highest glycine content among the shared residues of the aligned sequences is observed for Eubacteria (see Table 3). The respective % GLY values vary between 12.1 ± 1.2% and 14.8 ± 0.6%, with an average of 13.7 ± 0.3%. If only eukaryotes are taken for alignment with the eubacterial protein sequences, as in Table 2, the average % GLY value from the new set of sequences is 1460/10602 = 13.8 ± 0.4%, compared with 14.3 ± 0.5% for the earlier set (Table 2), which is indistinguishable within the error bars. The % GLY values for Archaea compared with the four eukaryotic kingdoms vary between 11.3 ± 1.0% and 13.3 ± 1.5%, with an average of 11.7 ± 0.6%, clearly lower than the above average value for Eubacteria. This would correspond to a later separation of the Archaea from eukaryotes, some time after the Eubacteria. The % GLY value for the Archaea-Eubacteria separation, on the other hand, is close to the separation level for Eubacteria, as would be expected: 12.8 ± 0.9% vs. 13.7 ± 0.3%. Similarly, the % GLY values for the later separations of Protoctista, Fungi and Planta are progressively lower, while comparisons of their sequences with older kingdoms give higher % GLY values, corresponding to the separation times of the latter.
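The averages quoted here are consistent with pooling the raw scores (summing glycines and totals over the alignments) rather than averaging the percentages. A sketch, with scores taken from Table 3 and a hypothetical function name:

```python
import math


def pooled_gly(cells):
    """Pool (glycine score, total shared residues) pairs into one % GLY ± STD."""
    gly = sum(g for g, _ in cells)
    total = sum(t for _, t in cells)
    return gly, total, 100.0 * gly / total, 100.0 * math.sqrt(gly) / total


# Eubacteria vs. Animalia, Planta, Fungi and Protoctista (scores from Table 3)
eub_vs_eukaryotes = [(584, 3935), (313, 2381), (468, 3502), (95, 784)]
gly, total, pct, std = pooled_gly(eub_vs_eukaryotes)
print(f"{gly}/{total} = {pct:.1f} ± {std:.1f}%")  # 1460/10602 = 13.8 ± 0.4%

# Adding the Archaea comparison gives the Eubacteria branching level
print("%.1f" % pooled_gly(eub_vs_eukaryotes + [(187, 1462)])[2])  # 13.7
```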

The % GLY values are arranged in Table 3 in such a way that the row averages give the branching level of % GLY for the respective kingdoms. Of the 15 kingdom-to-kingdom % GLY values only 3 are more than 1 STD off the respective averages, fewer than the roughly 5 (32% of 15) expected to deviate that much under a normal distribution, which justifies the assumed normal distribution of the % GLY estimates. The evolutionary tree based on the % GLY values of Table 3 is shown in Figure 3. This tree is very much consistent with the trees derived from molecular clock calculations (Feng et al., 1997; Doolittle, 1997; Otsuka et al., 1999). If the time between the branching of plants and that of Eubacteria is taken to be 2 Gyr, then 1% GLY corresponds to about 350 Myr (2000 Myr divided by the 13.7 - 8.1 = 5.6% GLY difference between the two branching levels). This provides an approximate calibration of the glycine clock. At this early stage of its development, the linear calibration is an understandable simplification. Both Table 3 and Figure 3 represent first estimates of the branchings of the major kingdoms, based on only 370 sequence pairs. The number of sequences can be substantially increased (say, to many thousands), so that the tree can be further improved towards better accuracy. However, as the current error bars indicate, the overall topology of the basic tree will most likely stay unchanged.
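The linear calibration reduces to one division. The sketch below uses the branching levels of Table 3 and the 2 Gyr Eubacteria-to-Planta interval assumed in the text. Converting the other branching levels into time before the Planta branching is our illustration of that linear assumption, not a result stated in the text, and the names are hypothetical.

```python
# % GLY branching levels (row averages of Table 3)
BRANCHING_LEVEL = {
    "Eubacteria": 13.7,
    "Archaea": 11.7,
    "Protoctista": 10.6,
    "Fungi": 8.9,
    "Planta": 8.1,
}


def myr_per_percent_gly(older, younger, interval_myr):
    """Linear calibration: Myr per 1% difference in branching-level % GLY."""
    return interval_myr / (BRANCHING_LEVEL[older] - BRANCHING_LEVEL[younger])


def myr_before(kingdom, reference, rate):
    """Time between two branching events under the linear calibration (illustrative)."""
    return (BRANCHING_LEVEL[kingdom] - BRANCHING_LEVEL[reference]) * rate


rate = myr_per_percent_gly("Eubacteria", "Planta", 2000.0)  # assumed 2 Gyr interval
print(f"1% GLY ~ {rate:.0f} Myr")  # 1% GLY ~ 357 Myr
for kingdom in ("Archaea", "Protoctista", "Fungi"):
    print(kingdom, round(myr_before(kingdom, "Planta", rate)), "Myr before Planta")
```

On this scale the ±0.3% GLY error bars of the best-populated rows of Table 3 correspond to roughly 100 Myr.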


 

 


Table 2. Amino-acid composition of common residues in eukaryotic-prokaryotic sequence alignments


 


Figure 2. Chronology of 32 codon pairs. The amino-acid chronology is calculated as the average ranking based on 25 different criteria. The codon chronology is one simple way of arranging the 64 triplets in accordance with the amino-acid chronology. Of alternative codons, those that make the most stable codon-anticodon pairs are engaged first (bold). In this case a complementary triplet is always available from the codon repertoires of earlier amino acids.

 


Table 3. Contents of shared glycine (%) in kingdom-to-kingdom protein sequence alignments

 

 

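| | ANIMALIA | PLANTA | FUNGI | PROTOCTISTA | ARCHAEA | Branching level |
|---|---|---|---|---|---|---|
| PLANTA | 8.1 ± 0.6 (193/2194, 25) | | | | | 8.1 ± 0.6 (193/2194, 25) |
| FUNGI | 8.8 ± 0.4 (573/6479, 70) | 9.1 ± 0.7 (179/1977, 23) | | | | 8.9 ± 0.3 (752/8456, 93) |
| PROTOCTISTA | 11.1 ± 1.1 (98/879, 11) | 9.8 ± 0.8 (156/1595, 10) | 11.4 ± 1.0 (137/1200, 11) | | | 10.6 ± 0.5 (391/3674, 32) |
| ARCHAEA | 11.3 ± 1.0 (128/1133, 18) | 11.7 ± 1.7 (49/418, 12) | 11.3 ± 1.0 (132/1170, 19) | 13.3 ± 1.5 (82/616, 8) | | 11.7 ± 0.6 (391/3337, 57) |
| EUBACTERIA | 14.8 ± 0.6 (584/3935, 63) | 13.1 ± 0.7 (313/2381, 21) | 13.4 ± 0.6 (468/3502, 46) | 12.1 ± 1.2 (95/784, 10) | 12.8 ± 0.9 (187/1462, 23) | 13.7 ± 0.3 (1647/12064, 163) |

Each entry: % GLY ± STD (glycine/total shared residues, number of sequence pairs). The last column gives the branching level of the kingdom in that row.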

 

 


It is noteworthy that the glycine clock approach (or, presumably, any other approach based on the content of the earliest amino acids) apparently provides both evolutionary distance (in % GLY time units in this case) and directionality (the larger the branching % GLY value, the older the separation event). This would allow the construction of a detailed rooted tree, with further subdivisions of the kingdoms and a potential resolution of 50 to 100 Myr, improving as more sequences are taken for the alignments. The technique is especially promising for dating the earliest separations, where the sensitivity of the classical molecular clock is low. The tree in Figure 3 is presented in its simplest form, with a central stem from which the respective kingdoms separate in the chronological order indicated. Animalia rather than Planta are chosen to crown the tree, to reflect the obvious trend displayed by the tree, from the simplest to the most complex. Indeed, anuclear prokaryotes separate first, followed by the nucleated eukaryotes. The eukaryotes, in turn, progress from unicellular to multicellular, differentiated organisms. In a way, at each stage the simpler forms separated from the stem, which continued to evolve towards yet more complex forms. In that sense the common ancestor of all kingdoms, though perhaps as simple as Eubacteria at the moment of their separation, was omnipotent, carrying all the elements that later evolved into the higher complexity of the younger kingdoms. The higher evolutionary potential stayed associated with the main stem at every subsequent branching. The branches of the kingdoms in Figure 3 are not continued to the top of the tree, to the typical modern 6-7% GLY, although this is implied, in order to better reflect the linear succession of the branching events.
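A back-of-the-envelope sketch of how many shared residues such a resolution would require, under the same error model as Table 3. With STD of the glycine score equal to its square root, the STD of % GLY for glycine fraction p among N shared residues is 100·√(p/N), and the calibration is taken as about 357 Myr per 1% GLY. Treating 1 STD as the "resolution" and setting p = 0.10 are our assumptions, and the function name is hypothetical.

```python
import math


def residues_needed(resolution_myr, myr_per_pct=357.0, gly_fraction=0.10):
    """Shared residues N for which 1 STD of % GLY spans `resolution_myr`.

    With STD(glycine score) = sqrt(p*N), STD(% GLY) = 100*sqrt(p/N);
    setting this equal to resolution_myr / myr_per_pct and solving gives
    N = 1e4 * p / (target STD in %)**2.
    """
    target_sd_pct = resolution_myr / myr_per_pct
    return math.ceil(1e4 * gly_fraction / target_sd_pct ** 2)


for res in (100, 50):
    print(res, "Myr:", residues_needed(res), "shared residues")
```

At about 74 shared residues per sequence pair (12064/163 in the Eubacteria row of Table 3), this amounts to something on the order of 170 pairs for 100 Myr and 690 pairs for 50 Myr per kingdom-to-kingdom comparison, consistent with resolution improving as more sequences are aligned.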

Apart from the appealing simplicity of the glycine clock, its directionality and its applicability to the earliest branchings, this technique is substantially less dependent on the effects of horizontal transfer and of variations in evolutionary rates, since these are averaged over the large number of sequences taken for the calculations.

 

III. Sequences and methods

The aligned prokaryotic-eukaryotic sequence pairs are collected from literature, irrespective of the alignment technique chosen by the authors of the original papers. To ensure random choice of the sequences, all alignments published in Gene