Biochemistry 811

First Lecture on Evolution

D. Nelson, last modified Nov. 23, 2002 8AM

Reading: Berg, Tymoczko and Stryer 5th edition, Chapter 7 and links in the text below

Christian Anfinsen began the preface to his book The Molecular Basis of Evolution with this sentence: "The writing of this book has been stimulated by the excitement and promise of contemporary protein chemistry and genetics and by the possibility of integration of these fields... " This was written in May 1959. More than forty years later it is still true. Possibly one of the most famous quotes from any biologist is from Theodosius Dobzhansky. It was actually the title of an article in American Biology Teacher 35, 125-129 1973. The title was "Nothing in Biology Makes Sense Except in the Light of Evolution." That is even more true now than when it was first written.

Today, we will begin the topic of evolution. This is the concept that all life on the planet is derived from a common ancestor. The biochemistry of living organisms that we have been studying in this course is a collection of accumulated successful strategies from billions of years of experimentation in life.

The best estimate for how old life is on earth has been revised to 3.85 billion years (Nature Nov. 7, 1996). This is based on carbon isotope ratios in some of the oldest sedimentary rocks known on earth, from the Isua rock belt in Greenland. These rocks do not contain visible microfossils, but living cells preferentially incorporate the lighter isotope of carbon, C12, as opposed to C13 or C14. Material that has originated from living things has a ratio of these isotopes of carbon that reflects depletion of the heavier isotopes. Carbon from non-biological inorganic material (calcium carbonate, or limestone) has a different ratio. The carbon isotope ratios seen in these 3.85 billion year old rocks look as though they derived from living cells. Similar depletion of heavy carbon isotopes has been reported for a Martian meteorite thought to contain microfossils (see Science 274, Nov. 8 1996, p. 918). This has now been argued to come from earthly contaminants, and the microfossils seem to be too small to have been real cells. In August 2002 a report appeared in Nature suggesting the rocks in Greenland might be contaminated with graphite from a younger source (van Zuilen MA, Lepland A, Arrhenius G. Reassessing the evidence for the earliest traces of life. Nature 2002 Aug 8;418(6898):627-30). The issue is not settled.
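
The isotope argument above can be made concrete with the delta notation that geochemists conventionally use: the C13/C12 ratio of a sample is compared with that of a reference standard and the difference is expressed in parts per thousand. The sketch below is illustrative only. The reference ratio and both sample ratios are assumed values chosen to show the direction of the effect, not measurements from the Isua rocks.

```python
# Illustrative sketch of a delta-C13 calculation (all numbers are assumptions).

# Assumed C13/C12 ratio of the reference standard. This is a commonly quoted
# figure for a carbonate standard; treat it as a placeholder.
R_STANDARD = 0.0112372


def delta13c(r_sample, r_standard=R_STANDARD):
    """Return delta-C13 in per mil: ((R_sample / R_standard) - 1) * 1000."""
    return (r_sample / r_standard - 1.0) * 1000.0


# Hypothetical samples: one depleted in C13 by 2.5 percent relative to the
# standard, and one with exactly the standard's ratio.
r_biological = R_STANDARD * 0.975
r_carbonate = R_STANDARD

print(round(delta13c(r_biological), 1))  # negative: depleted in the heavy isotope
print(round(delta13c(r_carbonate), 1))   # zero: same ratio as the standard
```

A negative value means the sample is depleted in the heavier isotope relative to the standard, which is the signature the lecture describes for carbon that has passed through living cells.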

Much of the information that follows is taken from Vital Dust, by Christian de Duve. He has a new book out called Life Evolving, but I have not read it yet.

What does a 3.85 billion year old biomarker mean for the origin of life? The planet formed about 4.5 billion years ago and it is thought that the surface was either molten or under continuous bombardment from space early in its history until about 4 billion years ago. Meteor impacts and volcanic activity would have made the surface unfit for life. The probable existence of life at 3.85 billion years means that life arose on the planet almost as soon as it was possible for it to exist on the surface. Therefore, the origin of life on earth was very rapid. Some, like Francis Crick, believe it was too rapid, that 150 million years is not enough time. They propose a panspermia model where life reached earth from space. This is not the mainstream view. The earliest microfossil evidence of cells that resemble cyanobacteria comes from the early Archean Apex chert of Western Australia, dated at about 3.5 billion years (Schopf, Science 260, 640-646 1993). Other evidence of ancient life on earth comes from formations called stromatolites. These are columns of fossilized material that look just like modern structures seen in various sites around the world, for example Shark Bay, Australia. The modern structures are formed by bacterial colonies that exist in mats, with phototrophs on the top and heterotrophs lower down. The colonies grow into columnar structures, accumulating debris and eventually fossilizing. The structures of 3.5 billion year old stromatolites look nearly identical to modern stromatolites that are still growing. An article in the October 3, 1996 Nature, p. 385, proposed that these structures may have formed from non-biological processes involving some fractal growth patterns. However, the microfossil evidence and modern day stromatolite structure support the biological origin of stromatolites.

We will assume from this evidence that life on earth dates back 3.85 billion years. The microfossils in stromatolites that appear similar to modern cyanobacteria suggest that life evolved to a form similar to today's bacteria very early and that the bacteria's outward appearance, at least, has changed little in the intervening 3.5 billion years. All present day life is based on a nucleic acid informational molecule, either DNA or RNA, that encodes the information needed to make a living cell. The information is coded in a triplet code and, though there are some examples of slight variations in this code, no radical departures from it exist. This "universal" code is translated into proteins by a complex machinery called the ribosome that is also shared among all living things. These main features of information storage and retrieval are conserved and provide convincing evidence that all life on earth has one origin and shares a common ancestor.

Since life began, it has been changing in permissible directions. Physical constraints on the chemistry of life, including the properties of water, the nature of carbon and other key aspects of biology, have allowed variations on the original theme, but only within certain boundaries. However, 3.85 billion years is a long time and many variations have been tried out and many have been successful. Because the informational molecule has to transmit the blueprint of a cell through time, it is the information that has changed. This molecule has been copied billions of times, but not without some errors creeping in. Today, we can look back in time by comparing the sequences of the nucleotides and the translated sequences of the proteins from present day organisms. By quantitating the differences between the sequences and by making some modest assumptions about rates of change in the sequences, we can estimate when different organisms diverged from one another. These dates can be calibrated by the fossil record, which can be dated to high precision. Very similar organisms have very similar sequences, and more distant relatives have more differences in their code. This is the concept of the molecular clock. With this very simple idea, one can use sequences to build trees with branches representing different species. The relationships between organisms can be graphed in this way. If enough organisms are included and the most conserved sequences are used, a tree of life can be constructed. This is an approximate genealogy of present day organisms, and it is very interesting. In your handout, look at the tree for the eukaryotes. This was made by joining four different proteins together to make a longer "super protein" for sequence analysis. The proteins used are EF1 alpha, alpha and beta tubulin and actin. These are highly conserved eukaryotic proteins suitable for this tree making purpose.
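
The molecular clock idea can be sketched in a few lines of code. This is a minimal illustration, not the method used to build the handout tree: the sequence fragments are made up, the substitution rate is an assumed number, and the simple fraction of differing positions used here undercounts change between distant relatives because repeated changes at the same site are hidden.

```python
def p_distance(seq_a, seq_b):
    """Fraction of aligned positions that differ (gap positions '-' are skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def divergence_time(distance, rate_per_site_per_year):
    """Years since two lineages split; changes accumulate along both branches."""
    return distance / (2.0 * rate_per_site_per_year)


# Hypothetical aligned fragments of two different proteins from two species.
species_1 = ["MKTAY", "GLSDE"]
species_2 = ["MKSAY", "GLSDQ"]

# Join the per-protein alignments into one longer "super protein".
super_1 = "".join(species_1)
super_2 = "".join(species_2)

d = p_distance(super_1, super_2)   # 2 differences in 10 positions
t = divergence_time(d, 1e-9)       # assumed rate: 1e-9 changes per site per year
print(d, t)
```

In practice the rate would be calibrated from a pair of organisms whose divergence is dated by fossils, and then applied to pairs that lack a fossil date.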

In such a tree, the branches always diverge. They do not merge back together, because species do not merge, except in very rare events. The place where two branches come together is the point in time when they were the same species. Farther and farther back on the tree are divergences that are deeper and more ancient. If we go back far enough, the most distant branches meet at the common ancestor, the single celled organism that gave rise to all life on the planet. This model of evolution has been challenged lately by evidence that early bacteria shared their DNA often, leading to a network topology rather than a branching bush. This sharing of genetic material has become less common as the different lineages have become less similar to each other. However, this makes any effort to go back and identify the common ancestor impossible, because the definition of species was fuzzy at these early times.

To build a tree like this is not trivial. It takes some care to pick the right sequences to use, because not all sequences are appropriate for this job. Even in a single gene, not all of the gene is useful for this task. Frequently, only the most conserved parts of a gene are included in building trees. Ideally, the most ancient features common to all life would be good candidates to compare. This has been done most often with ribosomal RNA, since all life has this molecule and it is so central to life that it has to be highly conserved. This is the basis of the rRNA database project. Here 57,773 rRNA sequences have been used to make trees. These trees include every kind of life possible to get a pretty comprehensive tree of life. The results look like the tree in your handout. Here representatives from the major groups of organisms are shown. Every other organism will fit on this tree very close to an existing branch, so it does little good to use all the sequences, because it just clutters up the picture. It is clear from this tree that there are three main divisions in life. These have been called domains. In a hierarchy of life, domains are higher than kingdoms. The three domains are bacteria, archaea and eukarya. The taxonomy of life at these highest levels is a matter of philosophy. Ernst Mayr, one of the most prominent of all evolutionary biologists, does not accept the three domain system. He argues that the fundamental division in life is the prokaryote eukaryote division and that the archaea are just another group of bacteria that do not deserve domain status. (PNAS 95, 9720 1998 Aug. 18 issue).

Bacteria and archaea are both prokaryotes, without a nucleus. They are as different from each other as each is from the eukarya, so even though they are prokaryotes, it is difficult to lump them together. In fact, the most common version of this tree shows the archaea to be more closely related to eukarya, but this is a debated issue. Researchers into the early evolution of life are actively trying to resolve this issue, but it is proving to be very hard. The evidence accumulating now shows that archaea are a mix of bacterial-like and eukaryotic-like genes. This data suggests that early prokaryotes swapped genes frequently, and this scrambles evolutionary history.

The first cell certainly existed before the common ancestor. We do not know how long a time elapsed before this major split in life occurred. We can compare the three domains and make some guesses about what the common ancestor was like. Features that are present in all three domains were probably present in the common ancestor. It is hard to go beyond that point, except in very general terms.
The last common ancestor was a prokaryote, with a cell wall and a lipid bilayer, incorporating membrane proteins within it. This organism probably used electron transport and proton pumping to make a proton gradient for ATP synthesis. This strategy is too common to have arisen later. Along with this, there must have been a minimum set of membrane transport proteins that could move ions and substrates into the cell (and out of the cell) without leaking too many protons. This cell must have had the ability to make many coenzymes and other special molecules like heme, flavins and iron sulfur centers. The informational molecule was probably DNA, with genes being transcribed to RNA and then to protein on ribosomes. A DNA polymerase and an RNA polymerase were present. Oxygen was not available, so the electron acceptor at the end of the electron transport chain may have been a sulfur compound or Fe3+. This cell probably had the protein export machinery needed to construct a cell wall outside the plasma membrane. The cell could make its own lipids, though it is not clear whether it made ether lipids or ester lipids. Certainly it had some important biochemical pathways for purines, pyrimidines and probably all 20 amino acids. This last point is supported by the fact that some enzymes that are clearly present in all three domains can be aligned with some certainty. The aligned positions of these enzymes include highly conserved positions that represent all 20 amino acids. Therefore, all 20 were apparently present in the last common ancestor. It is not clear if it was photosynthetic or not, but the similarities between ancient stromatolites and modern stromatolites, that have photosynthetic cyanobacteria, suggest photosynthesis was probably present in the 3.5 billion year old stromatolites. If the common ancestor was not photosynthetic, then the split between bacteria and archaea had to predate 3.5 billion years ago. 
This is the idea favored by Christian de Duve in his book Vital Dust. He argues that no archaea have chlorophyll, and relatively few bacteria are photosynthetic. Therefore, chlorophyll and photosynthesis must have developed after archaea and bacteria split, but before 3.5 billion years ago, when cyanobacteria were probably photosynthetic. The most ancient lineages of both archaea and bacteria are mostly thermophilic. This heat loving characteristic may have been present in the common ancestor. We cannot say if it had introns or not.

For a nice discussion of recent work in this area, see "The Hunt for our Last Common Ancestor" by Stephen Hart (source: NASA).

THE THREE MAIN BRANCHES OF LIFE. HOW DO THEY CLUSTER?

With three branches on a tree, there are three ways to cluster the sequences. Archaea can cluster with eukarya, or archaea can cluster with bacteria, or bacteria can cluster with eukarya. What is the correct tree, and how do we know? This is a highly controversial issue that is being debated today. See the letters section of Science, Nov. 15, 1996, for this debate. The Tree of Life web page also has a discussion on this issue. Trees made from different protein sequences support different clusterings. Also, as we shall see, the eukarya may be derived from a fusion between an archaeon and a gram negative bacterium. If the origin of the eukaryotes is really hybrid, then this issue is even more complex than it first appears.
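
The three possible clusterings can be enumerated mechanically. The short sketch below writes each one as a nested-parentheses string, a common way of writing trees; the function name is made up for this illustration.

```python
from itertools import combinations


def rooted_clusterings(taxa):
    """List every way to pair two of three groups, leaving the third outside."""
    if len(taxa) != 3:
        raise ValueError("this sketch only handles exactly three groups")
    results = []
    for pair in combinations(taxa, 2):
        # The one group not in the pair branches off on its own.
        (outside,) = [t for t in taxa if t not in pair]
        results.append(f"(({pair[0]},{pair[1]}),{outside});")
    return results


for tree in rooted_clusterings(["archaea", "bacteria", "eukarya"]):
    print(tree)
```

Each line of output is one candidate history. The inner pair shares a more recent common ancestor than either shares with the third group.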

Let me say that there is a true historical answer to this problem. Either the bacteria and the archaea diverged first, followed by eukarya branching off from the archaeal line (the most common view), or they did not. The problem cannot be solved by looking at individual protein, RNA or DNA sequences. As I have said, all three possible trees can be found this way. What needs to be done, and what is probably already being done, is to compare whole genomes, or the most useful parts of whole genomes, the highly conserved genes. This may represent a hundred or two hundred genes that all living things have to have to be alive. With this much data, the results should be clear enough to pick between the three possibilities, or the hybrid eukarya option. As of Nov. 1996, whole genomes from each domain had been sequenced, and many of their genes had been given preliminary identifications. The raw data is available for this comparison, and it is almost certainly being done now. As of Oct. 25, 2002 the NCBI Genomes page shows 119 complete genomes: 9 eukaryotes (not counting human, mouse, fugu or rice), 16 archaea and 90 bacteria (66 species), for a total of 95 species, since some bacterial species have been sequenced more than once. From eukarya, yeast (Saccharomyces cerevisiae), Schizosaccharomyces pombe (fission yeast), C. elegans (the nematode worm), Drosophila, Anopheles, Plasmodium falciparum (malaria), Arabidopsis (a model plant), rice and human are complete (almost). Several other genomes are at various stages of being done (Dictyostelium discoideum [cellular slime mold], Candida albicans [fungal pathogen], and mouse). The human genome is nearly done, with most genes being represented. It is supposed to be completed in polished form by April 2003.

The majority of the transcription and translation genes of M. jannaschii (a member of the archaea) look like eukaryotic genes. However, the metabolic enzymes seem to be more like bacterial enzymes (see the papers on the genome, Science 273, 1043-1045 and 1058-1073 1996). This confuses the issue, since a single archaeon looks like a hybrid between eukarya and bacteria. If eukaryotes evolved from an archaeon, then there might be no need to invoke a hybrid eukaryote made from a fusion of genomes. But this does not explain how M. jannaschii came to look so hybrid. This pattern is seen in other archaeal genomes and implies that lateral gene transfer from bacteria to archaea was very common (Current Biology 8, R209 1998).

Within the archaea, there are two main divisions called crenarchaeota (4 complete genomes) and euryarchaeota (12 complete genomes). Euryarchaeota have histone-like proteins that share a common ancestor with eukaryotic histones, based on 3-dimensional structures. Crenarchaeota do not have these histone-like proteins. M. jannaschii is from the euryarchaeota. Sulfolobus solfataricus is from the crenarchaeota. The genome of another crenarchaeon, the aerobic Aeropyrum pernix, has been finished and is published (DNA Res 1999 Apr 30;6, 83-101, 145-52). The progress on sequencing genomes has moved forward, but the analysis of the data has not yet given clear answers to the basic questions about the origins of the main divisions in the tree of life.

THE ROOT OF THE UNIVERSAL TREE OF LIFE

The root of a phylogenetic tree is the location of the last common ancestor shared by all members of the tree. The root of a tree cannot be determined without an outgroup, a sequence that is related to the sequences of interest, but not a direct member of that group. For example, if you wanted to root the tree of mammalian ADP/ATP carriers, you could include a fungal ADP/ATP carrier as an outgroup. The point on the tree where the outgroup joins the other sequences is the root.
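
Rooting with an outgroup can be sketched as follows, using the ADP/ATP carrier example. The unrooted tree is stored as a table of which nodes touch which. The species names, the internal node labels n1 and n2, and the branching order are all hypothetical and chosen only for illustration. The root is placed on the branch that joins the outgroup to everything else.

```python
def root_on_outgroup(adjacency, outgroup):
    """Return a nested tuple for the tree rooted on the outgroup's branch.

    adjacency maps each node name to the list of nodes it touches.
    Leaves touch exactly one node; internal nodes touch three.
    """
    (neighbor,) = adjacency[outgroup]  # a leaf has exactly one neighbor

    def subtree(node, parent):
        # Walk away from the root; a node with no children is a leaf.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node
        # Sort by string form only to make the output order repeatable.
        return tuple(sorted((subtree(c, node) for c in children), key=str))

    return (outgroup, subtree(neighbor, outgroup))


# Hypothetical unrooted tree: three mammalian carriers plus a fungal outgroup.
unrooted = {
    "human": ["n1"],
    "cow": ["n1"],
    "mouse": ["n2"],
    "fungus": ["n2"],
    "n1": ["human", "cow", "n2"],
    "n2": ["n1", "mouse", "fungus"],
}

print(root_on_outgroup(unrooted, "fungus"))
```

The point where the fungal sequence attaches becomes the root, so the branching order among the mammalian sequences can now be read as a sequence of events in time.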

When there are only three groups being considered, like archaea, bacteria and eukarya, you cannot root the tree, because there is no outgroup. Clever molecular evolutionists figured out a way to root this tree by making a tree with duplicated genes that are very old and are found in all the domains of life. This means that the duplication preceded the divergence of these three domains, so each copy can serve as the outgroup for the other. Two duplicated proteins that were used for this are the alpha and beta subunits of the vacuolar V-type and F1FO ATPases. This did not work too well, because there are many different V-type and F1FO ATPases, and there is some evidence that there might have been lateral gene transfer. Another protein set that is not open to this criticism is the pair of elongation factors EF-Tu and EF-G. The EF-G branch on the tree serves to root the EF-Tu branch and vice versa. This was reported in the July 96 PNAS vol 93, 7749-7754. A similar result was found with isoleucyl tRNA synthetase sequences (PNAS 92, 2441 1995).

The analysis of the EF-Tu and EF-G sequences strongly supports the bacteria/archaea split at the root of the tree. However, the frequently drawn picture, in which eukaryotes branch off as a separate line beside the archaea, is not seen. Instead, eukaryotes branch from within the archaea. They appear to cluster with the crenarchaeota. The authors caution that more proteins need to be analyzed in this way and that the data is not absolutely convincing. They also point out that other results have been seen with glutamine synthetase, glutamate dehydrogenase and Hsp70 sequences.

GENOME FUSION TO MAKE EUKARYOTES?

What is the evidence for a fusion of a bacterium and an archaeon to make eukaryotes? First, eukaryotes have a nucleus. Neither archaea nor bacteria have a nucleus, and it is very hard to imagine how such a structure would evolve. Fusion of two cells would offer an explanation of how a nucleus could be formed. If a gram negative bacterial cell engulfed an archaeal cell, it could wrap it in its outer membrane to form a structure that would be like a nucleus. If the archaeal cell lost its membrane and used the host membrane instead, this would form the nuclear envelope and the endoplasmic reticulum (see Gupta and Golding, May 96 TIBS, Fig. 3).

In Feb. 1996 Lynn Margulis published a paper in PNAS suggesting eukaryotes were formed by an endosymbiosis between prokaryotes. She earlier had promoted the idea that mitochondria and chloroplasts were endosymbiotic proteobacteria and cyanobacteria, respectively, and she has been proven correct in that idea. So the idea deserves a fair hearing. Margulis believes that many genomes have contributed to eukaryotes. Plants, that could be formed from an endosymbiosis between fungi and algae, might have seven different genomes mixed together. In May 1996, Radhey S. Gupta and G. Brian Golding published a discussion in TIBS on the origin of the eukaryotic cell and also called it an endosymbiotic event. They gave the following evidence in their favor.

Hsp70 proteins (heat shock proteins, the most conserved proteins found in all three domains) have a unique insert seen at the same site in sequences from eukaryotes and gram negative bacteria. This insert is missing in gram positive bacteria and archaea. So, if eukaryotes evolved from archaea, how can the insert be explained?

Gupta and Golding claim that of 24 proteins they examined, 7 supported eukaryotes clustering with gram negative bacteria. Nine supported eukaryotes clustering with archaea. Eight were not clear. This supports the idea of a chimeric origin for eukaryotes.

The results of Gupta and Golding were strongly criticized by Roger and Brown in the Oct. 96 TIBS. The issue is one of small numbers. Many more sequences need to be compared.

An excerpt from the homepage of W. Ford Doolittle, an expert on early evolution.

"Origin of eukaryotic nuclear genes We are finding evidence to support our suspicion that many eukaryotic nuclear genes derive from alpha-proteobacterial genes (genes of protomitochondrial endosymbionts or earlier endosymbionts which failed to establish themselves as permanent cellular organelles)."

The hydrogen hypothesis is the newest alternative theory (William Martin and Miklos Muller, Nature 392, 37-41 1998, and commentary on pp. 15-16). This hypothesis claims that there was no benefit to the host from aerobic respiration brought in by the mitochondrial ancestor. Instead, the whole process was anaerobic, with a host that was a methanogen that converted CO2 and H2 into methane, like modern day archaeal methanogens. The symbiont would be a bacterium that could respire in aerobic conditions or ferment in anaerobic conditions, producing CO2 and H2 as waste products. Since the waste products are the starting compounds for the methanogen, there would be a strong selection to associate. The theory then proposes that genes for glycolysis enzymes and sugar transporters moved from the symbiont to the host, so that the host could supply fuel molecules to the symbiont and completely enclose it from the environment. In an aerobic environment, the symbiont can respire, but this is not a benefit to the host until the ADP/ATP carrier can evolve to export the ATP to the cytosol. Loss of respiratory enzymes results in formation of a hydrogenosome organelle, a degenerate mitochondrion.

THE RNA WORLD

Several years ago (Cell 31, 147-157 1982) Tom Cech discovered that RNA could splice introns out of itself (group I and group II introns are self splicing) without any protein being present. This meant that RNA had enzymatic activity. This changed our concept of the origin of life, by saying that proteins were not needed early in the evolution of life. A complete DNA to RNA to protein apparatus was not needed. DNA was not even needed. Everything could be done by RNA. The concept of life without DNA or protein is the RNA world. The RNA world is presented at a page by Leslie Orgel, an origin of life researcher.

What are some present day relics of the RNA world? There are several key biochemical processes that depend on RNA components. These are listed below.

1) Ribosomes have RNA in their structure that is essential for function.  In fact, it is 
     beginning to look like the formation of peptide bonds is catalyzed by the RNA part of 
     the ribosome and not the protein part (Noller, et al Science 256, 1416-1419 1992).
2) Messenger RNA is a key ingredient in information transfer.  Many viruses are RNA 
     viruses that use RNA as the main informational macromolecule.  RNA seems to have 
     preceded DNA as the informational macromolecule for directing protein synthesis.
3) Protein export machinery of the endoplasmic reticulum binds a protein RNA complex 
    called the signal recognition particle SRP.  SRP contains an essential RNA called 7SL 
    RNA.
4) Telomerase, required to maintain the ends of linear chromosomes has an RNA template 
     that is needed to restore chromosome ends after replication.  
5) RNase P is a protein RNA complex needed for maturation of the 5' end of tRNAs.
     It has an RNA subunit and a protein subunit.  In bacteria, it is clear that the RNA 
     subunit is responsible for enzymatic activity.  The protein enhances activity by 20 fold.
6) snRNAs (small nuclear RNAs) are needed to process mRNA for export 
     to the cytosol, by removing introns.  These are found in the spliceosome.
7) tRNAs appear to predate protein synthesis.  The anticodon loop seems to be a later 
     evolutionary development.  Some tRNAs have specific catalytic roles, such as the 
     formation of delta aminolevulinate in heme biosynthesis in plants.
8) Many coenzymes contain ribose, and not deoxyribose sugars.  It is suggested in 
    Molecular Biology of the Cell, p. 78, that ribozymes had so few useful structures to 
    catalyze reactions that they needed many coenzymes to add the necessary functionality.  
    (Bruce Martin comments that proteins are no different.  They basically have only acid 
    base chemistry, except for cysteine, so they need the coenzymes as much as ribozymes 
    would.)  The nucleotide part of these coenzymes does not participate in the reactions.
    Instead, it helps the protein recognize and bind the coenzyme.  Ribozymes
    also could bind these coenzymes more readily if they contained nucleotides that 
    could hydrogen bond with the RNA nucleotides.  This may be a reason why so many 
    coenzymes have nucleotide parts: NAD, FAD, CoA, S-adenosylmethionine, cobalamin 
    (vitamin B12).
9) ATP, the major energy currency of the cell, is a ribose containing RNA nucleotide.
10) Group I and group II introns can splice themselves.  The RNA cuts and religates
    without any help from proteins in vitro.  In vivo, proteins aid the reaction.  
11) stRNAs (small temporal RNAs), also called miRNAs (micro RNAs), are only 21-24 
    nucleotides long, but they are conserved between human and worm and have a role 
    in timing development.  It is not known how far back in time they will be found.

A TIMELINE FOR MAJOR EVENTS IN THE HISTORY OF LIFE ON EARTH

4.5 billion years: planet earth forms.
4.0 billion years: planet surface cools and bombardment from space slows, so life has the 
                            possibility of existing on the planet.  Oldest earth rocks dated by 
                            radioactivity.
3.85 billion years: evidence for life seen in Greenland rocks enriched in C12 isotope.
                            This is a hallmark of life caused by an isotope effect in CO2 fixation.
                            No non-life process can lead to C12 enrichment.
Sometime after 3.85 billion years, but before 3.7 billion years: bacteria diverge from 
                            archaea.
Sometime after bacteria diverge from archaea, but before 3.7 billion years: chlorophyll 
                            and photosynthesis evolve in the bacterial lineage. Archaea do not make 
                            chlorophyll, and only some bacteria are photosynthetic.
3.7 billion years: first banded iron formation seen. Implies oxygen made by photosynthesis
3.5 billion years: first stromatolites seen. (assumed to be of biological origin, with 
                            cyanobacteria, as in present day stromatolites)
2.7 billion years: steranes = eukaryotic sterol derived biomarkers found in Australian shale
2.1 billion years: First tentative evidence of a eukaryotic microfossil, not yet confirmed
                            (Science 257,232-235 1992)
2.0 billion years: Oxygen begins to rise in the atmosphere after oxygen sinks saturated.
1.5 billion years: Oxygen level in the atmosphere reaches present day level and stabilizes.
1.5 billion years: More convincing evidence of eukaryotic microfossils.  Chloroplasts and 
                            mitochondria present.
1.2 billion years: major eukaryotic phyla diverge. Plants branched before animals/fungi
670 million years: invertebrates and vertebrates diverge. Hox gene cluster exists.
530 million years: Cambrian explosion of fossil record. Burgess shale
420 million years: fish and other vertebrates diverge. Plants and fungi invade the land
380 million years: vertebrates move onto land
360 million years: gymnosperms (naked seed plants) diverge from angiosperms (flowering plants)
310 million years: birds and other vertebrates diverge.
150-200 million years: monocots diverge from dicots, oldest angiosperm fossil = 142 million years