Biochemistry 811
First Lecture on Evolution
D. Nelson, last modified Nov. 23, 2002 8AM
Reading: Berg, Tymoczko and Stryer 5th edition, Chapter 7 and links in the text below
A link to a second evolution lecture
Christian Anfinsen began the preface to his book The Molecular Basis of Evolution with
this sentence: "The writing of this book has been stimulated by the excitement and promise
of contemporary protein chemistry and genetics and by the possibility of integration of
these fields..." This was written in May 1959. More than forty years later it is still true.
Possibly one of the most famous quotes from any biologist is from Theodosius Dobzhansky. It was
actually the title of an article in American Biology Teacher 35, 125-129 1973. The title was
"Nothing in Biology Makes Sense Except in the Light of Evolution." That is even more
true now than when it was first written. Today, we will begin the topic of evolution. This
is the concept that all life on the planet is derived from a common ancestor. The
biochemistry of living organisms that we have been studying in this course is a collection of
accumulated successful strategies from billions of years of experimentation in life. The
best estimate for how old life is on earth has been revised to 3.85 billion years
(Nature Nov. 7, 1996). This is based on carbon isotope ratios in some of the oldest
sedimentary rocks known on earth, from the Isua rock belt in Greenland. These rocks do
not contain visible microfossils, but living cells preferentially incorporate the lighter isotope
of carbon, C12 as opposed to C13 or C14. Material that has originated from living things
has a ratio of these isotopes of carbon that reflects depletion of the heavier isotopes.
Carbon from non-biological inorganic material (calcium carbonate or limestone) has a different ratio.
The carbon isotope ratios seen in these 3.85 billion year old rocks look as though they derived
from living cells. Similar depletion of heavy carbon isotopes has been reported for a Martian
meteorite thought to contain microfossils (see Science 274, Nov. 8, 1996, p. 918). This has
now been argued to come from earthly contaminants, and the microfossils seem to be too small
to have been real cells. In August 2002 a report appeared in Nature suggesting the rocks in
Greenland might have contamination of graphite from a younger source (van Zuilen MA, Lepland A,
Arrhenius G. Reassessing the evidence for the earliest traces of life. Nature 2002 Aug 8;
418(6898):627-30). The issue is not settled.
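The isotope argument above is usually expressed in "delta" notation, the per mil deviation of a sample's C13/C12 ratio from a reference standard. The short Python sketch below shows that calculation. It is not taken from the papers cited; the reference ratio (the commonly quoted VPDB value) and the sample value of -25 per mil are assumptions chosen only for illustration.

```python
# Assumed reference 13C/12C ratio (commonly quoted VPDB value); not from the lecture.
VPDB_RATIO = 0.0112372


def delta13c(sample_ratio, standard_ratio=VPDB_RATIO):
    """Return delta-13C in per mil: (R_sample / R_standard - 1) * 1000."""
    return (sample_ratio / standard_ratio - 1.0) * 1000.0


def ratio_from_delta(delta, standard_ratio=VPDB_RATIO):
    """Inverse of delta13c: the 13C/12C ratio that gives a stated delta value."""
    return standard_ratio * (1.0 + delta / 1000.0)


# Hypothetical sample depleted in 13C by 25 per mil relative to the standard.
sample_ratio = ratio_from_delta(-25.0)
print(round(delta13c(sample_ratio), 3))

# A negative delta means the sample is depleted in the heavy isotope.
print(delta13c(sample_ratio) < 0)
```

A sample with a negative delta value is depleted in C13 relative to the standard, which is the direction of the signal discussed for the Greenland rocks.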
Much of the information that follows is taken from Vital Dust, by Christian De Duve.
Christian De Duve has a new book out called Life Evolving, but I have not read it yet.
What does a 3.85 billion year old biomarker mean for the origin of life? The planet formed
about 4.5 billion years ago and it is thought that the surface was either molten or under
continuous bombardment from space early in its history until about 4 billion years ago.
Meteor impacts and volcanic activity would have made the surface unfit for life. The probable
existence of life at 3.85 billion years means that life arose on the planet almost as soon as
it was possible for it to exist on the surface. Therefore, the origin of life on earth was
very rapid. Some, like Francis Crick, believe it was too rapid, that 150 million years is not
enough time. They propose a panspermia model where life reached earth from space.
This is not the mainstream view.
The earliest microfossil evidence of cells that resemble cyanobacteria comes from
the early Archaean Apex chert of Western Australia, dated at about 3.5 billion years (Schopf,
Science 260, 640-646 1993). Other evidence of ancient life on earth comes from formations called
stromatolites. These are columns of fossilized material that look just like modern structures
seen in various sites around the world, for example Shark Bay, Australia. These are formed by
bacterial colonies that exist in mats, with phototrophs on the top and heterotrophs lower
down. The colonies grow into columnar structures, accumulating debris and eventually
fossilizing. The structures of 3.5 billion year old stromatolites look nearly identical to
modern stromatolites that are still growing. An article in the October 3, 1996 Nature,
p. 385, proposed that these structures may have formed from non-biological processes
involving some fractal growth patterns. However, the microfossil evidence and modern
day stromatolite structure support the biological origin of stromatolites.
We will assume from this evidence that life on earth dates back 3.85 billion years. The
microfossils in stromatolites that appear similar to modern cyanobacteria suggest that life
evolved to a form similar to today's bacteria very early and that the bacteria's outward
appearance at least has changed little in the intervening 3.5 billion years. All present day
life is based on a nucleic acid informational molecule, either DNA or RNA, that encodes information
needed to make a living cell. The information is coded in a triplet code and though there are
some examples of slight variations in this code, no radical departures from it exist. This
"universal" code is interpreted as proteins by a complex machinery called the ribosome
that is also shared in common among all living things. These main features of information
storage and retrieval are conserved and provide convincing evidence that all life on earth
has one origin and shares a common ancestor.
Since life began, it has been changing in permissible directions. Physical constraints on the
chemistry of life including the properties of water, the nature of carbon and other key
aspects of biology have allowed variations on the original theme, but only within certain
boundaries. However, 3.85 billion years is a long time and many variations have been
tried out and many have been successful. Because the informational molecule has to
transmit the information through time for the blueprint of a cell, it is the information that
has changed. This molecule has been copied billions of times, but not without some errors
creeping in. Today, we can look back in time by comparing the sequences of the
nucleotides and the translated sequences of the proteins from present day organisms. By
quantitating the differences between the sequences and by making some modest
assumptions about rates of change in the sequences, we can estimate when different
organisms diverged from one another. These dates can be calibrated by the fossil record
which can be dated to high precision. Very similar organisms have very similar
sequences, and more distant relatives have more differences in their code. This is the
concept of the molecular clock. With this very simple idea, one can use sequences to build
trees with branches representing different species. The relationships between organisms
can be graphed in this way. If enough organisms are included and the most conserved
sequences are used, a tree of life can be constructed. This is an approximate genealogy
of present day organisms, and it is very interesting. In your handout look at the tree for
the eukaryotes. This was made by joining four different proteins together into one longer
"super protein" for sequence analysis. The proteins used are EF1 alpha, alpha and beta tubulin
and actin. These are highly conserved eukaryotic proteins suitable for this tree making purpose.
In such a tree, the branches always diverge, they do not merge back together, because
species do not merge, except in very rare events. The place where two branches come
together is the point in time when they were the same species. Farther and farther back on
the tree are divergences that are deeper and more ancient. If we go back far enough, the
most distant branches meet at the common ancestor, the single celled organism that
gave rise to all life on the planet. This model of evolution has been challenged lately by
evidence that early bacteria shared their DNA often, leading to a network topology rather
than a branching bush. This sharing of genetic material has become less common as the
different lineages have become less similar to each other. However, this makes any effort
to go back and identify the common ancestor impossible, because there were fuzzy
definitions of species at these early times.
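The molecular clock idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to make the handout trees. The Jukes-Cantor correction is one assumed substitution model among many, and the toy sequences and the rate of 1e-9 changes per site per year are made-up values.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(1 for a, b in pairs if a != b) / len(pairs)


def jukes_cantor(p):
    """Correct a nucleotide p-distance for multiple hits (assumed Jukes-Cantor model)."""
    if p >= 0.75:
        raise ValueError("distance is saturated under the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def divergence_time(distance, rate_per_site_per_year):
    """Years since two lineages split. Both lineages change, hence the factor of 2."""
    return distance / (2.0 * rate_per_site_per_year)


# Toy aligned sequences (hypothetical), differing at 2 of 20 sites.
seq_1 = "ACGTACGTACGTACGTACGT"
seq_2 = "ACGTACGAACGTACCTACGT"

p = p_distance(seq_1, seq_2)
d = jukes_cantor(p)
assumed_rate = 1e-9  # substitutions per site per year; an illustrative value only
print(p, d, divergence_time(d, assumed_rate))
```

In practice the rate is not assumed but calibrated, by applying the same calculation to a pair of organisms whose divergence date is known from the fossil record.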
To build a tree like this is not trivial. It takes some care to pick the right sequences to use,
because not all sequences are appropriate for this job. Even in a single gene, not all of the
gene is useful for this task. Frequently, only the most conserved parts of a gene are
included in building trees. Ideally, the most ancient features common to all life would be
good candidates to compare. This has been done most often with ribosomal RNA, since
all life has this molecule and it is so central to life that it has to be highly conserved. This
is the basis of the rRNA database project. Here 57,773 rRNA sequences have
been used to make trees. These trees include every kind of life possible to get a pretty
comprehensive tree of life. The results look like the tree in your handout.
Here representatives from the major groups of organisms are shown.
Every other organism will fit on this tree very close to an
existing branch, so it does little good to use all the sequences, because it just clutters up
the picture. It is clear from this tree that there are three main divisions in life. These have
been called domains. In a hierarchy of life, domains are higher than kingdoms. The three
domains are bacteria, archaea and eukarya. The taxonomy of life at these highest levels is a
matter of philosophy. Ernst Mayr, one of the most prominent of all evolutionary
biologists, does not accept the three domain system. He argues that the fundamental
division in life is the prokaryote eukaryote division and that the archaea are just another
group of bacteria that do not deserve domain status. (PNAS 95, 9720 1998 Aug. 18
issue).
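The point made above, that only the most conserved parts of a gene are used for tree building, can be illustrated with a small Python sketch that keeps only the well conserved, gap-free columns of an alignment. The toy alignment and the identity threshold are assumptions for illustration; real projects use more careful criteria.

```python
def conserved_columns(alignment, min_identity=0.8):
    """Return indices of gap-free columns whose most common residue
    makes up at least min_identity of the column (threshold is arbitrary)."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    keep = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        if "-" in column:
            continue  # skip columns that contain a gap
        most_common = max(column.count(residue) for residue in set(column))
        if most_common / len(column) >= min_identity:
            keep.append(i)
    return keep


def trim(alignment, columns):
    """Rebuild each sequence from the selected columns only."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Toy protein alignment (hypothetical sequences).
toy_alignment = ["MKVLA-GT", "MKILAAGT", "MRVLASGT", "MKVLATGS"]
kept = conserved_columns(toy_alignment, min_identity=0.75)
print(kept)
print(trim(toy_alignment, kept))
```

Raising the threshold keeps fewer, more strictly conserved columns, which trades away data for a cleaner signal from ancient divergences.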
Bacteria and archaea are both prokaryotes, without a nucleus. They are as different from
each other as each is from the eukarya, so even though they are prokaryotes, it is difficult to
lump them together. In fact, the most common version of this tree shows the archaea to
be more closely related to eukarya, but this is a debated issue. Researchers into the early
evolution of life are actively trying to resolve this issue, but it is proving to be very hard.
The evidence accumulating now shows that archaea carry a mix of bacterial-like and eukaryotic-like
genes. This data suggests that early prokaryotes swapped genes frequently, and this
scrambles evolutionary history.
The first cell certainly existed before the common ancestor. We do not know how long a
time elapsed before this major split in life occurred. We can compare the three domains
and make some guesses about what the common ancestor was like. Features that are
present in all three domains were probably present in the common ancestor. It is hard to
go beyond that point, except in very general terms.
For a nice discussion of recent work in this area, see "The Hunt for our Last Common
Ancestor" by Stephen Hart (source: NASA).
THE THREE MAIN BRANCHES OF LIFE. HOW DO THEY CLUSTER?
With three branches on a tree, there are three ways to cluster the sequences. Archaea can
cluster with eukarya, or archaea can cluster with bacteria, or bacteria can cluster with
eukarya. What is the correct tree, and how do we know? This is a highly controversial
issue that is being debated today. See Science of Nov. 15, 1996 in the letters section to
see this debate. The Tree of Life web page also has a discussion on this issue. Trees
made from different protein sequences support different clusterings. Also, as we shall see,
the eukarya may be derived from a fusion between an archaeon and a gram negative
bacterium. If the origin of the eukaryotes is really hybrid, then this issue is even more
complex than it first appears.
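The three possible clusterings, and the way different genes can "vote" for different ones, can be written out explicitly. The Python sketch below only enumerates the possibilities and counts support. The list of per-gene results is hypothetical and does not come from any of the papers cited.

```python
from collections import Counter
from itertools import combinations

DOMAINS = ("archaea", "bacteria", "eukarya")


def possible_clusterings(groups=DOMAINS):
    """Each three-group tree is defined by which pair clusters together;
    the remaining group is the one that branched off first."""
    result = []
    for pair in combinations(sorted(groups), 2):
        outlier = next(g for g in groups if g not in pair)
        result.append((pair, outlier))
    return result


def tally_support(per_gene_pairs):
    """Count how many gene trees support each clustered pair."""
    return Counter(tuple(sorted(pair)) for pair in per_gene_pairs)


for pair, outlier in possible_clusterings():
    print(f"(({pair[0]}, {pair[1]}), {outlier})")

# Hypothetical results from three different gene trees.
gene_tree_votes = [
    ("eukarya", "archaea"),
    ("archaea", "eukarya"),
    ("bacteria", "eukarya"),
]
print(tally_support(gene_tree_votes))
```

A tally like this is only a bookkeeping device. If eukaryotes really are hybrids, genes that disagree may each be reporting their own true history.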
Let me say that there is a true historical answer to this problem. Either the bacteria and
the archaea diverged first followed by eukarya branching off from the archaeal line (most
common view) or they did not. The problem cannot be solved by looking at individual
protein, RNA or DNA sequences. As I have said, all three possible trees can be found this
way. What needs to be done, and what is probably already being done is to compare
whole genomes, or the most useful parts of whole genomes, the highly conserved genes.
This may represent a hundred or two hundred genes that all living things have to have to
be alive. With this much data, the results should be clear enough to pick between the
three possibilities, or the hybrid eukarya option. As of Nov. 1996, whole genomes from
each domain have been sequenced, and preliminary identity of many of their genes has
been made. The raw data is available for this comparison, and it is almost certainly being
done now. As of Oct. 25, 2002 the NCBI Genomes page shows 119 complete genome sequences:
9 eukaryotes (not counting human, mouse, fugu or rice), 16 archaea and 90 bacteria (66 species),
for a total of 95 species, since some bacterial species have been sequenced more than once.
From eukarya, yeast (Saccharomyces cerevisiae); Schizosaccharomyces pombe [fission yeast];
and C. elegans, the nematode worm; Drosophila; Anopheles; Plasmodium falciparum (malaria);
Arabidopsis (a model plant); rice and human are complete (almost). Several other genomes
are at various stages of being done (Dictyostelium discoideum [cellular slime mold], Candida
albicans [fungal pathogen], and mouse). The human genome is nearly done, with most genes
being represented. It is supposed to be completed by April 2003 in polished form.
The majority of M. jannaschii [an Archaea member]
transcription and translation genes look like eukaryotic genes. However, the metabolic
enzymes seem to be more like bacterial enzymes. (see the paper on the genome, Science
273, 1043-1045 and 1058-1073 1996). This confuses the issue, since a single archaeon
looks like a hybrid between eukarya and bacteria. If eukaryotes evolved from an archaeon,
then there might be no need to invoke a hybrid eukaryote made from a fusion of genomes.
But this does not explain how M. jannaschii came to look so hybrid. This pattern is seen
in other archaeal genomes and implies that lateral gene transfer from bacteria to archaea was
very common (Current Biology 8, R209 1998).
Within the archaea, there are two main divisions called crenarchaeota (4 complete genomes)
and euryarchaeota (12 complete genomes). Euryarchaeota have histone-like proteins that share a
common ancestor with eukaryotic histones, based on 3-dimensional structures. Crenarchaeota
do not have these histone like proteins. M. jannaschii is from the euryarchaeota. Sulfolobus
solfataricus is from the crenarchaeota. The genome of another crenarchaeon, the aerobic Aeropyrum pernix,
has been finished and is published (DNA Res 1999 Apr 30;6, 83-101, 145-52). The progress on
sequencing genomes has moved forward, but the analysis of the data has not given clear answers
yet to the basic questions about the origins of the main divisions in the tree of life.
THE ROOT OF THE UNIVERSAL TREE OF LIFE
The root of a phylogenetic tree is the location of the last common ancestor shared by all
members of the tree. The root of a tree cannot be determined without an outgroup, a
sequence that is related to the sequences of interest, but not a direct member of that group.
For example, if you wanted to root the tree of mammalian ADP/ATP carriers, you could
include a fungal ADP/ATP carrier as an outgroup. The point on the tree where the
outgroup joins the other sequences is the root.
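Rooting with an outgroup can be shown with a small Python sketch. The unrooted tree is stored as a table of neighbours, and the root is placed on the branch leading to the outgroup. The four-sequence tree, its internal node names (n1, n2) and its topology are hypothetical and serve only to illustrate the ADP/ATP carrier example.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the branch leading to the outgroup leaf.

    adjacency maps each node name to a list of its neighbours.
    Returns the rooted tree as nested tuples: (outgroup, ingroup_subtree).
    """
    neighbours = adjacency[outgroup]
    if len(neighbours) != 1:
        raise ValueError("the outgroup must be a leaf with exactly one neighbour")

    def build(node, parent):
        # Walk away from the parent; a node with no other neighbours is a leaf.
        children = [n for n in adjacency[node] if n != parent]
        if not children:
            return node
        return tuple(build(child, node) for child in children)

    return (outgroup, build(neighbours[0], outgroup))


# Hypothetical unrooted tree: three mammalian carriers plus a fungal (yeast) outgroup.
unrooted = {
    "human": ["n1"],
    "mouse": ["n1"],
    "cow": ["n2"],
    "yeast": ["n2"],
    "n1": ["human", "mouse", "n2"],
    "n2": ["n1", "cow", "yeast"],
}

rooted = root_with_outgroup(unrooted, "yeast")
print(rooted)
```

The same unrooted tree rooted on a different leaf gives a different branching order, which is why the choice of outgroup matters.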
When there are only three groups being considered, like archaea, bacteria and eukarya, you
cannot root the tree, because there is no outgroup. Clever molecular evolutionists figured
out a way to root this tree by making a tree with duplicated genes that are very old and are
found in all the domains of life. This means that the duplication preceded the divergence of
these three domains. Two duplicated proteins that were used for this are the alpha and beta
subunits of the vacuolar V-type and F1FO ATPases. This did not work too well, because
there are many different V-type and F1FO ATPases, and there is some evidence that there
might have been lateral gene transfer. Another protein set that does not have this criticism
are the elongation factors EF-Tu and EF-G. The EF-G branch on the tree serves to root the
EF-Tu branch and vice versa. This was reported in the July 96 PNAS vol 93, 7749-7754.
A similar result was found with isoleucyl tRNA synthetase sequences (PNAS 92, 2441
1995).
The analysis of the EF-Tu and EF-G sequences strongly supports the bacteria/archaea split
at the root of the tree. However, the frequently drawn tree, in which eukaryotes branch off as a
sister group to all of the archaea, is not supported. Instead, eukaryotes branch from within the archaea. They appear
to cluster with the crenarchaeota. The authors caution that more proteins need to be
analyzed in this way and the data is not absolutely convincing. They also point out that
other results have been seen with glutamine synthetase, glutamate dehydrogenase and Hsp70
sequences.
GENOME FUSION TO MAKE EUKARYOTES?
What is the evidence for a bacterial, archaeal fusion to make eukaryotes? First, eukaryotes
have a nucleus. Neither archaea nor bacteria have a nucleus, and it is very hard to imagine
how such a structure would evolve. Fusion of two cells would offer an explanation of
how a nucleus could be formed. If a gram negative bacterial cell engulfed an archaeal cell, it
could wrap it in its outer membrane to form a structure that would be like a nucleus. If the
archaeal cell lost its membrane and used the host membrane instead, this would form the
nuclear envelope and the endoplasmic reticulum. (see Gupta and Golding May 96 TIBS
Fig. 3)
In Feb. 1996 Lynn Margulis published a paper in PNAS suggesting eukaryotes were
formed by an endosymbiosis between prokaryotes. She earlier had promoted the idea
that mitochondria and chloroplasts were endosymbiotic proteobacteria and cyanobacteria,
respectively, and she has been proven correct in that idea. So the idea deserves a fair
hearing. Margulis believes that many genomes have contributed to eukaryotes. Plants, that
could be formed from an endosymbiosis between fungi and algae, might have seven
different genomes mixed together. In May 1996, Radhey S. Gupta and G. Brian Golding
published a discussion in TIBS on the origin of the eukaryotic cell and also called it an
endosymbiotic event. They gave the following evidence in their favor.
Hsp70 proteins (heat shock proteins, the most conserved proteins found in all three
domains) have a unique insert seen at the same site in sequences from eukaryotes and
gram negative bacteria. This insert is missing in gram positive bacteria and archaea. So, if
eukaryotes evolved from archaea, how can the insert be explained?
Gupta and Golding claim that of 24 proteins they examined, 7 supported eukaryotes
clustering with gram negative bacteria. Nine supported eukaryotes clustering with
archaea. Eight were not clear. This supports the idea of a chimeric origin for eukaryotes.
The results of Gupta and Golding were strongly criticized by Roger and Brown in the Oct.
96 TIBS. The issue is one of small numbers. Many more sequences need to be compared.
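To see why small numbers are a problem, one can put a rough confidence interval on the Gupta and Golding counts. The Python sketch below uses the Wilson score interval. This is my own illustration and not the analysis made by Roger and Brown; treating the 24 proteins as independent trials and using a 95% interval (z = 1.96) are assumptions.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# Counts reported in the text: 7 bacterial-like, 9 archaeal-like, 8 unclear.
bacteria_like, archaea_like, unclear = 7, 9, 8
total = bacteria_like + archaea_like + unclear

low_b, high_b = wilson_interval(bacteria_like, total)
low_a, high_a = wilson_interval(archaea_like, total)
print(f"bacterial-like fraction: {bacteria_like / total:.2f} ({low_b:.2f} to {high_b:.2f})")
print(f"archaeal-like fraction:  {archaea_like / total:.2f} ({low_a:.2f} to {high_a:.2f})")
```

With only 24 proteins the two intervals are wide and overlap heavily, so the counts alone cannot pin down what fraction of the genome each kind of ancestry accounts for.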
An excerpt from the homepage of W. Ford Doolittle, an expert on early evolution.
"Origin of eukaryotic nuclear genes We are finding evidence to support our
suspicion that many eukaryotic nuclear genes derive from alpha-proteobacterial genes
(genes of protomitochondrial endosymbionts or earlier endosymbionts which failed to
establish themselves as permanent cellular organelles)."
The hydrogen hypothesis is the newest alternative theory (William Martin and Miklos Muller,
Nature 392, 37-41 1998, with commentary on pp. 15-16). This hypothesis claims that there
was no benefit to the host from
aerobic respiration brought in by the mitochondrial ancestor. Instead, the association began under
anaerobic conditions. The host was a methanogen that converted CO2 and H2 into methane,
like modern day archaeal methanogens. The symbiont was a bacterium that could respire
in aerobic conditions or ferment in anaerobic conditions, producing CO2 and H2 as waste products.
Since these waste products are the starting compounds for the methanogen, there would be
strong selection to associate. The theory then proposes movement of genes from the symbiont
to the host for glycolysis enzymes and sugar transporters for fuel molecules for the symbiont,
so the host can support the symbiont and completely enclose it from the environment.
In an aerobic environment, the symbiont can respire, but it is not a benefit to the host until
the ADP/ATP carrier can evolve to export the ATP to the cytosol. Loss of respiratory enzymes
results in formation of a hydrogenosome organelle, a degenerate mitochondrion.
THE RNA WORLD
Several years ago (Cell 31, 147-157 1982) Tom Cech discovered that RNA could splice
introns out of itself (group I and group II introns are self splicing) without any protein being
present. This meant that RNA had enzymatic activity. This changed our concept of the
origin of life, by saying that proteins were not needed early in the evolution of life. A
complete DNA to RNA to protein apparatus was not needed. DNA was not even needed.
Everything could be done by RNA. The concept of life without DNA or protein is the
RNA world. The RNA world is presented at a page by Leslie Orgel, an origin of life researcher.
What are some present day relics of the RNA world? There are several key biochemical
processes that depend on RNA components. These are listed below.
1) Ribosomes have RNA in their structure that is essential for function. In fact, it is
beginning to look like the formation of peptide bonds is catalyzed by the RNA part of
the ribosome and not the protein part (Noller, et al Science 256, 1416-1419 1992).
2) Messenger RNA is a key ingredient in information transfer. Many viruses are RNA
viruses that use RNA as the main informational macromolecule. RNA seems to have
preceded DNA as the informational macromolecule for directing protein synthesis.
3) Protein export machinery of the endoplasmic reticulum binds a protein RNA complex
called the signal recognition particle SRP. SRP contains an essential RNA called 7SL
RNA.
4) Telomerase, required to maintain the ends of linear chromosomes has an RNA template
that is needed to restore chromosome ends after replication.
5) RNase P is a protein RNA complex needed for maturation of the 5' end of tRNAs.
It has an RNA subunit and a protein subunit. In bacteria, it is clear that the RNA
subunit is responsible for enzymatic activity. The protein enhances activity by 20 fold.
6) snRNAs (small nuclear RNAs) are needed to process mRNA for export
to the cytosol, by removing introns. These are found in the spliceosome.
7) tRNAs appear to predate protein synthesis. The anticodon loop seems to be a later
evolutionary development. Some tRNAs have specific catalytic roles, such as the
formation of delta aminolevulinate in heme biosynthesis in plants.
8) Many coenzymes contain ribose, and not deoxyribose sugars. It is suggested in
Molecular Biology of the Cell, p. 78, that ribozymes had so few useful structures to
catalyze reactions that they needed many coenzymes to add the necessary functionality.
(Bruce Martin comments that proteins are no different. They basically have only acid
base chemistry, except for cysteine, so they need the coenzymes as much as ribozymes
would.) The nucleotide part of these coenzymes does not participate in the reactions.
Instead, it helps the protein recognize and bind the coenzyme. Ribozymes
also could bind these coenzymes more readily if they contained nucleotides that
could hydrogen bond with the RNA nucleotides. This may be a reason why so many
coenzymes have nucleotide parts: NAD, FAD, CoA, S-adenosylmethionine, cobalamin
(vitamin B12).
9) ATP, the major energy currency of the cell, is a ribose containing RNA nucleotide.
10) Group I and group II introns can splice themselves. The RNA cuts and religates
without any help from proteins in vitro. In vivo, proteins aid the reaction.
11) stRNAs (small temporal RNAs), also called miRNAs for micro RNAs, are only 21-24
nucleotides long, but they are conserved between human and worm and have a role
in timing development. It is not known how far back in time they will be found.
A TIMELINE FOR MAJOR EVENTS
IN THE HISTORY OF LIFE ON EARTH
4.5 billion years: planet earth forms.
4.0 billion years: planet surface cools and bombardment from space slows, so life has the
possibility of existing on the planet. Oldest earth rocks dated by
radioactivity.
3.85 billion years: evidence for life seen in Greenland rocks enriched in C12 isotope.
This is a hallmark of life caused by an isotope effect in CO2 fixation.
No non-life process is known to lead to C12 enrichment.
Sometime after 3.85 billion years, but before 3.7 billion years: bacteria diverge from
archaea.
Sometime after bacteria diverge from archaea, but before 3.7 billion years: chlorophyll
and photosynthesis evolve in the bacterial lineage. Archaea do not make
chlorophyll, and only some bacteria are photosynthetic.
3.7 billion years: first banded iron formation seen. Implies oxygen made by photosynthesis.
3.5 billion years: first stromatolites seen. (assumed to be of biological origin, with
cyanobacteria, as in present day stromatolites)
2.7 billion years: steranes = eukaryotic sterol derived biomarkers found in Australian shale
2.1 billion years: First tentative evidence of a eukaryotic microfossil, not yet confirmed
(Science 257,232-235 1992)
2.0 billion years: Oxygen begins to rise in the atmosphere after oxygen sinks saturated.
1.5 billion years: Oxygen level in the atmosphere reaches present day level and stabilizes.
1.5 billion years: More convincing evidence of eukaryotic microfossils. Chloroplasts and
mitochondria present.
1.2 billion years: major eukaryotic phyla diverge. Plants branched before animals/fungi.
670 million years: invertebrates and vertebrates diverge. Hox gene cluster exists.
530 million years: Cambrian explosion of fossil record. Burgess shale
420 million years: fish and other vertebrates diverge. plants and fungi invade the land
380 million years: vertebrates move onto land
360 million years: gymnosperms (naked seed plants) diverge from angiosperms (flowering plants)
310 million years: birds and other vertebrates diverge.
150-200 million years: monocots diverge from dicots, oldest angiosperm fossil = 142 million years