Protein Expression Systems
David Nelson, Mar. 24, 1997
This week we will cover protein expression systems in prokaryotes and eukaryotes.
There are so many different systems that I have had to select just a few examples from each
area. Today we will talk about expression in E. coli. On Tuesday, we will cover
eukaryotic systems except for yeast. Thursday will be reserved for yeast expression
systems.
Early on (1975-1978), yeast DNA was placed in E. coli vectors without
modification, just to see if it could complement some metabolic defect in E. coli. Several
yeast genes were found to complement bacterial genes. These include:
HIS3 = hisB in E. coli
LEU2 = leuB in E. coli
URA3 = pyrF in E. coli
TRP1 = trpC in E. coli
This permitted cloning of these yeast genes by complementation very early on and
they are the standard set of yeast selectable markers today. This story is recounted in the
chapter "Genes, Replicators and Centromeres: The First Artificial Chromosomes" by John
Carbon in The Early Days of Yeast Genetics, Cold Spring Harbor Laboratory Press, 1993,
pp. 375-390. (I have this in the lab if you want to read it.) The E. coli RNA polymerase
apparently recognized weak promoter-like elements in the yeast DNA, not necessarily in the
yeast promoter itself.
The pET Vector System
It soon became clear that this would not work for most eukaryotic genes. The
expression of eukaryotic genes in E. coli required vectors that supplied a bacterial or phage
promoter. To start our discussion of expression in E. coli, I have chosen a popular system
based on the phage T7 promoter. The vectors used are called pET vectors for "plasmid for
expression by T7 RNA polymerase".
This system was created by William Studier and his colleagues.
Studier, F.W. et al. (1990) Use of T7 RNA polymerase to direct expression of cloned
genes. Methods in Enzymol. 185, 60-89.
Dubendorff, J.W. and Studier, F.W.(1991) Controlling basal expression in an inducible
T7 expression system by blocking the target T7 promoter with lac repressor. J. Mol. Biol.
219, 45-59.
They point out that T7 RNA polymerase is very specific for T7 promoters and it does
not recognize DNA from other sources, since these promoter sequences are very rare.
Also, termination signals for T7 RNA polymerase are rare too, so long transcripts can be
made without premature truncation. The T7 RNA polymerase is about 5 times faster than
E. coli RNA polymerase, so genes controlled by T7 promoters can be overexpressed.
In order for this system to work, the T7 RNA polymerase must be supplied to the
cells. A special strain called BL21(DE3) has been made for this purpose. The BL21(DE3)
cells are lysogenic for a fragment of the phage DE3. This fragment contains
the lacI gene, the lac UV5 promoter, the start of lacZ (beta-galactosidase) and the T7 RNA
polymerase gene. The lac UV5 promoter is responsible for driving the expression of T7
RNA polymerase. It is inducible by IPTG (isopropyl beta-D-thiogalactopyranoside). This
fragment is inserted into the E. coli chromosome.
In addition to the T7 RNA polymerase, there is a vector that carries the cloned gene behind
a T7 promoter, and termination signals are downstream of the gene to stop the transcription
(pET vector). In theory, the gene can only be expressed if IPTG is added to activate the lac UV5
promoter and turn on synthesis of T7 RNA polymerase. This in turn transcribes the gene
in the pET vector. In practice though, the E. coli RNA polymerase does make a small
amount of T7 RNA polymerase without induction. It may bind upstream of the lac UV5
promoter and read through it. It is said to be a leaky system. This causes a small amount
of the cloned gene to be expressed. If the protein is toxic to the cell, the cells may die.
To reduce basal (background) expression of a cloned gene, Studier et al. take advantage
of the fact that T7 lysozyme is an inhibitor of T7 RNA polymerase. The lysozyme
normally attacks the cell wall, but since it is inside the cells, it cannot harm the cell wall.
By expressing this gene on a different plasmid, the level of T7 RNA polymerase can be
further reduced. In fact, there are two different plasmids used for this purpose. One has
the lysozyme gene oriented so it is expressed from the tet promoter (pLysE). The other has
the gene oriented in the opposite direction (pLysS). pLysE makes more of the T7
lysozyme and it has lower levels of T7 RNA polymerase activity, but the cells grow more
slowly. pLysS works to reduce T7 RNA polymerase activity, but it does not slow cell
growth. These should both be tried if your protein is toxic. Once IPTG is added, even the
pLysE cells express enough T7 RNA polymerase that your gene can be expressed.
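
To keep the layers of control straight, here is a small Python sketch of the logic described
above. It is only qualitative bookkeeping: the function names and the labels "leaky",
"reduced", "lowest" and "induced" are my own shorthand for this example, not measured
activities.

```python
# Qualitative sketch of pET control in a BL21(DE3) host.
# All names and labels here are illustrative assumptions, not measured values.

def t7_polymerase_activity(iptg_added, lysozyme_plasmid=None):
    """Return a qualitative label for T7 RNA polymerase activity."""
    if lysozyme_plasmid not in (None, "pLysS", "pLysE"):
        raise ValueError("lysozyme_plasmid must be None, 'pLysS' or 'pLysE'")
    if iptg_added:
        # IPTG turns on the lac UV5 promoter; even pLysE cells make enough polymerase.
        return "induced"
    # Without IPTG only the leak remains, and T7 lysozyme inhibits part of it.
    basal = {None: "leaky", "pLysS": "reduced", "pLysE": "lowest"}
    return basal[lysozyme_plasmid]


def target_gene_state(iptg_added, lysozyme_plasmid=None):
    """The cloned gene simply follows the T7 RNA polymerase activity."""
    activity = t7_polymerase_activity(iptg_added, lysozyme_plasmid)
    return "expressed" if activity == "induced" else f"basal ({activity})"


if __name__ == "__main__":
    for plasmid in (None, "pLysS", "pLysE"):
        for iptg in (False, True):
            print(plasmid, iptg, target_gene_state(iptg, plasmid))
```

The point of writing it this way is that the lysozyme plasmid only matters before induction.
After IPTG is added, all three hosts end up in the same state.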
This system permits even very toxic proteins to be made upon induction. In fact,
most of the cell's translation machinery gets dedicated to making the protein you desire.
Within about 3 hours after induction the major protein in the cell can be the expressed gene
product. In cases where the protein is not toxic it may be possible to grow the cells more
slowly overnight and achieve 50% of the cell protein as your gene product.
The basal expression of T7 RNA polymerase is still a problem in this system if
your protein is toxic. To reduce this even further, a lac operator was placed just downstream
of the T7 promoter in the pET vector (see the Dubendorff and Studier paper cited above). Then
lacI was added to the pET vector so enough lac repressor would be made to bind to the operator.
Repressor bound there blocks transcription of the cloned gene by whatever T7 RNA polymerase is
made without induction. Induction by IPTG relieves this block, because IPTG releases the lac
repressor from both the lac UV5 promoter on the chromosome and the operator on the vector.
Extremely toxic proteins can be expressed in this system.
The PL Expression System
This next system is more recent than the pET vector system. It is described in the
1997 Invitrogen catalog. The PL system is similar to the pET system in that there are two
interacting components. One is on the bacterial chromosome and the other is on a vector.
The goal is to achieve tight control of transcription so that expression is very low until it is
induced. The PL system uses tryptophan to induce, so it is not as expensive as the IPTG
induction.
The chromosomal part of the system is the lambda cI repressor gene controlled by
the tightly regulated trp promoter and operator. The cI repressor gene is normally on until
tryptophan is added. The tryptophan-trp repressor complex then binds to the operator and shuts
down the lambda cI repressor gene. The vector is called pLEX and it carries the cloned
gene under control of the strong PL promoter and operator. PL is the phage lambda leftward
facing early promoter. The operator next to PL binds the cI repressor, which shuts off the PL
promoter. In uninduced cells, the repressor is being made and the cloned gene under control
of PL is turned off. When tryptophan is added, the trp operator is bound by the trp
repressor protein and the trp promoter is shut off, so the lambda cI gene ceases to be
transcribed. This in turn relieves the repression of the PL promoter by the cI repressor and
turns on
the cloned gene.
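
Because two repressors act in series here, it is easy to lose track of the sign. The short
Python sketch below just walks through the chain as described above. The function name and
dictionary keys are my own, and the model is purely on/off; it ignores the time needed for
existing cI repressor to disappear after induction.

```python
# Boolean sketch of the PL system: two repressors in series.
# Names are illustrative assumptions; real cells are not this digital.

def pl_system(tryptophan_added):
    """Follow the regulatory chain from tryptophan to the cloned gene."""
    # The trp repressor binds the trp operator only when complexed with tryptophan.
    trp_repressor_bound = tryptophan_added
    # The chromosomal cI gene is driven by the trp promoter, so it is on unless repressed.
    ci_repressor_made = not trp_repressor_bound
    # cI repressor binds the operator next to PL on pLEX and keeps PL off.
    pl_promoter_on = not ci_repressor_made
    return {
        "trp_repressor_bound": trp_repressor_bound,
        "ci_repressor_made": ci_repressor_made,
        "cloned_gene_on": pl_promoter_on,
    }


if __name__ == "__main__":
    print(pl_system(tryptophan_added=False))
    print(pl_system(tryptophan_added=True))
```

Two negatives make a positive: repressing the repressor turns the cloned gene on.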
This system does not make a special RNA polymerase like T7. It uses the host's
polymerase. The essence of the control is to limit access to the cloned gene via the action
of two different repressors. The PL system is more tightly controlled than the pET system,
so leakiness is not as much of a problem. Very toxic proteins can be expressed without
the need to use added levels of control like the lysozyme trick in the pET system.
The IMPACT System, use of intein splicing in protein expression and purification
This system is very recent. It is described in The NEB Transcript for Jan. 1997 (a
newsletter from New England Biolabs, see handout). This is a very clever approach that
uses protein self-splicing of inteins to remove a purification tag and give pure isolated
protein in one chromatography step.
IMPACT stands for Intein Mediated Purification with an Affinity Chitin-binding
Tag. An intein is the protein equivalent of an intron in RNA. An intron is removed to
make the final message before translation. Once proteins are translated, it seems unlikely
that a peptide could be removed without breaking the protein, but recent evidence has
shown that this can happen. In fact, the recent genome sequence of the archaebacterium
Methanococcus jannaschii was predicted to have 18 inteins present, when only 10 total had
been described previously (Science 273, 1058-1073 1996, see p. 1070, also TIBS 20, 351
1995). The process is dependent on specific chemistry involving thiols or hydroxyls and a
conserved asparagine.
When inteins are aligned, they all have Cys, Ser or Thr on the C-terminal side of
both splice junctions. As we will see, the sulfhydryl or hydroxyl groups are required for
splicing. The C-terminus of the intein is usually His-Asn, with the Asn being invariant. All
the required information is contained in the intein and the amino acid just beyond the intein.
If these sequences are placed in the context of a foreign protein, they still splice themselves
out. The splicing is so efficient, it was difficult to isolate intermediates. However, when
an intein from a thermophilic bacterium was placed in a foreign protein, it could not be
spliced unless the temperature was raised. This system permitted study of the splicing
reactions. These are shown in Fig. 2 of your handout.
The SH or OH group at amino acid 1 of the intein reacts with the peptide bond at the
first splice junction. The peptide bond is exchanged for an ester or a thioester linkage. (I
don't know whether this occurs frequently in all proteins that contain amino acids with SH
or OH groups.) Next, the SH at the second splice junction attacks the newly
formed thioester and cleaves it, forming a new thioester. This makes a branched protein
with the intein C-terminus attached at the second splice junction. The Asn amide then
cleaves the intein peptide bond at the second splice junction, releasing the intein. The two
protein fragments on either side of the intein (these are called exteins) are now linked by a
thioester that spontaneously rearranges to form the normal peptide bond. This structure is
more stable and the process is finished.
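
The sequence requirements can be summarized in a few lines of Python. This is only a string
level sketch: it checks the residues that the text says are needed and then joins the
exteins. It does not model any of the chemistry. The function name and the example
sequences are made up for illustration.

```python
# String-level sketch of intein excision (no chemistry is modeled).
# The function name and the example sequences are hypothetical.

NUCLEOPHILES = {"C", "S", "T"}  # residues with an SH or OH side chain


def splice(precursor, intein_start, intein_end):
    """Remove precursor[intein_start:intein_end] and ligate the two exteins.

    Returns (spliced_protein, intein). Raises ValueError if a required residue is missing.
    """
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    if not intein or not c_extein:
        raise ValueError("need an intein followed by a C-terminal extein")
    if intein[0] not in NUCLEOPHILES:
        raise ValueError("first intein residue must be Cys, Ser or Thr")
    if intein[-1] != "N":
        raise ValueError("last intein residue must be Asn")
    if c_extein[0] not in NUCLEOPHILES:
        raise ValueError("first residue after the intein must be Cys, Ser or Thr")
    return n_extein + c_extein, intein


if __name__ == "__main__":
    precursor = "MKLV" + "CAAAHN" + "SGGE"  # N-extein + intein + C-extein
    print(splice(precursor, 4, 10))
```

Changing the final Asn of the example intein to any other residue makes the function refuse
to splice, which is the string-level counterpart of the Asn requirement.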
The IMPACT expression system exploits this unusual chemistry by mutation of the
C-terminal Asn to Ala in a yeast intein. This mutation prevents the Asn-dependent cleavage
at the second splice junction and traps the protein in a thioester that can be cleaved by
beta-mercaptoethanol or DTT
(dithiothreitol). At NEB, Dr. Xu and colleagues have made a useful protein purification
system using the modified yeast intein. They developed an inducible vector that allows
fusion of your gene in frame with the Cys at the intein N-terminus. On the opposite end of
the intein they added a chitin binding domain (CBD) from Bacillus circulans. Chitin is one
of the most abundant biomaterials on earth and it is tough. In abundance it is second only
to cellulose, and its structure is nearly the same. The C2 OH group of cellulose is replaced by an
acetamido group in chitin. Chitin is found in cell walls of fungi, algae and in the
exoskeletons of arthropods. An affinity column made with chitin would be very durable
and it could be reused many times.
In the IMPACT system, the fusion protein is made in E. coli and passed down the
chitin column, where it binds. The protein can be cleaved off the column by DTT at 4
degrees C. This is slow and takes overnight, so it could be a problem if your protein is not
stable under these conditions. Detergents and salt do not affect the cleavage reaction, so
these can be present in the buffer. The final product is native except for the DTT thioester
moiety attached at the end. This can be removed by adding Cys that will compete for the
thioester linkage. Once the Cys displaces the DTT it will rearrange to form a peptide bond.
In effect, this adds a Cys to the C-terminus of the protein. The Cys can be radiolabeled,
or it can be a site for chemical modification, especially if it is the only Cys in the protein.
Cys is a good site to add crosslinkers, fluorescent probes, spin labels or other tags.
If you were in Biochem 811, you remember I talked about GST fusion proteins
used for purification. The IMPACT method is an improvement over GST fusions because
the cleavage method is more reliable. GST fusions have a built-in protease site that was
designed to cut off the purification tag (GST) either on the affinity column or after it was
eluted. A protease is usually the last thing you want to add to a pure protein, and you also
have to remove it later. The IMPACT system avoids both problems.
The NEB researchers point out that toxic proteins could be made as precursors with
inteins. The intact protein could only be formed after the intein was removed. However,
this required that intein splicing be inducible, and that is not true of the present system.
The IMPACT system is currently available in E. coli, but there is no reason why it
could not work in eukaryotes like yeast or insect cells (for baculovirus expression). NEB
is working on eukaryotic systems now.
Some general comments about the sequence around the start codon
In E. coli the sequence just before the start codon is important for expression.
See de Boer, H.A. and Hui, A.S. Methods in Enzymol. 185, 103-114 (1990). They
found a 20-fold difference in expression based on a difference in the first three nucleotides
upstream of the AUG codon. UAU and CUU were optimal. UUC, UCA or AGG were
20-fold lower. AGG should be avoided because it looks like a part of the Shine-Dalgarno
sequence and it may misdirect ribosome binding. We have been talking about expressing
heterologous genes in E. coli. This often involves adding a restriction site at or near the
AUG start codon. NcoI (CCATGG) and NdeI (CATATG) both contain ATG in their
recognition sequence. Modification of these upstream nucleotides may have a serious
negative effect on translation.
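
If you want to check a construct, a few lines of Python will pull out the three nucleotides
before the ATG and compare them with the triplets listed above. This is a sketch with my own
function names. It only knows about the five triplets mentioned in this section, and the two
test sequences are invented to show what NdeI and NcoI sites force upstream of the start codon.

```python
# Sketch: classify the three nucleotides just upstream of the start codon.
# Function names and example sequences are hypothetical; only the five triplets
# named in the text (from de Boer and Hui) are classified.

OPTIMAL = {"UAU", "CUU"}
POOR = {"UUC", "UCA", "AGG"}


def upstream_triplet(dna, atg_index):
    """Return positions -3 to -1 relative to the ATG at atg_index, written as RNA."""
    dna = dna.upper()
    if atg_index < 3:
        raise ValueError("need at least three nucleotides upstream of the ATG")
    if dna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    return dna[atg_index - 3:atg_index].replace("T", "U")


def classify(triplet):
    """Label a triplet using only the cases reported in the text."""
    if triplet in OPTIMAL:
        return "optimal"
    if triplet in POOR:
        return "poor"
    return "not listed here"


if __name__ == "__main__":
    ndei = "AAACATATGAAA"   # NdeI site CATATG fixes the upstream triplet as CAU
    ncoi = "AAACCATGGAAA"   # NcoI site CCATGG fixes positions -2 and -1 as CC
    for name, seq in (("NdeI", ndei), ("NcoI", ncoi)):
        triplet = upstream_triplet(seq, seq.find("ATG"))
        print(name, triplet, classify(triplet))
```

Neither restriction site gives one of the two optimal triplets, which is why the choice of
cloning site at the start codon deserves some thought.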
The sequence downstream of the start codon is also influential. Detailed studies on
the first codon downstream of AUG showed a 15-fold difference in expression (EMBO J.
6, 2489 1987). This difference was not due to abundance of the different tRNAs as might
be expected. There were strong differences in codons that were recognized by the same
tRNA. UUU was nine times better than UUC, but they are both recognized by the same
tRNA Phe. This gives E. coli a way to regulate expression of a protein by choosing which
of several codons will be used at the second position. This codon bias is different from the
usual codon bias that is based on how frequently a codon appears in coding regions.
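
A practical consequence is that you may want to try the synonymous alternatives at the second
codon of your construct. The Python sketch below lists them using the standard genetic code.
The function name is my own, and the code says nothing about which alternative will express
best; that has to come from the published data or from trying them.

```python
# Sketch: list synonymous alternatives for the codon right after the ATG.
# Uses the standard genetic code; the function name is a hypothetical helper.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons in TCAG order pair up with the amino acid string above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def second_codon_options(cds):
    """Return (second codon, amino acid, synonymous alternatives) for a coding sequence."""
    cds = cds.upper().replace("U", "T")
    if len(cds) < 6 or not cds.startswith("ATG"):
        raise ValueError("coding sequence must start with ATG and have a second codon")
    codon = cds[3:6]
    amino_acid = CODON_TABLE[codon]
    alternatives = sorted(
        c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon
    )
    return codon, amino_acid, alternatives


if __name__ == "__main__":
    print(second_codon_options("ATGTTCAAA"))
```

For a sequence starting ATG TTC, the only alternative is TTT, which is the Phe codon pair
discussed above.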
The influence of the first part of the sequence goes farther. One upregulating
mutation was found at nucleotide 12 of human gamma interferon that increased expression
30-fold. Because of this, some people like to add a known presequence onto proteins they
want to express in E. coli. Among cytochrome P450 researchers, it is known that the first
17 codons of the bovine side chain cleavage P450 are very good for E. coli expression of
P450 genes. They often add this piece on to their constructs.
Secretion from E. coli
One aspect of expression in E. coli that we will mention briefly is secretion. It is
possible to add a signal sequence onto a protein so it will be secreted into the periplasm.
The problem here is that E. coli normally do not secrete proteins to the medium. They
usually export them to the periplasm or the outer membrane, so it is difficult to get them to
be completely secreted to the medium. Bacillus subtilis can secrete proteins to the medium.
The advantages of doing this are great. Since the secreted protein will be nearly the only one
in the medium, purification is much simplified. Proteolysis is also reduced; it
can be a serious problem with proteins made and retained in the cytosol. One other
advantage is that proteolytic processing of the leader will generate an N-terminus without
the formyl methionine that is the starting amino acid in E. coli. Careful planning of the fusion
joint between the desired protein and the leader sequence can give the native protein
sequence after cleavage. Finally, proteins in E. coli are not able to form disulfide bonds
easily because of the strong reducing nature of the cytoplasm. Eukaryotic proteins with
many Cys residues would benefit from export to the periplasm or medium where they
could form the correct disulfide bonds more readily.
See Stader, J.A. and Silhavy, T.J. Engineering E. coli to secrete heterologous gene
products. Methods in Enzymol. 185, 166-187 (1990).
See Nagarajan, V. System for secretion of heterologous protein in Bacillus subtilis.
Methods in Enzymol. 185, 214-223 (1990).
It is easier to get a normally secreted protein to be secreted in E. coli. These tend to
be smaller proteins and they have evolved to be passed through a membrane. Non-secreted
proteins are harder. Some proteins will jam the export apparatus and are fatal when tagged
with a signal sequence.
Because E. coli cannot do the proper post-translational modifications to eukaryotic proteins,
it may be best to go right to the Pichia pastoris yeast secretion system and not bother with
the E. coli system if you want to secrete a eukaryotic protein.