Protein Expression Systems                               David Nelson, Mar. 24, 1997

	This week we will cover protein expression systems in prokaryotes and eukaryotes.  
There are so many different systems, I have had to select just a few examples from each 
area.  Today we will talk about expression in E. coli.  On Tuesday, we will cover 
eukaryotic systems except for yeast.  Thursday will be reserved for yeast expression 
systems.

	Early on (1975-1978), yeast DNA was placed in E. coli vectors without 
modification, just to see if it could complement some metabolic defect in E. coli.  Several 
yeast genes were found to complement bacterial genes.  These include:

	HIS3 = hisB in E. coli
	LEU2 = leuB in E. coli
	URA3 = pyrF in E. coli
	TRP1 = trpC in E. coli

	This permitted cloning of these yeast genes by complementation very early on and 
they are the standard set of yeast selectable markers today.  This story is recounted in the 
chapter "Genes, Replicators and Centromeres: The First Artificial Chromosomes" by John 
Carbon in The Early Days of Yeast Genetics, Cold Spring Harbor Laboratory Press, 1993, 
pp. 375-390. (I have this in the lab if you want to read it.)  The E. coli RNA polymerase 
apparently recognized weak promoter-like elements in the yeast DNA, not necessarily in the 
yeast promoter itself.  

The pET Vector System

	It soon became clear that this would not work for most eukaryotic genes.  The 
expression of eukaryotic genes in E. coli required vectors that supplied a bacterial or phage 
promoter.  To start our discussion of expression in E. coli, I have chosen a popular system 
based on the phage T7 promoter.  The vectors used are called pET vectors for "plasmid for 
expression by T7 RNA polymerase".  

	This system was created by William Studier and his colleagues.

Studier, F.W. et al. (1990) Use of T7 RNA polymerase to direct expression of cloned 
genes.  Methods in Enzymol. 185, 60-89.

Dubendorff, J.W. and Studier, F.W. (1991) Controlling basal expression in an inducible 
T7 expression system by blocking the target T7 promoter with lac repressor.  J. Mol. Biol. 
219, 45-59.

	They point out that T7 RNA polymerase is very specific for T7 promoters and does 
not transcribe DNA from other sources, since these promoter sequences are very rare.  
Termination signals for T7 RNA polymerase are also rare, so long transcripts can be 
made without premature termination.  T7 RNA polymerase is about 5 times faster than 
E. coli RNA polymerase, so genes controlled by T7 promoters can be overexpressed.  

	In order for this system to work, the T7 RNA polymerase must be supplied to the 
cells.  A special strain called BL21(DE3) has been made for this purpose.  The BL21(DE3) 
cells are lysogenic for DE3, a phage lambda derivative that carries a DNA fragment containing 
the lacI gene, the lac UV5 promoter, the start of lacZ (beta-galactosidase) and the T7 RNA 
polymerase gene.  The lac UV5 promoter is responsible for driving the expression of T7 
RNA polymerase.  It is inducible by IPTG (isopropyl beta-D-thiogalactopyranoside).  Because 
the cells are lysogens, this fragment is integrated into the E. coli chromosome.

	In addition to the T7 RNA polymerase, there is a vector (the pET vector) that carries 
the cloned gene behind a T7 promoter, with termination signals downstream of the gene to 
stop transcription.  In theory, the gene can only be expressed if IPTG is added to activate 
the lac UV5 promoter and turn on synthesis of T7 RNA polymerase.  This in turn transcribes 
the gene in the pET vector.  In practice though, the E. coli RNA polymerase does make a small 
amount of T7 RNA polymerase without induction.  It may bind upstream of the lac UV5 
promoter and read through it.  The system is therefore said to be leaky.  This causes a small 
amount of the cloned gene to be expressed.  If the protein is toxic to the cell, the cells may die.  

	To reduce basal (background) expression of a cloned gene, Studier et al. take advantage 
of the fact that T7 lysozyme is an inhibitor of T7 RNA polymerase.  The lysozyme 
normally attacks the cell wall, but since it stays inside the cells, it cannot harm the cell wall.  
By expressing this gene on a different plasmid, the activity of T7 RNA polymerase can be 
further reduced.  In fact, there are two different plasmids used for this purpose.  One has 
the lysozyme gene oriented so it is expressed from the tet promoter (pLysE).  The other has 
the gene oriented in the opposite direction (pLysS).  pLysE makes more of the T7 
lysozyme, so cells carrying it have lower levels of T7 RNA polymerase activity, but the 
cells grow more slowly.  pLysS also reduces T7 RNA polymerase activity, but it does not 
slow the cells' growth.  These should both be tried if your protein is toxic.  Once IPTG is 
added, even the pLysE cells express enough T7 RNA polymerase that your gene can be expressed.  
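
	The layers of control described so far can be summarized in a toy calculation.  This 
is only an illustrative sketch.  The function name and every number in it (the leak level, 
the induced level and the lysozyme inhibition fractions) are made-up placeholders, not 
measured values.  The only things taken from the discussion above are the ordering 
(pLysE inhibits more than pLysS) and the fact that induction overrides the inhibition.

```python
# Toy model of the pET control layers. All numbers are arbitrary
# placeholders chosen for illustration, not measurements.

# Hypothetical fraction of T7 RNA polymerase activity that survives
# inhibition by T7 lysozyme supplied from each helper plasmid.
LYSOZYME_SURVIVING_FRACTION = {
    None: 1.0,      # no helper plasmid
    "pLysS": 0.3,   # less lysozyme, milder inhibition
    "pLysE": 0.05,  # more lysozyme, stronger inhibition
}


def t7_polymerase_activity(iptg, helper=None, leak=1.0, induced=100.0):
    """Return relative T7 RNA polymerase activity (arbitrary units).

    iptg    -- True if IPTG has been added (lac UV5 promoter induced)
    helper  -- None, "pLysS" or "pLysE"
    leak    -- assumed uninduced (basal) polymerase level
    induced -- assumed fully induced polymerase level
    """
    made = induced if iptg else leak
    return made * LYSOZYME_SURVIVING_FRACTION[helper]


if __name__ == "__main__":
    for helper in (None, "pLysS", "pLysE"):
        basal = t7_polymerase_activity(False, helper)
        on = t7_polymerase_activity(True, helper)
        print(f"{str(helper):6s} basal={basal:6.2f} induced={on:6.2f}")
```

	With these placeholder values the basal activity falls in the order no helper, pLysS, 
pLysE, while the induced activity with pLysE is still above the uninduced leak without 
any helper.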

	This system permits even very toxic proteins to be made upon induction.  In fact, 
most of the cell's translation machinery gets dedicated to making the protein you desire.  In 
about 3 hours after induction the major protein in the cell can be the expressed gene 
product.  In cases where the protein is not toxic it may be possible to grow the cells more 
slowly overnight and achieve 50% of the cell protein as your gene product.

	The basal expression of T7 RNA polymerase is still a problem in this system if 
your protein is toxic.  To reduce its effect even further, Dubendorff and Studier placed a 
lac operator just downstream of the T7 promoter in the pET vector (the T7lac promoter).  
lacI was also added to the pET vector so that more lac repressor would be made.  The 
repressor binds the operator on the vector and blocks transcription of the target gene by any 
T7 RNA polymerase made before induction.  It also represses the lac UV5 promoter on the 
chromosome.  IPTG releases the repressor from both operators, so induction turns on the 
polymerase gene and unblocks the target gene at the same time.  Extremely toxic 
proteins can be expressed in this system.  

The PL Expression System

	This next system is more recent than the pET vector system.  It is described in the 
1997 Invitrogen catalog.  The PL system is similar to the pET system in that there are two 
interacting components.  One is on the bacterial chromosome and the other is on a vector.  
The goal is to achieve tight control of transcription so that expression is very low until it is 
induced.  The PL system uses tryptophan as the inducer, so induction is less expensive 
than with IPTG. 

	The chromosomal part of the system is the lambda cI repressor gene controlled by 
the tightly regulated trp promoter and operator.  The cI repressor gene is normally on until 
tryptophan is added.  The tryptophan-trp repressor complex then binds to the operator and 
shuts down the lambda cI repressor gene.  The vector is called pLEX and it carries the cloned 
gene under control of the strong PL promoter and operator.  PL is the phage lambda 
leftward-facing early promoter.  The PL operator binds the cI repressor, which shuts off the 
PL promoter.  In normal cells, the repressor is being made and the cloned gene under control 
of PL is turned off.  When tryptophan is added, the trp operator is bound by the trp 
repressor protein and the trp promoter is shut off, so the lambda cI gene is no longer 
transcribed.  This in turn relieves the inhibition of the PL promoter by the cI repressor and 
turns on the cloned gene.  
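
	The double-negative logic of this switch can be written out as three small Boolean 
functions.  This is a steady-state sketch only.  The function names are invented for 
illustration, and the model ignores the time it takes for cI repressor already in the cell 
to disappear after tryptophan is added.

```python
def trp_repressor_active(tryptophan_added):
    # The trp repressor binds its operator only as a complex with tryptophan.
    return tryptophan_added


def ci_repressor_made(tryptophan_added):
    # cI is transcribed from the trp promoter unless the trp repressor blocks it.
    return not trp_repressor_active(tryptophan_added)


def cloned_gene_on(tryptophan_added):
    # PL on the pLEX vector is shut off whenever cI repressor is present.
    return not ci_repressor_made(tryptophan_added)


if __name__ == "__main__":
    for trp in (False, True):
        print(f"tryptophan added: {trp!s:5}  cloned gene on: {cloned_gene_on(trp)}")
```

	Two repressions in series give a net activation: adding tryptophan turns one 
repressor on, which turns the second repressor off, which turns the cloned gene on.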

	This system does not make a special RNA polymerase like T7.  It uses the host's 
polymerase.  The essence of the control is to limit access to the cloned gene via the action 
of two different repressors.  The PL system is more tightly controlled than the pET system, 
so the leakiness is not as much of a problem.  Very toxic proteins can be expressed without 
the need to use added levels of control like the lysozyme trick in the pET system.

The IMPACT System, use of intein splicing in protein expression and purification

	This system is very recent.  It is described in The NEB Transcript for Jan. 1997 (a 
newsletter from New England Biolabs, see handout).  This is a very clever approach that 
uses protein self-splicing of inteins to remove a purification tag and give pure isolated 
protein in one chromatography step.

	IMPACT stands for Intein Mediated Purification with an Affinity Chitin-binding 
Tag.  An intein is the protein equivalent of an intron in RNA.  An intron is removed to 
make the final message before translation.  Once proteins are translated, it seems unlikely 
that a peptide could be removed without breaking the protein, but recent evidence has 
shown that this can happen.  In fact, the recent genome sequence of the archaebacterium 
Methanococcus jannaschii was predicted to have 18 inteins present, when only 10 total had 
been described previously (Science 273, 1058-1073 1996, see p. 1070, also TIBS 20, 351 
1995).  The process is dependent on specific chemistry involving thiols or hydroxyls and a 
conserved asparagine.

	When inteins are aligned, they all have Cys, Ser or Thr on the C-terminal side of 
both splice junctions.  As we will see, the sulfhydryl or hydroxyl groups are required for 
splicing.  The C-terminus of the intein is usually His-Asn, with the Asn being invariant.  All 
the required information is contained in the intein and the amino acid just beyond the intein.  
If these sequences are placed in the context of a foreign protein, they still splice themselves 
out.  The splicing is so efficient that it was difficult to isolate intermediates.  However, when 
an intein from a thermophilic bacterium was placed in a foreign protein, it could not be 
spliced unless the temperature was raised.  This system permitted study of the splicing 
reactions.  These are shown in Fig. 2 of your handout.

	The SH or OH group at amino acid 1 of the intein reacts with the peptide bond at the 
first splice junction.  The peptide bond is exchanged for an ester or a thioester linkage.  (I 
don't know if this occurs frequently in all proteins that have SH- or OH-containing 
amino acids.)  Next, the SH or OH at the second splice junction attacks the newly 
formed ester or thioester and cleaves it, forming a new ester or thioester.  This makes a 
branched protein with the intein C-terminus still attached at the second splice junction.  The 
Asn amide then cleaves the intein peptide bond at the second splice junction, releasing the 
intein.  The two protein fragments on either side of the intein (these are called exteins) are 
now linked by an ester or thioester that spontaneously rearranges to form the normal peptide 
bond.  This structure is more stable and the process is finished.
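
	At the level of sequence, the net result of these steps is simple to sketch: the 
intein is cut out and the two exteins are joined, provided the residues described above are 
in place.  The function below is a minimal illustration, not a model of the chemistry.  The 
function name and the example sequence are invented, and the checks encode only the rules 
stated in these notes (Cys, Ser or Thr after each junction, and Asn as the last intein residue).

```python
NUCLEOPHILES = {"C", "S", "T"}  # Cys, Ser, Thr carry the SH or OH group


def splice(precursor, intein_start, intein_end):
    """Remove the intein precursor[intein_start:intein_end] (0-based,
    end-exclusive) and join the two exteins.

    Returns (spliced_protein, intein). Raises ValueError if a residue
    that the notes describe as required is missing.
    """
    if not 0 < intein_start < intein_end < len(precursor):
        raise ValueError("intein must lie between two non-empty exteins")
    intein = precursor[intein_start:intein_end]
    n_extein = precursor[:intein_start]
    c_extein = precursor[intein_end:]

    if intein[0] not in NUCLEOPHILES:
        raise ValueError("first intein residue must be Cys, Ser or Thr")
    if c_extein[0] not in NUCLEOPHILES:
        raise ValueError("first C-extein residue must be Cys, Ser or Thr")
    if intein[-1] != "N":
        raise ValueError("last intein residue must be Asn")

    return n_extein + c_extein, intein


if __name__ == "__main__":
    # Made-up sequences: N-extein + intein + C-extein.
    precursor = "MKTAY" + "CFAKGHN" + "CGEEL"
    print(splice(precursor, 5, 12))
```

	Changing the final Asn of the example intein to Ala makes the function raise an 
error, which mirrors the fact that splicing cannot be completed without that residue.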

	The IMPACT expression system exploits this unusual chemistry by mutation of the 
C-terminal Asn to Ala in a yeast intein.  This mutation prevents the Asn-dependent cleavage 
at the second splice junction and traps the protein in a thioester at the first junction that can 
be cleaved by beta-mercaptoethanol or DTT (dithiothreitol).  At NEB, Dr. Xu and colleagues 
have made a useful protein purification system using the modified yeast intein.  They 
developed an inducible vector that allows fusion of your gene in frame with the Cys at the 
intein N-terminus.  On the opposite end of the intein they added a chitin binding domain 
(CBD) from Bacillus circulans.  Chitin is one of the most abundant biomaterials on earth 
and it is tough.  It is second only to cellulose in abundance and its structure is nearly the 
same.  The C2 OH group of cellulose is replaced by an acetamido group in chitin.  Chitin is 
found in cell walls of fungi, algae and in the exoskeletons of arthropods.  An affinity column 
made with chitin would be very durable and it could be reused many times.  

	In the IMPACT system, the fusion protein is made in E. coli and passed down the 
chitin column, where it binds.  The protein can be cleaved off the column by DTT at 4 
degrees C.  This is slow and takes overnight, so it could be a problem if your protein is not 
stable under these conditions.  Detergents and salt do not affect the cleavage reaction, so 
these can be present in the buffer.  The final product is native except for the DTT thioester 
moiety attached at the end.  This can be removed by adding Cys that will compete for the 
thioester linkage.  Once the Cys displaces the DTT it will rearrange to form a peptide bond.  
In effect, this adds a Cys to the C-terminus of the protein.  The Cys can be radiolabeled, 
or it can be a site for chemical modification, especially if it is the only Cys in the protein.  
Cys is a good site to add crosslinkers, fluorescent probes, spin labels or other tags.
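
	The bookkeeping for the released product is simple enough to sketch.  The helper 
names and the example sequence below are invented for illustration.  The code only 
records the point made above, that exchange with free Cys leaves one extra Cys at the 
C-terminus, and checks whether that Cys would be the only one in the protein.

```python
def impact_product(target, cys_exchange=True):
    """Sequence of the protein released from the chitin column.

    With cys_exchange=True, free Cys has displaced the DTT thioester and
    rearranged into a peptide bond, adding one C-terminal Cys.
    """
    return target + "C" if cys_exchange else target


def added_cys_is_unique(target):
    """True if the target itself has no Cys, so the added one is the only Cys."""
    return "C" not in target


if __name__ == "__main__":
    target = "MKTAYGEEL"  # made-up target sequence
    print(impact_product(target), added_cys_is_unique(target))
```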

	If you were in Biochem 811, you remember I talked about GST fusion proteins 
used for purification.  The IMPACT method is an improvement over GST fusions because 
the cleavage method is more reliable.  GST fusions have a built-in protease site that is 
designed to cut off the purification tag (GST) either on the affinity column or after the 
fusion is eluted.  A protease is usually the last thing you want to add to a pure protein, and 
you also have to remove it later.  The IMPACT system avoids both problems.

	The NEB researchers point out that toxic proteins could be made as precursors with 
inteins.  The intact protein could only be formed after the intein was removed.  However, 
this requires that intein splicing be inducible, and that is not true of the present system.  

	The IMPACT system is currently available in E. coli, but there is no reason why it 
could not work in eukaryotes like yeast or insect cells (for baculovirus expression).  NEB 
is working on eukaryotic systems now.

Some general comments about the sequence around the start codon

	In E. coli the sequence just before the start codon is important for expression.  
See de Boer, H.A. and Hui, A.S. Methods in Enzymol. 185, 103-114 (1990).  They 
found a 20-fold difference in expression based on a difference in the first three nucleotides 
upstream of the AUG codon.  UAU and CUU were optimal.  UUC, UCA or AGG were 
20-fold lower.  AGG should be avoided because it looks like a part of the Shine-Dalgarno 
sequence and it may misdirect ribosome binding.  We have been talking about expressing 
heterologous genes in E. coli.  This often involves adding a restriction site at or near the 
AUG start codon.  NcoI (CCATGG) and NdeI (CATATG) both contain ATG in their 
recognition sequence.  These sites also fix the nucleotides just upstream of the ATG (CC 
for NcoI, CAT for NdeI), and modification of these upstream nucleotides may have a serious 
negative effect on translation.  
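
	A quick way to see what a cloning site does to this position is to read off the three 
nucleotides in front of the ATG.  The sketch below does that.  The function names and the 
example leader sequence are invented for illustration, and the two sets contain only the 
triplets named above, so most triplets will come back as not reported.

```python
# Triplets named in the notes for the three nucleotides immediately
# upstream of AUG (written as RNA).
OPTIMAL = {"UAU", "CUU"}
POOR = {"UUC", "UCA", "AGG"}


def upstream_triplet(dna, atg_index):
    """Return the three nucleotides just before the ATG at atg_index,
    converted to RNA letters."""
    dna = dna.upper()
    if atg_index < 3 or dna[atg_index:atg_index + 3] != "ATG":
        raise ValueError("need an ATG with at least 3 nt upstream")
    return dna[atg_index - 3:atg_index].replace("T", "U")


def classify(triplet):
    if triplet in OPTIMAL:
        return "optimal"
    if triplet in POOR:
        return "poor"
    return "not reported in these notes"


if __name__ == "__main__":
    # Illustrative leader with an NdeI site (CATATG) at the start codon.
    leader = "AAGGAGATATACATATGGCT"
    atg = leader.find("CATATG") + 3  # the ATG sits 3 nt into the NdeI site
    triplet = upstream_triplet(leader, atg)
    print(triplet, "->", classify(triplet))
```

	For any gene cloned through NdeI the triplet is CAU, because the site itself supplies 
those three nucleotides.  With NcoI only the last two upstream nucleotides (CC) are fixed by 
the site.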

	The sequence downstream of the start codon is also influential.  Detailed studies on 
the first codon downstream of AUG showed a 15-fold difference in expression (EMBO J. 
6, 2489, 1987).  This difference was not due to abundance of the different tRNAs as might 
be expected.  There were strong differences between codons that are recognized by the same 
tRNA.  UUU was nine times better than UUC, but they are both recognized by the same 
Phe tRNA.  This gives E. coli a way to regulate expression of a protein by choosing which 
of several codons will be used at the second position.  This codon bias is different from the 
usual codon bias that is based on how frequently a codon appears in coding regions.
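
	When designing a construct it can help to list the synonymous choices available at 
the second codon.  The sketch below builds the standard genetic code and returns the 
second codon of a coding sequence together with its synonyms.  The function name and the 
example sequence are invented, and the code says nothing about which synonym expresses 
best.  That has to come from data such as the study cited above.

```python
from itertools import product

# Standard genetic code, codons ordered with U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def second_codon_alternatives(mrna):
    """Return (second codon, sorted list of codons for the same amino acid)
    for a coding sequence that starts at AUG. DNA input (T) is accepted."""
    mrna = mrna.upper().replace("T", "U")
    if not mrna.startswith("AUG") or len(mrna) < 6:
        raise ValueError("sequence must start with AUG and include a second codon")
    codon = mrna[3:6]
    amino_acid = CODON_TABLE[codon]
    synonyms = sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)
    return codon, synonyms


if __name__ == "__main__":
    # Made-up coding sequence with Phe (UUC) at the second position.
    print(second_codon_alternatives("AUGUUCGCU"))
```

	For the example, the second codon is UUC and the only synonym is UUU, the pair 
compared in the text.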

	The influence of the first part of the sequence extends farther.  One upregulating 
mutation was found at nucleotide 12 of human gamma interferon that increased expression 
30-fold.  Because of this, some people like to add a known presequence onto proteins they 
want to express in E. coli.  Among cytochrome P450 researchers, it is known that the first 
17 codons of the bovine side chain cleavage P450 are very good for E. coli expression of 
P450 genes.  They often add this piece on to their constructs.  


Secretion from E. coli

	One aspect of expression in E. coli that we will mention briefly is secretion.  It is 
possible to add a signal sequence onto a protein so it will be exported to the periplasm.  
The problem here is that E. coli normally does not secrete proteins to the medium.  It 
usually exports them to the periplasm or the outer membrane, so it is difficult to get them 
completely secreted to the medium.  Bacillus subtilis can secrete proteins to the medium.  
The advantages of doing this are great.  Since the secreted protein will be nearly the only one 
in the medium, purification is much simplified.  Proteolysis, which can be a serious 
problem with proteins made and retained in the cytosol, is also reduced.  One other 
advantage is that proteolytic processing of the leader will generate an N-terminus without 
the formyl methionine that is the starting amino acid in E. coli.  Careful planning of the 
fusion joint between the desired protein and the leader sequence can give the native protein 
sequence after cleavage.  Finally, proteins in E. coli are not able to form disulfide bonds 
easily because of the strongly reducing nature of the cytoplasm.  Eukaryotic proteins with 
many Cys residues would benefit from export to the periplasm or medium where they 
could form the correct disulfide bonds more readily.

See Stader, J.A. and Silhavy, T.J. Engineering E. coli to secrete heterologous gene 
products. Methods in Enzymol. 185, 166-187 (1990).

See Nagarajan, V. System for secretion of heterologous protein in Bacillus subtilis. 
Methods in Enzymol. 185, 214-223 (1990).

	It is easier to get a normally secreted protein to be secreted in E. coli.  These tend to 
be smaller proteins and they have evolved to be passed through a membrane.  Non-secreted 
proteins are harder.  Some proteins will jam the export apparatus and are fatal when tagged 
with a signal sequence.  

Because E. coli cannot do the proper post translational modifications to eukaryotic proteins, 
it may be best to go right to the Pichia pastoris yeast secretion system and not bother with 
the E. coli system if you want to secrete a eukaryotic protein.