Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

John Wiley & Sons - 2004 - Analysis of Genes and Genomes

.pdf
Скачиваний:
342
Добавлен:
17.08.2013
Размер:
11.18 Mб
Скачать

258

PROTEIN PRODUCTION AND PURIFICATION 8

 

 

 

 

ter

 

 

o

 

 

m

 

o

r

 

P

 

 

Multiple cloning

site

Transcriptional terminator

RBS

Expression

vector

Origin of replication

Figure 8.1. The architecture of an expression vector. An expression vector should contain a strong inducible promoter, a multiple cloning site for the insertion of target genes, and a transcriptional terminator. Additionally, a ribosome binding site (RBS) is included to promote efficient translation

Such vectors will often contain a multiple cloning site located between a strong transcriptional promoter and terminator sequence. Additionally, the expression vector, like other plasmids, will contain an origin of replication and a selectable marker such that the vector may be autonomously replicated and maintained within cells.

At high levels, many proteins will be toxic to the host cell in which they are produced. Indeed, some proteins when produced in small amounts will also be toxic to the host. For example, the expression of the poliovirus 3AB gene product is highly toxic to E. coli cells, due to the drastic changes it creates in the membrane permeability of the bacteria (Lama and Carrasco, 1992). Therefore, to maximize protein expression it is vital that an inducible expression system be established, so large quantities of the host cells can be grown before the expression of the target protein is initiated. Protein production can then be activated rapidly and the cells harvested soon afterwards prior to the potentially toxic effects of the expressed protein. Here, we will discuss a number of inducible expression systems that are in common use today. Additionally, we will describe the common host –vector systems that are used for protein production in E. coli, yeast, insect and mammalian cells.

8.1Expression in E. coli

E. coli remains the host cell of choice for the majority of protein expression experiments. Its rapid doubling time (approximately 30 min) in simple defined

8.1 EXPRESSION IN E. coli

259

 

 

(and inexpensive) media, combined with an extensive knowledge of its promoter and terminator sequences, means that many proteins of both prokaryotic and eukaryotic origin can be produced within the organism. Additionally, E. coli cells are easily broken for the harvesting of the proteins produced within the cell. Of course, E. coli does suffer from the fact that is a prokaryotic organism when it is used to produce eukaryotic proteins. E. coli cells are unable to process introns and do not possess the extensive post-translational machinery found in eukaryotic cells that can glycolylate, methylate, phosphorylate or alter the initially produced protein in other ways, such as through extensive disulphide bond formation. The use of cDNA to produce an expression vector overcomes the first of these problems, but if post-translational modifications to the protein are necessary for protein function, then an alternative host must be sought.

Many different promoter sequences have been used to illicit inducible protein production in E. coli (Makrides, 1996). Some of these are discussed below.

8.1.1The lac Promoter

We have already seen (Figure 1.23) that the E. coli lac promoter provides a mechanism for inducible gene expression. The lac genes are expressed maximally when E. coli are grown on lactose. Fusing the lac promoter sequences to another gene will result in the lactose- (or IPTG-) dependent expression of that gene. The lac promoter suffers, however, from a number of problems that mean that it is rarely used to drive the expression of target genes. First, the lac promoter is fairly weak and therefore cannot drive very high levels of protein production, and second the lac genes are transcribed to a significant level in the absence of induction (Gronenborn, 1976). The latter problem can be partially overcome by expressing mutant versions of the lacI gene that have increased DNA binding (and consequently repressing) ability – for example the lacIq allele results in the overproduction of LacI and consequently results in a reduced level of transcription in the absence of inducer (Muller¨-Hill, Crapo and Gilbert, 1968).

8.1.2The tac Promoter

The ease with which the lac promoter can be activated (the addition of IPTG to E. coli cultures) makes it an attractive system for producing target proteins. However, the relative weakness of the promoter means that the target gene will not be greatly over-produced. Through the analysis of many E. coli promoters, consensus sequences for the −35 and −10 regions, to which the RNA polymerase must bind to transcribe the gene, can be determined (Lisser and Margalit, 1993). The lac promoter is weak because the −35 region deviates from the consensus (Figure 8.2). The creation of a fusion sequence

260

PROTEIN PRODUCTION AND PURIFICATION 8

 

 

 

 

−35

−10

 

consensus

consensus

 

5'-TTGACA-3'

5'-TATAAT-3'

lac : 5’-...TTTACACTTTATGCTTCCGGCTCGTATATTGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCT ATG...-3'

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

−35

 

 

 

−10

 

 

 

 

RBS

Protein

 

 

 

 

 

 

 

 

 

 

 

 

 

 

trp : 5'-...

TTGACAATTAATCATCGAACTAGTTAACTAGTACGCAAGTTCACGTAAAAAGGGTATCGACA ATG...-3'

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

−35

 

 

−10

 

RBS

Protein

 

 

tac : 5'-...TTGACAATTAATCATCGGCTCGTATAATGTGTGGAATTGTGAGCGGATAACAATTTCACACAGGAAACAGCG ATG...-3'

−35 −10 RBS Protein

Figure 8.2. DNA sequences of the lac, trp and tac promoters. The consensus E. coli −35 and −10 sequences based on the analysis of naturally occurring promoters are shown above, and the sequences of each of the promoters, extending from the −35 region to the translational start site, are shown. The tac promoter is a hybrid of the trp and lac promoters. The −35 and −10 regions it contains closely resemble the consensus sequences. The tac promoter is approximately five times stronger than the lac promoter, but is still inducible by lactose or IPTG

containing the −35 region of the E. coli trp operon, and the −10 region of the lac operon, controlling the expression of the genes responsible for tryptophan biosynthesis and lactose metabolism, respectively, results in the formation of the tac promoter, which is five times as strong as the lac promoter itself (de Boer, Comstock and Vasser, 1983). The tac promoter is able to induce the expression of target genes such that the encoded polypeptide can accumulate at a level of 20 –30 per cent of the total cell protein (Amann, Brosius and Ptashne, 1983). Expression vectors that carry the tac promoter also carry the lacO operator and usually the lacI gene encoding the Lac repressor (Stark, 1987). Genes cloned into these vectors are therefore IPTG inducible and can be repressed and induced in a variety of E. coli strains (Figure 8.3).

8.1.3The λPL Promoter

The λPL promoter is responsible for transcription of the left-hand side of the λ genome, including N and cIII (see Chapter 3). λ repressor, the product of the cI gene, represses the promoter. Two basic methods are used to activate the λPL promoter. In the first, a temperature-sensitive mutant of cI (cI857) is used

in conjunction with λPL for the expression of target genes (Hendrix, 1983). When grown at 30 C the mutant cI protein is able to bind to the λPL promoter and repress it. Above 30 C, however, the mutant cI protein is unable to bind DNA, and the λPL promoter is activated (Bernard et al., 1979). This method produces high levels of target gene expression, but the heat pulse required to induce protein production can be difficult to control. A second way to induce

 

 

 

 

 

 

 

 

8.1

EXPRESSION IN E. coli

261

 

 

 

 

 

 

 

 

 

 

 

kDa

 

 

Protein 1

Protein 2

Protein 3

 

 

 

 

 

 

M

+

+

+

 

IPTG induction

 

100

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

75

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

50

 

 

 

 

 

 

 

 

 

 

 

Protein 1

 

 

 

 

 

 

 

 

 

 

 

 

 

37

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Protein 2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

25

 

 

 

 

 

 

 

 

 

 

 

Protein 3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 8.3. The production of three different proteins in E. coli whose expression is driven from the tac promoter. Bacterial cells, harbouring the appropriate tac based expression vector, were grown in liquid cultures and then IPTG was added, as indicated, to half of each culture. Growth was continued for an additional 90 min before the cells were harvested, broken open and subjected to SDS–polyacrylamide gel electrophoresis. The protein content of each culture was observed by staining the gel with Coomassie blue. The locations of the three proteins produced upon IPTG induction are indicated

the λPL promoter is to transform the expression vector into an E. coli strain in which the cI gene has been placed under the control of the tightly regulated trp promoter. Expression of the target gene can then be induced by the addition of tryptophan to the growth media, which will prevent transcription of the cI gene, and consequently activate the strong λPL promoter. This results in a system that is so tightly controlled that it can be used to express even highly toxic proteins (Wang, Deems and Dennis, 1997; Celis et al., 1998).

8.1.4The T7 Expression System

This is the RNA polymerase encoded by bacteriophage T7 is different from its E. coli counterpart. Unlike the α2β2 subunit structure of the E. coli enzyme, T7 RNA polymerase is a single-subunit enzyme that binds to distinct DNA 17 bp promoter sequences (5 -TAATACGACTCACTATA-3 ) found upstream of the T7 viral gene it activates. E. coli RNA polymerase does not recognize T7 promoter sequences as start sites for transcription. The overall scheme for the production of target proteins using the T7 system is shown in Figure 8.4 (Studier and Moffatt, 1986; Studier et al., 1990). The target gene is cloned into a plasmid expression vector such that it is under the control of the T7 promoter. Propagation of this plasmid in wild-type E. coli cells will not result in the expression of the target gene since the T7 RNA polymerase is absent. To elicit target gene expression, the expression plasmid is transformed into an E. coli strain that contains a copy of T7 gene 1 that is under the control of the lac promoter. Such sequences can be transferred into most E. coli strains using a λ lysogen called DE3 that contains the T7 RNA polymerase gene under

262

PROTEIN PRODUCTION AND PURIFICATION 8

 

 

E. coli RNA

IPTG dissociates lac repressor to initiate transcription

polymerase

 

 

T

 

7 g

 

e

 

n

 

e

 

1

lac promoter

DE3

lac repressor

E. coli genome

lacI gene

T7 RNA polymerase

INACTIVE

T7 lysozyme

pLysS

T7 lysozyme gene

 

 

P

 

NA

7

R

 

T

 

 

T7 promoter

lac

repressor pET

lacI gene

Target gene

Figure 8.4. The T7 system for the expression of proteins in E. coli. The expression vector (pET) contains the target gene under the control of the T7 RNA polymerase promoter. The vector is transformed into an E. coli strain that contains, integrated into its genome, a copy of the gene for T7 RNA polymerase (T7 gene 1) under the control of the lac promoter. Additionally, the promoters for both the target gene and T7 gene 1 also contain the lacO operator sequence and are therefore inhibited by the lac repressor (lacI). IPTG induction allows the transcription of the T7 RNA polymerase gene whose protein product subsequently activates the expression of the target gene. The presence of an additional plasmid in the E. coli cell producing T7 lysozyme inactivates any T7 RNA polymerase that may be produced in the absence of induction. After induction sufficient T7 RNA polymerase is produced to escape this regulation. Reprinted with permission of Novagen, Inc.

the control of the lacUV5 promoter (Figure 8.4). Therefore, IPTG induction will promote the synthesis of T7 RNA polymerase which will bind to the T7 promoter and drive the expression of the target gene. As we have already noted, the lac promoter will express small amounts of the gene it controls even in the absence of inducer. The addition of a lacO sequence in between the T7 promoter and the target gene in the expression vector reduces the level of target gene expression (Dubendorff and Studier, 1991). To control the leaky production of T7 RNA polymerase (thereby ensuring that target gene expression is minimized) E. coli cells can be co-transformed with an additional plasmid. As shown in Figure 8.4, the plasmid pLysS, which uses a different, but compatible, replication origin to the expression vector, will produce T7 lysozyme, which is a natural inhibitor of T7 RNA polymerase. The production

8.1 EXPRESSION IN E. coli

263

 

 

of this inhibitor will inactivate the small levels of polymerase produced in the absence of induction, but will be swamped, and thereby rendered ineffective, by the larger amounts of polymerase produced during induction.

Despite the availability of excellent promoters that will drive high levels of RNA production, many proteins cannot be produced in E. coli cells. Promoter strength is not necessarily the determining factor as to levels at which the target protein will accumulate within the cell. Some additional factors are listed below.

¨

Expression vector levels. Naıvely, one would imaging that increasing the

 

copy number of the expression vector would lead to an increase in the

 

accumulation of the protein it encodes. There are, however, documented

 

cases when a very high expression vector copy number (in comparison

 

to the levels obtained for pBR322) did not result in increased protein

 

production (Yansura and Henner, 1990) and others where increased vector

 

levels actually reduce the levels of protein production (Vasquez et al.,

 

1989). Most commercially available expression vectors today contain the

 

replication origin of either pBR322 or pUC (Chapter 3), and altering copy

 

number is not commonly used to modulate protein production, although

 

some systems are available (Wild, Hradecna and Szybalski, 2001).

Transcriptional termination. Although often overlooked in the design of expression vectors, efficient transcription termination is an essential component for achieving high levels of gene expression. Terminators enhance mRNA stability (Hayashi and Hayashi, 1985) and can lead to substantial increases in the levels of accumulated protein (Vasquez et al., 1989). The two tandem transcriptional terminators (T1 and T2) from the rrnB rRNA operon of E. coli (Brosius et al., 1981) are often present in expression vectors, but other terminators also work well.

Codon usage. The degeneracy of the genetic code means that more than one codon will result in the insertion of an individual amino acid into a growing polypeptide chain. The genes of both prokaryotes and eukaryotes show a non-random usage of alternative codons. Genes containing favourable codons will be translated more efficiently than those containing infrequently used codons. This effect is particularly prevalent in genes that are highly expressed in E. coli, where there is a high degree of codon bias. In general, the frequency of use of alternative codons reflects the abundance of their cognate tRNA molecules. For example, the minor arginine tRNAArg(AGG/AGA) has been shown to be a limiting factor in the bacterial production of several mammalian proteins (Brinkmann, Mattes and Buckel,

264

PROTEIN PRODUCTION AND PURIFICATION 8

 

 

1989) because the codons AGG and AGA are infrequently used in E. coli. The co-expression of the gene coding for tRNAArg(AGG/AGA) (dnaY) can result in high-level production of the target protein whose production is limited in this way. Systems have been established for the expression of other tRNA molecules that occur frequently in mammalian coding sequence but are used rarely in E. coli. One such system uses a bacterial strain (called RosettaTM) that expresses the tRNAs for AGG, AGA, AUA, CUA, CCC and GGA on a plasmid that is compatible with the expression vector. An alternative, although more time consuming, approach to the problem of rare codon occurrence is to mutate the gene that is to be expressed such that the codons it contains are more frequently used by other highly expressed genes in E. coli. That is, the DNA sequence of the gene is altered to allow more favourable codons to be used, but the encoded polypeptide remains unchanged. There does not appear to be a simple correlation between the presence of rare codons within a gene and the levels to which protein production can occur. A combination of consecutive rare codons within the target sequence and other factors reduces the overall efficiency of translation.

Protein sequence. The amino acid sequence of the target protein plays an important role in the ability of the protein to accumulate to high levels. First described in the laboratory of Alexander Varshavsky, the ‘N-end rule’ relates protein stability to the sequences at its amino-terminal end (Bachmair, Finley and Varshavsky, 1986; Varshavsky, 1992). In E. coli, an amino-terminal Arg, Lys, Leu, Phe, Tyr and Trp located directly after the initiating methionine results in proteins with a half-life of less than 2 min. Other amino acids at the same location in the same protein confer a half-life of over 10 h (Tobias et al., 1991). Additional amino-acid-sequence- dependent protein stability factors also exist, reviewed by Makrides (1996).

Protein degradation. E. coli is often considered as a molecular biology ‘bag’ for making DNA and proteins. Of course, the organism is highly developed and contains multiple mechanisms for removal of substances

that may be toxic to it. For example, E. coli contains a large number of proteases located in the cytoplasm and the periplasm and associated with the inner and outer membranes (Chung, 1993; Gottesman, 1996). Proteolysis serves to limit the accumulation of critical regulatory proteins, and also rids the cell of abnormal and mis-folded proteins. Target proteins expressed in E. coli may be mis-folded for a variety of reasons, including the exposure of hydrophobic residues that are normally in the core of the protein, the lack of its normal interaction partners and inappropriate or

8.2 EXPRESSION IN YEAST

265

 

 

missing post-translational modifications. Some methods used to counteract the effects of proteolysis include the use of protease deficient E. coli strains; low-temperature cell growth; expression of the target gene fused to a known stable protein; and the targeting of the produced protein to the periplasm, where fewer proteases exist (Murby, Uhlen´ and Stahl,˚ 1996).

Despite the limitations discussed above, E. coli remains widely used as the organism of choice for protein production. Some of the expression systems that can be used in the laboratory are not, however, suitable for the production of proteins on a very large scale. For example, IPTG induction of human therapeutic proteins is impractical due to the cost of inducing large cultures and the potential toxicity of IPTG itself (Figge et al., 1988).

8.2Expression in Yeast

As eukaryotes, yeasts have many of the advantages of higher-eukaryotic cells, such as post-translational modifications, while at the same time being almost as easy to manipulate as E. coli. Yeast cell growth is faster, easier and less expensive than other eukaryotic cells, and generally gives higher expression levels. Three main species of yeast are used for the production of recombinant proteins – Saccharomyces cerevisiae, Pichia pastoris and Schizosaccharomyces pombe.

8.2.1Saccharomyces cerevisiae

Baker’s yeast, S. cerevisiae, is a single-celled eukaryote that grows rapidly (a doubling time of approximately 90 min) in simple, defined media similar to those used for E. coli cell growth. Proteins produced in S. cerevisiae contain many, but not all, of the post-translation modifications found in highereukaryotic cells. For example, human α-1-antitrypsin, a 52 kDa serum protein involved in the control of coagulation and fibrinolysis, is normally glycosylated. However, if the protein is produced in S. cerevisiae, glycosylation still occurs at the same locations as the human-derived protein, but the glycosylation pattern obtained is very different (Moir and Dumais, 1987).

A number of strong constitutive promoters have been used to drive target gene expression in yeast. For example, the promoters for the genes encoding phosphoglycerate kinase (PGK), glyceraldehyde-3-phosphate dehydrogenase (GPD) and alcohol dehydrogenase (ADH1) have all been used to produce target proteins (Cereghino and Cregg, 1999). However, these suffer similar problems as constitutive E. coli expression systems. A variety of systems for

266

PROTEIN PRODUCTION AND PURIFICATION 8

 

 

the inducible production of target proteins in S. cerevisiae have been utilized. Two of these are discussed below.

8.2.1.1The GAL System

In yeast, like almost all other cells, galactose is converted to glucose-6-phosphate by the enzymes of the Leloir pathway. Each of the Leloir pathway structural genes (collectively called the GAL genes) are expressed at a high level, representing 0.5 –1 per cent of the total cellular mRNA (St John and Davis, 1981), but only when the cells are grown on galactose as the sole carbon source. Each of the GAL genes contains within its promoter at least one, and often multiple, binding sites for the transcriptional activator Gal4p. The binding of Gal4p to these sites, and its transcriptional activity when bound, is regulated by the source of carbon available to the cell. When yeast is grown on glucose, its preferred carbon source, transcription from the GAL4 promoter (regulating the production of Gal4p) is down-regulated so that there is less Gal4p in the cell, and consequently a reduced level of activator binding at the promoters of the GAL structural genes (Griggs and Johnston, 1991). In other carbon sources, such as raffinose, Gal4p is produced and binds to the GAL structure gene promoters, but a repressor, Gal80p, inhibits its activity. Gal80p binds directly to Gal4p and is thought to mask its activation domain such that it is unable to recruit the transcriptional machinery to the gene (Lue et al., 1987). Only in the presence of galactose is the inhibitory effect of Gal80p alleviated, leading to strong, inducible levels of target gene expression.

To produce a target protein in S. cerevisiae using galactose induction, the gene encoding the protein must be cloned so that it is under the control of a GAL promoter. The promoter from the GAL1 gene, encoding galactokinase, is most commonly used, but synthetic promoters containing multiple Gal4p binding sites are also available. Once constructed, the expression vector is transformed into yeast cells and protein production is initiated by switching the cells into a galactose-containing medium. Proteins produced in this way seldom accumulate to the levels of recombinant protein found in E. coli cells. It not usually possible to detect protein produced in this way using Coomassie stained gels, such as those in Figure 8.3, and maximum production may represent only 1 –5 per cent of the total cell protein. Western blotting, or other methods to detect the target protein, must be used. An additional difficulty is brought about as a consequence of the activator of the GAL genes, Gal4p, being normally present in the yeast cell at a very low level. Therefore, if the expression vector, which carries multiple Gal4p binding sites, is a high-copy-number plasmid then there may be insufficient Gal4p to activate the expression of all of the available target genes to a maximum level. To overcome this problem, yeast strains have

8.2 EXPRESSION IN YEAST

267

 

 

been constructed in which the coding sequence of the GAL4 gene has been placed under the GAL promoter control (Schultz et al., 1987; Mylin et al., 1990). This results in a feedback loop in which induction by galactose results in the production of Gal4p so that more of the target gene may be expressed (Figure 8.5).

Gal4p produced from its own promoter

PGAL4

GAL4

 

 

 

Gal4p

Target

 

 

gene

 

 

PGAL1

 

 

Expression

Expression

Expression

vector

vector

vector

Target protein

Gal4p produced from the GAL1 promoter

PGAL1 GAL4

Gal4p

Target

gene

PGAL1

 

 

Expression

Expression

Expression

vector

vector

vector

Target

 

 

protein

 

 

Figure 8.5. Galactose inducible gene expression in yeast. The expression of genes from multicopy vectors under the control of the GAL1 promoter (PGAL1) can be increased substantially if the gene encoding the transcriptional activator of GAL1, GAL4, is also placed under the control of PGAL1. In this case, induction by galactose will produce more Gal4p and consequently more of the target protein

Соседние файлы в предмете Генетика