Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf


Becker

yielded nonlinear Arrhenius plots that resemble those seen experimentally. It also pointed to the presence of kinetic intermediates that are actually misfolded “traps” and not necessary steps for folding. On the other hand, the “locally connected” model resulted in significantly different kinetics. In one regime of the parameters the overall reaction rate was determined by the rate of going through a bottleneck region (in terms of the order parameter) that corresponds to the state of highest free energy. In other regimes, close to the glass transition, the rate was limited by the search through misfolded states.

Another simple model of protein folding kinetics was suggested by Zwanzig [25]. This model assumes that the energy depends solely on the sequence and can be described as a simple function of the distance S between a given conformation and the native state. If N “parameters” (e.g., dihedral angles) characterize the native conformation, then S is the number of parameters in a given conformation that have non-native values. The energy in this model is defined as

$$E_S = SU - \varepsilon\,\delta_{S0} \tag{5}$$

where S = 0, 1, 2, …, N and both U and ε are assumed to be positive. The positive U ensures a smooth funnel, since the energy increases with increasing S, and the positive ε ensures an energy gap between S = 0 and S = 1. That is, the reaction coordinate is the similarity of a conformation to the native state. The model employs a gap in the energy spectrum, has large configuration entropy, and exhibits a free energy barrier between folded and partially folded states. The folding time in this model was estimated by means of a local thermodynamic equilibrium assumption followed by solving the master equation. It was found that the above set of rules leads to an energy landscape that has two basins, one corresponding to the native state and the other corresponding to an ensemble of partially folded states. Following a short equilibration time the overall kinetics are similar to those of fast-folding two-state systems. The folding time has a maximum near the folding transition temperature and can have a minimum at lower temperatures.
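As a rough illustration, the following Python sketch tabulates the energy of Eq. (5), taken here as E_S = SU − εδ_{S0}, together with a free energy F(S) = E_S − k_BT ln g(S). The degeneracy g(S) = C(N, S) ν^S (each non-native parameter taking one of ν wrong values) is an assumption added for illustration and is not stated in the text above; the function names and all numerical values are likewise hypothetical.

```python
import math

def zwanzig_energy(S, U, eps):
    """Energy of Eq. (5): E_S = S*U - eps*delta_{S,0}."""
    return S * U - (eps if S == 0 else 0.0)

def zwanzig_free_energy(S, N, U, eps, nu, kT):
    """F(S) = E_S - kT*ln g(S).

    ASSUMPTION: g(S) = C(N, S) * nu**S, i.e. each of the S non-native
    parameters can take nu distinct wrong values. Not given in the text.
    """
    g = math.comb(N, S) * nu ** S
    return zwanzig_energy(S, U, eps) - kT * math.log(g)

if __name__ == "__main__":
    # Illustrative values only (energies in units of kT).
    N, U, eps, nu, kT = 20, 1.0, 5.0, 3, 1.0
    for S in range(0, N + 1, 4):
        print(S, zwanzig_energy(S, U, eps),
              round(zwanzig_free_energy(S, N, U, eps, nu, kT), 3))
```

With these assumptions the energy gap between S = 0 and S = 1 is U + ε, while the entropic term lowers F(S) for intermediate S, which is one way a two-basin free energy profile can arise.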

III. LATTICE MODELS

The current understanding of the protein folding process has benefited much from studies that focus on computer simulations of simplified lattice models. These studies try to construct as simple a model as possible that will capture some of the more important properties of the real polypeptide chain. Once such a model is defined it can be explored and studied at a level of detail that is hard to achieve with more realistic (and thus more complex) atomistic models.

In a lattice model the protein is represented as a “string of beads” threaded on a lattice (often described as a “self-avoiding walk” on a lattice). Each residue is positioned on a different grid point, and specific nearest-neighbor interactions, which depend on the residues involved, are defined. Once the model is defined, the folding process is simulated by local Monte Carlo moves that change the position of the “beads” on the lattice until the chain reaches its lowest energy configuration. In many studies a simple square [20] or a cubic grid was used [26–28], although more complex lattices have also been employed [29,72]. Figure 2 illustrates a simple polypeptide chain with 27 amino acids (27-mer) folded on a 3 × 3 × 3 cubic lattice. All in all there are on the order of $10^{16}$ conformations of a 27-mer chain on an infinite cubic lattice. Due to an overall attraction between the residues (primarily of hydrophobic nature), the native state of the model protein is “collapsed” and can be fit into a 3 × 3 × 3 cube, which is fully occupied by the polypeptide chain. There are more than 100,000 ways to fit a 27-mer into this cube. The most stable conformation, which corresponds to the native state, is determined by the specific interactions dictated by the amino acid sequence. Different sequences are likely to have different native conformations, even in the simplified lattice representation.

Protein Folding: Computational Approaches

Figure 2 A low energy conformation of a 27-mer lattice model on a 3 × 3 × 3 cubic lattice. (Adapted from Ref. 11.)
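To give a feel for how quickly the number of lattice conformations grows, the sketch below counts self-avoiding walks by depth-first enumeration. It is a minimal illustration, not the method used in the cited studies: the function name `count_saws` is hypothetical, walks related by rotation or reflection are counted separately, and brute-force enumeration is practical only for short chains, far below the 26 bonds of a 27-mer.

```python
def count_saws(n_steps, dim=3):
    """Count self-avoiding walks with n_steps bonds starting at the origin
    of a dim-dimensional hypercubic lattice (dim=2: square, dim=3: cubic)."""
    # Unit moves along +/- each axis.
    moves = []
    for axis in range(dim):
        for sign in (1, -1):
            step = [0] * dim
            step[axis] = sign
            moves.append(tuple(step))

    def extend(pos, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for move in moves:
            nxt = tuple(p + d for p, d in zip(pos, move))
            if nxt not in visited:      # self-avoidance: one bead per site
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total

    origin = (0,) * dim
    return extend(origin, {origin}, n_steps)

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_saws(n, dim=3))
```

The count grows by roughly a constant factor per added bond, which is why exhaustive enumeration is usually restricted to short chains or to compact conformations confined to a small cube.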

As discussed above, folding is driven by nonbonded interactions. In lattice models this is represented by “contact energies,” i.e., interactions between residues that are situated on adjacent (or nearest-neighbor) lattice sites but are not covalently bonded to each other. For example, since there are 28 nearest-neighbor contacts in the native structure of a 27-mer in a 3 × 3 × 3 cube, each conformation of this model can be characterized in terms of how many of these native contacts are correctly formed. Indeed, in most lattice models simple contact potentials are used to represent the effective energy of a given configuration. The combination of a simple model, which enables extensive enumeration of conformations, with a simple “contact” energy function allows such model studies to determine the thermodynamics and dynamics of the system within a reasonable amount of computer time.

The “contact” energy E of a given conformation is typically calculated by summing the values of energies over all nonbonded contacts in the lattice,

$$E = \sum_{\text{neighbors}} \varepsilon(s_i, s_j)\,\Delta(r_i - r_j) \tag{6}$$

where $r_i$ and $r_j$ denote the locations of residues i and j, and $\Delta(r_i - r_j) = 0$ unless residues i and j are on adjacent vertices of the lattice, in which case it equals 1. The term $\varepsilon(s_i, s_j)$ is the nonbonded nearest-neighbor interaction between a residue of type $s_i$ and a residue of type $s_j$. These contact interactions are typically on the order of $k_BT$.
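The contact sum of Eq. (6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `contact_energy`, the representation of a conformation as a list of integer lattice coordinates, and the toy two-letter energy table in the usage example are all hypothetical choices made here.

```python
from itertools import combinations

def contact_energy(coords, seq, eps):
    """Sum eps(s_i, s_j) over nonbonded nearest-neighbor pairs, as in Eq. (6).

    coords : list of integer lattice points, one per residue, in chain order
    seq    : residue types, same length as coords
    eps    : callable returning the contact energy for two residue types
    """
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:
            continue  # residues bonded along the chain do not count as contacts
        # Manhattan distance 1 means the two sites are lattice neighbors.
        dist = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if dist == 1:
            energy += eps(seq[i], seq[j])
    return energy

if __name__ == "__main__":
    # Toy example: only A-A contacts are attractive (illustrative values).
    toy_eps = lambda a, b: -1.0 if a == b == "A" else 0.0
    u_shape = [(0, 0), (1, 0), (1, 1), (0, 1)]   # residues 0 and 3 touch
    print(contact_energy(u_shape, "ABBA", toy_eps))
```

Passing the interaction table as a callable keeps the same routine usable for two-letter models and for models with many residue types.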

Despite their simplicity, certainly compared to the all-atom potentials used in molecular dynamics studies, these contact energy functions enable the exploration of different interaction scenarios. This diversity is achieved by changing the heterogeneity of the sequence, that is, by altering the number N of different types of “residues” that are used. The most elementary lattice model involves only two types of monomers: hydrophobic monomers (H) and polar monomers (P). Such a model is often referred to as an HP model. In HP models, only nearest-neighbor contacts of the type HH have a stabilizing contribution ε < 0 to the overall energy, whereas all other contact types, whether HP or PP, are considered neutral, contributing zero energy [18,30]. It was found that most HP model sequences have rugged energy landscapes with many kinetic traps [20]. In this case, folding kinetics involve at least two stages: a fast collapse to compact non-native conformations followed by a slow barrier-climbing process to escape traps and reattempt to fold [18,30,31].

In this respect, the HP model is unlike many real proteins that appear to have smoother landscapes with few traps, resulting in fast folding and two-state kinetics [11,21]. One way to make the model more proteinlike is to increase its heterogeneity. Another way is to introduce a specific bias toward the native state, resulting in a variant model denoted as the HP+ model [20]. For an HP sequence with a unique native structure, the HP+ energy given by Eq. (6) is defined by a negative ε value for each native HH contact, by ε = 0 for each native HP or PP contact, and by −ε for all non-native contacts (−ε > 0). As a result, an HP sequence and its corresponding HP+ sequence share the same unique native structure, with the only difference being that in the HP+ energy function non-native contacts have unfavorable energies. This extra interaction in the HP+ model is intended to capture, in a very simple way and without additional parameters, more energetic specificity than the original HP model. The HP+ model is similar in principle to the “Go model,” which adds an explicit biasing potential to the native structure, ensuring that this structure becomes the global minimum of the whole energy landscape [32,33].
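The difference between the plain HP energy and its native-biased variant (written HP+ here) can be made concrete with a small sketch. It assumes that each non-native contact is penalized by |ε|, which is how the description above is read here; the function names and the 5-residue toy chain are hypothetical.

```python
def lattice_contacts(coords):
    """Set of nonbonded nearest-neighbor pairs (i, j) with i < j."""
    pairs = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + 2, n):          # j = i + 1 is a chain bond
            if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
                pairs.add((i, j))
    return pairs

def hp_energy(coords, seq, eps=-1.0):
    """Plain HP model: every HH contact contributes eps < 0."""
    return sum(eps for i, j in lattice_contacts(coords)
               if seq[i] == seq[j] == "H")

def hp_plus_energy(coords, seq, native_contacts, eps=-1.0):
    """Native-biased variant: native HH contacts contribute eps,
    native HP/PP contacts contribute 0, and every non-native contact
    is penalized by -eps > 0 (ASSUMED magnitude |eps|)."""
    energy = 0.0
    for i, j in lattice_contacts(coords):
        if (i, j) in native_contacts:
            if seq[i] == seq[j] == "H":
                energy += eps
        else:
            energy += -eps
    return energy

if __name__ == "__main__":
    seq = "HPPHH"
    native = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 2)]      # contact (0, 3)
    misfold = [(0, 0), (0, 1), (1, 1), (1, 2), (0, 2)]     # contact (1, 4)
    native_set = lattice_contacts(native)
    for conf in (native, misfold):
        print(hp_energy(conf, seq), hp_plus_energy(conf, seq, native_set))
```

In this toy case the misfolded conformation is neutral in the HP model but is pushed up in energy by the variant, while the native conformation has the same energy in both.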

Agreement with the real protein folding process can be obtained by increasing the heterogeneity of the lattice model, using multiple-letter codes and sequence design [26,27,34–37]. A model with 20 different residue types (N = 20) is expected to have heterogeneity similar to that of a real protein. In such models the energy is taken from a range of interaction energies, ensuring an overall net attraction. For example, contact energies between adjacent residues may be chosen to have an average of $-2k_BT$ with an effective deviation of about $k_BT$, ensuring that the native contacts are among the most stable nonbonded interactions, with an average energy of about $-3k_BT$ [26]. In other studies the interactions were selected randomly from a continuous range of interactions with special terms to prevent the chain from crossing over itself [34]. Overall, these more complex models show kinetic pathways that converge into folding funnels, guiding the folding to a unique stable native conformation.

A convenient property of all lattice models is the ability to use the “fraction of native contacts” Q as a reaction coordinate or progress variable to describe the folding process. The variable is the ratio between the number of “native contacts” that are observed in any given conformation of the chain and the maximum number of possible native contacts. Thus, Q varies from a value near zero for the highly denatured conformation to unity for the native state. For the 27-mer in a 3 × 3 × 3 cube described above, there are 156 different possible contacts and 28 native contacts. For a 125-mer there are 3782 possible contacts and 176 native contacts in a 5 × 5 × 5 cube [11]. Although there are many more “native contacts” in a real protein, it is expected that even there a smaller subset of contacts can be used to define the native conformation in a way similar to the Q variable in lattice models. The progress variable Q has been very useful for visualizing the average effective energy and the configuration entropy of the polypeptide chain as it folds from the denatured to the native state. The resulting values, which are averaged over many folding simulations, depend as expected on the temperature at which the simulation is performed.
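The contact counts quoted above can be checked with a short calculation. On a cubic lattice two neighboring sites always differ in parity, so residues i and j can touch only if j − i is odd, and j − i = 1 is a chain bond; the possible contacts are therefore the pairs with j − i odd and at least 3. In a fully occupied cube, the native contacts are the lattice edges that are not chain bonds. The sketch below encodes this reasoning together with Q itself; the function names are hypothetical.

```python
def possible_contacts(n):
    """Number of residue pairs (i, j) that can ever be lattice neighbors
    on a cubic lattice: j - i odd and >= 3."""
    return sum(1 for i in range(n) for j in range(i + 3, n, 2))

def cube_native_contacts(side):
    """Nonbonded nearest-neighbor pairs in a fully occupied side^3 cube:
    all lattice edges inside the cube minus the bonds of the chain."""
    lattice_edges = 3 * side * side * (side - 1)
    chain_bonds = side ** 3 - 1
    return lattice_edges - chain_bonds

def fraction_native(contacts, native_contacts):
    """Q: fraction of the native contacts present in a conformation.
    Both arguments are sets of (i, j) pairs."""
    return len(contacts & native_contacts) / len(native_contacts)

if __name__ == "__main__":
    print(possible_contacts(27), cube_native_contacts(3))
    print(possible_contacts(125), cube_native_contacts(5))
```

Running the example reproduces the pairs of numbers given in the text for the 27-mer and the 125-mer.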

Like real proteins, lattice models have a narrow optimal temperature range in which the folding process is most efficient. At temperatures that are too low, folding may be extremely slow because the chain cannot escape from local minima. At very high temperatures the native state is not stable, and the number of accessible conformations is so large that the folding problem cannot be solved. Indeed, analysis of a low temperature average effective energy/entropy surface calculated for the 27-mer model on a cubic lattice showed that the conformation space accessible to the protein is limited, even at low Q (unfolded conformations) [11]. At such temperatures the polypeptide chain collapses to a misfolded globular state with a Q value near that of the random coil. The change in configuration entropy on collapse is small enough that its destabilizing contribution to the free energy is compensated for by the burial of hydrophobic groups, even in the absence of native contacts. At this temperature the average effective energy surface as a function of Q is “rough” due to the presence of energy barriers to reorganization within the collapsed state. The transition region at these temperatures was found to be close to the native state (Q ≈ 0.7–0.9).

At high folding temperatures, on the other hand, the average effective energy/entropy surface resulting from lattice simulations indicates a different scenario [11]. Early in folding (e.g., for Q < 0.2), the surface is very broad, indicating that most of the unfolded configurations are accessible. As the entropy decreases with the increase of Q to unity for the native structure, the surface becomes narrower, resulting in an overall funnel structure for the average effective energy surface. Thus, regardless of the initial conformation, the molecule moves downward in energy toward the native state as the number of stabilizing contacts increases. Despite the smoothness of the effective energy surface, a transition state barrier in the free energy profile can exist even for the 27-mer at relatively high temperatures. The free energy transition barrier corresponds to an entropy “bottleneck” that arises from a reduction of the chain entropy at large Q values (the number of accessible configurations decreases rapidly as Q approaches the native state). In general, it is the balance between the rate of decrease of the energy and that of the entropy that determines whether there is a free energy barrier and where it occurs. A different balance between the two contributions to the free energy could move the transition barrier in the free energy to smaller or larger Q values.

To conclude, although the models used in lattice simulations are very simplified, the results provide general information on possible protein folding scenarios, albeit not on the detailed behavior of specific proteins, which would require more complex models and more accurate potentials. The contribution made by these simulations is that they enable an analysis of the structures, energetics, and dynamics of folding reactions at a level of detail not accessible to experiment.

IV. OFF-LATTICE MINIMALIST MODELS

Despite their contribution to the understanding of protein folding, the correspondence between lattice models and real proteins is still very limited. The first step toward making such models more realistic is to remove the lattice and study off-lattice minimalist models. Simple off-lattice models of proteins can have proteinlike shapes with well-defined secondary structure elements, as in real proteins. In addition, the continuum character of the conformation space allows for the native state to become a basin rather than a single minimum.

An off-lattice minimalist model that has been extensively studied is the 46-mer β-barrel model, which has a native state characterized by a four-stranded β-barrel. The first to introduce this model were Honeycutt and Thirumalai [38], who used a three-letter code to describe the residues. In this model monomers are labeled hydrophobic (H), hydrophilic (P), or neutral (N), and the sequence that was studied is $\mathrm{(H)_9(N)_3(PH)_4(N)_3(H)_9(N)_3(PH)_5P}$. That is, two strands are hydrophobic (residues 1–9 and 24–32) and the other two strands contain alternating H and P beads (residues 13–20 and 36–46). The four strands are connected by neutral three-residue bends. Figure 3 depicts the global minimum conformation of the 46-mer β-barrel model. This β-barrel model was studied by several researchers [38–41], and additional off-lattice minimalist models of α-helical [42] and β-sheet proteins [43] were also investigated.

The energy function of the off-lattice three-letter model is much more elaborate than those used in lattice models [Eq. (6)]. Similar to all-atom energy functions, it includes both bonded and nonbonded energy terms. Bond, bond angle, and dihedral angle energy terms give the model flexibility along the bonded structure, while a nonbonded van der Waals interaction term is used to mimic the hydrophobic/hydrophilic character of the different monomer types.

$$E = \{\text{bonds}\} + \{\text{angles}\} + \{\text{dihedral}\} + \sum_{i<j-3} 4\varepsilon S_1\left[\left(\frac{\sigma}{R}\right)^{12} - S_2\left(\frac{\sigma}{R}\right)^{6}\right] \tag{7}$$

where the bonded energy terms are similar to those used in all-atom models (see Chapter 2), and the parameters $S_1$ and $S_2$ in the van der Waals term distinguish between the different types of beads. There are attractive interactions between all HH residue pairs ($S_1 = 1$ and $S_2 = 1$), repulsive interactions between all PP and PH pairs ($S_1 = 2/3$ and $S_2 = -1$), and only excluded volume interactions between the pairs PN, HN, and NN ($S_1 = 1$ and $S_2 = 0$).
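The type-dependent nonbonded term can be sketched as follows, assuming it has the form 4εS1[(σ/R)^12 − S2(σ/R)^6] with S2 = −1 for the repulsive PP and PH pairs (the sign is inferred here from the statement that these pairs repel). The function names and the reduced units ε = σ = 1 are illustrative choices, and the bonded terms are not included.

```python
def pair_coefficients(a, b):
    """Return (S1, S2) for bead types a, b in {'H', 'P', 'N'}."""
    kinds = {a, b}
    if kinds == {"H"}:
        return 1.0, 1.0          # HH: attractive Lennard-Jones form
    if "N" in kinds:
        return 1.0, 0.0          # PN, HN, NN: excluded volume only
    return 2.0 / 3.0, -1.0       # PP, PH: purely repulsive (ASSUMED sign of S2)

def nonbonded_pair_energy(a, b, r, eps=1.0, sigma=1.0):
    """Nonbonded energy of one bead pair at separation r."""
    s1, s2 = pair_coefficients(a, b)
    x6 = (sigma / r) ** 6
    return 4.0 * eps * s1 * (x6 * x6 - s2 * x6)

if __name__ == "__main__":
    r_min = 2.0 ** (1.0 / 6.0)   # minimum of the HH well in units of sigma
    for pair in (("H", "H"), ("P", "H"), ("N", "H")):
        print(pair, round(nonbonded_pair_energy(*pair, r_min), 4))
```

At the separation where the HH pair sits at the bottom of its well, the PH pair is still repulsive and the pair involving a neutral bead feels only the short-range repulsive term.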

Studies of this model showed that the underlying energy landscape is very rough, probably due to the long-range and nonspecific character of the interactions. To overcome the roughness and smooth the surface, a “Go model”-like variant of the three-letter model was introduced [15]. In this variant the only attractive interactions are those between monomers that form native contacts, i.e., contacts found in the native β-barrel. An analysis of the native β-barrel structure yielded 47 pairs of monomers within a distance of 1.167σ, most of them between hydrophobic monomers. All other pairs have only the repulsive van der Waals term, which accounts for excluded volume. It was shown that this modification removes the roughness that is introduced by the non-native contacts, allowing the sequence to recover a nearly optimal folding behavior.

Recently a different modification of the classic 46-mer β-barrel model was suggested. In this case a single side group, represented by a bead that may be hydrophilic or hydrophobic, was added to the model [44]. Molecular dynamics and quenching simulations showed that the nature and the location of the single side group bead influence both the structure at the global minimum of internal energy and the relaxation process by which the system finds its minima. The most drastic effects occur with a hydrophobic side group in the middle of a sequence of hydrophobic residues.


Figure 3 The minimum energy conformation of the off-lattice 46-mer β-barrel model. Hydrophobic residues are in gray, hydrophilic residues in black, and neutral residues are white. (Adapted from Ref. 44.)


V. ATOMISTIC MODELS

The highest level of detail in theoretical studies of protein folding involves the use of detailed atomic models of the protein and the environment. Such models have been discussed in depth in previous chapters of this book. The main limitation of atomic models is that they are computationally much more demanding, a fact that restricts the number of calculations that can be performed with them. In terms of using atomic models for protein folding it is possible to identify two main approaches. The first approach is to study the folding process by performing explicit molecular dynamics simulations of protein unfolding and folding. The other approach is to use conformation sampling techniques to characterize the underlying energy and free energy landscapes.

A. Unfolding/Folding Simulations

The main problem facing the attempt to study room temperature folding by direct molecular dynamics simulations of an all-atom model is that of time scales. Whereas protein folding takes place on the millisecond time scale and up, the time scale accessible to molecular dynamics is on the order of nanoseconds. Recently, using a massively parallel computer, Duan and Kollman [45] performed a 1 µs simulation of the villin headpiece subdomain, a 36-residue peptide, in water. Starting from a fully unfolded extended state, including approximately 3000 water molecules, the simulation was able to follow the dynamics of this protein as it adopted a partially folded conformation. Such long-time-scale molecular dynamics (MD) simulations require exceptionally large computational resources. Furthermore, the usefulness of these simulations is limited by the fact that they cannot provide the level of statistics required for studying folding kinetics and thermodynamics. Another problem associated with a direct MD approach to the folding process is that it is unclear how well the MD potential energy functions perform in the unfolded regime.

Thus, instead of using molecular dynamics to simulate the folding process, many researchers turned their attention to using MD simulations as a tool for studying the inverse process of protein unfolding from the native state. It is hoped, though not proven, that analysis of the unfolding process will contribute to the understanding of the folding process. To speed up the unfolding reaction, which has a significant activation barrier, these studies are typically performed in the high temperature range of 400–600 K. A simple Arrhenius-type calculation shows that the unfolding reaction for a protein that denatures experimentally at 325 K and has an activation barrier for unfolding of 20 kcal/mol is about six orders of magnitude faster at 600 K than at 325 K. Even if the Arrhenius equation is not exact for unfolding reactions, this argument indicates that elevating the temperature reduces the time for unfolding from the experimentally observed millisecond range to the nanosecond time scale, which is accessible to molecular dynamics simulations.
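The Arrhenius estimate quoted above can be reproduced with a few lines of Python. The sketch assumes a temperature-independent barrier and prefactor, so that the rate ratio is exp[(Ea/R)(1/T_low − 1/T_high)]; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def arrhenius_speedup(ea_kcal, t_low, t_high):
    """Ratio k(t_high) / k(t_low) for a fixed barrier ea_kcal (kcal/mol),
    assuming the same prefactor at both temperatures (in kelvin)."""
    return math.exp(ea_kcal / R_KCAL * (1.0 / t_low - 1.0 / t_high))

if __name__ == "__main__":
    ratio = arrhenius_speedup(20.0, 325.0, 600.0)
    print(f"speedup = {ratio:.3g}, log10 = {math.log10(ratio):.2f}")
```

For a 20 kcal/mol barrier the ratio between 600 K and 325 K comes out on the order of 10^6, consistent with the six orders of magnitude stated in the text.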

The details of many all-atom unfolding simulation studies have been summarized in several reviews [17,46,47]. These studies include unfolding simulations of α-lactalbumin, lysozyme, bovine pancreatic trypsin inhibitor (BPTI), barnase, apomyoglobin, β-lactamase, and more. The advantage of these simulations is that they provide much more detailed information than is available from experiment. However, it should be stressed that there is still only limited evidence that the pathways and intermediates observed in the nanosecond unfolding simulations correlate with the intermediates observed in the actual experiments.