- •Foreword
- •Preface
- •Contents
- •Introduction
- •Oren M. Becker
- •Alexander D. MacKerell, Jr.
- •Masakatsu Watanabe*
- •III. SCOPE OF THE BOOK
- •IV. TOWARD A NEW ERA
- •REFERENCES
- •Atomistic Models and Force Fields
- •Alexander D. MacKerell, Jr.
- •II. POTENTIAL ENERGY FUNCTIONS
- •D. Alternatives to the Potential Energy Function
- •III. EMPIRICAL FORCE FIELDS
- •A. From Potential Energy Functions to Force Fields
- •B. Overview of Available Force Fields
- •C. Free Energy Force Fields
- •D. Applicability of Force Fields
- •IV. DEVELOPMENT OF EMPIRICAL FORCE FIELDS
- •B. Optimization Procedures Used in Empirical Force Fields
- •D. Use of Quantum Mechanical Results as Target Data
- •VI. CONCLUSION
- •REFERENCES
- •Dynamics Methods
- •Oren M. Becker
- •Masakatsu Watanabe*
- •II. TYPES OF MOTIONS
- •IV. NEWTONIAN MOLECULAR DYNAMICS
- •A. Newton’s Equation of Motion
- •C. Molecular Dynamics: Computational Algorithms
- •A. Assigning Initial Values
- •B. Selecting the Integration Time Step
- •C. Stability of Integration
- •VI. ANALYSIS OF DYNAMIC TRAJECTORIES
- •B. Averages and Fluctuations
- •C. Correlation Functions
- •D. Potential of Mean Force
- •VII. OTHER MD SIMULATION APPROACHES
- •A. Stochastic Dynamics
- •B. Brownian Dynamics
- •VIII. ADVANCED SIMULATION TECHNIQUES
- •A. Constrained Dynamics
- •C. Other Approaches and Future Direction
- •REFERENCES
- •Conformational Analysis
- •Oren M. Becker
- •II. CONFORMATION SAMPLING
- •A. High Temperature Molecular Dynamics
- •B. Monte Carlo Simulations
- •C. Genetic Algorithms
- •D. Other Search Methods
- •III. CONFORMATION OPTIMIZATION
- •A. Minimization
- •B. Simulated Annealing
- •IV. CONFORMATIONAL ANALYSIS
- •A. Similarity Measures
- •B. Cluster Analysis
- •C. Principal Component Analysis
- •REFERENCES
- •Thomas A. Darden
- •II. CONTINUUM BOUNDARY CONDITIONS
- •III. FINITE BOUNDARY CONDITIONS
- •IV. PERIODIC BOUNDARY CONDITIONS
- •REFERENCES
- •Internal Coordinate Simulation Method
- •Alexey K. Mazur
- •II. INTERNAL AND CARTESIAN COORDINATES
- •III. PRINCIPLES OF MODELING WITH INTERNAL COORDINATES
- •B. Energy Gradients
- •IV. INTERNAL COORDINATE MOLECULAR DYNAMICS
- •A. Main Problems and Historical Perspective
- •B. Dynamics of Molecular Trees
- •C. Simulation of Flexible Rings
- •A. Time Step Limitations
- •B. Standard Geometry Versus Unconstrained Simulations
- •VI. CONCLUDING REMARKS
- •REFERENCES
- •Implicit Solvent Models
- •II. BASIC FORMULATION OF IMPLICIT SOLVENT
- •A. The Potential of Mean Force
- •III. DECOMPOSITION OF THE FREE ENERGY
- •A. Nonpolar Free Energy Contribution
- •B. Electrostatic Free Energy Contribution
- •IV. CLASSICAL CONTINUUM ELECTROSTATICS
- •A. The Poisson Equation for Macroscopic Media
- •B. Electrostatic Forces and Analytic Gradients
- •C. Treatment of Ionic Strength
- •A. Statistical Mechanical Integral Equations
- •VI. SUMMARY
- •REFERENCES
- •Steven Hayward
- •II. NORMAL MODE ANALYSIS IN CARTESIAN COORDINATE SPACE
- •B. Normal Mode Analysis in Dihedral Angle Space
- •C. Approximate Methods
- •IV. NORMAL MODE REFINEMENT
- •C. Validity of the Concept of a Normal Mode Important Subspace
- •A. The Solvent Effect
- •B. Anharmonicity and Normal Mode Analysis
- •VI. CONCLUSIONS
- •ACKNOWLEDGMENT
- •REFERENCES
- •Free Energy Calculations
- •Thomas Simonson
- •II. GENERAL BACKGROUND
- •A. Thermodynamic Cycles for Solvation and Binding
- •B. Thermodynamic Perturbation Theory
- •D. Other Thermodynamic Functions
- •E. Free Energy Component Analysis
- •III. STANDARD BINDING FREE ENERGIES
- •IV. CONFORMATIONAL FREE ENERGIES
- •A. Conformational Restraints or Umbrella Sampling
- •B. Weighted Histogram Analysis Method
- •C. Conformational Constraints
- •A. Dielectric Reaction Field Approaches
- •B. Lattice Summation Methods
- •VI. IMPROVING SAMPLING
- •A. Multisubstate Approaches
- •B. Umbrella Sampling
- •C. Moving Along
- •VII. PERSPECTIVES
- •REFERENCES
- •John E. Straub
- •B. Phenomenological Rate Equations
- •II. TRANSITION STATE THEORY
- •A. Building the TST Rate Constant
- •B. Some Details
- •C. Computing the TST Rate Constant
- •III. CORRECTIONS TO TRANSITION STATE THEORY
- •A. Computing Using the Reactive Flux Method
- •B. How Dynamic Recrossings Lower the Rate Constant
- •IV. FINDING GOOD REACTION COORDINATES
- •A. Variational Methods for Computing Reaction Paths
- •B. Choice of a Differential Cost Function
- •C. Diffusional Paths
- •VI. HOW TO CONSTRUCT A REACTION PATH
- •A. The Use of Constraints and Restraints
- •B. Variationally Optimizing the Cost Function
- •VII. FOCAL METHODS FOR REFINING TRANSITION STATES
- •VIII. HEURISTIC METHODS
- •IX. SUMMARY
- •ACKNOWLEDGMENT
- •REFERENCES
- •Paul D. Lyne
- •Owen A. Walsh
- •II. BACKGROUND
- •III. APPLICATIONS
- •A. Triosephosphate Isomerase
- •B. Bovine Protein Tyrosine Phosphate
- •C. Citrate Synthase
- •IV. CONCLUSIONS
- •ACKNOWLEDGMENT
- •REFERENCES
- •Jeremy C. Smith
- •III. SCATTERING BY CRYSTALS
- •IV. NEUTRON SCATTERING
- •A. Coherent Inelastic Neutron Scattering
- •B. Incoherent Neutron Scattering
- •REFERENCES
- •Michael Nilges
- •II. EXPERIMENTAL DATA
- •A. Deriving Conformational Restraints from NMR Data
- •B. Distance Restraints
- •C. The Hybrid Energy Approach
- •III. MINIMIZATION PROCEDURES
- •A. Metric Matrix Distance Geometry
- •B. Molecular Dynamics Simulated Annealing
- •C. Folding Random Structures by Simulated Annealing
- •IV. AUTOMATED INTERPRETATION OF NOE SPECTRA
- •B. Automated Assignment of Ambiguities in the NOE Data
- •C. Iterative Explicit NOE Assignment
- •D. Symmetrical Oligomers
- •VI. INFLUENCE OF INTERNAL DYNAMICS ON THE
- •EXPERIMENTAL DATA
- •VII. STRUCTURE QUALITY AND ENERGY PARAMETERS
- •VIII. RECENT APPLICATIONS
- •REFERENCES
- •II. STEPS IN COMPARATIVE MODELING
- •C. Model Building
- •D. Loop Modeling
- •E. Side Chain Modeling
- •III. AB INITIO PROTEIN STRUCTURE MODELING METHODS
- •IV. ERRORS IN COMPARATIVE MODELS
- •VI. APPLICATIONS OF COMPARATIVE MODELING
- •VII. COMPARATIVE MODELING IN STRUCTURAL GENOMICS
- •VIII. CONCLUSION
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Roland L. Dunbrack, Jr.
- •II. BAYESIAN STATISTICS
- •A. Bayesian Probability Theory
- •B. Bayesian Parameter Estimation
- •C. Frequentist Probability Theory
- •D. Bayesian Methods Are Superior to Frequentist Methods
- •F. Simulation via Markov Chain Monte Carlo Methods
- •III. APPLICATIONS IN MOLECULAR BIOLOGY
- •B. Bayesian Sequence Alignment
- •IV. APPLICATIONS IN STRUCTURAL BIOLOGY
- •A. Secondary Structure and Surface Accessibility
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Computer Aided Drug Design
- •Alexander Tropsha and Weifan Zheng
- •IV. SUMMARY AND CONCLUSIONS
- •REFERENCES
- •Oren M. Becker
- •II. SIMPLE MODELS
- •III. LATTICE MODELS
- •B. Mapping Atomistic Energy Landscapes
- •C. Mapping Atomistic Free Energy Landscapes
- •VI. SUMMARY
- •REFERENCES
- •Toshiko Ichiye
- •II. ELECTRON TRANSFER PROPERTIES
- •B. Potential Energy Parameters
- •IV. REDOX POTENTIALS
- •A. Calculation of the Energy Change of the Redox Site
- •B. Calculation of the Energy Changes of the Protein
- •B. Calculation of Differences in the Energy Change of the Protein
- •VI. ELECTRON TRANSFER RATES
- •A. Theory
- •B. Application
- •REFERENCES
- •Fumio Hirata and Hirofumi Sato
- •Shigeki Kato
- •A. Continuum Model
- •B. Simulations
- •C. Reference Interaction Site Model
- •A. Molecular Polarization in Neat Water*
- •B. Autoionization of Water*
- •C. Solvatochromism*
- •F. Tautomerization in Formamide*
- •IV. SUMMARY AND PROSPECTS
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Nucleic Acid Simulations
- •Alexander D. MacKerell, Jr.
- •Lennart Nilsson
- •D. DNA Phase Transitions
- •III. METHODOLOGICAL CONSIDERATIONS
- •A. Atomistic Models
- •B. Alternative Models
- •IV. PRACTICAL CONSIDERATIONS
- •A. Starting Structures
- •C. Production MD Simulation
- •D. Convergence of MD Simulations
- •WEB SITES OF INTEREST
- •REFERENCES
- •Membrane Simulations
- •Douglas J. Tobias
- •II. MOLECULAR DYNAMICS SIMULATIONS OF MEMBRANES
- •B. Force Fields
- •C. Ensembles
- •D. Time Scales
- •III. LIPID BILAYER STRUCTURE
- •A. Overall Bilayer Structure
- •C. Solvation of the Lipid Polar Groups
- •IV. MOLECULAR DYNAMICS IN MEMBRANES
- •A. Overview of Dynamic Processes in Membranes
- •B. Qualitative Picture on the 100 ps Time Scale
- •C. Incoherent Neutron Scattering Measurements of Lipid Dynamics
- •F. Hydrocarbon Chain Dynamics
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Appendix: Useful Internet Resources
- •B. Molecular Modeling and Simulation Packages
- •Index
376 |
Becker |
yielded nonlinear Arrhenius plots that resemble those seen experimentally. It also pointed to the presence of kinetic intermediates that are actually misfolded ‘‘traps’’ and not necessary steps for folding. On the other hand, the ‘‘locally connected’’ model resulted in significantly different kinetics. In one regime of the parameters the overall reaction rate was determined by the rate of going through a bottleneck region (in terms of the order parameter) that corresponds to the state of highest free energy. In other regimes, close to the glass transition, the rate was limited by search through misfolded states.
Another simple model of protein folding kinetics was suggested by Zwanzig [25]. This model assumes that the energy depends solely on the sequence and can be described as a simple function of the distance S between a given conformation and the native state. If N ‘‘parameters’’ (e.g., dihedral angles) characterize the native conformation, then S is the number of parameters in a given conformation that have non-native values. The energy in this model is defined as
$$E_S = SU - \varepsilon\,\delta_{S0} \tag{5}$$
where S = 0, 1, 2, ..., N and both U and ε are assumed to be positive. The positive U ensures a smooth funnel as the energy increases with increasing S, and the positive ε ensures an energy gap between S = 0 and S = 1. That is, the reaction coordinate is the similarity of a conformation to the native state. The model employs a gap in the energy spectrum, has large configuration entropy, and exhibits a free energy barrier between folded and partially folded states. The folding time in this model was estimated by means of a local thermodynamic equilibrium assumption followed by solving the master equation. It was found that the above set of rules leads to an energy landscape that has two basins, one corresponding to the native state and the other corresponding to an ensemble of partially folded states. Following a short equilibration time the overall kinetics are similar to those of fast-folding two-state systems. The folding time has a maximum near the folding transition temperature and can have a minimum at lower temperatures.
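The two-basin picture can be made concrete with a small numerical sketch. The energy function below follows Eq. (5) as described in the text; the degeneracy g(S) = C(N, S) ν^S, where ν is the number of non-native values available to each parameter, is an assumption introduced here for illustration and is not stated in the passage above. The function names are hypothetical.

```python
import math


def zwanzig_energy(S, U, eps):
    """Energy of a state with S non-native parameters: S*U, lowered by eps only at S = 0."""
    return S * U - (eps if S == 0 else 0.0)


def free_energy_profile(N, U, eps, kT, nu):
    """F(S) = E_S - kT * ln g(S).

    ASSUMPTION (not from the text): g(S) = C(N, S) * nu**S, i.e. S of the N
    parameters are non-native and each has nu non-native values to choose from.
    """
    profile = []
    for S in range(N + 1):
        log_g = math.log(math.comb(N, S)) + S * math.log(nu)
        profile.append(zwanzig_energy(S, U, eps) - kT * log_g)
    return profile


# Illustrative parameters only (energies in units of kT).
profile = free_energy_profile(N=100, U=1.0, eps=5.0, kT=1.0, nu=2)
```

With these illustrative parameters the profile has one minimum at S = 0 (the native state) and a second, broad minimum at larger S, separated by a local maximum at S = 1. Other parameter choices can remove the barrier entirely, so the sketch should be read as a way to explore the model rather than as a reproduction of the published results.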
III. LATTICE MODELS
The current understanding of the protein folding process has benefited much from studies that focus on computer simulations of simplified lattice models. These studies try to construct as simple a model as possible that will capture some of the more important properties of the real polypeptide chain. Once such a model is defined it can be explored and studied at a level of detail that is hard to achieve with more realistic (and thus more complex) atomistic models.
In a lattice model the protein is represented as a ‘‘string of beads’’ threaded on a lattice (often denoted as a ‘‘self-avoiding walk’’ on a lattice). Each residue is positioned on a different grid point, and specific nearest-neighbor interactions, which depend on the residues involved, are defined. Once the model is defined the folding process is simulated by local Monte Carlo moves that change the position of the ‘‘beads’’ on the lattice until the chain reaches its lowest energy configuration. In many studies a simple square [20] or a cubic grid was used [26–28], although more complex lattices have also been employed [29,72]. Figure 2 illustrates a simple polypeptide chain with 27 amino acids (27-mer) folded on a 3 × 3 × 3 cubic lattice. All in all there are on the order of $10^{16}$ conformations of a 27-mer chain on an infinite cubic lattice. Due to an overall attraction between the residues (primarily of hydrophobic nature), the native state of the model protein is ‘‘collapsed’’ and can be fit into a 3 × 3 × 3 cube, which is fully occupied by the polypeptide chain. There are more than 100,000 ways to fit a 27-mer into this cube. The most stable conformation, which corresponds to the native state, is determined by the specific interactions dictated by the amino acid sequence. Different sequences are likely to have different native conformations, even in the simplified lattice representation.
Protein Folding: Computational Approaches |
377 |
Figure 2 A low energy conformation of a 27-mer lattice model on a 3 × 3 × 3 cubic lattice. (Adapted from Ref. 11.)
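To give a feel for how quickly the conformation space of a lattice chain grows, the following sketch counts self-avoiding walks by exhaustive depth-first enumeration. It is a minimal illustration, not the procedure used in the cited studies; the function name `count_walks` is hypothetical. The count includes walks related by lattice symmetry, so it is not directly comparable to the figures quoted in the text.

```python
def count_walks(steps, dim=3):
    """Count self-avoiding walks with `steps` bonds that start at the origin
    of a simple (hyper)cubic lattice of dimension `dim`."""
    # Unit moves along each axis, in both directions.
    moves = []
    for axis in range(dim):
        for sign in (1, -1):
            move = [0] * dim
            move[axis] = sign
            moves.append(tuple(move))

    def extend(head, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for move in moves:
            site = tuple(h + m for h, m in zip(head, move))
            if site not in visited:          # self-avoidance: one bead per site
                visited.add(site)
                total += extend(site, visited, remaining - 1)
                visited.remove(site)         # backtrack
        return total

    origin = (0,) * dim
    return extend(origin, {origin}, steps)


counts = [count_walks(n) for n in range(1, 6)]
```

The number of walks grows by a factor of nearly five per added bond on the cubic lattice, which is why exhaustive enumeration of this kind is practical only for short chains, and why a 27-mer (26 bonds) is usually sampled by Monte Carlo moves or restricted to compact conformations.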
As discussed above, folding is driven by nonbonded interactions. In lattice models this is represented by ‘‘contact energies,’’ i.e., interactions between residues that are situated on adjacent (or nearest-neighbor) lattice sites but are not covalently bonded to each other. For example, since there are 28 nearest-neighbor contacts in the native structure of a 27-mer in a 3 × 3 × 3 cube, each conformation of this model can be characterized in terms of how many of these native contacts are correctly formed. Indeed, in most lattice models simple contact potentials are thus used to represent the effective energy of a given configuration. The combination of a simple model, which enables extensive enumeration of conformations, together with a simple ‘‘contact’’ energy function allows such model studies to determine the thermodynamics and dynamics of the system within a reasonable amount of computer time.
The ‘‘contact’’ energy E of a given conformation is typically calculated by summing the values of energies over all nonbonded contacts in the lattice,
$$E = \sum_{\text{neighbors}} \varepsilon(s_i, s_j)\,\Delta(r_i - r_j) \tag{6}$$
where $r_i$ and $r_j$ denote the locations of residues i and j, and $\Delta(r_i - r_j) = 0$ unless residues i and j are on adjacent vertices of the lattice, in which case it equals 1. The term $\varepsilon(s_i, s_j)$ indicates the nonbonded neighboring interaction between a residue of type $s_i$ and a residue of type $s_j$. These contact interactions are typically on the order of $k_BT$.
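The contact sum of Eq. (6) can be sketched in a few lines. The code below assumes integer lattice coordinates and a symmetric interaction table keyed by unordered pairs of residue types; the two-letter alphabet (H, P) and the energy values are illustrative choices made here, and the function name is hypothetical.

```python
from itertools import combinations


def contact_energy(coords, sequence, eps):
    """Sum eps(s_i, s_j) over all nonbonded nearest-neighbor pairs of residues."""
    energy = 0.0
    for i, j in combinations(range(len(coords)), 2):
        if j - i < 2:
            continue  # consecutive residues are covalently bonded, not a contact
        # Manhattan distance 1 means adjacent lattice vertices (Delta = 1).
        if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
            energy += eps[frozenset((sequence[i], sequence[j]))]
    return energy


# Illustrative interaction table in units of kT: only H-H contacts are stabilizing.
eps = {frozenset("H"): -1.0, frozenset("HP"): 0.0, frozenset("P"): 0.0}

# A four-residue chain folded into a unit square: residues 0 and 3 are in contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
e_hpph = contact_energy(square, "HPPH", eps)
e_hppp = contact_energy(square, "HPPP", eps)
```

Keying the table by `frozenset` makes the interaction symmetric by construction, so ε(H, P) and ε(P, H) cannot drift apart. The same function works unchanged for two- or three-dimensional lattices and for alphabets with more residue types.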
Despite their simplicity, certainly compared to the all-atom potentials used in molecular dynamics studies, these contact energy functions enable the exploration of different interaction scenarios. This diversity is achieved by changing the heterogeneity of the sequence, that is, by altering the number N of different types of ‘‘residues’’ that are being used. The most elementary lattice model involves only two types of monomers: hydrophobic monomers (H) and polar monomers (P). Such a model is often referred to as an HP model. In HP models, only nearest-neighbor contacts of the type HH have a stabilizing contribution ε < 0 to the overall energy, whereas all other contact types, whether HP or PP, are considered neutral, contributing zero energy [18,30]. It was found that most HP model sequences have rugged energy landscapes with many kinetic traps [20]. In this case, folding kinetics involve at least two stages: a fast collapse to compact non-native conformations followed by a slow barrier-climbing process to escape traps and reattempt to fold [18,30,31].
In this respect, the HP model is unlike many real proteins that appear to have smoother landscapes with few traps, resulting in fast folding and two-state kinetics [11,21]. One way to make the model more proteinlike is to increase its heterogeneity. Another way is to introduce a specific bias toward the native state, resulting in a variant model denoted as the HP+ model [20]. For an HP sequence with a unique native structure, the HP+ energy given by Eq. (6) is defined by a negative ε value for each native HH contact, by ε = 0 for each native HP or PP contact, and by −ε for all non-native contacts (−ε > 0). As a result the HP and its corresponding HP+ sequences share the same unique native structure, with the only difference being that in the HP+ energy function non-native contacts have unfavorable energies. This extra interaction in the HP+ model is intended to capture, in a very simple way and without additional parameters, more energetic specificity than the original HP model. The HP+ model is similar in principle to the ‘‘Go model,’’ which adds an explicit biasing potential to the native structure, ensuring that this structure becomes the global minimum of the whole energy landscape [32,33].
Closer agreement with the real protein folding process can be obtained by increasing the heterogeneity of the lattice model, using multiple-letter codes and sequence design [26,27,34–37]. A model with 20 different residue types (N = 20) is expected to have heterogeneity similar to that of a real protein. In such models the energy is taken from a range of interaction energies, ensuring an overall net attraction. For example, contact energies between adjacent residues may be chosen to have an average of $-2k_BT$ with an effective deviation of about $k_BT$, ensuring that the stable native contacts are among the most stable nonbonded interactions, with an average energy of about $-3k_BT$ [26]. In other studies the interactions were selected randomly from a continuous range of interactions with special terms to prevent the chain from crossing over itself [34]. Overall, these more complex models show kinetic pathways that converge into folding funnels, guiding the folding to a unique stable native conformation.
A convenient property of all lattice models is the ability to use the ‘‘fraction of native contacts’’ Q as a reaction coordinate or progress variable to describe the folding process. The variable is the ratio between the number of ‘‘native contacts’’ that are observed in any given conformation of the chain and the maximum number of possible native contacts. Thus, Q varies from a value near zero for the highly denatured conformation to unity for the native state. For the 27-mer in a 3 × 3 × 3 cube described above, there are 156 different possible contacts and 28 native contacts. For a 125-mer there are 3782 possible contacts and 176 native contacts in a 5 × 5 × 5 cube [11]. Although there are many more ‘‘native contacts’’ in a real protein, it is expected that even there a smaller subset of contacts can be used to define the native conformation in a way similar to the Q variable in lattice models. The progress variable Q has been very useful for visualizing the average effective energy and the configuration entropy of the polypeptide chain as it folds from the denatured to the native state. The resulting values, which are averaged over many folding simulations, depend as expected on the temperature at which the simulation is performed.
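The contact counts quoted for the 27-mer and the 125-mer can be recovered by simple counting, assuming a simple cubic lattice. Because that lattice is bipartite, two residues can occupy neighboring sites only if their separation along the chain is odd, and a separation of 1 is a covalent bond rather than a contact. The sketch below, with hypothetical function names, reproduces the counts and shows how Q might be evaluated from sets of contact pairs.

```python
def possible_contacts(n):
    """Number of residue pairs (i, j) that can ever be in contact in an n-mer.

    Only odd sequence separations d = 3, 5, ... are allowed on a bipartite
    lattice, and there are n - d pairs with separation d.
    """
    return sum(n - d for d in range(3, n, 2))


def native_contacts_in_cube(side):
    """Nonbonded nearest-neighbor pairs for a chain that completely fills a cube."""
    edges = 3 * side * side * (side - 1)  # all nearest-neighbor pairs of sites
    bonds = side ** 3 - 1                 # pairs used up by the chain bonds
    return edges - bonds


def fraction_native(contacts, native):
    """Q: fraction of the native contact set that is present in `contacts`."""
    return len(contacts & native) / len(native)


counts_27 = (possible_contacts(27), native_contacts_in_cube(3))
counts_125 = (possible_contacts(125), native_contacts_in_cube(5))
```

Representing each conformation by a set of (i, j) pairs keeps the evaluation of Q independent of the lattice geometry, so the same `fraction_native` helper could be applied to contact lists derived from off-lattice or atomistic structures.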
Like real proteins, lattice models have a narrow optimal temperature range in which the folding process is most efficient. At temperatures that are too low, folding may be extremely slow because the chain cannot escape from local minima. At very high temperatures the native state is not stable, and the number of accessible conformations is so large that the folding problem cannot be solved. Indeed, analysis of a low temperature average effective energy/entropy surface calculated for the 27-mer model on a cubic lattice showed that the conformation space accessible to the protein is limited, even at low Q (unfolded conformations) [11]. At such temperatures the polypeptide chain collapses to a misfolded globular state with a Q value near that of the random coil. The change in configuration entropy on collapse is small enough that its destabilizing contribution to the free energy is compensated for by the burial of hydrophobic groups, even in the absence of native contacts. At this temperature the average effective energy surface as a function of Q is ‘‘rough’’ due to the presence of energy barriers to reorganization within the collapsed state. The transition region at these temperatures was found to be close to the native state (Q ≈ 0.7–0.9).
At high folding temperatures, on the other hand, the average effective energy/entropy surface resulting from lattice simulations indicates a different scenario [11]. Early in folding (e.g., for Q ≈ 0.2), the surface is very broad, indicating that most of the unfolded configurations are accessible. As the entropy decreases with the increase of Q to unity for the native structure, the surface becomes narrower, resulting in an overall funnel structure for the average effective energy surface. Thus, regardless of the initial conformation, the molecule moves downward in energy toward the native state as the number of stabilizing contacts increases. Despite the smoothness of the effective energy surface, a transition state barrier in the free energy profile can exist even for the 27-mer at relatively high temperatures. The free energy transition barrier corresponds to an entropy ‘‘bottleneck’’ that arises from a reduction of the chain entropy at large Q values (the number of accessible configurations decreases rapidly as Q approaches the native state). In general, it is the balance between the rate of decrease of the energy and that of the entropy that determines whether there is a free energy barrier and where it occurs. A different balance between the two contributions to the free energy could move the transition barrier in the free energy to smaller or larger Q values.
To conclude, although the models used in lattice simulations are very simplified, the results provide general information on possible protein folding scenarios, albeit not on the detailed behavior of specific proteins, which would require more complex models and more accurate potentials. The contribution made by these simulations is that they enable an analysis of the structures, energetics, and dynamics of folding reactions at a level of detail not accessible to experiment.
IV. OFF-LATTICE MINIMALIST MODELS
Despite their contribution to the understanding of protein folding, the correspondence between lattice models and real proteins is still very limited. The first step toward making such models more realistic is to remove the lattice and study off-lattice minimalist models. Simple off-lattice models of proteins can have proteinlike shapes with well-defined secondary structure elements, as in real proteins. In addition, the continuum character of the conformation space allows for the native state to become a basin rather than a single minimum.
An off-lattice minimalist model that has been extensively studied is the 46-mer β-barrel model, which has a native state characterized by a four-stranded β-barrel. The first to introduce this model were Honeycutt and Thirumalai [38], who used a three-letter code to describe the residues. In this model monomers are labeled hydrophobic (H), hydrophilic (P), or neutral (N), and the sequence that was studied is $\mathrm{H_9N_3(PH)_4N_3H_9N_3(PH)_5P}$. That is, two strands are hydrophobic (residues 1–9 and 24–32) and the other two strands contain alternating H and P beads (residues 13–20 and 36–46). The four strands are connected by neutral three-residue bends. Figure 3 depicts the global minimum conformation of the 46-mer β-barrel model. This β-barrel model was studied by several researchers [38–41], and additional off-lattice minimalist models of α-helical [42] and β-sheet proteins [43] were also investigated.
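The residue ranges of the strands and bends follow directly from the run-length form of the sequence. The sketch below expands that form into an explicit bead string and lists the segments; the space-separated pattern syntax and the function names are conventions invented here for illustration.

```python
import re


def expand(pattern):
    """Expand a run-length pattern such as 'H9 N3 (PH)4' into a bead string."""
    beads = []
    # Each token is either a parenthesized group or a single letter,
    # optionally followed by a repeat count (default 1).
    for group, single, count in re.findall(r"(?:\(([A-Z]+)\)|([A-Z]))(\d*)", pattern):
        beads.append((group or single) * int(count or 1))
    return "".join(beads)


def segments(sequence, letters):
    """1-based (first, last) residue ranges of maximal runs built only from `letters`."""
    return [(m.start() + 1, m.end()) for m in re.finditer(f"[{letters}]+", sequence)]


sequence = expand("H9 N3 (PH)4 N3 H9 N3 (PH)5 P")
strands = segments(sequence, "HP")
bends = segments(sequence, "N")
```

For this sequence the expansion gives 46 beads, with the four strands at residues 1–9, 13–20, 24–32, and 36–46, separated by three neutral bends at residues 10–12, 21–23, and 33–35.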
The energy function of the off-lattice three-letter model is much more elaborate than those used in lattice models [Eq. (6)]. Similar to all-atom energy functions, it includes both bonded and nonbonded energy terms. Bond, bond angle, and dihedral angle energy terms give the model flexibility along the bonded structure while a nonbonded van der Waals interaction term is used to mimic the hydrophobic/hydrophilic character of the different monomer types.
$$E = \{\text{bonds}\} + \{\text{angles}\} + \{\text{dihedral}\} + \sum_{i<j-3} 4\varepsilon S_1\left[\left(\frac{\sigma}{R}\right)^{12} - S_2\left(\frac{\sigma}{R}\right)^{6}\right] \tag{7}$$
where the bonded energy terms are similar to those used in all-atom models (see Chapter 2), and the parameters $S_1$ and $S_2$ in the van der Waals term distinguish between the different types of beads. There are attractive interactions between all HH residue pairs ($S_1 = 1$ and $S_2 = 1$), repulsive interactions between all PP and PH pairs ($S_1 = 2/3$ and $S_2 = -1$), and only excluded volume interactions between the pairs PN, HN, and NN ($S_1 = 1$ and $S_2 = 0$).
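The role of the two coefficients in the van der Waals term of Eq. (7) can be illustrated with a short sketch. It assumes that the sum runs over pairs with i < j − 3, that R is the distance between beads i and j, and that $S_2 = -1$ for the PP and PH pairs (the sign that makes the $R^{-6}$ term repulsive). The bonded terms are omitted, and the function names and default reduced units (ε = σ = 1) are illustrative choices.

```python
import math


def pair_coefficients(a, b):
    """Return (S1, S2) for a pair of bead types drawn from 'H', 'P', 'N'."""
    pair = {a, b}
    if "N" in pair:
        return 1.0, 0.0          # excluded volume only
    if pair == {"H"}:
        return 1.0, 1.0          # attractive Lennard-Jones form
    return 2.0 / 3.0, -1.0       # PP and PH: purely repulsive


def pair_energy(a, b, r, eps=1.0, sigma=1.0):
    """4 * eps * S1 * [(sigma/r)**12 - S2 * (sigma/r)**6] for one pair of beads."""
    s1, s2 = pair_coefficients(a, b)
    x6 = (sigma / r) ** 6
    return 4.0 * eps * s1 * (x6 * x6 - s2 * x6)


def nonbonded_energy(coords, sequence, eps=1.0, sigma=1.0):
    """Sum the pair term over all bead pairs with i < j - 3."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 4, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += pair_energy(sequence[i], sequence[j], r, eps, sigma)
    return total


r_min = 2.0 ** (1.0 / 6.0)  # minimum of the attractive H-H term, in units of sigma
```

For an HH pair the term reduces to the usual Lennard-Jones potential with a well depth of ε at R = 2^{1/6}σ, whereas for the other pair types the energy is positive at every distance and decays monotonically.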
Studies of this model showed that the underlying energy landscape is very rough, probably due to the long-range and nonspecific character of the interactions. To overcome the roughness and smooth the surface, a ‘‘Go model’’-like variant of the three-letter model was introduced [15]. In this variant the only attractive interactions are those between monomers that form native contacts, i.e., contacts found in the native β-barrel. An analysis of the native β-barrel structure yielded 47 pairs of monomers within a distance of 1.167σ, most of them between hydrophobic monomers. All other pairs have only the repulsive van der Waals term, which accounts for excluded volume. It was shown that this modification removes the roughness that is introduced by the non-native contacts, allowing the sequence to recover a nearly optimal folding behavior.
Recently a different modification of the classic 46-mer β-barrel model was suggested. In this case a single side group, represented by a bead that may be hydrophilic or hydrophobic, was added to the model [44]. Molecular dynamics and quenching simulations showed that the nature and the location of the single side group bead influence both the structure at the global minimum of internal energy and the relaxation process by which the system finds its minima. The most drastic effects occur with a hydrophobic side group in the middle of a sequence of hydrophobic residues.
Figure 3 The minimum energy conformation of the off-lattice 46-mer β-barrel model. Hydrophobic residues are in gray, hydrophilic residues in black, and neutral residues are white. (Adapted from Ref. 44.)
V. ATOMISTIC MODELS
The highest level of detail in theoretical studies of protein folding involves the use of detailed atomic models of the protein and the environment. Such models have been discussed in depth in previous chapters of this book. The main limitation of atomic models is that they are computationally much more demanding, a fact that restricts the number of calculations that can be performed with them. In terms of using atomic models for protein folding it is possible to identify two main approaches. The first approach is to study the folding process by performing explicit molecular dynamics simulations of protein unfolding and folding. The other approach is to use conformation sampling techniques to characterize the underlying energy and free energy landscapes.
A. Unfolding/Folding Simulations
The main problem facing the attempt to study room temperature folding by direct molecular dynamics simulations of an all-atom model is that of time scales. Whereas protein folding takes place on the millisecond time scale and up, the time scale accessible to molecular dynamics is on the order of nanoseconds. Recently, using a massively parallel computer, Duan and Kollman [45] performed a 1 µs simulation of the villin headpiece subdomain, a 36-residue peptide, in water. Starting from a fully unfolded extended state, including approximately 3000 water molecules, the simulation was able to follow the dynamics of this protein as it adopted a partially folded conformation. Such long-time-scale molecular dynamics (MD) simulations require exceptionally large computational resources. Furthermore, the usefulness of these simulations is limited by the fact that they cannot provide the level of statistics required for studying folding kinetics and thermodynamics. Another problem associated with a direct MD approach to the folding process is that it is unclear how well the MD potential energy functions used fare in the unfolded regime.
Thus, instead of using molecular dynamics to simulate the folding process, many researchers turned their attention to using MD simulations as a tool for studying the inverse process of protein unfolding from the native state. It is hoped, though not proven, that analysis of the unfolding process will contribute to the understanding of the folding process. To speed up the unfolding reaction, which has a significant activation barrier, these studies are typically performed in the high temperature range of 400–600 K. A simple Arrhenius-type calculation shows that the unfolding reaction for a protein that denatures experimentally at 325 K and has an activation barrier for unfolding of 20 kcal/mol is about six orders of magnitude faster at 600 K than at 325 K. Even if the Arrhenius equation is not exact for unfolding reactions, this argument indicates that elevating the temperature reduces the time for unfolding from the experimentally observed millisecond range to the nanosecond time scale, which is accessible to molecular dynamics simulations.
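The Arrhenius estimate quoted above is easy to check numerically. The sketch below assumes a rate of the form k = A exp(−Ea/RT) with a temperature-independent prefactor A, so that A cancels in the ratio of rates; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def arrhenius_speedup(ea_kcal, t_low, t_high):
    """Ratio k(t_high) / k(t_low) for k = A * exp(-Ea / (R * T)) with constant A."""
    return math.exp(ea_kcal / R_KCAL * (1.0 / t_low - 1.0 / t_high))


speedup = arrhenius_speedup(20.0, 325.0, 600.0)
orders_of_magnitude = math.log10(speedup)
```

With a 20 kcal/mol barrier the exponent is about 14, which corresponds to a speedup of roughly 10^6 and is consistent with the shift from milliseconds to nanoseconds described in the text. The estimate is sensitive to the assumed barrier height: a larger barrier gives a proportionally larger exponent.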
The details of many all-atom unfolding simulation studies have been summarized in several reviews [17,46,47]. These studies include unfolding simulations of α-lactalbumin, lysozyme, bovine pancreatic trypsin inhibitor (BPTI), barnase, apomyoglobin, β-lactamase, and more. The advantage of these simulations is that they provide much more detailed information than is available from experiment. However, it should be stressed that there is still only limited evidence that the pathways and intermediates observed in the nanosecond unfolding simulations correlate with the intermediates observed in the actual experiments.