Comparative Protein Structure Modeling
Fiser et al.
is that constraints or restraints derived from a number of different sources can easily be added to the homology derived restraints. For example, restraints could be provided by rules for secondary structure packing [84], analyses of hydrophobicity [85] and correlated mutations [86], empirical potentials of mean force [87], nuclear magnetic resonance (NMR) experiments [88], cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis [89], intuition, etc. In this way, a comparative model, especially in the difficult cases, could be improved by making it consistent with available experimental data and/or with more general knowledge about protein structure.
D. Loop Modeling
In comparative modeling, target sequences often have inserted residues relative to the template structures or have regions that are structurally different from the corresponding regions in the templates. Thus, no structural information about these inserted or conformationally variable segments can be extracted from the template structures. These regions frequently correspond to surface loops. Loops often play an important role in defining the functional specificity of a given protein framework, forming the active and binding sites. The accuracy of loop modeling is a major factor determining the usefulness of comparative models in applications such as ligand docking. Loop modeling can be seen as a mini protein folding problem. The correct conformation of a given segment of a polypeptide chain has to be calculated mainly from the sequence of the segment itself. However, loops are generally too short to provide sufficient information about their local fold. Even identical decapeptides do not always have the same conformation in different proteins [90,91]. Some additional restraints are provided by the core anchor regions that span the loop and by the structure of the rest of a protein that cradles the loop. Although many loop modeling methods have been described, it is still not possible to model loops longer than approximately eight residues correctly and with high confidence [239].
There are two main classes of loop modeling methods: (1) the database search approaches, where a segment that fits on the anchor core regions is found in a database of all known protein structures [62,94], and (2) the conformational search approaches [95–97]. There are also methods that combine these two approaches [92,98,99].
The database search approach to loop modeling is accurate and efficient when a database of specific loops is created to address the modeling of the same class of loops, such as β-hairpins [100], or loops on a specific fold, such as the hypervariable regions in the immunoglobulin fold [94,101]. For example, an analysis of the hypervariable immunoglobulin regions resulted in a series of rules that allowed a very high accuracy of loop prediction in other members of the family. These rules were based on the small number of conformations for each loop and the dependence of the loop conformation on its length and certain key residues. There have been attempts to classify loop conformations into more general categories, thus extending the applicability of the database search approach to more cases [102–105]. However, the database methods are limited by the fact that the number of possible conformations increases exponentially with the length of a loop. As a result, only loops up to four to seven residues long have most of their conceivable conformations present in the database of known protein structures [106,107]. Even according to the more optimistic estimate, approximately 30% and 60% of all the possible eight- and nine-residue loop conformations, respectively, are missing from the database [106]. The problem is compounded by the requirement for an overlap of at least one residue between the database fragment and each anchor core region, which means that the modeling of a five-residue insertion requires at least a seven-residue fragment from the database [70]. Despite the rapid growth of the database of known structures, there is no possibility of covering most of the conformations of a nine-residue segment in the foreseeable future. On the other hand, most of the insertions in a family of homologous proteins are shorter than nine residues [108,239].
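The fragment-length arithmetic above can be written out as a small sketch. The function name and the default of a one-residue overlap at each anchor are illustrative assumptions; the text states only that an overlap of at least one residue is required.

```python
def min_fragment_length(insertion_length: int, overlap_per_anchor: int = 1) -> int:
    """Shortest database fragment that can bridge an insertion.

    The fragment must cover the inserted residues plus the residues that
    overlap the anchor core region on each side of the loop.
    (Hypothetical helper, for illustration only.)
    """
    if insertion_length < 0 or overlap_per_anchor < 0:
        raise ValueError("lengths must be non-negative")
    return insertion_length + 2 * overlap_per_anchor
```

With the default overlap, a five-residue insertion gives a seven-residue fragment, which matches the figure quoted in the text.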
To overcome the limitations of the database search methods, conformational search methods were developed [95,96,109]. There are many such methods, exploiting different protein representations, objective function terms, and optimization or enumeration algorithms. The search algorithms include the minimum perturbation method [97], molecular dynamics simulations [92,110,111], genetic algorithms [112], Monte Carlo and simulated annealing [113,114], multiple copy simultaneous search [115–117], self-consistent field optimization [118], and enumeration based on graph theory [119].
We now describe a new loop modeling protocol in the conformational search class [239]. It is implemented in the program MODELLER (Table 1). The modeling procedure consists of optimizing the positions of all non-hydrogen atoms of a loop with respect to an objective function that is a sum of many spatial restraints. Many different combinations of various restraints were explored. The best set of restraints includes the bond length, bond angle, and improper dihedral angle terms from the CHARMM22 force field [80,81], statistical preferences for the main chain and side chain dihedral angles [31], and statistical preferences for non-bonded contacts that depend on the two atom types, their distance through space, and separation in sequence [120]. The objective function was optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the loop prediction corresponds to the lowest energy conformation out of the 500 independent optimizations. The algorithm allows straightforward incorporation of additional spatial restraints, including those provided by template fragments, disulfide bonds, and ligand binding sites. To simulate comparative modeling problems, the loop modeling procedure was evaluated by predicting loops of known structure in only approximately correct environments. Such environments were obtained by distorting the anchor regions corresponding to the three residues at either end of the loop and all the atoms within 10 Å of the native loop conformation for up to 2–3 Å by molecular dynamics simulations. In the case of five-residue loops in the correct environments, the average error was 0.6 Å, as measured by local superposition of the loop main chain atoms alone (C, N, Cα, O). In the case of eight-residue loops in the correct environments, 90% of the loops had less than 2 Å main chain RMS error, with an average of less than 1.2 Å (Fig. 6).
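The selection and evaluation steps described above can be illustrated with a minimal sketch. It is not MODELLER code: the function names and the data layout (a list of objective-value and coordinate pairs) are assumptions made for illustration. The RMSD here is computed in whatever frame the coordinates are supplied, so it corresponds to a comparison without any refitting of the loop; the local superposition used for the error values quoted above would additionally require an optimal rigid-body fit of the loop main chain atoms, which is not shown.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally ordered atom lists.

    No superposition is performed; both sets must already share a frame.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty atom lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


def lowest_objective(
    runs: Sequence[Tuple[float, List[Point]]]
) -> Tuple[float, List[Point]]:
    """Return the (objective value, coordinates) pair with the lowest objective.

    Each entry stands for the result of one independent optimization
    (hypothetical layout, chosen for this sketch).
    """
    if not runs:
        raise ValueError("no optimization runs supplied")
    return min(runs, key=lambda run: run[0])
```

In use, one would collect the final objective value and main chain coordinates from each independent optimization, call `lowest_objective` on that list, and pass the selected coordinates together with the native loop coordinates to `rmsd`.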
E. Side Chain Modeling
As for loops, side chain conformation is predicted from similar structures and from steric or energy considerations [5,121]. The geometry of disulfide bridges is modeled from disulfide bridges in protein structures in general [122,123] and from equivalent disulfide bridges in related structures [79]. Modeling the stability and conformation of point mutations by free energy perturbation simulations is not discussed here [124–127].
Vasquez [121] reviewed and commented on various approaches to side chain modeling. The importance of two effects on side chain conformation was emphasized. The first effect was the coupling between the main chain and side chains, and the second effect was the continuous nature of the distributions of side chain dihedral angles; for example, 5–30% of side chains in crystal structures are significantly different from their rotamer conformations [128] and 6% of the χ1 or χ2 values are not within 40° of any rotamer conformation [129]. Both effects appear to be important when correlating packing energies and stability [130]. The correct energetics may be obtained for the incorrect reasons; i.e., the side chains adopt distorted conformations to compensate for the rigidity of the backbone. Correspondingly, the backbone shifts may hinder the use of these methods when the template structures are related at less than 50% sequence identity [131]. This is consistent with the X-ray structure of a variant of λ repressor, which reveals that the protein accommodates the potentially disruptive residues with shifts in its α-helical arrangement and with only limited changes in side chain orientations [132]. Some attempts to include backbone flexibility in side chain modeling have been described [118,133,134], but the methods are not yet generally applicable.
Figure 6 Sample models of varying accuracy for an eight-residue loop in an approximately correct protein environment: oxidoreductase (2nac), loop residues 28–35, anchor distortion 1.2 Å. The calculated loops (shaded) are compared with the X-ray structure (black). Three levels of accuracy are illustrated: high accuracy, corresponding to a backbone RMSD below 1 Å (top); medium accuracy, corresponding to a backbone RMSD below 2 Å (middle); and low accuracy, corresponding to a backbone RMSD above 2 Å (bottom). The panels on the left compare the loop backbone conformations after least-squares superposition of the complete protein structure. The panels on the right compare the loop backbone conformations after local superposition of the loops. The RMSD values are quoted for the main chain atoms only. The fraction of the loops modeled at each accuracy level is given in the rightmost column. The figure was prepared using MOLSCRIPT [236].
Significant correlations were found between side chain dihedral angle probabilities and backbone Φ, Ψ values [129,135]. These correlations go beyond the dependence of side chain conformation on the secondary structure [136]. For example, the preferred rotamers can vary within the same secondary structure, with the changes in the Φ, Ψ dihedral angles as small as 20° [135]. Since these changes are smaller than the differences between closely related homologs, the prediction of the side chain conformation generally cannot be uncoupled from backbone prediction. This partly explains why the conformation of equivalent side chains in homologous structures is useful in side chain modeling [31]. A backbone-dependent rotamer library for amino acid side chains was developed and used to construct side chain conformations from main chain coordinates [135]. This automated method first places the side chains according to the rotamer library and then removes steric clashes by combinatorial energy minimization. It was also demonstrated that simple arguments based on conformational analysis could account for many features of the observed dependence of the side chain rotamers on the backbone [135]. Recently, the main chain–dependent side chain rotamer library was recalculated and extensively evaluated [129] (Table 1). The accuracy of the method was 82% for the χ1 dihedral angle and 72% for both χ1 and χ2 dihedral angles when the backbones of templates in the range from 30% to 90% sequence identity were used; a prediction was deemed correct when it was within 40° of the target crystal structure value.
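The 40° correctness criterion quoted above can be made concrete with a short sketch. The function names are hypothetical, and two simplifications are assumed: every residue is treated as having both χ1 and χ2, and the 40° limit is taken as inclusive, which the text does not specify. Dihedral angles are periodic, so the difference is taken as the smaller of the two arcs between the angles.

```python
from typing import Sequence, Tuple

ChiPair = Tuple[float, float]  # (chi1, chi2) in degrees


def angle_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(
    predicted: Sequence[ChiPair],
    observed: Sequence[ChiPair],
    tolerance: float = 40.0,
) -> Tuple[float, float]:
    """Fractions of residues with chi1 correct and with chi1 and chi2 both correct.

    A dihedral counts as correct when it lies within `tolerance` degrees
    of the observed (crystal structure) value.
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty residue lists of equal length")
    chi1_ok = 0
    both_ok = 0
    for (p1, p2), (o1, o2) in zip(predicted, observed):
        if angle_difference(p1, o1) <= tolerance:
            chi1_ok += 1
            if angle_difference(p2, o2) <= tolerance:
                both_ok += 1
    n = len(predicted)
    return chi1_ok / n, both_ok / n
```

Handling the periodicity explicitly matters near ±180°: a predicted χ1 of 180° and an observed value of −175° differ by 5°, not 355°.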
Chung and Subbiah [131,137] gave an elegant structural explanation for the rapid decrease in the conservation of side chain packing as the sequence identity decreases below 30%. Although the fold is maintained, the pattern of side chain interactions is generally lost in this range of sequence similarity [138]. Two sets of computations were done for two sample protein sequences: The side chain conformation was predicted by maximizing packing on the fixed native backbone and on a fixed backbone with approximately 2 Å RMSD from the native backbone; the 2 Å RMSD generally corresponds to the differences between the conserved cores of two proteins related at 25–30% sequence identity. The side chain predictions based on the two kinds of backbone turned out to be unrelated. Thus, inasmuch as packing reflects the true laws determining side chain conformation, a backbone with less than 30% sequence identity to the sequence being modeled is no longer sufficiently restraining to result in the correct packing of the buried side chains.
The solvation term is important for the modeling of exposed side chains [139–142]. It was also demonstrated that treating hydrogen bonds explicitly could significantly improve side chain prediction [135,143]. Calculations that do not take into account the solvent, either implicitly or explicitly, introduce errors into the hydrogen-bonding patterns even in the core regions of a protein [142]. Residues with zero solvent accessibility area can still have a significant interaction energy with the solvent atoms [144].
A recent survey analyzed the accuracy of three different side chain prediction methods [134]. These methods were tested by predicting side chain conformations on near-native protein backbones with 4 Å RMSD to the native structures. The three methods included the packing of backbone-dependent rotamers [129], the self-consistent mean-field approach to positioning rotamers based on their van der Waals interactions [145],