Modeling in NMR Structure Determination | Nilges
B. Molecular Dynamics Simulated Annealing
In Cartesian coordinates, molecular dynamics-based simulated annealing (MDSA) refinement consists of the numerical solution of Newton’s equations of motion (see Chapter 3). The specific advantage of molecular dynamics over energy minimization is the larger radius of convergence due to possible uphill motions over large energy barriers (Fig. 4). Together with variation of temperature or energy scales, very powerful minimization strategies can be implemented.
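In the notation of Fig. 4 (coordinate vectors ri, masses mi, target function Ehybrid), the equations of motion integrated during Cartesian MDSA take the familiar Newtonian form. The vector-derivative notation below is a standard convention assumed here, not taken from the text.

```latex
m_i \, \frac{d^2 \mathbf{r}_i}{dt^2} \;=\; -\,\frac{\partial E_{\mathrm{hybrid}}}{\partial \mathbf{r}_i},
\qquad i = 1, \dots, N
```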
Scaling the temperature, the overall weight on Ehybrid, or all masses mi is formally equivalent [31]. The independent scaling of each contribution El by its weight factor wl gives rise to a large number of possible simulated annealing schemes. We call annealing schemes that vary the wl independently “generalized annealing schemes.” The initial velocities are usually assigned from a Maxwell distribution at the desired starting temperature, and the temperature is controlled (e.g., by coupling to a heat bath [56]). For the use of MD as an optimization technique, it is convenient to use uniform masses mi = m for all i [39]. This, in combination with uniform energy constants in the force field, allows the use of larger time steps in the molecular dynamics, because differences in vibrational frequencies are avoided (the time step is determined by the highest vibrational frequency).
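The velocity assignment and temperature control described above can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular program: the function names, the unit system (kcal/mol, K, arbitrary mass units), and the simple weak-coupling rescaling formula are all assumptions made for the example, and uniform masses are used as the text suggests.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K); unit choice is illustrative


def maxwell_velocities(n_atoms, mass, temperature, rng):
    """Draw velocities from a Maxwell distribution for atoms of uniform mass."""
    sigma = np.sqrt(K_B * temperature / mass)  # std. dev. of each velocity component
    v = rng.normal(0.0, sigma, size=(n_atoms, 3))
    v -= v.mean(axis=0)  # remove net centre-of-mass drift
    return v


def instantaneous_temperature(v, mass):
    """Kinetic temperature from equipartition (3N degrees of freedom assumed)."""
    kinetic = 0.5 * mass * np.sum(v ** 2)
    return 2.0 * kinetic / (3 * v.shape[0] * K_B)


def rescale_to_bath(v, mass, t_bath, dt, tau):
    """One weak-coupling step: scale velocities toward the bath temperature.

    tau is a hypothetical coupling time; dt == tau enforces t_bath exactly.
    """
    t_now = instantaneous_temperature(v, mass)
    lam = np.sqrt(1.0 + (dt / tau) * (t_bath / t_now - 1.0))
    return v * lam
```

In an annealing run, `rescale_to_bath` would be called after each integration step with a bath temperature that follows the cooling schedule; a larger `tau` gives gentler coupling.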
Recently, MD constrained to torsion angle space [torsion angle dynamics (TAD)] was introduced to refinement calculations [33,57,58]. Earlier versions of the equations of motion for molecular dynamics in torsion angle space were very inefficient to solve owing to the need for a matrix inversion at every time step [59]. Newer algorithms break down the necessary operations into a series of multiplications of small matrices and are therefore much more efficient [60,61].
Figure 4 (a) Solving Newton’s equations of motion at constant energy allows the molecule to overcome energy barriers in Ehybrid. The quantities ri and mi are the coordinate vectors and masses, respectively, of atom i, and Ehybrid is the target function of the minimization problem, containing different contributions from experimental data and from a priori knowledge (i.e., the force field).
(b) With temperature variation, powerful minimization schemes can be implemented, allowing for large energy barriers to be crossed at high temperatures, ultimately leading to the identification of the “global” minimum.
The application of TAD in standard MD calculations may require the development of dedicated force fields to emulate the missing flexibility by a reparametrization of the non-bonded potential. This is not necessary for its application in NMR structure calculation, because the energy parameters developed for this purpose already assume in most cases a rigid covalent geometry, either by employing high force constants or by using only torsion angles as degrees of freedom. The advantage of TAD is that the geometry of the molecule does not have to be maintained by high force constants, which lead to high vibrational frequencies. Therefore, longer time steps at higher temperatures can be used with TAD, and the refinement protocols are numerically more stable.
C. Folding Random Structures by Simulated Annealing
Various simulated annealing protocols have been suggested to fold random structures with experimental restraints. The choice of starting structure determines the optimal protocol. The most obvious choices are random distributions of dihedral angles (as indicated in Fig. 2). The minimization procedure has to try to avoid entanglement of the chain while properly relaxing large forces in the starting conformation, which could arise from overlapping atoms or distance restraints violated by a large amount. This is achieved by a combination of soft non-bonded interactions, a violation-tolerant form of the distance restraint potential, and high temperature dynamics.
To achieve convergence with an annealing protocol using Cartesian dynamics, multistage generalized annealing protocols were introduced (Fig. 5). The first stage is a high temperature search where the molecule adopts approximately the correct fold. In this stage, the non-bonded interactions are reduced to allow the chain to intersect itself, and the representation of the non-bonded interactions may be further simplified by computing them for only a fraction of the atoms. The protocol is also adaptable to ambiguous restraint lists by a specifically reduced weight wambig on the ambiguous distance restraints (ADRs) [20], which is varied independently of wunambig (see Fig. 5). A detailed description can be found elsewhere [20].
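A generalized annealing schedule of this kind amounts to a table of stages in which temperature and the individual weights are ramped independently. The sketch below illustrates that structure only. The stage names, step counts, temperatures, weight values, and the linear ramp are hypothetical and are not taken from Fig. 5 or from Ref. 20. The covalent weight is held at full value throughout, as stated for the Cartesian protocol.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    steps: int
    temperature: tuple  # (start, end) in K
    w_nonbonded: tuple  # (start, end) weight on the non-bonded term
    w_unambig: tuple    # (start, end) weight on unambiguous distance restraints
    w_ambig: tuple      # (start, end) weight on ambiguous distance restraints


def ramp(bounds, frac):
    """Linear interpolation between the start and end value of a stage."""
    start, end = bounds
    return start + (end - start) * frac


def annealing_schedule(stages):
    """Yield temperature and weights for every step of every stage."""
    for stage in stages:
        for step in range(stage.steps):
            frac = step / (stage.steps - 1) if stage.steps > 1 else 1.0
            yield {
                "stage": stage.name,
                "temperature": ramp(stage.temperature, frac),
                "w_covalent": 1.0,  # full weight for the entire protocol
                "w_nonbonded": ramp(stage.w_nonbonded, frac),
                "w_unambig": ramp(stage.w_unambig, frac),
                "w_ambig": ramp(stage.w_ambig, frac),
            }


# Hypothetical three-stage protocol: high temperature search, then two cooling stages.
PROTOCOL = [
    Stage("search", 5, (2000.0, 2000.0), (0.1, 0.1), (1.0, 1.0), (0.2, 0.2)),
    Stage("cool1", 5, (2000.0, 1000.0), (0.1, 1.0), (1.0, 1.0), (0.2, 1.0)),
    Stage("cool2", 5, (1000.0, 100.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)),
]
```

Each dictionary yielded by `annealing_schedule(PROTOCOL)` would be passed to the dynamics step to set the bath temperature and to scale the corresponding energy terms.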
With mostly unambiguous data, this protocol has been successfully used for proteins with up to 160 residues [62]. Although virtually all structures converge to the correct fold for small proteins, we observe that approximately one-third of the structures are misfolded for larger proteins, low data density, or many ambiguities (see, e.g., Ref. 63). We have also used this protocol for most structure calculations with the automated NOE assignment method ARIA discussed in the next section.
Calculations starting from random Cartesian coordinates and using standard Newton dynamics illustrate the flexibility of the generalized annealing approach. The extremely bad geometry of the initial structures requires that the weights on the covalent geometry terms start with very low values, which are then slowly increased during the calculation. All torsion angle terms (dihedral angles, planarity, and chirality) are removed from Ehybrid because of the difficulty in calculating them for random Cartesian structures. Enantiomer selection and regularization are necessary with this protocol much as they are with MMDG embedded structures. The principal advantage of the use of random Cartesian coordinates over that of random dihedral coordinates is that the former give better sampling for highly ambiguous data. The initial structure does not bias toward intraresidue or sequential assignments of ambiguous NOEs.
Figure 5 Schematic representation of a Cartesian dynamics protocol starting from random torsion angles. The weights w for non-bonded (i.e., van der Waals) interactions, unambiguous distance restraints, and ambiguous distance restraints are varied independently. The covalent interactions are maintained with full weight, wcovalent, for the entire protocol. Weights for other experimental terms may be varied in an analogous way. Coupling constant restraints and anisotropy restraints are usually used only in a refinement stage.
A TAD protocol [58] may have a three-stage organization similar to that of the Cartesian MDSA protocol (Fig. 5), with two TAD stages (one high temperature, one cooling) and a final Cartesian cooling stage. The starting temperatures can be set to much higher values (up to 50,000 K). Weights on experimental and non-bonded terms differ in the different stages, with higher weights on the experimental terms in the high temperature stage, but the principal parameter that is varied during simulated annealing is the temperature. TAD protocols used with the program DYANA [64] are even simpler, with only temperature variation in the simulated annealing stage, which is followed by conjugate gradient minimization.
In general, TAD shows better convergence than Cartesian dynamics. For nucleic acid structures, for example, the convergence rate can be very low both for MMDG and for Cartesian dynamics owing to the low restraint density. The sampling of conformational space by TAD for very sparse data sets should be comparable to Cartesian dynamics protocols and better than for MMDG without metrization. Depending on the implementation, ambiguous distance restraints can be used throughout the protocol as with Cartesian dynamics. With its implementation in several NMR structure determination programs, including X-PLOR [65], CNS [66], and DYANA [33], the field seems to be converging on this calculation method.
IV. AUTOMATED INTERPRETATION OF NOE SPECTRA
The methods discussed in this section extend the original concept of deriving structures from experimental NMR data in two ways. First, during the structure calculation, part of the assignment problem is solved automatically. This allows specification of the NOE data in a form closer to the raw data, which makes the refinement similar to X-ray refinement. Second, the quality of the data is assessed. The methods have been recently reviewed in more detail [64,67].
A. Recognition of Incorrect Restraints: The Structural Consistency Hypothesis
Structure calculation algorithms in general assume that the experimental list of restraints is completely free of errors. This is usually true only in the final stages of a structure calculation, when all errors (e.g., in the assignment of chemical shifts or NOEs) have been identified, often in a laborious iterative process. Many effects can produce inconsistent or incorrect restraints, e.g., artifact peaks, imprecise peak positions, and insufficient error bounds to correct for spin diffusion.
Restraints due to artifacts may, by chance, be completely consistent with the correct structure of the molecule. However, the majority of incorrect restraints will be inconsistent with the correct structural data (i.e., the correct restraints and information from the force field). Inconsistencies in the data produce distortions in the structure and violations in some restraints. Structural consistency is often taken as the final criterion to identify problematic restraints. It is, for example, the central idea in the ‘‘bound-smoothing’’ part of distance geometry algorithms, and it is intimately related to the way distance data are usually specified: The error bounds are set wide enough that all data are geometrically consistent.
The problem in using violations to identify incorrect restraints is twofold. First, one has to distinguish between violations that appear because of insufficient convergence power of the structure calculation algorithm and violations due to incorrect restraints. Violations caused by incorrect restraints will be consistent (i.e., they will be present in the majority of structures), whereas insufficient convergence will produce violations that are randomly distributed. This reasoning has been formalized in the “self-correcting distance geometry” method [22,29], which calculates structures iteratively and modifies the list of restraints after each iteration. Consistent violations are identified by calculating the fraction of structures in which a particular restraint is violated by more than a threshold (e.g., 0.5 Å). If this fraction exceeds a certain value (e.g., 0.5), the restraint is removed from the list for the calculation in the next iteration.
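The consistency test just described reduces to a per-restraint count over the ensemble. The following sketch shows one way to express it, under simplifying assumptions: only upper distance bounds are checked, the function name and array layout are invented for the example, and the defaults simply mirror the example values quoted above.

```python
import numpy as np


def consistent_violations(distances, upper_bounds, threshold=0.5, max_fraction=0.5):
    """Flag restraints that are violated consistently across an ensemble.

    distances:    array (n_structures, n_restraints), restrained distances in Å
    upper_bounds: array (n_restraints,), upper bound of each restraint in Å
    Returns the violated fraction per restraint and a boolean removal mask.
    """
    distances = np.asarray(distances, dtype=float)
    upper_bounds = np.asarray(upper_bounds, dtype=float)
    # A restraint counts as violated in a structure if it exceeds its bound
    # by more than the threshold.
    violated = (distances - upper_bounds) > threshold
    fraction = violated.mean(axis=0)
    return fraction, fraction > max_fraction


# Toy ensemble of three structures and two restraints (values are made up).
distances = [[3.0, 6.0],
             [3.2, 5.8],
             [2.9, 4.0]]
upper_bounds = [3.5, 5.0]
fraction, remove = consistent_violations(distances, upper_bounds)
```

In an iterative calculation, restraints flagged in `remove` would be dropped from the list before the next round of structure calculation.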
Second, it is possible that an incorrect restraint produces a systematic violation of another restraint. Currently, this can be ruled out only by manually checking the results,