- •Foreword
- •Preface
- •Contents
- •Introduction
- •Oren M. Becker
- •Alexander D. MacKerell, Jr.
- •Masakatsu Watanabe*
- •III. SCOPE OF THE BOOK
- •IV. TOWARD A NEW ERA
- •REFERENCES
- •Atomistic Models and Force Fields
- •Alexander D. MacKerell, Jr.
- •II. POTENTIAL ENERGY FUNCTIONS
- •D. Alternatives to the Potential Energy Function
- •III. EMPIRICAL FORCE FIELDS
- •A. From Potential Energy Functions to Force Fields
- •B. Overview of Available Force Fields
- •C. Free Energy Force Fields
- •D. Applicability of Force Fields
- •IV. DEVELOPMENT OF EMPIRICAL FORCE FIELDS
- •B. Optimization Procedures Used in Empirical Force Fields
- •D. Use of Quantum Mechanical Results as Target Data
- •VI. CONCLUSION
- •REFERENCES
- •Dynamics Methods
- •Oren M. Becker
- •Masakatsu Watanabe*
- •II. TYPES OF MOTIONS
- •IV. NEWTONIAN MOLECULAR DYNAMICS
- •A. Newton’s Equation of Motion
- •C. Molecular Dynamics: Computational Algorithms
- •A. Assigning Initial Values
- •B. Selecting the Integration Time Step
- •C. Stability of Integration
- •VI. ANALYSIS OF DYNAMIC TRAJECTORIES
- •B. Averages and Fluctuations
- •C. Correlation Functions
- •D. Potential of Mean Force
- •VII. OTHER MD SIMULATION APPROACHES
- •A. Stochastic Dynamics
- •B. Brownian Dynamics
- •VIII. ADVANCED SIMULATION TECHNIQUES
- •A. Constrained Dynamics
- •C. Other Approaches and Future Direction
- •REFERENCES
- •Conformational Analysis
- •Oren M. Becker
- •II. CONFORMATION SAMPLING
- •A. High Temperature Molecular Dynamics
- •B. Monte Carlo Simulations
- •C. Genetic Algorithms
- •D. Other Search Methods
- •III. CONFORMATION OPTIMIZATION
- •A. Minimization
- •B. Simulated Annealing
- •IV. CONFORMATIONAL ANALYSIS
- •A. Similarity Measures
- •B. Cluster Analysis
- •C. Principal Component Analysis
- •REFERENCES
- •Thomas A. Darden
- •II. CONTINUUM BOUNDARY CONDITIONS
- •III. FINITE BOUNDARY CONDITIONS
- •IV. PERIODIC BOUNDARY CONDITIONS
- •REFERENCES
- •Internal Coordinate Simulation Method
- •Alexey K. Mazur
- •II. INTERNAL AND CARTESIAN COORDINATES
- •III. PRINCIPLES OF MODELING WITH INTERNAL COORDINATES
- •B. Energy Gradients
- •IV. INTERNAL COORDINATE MOLECULAR DYNAMICS
- •A. Main Problems and Historical Perspective
- •B. Dynamics of Molecular Trees
- •C. Simulation of Flexible Rings
- •A. Time Step Limitations
- •B. Standard Geometry Versus Unconstrained Simulations
- •VI. CONCLUDING REMARKS
- •REFERENCES
- •Implicit Solvent Models
- •II. BASIC FORMULATION OF IMPLICIT SOLVENT
- •A. The Potential of Mean Force
- •III. DECOMPOSITION OF THE FREE ENERGY
- •A. Nonpolar Free Energy Contribution
- •B. Electrostatic Free Energy Contribution
- •IV. CLASSICAL CONTINUUM ELECTROSTATICS
- •A. The Poisson Equation for Macroscopic Media
- •B. Electrostatic Forces and Analytic Gradients
- •C. Treatment of Ionic Strength
- •A. Statistical Mechanical Integral Equations
- •VI. SUMMARY
- •REFERENCES
- •Steven Hayward
- •II. NORMAL MODE ANALYSIS IN CARTESIAN COORDINATE SPACE
- •B. Normal Mode Analysis in Dihedral Angle Space
- •C. Approximate Methods
- •IV. NORMAL MODE REFINEMENT
- •C. Validity of the Concept of a Normal Mode Important Subspace
- •A. The Solvent Effect
- •B. Anharmonicity and Normal Mode Analysis
- •VI. CONCLUSIONS
- •ACKNOWLEDGMENT
- •REFERENCES
- •Free Energy Calculations
- •Thomas Simonson
- •II. GENERAL BACKGROUND
- •A. Thermodynamic Cycles for Solvation and Binding
- •B. Thermodynamic Perturbation Theory
- •D. Other Thermodynamic Functions
- •E. Free Energy Component Analysis
- •III. STANDARD BINDING FREE ENERGIES
- •IV. CONFORMATIONAL FREE ENERGIES
- •A. Conformational Restraints or Umbrella Sampling
- •B. Weighted Histogram Analysis Method
- •C. Conformational Constraints
- •A. Dielectric Reaction Field Approaches
- •B. Lattice Summation Methods
- •VI. IMPROVING SAMPLING
- •A. Multisubstate Approaches
- •B. Umbrella Sampling
- •C. Moving Along
- •VII. PERSPECTIVES
- •REFERENCES
- •John E. Straub
- •B. Phenomenological Rate Equations
- •II. TRANSITION STATE THEORY
- •A. Building the TST Rate Constant
- •B. Some Details
- •C. Computing the TST Rate Constant
- •III. CORRECTIONS TO TRANSITION STATE THEORY
- •A. Computing Using the Reactive Flux Method
- •B. How Dynamic Recrossings Lower the Rate Constant
- •IV. FINDING GOOD REACTION COORDINATES
- •A. Variational Methods for Computing Reaction Paths
- •B. Choice of a Differential Cost Function
- •C. Diffusional Paths
- •VI. HOW TO CONSTRUCT A REACTION PATH
- •A. The Use of Constraints and Restraints
- •B. Variationally Optimizing the Cost Function
- •VII. FOCAL METHODS FOR REFINING TRANSITION STATES
- •VIII. HEURISTIC METHODS
- •IX. SUMMARY
- •ACKNOWLEDGMENT
- •REFERENCES
- •Paul D. Lyne
- •Owen A. Walsh
- •II. BACKGROUND
- •III. APPLICATIONS
- •A. Triosephosphate Isomerase
- •B. Bovine Protein Tyrosine Phosphate
- •C. Citrate Synthase
- •IV. CONCLUSIONS
- •ACKNOWLEDGMENT
- •REFERENCES
- •Jeremy C. Smith
- •III. SCATTERING BY CRYSTALS
- •IV. NEUTRON SCATTERING
- •A. Coherent Inelastic Neutron Scattering
- •B. Incoherent Neutron Scattering
- •REFERENCES
- •Michael Nilges
- •II. EXPERIMENTAL DATA
- •A. Deriving Conformational Restraints from NMR Data
- •B. Distance Restraints
- •C. The Hybrid Energy Approach
- •III. MINIMIZATION PROCEDURES
- •A. Metric Matrix Distance Geometry
- •B. Molecular Dynamics Simulated Annealing
- •C. Folding Random Structures by Simulated Annealing
- •IV. AUTOMATED INTERPRETATION OF NOE SPECTRA
- •B. Automated Assignment of Ambiguities in the NOE Data
- •C. Iterative Explicit NOE Assignment
- •D. Symmetrical Oligomers
- •VI. INFLUENCE OF INTERNAL DYNAMICS ON THE EXPERIMENTAL DATA
- •VII. STRUCTURE QUALITY AND ENERGY PARAMETERS
- •VIII. RECENT APPLICATIONS
- •REFERENCES
- •II. STEPS IN COMPARATIVE MODELING
- •C. Model Building
- •D. Loop Modeling
- •E. Side Chain Modeling
- •III. AB INITIO PROTEIN STRUCTURE MODELING METHODS
- •IV. ERRORS IN COMPARATIVE MODELS
- •VI. APPLICATIONS OF COMPARATIVE MODELING
- •VII. COMPARATIVE MODELING IN STRUCTURAL GENOMICS
- •VIII. CONCLUSION
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Roland L. Dunbrack, Jr.
- •II. BAYESIAN STATISTICS
- •A. Bayesian Probability Theory
- •B. Bayesian Parameter Estimation
- •C. Frequentist Probability Theory
- •D. Bayesian Methods Are Superior to Frequentist Methods
- •F. Simulation via Markov Chain Monte Carlo Methods
- •III. APPLICATIONS IN MOLECULAR BIOLOGY
- •B. Bayesian Sequence Alignment
- •IV. APPLICATIONS IN STRUCTURAL BIOLOGY
- •A. Secondary Structure and Surface Accessibility
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Computer Aided Drug Design
- •Alexander Tropsha and Weifan Zheng
- •IV. SUMMARY AND CONCLUSIONS
- •REFERENCES
- •Oren M. Becker
- •II. SIMPLE MODELS
- •III. LATTICE MODELS
- •B. Mapping Atomistic Energy Landscapes
- •C. Mapping Atomistic Free Energy Landscapes
- •VI. SUMMARY
- •REFERENCES
- •Toshiko Ichiye
- •II. ELECTRON TRANSFER PROPERTIES
- •B. Potential Energy Parameters
- •IV. REDOX POTENTIALS
- •A. Calculation of the Energy Change of the Redox Site
- •B. Calculation of the Energy Changes of the Protein
- •B. Calculation of Differences in the Energy Change of the Protein
- •VI. ELECTRON TRANSFER RATES
- •A. Theory
- •B. Application
- •REFERENCES
- •Fumio Hirata and Hirofumi Sato
- •Shigeki Kato
- •A. Continuum Model
- •B. Simulations
- •C. Reference Interaction Site Model
- •A. Molecular Polarization in Neat Water*
- •B. Autoionization of Water*
- •C. Solvatochromism*
- •F. Tautomerization in Formamide*
- •IV. SUMMARY AND PROSPECTS
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Nucleic Acid Simulations
- •Alexander D. MacKerell, Jr.
- •Lennart Nilsson
- •D. DNA Phase Transitions
- •III. METHODOLOGICAL CONSIDERATIONS
- •A. Atomistic Models
- •B. Alternative Models
- •IV. PRACTICAL CONSIDERATIONS
- •A. Starting Structures
- •C. Production MD Simulation
- •D. Convergence of MD Simulations
- •WEB SITES OF INTEREST
- •REFERENCES
- •Membrane Simulations
- •Douglas J. Tobias
- •II. MOLECULAR DYNAMICS SIMULATIONS OF MEMBRANES
- •B. Force Fields
- •C. Ensembles
- •D. Time Scales
- •III. LIPID BILAYER STRUCTURE
- •A. Overall Bilayer Structure
- •C. Solvation of the Lipid Polar Groups
- •IV. MOLECULAR DYNAMICS IN MEMBRANES
- •A. Overview of Dynamic Processes in Membranes
- •B. Qualitative Picture on the 100 ps Time Scale
- •C. Incoherent Neutron Scattering Measurements of Lipid Dynamics
- •F. Hydrocarbon Chain Dynamics
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Appendix: Useful Internet Resources
- •B. Molecular Modeling and Simulation Packages
- •Index
Modeling in NMR Structure Determination |
255 |
inary structure (see, e.g., Ref. 19). Alternatively, it can be treated like an ambiguous NOE (see below) [20].
B. Distance Restraints
In an isolated two-spin system, the NOE (or, more accurately, the slope of its buildup) depends simply on $d^{-6}$, where $d$ is the distance between the two protons. The difficulties in the interpretation of the NOE originate in deviations from this simple distance dependence of the NOE buildup (due to spin diffusion caused by other nearby protons, and internal dynamics) and from possible ambiguities in its assignment to a specific proton pair. Molecular modeling methods to deal with these difficulties are discussed further below.
Usually, simplified representations of the data are used to obtain preliminary structures. Thus, lower and upper bounds on the interproton distances are estimated from the NOE intensity [10], using appropriate reference distances for calibration. The bounds should include the estimates of the cumulative error due to all sources such as peak integration errors, spin diffusion, and internal dynamics.
The dispersion of proton chemical shifts is usually incomplete in the one-dimensional spectrum of a macromolecule, resulting in many degenerate resonances. As a result, few NOEs can be assigned on the basis of resonance assignments alone, without any knowledge of the structure of the molecule [21,22]. Unless ambiguities can be resolved by using additional information, such as the peak shape or data from heteronuclear experiments, the remaining NOEs are ambiguous and cannot be converted into restraints on distances between proton pairs.
Nevertheless, the information from ambiguous NOEs can be converted directly into structural restraints. The structure calculation or refinement with ambiguous data can proceed in a way directly analogous to refinement with standard distance restraints, restraining a “$d^{-6}$-summed distance” $D$ by means of a distance target function (see Fig. 2). By analogy with standard unambiguous distance restraints between atom pairs, we call these “ambiguous distance restraints” (ADRs) [21]. Similar methods can be applied to ambiguities in other experimental data, such as hydrogen bonds [23,24], disulfide bridges [21], and paramagnetic shift broadening and chemical shift differences [25,26].
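As a minimal sketch of the summed distance, assuming the form $D = (\sum d^{-6})^{-1/6}$ given in the caption of Fig. 2, the following Python function combines the distances of all proton pairs that could contribute to one cross-peak. The function name `adr_distance` and the example distances are illustrative assumptions, not part of any particular program.

```python
def adr_distance(pair_distances):
    """Return D = (sum of d^-6)^(-1/6) over all candidate proton pairs.

    pair_distances: distances (in angstroms) of every proton pair that
    could explain the cross-peak, given the chemical shift degeneracy.
    """
    total = sum(d ** -6.0 for d in pair_distances)
    return total ** (-1.0 / 6.0)

# Hypothetical cross-peak with two assignment possibilities.
D = adr_distance([3.0, 6.0])
print(D)  # slightly shorter than the shortest contributing distance
```

Because of the $d^{-6}$ weighting, $D$ is dominated by the shortest contributing distance, so a restraint on $D$ is satisfied as soon as any one of the candidate pairs is close enough.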
The distances in the structure are restrained to the upper and lower bounds derived from NOEs by “flat-bottom” potentials. The potential should be gradient-bounded and have an asymptotic region for large violations that is linear [27–29] (see Fig. 2). Then, for large restraint violations, the force approaches a maximum value or can even decrease, depending on the parameters. This makes the optimization numerically more stable and seems to improve convergence by transiently allowing larger violations during the calculation, thus allowing the structure to gradually escape deep local minima.
The limitation of the gradient of the potential is particularly important for calculations with ADRs and for data sets that potentially contain noise peaks, since it facilitates the appearance of violations due to incorrect restraints. A standard harmonic potential would put a high penalty on large violations and would introduce larger distortions into the structure.
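The behavior of such a potential can be sketched in a few lines of Python. The text and the caption of Fig. 2 specify only that the potential is zero between $L$ and $U$, harmonic up to $U + \sigma$, and continuous and differentiable there with asymptotic slope $\beta$; the specific asymptotic form $\alpha + \beta\Delta + \gamma/\Delta$ (with $\Delta = D - U$), the harmonic wall below $L$, and all numerical values are assumptions made for this illustration.

```python
def noe_restraint_energy(D, L, U, k_noe, sigma, beta):
    """Flat-bottom restraint energy with a gradient-bounded upper wall."""
    if D < L:
        return k_noe * (L - D) ** 2          # assumed harmonic lower wall
    if D <= U:
        return 0.0                           # flat bottom: no energy, no force
    delta = D - U
    if delta <= sigma:
        return k_noe * delta ** 2            # harmonic for small violations
    # Coefficients fixed by continuity of value and slope at delta = sigma:
    #   alpha + beta*sigma + gamma/sigma = sigma**2
    #   beta - gamma/sigma**2            = 2*sigma
    gamma = sigma ** 2 * (beta - 2.0 * sigma)
    alpha = 3.0 * sigma ** 2 - 2.0 * beta * sigma
    return k_noe * (alpha + beta * delta + gamma / delta)

# Hypothetical restraint with bounds 2-5 angstroms.
for D in (4.0, 5.5, 6.0, 15.0):
    print(D, noe_restraint_energy(D, 2.0, 5.0, k_noe=1.0, sigma=1.0, beta=1.0))
```

With these assumed parameters, a 10 Å violation costs far less than the 100 energy units a purely harmonic potential would assign, which is the property that lets incorrect restraints show up as violations instead of distorting the structure.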
C. The Hybrid Energy Approach
Even if the set of data from NMR experiments is as complete as possible, it is insufficient to define the positions of all the atoms in the molecule, simply because most of the data are measured for protons only. The positions of the other atoms have to be inferred, using values of bond lengths, bond angles, planarity, and van der Waals radii that are known a priori.
Figure 2 Use of unambiguous or ambiguous distance restraints in an optimization calculation. (a) The distance $D$ that is restrained can be a distance measured between two protons in the molecule or a “$(\sum d^{-6})^{-1/6}$ summed distance” with contributions from many proton pairs, where the sum runs over all contributions to a cross-peak that are possible due to chemical shift degeneracy. The question marks indicate ambiguities in the assignment of the NOE. For clarity, a situation with only two assignment possibilities is shown. There can be many more possibilities with experimental data. (b) The restraining potential is gradient bounded to avoid large forces for large violations. $k_{\mathrm{NOE}}$ is the energy constant, and $U$ and $L$ are upper and lower bounds derived from the size of the NOE. The parameter $\sigma$ determines the distance at which the potential switches from harmonic to asymptotic behavior, $\beta$ is the asymptotic slope of the potential, and the coefficients $\alpha$ and $\gamma$ are determined such that the potential is continuous and differentiable at $U + \sigma$. If $D$ is between $L$ and $U$, the energy and gradient are zero.
A molecular dynamics force field is a convenient compilation of these data (see Chapter 2). The data may be used in a much simplified form (e.g., in the case of metric matrix distance geometry, all data are converted into lower and upper bounds on interatomic distances, which all have the same weight). Similar to the use of energy parameters in X-ray crystallography, the parameters need not reflect the dynamic behavior of the molecule. The force constants are chosen to avoid distortions of the molecule when experimental restraints are applied. Thus, the force constants on bond angle and planarity are a factor of 10–100 higher than in standard molecular dynamics force fields. Likewise, a detailed description of electrostatic and van der Waals interactions is not necessary and may not even be beneficial in calculating NMR structures.
The problem of finding conformations of the molecule that satisfy the experimental data is then that of finding conformations that minimize a hybrid energy function $E_{\mathrm{hybrid}}$, which contains different contributions from experimental data and the force field (see below). These contributions need to be properly weighted with respect to each other. However, if the chosen experimental upper and lower bounds are wide enough to avoid any geometrical inconsistencies between the force field and the data, this relative weight does not play a predominant role.
III. MINIMIZATION PROCEDURES
Finding the minimum of the hybrid energy function is very complex. Similar to the protein folding problem, the number of degrees of freedom is far too large to allow a complete systematic search in all variables. Systematic search methods need to reduce the problem to a few degrees of freedom (see, e.g., Ref. 30). Conformations of the molecule that satisfy the experimental bounds are therefore usually calculated with metric matrix distance geometry methods followed by optimization or by optimization methods alone.
Minimization is often not powerful enough for structure calculations of macromolecules unless it is used with an elaborate protocol (e.g., the “buildup method” [3]). More powerful approaches are based on global optimization of the hybrid energy function by molecular dynamics based simulated annealing [31–33]. Other optimization methods have been suggested for NMR structure calculation, notably Monte Carlo simulated annealing [34] and genetic algorithms [35]. Branch-and-bound algorithms have also been suggested for docking rigid monomers with ambiguous restraints [36] or with very sparse data sets [37]. An important feature of the latter is the addition of a hydrophobic potential [38] to the hybrid energy function, which serves to pack secondary structure elements.
Because the observable-to-parameter ratio is rather low, structures are calculated repeatedly with the same restraints. The aim is a random sampling of the conformational space consistent with the restraints. In metric matrix distance geometry, randomness is achieved by the random selection of distance estimates within the bounds. In optimization calculations, one achieves random searching by either selecting a starting conformation very far from the folded structure (e.g., an extended strand [4]) or by choosing starting conformations that are random (either in torsion angles or in Cartesian coordinates) [3,39].
A. Metric Matrix Distance Geometry
A distance geometry calculation consists of two major parts. In the first, the distances are checked for consistency, using a set of inequalities that distances have to satisfy (this part is called “bound smoothing”); in the second, distances are chosen randomly within these bounds, and the so-called metric matrix $(M_{ij})$ is calculated. “Embedding” then converts this matrix to three-dimensional coordinates, using methods akin to principal component analysis [40].
There are many extensive reviews on metric matrix distance geometry [41–44], some of which provide illustrative examples [45,46]. In total, we can distinguish five steps in a distance geometry calculation:
1. Bound smoothing
2. Distance selection and metrization
3. Construction of the metric matrix
4. Embedding
5. Refinement (optimization)
Bound smoothing serves two purposes: to check consistency of the distances and to transfer information between atoms. Distances have to satisfy the triangle inequalities in a metric space of any dimension (the sum of two sides of a triangle has to be larger than the third; see Fig. 3). To ensure consistency of the distances in three-dimensional space, more inequalities would be necessary (the triangle, tetrangle, pentangle, and hexangle inequalities) [41,47]. Only the tetrangle inequality is of practical use, and it is usually not employed because of high computational costs. This inequality transfers information from one diagonal of a tetrangle to the other; in two dimensions this is the parallelogram equation |a b| |c d|.
The most important consequence of bound smoothing is the transfer of information from those atoms for which NMR data are available to those that cannot be observed directly in NMR experiments. Within the original experimental bounds, the minimal distance intervals are identified for which all $N^3$ triangle inequalities can be satisfied. A distance chosen outside these intervals would violate at least one triangle inequality. For example, an NOE between protons $p_i$ and $p_j$ and the covalent bond between $p_j$ and carbon $C_j$ imposes upper and lower bounds on the distance between $p_i$ and $C_j$, although this distance is not observable experimentally nor is it part of $E_{\mathrm{chem}}$.
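The transfer of information described in this example can be illustrated with a small triangle-inequality smoothing routine. This is a sketch only: the function name `smooth_bounds`, the repeated-sweep strategy, and the numerical bounds (an NOE of 1.8–5.0 Å and a C–H bond of 1.09 Å) are assumptions chosen for illustration, and the tetrangle inequality is not used.

```python
import numpy as np

def smooth_bounds(lower, upper, max_sweeps=50):
    """Tighten symmetric bound matrices with the triangle inequalities."""
    L = np.array(lower, dtype=float)
    U = np.array(upper, dtype=float)
    n = len(U)
    for _ in range(max_sweeps):
        L_old, U_old = L.copy(), U.copy()
        for k in range(n):
            # Upper bounds: d_ij <= d_ik + d_kj
            U = np.minimum(U, U[:, [k]] + U[[k], :])
            # Lower bounds: d_ij >= d_ik - d_kj and d_ij >= d_kj - d_ik
            L = np.maximum(L, L[:, [k]] - U[[k], :])
            L = np.maximum(L, L[[k], :] - U[:, [k]])
        if np.array_equal(L, L_old) and np.array_equal(U, U_old):
            break  # nothing changed in a full sweep
    return L, U

# Atoms: 0 = proton p_i, 1 = proton p_j, 2 = carbon C_j (hypothetical values).
big = 999.0
lower = [[0.0, 1.8, 0.0], [1.8, 0.0, 1.09], [0.0, 1.09, 0.0]]
upper = [[0.0, 5.0, big], [5.0, 0.0, 1.09], [big, 1.09, 0.0]]
L, U = smooth_bounds(lower, upper)
print(L[0, 2], U[0, 2])  # bounds on the unobserved p_i to C_j distance
```

The unobserved pair starts with no useful bounds and ends with an upper bound equal to the NOE upper bound plus the bond length, and a lower bound equal to the NOE lower bound minus the bond length.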
The second step concerns distance selection and metrization. Bound smoothing only reduces the possible intervals for interatomic distances from the original bounds. However, the embedding algorithm demands a specific distance for every atom pair in the molecule. These distances are chosen randomly within the interval, from either a uniform or an estimated distribution [48,49], to generate a trial distance matrix. Uniform distance distributions seem to provide better sampling for very sparse data sets [48].
Note that although the bounds on the distances satisfy the triangle inequalities, particular choices of distances between these bounds will in general violate them. Therefore, if all distances are chosen within their bounds independently of each other (the method that is used in most applications of distance geometry for NMR structure determination), the final distance matrix will contain many violations of the triangle inequalities. The main consequence is a very limited sampling of the conformational space of the embedded structures for very sparse data sets [48,50,51] despite the intrinsic randomness of the technique. In spite of these limitations, the algorithm is remarkably stable in its simplest form.
Figure 3 Flow of a distance geometry calculation. On the left is shown the development of the data; on the right, the operations. $d_{ij}$ is the distance between atoms $i$ and $j$; $L_{ij}$ and $U_{ij}$ are lower and upper bounds on the distance; $L'_{ij}$ and $U'_{ij}$ are the smoothed bounds after application of the triangle inequality; $d_{i0}$ is the distance between atom $i$ and the geometric center; $N$ is the number of atoms; $(M_{ij})$ is the metric matrix; $\mathbf{r}_i$ is the positional vector of atom $i$; $\mathbf{e}_1$ is the first eigenvector of $(M_{ij})$ with eigenvalue $\lambda_1$; $x_i$, $y_i$, and $z_i$ are the x-, y-, and z-coordinates of atom $i$. (Steps 1–5 correspond to the numbered list of distance geometry steps above.)
Metrization guarantees that all distances satisfy the triangle inequalities by repeating a bound-smoothing step after each distance choice. The order of distance choice becomes important [48,49,51]; optimally, the distances are chosen in a completely random sequence [49]. Metrization is a very computer-intensive operation. Computer time can be saved by using a partially random sequence [43] and terminating the process after $4N$ distances [51] (a three-dimensional object is completely specified by $4N - 10$ distances).
Metrization leads to a much better sampling of conformational space and dramatically improves the local quality of the structures when few long-range connectivities are present [48,51]. The better sampling of space comes at a certain price: The embedded structures may show errors in the topology that are not seen without metrization [31,43]. This may be due to the enforced propagation of an error in a distance choice to many other distances through the triangle inequality.
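A minimal sketch of full metrization, assuming the simplest variant in which every distance is chosen in a completely random order and the bounds are re-smoothed after each choice, might look as follows. The function names, the uniform distance distribution, and the toy bounds are assumptions for illustration; the time-saving partial variants mentioned above are not implemented.

```python
import numpy as np

def triangle_smooth(L, U, max_sweeps=50):
    """Tighten the bound matrices in place using the triangle inequalities."""
    n = len(U)
    for _ in range(max_sweeps):
        L_old, U_old = L.copy(), U.copy()
        for k in range(n):
            np.minimum(U, U[:, [k]] + U[[k], :], out=U)
            np.maximum(L, L[:, [k]] - U[[k], :], out=L)
            np.maximum(L, L[[k], :] - U[:, [k]], out=L)
        if np.array_equal(L, L_old) and np.array_equal(U, U_old):
            break

def metrize(lower, upper, rng):
    """Pick every distance in random order, re-smoothing after each pick."""
    L = np.array(lower, dtype=float)
    U = np.array(upper, dtype=float)
    n = len(U)
    triangle_smooth(L, U)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for idx in rng.permutation(len(pairs)):
        i, j = pairs[idx]
        # Uniform choice inside the current (already narrowed) interval.
        d = L[i, j] + rng.random() * (U[i, j] - L[i, j])
        L[i, j] = L[j, i] = U[i, j] = U[j, i] = d
        triangle_smooth(L, U)   # propagate the choice to all other pairs
    return U                    # lower and upper bounds have collapsed

# Toy system: five atoms, every pair bounded between 1 and 10 angstroms.
n = 5
lower = np.full((n, n), 1.0); np.fill_diagonal(lower, 0.0)
upper = np.full((n, n), 10.0); np.fill_diagonal(upper, 0.0)
dist = metrize(lower, upper, np.random.default_rng(0))
print(np.round(dist, 2))
```

Each call to `triangle_smooth` costs on the order of $N^3$ operations and is repeated for every one of the $N(N-1)/2$ pairs, which illustrates why the text calls metrization very computer-intensive.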
The metric matrix is the matrix of all scalar products of position vectors of the atoms when the geometric center is placed in the origin. By application of the law of cosines, this matrix can be obtained from distance information only. Because it is invariant against rotation but not translation, the distances to the geometric center have to be calculated from the interatomic distances (see Fig. 3). The matrix allows the calculation of coordinates from distances in a single step, provided that all $N_{\mathrm{atom}}(N_{\mathrm{atom}} - 1)/2$ interatomic distances are known.
Embedding is the calculation of coordinates from the metric matrix by methods akin to principal component analysis [40,52]. The eigenvectors of the metric matrix contain the principal coordinates of the atoms. If the distances correspond to a three-dimensional object, only three eigenvalues of the matrix are nonzero (see Refs. 41 and 53 for mathematical proofs), and the first eigenvector contains all x-coordinates, the second all y-coordinates, and the third all z-coordinates. If the distances are not consistent with a three-dimensional object (the usual situation with sparse NMR data, when the majority of distances come from the random number generator), there will be more than three positive eigenvalues. The eigenvector expansion is then truncated after the first three eigenvalues; this corresponds to a projection of a higher dimensional object into three-dimensional space.
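The construction of the metric matrix and the embedding step can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `embed` is hypothetical, the distances to the geometric center are obtained from the standard law-of-cosines identity $d_{i0}^2 = \frac{1}{N}\sum_j d_{ij}^2 - \frac{1}{N^2}\sum_{j<k} d_{jk}^2$, and each retained eigenvector is scaled by the square root of its eigenvalue to give coordinates.

```python
import numpy as np

def embed(dist, dim=3):
    """Convert a complete distance matrix into coordinates of dimension dim."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = len(d2)
    # Squared distance of each atom to the geometric center.
    d0_sq = d2.mean(axis=1) - d2.sum() / (2.0 * n * n)
    # Metric matrix: scalar products of the centered position vectors.
    M = 0.5 * (d0_sq[:, None] + d0_sq[None, :] - d2)
    evals, evecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:dim]      # keep the dim largest
    lam = np.clip(evals[order], 0.0, None)     # guard against tiny negatives
    return evecs[:, order] * np.sqrt(lam)      # truncated eigenvector expansion

# Check on exact distances generated from random three-dimensional points.
rng = np.random.default_rng(0)
xyz = rng.normal(size=(8, 3))
dist = np.linalg.norm(xyz[:, None, :] - xyz[None, :, :], axis=-1)
new_xyz = embed(dist)
new_dist = np.linalg.norm(new_xyz[:, None, :] - new_xyz[None, :, :], axis=-1)
print(np.allclose(dist, new_dist))
```

For exact three-dimensional distances the embedded coordinates reproduce the input distances up to rotation and reflection; the reflection ambiguity is the reason the correct enantiomer has to be selected afterwards.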
Refinement of the embedded structures is always necessary to remove distortions in the structure. One shortcoming of the embedding algorithm is that data cannot be weighted according to their certainty in any way. During the projection, bond lengths are distorted in the same way as long-range distances guessed by the random number generator within possibly very wide bounds. Also, chirality information is completely absent during bound smoothing and embedding. The first step in the refinement is the selection of the correct enantiomer, which may be achieved on the basis of the chirality of Cα atoms [1], secondary structure elements, or partial refinement of both enantiomers and choice of the enantiomer with lower energy [51].
If the distances satisfy the triangle inequalities, they are embeddable in some dimension. One possible solution is therefore to try to start refinement in four dimensions and use the allowed deviation into the fourth dimension as an additional annealing parameter [43,54]. The advantages of refinement in higher dimensions are similar to those of soft atoms discussed below.
A time-saving variant of the distance geometry procedure described above is substructure embedding. Here, about a third of the atoms are chosen after the bound smoothing step and embedded. This procedure was originally used to improve the performance of the distance geometry algorithm by adding the distances from the embedded and partially refined structures back to the distance list [1]. The substructures can be refined directly with simulated annealing by filling in the missing atoms approximately in their correct positions [55].