- •Foreword
- •Preface
- •Contents
- •Introduction
- •Oren M. Becker
- •Alexander D. MacKerell, Jr.
- •Masakatsu Watanabe*
- •III. SCOPE OF THE BOOK
- •IV. TOWARD A NEW ERA
- •REFERENCES
- •Atomistic Models and Force Fields
- •Alexander D. MacKerell, Jr.
- •II. POTENTIAL ENERGY FUNCTIONS
- •D. Alternatives to the Potential Energy Function
- •III. EMPIRICAL FORCE FIELDS
- •A. From Potential Energy Functions to Force Fields
- •B. Overview of Available Force Fields
- •C. Free Energy Force Fields
- •D. Applicability of Force Fields
- •IV. DEVELOPMENT OF EMPIRICAL FORCE FIELDS
- •B. Optimization Procedures Used in Empirical Force Fields
- •D. Use of Quantum Mechanical Results as Target Data
- •VI. CONCLUSION
- •REFERENCES
- •Dynamics Methods
- •Oren M. Becker
- •Masakatsu Watanabe*
- •II. TYPES OF MOTIONS
- •IV. NEWTONIAN MOLECULAR DYNAMICS
- •A. Newton’s Equation of Motion
- •C. Molecular Dynamics: Computational Algorithms
- •A. Assigning Initial Values
- •B. Selecting the Integration Time Step
- •C. Stability of Integration
- •VI. ANALYSIS OF DYNAMIC TRAJECTORIES
- •B. Averages and Fluctuations
- •C. Correlation Functions
- •D. Potential of Mean Force
- •VII. OTHER MD SIMULATION APPROACHES
- •A. Stochastic Dynamics
- •B. Brownian Dynamics
- •VIII. ADVANCED SIMULATION TECHNIQUES
- •A. Constrained Dynamics
- •C. Other Approaches and Future Direction
- •REFERENCES
- •Conformational Analysis
- •Oren M. Becker
- •II. CONFORMATION SAMPLING
- •A. High Temperature Molecular Dynamics
- •B. Monte Carlo Simulations
- •C. Genetic Algorithms
- •D. Other Search Methods
- •III. CONFORMATION OPTIMIZATION
- •A. Minimization
- •B. Simulated Annealing
- •IV. CONFORMATIONAL ANALYSIS
- •A. Similarity Measures
- •B. Cluster Analysis
- •C. Principal Component Analysis
- •REFERENCES
- •Thomas A. Darden
- •II. CONTINUUM BOUNDARY CONDITIONS
- •III. FINITE BOUNDARY CONDITIONS
- •IV. PERIODIC BOUNDARY CONDITIONS
- •REFERENCES
- •Internal Coordinate Simulation Method
- •Alexey K. Mazur
- •II. INTERNAL AND CARTESIAN COORDINATES
- •III. PRINCIPLES OF MODELING WITH INTERNAL COORDINATES
- •B. Energy Gradients
- •IV. INTERNAL COORDINATE MOLECULAR DYNAMICS
- •A. Main Problems and Historical Perspective
- •B. Dynamics of Molecular Trees
- •C. Simulation of Flexible Rings
- •A. Time Step Limitations
- •B. Standard Geometry Versus Unconstrained Simulations
- •VI. CONCLUDING REMARKS
- •REFERENCES
- •Implicit Solvent Models
- •II. BASIC FORMULATION OF IMPLICIT SOLVENT
- •A. The Potential of Mean Force
- •III. DECOMPOSITION OF THE FREE ENERGY
- •A. Nonpolar Free Energy Contribution
- •B. Electrostatic Free Energy Contribution
- •IV. CLASSICAL CONTINUUM ELECTROSTATICS
- •A. The Poisson Equation for Macroscopic Media
- •B. Electrostatic Forces and Analytic Gradients
- •C. Treatment of Ionic Strength
- •A. Statistical Mechanical Integral Equations
- •VI. SUMMARY
- •REFERENCES
- •Steven Hayward
- •II. NORMAL MODE ANALYSIS IN CARTESIAN COORDINATE SPACE
- •B. Normal Mode Analysis in Dihedral Angle Space
- •C. Approximate Methods
- •IV. NORMAL MODE REFINEMENT
- •C. Validity of the Concept of a Normal Mode Important Subspace
- •A. The Solvent Effect
- •B. Anharmonicity and Normal Mode Analysis
- •VI. CONCLUSIONS
- •ACKNOWLEDGMENT
- •REFERENCES
- •Free Energy Calculations
- •Thomas Simonson
- •II. GENERAL BACKGROUND
- •A. Thermodynamic Cycles for Solvation and Binding
- •B. Thermodynamic Perturbation Theory
- •D. Other Thermodynamic Functions
- •E. Free Energy Component Analysis
- •III. STANDARD BINDING FREE ENERGIES
- •IV. CONFORMATIONAL FREE ENERGIES
- •A. Conformational Restraints or Umbrella Sampling
- •B. Weighted Histogram Analysis Method
- •C. Conformational Constraints
- •A. Dielectric Reaction Field Approaches
- •B. Lattice Summation Methods
- •VI. IMPROVING SAMPLING
- •A. Multisubstate Approaches
- •B. Umbrella Sampling
- •C. Moving Along
- •VII. PERSPECTIVES
- •REFERENCES
- •John E. Straub
- •B. Phenomenological Rate Equations
- •II. TRANSITION STATE THEORY
- •A. Building the TST Rate Constant
- •B. Some Details
- •C. Computing the TST Rate Constant
- •III. CORRECTIONS TO TRANSITION STATE THEORY
- •A. Computing Using the Reactive Flux Method
- •B. How Dynamic Recrossings Lower the Rate Constant
- •IV. FINDING GOOD REACTION COORDINATES
- •A. Variational Methods for Computing Reaction Paths
- •B. Choice of a Differential Cost Function
- •C. Diffusional Paths
- •VI. HOW TO CONSTRUCT A REACTION PATH
- •A. The Use of Constraints and Restraints
- •B. Variationally Optimizing the Cost Function
- •VII. FOCAL METHODS FOR REFINING TRANSITION STATES
- •VIII. HEURISTIC METHODS
- •IX. SUMMARY
- •ACKNOWLEDGMENT
- •REFERENCES
- •Paul D. Lyne
- •Owen A. Walsh
- •II. BACKGROUND
- •III. APPLICATIONS
- •A. Triosephosphate Isomerase
- •B. Bovine Protein Tyrosine Phosphate
- •C. Citrate Synthase
- •IV. CONCLUSIONS
- •ACKNOWLEDGMENT
- •REFERENCES
- •Jeremy C. Smith
- •III. SCATTERING BY CRYSTALS
- •IV. NEUTRON SCATTERING
- •A. Coherent Inelastic Neutron Scattering
- •B. Incoherent Neutron Scattering
- •REFERENCES
- •Michael Nilges
- •II. EXPERIMENTAL DATA
- •A. Deriving Conformational Restraints from NMR Data
- •B. Distance Restraints
- •C. The Hybrid Energy Approach
- •III. MINIMIZATION PROCEDURES
- •A. Metric Matrix Distance Geometry
- •B. Molecular Dynamics Simulated Annealing
- •C. Folding Random Structures by Simulated Annealing
- •IV. AUTOMATED INTERPRETATION OF NOE SPECTRA
- •B. Automated Assignment of Ambiguities in the NOE Data
- •C. Iterative Explicit NOE Assignment
- •D. Symmetrical Oligomers
- •VI. INFLUENCE OF INTERNAL DYNAMICS ON THE
- •EXPERIMENTAL DATA
- •VII. STRUCTURE QUALITY AND ENERGY PARAMETERS
- •VIII. RECENT APPLICATIONS
- •REFERENCES
- •II. STEPS IN COMPARATIVE MODELING
- •C. Model Building
- •D. Loop Modeling
- •E. Side Chain Modeling
- •III. AB INITIO PROTEIN STRUCTURE MODELING METHODS
- •IV. ERRORS IN COMPARATIVE MODELS
- •VI. APPLICATIONS OF COMPARATIVE MODELING
- •VII. COMPARATIVE MODELING IN STRUCTURAL GENOMICS
- •VIII. CONCLUSION
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Roland L. Dunbrack, Jr.
- •II. BAYESIAN STATISTICS
- •A. Bayesian Probability Theory
- •B. Bayesian Parameter Estimation
- •C. Frequentist Probability Theory
- •D. Bayesian Methods Are Superior to Frequentist Methods
- •F. Simulation via Markov Chain Monte Carlo Methods
- •III. APPLICATIONS IN MOLECULAR BIOLOGY
- •B. Bayesian Sequence Alignment
- •IV. APPLICATIONS IN STRUCTURAL BIOLOGY
- •A. Secondary Structure and Surface Accessibility
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Computer Aided Drug Design
- •Alexander Tropsha and Weifan Zheng
- •IV. SUMMARY AND CONCLUSIONS
- •REFERENCES
- •Oren M. Becker
- •II. SIMPLE MODELS
- •III. LATTICE MODELS
- •B. Mapping Atomistic Energy Landscapes
- •C. Mapping Atomistic Free Energy Landscapes
- •VI. SUMMARY
- •REFERENCES
- •Toshiko Ichiye
- •II. ELECTRON TRANSFER PROPERTIES
- •B. Potential Energy Parameters
- •IV. REDOX POTENTIALS
- •A. Calculation of the Energy Change of the Redox Site
- •B. Calculation of the Energy Changes of the Protein
- •B. Calculation of Differences in the Energy Change of the Protein
- •VI. ELECTRON TRANSFER RATES
- •A. Theory
- •B. Application
- •REFERENCES
- •Fumio Hirata and Hirofumi Sato
- •Shigeki Kato
- •A. Continuum Model
- •B. Simulations
- •C. Reference Interaction Site Model
- •A. Molecular Polarization in Neat Water*
- •B. Autoionization of Water*
- •C. Solvatochromism*
- •F. Tautomerization in Formamide*
- •IV. SUMMARY AND PROSPECTS
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Nucleic Acid Simulations
- •Alexander D. MacKerell, Jr.
- •Lennart Nilsson
- •D. DNA Phase Transitions
- •III. METHODOLOGICAL CONSIDERATIONS
- •A. Atomistic Models
- •B. Alternative Models
- •IV. PRACTICAL CONSIDERATIONS
- •A. Starting Structures
- •C. Production MD Simulation
- •D. Convergence of MD Simulations
- •WEB SITES OF INTEREST
- •REFERENCES
- •Membrane Simulations
- •Douglas J. Tobias
- •II. MOLECULAR DYNAMICS SIMULATIONS OF MEMBRANES
- •B. Force Fields
- •C. Ensembles
- •D. Time Scales
- •III. LIPID BILAYER STRUCTURE
- •A. Overall Bilayer Structure
- •C. Solvation of the Lipid Polar Groups
- •IV. MOLECULAR DYNAMICS IN MEMBRANES
- •A. Overview of Dynamic Processes in Membranes
- •B. Qualitative Picture on the 100 ps Time Scale
- •C. Incoherent Neutron Scattering Measurements of Lipid Dynamics
- •F. Hydrocarbon Chain Dynamics
- •ACKNOWLEDGMENTS
- •REFERENCES
- •Appendix: Useful Internet Resources
- •B. Molecular Modeling and Simulation Packages
- •Index
Comparative Protein Structure Modeling |
and the segment-matching method of Levitt [74]. The accuracies of the methods were similar. They were able to predict correctly approximately 50% of χ1 angles and 35% of both χ1 and χ2 angles. In typical comparative modeling applications where the backbone is closer to the native structures (<2 Å RMSD), these numbers increase by approximately 20% [146].
III. AB INITIO PROTEIN STRUCTURE MODELING METHODS
This section briefly reviews prediction of the native structure of a protein from its sequence of amino acid residues alone. These methods can be contrasted with the threading methods for fold assignment (Section II.A) [39–47,147], which detect remote relationships between sequences and folds of known structure, and with the comparative modeling methods discussed in this review, which build a complete all-atom 3D model based on a related known structure. The methods for ab initio prediction include those that focus on the broad physical principles of the folding process [148–152] and those that focus on predicting the actual native structures of specific proteins [44,153,154,240]. The former frequently rely on extremely simplified generic models of proteins, generally do not aim to predict native structures of specific proteins, and are not reviewed here.
Although comparative modeling is the most accurate modeling approach, it is limited by its absolute need for a related template structure. For more than half of the proteins and two-thirds of domains, a suitable template structure cannot be detected or is not yet known [9,11]. In those cases where no useful template is available, the ab initio methods are the only alternative. These methods are currently limited to small proteins and at best result only in coarse models with an RMSD error for the Cα atoms that is greater than 4 Å. However, one of the most impressive recent improvements in the field of protein structure modeling has occurred in ab initio prediction [155–157].
Ab initio prediction relies on the thermodynamic hypothesis of protein folding [158]. The thermodynamic hypothesis suggests that the native structure of a protein sequence corresponds to its global free energy minimum state. Accordingly, ab initio prediction methods are generally formulated as optimizations. As such, they can be distinguished by the representation of a protein and its degrees of freedom, the function that defines the energy for each of the allowed conformations, and the optimization method that attempts to find the global minimum on a given energy surface.
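The three ingredients named above (a representation, an energy function, and an optimization method) can be illustrated with a deliberately tiny example. The following is a minimal sketch, not any of the cited methods: it assumes a two-letter hydrophobic/polar (HP) chain on a 2D square lattice, a hypothetical contact energy of -1 per non-bonded H–H neighbor pair, and exhaustive enumeration as the "optimizer". All function names are illustrative.

```python
from itertools import product

# Representation: one interaction center per residue on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(moves):
    """Convert a move string into lattice positions; None if the chain overlaps itself."""
    positions = [(0, 0)]
    for m in moves:
        dx, dy = MOVES[m]
        nxt = (positions[-1][0] + dx, positions[-1][1] + dy)
        if nxt in positions:
            return None
        positions.append(nxt)
    return positions


def energy(seq, positions):
    """Toy energy (an assumption): -1 for each pair of H residues in lattice contact
    that are not adjacent along the chain."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == "H" and seq[j] == "H":
                dist = abs(positions[i][0] - positions[j][0]) + abs(positions[i][1] - positions[j][1])
                if dist == 1:
                    e -= 1
    return e


def fold_by_enumeration(seq):
    """Optimization by brute force: score every self-avoiding conformation, keep the lowest."""
    best_e, best_positions = None, None
    for moves in product("UDLR", repeat=len(seq) - 1):
        positions = walk(moves)
        if positions is None:
            continue
        e = energy(seq, positions)
        if best_e is None or e < best_e:
            best_e, best_positions = e, positions
    return best_e, best_positions
```

Enumeration is feasible only for very short chains, because the number of conformations grows exponentially with length; this is why practical methods replace the brute-force loop with a stochastic or heuristic search.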
Although the folding of short proteins has been simulated at the atomic level of detail [159,160], a simplified protein representation is often applied. Simplifications include using one or a few interaction centers per residue [161] as well as a lattice representation of a protein [162]. Some methods are hierarchical in that they begin with a simplified lattice representation and end with a detailed atomistic molecular dynamics simulation [163].
The energy functions for folding simulations include atom-based potentials from molecular mechanics packages [164] such as CHARMM [81], AMBER [165], and ECEPP [166], the statistical potentials of mean force derived from many known protein structures [167], and simplified potentials based on chemical intuition [168–171]. Some methods also incorporate non-physical spatial restraints obtained from multiple sequence alignments and other considerations to reduce the size of the conformational space that needs to be explored [172–176].
Fiser et al. |
Many different optimization methods [177,178]—even enumerations with some lattice models [171]—have been applied to the protein folding problem. These methods include molecular dynamics simulations [179,180], Monte Carlo sampling [173,181,182], the diffusion equation method [183], and genetic algorithm optimization [184–186]. A recent and particularly successful approach assembles the whole protein model from relatively short building blocks [187–189]. Many candidate blocks are obtained from known protein structures by relying on energetic, geometrical, and sequence similarity filters. The model of a whole protein is then assembled from such pieces by a Monte Carlo optimization of a statistical energy function [188].
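As an illustration of the Monte Carlo sampling mentioned above, the following is a minimal sketch of Metropolis sampling with simulated annealing over a set of torsion angles. It is not the fragment assembly procedure of [188]: the energy function `toy_energy`, the cooling schedule, and all parameter values are assumptions chosen only to make the example self-contained.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical rugged energy over torsion angles in radians (not a real force field)."""
    # Threefold torsion term with minima at the staggered positions.
    local = sum(1.0 + math.cos(3.0 * a) for a in angles)
    # Weak coupling that favors similar values for neighboring angles.
    coupling = sum(1.0 - math.cos(a - b) for a, b in zip(angles, angles[1:]))
    return local + 0.5 * coupling


def metropolis_anneal(n_angles=8, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Return (starting energy, best energy found, best angles)."""
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_angles)]
    e = toy_energy(angles)
    start_e = e
    best_e, best = e, list(angles)
    for step in range(steps):
        # Geometric cooling from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n_angles)
        old = angles[i]
        angles[i] = old + rng.gauss(0.0, 0.5)  # trial move: perturb one angle
        new_e = toy_energy(angles)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if new_e <= e or rng.random() < math.exp(-(new_e - e) / t):
            e = new_e
            if e < best_e:
                best_e, best = e, list(angles)
        else:
            angles[i] = old  # reject: restore the previous value
    return start_e, best_e, best
```

Accepting some uphill moves at high temperature lets the search escape local minima, while the gradual cooling concentrates sampling in low-energy regions; neither guarantees that the global minimum is found.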
There is scope for combining the comparative modeling and ab initio methods. The modeling of inserted loops in comparative prediction is based primarily on the sequence information alone. In addition, the alignment errors as well as large distortions of the target relative to the template require that such regions be modeled ab initio without relying on the template structure. It is likely that the ab initio approaches will help reduce some of the limitations of comparative modeling.
IV. ERRORS IN COMPARATIVE MODELS
The errors in comparative models can be divided into five categories [58] (Fig. 7):
1. Errors in side chain packing.
2. Distortions or shifts of a region that is aligned correctly with the template structures.
3. Distortions or shifts of a region that does not have an equivalent segment in any of the template structures.
4. Distortions or shifts of a region that is aligned incorrectly with the template structures.
5. A misfolded structure resulting from using an incorrect template.
Figure 7 Typical errors in comparative modeling. (a) Errors in side chain packing. The Trp 109 residue in the crystal structure of mouse cellular retinoic acid binding protein I (thin line) is compared with its model (thick line) and with the template mouse adipocyte lipid-binding protein (broken line). (b) Distortions and shifts in correctly aligned regions. A region in the crystal structure of mouse cellular retinoic acid binding protein I (thin line) is compared with its model (thick line), and with the template fatty acid binding protein (broken line). (c) Errors in regions without a template. The Cα trace of the 112–117 loop is shown for the X-ray structure of human eosinophil neurotoxin (thin line), its model (thick line), and the template ribonuclease A structure (residues 111–117; broken line). (d) Errors due to misalignments. The N-terminal region in the crystal structure of human eosinophil neurotoxin (thin line) is compared with its model (thick line). The corresponding region of the alignment with the template ribonuclease A is shown. The black lines show correct equivalences, that is, residues whose Cα atoms are within 5 Å of each other in the optimal least-squares superposition of the two X-ray structures. The “a” characters in the bottom line indicate helical residues. (e) Errors due to an incorrect template. The X-ray structure of α-trichosanthin (thin line) is compared with its model (thick line), which was calculated using indole-3-glycerophosphate synthase as the template. (From Ref. 146.)
Significant methodological improvements are needed to address all of these errors. Errors 3–5 are relatively infrequent when sequences with more than 40% identity to the templates are modeled. For example, in such a case, approximately 90% of the main chain atoms are likely to be modeled with an RMS error of about 1 Å. In this range of sequence similarity, the alignment is mostly straightforward to construct, there are not many gaps, and structural differences between the proteins are usually limited to loops and side chains. When sequence identity is between 30% and 40%, the structural differences become larger, and the gaps in the alignment are more frequent and longer. As a result, the main chain RMS error increases to about 1.5 Å for about 80% of the residues. The rest of the residues are modeled with large errors because the methods generally fail to model structural distortions and rigid-body shifts and are unable to recover from misalignments. Below 40% sequence identity, misalignments and insertions in the target sequence become the major problems. Insertions longer than about eight residues cannot yet be modeled accurately, but shorter loops can frequently be modeled successfully [92,119,239]. When sequence identity drops below 30%, the main problem becomes the identification of related templates and their alignment with the sequence to be modeled (Fig. 8). In general, it can be expected that about 20% of residues will be misaligned and consequently incorrectly modeled with an error greater than 3 Å at this level of sequence similarity [51]. This is a serious impediment for comparative modeling because it appears that at least one-half of all related protein pairs are related at less than 30% sequence identity [9,190].
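The rules of thumb in this section can be collected into a small lookup. The sketch below is illustrative only: the function names are hypothetical, the identity is computed over aligned (non-gap) positions, which is one of several conventions in use, and the returned descriptions simply restate the approximate figures quoted in the text.

```python
def percent_identity(target_aln, template_aln, gap="-"):
    """Percentage of identical residues among positions aligned without gaps.

    Assumption: the denominator is the number of gap-free alignment columns.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(target_aln, template_aln) if a != gap and b != gap]
    if not columns:
        return 0.0
    return 100.0 * sum(a == b for a, b in columns) / len(columns)


def expected_accuracy(identity):
    """Map target-template sequence identity (%) to the approximate error level from the text."""
    if identity > 40.0:
        return "about 90% of main chain atoms modeled with an RMS error of about 1 Å"
    if identity >= 30.0:
        return "about 80% of residues modeled with a main chain RMS error of about 1.5 Å"
    return "about 20% of residues likely misaligned, with errors greater than 3 Å"
```

For example, `expected_accuracy(percent_identity(target_row, template_row))` gives a first, coarse expectation for a model before any structure-based evaluation is attempted; the two argument names stand for the rows of a pairwise alignment.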
It has been pointed out that a comparative model is frequently more distant from the actual target structure than the closest template structure used to calculate the model [191]. However, at least for some modeling methods, this is the case only when there are errors in the template–target alignment used for modeling and when the correct structure-based template–target alignment is used for comparing the template with the actual target structure [58]. In contrast, the model is generally closer to the target structure than any of the templates if the modeling target–template alignment is used in evaluating the similarity between the actual target structure and the template [58]. As a result, using a model is generally better than using the template structure even when the alignment is incorrect, because the actual target structure, and therefore the correct template–target alignment, are not available in practical modeling applications.
Figure 8 Average model accuracy as a function of the percentage identity between the target and template sequences. (a) The models were calculated entirely automatically, based on single template structures. As the sequence identity between the target sequence and the template structure decreases, the average structural similarity between the template and the target also decreases (dashed line, triangles). Structure overlap is defined as the fraction of equivalent Cα atoms. For comparison of the model with the actual structure (continuous line, circles), two Cα atoms were considered equivalent if they were within 3.5 Å of each other and belonged to the same residue. For comparison of the template structure with the actual structure (dashed line, triangles), two Cα atoms were considered equivalent if they were within 3.5 Å of each other after alignment and rigid-body superposition by the ALIGN3D command in MODELLER. (b) Three models (solid line) compared with their corresponding experimental structures (dotted line). The models were calculated with MODELLER in a completely automated fashion before the experimental structures were available [146]. When multiple sequence and structure information is used and the alignments are edited by hand, the models can be significantly more accurate than shown in this plot [58].
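The structure overlap measure defined in the caption of Fig. 8 is simple to compute once two structures share a coordinate frame. The following is a minimal sketch for the model-versus-actual case, where equivalence is by residue; it assumes the two Cα coordinate lists are already superposed and in the same residue order, and it does not reproduce the ALIGN3D alignment step. Treating "within 3.5 Å" as less than or equal to the cutoff is also an assumption.

```python
import math


def structure_overlap(model_ca, target_ca, cutoff=3.5):
    """Fraction of residues whose Cα atoms lie within `cutoff` Å of each other.

    Both inputs are lists of (x, y, z) tuples for the same residues,
    already superposed (no fitting is done here).
    """
    if len(model_ca) != len(target_ca):
        raise ValueError("coordinate lists must have the same length")
    if not model_ca:
        return 0.0
    close = sum(1 for a, b in zip(model_ca, target_ca) if math.dist(a, b) <= cutoff)
    return close / len(model_ca)


def ca_rmsd(model_ca, target_ca):
    """Root-mean-square deviation over matched Cα atoms, without any superposition."""
    if len(model_ca) != len(target_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model_ca, target_ca))
    return math.sqrt(squared / len(model_ca))
```

The two measures respond differently to localized errors: a single badly placed loop can raise the RMSD substantially while lowering the overlap only in proportion to the number of residues involved.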
To put the errors in comparative models into perspective, we list the differences among structures of the same protein that have been determined experimentally (Fig. 9). The 1 Å accuracy of main chain atom positions corresponds to X-ray structures defined at a low resolution of about 2.5 Å and with an R-factor of about 25% [192], as well as to medium resolution NMR structures determined from 10 interproton distance restraints per residue [193]. Similarly, differences between the highly refined X-ray and NMR structures of the same protein also tend to be about 1 Å [193]. Changes in the environment (e.g., oligomeric state, crystal packing, solvent, ligands) can also have a significant effect on the structure [194]. Overall, comparative modeling based on templates with more than 40% identity is almost as good as medium resolution experimental structures, simply because the proteins at this level of similarity are likely to be as similar to each other as are the structures for the same protein determined by different experimental techniques under different conditions. However, the caveat in comparative protein modeling is that some regions, mainly loops and side chains, may have larger errors.
Figure 9 Relative accuracy of comparative models. Upper left panel, comparison of homologous structures that share 40% sequence identity. Upper right panel, conformations of ileal lipid-binding protein that satisfy the NMR restraints set equally well. Lower left panel, comparison of two independently determined X-ray structures of interleukin 1β. Lower right panel, comparison of the X-ray and NMR structures of erabutoxin. The figure was prepared using the program MOLSCRIPT [236].
A particularly informative way to test protein structure modeling methods, including comparative modeling, is provided by the biennial meetings on critical assessment of techniques for protein structure prediction (CASP) [191,195,196]. The most recent meeting was held in December 1998 [241]. Protein modelers are challenged to model sequences with unknown 3D structure and to submit their models to the organizers before the meeting. At the same time, the 3D structures of the prediction targets are being determined by X-ray crystallography or NMR methods. They become available only after the models are calculated and submitted. Thus, a bona fide evaluation of protein structure modeling methods is possible.
V. MODEL EVALUATION
Essential for interpreting 3D protein models is the estimation of their accuracy, both the overall accuracy and the accuracy in the individual regions of a model. The errors in models arise from two main sources, the failure of the conformational search to find the optimal conformation and the failure of the scoring function to identify the optimal conformation. The 3D models are generally evaluated by relying on geometrical preferences of the amino acid residues or atoms that are derived from known protein structures. Empirical relationships between model errors and target–template sequence differences can also be used. It is convenient to approach an evaluation of a given model in a hierarchical manner [9]. It first needs to be assessed if the model at least has the correct fold. The model will have a correct fold if the correct template is picked and if that template is aligned at least approximately correctly with the target sequence. Once the fold of a model is confirmed, a more detailed evaluation of the overall model accuracy can be performed based on the overall sequence similarity on which the model is based (Fig. 8). Finally, a variety of error profiles can be constructed to quantify the likely errors in the different regions of a model. A good strategy is to evaluate the models by using several different methods and identify the consensus between them. In addition, energy functions are in general designed to work at a certain level of detail and are not appropriate to judge the models at a finer or coarser level [197]. There are many model evaluation programs and servers [198,199] (Table 1).
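The strategy of evaluating a model with several methods and looking for their consensus can be sketched as a simple per-residue vote. This is a minimal illustration under stated assumptions, not the procedure of any program cited here: the method names, the convention that higher scores mean a more suspicious residue, and the thresholds are all hypothetical.

```python
def consensus_flags(profiles, thresholds, min_votes=2):
    """Flag residues that at least `min_votes` evaluation methods consider suspicious.

    profiles:   dict mapping a method name to a list of per-residue scores
                (assumed convention: higher score = more likely in error)
    thresholds: dict mapping the same method names to a score cutoff
    Returns a list of booleans, one per residue.
    """
    lengths = {len(scores) for scores in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all error profiles must cover the same residues")
    n_residues = lengths.pop()
    flags = []
    for i in range(n_residues):
        # Count how many methods score residue i above their own cutoff.
        votes = sum(1 for name, scores in profiles.items() if scores[i] > thresholds[name])
        flags.append(votes >= min_votes)
    return flags
```

Requiring agreement between methods trades sensitivity for specificity: a higher `min_votes` flags fewer residues but with more support for each flag.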
A basic requirement for a model is that it have good stereochemistry. The most useful programs for evaluating stereochemistry are PROCHECK [200], PROCHECK-NMR [201], AQUA [201], SQUID [202], and WHATCHECK [203]. The features of a model that are checked by these programs include bond lengths, bond angles, peptide bond and side chain ring planarities, chirality, main chain and side chain torsion angles, and clashes between non-bonded pairs of atoms. In addition to good stereochemistry, a model also has to have low energy according to a molecular mechanics force field, such as that of CHARMM22 [80]. However, low molecular mechanical energy does not ensure