Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf
MacKerell

application for which extensive experimental data are available. Tests of different force fields (and programs) can then be performed to see which best reproduces the experimental data for the model system and would therefore be the most appropriate for the application.

IV. DEVELOPMENT OF EMPIRICAL FORCE FIELDS

As emphasized by the word “empirical” to describe the force fields used for biomolecular computations, the development of these force fields is largely based on the methods and target data used to optimize the parameters in the force field. Decisions concerning these methods and target data are strongly dependent on the force field developer. To a large extent, even the selection of the form of the potential energy function itself is empirical, based on considerations of what terms are and are not required to obtain satisfactory results. Accordingly, the philosophy, or assumptions, used in the development of a force field will dictate both its applicability and its quality. A brief discussion of some of the philosophical considerations behind the most commonly used force fields follows.

A. Philosophical Considerations Behind Commonly Used Force Fields

Step 1 in the development of a force field is a decision concerning its applicability and transferability. The applicability issue was discussed in Section III.D and can be separated, on one level, into force fields for biological molecules and those for small molecules. Applicability also includes the use of explicit solvent representations (i.e., the solvent molecules themselves are included in the simulations), implicit solvent models [i.e., the solvent is included in a simplified, continuum-based fashion, the simplest being the use of a dielectric constant of 78 (for water) versus 1 (for vacuum)], or free energy based force fields. Transferability is concerned with the ability to take parameters optimized for a given set of target data and apply them to compounds not included in the target data. For example, dihedral parameters about a C–C single bond may be optimized with respect to the rotational energy surface of ethane. In a transferable force field those parameters would then be applied for calculations on butane. In a nontransferable force field, the parameters for the C–C–C–C and C–C–C–H dihedrals not in ethane would be optimized specifically by using target data on butane. Obviously, the definition of transferability is somewhat ambiguous, and the extent to which parameters can be transferred is associated with chemical similarity. However, because of the simplicity of empirical force fields, transferability must be treated with care.
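The ethane-to-butane example can be made concrete with a minimal sketch. It assumes the common cosine form for the dihedral term, K_chi * (1 + cos(n*chi - delta)), which matches the symbols used for Eq. (2) but is not reproduced in this section. All numerical values, the dictionary names, and the `assign` helper are hypothetical and serve only to contrast the two philosophies.

```python
import math

def dihedral_energy(chi_deg, k_chi, n, delta_deg):
    """Cosine dihedral term K_chi * (1 + cos(n*chi - delta)), in kcal/mol."""
    chi = math.radians(chi_deg)
    delta = math.radians(delta_deg)
    return k_chi * (1.0 + math.cos(n * chi - delta))

# Hypothetical values (K_chi in kcal/mol, multiplicity n, phase delta in degrees)
# standing in for parameters optimized against the ethane rotational surface.
ETHANE_FIT = {("H", "C", "C", "H"): (0.16, 3, 0.0)}

# Hypothetical values standing in for parameters optimized against butane data.
BUTANE_FIT = {
    ("C", "C", "C", "C"): (0.20, 3, 0.0),
    ("C", "C", "C", "H"): (0.18, 3, 0.0),
}

def assign(dihedral, transferable):
    """Pick parameters for one dihedral under either philosophy."""
    if dihedral in ETHANE_FIT:
        return ETHANE_FIT[dihedral]
    if transferable:
        # Reuse the ethane-derived values for any dihedral about a C-C bond.
        return ETHANE_FIT[("H", "C", "C", "H")]
    # Otherwise require values fitted specifically to the new compound.
    return BUTANE_FIT[dihedral]

for transferable in (True, False):
    k_chi, n, delta = assign(("C", "C", "C", "C"), transferable)
    # Eclipsed (0 degrees) minus staggered (60 degrees) for a threefold term.
    barrier = (dihedral_energy(0.0, k_chi, n, delta)
               - dihedral_energy(60.0, k_chi, n, delta))
    print(f"transferable={transferable}: K_chi={k_chi}, "
          f"single-term barrier={barrier:.2f} kcal/mol")
```

In the transferable case the C–C–C–C dihedral simply inherits the ethane-derived values; in the nontransferable case it requires its own entry, and therefore its own target data.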

Force fields for small molecules are generally considered transferable, the transferability being attained by the use of various cross terms in the potential energy function. Typically, a set of model compounds representing a type of functional group (e.g., azo compounds or bicarbamates) is selected. Parameters corresponding to the functional group are then optimized to reproduce the available target data for the selected model compounds. Those parameters are then transferred to new compounds that contain that functional group but for which unique chemical connectivities are present (see the ethane-to-butane example above). A recent comparison of several of the small-molecule force fields discussed above has shown this approach to yield reasonable results for conformational energies; however, in all cases examples exist of catastrophic failures [52]. Such failures emphasize the importance of user awareness when a force field is being applied to a novel chemical system. This awareness includes an understanding of the range of functional groups used in the optimization of the force field and the relationship of the novel chemical systems to those functional groups. The more dissimilar the novel compound and the compounds included in the target data, the less confidence the user should have in the obtained results. This is also true in the case of bifunctional compounds, where the physical properties of the first functional group could significantly change those of the second group and vice versa. In such cases it is recommended that some tests of the force field be performed via comparison with QM data (see below).

Atomistic Models and Force Fields

Of the biomolecular force fields, AMBER [21] is considered to be transferable, whereas academic CHARMM [20] is not transferable. Considering the simplistic form of the potential energy functions used in these force fields, the extent of transferability should be considered to be minimal, as has been shown recently [52]. As stated above, the user should perform suitable tests on any novel compounds to ensure that the force field is treating the systems of interest with sufficient accuracy.

Another important applicability decision is whether the force field will be used for gas-phase (i.e., vacuum) or condensed phase (e.g., in solution, in a membrane, or in the crystal environment) computations. Owing to a combination of limitations associated with available condensed phase data and computational resources, the majority of force fields prior to 1990 were designed for gas-phase calculations. With small-molecule force fields this resulted in relatively little emphasis being placed on the accurate treatment of the external interaction terms in the force fields. In the case of the biomolecular force fields designed to be used in vacuum via implicit treatment of the solvent environment, such as the CHARMM Param 19 [6,23] and AMBER force fields [22], care was taken in the optimization of charges to be consistent with the use of an R-dependent dielectric constant. The first concerted effort to rigorously model condensed phase properties was with the OPLS force field [53]. Those efforts were based on the explicit use of pure solvent and aqueous phase computations to calculate experimentally accessible thermodynamic properties. The external parameters were then optimized to maximize the agreement between the calculated and experimental thermodynamic properties. This very successful approach is the basis for the optimization procedures used in the majority of force fields currently being developed and used for condensed phase simulations.
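The difference between the dielectric treatments mentioned above can be illustrated with a short sketch. It assumes a simple Coulomb pair energy, an approximate conversion constant of 332.06 kcal·Å/(mol·e²), and the simplest R-dependent form, eps(r) = r; the function names and the example charges are hypothetical.

```python
COULOMB_CONSTANT = 332.06  # approximate, kcal*Angstrom/(mol*e^2)

def coulomb(q_i, q_j, r, dielectric=1.0):
    """Coulomb pair energy (kcal/mol) with a constant dielectric."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)

def coulomb_r_dependent(q_i, q_j, r):
    """Coulomb pair energy with the assumed dielectric eps(r) = r (r in Angstrom)."""
    return COULOMB_CONSTANT * q_i * q_j / (r * r)

# Hypothetical pair of opposite partial charges at increasing separation.
q_i, q_j = 0.5, -0.5
for r in (3.0, 6.0, 12.0):
    print(f"r={r:5.1f} A  vacuum={coulomb(q_i, q_j, r):8.3f}  "
          f"water(78)={coulomb(q_i, q_j, r, 78.0):8.3f}  "
          f"eps=r={coulomb_r_dependent(q_i, q_j, r):8.3f}")
```

With eps(r) = r the interaction falls off as 1/r², so short-range interactions stay close to their vacuum values while long-range interactions are damped. Charges optimized for one dielectric treatment are therefore not automatically appropriate for another.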

Although a number of additional philosophical considerations with respect to force fields could be discussed, they are instead covered where they arise in the presentation of parameter optimization methods in the remainder of this section. It is worth reemphasizing the empirical nature of force fields, which means that the developers have a significant impact on the quality of the resulting force field, even when exactly the same form of potential energy function is used. This is in large part due to the extensive nature of parameter space. Because of the large number of different individual parameters in a force field, an extensive amount of correlation exists between those parameters. Thus, a number of different combinations of parameters could reproduce a given set of target data. Although additional target data can partially overcome this problem, it cannot eliminate it, making the parameter optimization approach central to the ultimate quality of the force field. It should be emphasized that even though efforts have been made to automate parametrization procedures [54,55], a significant amount of manual intervention is generally required during parameter optimization.

B. Optimization Procedures Used in Empirical Force Fields

Knowledge of the approaches and target data used in the optimization of an empirical force field aids in the selection of the appropriate force field for a given study and acts as the basis for extending a force field to allow for its use with new compounds (see below). In this section some of the general considerations that are involved during the development of a force field are presented, followed by a more detailed description of the parameter optimization procedure.

Presented in Table 1 is a list of the parameters in Eqs. (2) and (3) and the type of target data used for their optimization. The information in Table 1 is separated into categories associated with those parameters. It should be noted that separation into the different categories represents a simplification; in practice there is extensive correlation between the different parameters, as discussed above; for example, changes in bond parameters that affect the geometry may also have an influence on ∆Gsolvation for a given model compound. These correlations require that parameter optimization protocols include iterative approaches, as will be discussed below.

Table 1 Types and Sources of Target Data Used in the Optimization of Empirical Force Field Parameters

| Term | Target data | Source |
|---|---|---|
| Internal | | |
| Equilibrium terms, multiplicity, and phase (b0, θ0, n, δ) | Geometries | QM, electron diffraction, microwave, crystal survey |
| Force constants (Kb, Kθ, Kχ) | Vibrational spectra | QM, IR, Raman |
| | Conformational properties | QM, IR, NMR, crystal survey |
| External | | |
| VDW terms (εi, Rmin,i) | Pure solvent properties [56] (ΔHvaporization, molecular volume) | Vapor pressure, calorimetry, densities |
| | Crystal properties (ΔHsublimation [56], lattice parameters, non-bond distances) | X-ray and neutron diffraction, vapor pressure, calorimetry |
| | Interaction energies (dimers, rare gas–model compound, water–model compound) | QM, microwave, mass spectrometry |
| Atomic charges (qi) | Dipole moments [57] | QM, dielectric permittivity, Stark effect, microwave |
| | Electrostatic potentials | QM |
| | Interaction energies (dimers, water–model compound) | QM, microwave, mass spectrometry |
| | Aqueous solution (ΔGsolvation, ΔHsolvation, partial molar volume [58]) | Calorimetry, volume variations |

QM, quantum mechanics; IR, infrared spectroscopy.

Internal parameters are generally optimized with respect to the geometries, vibrational spectra, and conformational energetics of selected model compounds. The equilibrium bond lengths and angles and the dihedral multiplicity and phase are often optimized to reproduce gas-phase geometric data such as those obtained from QM, electron diffraction, or microwave experiments. Such data, however, may have limitations when they are used in the optimization of parameters for condensed phase simulations. For example, it has been shown that the internal geometry of N-methylacetamide (NMA), a model for the peptide bond in proteins, is significantly influenced by the environment [59]. Therefore, a force field that is being developed for condensed phase simulations should be optimized to reproduce condensed phase geometries rather than gas-phase values [20]. This is necessary because the form of the potential energy function does not allow for subtle changes in geometries and other phenomena that occur upon going from the gas phase to the condensed phase to be reproduced by the force field. The use of geometric data from a survey of the Cambridge Structural Database (CSD) [60] can be useful in this regard. Geometries from individual crystal structures can be influenced by non-bond interactions in the crystal, especially when ions are present. Use of geometric data from a survey overcomes this limitation by averaging over a large number of crystal structures, yielding condensed phase geometric data that are not biased by interactions specific to a single crystal. Finally, QM calculations can be performed in the presence of water molecules or with a reaction field model to test whether condensed phase effects may have an influence on the obtained geometries [61].

Optimization of the internal force constants typically uses vibrational spectra and conformational energetics as the primary target data. Vibrational spectra, which comprise the individual frequencies and their assignments, dominate the optimization of the bond and angle force constants. It must be emphasized that both the frequencies and assignments should be accurately reproduced by the force field to ensure that the proper molecular distortions are associated with the correct frequencies. To attain this goal it is important to have proper assignments from the experimental data, often based on isotopic substitution. One way to supplement the assignment data is to use QM-calculated spectra from which detailed assignments in the form of potential energy distributions (PEDs) can be obtained [62]. Once the frequencies and their assignments are known, the force constants can be adjusted to reproduce these values. It should be noted that selected dihedral force constants will be optimized to reproduce conformational energetics, often at the expense of sacrificing the quality of the vibrational spectra. For example, with ethane it is necessary to overestimate the frequency of the C–C torsional rotation in order to accurately reproduce the barrier to rotation [63]. This discrepancy emphasizes the need to take into account barrier heights as well as the relative conformational energies of minima, especially in cases when the force field is to be used in MD simulation studies where there is a significant probability of sampling regions of conformational surfaces with relatively high energies. As discussed with respect to geometries, the environment can have a significant influence on both the vibrational spectra and the conformational energetics. Examples include the vibrational spectra of NMA [20] and the conformational energetics of dimethylphosphate [64], a model compound used for the parametrization of oligonucleotides. Increasing the size of the model compound used to generate the target data may also influence the final parameters. An example of this is the use of the alanine dipeptide to model the protein backbone versus a larger compound such as the alanine tetrapeptide [65].
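The trade-off between torsional frequency and barrier height can be seen from a single cosine dihedral term. The following is a sketch that assumes the form K_χ[1 + cos(nχ − δ)] with K_χ > 0 for the dihedral contribution (consistent with the symbols in Table 1, although Eq. (2) is not reproduced in this section) and ignores coupling to the other internal and non-bond terms.

```latex
\begin{align}
V(\chi) &= K_\chi \left[ 1 + \cos(n\chi - \delta) \right] \\
\Delta V_{\text{barrier}} &= V_{\max} - V_{\min} = 2K_\chi \\
\left. \frac{d^2 V}{d\chi^2} \right|_{\min}
  &= -n^2 K_\chi \cos(n\chi_{\min} - \delta) = n^2 K_\chi \\
\frac{\left( d^2 V / d\chi^2 \right)_{\min}}{\Delta V_{\text{barrier}}} &= \frac{n^2}{2}
\end{align}
```

Under these assumptions the barrier height and the curvature at the minimum, which sets the harmonic torsional frequency, are both proportional to the single parameter K_χ. For a fixed multiplicity n, choosing K_χ to match the barrier therefore fixes the curvature as well, so the two target quantities cannot be fitted independently with one term.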

Optimization of external parameters tends to be more difficult because less target data are available relative to the number of parameters to be optimized than is the case for the internal parameters, leaving the solution more underdetermined. This increases the problems associated with parameter correlation, thereby limiting the ability to apply automated parameter optimization algorithms. An example of the parameter correlation problem with van der Waals parameters is presented in Table 2, where pure solvent properties for ethane using three different sets of parameters are presented (AD MacKerell Jr, M Karplus, unpublished work). As may be seen, all three sets of LJ parameters presented in Table 2 yield heats of vaporization and molecular volumes in satisfactory agreement with the experimental data, in spite of the carbon Rmin varying by over 0.5 Å among the three sets. The presence of parameter correlation is evident. As the carbon Rmin increases and ε values decrease, the hydrogen Rmin decreases and ε values increase. Thus, it is clear that special care needs to be taken during the optimization of the non-bond parameters to maximize agreement with experimental data while minimizing parameter correlation. Such efforts will yield a force field that is of the highest accuracy based on the most physically reasonable parameters.
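The compensation between the carbon and hydrogen parameters in Table 2 can be examined with a small sketch. It assumes that the CHARMM combination rules take the arithmetic mean of the per-atom Rmin values and the geometric mean of the ε values, that the tabulated Rmin is the full per-atom value, and that the Lennard-Jones term has the form ε[(Rmin/r)^12 − 2(Rmin/r)^6]. The function and variable names are hypothetical.

```python
import math

# (Rmin in Angstrom, epsilon in kcal/mol) for carbon and hydrogen, from Table 2.
TABLE2_SETS = [
    ((3.60, 0.190), (3.02, 0.0085)),
    ((4.00, 0.080), (2.71, 0.0230)),
    ((4.12, 0.080), (2.64, 0.0220)),
]

def combine(atom_i, atom_j):
    """Assumed rule: arithmetic mean of Rmin, geometric mean of epsilon."""
    rmin = 0.5 * (atom_i[0] + atom_j[0])
    eps = math.sqrt(atom_i[1] * atom_j[1])
    return rmin, eps

def lj_energy(r, rmin, eps):
    """eps * [(Rmin/r)^12 - 2 (Rmin/r)^6]; the minimum is -eps at r = Rmin."""
    ratio6 = (rmin / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)

for carbon, hydrogen in TABLE2_SETS:
    rmin_ch, eps_ch = combine(carbon, hydrogen)
    print(f"C Rmin={carbon[0]:.2f}  H Rmin={hydrogen[0]:.2f}  ->  "
          f"C-H pair: Rmin={rmin_ch:.3f} A, eps={eps_ch:.4f} kcal/mol")
```

Under these assumed rules the mixed C–H pair parameters vary much less across the three sets (Rmin from about 3.31 to 3.38 Å) than the carbon Rmin itself (3.60 to 4.12 Å), which is one way to see how different parameter sets can give similar bulk properties.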

Van der Waals or Lennard-Jones contributions to empirical force fields are generally considered to be of less importance than the electrostatic term in contributing to the nonbond interactions in biological molecules. This view, however, is not totally warranted. Studies have shown significant contributions from the VDW term to heats of vaporization of polar-neutral compounds, including over 50% of the mean interaction energies in liquid NMA [67], as well as in crystals of nucleic acid bases, where the VDW energy contributed between 52% and 65% of the mean interaction energies [18]. Furthermore, recent studies on alkanes have shown that VDW parameters have a significant impact on their calculated free energies of solvation [29,63]. Thus, proper optimization of VDW parameters is essential to the quality of a force field for condensed phase simulations of biomolecules.

Table 2 Ethane Experimental and Calculated Pure Solvent Properties (a)

| Carbon Rmin/ε (b) | Hydrogen Rmin/ε (b) | Heat of vaporization (c) | Molecular volume (c) |
|---|---|---|---|
| 3.60/0.190 | 3.02/0.0085 | 3.50 | 90.7 |
| 4.00/0.080 | 2.71/0.0230 | 3.48 | 90.9 |
| 4.12/0.080 | 2.64/0.0220 | 3.49 | 91.8 |
| Experiment | | 3.56 | 91.5 |

(a) Calculations performed using MC BOSS [66] with the CHARMM combination rules. Partial atomic charges (C −0.27 and H 0.09) were identical for all three simulations.

(b) Lennard-Jones parameters are Rmin/ε in angstroms and kilocalories per mole, respectively.

(c) Heat of vaporization in kilocalories per mole and molecular volume in cubic angstroms at −89°C [56].

Significant progress in the optimization of VDW parameters was associated with the development of the OPLS force field [53]. In those efforts the approach of using Monte Carlo calculations on pure solvents to compute heats of vaporization and molecular volumes and then using that information to refine the VDW parameters was first developed and applied. Subsequently, developers of other force fields have used this same approach for optimization of biomolecular force fields [20,21]. Van der Waals parameters may also be optimized based on calculated heats of sublimation of crystals [68], as has been done for the optimization of some of the VDW parameters in the nucleic acid bases [18]. Alternative approaches to optimizing VDW parameters have been based primarily on the use of QM data. Quantum mechanical data contain detailed information on the electron distribution around a molecule, which, in principle, should be useful for the optimization of VDW parameters [12]. In practice, however, limitations in the ability of QM approaches to accurately treat dispersion interactions [69–71] make VDW parameters derived solely from QM data yield condensed phase properties in poor agreement with experiment [72,73]. Recent work has combined the reproduction of experimental properties with QM data to optimize VDW parameters while minimizing problems associated with parameter correlation. In that study QM data for helium and neon atoms interacting with alkanes were used to obtain the relative values of the VDW parameters while the reproduction of pure solvent properties was used to determine their absolute values, yielding good agreement for both pure solvent properties and free energies of aqueous solvation [63]. The reproduction of both experimental pure solvent properties and free energies of aqueous solvation has also been used to derive improved parameters [29]. From these studies it is evident that optimization of the VDW parameters is one of the most difficult aspects of force field optimization but also of significant importance for producing well-behaved force fields.

Development of models to treat electrostatic interactions between molecules represents one of the most central, and best studied, areas in force field development. For biological molecules, the computational limitations discussed above have led to the use of the Coulombic model included in Eq. (3). Despite its simplistic form, the volume of work done on the optimization of partial atomic charges, as well as the appropriate dielectric constant, has been huge. The present discussion is limited to currently applied approaches to the optimization of partial atomic charges. These approaches are all dominated by the reproduction of target data from QM calculations, although the target data can be supplemented with experimental data on interaction energies and orientations and molecular dipole moments when such data are available.

One method is based on optimizing partial atomic charges to reproduce the electrostatic potential (ESP) around a molecule determined via QM calculations. Programs are available to perform this operation [74,75], and some of these methodologies have been incorporated into the GAUSSIAN suite of programs [76]. A variation of the method, in which the charges on atoms with minimal solvent accessibility are restrained, termed RESP [77,78], has been developed and is the basis for the partial atomic charges used in the 1995 AMBER force field. The goal of the ESP approach is to produce partial atomic charges that reproduce the electrostatic field created by the molecule. The limitation of this approach is that the polarization effect associated with the condensed phase environment is not explicitly included, although the tendency for the HF/6-31G* QM level of theory to overestimate dipole moments has been suggested to account for this deficiency. In addition, experimental dipole moments can be included in the charge-fitting procedure. An alternative method, used in the OPLS, MMFF, and CHARMM force fields, is to base the partial atomic charges on the reproduction of minimum interaction energies and distances between small-molecule dimers and small molecule–water interacting pairs determined from QM calculations [6,53]. In this approach a series of small molecule–water (monohydrate) complexes are subjected to QM calculations for different idealized interactions. The resulting minimum interaction energies and geometries, along with available dipole moments, are then used as the target data for the optimization of the partial atomic charges. Application of this approach in combination with pure solvent and aqueous solvent simulations has yielded offsets and scale factors that allow for the production of charges that yield reasonable condensed phase properties [67,79]. Advantages of this method are that the use of the monohydrates in the QM calculations allows for local electronic polarization to occur at the different interacting sites, and the use of the scale factors accounts for the multibody electronic polarization contributions that are not included explicitly in Eq. (3).
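The ESP approach described above amounts to a constrained least-squares problem, which the following minimal sketch illustrates. It is not the algorithm of any particular program: it assumes an unrestrained fit (no RESP-style restraints), a Lagrange multiplier to fix the total charge, and potentials in units of e/Å. The geometry, the grid, and the reference charges are hypothetical, and the "target" potential is generated from those reference charges as a stand-in for QM data.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - s) / aug[row][row]
    return x

def potential(charges, atoms, point):
    """Point-charge electrostatic potential at `point`, in e/Angstrom."""
    return sum(q / math.dist(a, point) for q, a in zip(charges, atoms))

def fit_esp_charges(atoms, grid, target, total_charge=0.0):
    """Least-squares charges reproducing `target` on `grid`, with fixed total charge."""
    n = len(atoms)
    inv_r = [[1.0 / math.dist(a, p) for p in grid] for a in atoms]
    # Normal equations bordered by the total-charge constraint (Lagrange multiplier).
    system = [
        [sum(x * y for x, y in zip(inv_r[i], inv_r[j])) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    system.append([1.0] * n + [0.0])
    rhs = [sum(x * v for x, v in zip(inv_r[i], target)) for i in range(n)]
    rhs.append(total_charge)
    return solve_linear(system, rhs)[:n]

# Hypothetical water-like geometry (Angstrom) and reference charges.
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
reference = [-0.8, 0.4, 0.4]
# Hypothetical grid: 26 points on a cube surrounding the molecule.
grid = [(x, y, z) for x in (-3, 0, 3) for y in (-3, 0, 3) for z in (-3, 0, 3)
        if (x, y, z) != (0, 0, 0)]
target = [potential(reference, atoms, p) for p in grid]

fitted = fit_esp_charges(atoms, grid, target, total_charge=0.0)
print([round(q, 4) for q in fitted])
```

Because the synthetic target is generated from point charges, the fit recovers them essentially exactly. With real QM potentials the fit is approximate, and charges on atoms far from the grid points are poorly determined, which is the kind of problem the restraints in RESP are meant to address.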