Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.), Computational Biochemistry and Biophysics


Darden

Realistic models of proteins or other macromolecules in solution must include some description of the bulk solvent environment. Ideally this would be an infinite bath of water including the appropriate salt concentration. Unfortunately, current simulations are limited to $10^6$ atoms or fewer, which is not sufficient to model bulk behavior. Thus the connection to bulk solvation is implemented through boundary conditions. As shown above, the long-range nature of Coulombic interactions could lead to confusion even if we had ideal bulk solvent present in the simulation, and thus it is plausible that the choice of boundary conditions can have a non-negligible effect on the results of a simulation. A second goal of this chapter is to introduce the common choices of boundary conditions and to discuss what is currently known about the nature and size of artifacts due to long-range electrostatics under various boundary conditions.

II. CONTINUUM BOUNDARY CONDITIONS

In continuum boundary conditions the protein or other macromolecule is treated as a macroscopic body surrounded by a featureless continuum representing the solvent. The internal forces of the protein are described by using the standard force field including the Coulombic interactions in Eq. (6), whereas the forces due to the presence of the continuum solvent are described by solvation terms derived from macroscopic electrostatics and fluid dynamics.

Due to limitations in computer power, early protein and DNA simulations [10,11] used a particularly simple variant of this approach. The effect of the missing solvent is approximated by using an effective dielectric function: the electrostatic energy $U = q_1 q_2/r$ between two charges $q_1$ and $q_2$ is replaced by $U = q_1 q_2/[\varepsilon(r)\,r]$, and the electrostatic forces are obtained by differentiating the energy. The earliest implementation used the simple choice $\varepsilon(r) = r$, whereas later variants [12,13] employed a sigmoidal form in which $\varepsilon(r)$ approached 1 as $r \to 0$, whereas $\varepsilon(r) \approx 78$, the dielectric constant of bulk water, for $r \geq 20$ Å. The use of an effective dielectric causes interactions between distant charges to be screened heavily while neighboring charge pairs experience nearly the full Coulombic interaction, thus approximating the dielectric screening of charge interactions in water.
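As a minimal sketch of the effective-dielectric idea, the Python below evaluates $U = q_1 q_2/[\varepsilon(r)\,r]$ for the linear choice $\varepsilon(r) = r$ and for a sigmoid. The sigmoid (`eps_sigmoid`, with its `r_half` and `n` parameters) is a hypothetical Hill-type form chosen only because it rises from 1 at $r = 0$ to about 78 near 20 Å; it is not the function used in refs. [12,13]. The units (charges in e, distances in Å, energies in kcal/mol via the factor ≈ 332.06) are also an assumption of the sketch.

```python
COULOMB = 332.0637  # kcal*Å/(mol*e^2): converts q1*q2/r (e^2/Å) to kcal/mol


def eps_linear(r):
    """Earliest choice: distance-dependent dielectric eps(r) = r (r in Å)."""
    return r


def eps_sigmoid(r, eps_bulk=78.0, r_half=8.0, n=6):
    """Illustrative sigmoid with eps(0) = 1 and eps -> eps_bulk at large r.
    Hypothetical form and parameters, not the functions of refs. [12,13]."""
    x = (r / r_half) ** n
    return 1.0 + (eps_bulk - 1.0) * x / (1.0 + x)


def effective_coulomb(q1, q2, r, eps_fn):
    """U = q1*q2 / (eps(r) * r); charges in e, r in Å, U in kcal/mol."""
    return COULOMB * q1 * q2 / (eps_fn(r) * r)


# Screening grows with distance: near pairs keep most of the vacuum interaction.
for r in (3.0, 8.0, 20.0):
    vacuum = effective_coulomb(1, -1, r, lambda _: 1.0)
    ratio = effective_coulomb(1, -1, r, eps_sigmoid) / vacuum
    print(f"r = {r:4.1f} Å: eps = {eps_sigmoid(r):5.1f}, U/U_vacuum = {ratio:.3f}")
```

Forces would follow by differentiating $U$ with respect to $r$, including the derivative of $\varepsilon(r)$, as the text notes.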

Although the use of effective dielectrics could account approximately for the dielectric screening of charge pairs in water, it failed to account for the tendency of charged and polar residues to hydrogen bond with water molecules. This led to excessive hydrogen bonding between charged and polar groups on the surface of proteins. One important consequence was that the relative free energies between conformations of a peptide were found to be artifactually affected [14]. A more dramatic consequence was that this methodology together with existing force fields failed to distinguish between correctly and incorrectly folded proteins [15]. Eisenberg and McLachlan [16] and later Ooi et al. [17] proposed surface area based self-energy terms to model the tendency of a charged or polar group to be exposed to solvent. Later Still et al. [18] proposed the generalized Born method, a computationally tractable electrostatic model that simultaneously accounts for dielectric screening of charge pairs as well as for the self-energy of charged and polar groups in the presence of a dielectric. These developments are described in Chapter 7.
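To make the generalized Born idea concrete, here is a short sketch of the polarization energy in the functional form usually attributed to Still et al., $\Delta G = -\tfrac{1}{2}(1 - 1/\varepsilon)\sum_{i,j} q_i q_j / f_{GB}$ with $f_{GB} = \sqrt{r_{ij}^2 + R_i R_j \exp(-r_{ij}^2/4R_iR_j)}$. The Born radii $R_i$ are simply taken as inputs here (computing them is the hard part and is not shown), and the function names and units (e, Å, kcal/mol) are assumptions of the sketch.

```python
import math

COULOMB = 332.0637  # kcal*Å/(mol*e^2)


def f_gb(r, ri, rj):
    """Still-type interpolation between r (far apart) and the Born radius (r = 0)."""
    return math.sqrt(r * r + ri * rj * math.exp(-r * r / (4.0 * ri * rj)))


def gb_energy(charges, coords, born_radii, eps_solvent=78.5):
    """Generalized Born polarization energy in kcal/mol.
    The i == j terms are Born self-energies (solvent exposure of each charge);
    the i != j terms screen the pair interactions."""
    prefactor = -0.5 * COULOMB * (1.0 - 1.0 / eps_solvent)
    total = 0.0
    for i, (qi, xi) in enumerate(zip(charges, coords)):
        for j, (qj, xj) in enumerate(zip(charges, coords)):
            r = math.dist(xi, xj)
            total += qi * qj / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total


# Far apart, the pair term plus the vacuum Coulomb term gives q1*q2/(eps*r).
print(gb_energy([1.0, -1.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)], [2.0, 2.0]))
```

Both effects named in the text appear in one expression: the diagonal terms reward exposing charged groups to solvent, and the off-diagonal terms reproduce dielectric screening at large separation.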

Long-Range Forces and Potential

Within the continuum approximation, a rigorous approach to the electrostatic free energy of proteins and other biomolecules in solution is provided by the Poisson equation, Eq. (12). In this approach a protein in water is modeled as a low dielectric region carrying a fixed charge distribution and surrounded by a high dielectric region. The boundary between the two regions is defined by a molecular surface analogous to the solvent-accessible surface of the protein. The atomic radii used to define the boundary are similar to the initial Born radii used in the generalized Born approach and likewise are fit to reproduce experimental solvation energies. Although analytic solutions to Eq. (12) exist for certain special cases, in general the equation must be solved numerically. Because solutions to Eq. (12) are needed for the later discussion of the influence of boundary conditions in simulations, we briefly discuss this topic.
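The simplest of the special cases with an analytic solution is a single point charge at the center of a dielectric sphere, which gives the Born energy $\Delta G = \tfrac{q^2}{2a}\left(\tfrac{1}{\varepsilon_{\text{out}}} - \tfrac{1}{\varepsilon_{\text{in}}}\right)$. The sketch below evaluates it; the function name and the unit choice (e, Å, kcal/mol) are assumptions made for illustration.

```python
COULOMB = 332.0637  # kcal*Å/(mol*e^2)


def born_energy(q, radius, eps_out=78.5, eps_in=1.0):
    """Free energy (kcal/mol) of moving a spherical ion (charge q in e, radius in Å)
    from a medium eps_in into eps_out: the analytic Poisson solution for a point
    charge at the center of a dielectric sphere."""
    return 0.5 * COULOMB * q * q / radius * (1.0 / eps_out - 1.0 / eps_in)


# Note the strong dependence on the radius, which is why the boundary radii matter.
print(born_energy(1.0, 2.0), born_energy(1.0, 1.5), born_energy(2.0, 2.0))
```

Such closed forms are mainly useful as test cases for the numerical solvers described next.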

A variety of algorithms, including finite difference, boundary element, finite element, and multigrid, have been implemented for solving Eq. (12) for biomolecules. For example, the boundary element methods are based on the application of Gauss’ law to dielectric boundaries. The charges in a solute molecule induce a surface charge distribution at the dielectric boundary. From Gauss’ law one can show that the surface charge density σpol at the boundary between the solute and solvent is given by

$$
\sigma_{\mathrm{pol}}(\mathbf{r}) = -\frac{1}{4\pi}\left(1 - \frac{1}{\varepsilon}\right)\mathbf{E}(\mathbf{r})\cdot\mathbf{n}(\mathbf{r}) \tag{14}
$$

where $\mathbf{r}$ is a point on the boundary, $\mathbf{E}(\mathbf{r})$ the electric field at $\mathbf{r}$ evaluated on the solute side of the boundary, $\mathbf{n}(\mathbf{r})$ the unit normal to the surface there, pointing outward from solute into solvent, and $\varepsilon$ the solvent dielectric constant (the solute interior is taken to have dielectric constant 1). The electric field depends on the charges of the solute as well as on the other induced surface charge elements (including a self term). Thus this equation must be solved iteratively. The precision of the solution depends on the accuracy of the surface representation. The boundary element approach generalizes easily to more complex descriptions of the solute charge density such as polarizable dipoles or quantum chemical charge densities [19]. Also, because at each iteration the electric field due to a collection of point charges (fixed and induced) is needed, it is straightforward to adapt algorithms for rapid calculation of electrostatic sums to improve the performance of the boundary element method. For example, Bharadwaj et al. [20] combine the fast multipole method with their implementation of the boundary element method to arrive at an algorithm that should be optimal for large systems.
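The following sketch applies Eq. (14) to a spherical boundary discretized into surface points, using only the field of a single solute point charge. That is just the starting guess of the iterative solution: the field of the induced charges (including the self term) is deliberately omitted. The helper names, the Fibonacci-sphere discretization and the Gaussian units are assumptions of this sketch, not part of any published boundary element code.

```python
import math


def fibonacci_sphere(n, radius):
    """Quasi-uniform points on a sphere; each represents an area 4*pi*radius^2 / n."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        points.append((radius * rho * math.cos(golden * k),
                       radius * rho * math.sin(golden * k), radius * z))
    return points


def induced_charges_first_pass(q, pos, radius, eps, n=4000):
    """Eq. (14) on a sphere centered at the origin (solute dielectric 1, solvent eps),
    with E taken as the field of the point charge q at pos only (first iteration)."""
    area = 4.0 * math.pi * radius ** 2 / n
    prefactor = -(1.0 - 1.0 / eps) / (4.0 * math.pi)
    charges = []
    for s in fibonacci_sphere(n, radius):
        normal = [c / radius for c in s]  # outward unit normal
        dist3 = math.dist(s, pos) ** 3
        e_dot_n = q * sum((s[i] - pos[i]) * normal[i] for i in range(3)) / dist3
        charges.append(prefactor * e_dot_n * area)  # sigma_pol times element area
    return charges


# Gauss's law fixes the total induced charge at -(1 - 1/eps) q; this first pass already
# reproduces that total, and for a centered charge it is also the exact distribution.
print(sum(induced_charges_first_pass(1.0, (0.0, 0.0, 1.5), 3.0, 78.5)))
```

For an off-center charge, later iterations would add the field of the induced elements and redistribute the charge over the surface without changing its total.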

The finite difference approach has proved to be the most popular algorithm for solving the Poisson equation. In this approach the molecule is placed inside a cubic cell representing the solvent bath. The cube is then divided into a fine grid (the grid spacing usually must be 0.5 Å or finer for precision). The Poisson partial differential equation, Eq. (12), is discretized on the grid. The atomic charges are distributed over the neighboring grid points as a sampled charge density ρ, and the dielectric constant is interpolated near the dielectric boundary between solute and solvent. Finally, the values of the potential φ at the boundary of the containing cell must be assigned. These boundary values are usually not known a priori, but the boundary is assumed to be sufficiently distant that the details of potential assignment there do not affect the resulting potential at the molecule. One choice for a boundary potential is to use a sum of screened electrostatic potentials due to the solute charges. A modification is to use a succession of calculations, in which a first-pass calculation of the potential on the grid is performed using some estimate for the potential at the boundary of the cell. A second, smaller cell, still containing the molecule of interest, is then regridded. The potential obtained at the boundary of this smaller cell from the first-pass solution is taken as the boundary condition for a second-pass solution. This can be iterated to a third pass, and so on, but in practice the potential in the interior of the cell is found to converge rapidly. The discretization of the differential equation (12) results in a linear system of equations that must be solved iteratively. Efficient iterative schemes have been developed [21].
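The steps above can be sketched in a few dozen lines of Python: trilinear spreading of the charges onto the grid, a dielectric sampled midway between grid points, boundary values set to a Coulomb sum in the solvent dielectric, and successive over-relaxation (SOR) for the resulting linear system. This is a toy under stated assumptions (Gaussian units with φ in e/Å, a user-supplied function `eps_at`, charges kept away from the cell boundary, hypothetical function names), not one of the efficient schemes of ref. [21].

```python
import math


def spread_charges(n, h, charges):
    """Trilinear assignment of point charges (q, x, y, z) to nodes at (i*h, j*h, k*h)."""
    qgrid = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for q, x, y, z in charges:
        gx, gy, gz = x / h, y / h, z / h
        i0, j0, k0 = math.floor(gx), math.floor(gy), math.floor(gz)
        fx, fy, fz = gx - i0, gy - j0, gz - k0
        for di, wx in ((0, 1 - fx), (1, fx)):
            for dj, wy in ((0, 1 - fy), (1, fy)):
                for dk, wz in ((0, 1 - fz), (1, fz)):
                    qgrid[i0 + di][j0 + dj][k0 + dk] += q * wx * wy * wz
    return qgrid


def solve_poisson_fd(n, h, charges, eps_at, eps_boundary, sweeps=150, omega=1.7):
    """SOR solution of div(eps grad phi) = -4*pi*rho on an n^3 grid of spacing h.
    eps_at(x, y, z) is sampled midway between nodes; boundary nodes are fixed to the
    Coulomb potential of the charges in a medium of dielectric eps_boundary."""
    qg = spread_charges(n, h, charges)
    phi = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if min(i, j, k) == 0 or max(i, j, k) == n - 1:
                    p = (i * h, j * h, k * h)
                    phi[i][j][k] = sum(q / (eps_boundary * math.dist(p, (x, y, z)))
                                       for q, x, y, z in charges)
    rng = range(n)
    # Dielectric on the +x, +y and +z faces of each node.
    ex = [[[eps_at((i + .5) * h, j * h, k * h) for k in rng] for j in rng] for i in rng]
    ey = [[[eps_at(i * h, (j + .5) * h, k * h) for k in rng] for j in rng] for i in rng]
    ez = [[[eps_at(i * h, j * h, (k + .5) * h) for k in rng] for j in rng] for i in rng]
    for _ in range(sweeps):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                for k in range(1, n - 1):
                    e = (ex[i][j][k], ex[i - 1][j][k], ey[i][j][k],
                         ey[i][j - 1][k], ez[i][j][k], ez[i][j][k - 1])
                    s = (e[0] * phi[i + 1][j][k] + e[1] * phi[i - 1][j][k]
                         + e[2] * phi[i][j + 1][k] + e[3] * phi[i][j - 1][k]
                         + e[4] * phi[i][j][k + 1] + e[5] * phi[i][j][k - 1]
                         + 4.0 * math.pi * qg[i][j][k] / h)
                    phi[i][j][k] += omega * (s / sum(e) - phi[i][j][k])
    return phi
```

A dielectric boundary is represented by letting `eps_at` return the solute value inside the molecular surface and the solvent value outside. With a uniform dielectric, the grid potential a few spacings from a charge should approach $q/(\varepsilon r)$, which makes a convenient check.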


The continuum treatment of electrostatics can also model salt effects by generalizing the Poisson equation (12) to the Poisson–Boltzmann equation. The finite difference approach to solving Eq. (12) extends naturally to treating the Poisson–Boltzmann equation [21], and the boundary element method can be extended as well [19].
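For weak potentials the Poisson–Boltzmann equation can be linearized, and for an isolated point charge its solution is the Debye–Hückel screened Coulomb potential $q\,e^{-\kappa r}/(\varepsilon r)$. The sketch below assumes water at 25 °C, where the Debye length is about $3.04/\sqrt{I}$ Å for ionic strength $I$ in mol/L, and ignores the ion-exclusion radius; the function names are hypothetical.

```python
import math

COULOMB = 332.0637  # kcal*Å/(mol*e^2)


def debye_length(ionic_strength_molar):
    """Approximate Debye screening length (Å) in water at 25 °C."""
    return 3.04 / math.sqrt(ionic_strength_molar)


def screened_pair_energy(q1, q2, r, ionic_strength_molar, eps=78.5):
    """Linearized Poisson–Boltzmann (Debye–Hückel) pair energy in kcal/mol
    for point charges with no ion-exclusion radius."""
    kappa = 1.0 / debye_length(ionic_strength_molar)
    return COULOMB * q1 * q2 * math.exp(-kappa * r) / (eps * r)


print(debye_length(0.15), screened_pair_energy(1.0, -1.0, 10.0, 0.15))
```

Full Poisson–Boltzmann solvers replace the exponential factor with a grid solution that includes the nonlinear ionic term and the low-dielectric solute.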

III. FINITE BOUNDARY CONDITIONS

In finite boundary conditions the solute molecule is surrounded by a finite layer of explicit solvent. The missing bulk solvent is modeled by some form of boundary potential at the vacuum/solvent interface. A host of such potentials have been proposed, from the simple spherical half-harmonic potential, which models a hydrophobic container [22], to stochastic boundary conditions [23], which surround the finite system with shells of particles obeying simplified dynamics, and finally to the Beglov and Roux spherical solvent boundary potential [24], which approximates the exact potential of mean force due to the bulk solvent by a superposition of physically motivated terms.
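The simplest of these boundary potentials, the spherical half-harmonic wall, can be sketched as follows. The form $U = k(d - R)^2$ for $d > R$ (zero inside), the force constant and the function name are illustrative assumptions; some programs use $\tfrac{1}{2}k$ instead.

```python
import math


def half_harmonic_restraint(coords, radius, k=10.0, center=(0.0, 0.0, 0.0)):
    """Energy and forces of a spherical half-harmonic wall:
    U = k (d - radius)^2 for d > radius and 0 otherwise, d = distance from center.
    k (kcal/mol/Å^2) and the exact functional form are assumptions of this sketch."""
    energy = 0.0
    forces = []
    for x in coords:
        v = [x[i] - center[i] for i in range(3)]
        d = math.sqrt(sum(c * c for c in v))
        if d <= radius:
            forces.append((0.0, 0.0, 0.0))  # no force inside the sphere
            continue
        excess = d - radius
        energy += k * excess ** 2
        scale = -2.0 * k * excess / d  # F = -dU/dx points back toward the center
        forces.append(tuple(scale * c for c in v))
    return energy, forces


print(half_harmonic_restraint([(12.0, 0.0, 0.0), (3.0, 4.0, 0.0)], radius=10.0))
```

Such a wall only confines molecules; it supplies none of the electrostatic reaction field, which is treated separately as described next.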

The electrostatic effect of the missing bulk is usually approximated by dielectric continuum theory. The finite system including the layer of explicit solvent is treated as a low dielectric region embedded in a high dielectric continuum. The electrostatic potential at an atom is given by the solution of the Poisson equation, Eq. (12). Although this equation can be solved numerically as discussed above, for simulations a more efficient treatment is necessary. If the finite system is spherical, the reaction potential due to the continuum can be expanded in a series involving the total charge, dipole, quadrupole, and higher order multipoles of the system. The reaction potential is approximated by keeping a finite number of terms [24]. Another approximation is the image approximation [25], in which the multipole series is rearranged and the leading term in the rearranged series is identified as the potential due to image charges whose positions are defined in terms of the original charge positions. Just as in the continuum treatment of solvents, the reaction potential is sensitive to the distance between the system charges and the dielectric boundary. Thus it is important to ensure that charges do not approach this boundary during the simulation.
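For a spherical system the reaction-field series can be written down explicitly (a Kirkwood-type expansion). With interior dielectric 1 and exterior $\varepsilon$, a charge $q_j$ at distance $s_j$ from the center contributes $\sum_n \frac{(n+1)(1-\varepsilon)}{n+\varepsilon(n+1)} \frac{q_j s_j^n r^n}{a^{2n+1}} P_n(\cos\gamma)$ to the reaction potential at distance $r$, where $a$ is the sphere radius and $\gamma$ the angle between the two position vectors. The sketch below truncates this series at order `nmax`; the function names and units are assumptions, and the image approximation is not shown.

```python
import math

COULOMB = 332.0637  # kcal*Å/(mol*e^2)


def legendre(nmax, x):
    """P_0(x) .. P_nmax(x) by the three-term recurrence."""
    p = [1.0, x]
    for n in range(1, nmax):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[: nmax + 1]


def reaction_field_energy(charges, positions, radius, eps_out=78.5, nmax=20):
    """Reaction-field energy (kcal/mol) of point charges inside a sphere of dielectric 1
    centered at the origin and embedded in a continuum eps_out. The series
    (n = 0 charge, 1 dipole, 2 quadrupole, ...) is truncated at order nmax."""
    total = 0.0
    for qi, ri in zip(charges, positions):
        for qj, rj in zip(charges, positions):
            si, sj = math.hypot(*ri), math.hypot(*rj)
            dot = sum(a * b for a, b in zip(ri, rj))
            cos_g = dot / (si * sj) if si * sj > 0.0 else 1.0
            p = legendre(nmax, cos_g)
            for n in range(nmax + 1):
                coeff = (n + 1) * (1.0 - eps_out) / (n + eps_out * (n + 1))
                total += qi * qj * coeff * (si * sj) ** n / radius ** (2 * n + 1) * p[n]
    return 0.5 * COULOMB * total


# The energy of one ion becomes more negative as it approaches the dielectric boundary.
for s in (0.0, 2.5, 4.5):
    print(s, reaction_field_energy([1.0], [(0.0, 0.0, s)], 5.0, nmax=60))
```

The growth of the higher terms as a charge nears the boundary illustrates why charges must be kept away from it during a simulation.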

Although the number of atoms in a macromolecular simulation under finite boundary conditions is smaller than under periodic boundary conditions, straightforward evaluation of the Coulomb sum, Eq. (6), will still prove prohibitively expensive for large proteins in solution. One simple approach to reduce the cost is the "twin-range" approach [26]. In this method the more distant interactions are simply calculated less often (e.g., every M steps), and their effect is stored in memory to be applied as a constant force (or preferably all at once as an impulse [27]). The reasoning is that the step-to-step variations in the positions of distant charges have only a small relative effect on the potential due to them. Although this approach alleviates the problem for modest system sizes, it does not eliminate the basic order $N^2$ nature of the Coulomb sum and thus will not work for large systems.
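A minimal sketch of the twin-range bookkeeping, in the constant-force variant rather than the impulse variant: every M steps the pair list is rebuilt and the far forces are recomputed and stored; on the steps in between only the stored near pairs are re-evaluated. The class `TwinRange`, its parameters and the use of unit Coulomb constant are hypothetical choices for illustration.

```python
import math


def pair_force(qi, qj, ri, rj):
    """Coulomb force on charge i from charge j (Coulomb constant set to 1)."""
    d = [ri[k] - rj[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in d))
    s = qi * qj / r ** 3
    return [s * c for c in d]


class TwinRange:
    """Near pairs (r < r_cut at the last update) every step; far forces every M steps."""

    def __init__(self, r_cut, M):
        self.r_cut, self.M = r_cut, M
        self.near_pairs, self.far_forces = [], None

    def forces(self, step, q, pos):
        n = len(q)
        if step % self.M == 0:  # O(N^2) update of the pair list and the far forces
            self.near_pairs = []
            self.far_forces = [[0.0, 0.0, 0.0] for _ in range(n)]
            for i in range(n):
                for j in range(i + 1, n):
                    if math.dist(pos[i], pos[j]) < self.r_cut:
                        self.near_pairs.append((i, j))
                    else:
                        f = pair_force(q[i], q[j], pos[i], pos[j])
                        for k in range(3):
                            self.far_forces[i][k] += f[k]
                            self.far_forces[j][k] -= f[k]
        total = [list(f) for f in self.far_forces]  # far part held constant
        for i, j in self.near_pairs:
            f = pair_force(q[i], q[j], pos[i], pos[j])
            for k in range(3):
                total[i][k] += f[k]
                total[j][k] -= f[k]
        return total
```

At update steps the result equals the full pairwise sum; in between, only the near-pair work is done, which is where the savings (and the approximation) come from.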

Figure 1 (a) Multipole expansion of the potential at $\mathbf{r}$ due to charges near $\mathbf{b}$. Assume that $|\mathbf{r}^{(j)} - \mathbf{b}| \ll |\mathbf{b} - \mathbf{r}|$ for $j = 1, \ldots, K$. (b) Taylor expansion of the electrostatic potential at $\mathbf{r}$ about $\mathbf{a}$. Assume $|\mathbf{r} - \mathbf{a}|, |\mathbf{r}^{(j)} - \mathbf{b}| \ll |\mathbf{b} - \mathbf{r}|$ for $j = 1, \ldots, K$.

Another approach to reducing the cost of Coulombic interactions is to treat neighboring interactions explicitly while approximating distant interactions by a multipole expansion. In Figure 1a the charges $q^{(1)}, q^{(2)}, \ldots, q^{(K)}$ at positions $\mathbf{r}^{(k)} = (r_1^{(k)}, r_2^{(k)}, r_3^{(k)})$, $k = 1, \ldots, K$, are all close to the point $\mathbf{b} = (b_1, b_2, b_3)$, so that their distances $|\mathbf{r}^{(k)} - \mathbf{b}|$, $k = 1, \ldots, K$, are all small compared to $|\mathbf{b} - \mathbf{r}|$. Then the electrostatic potential due to $q^{(1)}, q^{(2)}, \ldots, q^{(K)}$ evaluated at the point $\mathbf{r} = (r_1, r_2, r_3)$ can be approximated by a multipole expansion about the point $\mathbf{b}$. For example, the potential due to charge $q^{(1)}$ evaluated at $\mathbf{r}$ is $q^{(1)}/|\mathbf{r}^{(1)} - \mathbf{r}| = q^{(1)}/|\mathbf{r}^{(1)} - \mathbf{b} + \mathbf{b} - \mathbf{r}|$. Since $|\mathbf{r}^{(1)} - \mathbf{b}|$ is small compared to $|\mathbf{b} - \mathbf{r}|$, the potential due to $q^{(1)}$ can be expanded as a Taylor series. A first-order (dipole) approximation about $\mathbf{b}$ would be

$$
\frac{q^{(1)}}{|\mathbf{r}^{(1)} - \mathbf{r}|} \approx \frac{q^{(1)}}{|\mathbf{b} - \mathbf{r}|} - \sum_{i=1}^{3} \frac{q^{(1)}\,(r_i^{(1)} - b_i)(b_i - r_i)}{|\mathbf{b} - \mathbf{r}|^{3}} \tag{15}
$$

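whereas a second-order (quadrupole) approximation would be

$$
\begin{aligned}
\frac{q^{(1)}}{|\mathbf{r}^{(1)} - \mathbf{r}|} \approx{}& \frac{q^{(1)}}{|\mathbf{b} - \mathbf{r}|} - \sum_{i=1}^{3} \frac{q^{(1)}\,(r_i^{(1)} - b_i)(b_i - r_i)}{|\mathbf{b} - \mathbf{r}|^{3}} \\
&+ \frac{3}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} \frac{q^{(1)}\,(r_i^{(1)} - b_i)(r_j^{(1)} - b_j)(b_i - r_i)(b_j - r_j)}{|\mathbf{b} - \mathbf{r}|^{5}} - \frac{1}{2}\sum_{i=1}^{3} \frac{q^{(1)}\,(r_i^{(1)} - b_i)^{2}}{|\mathbf{b} - \mathbf{r}|^{3}}
\end{aligned}
\tag{16}
$$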


Third-order (octupole) or higher order multipole approximations can be employed for more accuracy.

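Using the superposition principle, the second-order (quadrupole) approximation to the potential due to $q^{(1)}, q^{(2)}, \ldots, q^{(K)}$ evaluated at $\mathbf{r}$ is obtained by simply summing terms over the charges $q^{(k)}$, $k = 1, \ldots, K$. The result is

$$
\begin{aligned}
\sum_{k=1}^{K} \frac{q^{(k)}}{|\mathbf{r}^{(k)} - \mathbf{r}|} \approx{}& \frac{Q}{|\mathbf{b} - \mathbf{r}|} - \sum_{i=1}^{3} \frac{d_i\,(b_i - r_i)}{|\mathbf{b} - \mathbf{r}|^{3}} \\
&+ \frac{3}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} \frac{\Theta_{ij}\,(b_i - r_i)(b_j - r_j)}{|\mathbf{b} - \mathbf{r}|^{5}} - \frac{1}{2}\sum_{i=1}^{3} \frac{\Theta_{ii}}{|\mathbf{b} - \mathbf{r}|^{3}}
\end{aligned}
\tag{17}
$$

where $Q = \sum_{k=1}^{K} q^{(k)}$ is the total charge of the group, $\mathbf{d}$ its dipole moment, with $d_i = \sum_{k=1}^{K} q^{(k)}(r_i^{(k)} - b_i)$, and $\Theta$ its quadrupole moment, with $\Theta_{ij} = \sum_{k=1}^{K} q^{(k)}(r_i^{(k)} - b_i)(r_j^{(k)} - b_j)$.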
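Eq. (17) translates almost line for line into code. In the sketch below, `moments` builds $Q$, $d_i$ and $\Theta_{ij}$ about $\mathbf{b}$, and `multipole_potential` evaluates the right-hand side at $\mathbf{r}$; both names are hypothetical, and the Coulomb constant is set to 1.

```python
import math


def moments(charges, positions, b):
    """Total charge Q, dipole d_i and quadrupole Theta_ij about b, as defined for Eq. (17)."""
    Q = sum(charges)
    d = [0.0] * 3
    theta = [[0.0] * 3 for _ in range(3)]
    for q, r in zip(charges, positions):
        dr = [r[i] - b[i] for i in range(3)]
        for i in range(3):
            d[i] += q * dr[i]
            for j in range(3):
                theta[i][j] += q * dr[i] * dr[j]
    return Q, d, theta


def multipole_potential(Q, d, theta, b, r):
    """Right-hand side of Eq. (17): potential at r of a compact group of charges near b."""
    u = [b[i] - r[i] for i in range(3)]  # components (b_i - r_i)
    R = math.sqrt(sum(c * c for c in u))
    dipole = sum(d[i] * u[i] for i in range(3))
    quad = sum(theta[i][j] * u[i] * u[j] for i in range(3) for j in range(3))
    trace = theta[0][0] + theta[1][1] + theta[2][2]
    return Q / R - dipole / R ** 3 + 1.5 * quad / R ** 5 - 0.5 * trace / R ** 3


qs = [1.0, -0.5, 0.3]
ps = [(0.2, 0.1, 0.0), (-0.1, 0.25, 0.1), (0.0, -0.2, -0.15)]
b, r = (0.0, 0.0, 0.0), (10.0, 3.0, -2.0)
exact = sum(qk / math.dist(pk, r) for qk, pk in zip(qs, ps))
print(exact, multipole_potential(*moments(qs, ps, b), b, r))
```

Once the ten numbers $Q$, $\mathbf{d}$, $\Theta$ are known, the cost of evaluating the group's potential no longer depends on $K$.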

A straightforward application of this multipole approximation is as follows. The simulation cell containing the molecule plus solvent is divided into subcells of similar size. At the start of the electrostatic calculation, the total charge, dipole moment, and quadrupole moment of each subcell are calculated. Each subcell is surrounded by up to 26 adjacent subcells. The electrostatic potential at position $\mathbf{r}$ is approximated by first calculating exactly the potential at $\mathbf{r}$ due to charges in the same or adjacent subcells. The potential due to charges in a nonadjacent subcell is approximated using Eq. (17), and these approximate potentials are summed over the nonadjacent subcells $b$. Because the number of subcells is proportional to the number of charges, this method of obtaining the potential at $\mathbf{r}$ is still an order $N$ algorithm, and thus the calculation of the energy by summing charge times potential as in Eq. (6) is of order $N^2$. However, it can be made fast in comparison with the direct sum over all charge pairs by tuning the division into subcells so that the number of calculations on the right-hand side of Eq. (17) is substantially less than that on the left.
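The scheme just described can be sketched as follows: subcell moments are computed once, each charge sees exact sums from its own and adjacent subcells and Eq. (17) for the rest, and the energy is $\tfrac{1}{2}\sum_i q_i \phi(\mathbf{r}_i)$. Cubic box, function names, unit Coulomb constant and the random test system are all assumptions of this sketch; it remains $O(N^2)$, exactly as stated above.

```python
import math
import random
from collections import defaultdict


def cell_index(p, side, m):
    """Subcell (ix, iy, iz) of point p in a cubic box [0, side)^3 cut into m^3 subcells."""
    return tuple(min(int(c / side * m), m - 1) for c in p)


def cell_multipole_energy(q, pos, side, m):
    """Level 1 cell multipole estimate of 0.5 * sum_i q_i phi(r_i): exact sums over the
    same and adjacent subcells, Eq. (17) (quadrupole order) for nonadjacent ones."""
    h = side / m
    members = defaultdict(list)
    for idx, p in enumerate(pos):
        members[cell_index(p, side, m)].append(idx)
    cell_moments = {}
    for c, idxs in members.items():
        b = [(c[i] + 0.5) * h for i in range(3)]  # subcell center
        Q = sum(q[k] for k in idxs)
        d = [sum(q[k] * (pos[k][i] - b[i]) for k in idxs) for i in range(3)]
        th = [[sum(q[k] * (pos[k][i] - b[i]) * (pos[k][j] - b[j]) for k in idxs)
               for j in range(3)] for i in range(3)]
        cell_moments[c] = (b, Q, d, th)
    energy = 0.0
    for i, r in enumerate(pos):
        ci = cell_index(r, side, m)
        phi = 0.0
        for c, (b, Q, d, th) in cell_moments.items():
            if max(abs(c[k] - ci[k]) for k in range(3)) <= 1:  # same or adjacent: exact
                phi += sum(q[j] / math.dist(r, pos[j]) for j in members[c] if j != i)
            else:  # nonadjacent subcell: multipole approximation, Eq. (17)
                u = [b[k] - r[k] for k in range(3)]
                R = math.sqrt(sum(x * x for x in u))
                quad = sum(th[s][t] * u[s] * u[t] for s in range(3) for t in range(3))
                phi += (Q / R - sum(d[k] * u[k] for k in range(3)) / R ** 3
                        + 1.5 * quad / R ** 5
                        - 0.5 * (th[0][0] + th[1][1] + th[2][2]) / R ** 3)
        energy += 0.5 * q[i] * phi
    return energy


random.seed(0)
q = [random.uniform(0.1, 1.0) for _ in range(200)]
pos = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(200)]
direct = 0.5 * sum(q[i] * q[j] / math.dist(pos[i], pos[j])
                   for i in range(200) for j in range(200) if i != j)
approx = cell_multipole_energy(q, pos, 10.0, 4)
print(direct, approx, abs(approx - direct) / direct)
```

The subcell count `m` is the tuning knob mentioned in the text: more subcells mean fewer exact pairs but more multipole evaluations per charge.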

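A further improvement is possible in the situation depicted in Figure 1b. Let $\phi_b(\mathbf{r})$ denote the potential due to the charges in the subcell about point $\mathbf{b}$, evaluated at the point $\mathbf{r}$. Let $\mathbf{a}$ be the center of the subcell containing $\mathbf{r}$. Then $\phi_b(\mathbf{r})$ can be approximated by a second-order Taylor expansion about $\mathbf{a}$:

$$
\phi_b(\mathbf{r}) \approx \phi_b(\mathbf{a}) + \sum_{i=1}^{3} \frac{\partial \phi_b}{\partial r_i}(\mathbf{a})\,(r_i - a_i) + \frac{1}{2}\sum_{i=1}^{3}\sum_{j=1}^{3} \frac{\partial^2 \phi_b}{\partial r_i\,\partial r_j}(\mathbf{a})\,(r_i - a_i)(r_j - a_j) \tag{18}
$$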

The electrostatic potential at $\mathbf{r}$ due to nonadjacent cells can be approximated by summing the second-order Taylor expansion, Eq. (18), for $\phi_b(\mathbf{r})$ over all nonadjacent cells $b$. Thus $\phi_b(\mathbf{a})$, as well as $\partial\phi_b/\partial r_i(\mathbf{a})$ and $\partial^2\phi_b/\partial r_i\,\partial r_j(\mathbf{a})$ for $i, j = 1, 2, 3$, are summed over all $b$ to get the coefficients for the Taylor expansion, which is then used to approximate the potential at all the points in the cell centered at $\mathbf{a}$. This offers improved speed at the cost of a further approximation.
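The Taylor coefficients of Eq. (18) are linear in the source charges, so contributions from different cells $b$ simply add. The sketch below computes them for point charges from the exact derivatives of $\sum q/|\mathbf{a} - \mathbf{x}|$ rather than from each cell's multipole moments, a simplifying assumption; the function names and unit Coulomb constant are likewise hypothetical.

```python
import math


def taylor_coefficients(charges, positions, a):
    """phi_b(a), its gradient and its Hessian at a for the point charges of a cell b."""
    phi, grad = 0.0, [0.0] * 3
    hess = [[0.0] * 3 for _ in range(3)]
    for q, x in zip(charges, positions):
        v = [a[i] - x[i] for i in range(3)]
        r2 = sum(c * c for c in v)
        r = math.sqrt(r2)
        phi += q / r
        for i in range(3):
            grad[i] -= q * v[i] / r ** 3
            for j in range(3):
                hess[i][j] += q * (3.0 * v[i] * v[j] - (r2 if i == j else 0.0)) / r ** 5
    return phi, grad, hess


def add_coefficients(c1, c2):
    """Coefficients from different cells b are summed component by component."""
    return (c1[0] + c2[0],
            [c1[1][i] + c2[1][i] for i in range(3)],
            [[c1[2][i][j] + c2[2][i][j] for j in range(3)] for i in range(3)])


def taylor_potential(coeffs, a, r):
    """Eq. (18): second-order Taylor estimate of the potential at r near a."""
    phi, grad, hess = coeffs
    dr = [r[i] - a[i] for i in range(3)]
    return (phi + sum(grad[i] * dr[i] for i in range(3))
            + 0.5 * sum(hess[i][j] * dr[i] * dr[j] for i in range(3) for j in range(3)))


qs = [1.0, -0.4]
ps = [(10.0, 0.3, 0.0), (9.6, -0.2, 0.4)]
a = (0.0, 0.0, 0.0)
coeffs = add_coefficients(taylor_coefficients(qs[:1], ps[:1], a),
                          taylor_coefficients(qs[1:], ps[1:], a))
r = (0.2, -0.1, 0.15)
print(taylor_potential(coeffs, a, r), sum(qk / math.dist(pk, r) for qk, pk in zip(qs, ps)))
```

The ten coefficients are computed once per cell $\mathbf{a}$ and then reused for every charge in that cell, which is the source of the speedup.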

The algorithm outlined above is a level 1 cell multipole or Cartesian multipole algorithm [28]. A number of modifications are possible. Accuracy can be raised by using higher order expansions, which unfortunately are more expensive. The cost can be alleviated by using a "twin-range" approach as above, calculating the Taylor coefficients only every M steps and approximating them as constants in Eq. (18) during intermediate steps. Instead of Cartesian multipole expansions, an expansion in spherical harmonics can be performed [29]. This is more difficult to program but becomes more efficient with high-order approximations, because fewer terms are needed at a given order. Another strategy to improve accuracy [30] is to use the exact potential due to nonadjacent cells and its derivatives, evaluated at $\mathbf{a}$, in the Taylor series approximation for charges in subcell $\mathbf{a}$.
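One way to see why spherical harmonics need fewer terms is to count coefficients up to order $p$. The sketch below compares raw Cartesian moment components, $(l+1)(l+2)/2$ per order $l$, with spherical-harmonic coefficients, $2l+1$ per order. This counts raw (non-traceless) Cartesian components only; traceless Cartesian formulations and the cost of the translation operators change the picture, so treat it as a rough illustration.

```python
from math import comb


def cartesian_terms(p):
    """Raw Cartesian moment components up to order p: sum over l of (l+1)(l+2)/2."""
    return sum((l + 1) * (l + 2) // 2 for l in range(p + 1))


def spherical_terms(p):
    """Spherical-harmonic coefficients up to order p: sum over l of (2l+1) = (p+1)^2."""
    return (p + 1) ** 2


assert all(cartesian_terms(p) == comb(p + 3, 3) for p in range(20))  # closed form
for p in (2, 4, 8, 12):
    print(p, cartesian_terms(p), spherical_terms(p))
```

At quadrupole order the two counts are nearly equal (10 versus 9), but the gap widens quickly at the high orders where the spherical form pays off.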

All of the above algorithms are of order $N^2$, i.e., their cost grows with the square of the system size, and hence they are inefficient for large systems. The key to further improvement is to realize that more distant charges can be grouped into larger subcells because, referring to Figure 1a, the important parameter in the approximation (16) is the ratio $|\mathbf{r}^{(1)} - \mathbf{b}|/|\mathbf{b} - \mathbf{r}|$. This insight inspired the tree codes, culminating in the fast multipole algorithm (FMA) [29]. In these methods the initial cell is divided into eight child cells, called level 1 cells. These in turn are each divided into eight child cells, yielding 64 level 2 cells, and so on, down to the $L$th level containing $8^L$ subcells. The algorithm proceeds in two passes. In the upward pass, multipole expansions out to some order are calculated for each subcell at each level, beginning with the lowest level $L$ and proceeding upward.
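The tree bookkeeping is simple with integer cell indices: at level $l$ each edge of the box is cut into $2^l$ pieces, so there are $8^l$ cells, and a cell's parent is found by halving its indices. In the sketch below, `choose_depth` is a hypothetical heuristic (stop refining when cells would hold fewer than a target number of charges), not a rule from the text.

```python
def octree_cell(p, box, level):
    """Integer indices of the level-`level` cell containing point p in a cubic box
    [0, box)^3; level l has 2^l cells per edge and 8^l cells in total."""
    n = 2 ** level
    return tuple(min(int(c / box * n), n - 1) for c in p)


def parent(cell):
    """The level l-1 cell containing a level l cell."""
    return tuple(c // 2 for c in cell)


def choose_depth(n_charges, target_per_cell=8):
    """Hypothetical heuristic: deepest L keeping at least target_per_cell charges per cell."""
    L = 0
    while n_charges / 8 ** (L + 1) >= target_per_cell:
        L += 1
    return L


print(octree_cell((7.9, 0.1, 5.0), 8.0, 3), choose_depth(10**6))
```

The upward pass visits cells from level $L$ to level 1, combining each set of eight children into the expansion of their common `parent`.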

The multipole expansion of a subcell can be calculated from the expansions of its eight child cells by means of translation operators. The second, or downward, pass begins at level 2. For each of the 64 level 2 cells, the Taylor expansion due to nonadjacent level 2 cells is calculated at its center as above. For the level 3 cells, the Taylor expansion due to nonadjacent level 3 cells can be calculated more efficiently. First the Taylor expansion of the cell's level 2 parent cell is translated to its center using a translation operator. This accounts for the potential due to all nonadjacent level 3 cells except for those cells whose level 2 parents are adjacent to the level 2 parent of the cell in question. The Taylor expansion for these level 3 cells is calculated as above and added to the translated level 2 Taylor expansion. This process continues down to the lowest level $L$. After this downward pass, the Taylor expansion at each level $L$ cell is available. The potential at a charge is then approximated as above: a sum of direct interactions due to charges in the same or adjacent level $L$ cells, plus a Taylor expansion approximating the potential due to charges in nonadjacent level $L$ cells. A clear description of the algorithm, for more advanced readers, is given by White and Head-Gordon [31].
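The downward-pass rule (translate the parent's Taylor expansion, then add only the cells that are nonadjacent to the cell but whose parents are adjacent to its parent) can be checked combinatorially. The sketch below builds this "interaction list" for each level and verifies that, summed over levels, it covers every nonadjacent level $L$ cell exactly once. The function names are hypothetical; only cell indices are handled, no expansions.

```python
from itertools import product


def neighbors(cell, level):
    """Cells at `level` adjacent to `cell`, including the cell itself, clipped to the box."""
    n = 2 ** level
    return {tuple(c + dc for c, dc in zip(cell, delta))
            for delta in product((-1, 0, 1), repeat=3)
            if all(0 <= c + dc < n for c, dc in zip(cell, delta))}


def children(cell):
    return {tuple(2 * c + dc for c, dc in zip(cell, delta)) for delta in product((0, 1), repeat=3)}


def parent(cell):
    return tuple(c // 2 for c in cell)


def interaction_list(cell, level):
    """Cells added at this level: children of the parent's neighbors, minus the cell's neighbors."""
    if level < 2:
        return set()
    candidates = {ch for p in neighbors(parent(cell), level - 1) for ch in children(p)}
    return candidates - neighbors(cell, level)


def descendants(cell, level, target_level):
    cells = {cell}
    for _ in range(target_level - level):
        cells = {ch for c in cells for ch in children(c)}
    return cells


def far_field_cells(cell, L):
    """Level-L cells whose potential reaches `cell` through Taylor expansions at any level."""
    covered, c, level = set(), cell, L
    while level >= 2:
        for far in interaction_list(c, level):
            covered |= descendants(far, level, L)
        c, level = parent(c), level - 1
    return covered


print(len(interaction_list((1, 1, 1), 2)), len(interaction_list((3, 3, 3), 3)))
```

For an interior cell the list holds at most $6^3 - 3^3 = 189$ cells, independent of the depth of the tree, which is what keeps the work per cell bounded.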

The FMA, using the above tree recursion, is a very general approach to reducing the cost of electrostatic sums. Using it, the cost of calculating the energy and forces for a system of N charges is an order N operation, i.e., its cost grows linearly with system size. The cost depends on the order of the multipole expansion as well as the level of the tree. The greater the tree depth L, the smaller the ultimate subcells are, which in turn lowers the number of interaction pairs to be calculated explicitly. However, as L increases, more work is performed in calculating the Taylor expansion of the long-range interactions. The accuracy depends mainly on the order of the expansion.
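The trade-off between tree depth and cost can be illustrated with a crude operation count. The model below is a hypothetical assumption, not a measured cost: near-field work of about $27N \cdot (N/8^L)/2$ pair terms, and far-field work of about $189 \cdot 8^L$ expansion conversions costing $p^4$ each for expansion order $p$. Choosing the best $L$ for each $N$ keeps the cost per charge roughly constant, which is the linear scaling described above.

```python
def fma_cost(n, L, p, c_near=1.0, c_far=1.0):
    """Hypothetical operation count for an FMA of depth L and expansion order p.
    The 27- and 189-cell counts are interior-cell values; c_near, c_far are made up."""
    near = c_near * n * 27 * (n / 8 ** L) / 2  # direct pairs in same/adjacent cells
    far = c_far * 189 * 8 ** L * p ** 4        # multipole-to-Taylor conversions
    return near + far


def best_depth(n, p):
    return min(range(1, 12), key=lambda L: fma_cost(n, L, p))


for n in (10**4, 10**5, 10**6):
    L = best_depth(n, 4)
    print(n, L, fma_cost(n, L, 4) / n)
```

Deeper trees shrink the explicit near-field work but add far-field conversions, so the optimum depth grows roughly as $\log_8 N$.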

Since empirical force fields do not accurately estimate the true interatomic forces, it is difficult a priori to say how accurate the fast multipole approximation to the exact Coulomb potential and forces (exact in terms of the sum over partial charges) should be. Probably a good rule is to make sure that at each atom the approximate electrostatic force is within a few percent relative error of the true electrostatic force, obtained by explicitly summing over all atom pairs, i.e., $|\tilde{\mathbf{F}}_i - \mathbf{F}_i| \le 0.05\,|\mathbf{F}_i|$ for all atoms $i$, where $\mathbf{F}_i$ is the