Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf

9

Free Energy Calculations

Thomas Simonson

Centre National de la Recherche Scientifique, Strasbourg, France

I. INTRODUCTION

Sir Isaac Newton spent much of his life pursuing an elusive dream, the transmutation of base materials* into gold. Though he was not successful during his lifetime, he did manage to discover the equations of motion that, three centuries later, make alchemy possible on a computer. To perform this feat, Newton’s equations need only be supplemented by the modern technology of free energy simulations.

The calculation of free energy differences is one of the most interesting applications of biomolecular simulations. Indeed, free energy calculations using molecular dynamics or Monte Carlo simulations provide a direct link between the microscopic structure and fluctuations of a system and its most important equilibrium thermodynamic property, the free energy. The earliest biological applications were calculations of changes in binding constants associated with chemical changes in inhibitors, substrates, and/or their protein targets [1–3]. Since then, many of the early difficulties have been resolved and the methodology has considerably matured. The basic theory has been described in several monographs and reviews [4–8], and applications to biological macromolecules have been reviewed [9–12].

The method relies on two fortunate circumstances. First, the free energy is a state function, which does not depend on the manner in which a particular equilibrium state is reached or prepared. Second, the energy function can be modified and manipulated with enormous flexibility in computer simulations, allowing a known system to be transformed into a wide range of other systems of interest with relative ease. For example, by gradually changing a few non-bonded and/or stereochemical force field parameters, a protein side chain can be alchemically “mutated” from one residue type into another in the course of a simulation. By proceeding slowly and reversibly, one can estimate, in principle, the resulting free energy change. Repeating the process, one can obtain the relative free energies of binding of a ligand to variants of a given protein or of a series of similar ligands to the same protein. This provides a potential route for rational protein engineering and for lead improvement in drug design. Applications in which charged groups are introduced or removed can provide detailed insights into the electrostatic properties of biological macromolecules, including mechanisms of transition state stabilization by enzymes [2,9]. Such calculations, in which a molecule is chemically changed in a way that is often impossible to accomplish experimentally, are referred to as “alchemical” free energy calculations.

* It is not known whether he experimented with assistant professors.

The second major class of biological applications comprises calculations of free energy changes due to conformational rearrangements. Indeed, many biological macromolecules possess several distinct conformations that have functional relevance, such as the R and T forms of allosteric enzymes or the B and Z forms of DNA. The statistical weights, or relative probabilities, of these forms are determined by their relative free energies. By introducing suitable conformational restraints (or constraints) in the energy function, a biomolecule can be driven reversibly from one conformation into another and the associated free energy change obtained; recent applications include folding–unfolding studies of small proteins [13].

For reasons of space and because of their prime importance, we focus here on free energy calculations based on detailed molecular dynamics (MD) or Monte Carlo (MC) simulations. However, several other computational approaches exist to calculate free energies, including continuum dielectric models and integral equation methods [4,14].

To obtain reliable free energy estimates from simulations, sufficient conformational sampling must be achieved, not only of the starting and final states but also of many (usually less interesting) intermediate states. Because of the ruggedness of protein and nucleic acid energy surfaces, this represents a considerable challenge. Many techniques to enhance sampling have been proposed, and others are being developed. The most relevant are described below; others are covered elsewhere in this book. Accurate calculation of electrostatic free energies also requires that long-range forces be included, using specialized techniques such as continuum reaction field or Ewald summation methods [15]; these must be incorporated into the free energy formalism.

Section II introduces the use of thermodynamic cycles and covers the basic theoretical and technical aspects of free energy calculations. Section III is then devoted to standard binding free energies. Section IV describes the theoretical basis of conformational free energy calculations. “Electrostatic” free energy calculations, i.e., those associated with charge insertion and deletion, are described in Section V. Section VI then describes techniques to enhance sampling in alchemical or conformational free energy calculations. Finally, Section VII briefly discusses directions for future development.

II. GENERAL BACKGROUND

A. Thermodynamic Cycles for Solvation and Binding

To describe solvation and binding of one or more ligands and their receptors, it is convenient to introduce the thermodynamic cycles shown in Figure 1 [16]. Figure 1a describes the vapor → water transfer of two solutes S and S′ as well as their binding to a receptor R (e.g., a protein). Figure 1b describes the binding of a ligand S to a native receptor R and a mutant receptor R′. In each case, vertical legs correspond to processes that can usually be studied experimentally (solvation or binding), and horizontal legs correspond to a chemical transmutation of the ligand or protein that usually cannot be performed experimentally. The vertical legs can often be accomplished in a simulation, particularly if the solutes S, S′ are not too large; however, the horizontal legs usually involve smaller structural changes and are more straightforward (see below). Since the free energy F is a state function, the horizontal and vertical legs both provide routes to the solvation free energy difference between S and S′, for example:

$$\Delta\Delta F_{\rm solv} = \Delta F_{\rm solv}(S') - \Delta F_{\rm solv}(S) = \Delta F_w(S \to S') - \Delta F_g(S \to S')$$

or to the difference in binding free energies of S and S′ to R, i.e.,

$$\Delta\Delta F_{\rm bind} = \Delta F_{\rm bind}(S') - \Delta F_{\rm bind}(S) = \Delta F_c(S \to S') - \Delta F_w(S \to S')$$

(the notations are defined in Figure 1); similarly for the binding free energy difference associated with the R → R′ change. Experimental numbers will usually be obtained from the vertical legs, whereas simulation numbers will often come from the horizontal legs. Precise comparison between free energy simulations and experimental binding constants or partition coefficients requires that the simulations mimic conditions that have a simple relationship to the experimental standard states, as discussed below.

Figure 1 Thermodynamic cycles for solvation and binding. (a) Solutes S and S′ in the gas phase (g) and solution (w) and bound to the receptor R in solution. (b) Binding of S to the receptors R and R′. The oblique arrows on the left remove S to the gas phase, then transfer it to its binding site on R. This pathway allows the calculation of absolute binding free energies.
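The cycle relations above amount to simple bookkeeping, which can be sketched in a few lines of Python. The function names and the numerical leg values below are hypothetical and chosen only for illustration; they are not taken from the text or from any real system.

```python
def relative_solvation(dF_w: float, dF_g: float) -> float:
    """ddF_solv = dF_w(S->S') - dF_g(S->S'), from the horizontal legs."""
    return dF_w - dF_g


def relative_binding(dF_c: float, dF_w: float) -> float:
    """ddF_bind = dF_c(S->S') - dF_w(S->S'), from the horizontal legs."""
    return dF_c - dF_w


def cycle_closure(dF_bind_S: float, dF_bind_Sp: float,
                  dF_c: float, dF_w: float) -> float:
    """Sum of free energy changes around the closed binding cycle.

    Path: S(w) -> S'(w) -> RS'(w) -> RS(w) -> S(w).
    For a state function this sum must vanish.
    """
    return dF_w + dF_bind_Sp - dF_c - dF_bind_S


# Hypothetical leg values (kcal/mol), for illustration only
dF_w, dF_g, dF_c = -3.2, -0.5, -4.6
ddF_solv = relative_solvation(dF_w, dF_g)
ddF_bind = relative_binding(dF_c, dF_w)
print(ddF_solv, ddF_bind)
```

In practice the closure test is a useful consistency check whenever both vertical (experimental) and horizontal (simulated) legs are available.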


In addition to relative binding or solvation free energies, the left side of Figure 1b shows a pathway (oblique arrows) that can be used to compute absolute binding free energies [17]. The solute S is first removed from solution to the gas phase; this is accomplished by scaling the solute–water interactions gradually to zero in a simulation. Next, the solute is moved from the gas phase into the protein binding site by gradually scaling its interactions from 0 to 1 in a simulation of the protein–solute complex. The removal of S to and from the gas phase is computationally much less expensive than a simulation of the gradual association of R and S in solution. Simulating reversible association would require starting out with the receptor and ligand separated by tens of angstroms in a very large box of water and moving the ligand toward the receptor very slowly, so as to minimize departure of the system from equilibrium.

B. Thermodynamic Perturbation Theory

Free energy calculations rely on the following thermodynamic perturbation theory [6–8]. Consider a system A described by the energy function $E_A = U_A + T_A$. $U_A = U_A(\mathbf{r}^N)$ is the potential energy, which depends on the coordinates $\mathbf{r}^N = (\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$, and $T_A$ is the kinetic energy, which (in a Cartesian coordinate system) depends on the velocities $\mathbf{v}^N$. For concreteness, the system could be made up of a biomolecule in solution. We limit ourselves (mostly) to a classical mechanical description for simplicity and reasons of space. In the canonical thermodynamic ensemble (constant N, volume V, temperature T), the classical partition function $Z_A$ is proportional to the configurational integral $Q_A$, which in a Cartesian coordinate system is

$$Q_A(N, V, T) = \int \exp\left[-\frac{U_A(\mathbf{r}^N)}{kT}\right] d\mathbf{r}^N \tag{1}$$

where k is Boltzmann’s constant, $d\mathbf{r}^N = d\mathbf{r}_1\, d\mathbf{r}_2 \cdots d\mathbf{r}_N$, and the integral is over all possible conformations (of both the biomolecule and the solvent). The absolute Helmholtz free energy is

$$F_A = -kT \ln Z_A = -kT \ln Q_A + c(N, V, T) \tag{2}$$

where c(N, V, T ) is a constant arising from the velocity portion of ZA. The canonical ensemble is not always the most relevant experimentally. In the isothermal-isobaric (N, p, T ) ensemble, QA(N, V, T ) is replaced by

$$Q_A(N, p, T) = \int \exp\left[-\frac{U_A + pV}{kT}\right] d\mathbf{r}^N\, dV \tag{3}$$

and the Helmholtz free energy is replaced by the Gibbs free energy $G_A = -kT \ln Q_A(N, p, T) + c(N, p, T)$.

The Helmholtz free energy can be rearranged to read

$$F_A = -kT \ln \left[ \frac{V^{N/3} \int \exp(-U_A/kT)\, d\mathbf{r}^N}{\int \exp(U_A/kT) \exp(-U_A/kT)\, d\mathbf{r}^N} \right] + c(N, V, T) = kT \ln \left\langle \exp(U_A/kT) \right\rangle_A + c(N, V, T) - kT \ln V^{N/3} \tag{4}$$

where the brackets $\langle \cdot \rangle_A$ indicate an average over the ensemble of system A and V is the volume. Although in theory the average could be obtained from an MD or MC simulation of A, in practice the quantity to be averaged, $\exp(U_A/kT)$, is largest where the Boltzmann weight $\exp(-U_A/kT)$ is smallest, and vice versa. Therefore, a reliable average cannot be obtained, and absolute free energies cannot normally be calculated from a simulation.

Practical calculations always consider differences between two or more similar systems. Suppose we effect a change in the system such that the potential energy function is changed into

$$U_B = U_A + V_{BA} \tag{5}$$

where VBA denotes an additional, ‘‘perturbing,’’ potential energy term. The free energy difference between A and B is

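$$F_B - F_A = -kT \ln \frac{Z_B}{Z_A} = -kT \ln \frac{Q_B}{Q_A} \tag{6}$$

which can be rearranged to give [18,19]

$$F_B - F_A = -kT \ln \frac{\int \exp(-U_A/kT) \exp(-V_{BA}/kT)\, d\mathbf{r}^N}{\int \exp(-U_A/kT)\, d\mathbf{r}^N} \tag{7}$$

or

$$F_B - F_A = -kT \ln \left\langle \exp\left(-\frac{V_{BA}}{kT}\right) \right\rangle_A \tag{8}$$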

The brackets on the right indicate an average over the ensemble of the starting system A, i.e., with Boltzmann weights $\exp(-U_A/kT)$. The last equality also holds for the Gibbs free energy in the (N, p, T) ensemble. Equation (8) leads to practical computation schemes, since the required ensemble average can be calculated (in favorable cases) from a simulation of the starting system, A. Equation (8), which is exact, is nevertheless referred to as a free energy perturbation formula, because it connects the perturbed system B to the reference system A. Changes in the temperature or pressure can be treated as perturbations in the same sense as above, and perturbation formulas analogous to Eq. (8) can easily be derived. Equation (8) also holds for a quantum system if the perturbing Hamiltonian $V_{BA}$ commutes with the Hamiltonian of A (a common situation).
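As an illustration of how Eq. (8) is used, the following Python sketch applies the exponential average to a toy one-dimensional model that is not from the text: two harmonic wells $U_A = k_A x^2/2$ and $U_B = k_B x^2/2$, for which the exact answer is $(kT/2)\ln(k_B/k_A)$. The function names, the reduced units (kT = 1), and the use of exact Gaussian sampling in place of an MD or MC trajectory are all assumptions made for the example.

```python
import math
import random


def fep_delta_f(v_ba_samples, kT=1.0):
    """Eq. (8): F_B - F_A = -kT ln < exp(-V_BA/kT) >_A."""
    # Shift by the largest exponent (log-sum-exp) to avoid overflow.
    exponents = [-v / kT for v in v_ba_samples]
    m = max(exponents)
    s = sum(math.exp(e - m) for e in exponents)
    return -kT * (m + math.log(s / len(exponents)))


def sample_harmonic(k, kT, n, rng):
    """Exact Boltzmann samples for U = k x^2 / 2 (Gaussian, variance kT/k)."""
    sigma = math.sqrt(kT / k)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


rng = random.Random(0)
kT, k_a, k_b = 1.0, 1.0, 2.0
xs = sample_harmonic(k_a, kT, 50000, rng)          # ensemble of system A
v_ba = [0.5 * (k_b - k_a) * x * x for x in xs]     # V_BA = U_B - U_A
estimate = fep_delta_f(v_ba, kT)
exact = 0.5 * kT * math.log(k_b / k_a)
print(estimate, exact)
```

Only configurations from the reference system A are needed; the perturbed system B enters solely through the energy difference evaluated on those configurations.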

Equations (5)–(8) assume that the energy functions UA and UB operate on the same conformation space; i.e., A and B must have the same number N of degrees of freedom. In practice, this almost always implies that A and B have the same number of atoms or particles. Most biochemical changes of interest (e.g., point mutations of a protein) do not obey this requirement, but they can often be made to do so artificially through the use of dummy atoms (see below).

From the integral in the numerator on the right of Eq. (7), the conformations that contribute most to the free energy difference $F_B - F_A$ are those where $V_{BA}$ is large and negative and at the same time the Boltzmann factor $\exp(-U_A/kT)$ is large. If systems A and B are similar, they will occupy similar regions of conformation space; regions with large Boltzmann weights will thus have small $|V_{BA}|$, while regions where $V_{BA}$ is large and negative will tend to have low Boltzmann weights. The regions important for the averaging process will result from a compromise between these two effects and typically lie near the “edge” of the regions sampled by system A; illustrative examples are given in Figure 2. This suggests that only in cases where the perturbation energy $V_{BA}$ is very small on average (on the order of $kT \sim 1$ kcal/mol) will Eq. (8) be directly applicable. More often, the transformation from A to B must be divided into several discrete steps, say n, to each of which corresponds a perturbation energy on the order of $V_{BA}/n$, which can be made arbitrarily small by increasing n.
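The benefit of dividing the transformation into steps can be seen in a small numerical experiment. The Python sketch below uses an assumed toy model, not one from the text: a unit harmonic well whose minimum is shifted from 0 to d, so that the exact free energy difference is zero. Each stage applies Eq. (8) between neighboring well positions; all names and parameter values are illustrative.

```python
import math
import random


def exp_average_free_energy(v_samples, kT):
    """-kT ln <exp(-V/kT)>, evaluated with a log-sum-exp shift."""
    exps = [-v / kT for v in v_samples]
    m = max(exps)
    return -kT * (m + math.log(sum(math.exp(e - m) for e in exps) / len(exps)))


def staged_fep(d, n_stages, n_samples, kT=1.0, seed=1):
    """Toy model U(x; c) = (x - c)^2 / 2, with the centre c moved from 0 to d."""
    rng = random.Random(seed)
    sigma = math.sqrt(kT)               # force constant is 1
    total = 0.0
    for i in range(n_stages):
        c0 = d * i / n_stages           # centre of the current reference state
        c1 = d * (i + 1) / n_stages     # centre of the next state
        xs = [rng.gauss(c0, sigma) for _ in range(n_samples)]
        v = [0.5 * (x - c1) ** 2 - 0.5 * (x - c0) ** 2 for x in xs]
        total += exp_average_free_energy(v, kT)
    return total


# The exact free energy difference is 0 for a pure shift of the well.
one_step = staged_fep(d=4.0, n_stages=1, n_samples=20000)
eight_steps = staged_fep(d=4.0, n_stages=8, n_samples=20000)
print(one_step, eight_steps)
```

With a single step the average is dominated by rarely sampled configurations about four standard deviations from the reference minimum, so the estimate is typically poor; with eight steps each perturbation is a fraction of kT and the stagewise estimates add up to a value close to zero.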

Figure 2 (a) Mutation of argon into xenon in aqueous solution. Illustration of the averaging used to obtain $\langle V_{BA} \rangle_A$ [the first cumulant $C_1$, Eqs. (9) and (13); adapted from Ref. 73]. The only solute–solvent interactions are van der Waals interactions between argon/xenon and the water oxygens (the hydrogen van der Waals radius is zero for the force field used here). The mutation thus consists in changing the van der Waals parameters of the solute from those of argon to those of xenon, and $V_{BA} = U_{\rm vdW}(\text{xenon}) - U_{\rm vdW}(\text{argon})$, the difference between the solute–solvent interaction calculated with the argon and xenon van der Waals parameters. It is useful to write $V_{BA} = \sum_i v_{BA}(r_i)$, where the summation is over all water molecules, $v_{BA}(r_i)$ is the contribution of water molecule i, and $r_i$ its distance from the solute. To obtain, e.g., $C_1$, averaging is performed over an argon–water simulation. For this simple mutation, $\langle V_{BA} \rangle_A$ depends only on the radial distribution of water oxygens around the solute. The density of water at a distance r from the solute is a function only of r, $\rho g(r)$, where ρ is the bulk water density and g(r) is known as the radial distribution function. One can show that $\langle V_{BA} \rangle_A = \int_0^\infty 4\pi r^2 \rho g(r) v_{BA}(r)\, dr$. The functions $g(r)$, $v_{BA}(r)$ (in kT units), and $r^2 g(r) v_{BA}(r)$ are shown in the plot. To accurately integrate the latter function, the reference state simulation must provide adequate sampling of the region near the function’s peak; i.e., the peak must overlap sufficiently with g(r). The sampling required to properly average $e^{-V_{BA}/kT}$ cannot be analyzed quite as simply here; however, analogous considerations apply; see (b). (b) Charge insertion on an atom in a protein; illustration of the averaging used in the perturbation formula [Eq. (8), upper panel] and in the second derivative of the free energy [Eq. (12), lower panel]. The data shown correspond to charge insertion on an α-carbon of cytochrome c. (Adapted from Ref. 22.) The perturbation energy is $V_{BA} = q\Phi_q$, where $\Phi_q$ is the electrostatic potential at the site of charge insertion. Ensemble averages are over the reference state, i.e., before charge insertion. The probability distribution g of $\Delta V = V_{BA} - \langle V_{BA} \rangle$ is shown as dots, along with a Gaussian fit (short dashes). The averages sought are $\langle e^{-\Delta V/kT} \rangle = \int e^{-\Delta V/kT} g(\Delta V)\, d\Delta V$ (upper panel) and $\langle \Delta V^2 \rangle = \int \Delta V^2 g(\Delta V)\, d\Delta V$. For accurate integration, the simulation must sample the configurations that contribute most to the functions being integrated. With $q = e/10$, the peak in $e^{-\Delta V/kT} g(\Delta V)$ is indeed adequately sampled by the simulation. The tails of $g(\Delta V)$ are not sampled; their contributions to the ensemble averages $\langle e^{-\Delta V/kT} \rangle$ and $\langle \Delta V^2 \rangle$ can be estimated by assuming the tails are Gaussian (shaded in figure) and represent only 10% and 3%, respectively. For larger perturbing charges ($q > e/10$), the contribution of the negative tail to the exponential average grows very rapidly, and the exponential formula [Eq. (8)] becomes unreliable. The first few free energy derivatives, in contrast, continue to be well sampled. However, as q increases, the contribution of higher derivatives (which are more difficult to sample) to the free energy increases. (From Ref. 22.)

The free energy perturbation formula (8) can be expanded in powers of $V_{BA}$, or equivalently in powers of $1/kT$. Truncation at various orders gives free energy perturbation formulas in the true sense. Although high order terms are difficult to calculate because of sampling problems similar to those described above, expansions to low orders (1–4) are often more robust numerically than the original formula (8) and are especially useful for treating many small perturbations of a single reference system [20–22]. Because $\langle e^{-V_{BA}/kT} \rangle_A$ has the form of a moment-generating function [23], the coefficients of the expansion involve the cumulants $C_n$ of $V_{BA}$:

$$F_B - F_A = -kT \sum_{n=1}^{\infty} \frac{1}{n!} \left( -\frac{1}{kT} \right)^n C_n \tag{9}$$

The cumulants [23] are simple functions of the moments of the probability distribution of $V_{BA}$, e.g., $C_1 = \langle V_{BA} \rangle_A$, $C_2 = \langle (V_{BA} - \langle V_{BA} \rangle_A)^2 \rangle_A$, $C_3 = \langle (V_{BA} - \langle V_{BA} \rangle_A)^3 \rangle_A$, $C_4 = \langle (V_{BA} - \langle V_{BA} \rangle_A)^4 \rangle_A - 3C_2^2$. The rate of convergence of the expansion is determined by the deviations of $V_{BA}$ from its mean. Truncation of the expansion at order 2 corresponds to a linear response approximation [22] and is equivalent to assuming that $V_{BA}$ is Gaussian (with zero cumulants beyond order 2). To this order, the mean and width of the distribution determine the free energy; to higher orders, the detailed shape of the distribution contributes.
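A truncated cumulant estimate is straightforward to compute from a list of perturbation energies. The Python sketch below evaluates Eq. (9) to second or third order; the function name and the test distribution are assumptions made for illustration. The samples are drawn from a Gaussian with mean $d^2/2$ and variance $d^2$ (in units where kT = 1), which corresponds to shifting a unit harmonic well by d and for which the second-order result, zero, is exact.

```python
import math
import random


def cumulant_estimate(v_samples, kT=1.0, order=2):
    """Truncated Eq. (9): F_B - F_A ~ C1 - C2/(2 kT) [+ C3/(6 kT^2)]."""
    n = len(v_samples)
    c1 = sum(v_samples) / n                      # first cumulant: the mean
    dev = [v - c1 for v in v_samples]
    c2 = sum(d * d for d in dev) / n             # second cumulant: the variance
    result = c1 - c2 / (2.0 * kT)
    if order >= 3:
        c3 = sum(d ** 3 for d in dev) / n        # third central moment
        result += c3 / (6.0 * kT ** 2)
    return result


# Gaussian perturbation energies: V = -d*z + d^2/2 with z ~ N(0, 1)
rng = random.Random(2)
d = 1.0
v = [-d * rng.gauss(0.0, 1.0) + 0.5 * d * d for _ in range(20000)]
second_order = cumulant_estimate(v, kT=1.0, order=2)
third_order = cumulant_estimate(v, kT=1.0, order=3)
print(second_order, third_order)
```

Because only low moments of the distribution are needed, the estimate does not depend on sampling the far tail of $V_{BA}$, which is the reason low-order expansions tend to be more robust than the exponential average.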

Two examples of ‘‘alchemical’’ perturbations are shown schematically in Figure 2. They involve, respectively, the transformation of an argon atom into xenon in solution and the insertion of a point charge onto a single atom in a protein. The expression of the perturbing energy VBA is given in each case, assuming the potential energy function has the typical form used in many current biomolecular force fields [4,5]. These simple examples are chosen because the perturbation VBA depends essentially on a single variable in each case: the radial distribution of water density around the argon atom and the electrostatic potential on the charge insertion site. This allows a simple graphical analysis of the averaging required to use the perturbation formula (8) and/or to obtain the first few cumulants [Eq. (9)].

Although the thermodynamic perturbation approach [Eq. (8)] is important conceptually, it is usually not the most efficient numerically. A more fruitful approach in practice is to introduce an explicit transformation of the potential energy function such that it is gradually changed from its starting form UA into the final form UB through the variation of one or a few convenient parameters or ‘‘coupling coordinates.’’ The simplest such approach is linear interpolation between UA and UB, i.e., we introduce the ‘‘hybrid’’ energy function

$$U(\mathbf{r}^N; \lambda) = (1 - \lambda)\, U_A(\mathbf{r}^N) + \lambda\, U_B(\mathbf{r}^N) \tag{10}$$

which represents a system that is a mixture of A and B. This hybrid system is usually such that it would be impossible to prepare experimentally yet is straightforward to prepare in a simulation model. By slowly varying the coupling coordinate λ from 0 to 1, the hybrid system is gradually changed from A into B. More than one coupling parameter and more complicated (e.g., nonlinear) functional forms U(λ1, λ2, . . . ) can be used as long as the starting energy function is UA and the final one is UB. For example, λ1 and λ2 could be coupling parameters applied respectively to van der Waals and electrostatic terms in the energy function. To obtain a practical computation scheme, we notice that the derivative of the free energy with respect to (one of) the coupling coordinate(s) has the form

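$$\frac{\partial F}{\partial \lambda}(\lambda) = \frac{\int \left[ \partial U(\mathbf{r}^N; \lambda)/\partial \lambda \right] \exp[-U(\mathbf{r}^N; \lambda)/kT]\, d\mathbf{r}^N}{\int \exp[-U(\mathbf{r}^N; \lambda)/kT]\, d\mathbf{r}^N} = \left\langle \frac{\partial U}{\partial \lambda} \right\rangle_\lambda \tag{11}$$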

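where the brackets $\langle \cdot \rangle_\lambda$ indicate an average over the ensemble corresponding to the hybrid energy function $U(\mathbf{r}^N; \lambda)$. Second and higher derivatives can be obtained similarly. In the general case of several coupling parameters $\lambda_1, \lambda_2, \ldots$, the second derivatives are

$$\frac{\partial^2 F}{\partial \lambda_i\, \partial \lambda_j}(\lambda_1, \lambda_2, \ldots) = \left\langle \frac{\partial^2 U(\mathbf{r}^N; \lambda_1, \lambda_2, \ldots)}{\partial \lambda_i\, \partial \lambda_j} \right\rangle_{\lambda_1, \lambda_2, \ldots} - \frac{1}{kT} \left[ \left\langle \frac{\partial U}{\partial \lambda_i} \frac{\partial U}{\partial \lambda_j} \right\rangle_{\lambda_1, \lambda_2, \ldots} - \left\langle \frac{\partial U}{\partial \lambda_i} \right\rangle_{\lambda_1, \lambda_2, \ldots} \left\langle \frac{\partial U}{\partial \lambda_j} \right\rangle_{\lambda_1, \lambda_2, \ldots} \right] \tag{12}$$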

The same relations (11) and (12) hold for the Gibbs free energy in the (N, p, T) ensemble. Equation (11) is also valid for a quantum mechanical system. Note that for a linear coupling scheme such as Eq. (10), the first term on the right of Eq. (12) is zero; the matrix of second derivatives can then be shown to be negative definite, so that the free energy is a concave function of the $\lambda_i$.

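The free energy derivatives are also related to the coefficients in a Taylor expansion of the free energy with respect to λ. In the case of linear coupling, we let $V_{BA}/kT \to \lambda (U_B - U_A)/kT$ in Eq. (9); we obtain

$$\left. \frac{\partial^n F}{\partial \lambda^n} \right|_{\lambda = 0} = -kT\, c_n \tag{13}$$

where $c_n$ is the nth cumulant of $-(U_B - U_A)/kT$ (in the A ensemble). For n = 1 this reduces to Eq. (11) with linear coupling, $\partial F/\partial \lambda = \langle U_B - U_A \rangle_A$.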

To compute derivatives of F at the point $(\lambda_1, \lambda_2, \ldots)$ numerically, a simulation is performed with the hybrid energy function $U(\mathbf{r}^N; \lambda_1, \lambda_2, \ldots)$ and the appropriate energy derivatives are averaged. This procedure is repeated for a few discrete values of the coupling parameter(s), spanning the interval from 0 to 1. The resulting free energy derivatives are then interpolated and integrated numerically to yield $F_B - F_A$. Many applications use a linear coupling and rely only on first derivatives calculated at evenly spaced points; however, efficiency can be improved by using somewhat more complicated schemes [20,24]. This general approach is known as thermodynamic integration.
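The procedure just described can be sketched for a toy system. In the Python example below, which is an illustration under stated assumptions rather than a production protocol, A and B are one-dimensional harmonic wells with force constants $k_A$ and $k_B$, coupled linearly as in Eq. (10). The hybrid system is then itself harmonic, so each "simulation" is replaced by exact Gaussian sampling. The first derivative of Eq. (11) is averaged at evenly spaced λ values and integrated with the trapezoidal rule; the function name and parameters are hypothetical.

```python
import math
import random


def ti_harmonic(k_a, k_b, n_lambda=11, n_samples=20000, kT=1.0, seed=3):
    """Thermodynamic integration between U_A = k_a x^2/2 and U_B = k_b x^2/2."""
    rng = random.Random(seed)
    lambdas = [i / (n_lambda - 1) for i in range(n_lambda)]
    derivs = []
    for lam in lambdas:
        k_lam = (1.0 - lam) * k_a + lam * k_b    # hybrid force constant
        sigma = math.sqrt(kT / k_lam)
        acc = 0.0
        for _ in range(n_samples):
            x = rng.gauss(0.0, sigma)            # sample the hybrid ensemble
            acc += 0.5 * (k_b - k_a) * x * x     # dU/dlambda = U_B - U_A
        derivs.append(acc / n_samples)
    # Trapezoidal rule over the evenly spaced lambda grid
    h = lambdas[1] - lambdas[0]
    integral = h * (sum(derivs) - 0.5 * (derivs[0] + derivs[-1]))
    return integral, lambdas, derivs


delta_f, lambdas, derivs = ti_harmonic(k_a=1.0, k_b=4.0)
exact = 0.5 * math.log(4.0)                      # (kT/2) ln(k_b/k_a), kT = 1
print(delta_f, exact)
```

For this model the averaged derivative decreases as λ increases, in line with the concavity noted after Eq. (12). In a real application each λ value requires its own equilibrated simulation, and the spacing of the λ grid is a tunable choice.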

C. Dummy Atoms and Endpoint Corrections

Figure 3 represents an illustrative biological application: an Asp → Asn mutation, carried out either in solution or in complex with a protein [25,26]. The calculation uses a hybrid amino acid with both an Asp and an Asn side chain. For convenience, we divide the system into subsystems or “blocks” [27]: Block 1 contains the ligand backbone as well as the solvent and protein (if present); block 2 is the Asp moiety of the hybrid ligand side chain; block 3 is the Asn moiety. We effect the “mutation” by making the Asn side chain gradually appear and the Asp side chain simultaneously disappear. We choose initially the hybrid potential energy function to have the form

$$U(\lambda) = U_{11} + U_{22} + U_{33} + (1 - \lambda)\, U_{12} + \lambda\, U_{13} \tag{14}$$

where $U_{ii}$ represents the interactions within block i and $U_{1i}$ represents the interactions between blocks 1 and i. Blocks 2 and 3 (the two side chains) do not interact. At the Asp endpoint ($\lambda = 0$), the Asn side chain has no interactions with its environment but retains its internal interactions; similarly for Asp at the Asn endpoint ($\lambda = 1$). At intermediate values of λ, the interactions between each side chain moiety and block 1 are weighted by $\lambda$ or $1 - \lambda$.
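The block decomposition of Eq. (14) translates directly into code. The Python sketch below assumes that the five block energies have already been evaluated for a given configuration; the function names and the numerical values are hypothetical and serve only to show the endpoint behavior.

```python
def hybrid_energy(lam, u11, u22, u33, u12, u13):
    """Eq. (14): U = U11 + U22 + U33 + (1 - lam) U12 + lam U13.

    Block 1: environment and ligand backbone; block 2: Asp side chain;
    block 3: Asn side chain. Blocks 2 and 3 never interact.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return u11 + u22 + u33 + (1.0 - lam) * u12 + lam * u13


def hybrid_energy_derivative(u12, u13):
    """dU/dlambda for Eq. (14); this is the quantity averaged in Eq. (11)."""
    return u13 - u12


# Hypothetical block energies for one configuration (arbitrary units)
blocks = dict(u11=-120.0, u22=-8.0, u33=-6.0, u12=-15.0, u13=-11.0)
asp_endpoint = hybrid_energy(0.0, **blocks)   # Asn side chain decoupled
asn_endpoint = hybrid_energy(1.0, **blocks)   # Asp side chain decoupled
midpoint = hybrid_energy(0.5, **blocks)
print(asp_endpoint, asn_endpoint, midpoint)
```

Note that the internal terms $U_{22}$ and $U_{33}$ are present at both endpoints, which is why neither endpoint corresponds exactly to the physical molecule.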

This protocol has an important feature: Neither endpoint corresponds exactly to the biomolecule of interest. Each endpoint represents an artificial construct involving several “dummy” atoms, i.e., atoms that have no interactions with their environment. An important question is, therefore, how can we relate the endpoint systems to the exact systems of interest? The use of dummy atoms is quite general in alchemical free energy calculations; despite the great flexibility with which the potential energy function can be manipulated, it is often difficult, inefficient, or impossible to connect the exact biochemical systems of interest through a thermodynamic perturbation or thermodynamic integration calculation.

Figure 3 Mutation of a ligand Asp into Asn in solution and bound to a protein. (a) Thermodynamic cycle. (b) Dual topology description: a hybrid ligand with two side chains. “Blocks” are used to define the hybrid energy function [Eq. (14)]. Only the ligand is shown; the environment is either solvent or the solvated protein. (c) Single-topology description.

In the present case, each endpoint involves—in addition to the fully interacting solute—an intact side chain fragment without any interactions with its environment. This fragment is equivalent to a molecule in the gas phase (acetamide or acetate) and contributes an additional term to the overall free energy that is easily calculated from ideal gas statistical mechanics [18]. This contribution is similar but not identical at the two endpoints. However, the corresponding contributions are the same for the transformation in solution and in complex with the protein; therefore, they cancel exactly when the upper and lower legs of the thermodynamic cycle are subtracted (Fig. 3a).

Although the above protocol leads to a simple endpoint correction, it poses sampling problems in the simulation steps immediately before the endpoints. Indeed, as λ → 0, the CA–CBB bond force constant is reduced to a very small value [similarly for CA–CBA (Fig. 3b) as λ → 1], and the corresponding side chain fragment begins to wander extensively. However, its excursions will never approach the free wandering of an ideal gas fragment, because of the finite (albeit small) force constant and the limited length of the simulation. Thus, the system will not approach the λ = 0 or λ = 1 endpoints closely or smoothly enough, and a discontinuity will occur in the calculated free energy. Mathematically, the free energy derivative can be shown in this example to go to infinity at the endpoints as $1/\lambda$ [or $1/(1 - \lambda)$]; i.e., the free energy varies very rapidly and has a singularity at each endpoint [28].

A better protocol scales the non-bonded interactions $U_{12}^{\rm nb}$, $U_{13}^{\rm nb}$ but leaves the covalent terms intact [25]. This removes the sampling problems discussed (and the free energy singularity) but substantially complicates the form of the endpoint correction. Indeed, at each endpoint, the “dummy” side chain fragments remain attached to other interacting atoms through covalent energy terms. Therefore, they contribute to the partition function at the endpoints through both kinetic and potential energy terms. Although the corresponding free energy contribution usually cannot be calculated rigorously, approximate calculations can be made. In the present example, if one neglects (or turns off) dihedral terms that couple the interacting and noninteracting parts of the ligand, then the dummy atoms are coupled to the other atoms only through bond and angle terms, and these can be factored out of the configurational integral [28]. The resulting free energy contribution then cancels exactly when the upper and lower legs of the thermodynamic cycle of Figure 3a are subtracted. Other protocols make use of positional restraints for the noninteracting groups at the endpoint, tethering them to a specific point or region instead of to interacting atoms [29,30]. The free energy associated with this tethering can be calculated exactly and can be shown to cancel in a well-chosen thermodynamic cycle. For these reasons, there is usually no need to calculate such endpoint corrections explicitly in applications that compare two legs of a cycle. However, when a single leg of a cycle is being treated, considerable care must be used to correct for dummy atom contributions. For a detailed discussion of the finer points associated with dummy atoms and endpoint corrections, see Ref. 28.

The “annihilation” of a particle in a condensed environment to give a dummy atom can lead to another endpoint singularity, arising from the van der Waals energy term. Indeed, if the van der Waals interaction energy is scaled by a factor $\lambda^n$, the free energy derivative goes to infinity as $\lambda^{n/4 - 1}$ when λ → 0 [31]. If n = 1 [linear coupling, as in Eq. (10)], there is a free energy singularity and the free energy derivative must be extrapolated to the endpoint λ = 0 with care (e.g., using the theoretical form $\lambda^{-3/4}$). If $n \geq 4$, the free energy derivative remains finite and there is no singularity.
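Although the derivative diverges for n < 4, the singularity is integrable, so the contribution of the interval between λ = 0 and the first simulated point can be estimated analytically. The Python sketch below shows one possible way to do this and is not a procedure taken from the text: it assumes the derivative follows the pure power law $A\lambda^{n/4 - 1}$ on $[0, \lambda_1]$, fixes A from the derivative measured at $\lambda_1$, and integrates in closed form. The function names are hypothetical.

```python
def vdw_endpoint_exponent(n):
    """Exponent p of the limiting form dF/dlambda ~ lambda**p, p = n/4 - 1."""
    return n / 4.0 - 1.0


def endpoint_contribution(lam1, dF_dlam1, n=1):
    """Integral of A * lambda**p from 0 to lam1, with p = n/4 - 1.

    A is fixed by the derivative measured at lam1 (an assumption: the
    power law is taken to hold exactly on the whole interval).
    """
    p = vdw_endpoint_exponent(n)
    if p <= -1.0:
        raise ValueError("singularity is not integrable for n <= 0")
    amplitude = dF_dlam1 / lam1 ** p
    return amplitude * lam1 ** (p + 1.0) / (p + 1.0)


# Synthetic check with linear coupling (n = 1): dF/dlambda = 2 * lambda**(-3/4)
lam1 = 0.0625
measured = 2.0 * lam1 ** (-0.75)
contribution = endpoint_contribution(lam1, measured, n=1)
print(contribution)
```

For n = 1 the result reduces to $4\lambda_1$ times the derivative measured at $\lambda_1$; a trapezoidal rule applied naively down to λ = 0 cannot be used because the integrand is infinite there.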