Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf

6

Internal Coordinate Simulation Method

Alexey K. Mazur

Institut de Biologie Physico-Chimique, CNRS, Paris, France

I. INTRODUCTION

In this chapter I outline the general principles of modeling biomacromolecules with internal coordinates as independent variables. This approach was generally preferred in the early period of computer conformational analysis, when computer hardware resources were severely limited [1]. In the last two decades, mainly because of the growing interest in molecular dynamics (MD), Cartesian coordinate approaches have gradually become predominant, as one readily sees just by looking at the index of this book. Nevertheless, internal coordinates continue to be employed, notably in conformational searches based on energy minimization and Monte Carlo (MC) [2] and in normal mode analysis [3]. My main objective is to give a consistent exposition of the basic algorithms of this methodology and its underlying philosophy, with special emphasis on recent advances in internal coordinate molecular dynamics (ICMD) techniques.

More traditional applications of internal coordinates, notably normal mode analysis and MC calculations, are considered elsewhere in this book. In the recent literature there are excellent discussions of specific applications of internal coordinates, notably in studies of protein folding [4] and energy minimization of nucleic acids [5].

II. INTERNAL AND CARTESIAN COORDINATES

The term “internal coordinates” usually refers to bond lengths, valence angles, and dihedrals. They completely define the relative atomic positions, thus giving an alternative to the Cartesian coordinate description of molecular structures. Dihedrals corresponding to rotations around single bonds are the most important, because all other internal coordinates are usually considered fixed at their standard values; the representation thus obtained is referred to as the standard geometry approximation [6]. For both proteins and nucleic acids the standard geometry approximation reduces the number of degrees of freedom from 3N to approximately 0.4N, where N is the total number of atoms. Freezing the “unimportant” variables accelerates minimization of the potential energy as well as equilibration in Monte Carlo calculations, simply because the dimension of the space is the principal parameter that determines the theoretical rate of convergence of iterative algorithms. It is also important that higher order minimizers, which require much computer memory to store the Hessian matrix, remain affordable even for very large systems. It should be noted, however, that because of the nonlinear relationship between internal and Cartesian coordinates, the distinction between them does not reduce to the foregoing simple arithmetic. To begin with, let us consider the following instructive example.
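Before turning to that example, the definitions and the arithmetic above can be made concrete with a minimal Python sketch. It is not taken from the chapter: the function names are hypothetical, the dihedral sign follows the common IUPAC-style convention, and the memory estimate assumes a dense Hessian stored in double precision (8 bytes per entry).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bond_length(a, b):
    """Distance between atoms a and b."""
    return _norm(_sub(b, a))

def valence_angle(a, b, c):
    """Angle a-b-c at the central atom b, in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    # Clamp to [-1, 1] to guard against round-off before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def dihedral(a, b, c, d):
    """Torsion angle a-b-c-d around the b-c bond, in degrees, in (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of planes abc and bcd
    y = _norm(b2) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def dense_hessian_megabytes(n_variables, bytes_per_entry=8):
    """Memory for a dense n x n Hessian (assumption: no sparsity, no symmetry savings)."""
    return n_variables ** 2 * bytes_per_entry / 1e6

# Illustrative system size (an assumption, not a value from the text).
n_atoms = 10_000
print(dense_hessian_megabytes(3 * n_atoms))         # all 3N degrees of freedom
print(dense_hessian_megabytes(int(0.4 * n_atoms)))  # about 0.4N with standard geometry
```

For the assumed 10,000 atoms, the dense Hessian needs about 7200 MB in 3N variables and about 128 MB in 0.4N variables, a ratio of (3/0.4)² ≈ 56.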

Figure 1 compares the courses of energy minimization with different choices of coordinates. A standard geometry initial conformation was minimized in three modes: (1) with all degrees of freedom and Cartesian coordinates as variables, (2) with all degrees of freedom but internal coordinates as variables, and (3) with fixed standard geometry. All computations were made with the same program code employing a conjugate gradient minimizer with analytical gradients. Figure 1a demonstrates that, as expected, the minimum is most rapidly found with the standard geometry approximation. With all degrees of freedom the structure changes much more slowly, but we note that the rate of convergence is noticeably higher when internal rather than Cartesian coordinates are used, even though the space dimension is 3N in both cases. The internal coordinate minimization goes faster because internal coordinates better correspond to the local potential energy landscape. The energy gradient is an invariant vector that does not depend on the choice of coordinates, and so is the direction computed by the minimizer. Once it is chosen, however, the minimizer moves the structure along a straight line in the corresponding space. In Cartesian coordinates the profiles of the potential energy are very complex, and any straight path quickly goes to a wall. In contrast, curved atomic trajectories corresponding to straight lines in the internal coordinate space make possible much longer moves.

Figure 1 The course of energy minimization of a DNA duplex with different choices of coordinates. The rate of convergence is monitored by the decrease of the RMSD from the final local minimum structure, which was very similar in all three cases, with the number of gradient calls. The RMSD was normalized by its initial value. CC, IC, and SG stand for Cartesian coordinates, 3N internal coordinates, and standard geometry, respectively.
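The remark about straight lines in the two coordinate spaces can be illustrated with a toy planar model, sketched below under stated assumptions. It is not the calculation behind Figure 1: the three-atom chain A-B-C, the unit bond length, and all function names are hypothetical. Atom C is moved between the same two end points, once along a straight line in the internal coordinate (the direction angle of the B-C bond) and once along a straight line in Cartesian space.

```python
import math

BOND = 1.0       # B-C bond length in arbitrary units (assumed)
B = (1.0, 0.0)   # position of atom B; atom A sits at the origin

def place_c(theta_deg):
    """Cartesian position of C for a given direction angle of the B-C bond."""
    t = math.radians(theta_deg)
    return (B[0] + BOND * math.cos(t), B[1] + BOND * math.sin(t))

def bc_length(c):
    """Length of the B-C bond for a given position of C."""
    return math.hypot(c[0] - B[0], c[1] - B[1])

theta_start, theta_end = 0.0, 90.0
c_start, c_end = place_c(theta_start), place_c(theta_end)

def path_internal(s):
    """Point at fraction s of a straight line in the internal coordinate theta."""
    return place_c(theta_start + s * (theta_end - theta_start))

def path_cartesian(s):
    """Point at fraction s of a straight line between the Cartesian end points."""
    return tuple(p + s * (q - p) for p, q in zip(c_start, c_end))

for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"s={s:.2f}  internal: {bc_length(path_internal(s)):.3f}  "
          f"Cartesian: {bc_length(path_cartesian(s)):.3f}")
```

Along the internal coordinate path the bond length stays at 1, while halfway along the Cartesian path it is compressed to about 0.71. With a stiff bond-stretching term such a compression means a steep rise in energy, which is the "wall" that forces a Cartesian minimizer to take short steps.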

A clear manifestation of the foregoing effect is exhibited in Figure 1b. This graph shows results of a similar minimization test but with additional harmonic restraints that pulled atomic Cartesian coordinates to the final minimum energy values. Now the potential energy landscape in Cartesian coordinate space is greatly simplified, giving a dramatic acceleration of convergence compared to internal coordinates. As a result, convergence appears even faster than with the standard geometry approximation in spite of the difference in the number of variables. In practice, regardless of the number of variables and the type of minimizer, internal coordinates are always preferable in unconstrained minimization. In contrast, for example, in crystallographic root-mean-square refinement with a high weight of experimental restraints Cartesian coordinates should give faster convergence and lower final R factors.

Local energy minimization is arguably the clearest domain in molecular modeling, but we see that even here the difference between the two coordinate sets is far from trivial. It becomes much more complicated, however, when the specific features of macromolecular systems are considered. One feature is the multiple minima problem often discussed in connection with protein folding [2]. It is usually tackled with hybrid MC and MD techniques such as simulated annealing or MC minimization. Common examples are protein folding by global minimization of some target function (not necessarily energy) and structure determination based on experimental data. In these calculations, called conformational searches, one looks for the structures that satisfy certain conditions and does not care how well the intermediate steps correspond to physical reality. The standard geometry approximation offers a whole list of specific advantages for such studies.

First, larger MC steps are possible due to the same effect as in the foregoing minimization example. Second, larger MD steps are possible because freezing of bond lengths and bond angles eliminates the fastest motions. Third, molecular models can tolerate strong stimulation, such as by elevated temperature and strong stochastic forces, and still maintain a correct geometry of chemical groups. In addition, freezing of bond lengths and bond angles removes the small-scale “roughness” from the energy landscape of a macromolecule, thus vastly reducing the density of insignificant local minima. Exact evaluation of such density is a difficult task, but nevertheless this intuitive suggestion agrees with many practical observations. For example, in terms of root-mean-square distance (RMSD) of atomic coordinates, the standard geometry approximation results in a significantly larger radius of convergence for energy minimization from random states [7]. A similar effect has been reported for simulated annealing of protein conformations in crystallographic refinement [8].
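The second point, on MD step size, can be made concrete with a short calculation. It is only a sketch under stated assumptions: the function names are hypothetical, the value of 3000 cm⁻¹ is an assumed typical wavenumber for a bond stretch involving hydrogen, the value of 1000 cm⁻¹ is the figure quoted later in this section for bond length and bond angle oscillations, and "about ten steps per period" is a commonly quoted rule of thumb rather than a statement from the chapter.

```python
SPEED_OF_LIGHT_CM_PER_S = 2.99792458e10  # exact by definition of the metre

def vibration_period_fs(wavenumber_cm1):
    """Period of a harmonic vibration, in femtoseconds, from its wavenumber in cm^-1."""
    frequency_hz = SPEED_OF_LIGHT_CM_PER_S * wavenumber_cm1
    return 1.0e15 / frequency_hz

def suggested_step_fs(wavenumber_cm1, steps_per_period=10):
    """Time step that samples the fastest vibration steps_per_period times (assumed rule of thumb)."""
    return vibration_period_fs(wavenumber_cm1) / steps_per_period

for wavenumber in (3000.0, 1000.0):
    print(f"{wavenumber:6.0f} cm^-1: period {vibration_period_fs(wavenumber):5.1f} fs, "
          f"step about {suggested_step_fs(wavenumber):4.1f} fs")
```

A 3000 cm⁻¹ vibration has a period of about 11 fs and a 1000 cm⁻¹ vibration about 33 fs, so under the assumed rule of thumb the step is limited to roughly 1 to 3 fs as long as such motions are present. Once they are frozen, the limit is set by the next fastest motion in the model.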

At present, conformational searches provide the most important application of computer molecular modeling in biology. In contrast, in statistical physics, from which MC and MD methods were originally borrowed, they are primarily used for studying physical phenomena connected with thermal molecular motions. In such investigations exhaustive sampling is indispensable. In simple words this means that if an event is considered, it must occur many times in MC or MD trajectories, and if a parameter is measured, every state that contributes a distinct individual value to the average must be visited many times. Unfortunately, with the presently available computer power, hardly any biologically important event and hardly any system can be both correctly and accurately modeled in such a sense. Nevertheless, this line of research has many long-term prospects in molecular biophysics, and in the remaining part of this section I will briefly comment on the problems connected with the application of internal coordinates in such studies.

In “true simulations” physical realism is the goal, and the question arises: what part of such realism is sacrificed with the elimination of “unimportant” degrees of freedom? This issue appears to be rather complicated. It has been debated many times in the literature, but no consensus seems to have been reached [6,9–16]. Without going into details, I briefly summarize here the two opposite lines of argumentation, denoting them (A) and (B).

(A1) Freezing of bonds and angles deforms the phase space of the molecule and perturbs the time averages. The MD results, therefore, require a complicated correction with the so-called metric tensor, which undermines any gain in efficiency due to elimination of variables [10,17–20].

(B1) The metric effect is very significant in special theoretical examples, like a freely jointed chain. In simulations of polymer solutions of alkanes, however, it only slightly affects the static ensemble properties even at high temperatures [21]. Its possible role in common biological applications of MD has not yet been studied. With the recently developed fast recursive algorithms for computing the metric tensor [22], such corrections became affordable, and comparative calculations will probably appear in the near future.

(B2) With their frequencies beyond 1000 cm⁻¹, the bond length and bond angle oscillations occupy the ground state at room temperature. The classical harmonic treatment makes them “too flexible.”

(A2) In spite of the high individual frequencies, bond length and bond angle vibrations participate in quasi-classical low frequency collective normal modes. Bond angle bending is necessary for the flexibility of five-membered rings, which plays a key role in the polymorphism of nucleic acids.

(B2) Usually, the role of these vibrations is not crucial, and with bond lengths and bond angles fixed the corresponding collective modes are only modified, not eliminated. Significant variations of valence angles in strained structures, as in furanose rings of nucleic acids, can be treated with special algorithms.

(A3) Bond lengths and bond angles vary in protein crystal structures.

(B3) These variations are related to the refinement procedures much more than to the experimental data [23] and are generally larger than in high resolution structures of small molecules. In MD calculations with harmonic bond lengths and bond angles they are still higher.

(A4) Bond angle bending makes a nonnegligible contribution to conformational entropy and can affect computed equilibrium populations [11].

(B4) The corresponding estimates are valid only in harmonic approximation; therefore, they are inapplicable to normal temperature conditions. The harmonic