Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf

Modeling in NMR Structure Determination

B. Molecular Dynamics Simulated Annealing

In Cartesian coordinates, molecular dynamics-based simulated annealing (MDSA) refinement consists of the numerical solution of Newton’s equations of motion (see Chapter 3). The specific advantage of molecular dynamics over energy minimization is the larger radius of convergence due to possible uphill motions over large energy barriers (Fig. 4). Together with variation of temperature or energy scales, very powerful minimization strategies can be implemented.
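The larger radius of convergence can be illustrated on a toy problem. The following minimal sketch assumes a one-dimensional tilted double-well function in place of Ehybrid, reduced units with a Boltzmann constant of 1, and a simple Langevin-type heat bath as the temperature control; the function names (`energy`, `descend`, `anneal`) and all parameter values are illustrative assumptions, not part of any refinement program.

```python
import math
import random

def energy(x):
    # Toy stand-in for the target function: a tilted double well with a
    # local minimum near x = +0.96 and the global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x

def force(x):
    # Negative gradient of energy(x).
    return -(4.0 * x * (x * x - 1.0) + 0.3)

def descend(x, step=0.01, n_steps=5000):
    # Plain steepest descent: it can only move downhill.
    for _ in range(n_steps):
        x += step * force(x)
    return x

def anneal(x, t_start=2.0, t_end=0.01, n_steps=20000, dt=0.01,
           mass=1.0, gamma=1.0, seed=0):
    # Dynamics with a slowly decreasing bath temperature (assumed
    # Langevin coupling; k_B = 1 in these reduced units).
    rng = random.Random(seed)
    v = rng.gauss(0.0, math.sqrt(t_start / mass))
    for step in range(n_steps):
        # Exponential cooling from t_start to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        noise = math.sqrt(2.0 * gamma * temperature * dt / mass)
        v += dt * (force(x) / mass - gamma * v) + noise * rng.gauss(0.0, 1.0)
        x += dt * v
    return x

if __name__ == "__main__":
    stuck = descend(1.2)
    best = min((anneal(1.2, seed=s) for s in range(8)), key=energy)
    print("minimization:", stuck, energy(stuck))
    print("annealing   :", best, energy(best))
```

Started in the higher well, the minimizer stays there, whereas the annealing runs can cross the barrier while the temperature is still high. Taking the best of several independent runs mirrors the usual practice of calculating an ensemble of structures.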

Scaling of the temperature, of the overall weight on Ehybrid, and of all masses mi are formally equivalent [31]. The independent scaling of each contribution El by its weight factor wl gives rise to a large number of possible simulated annealing schemes. We call annealing schemes that vary the wl independently ‘‘generalized annealing schemes.’’ The initial velocities are usually assigned from a Maxwell distribution at the desired starting temperature, and the temperature is controlled (e.g., by coupling to a heat bath [56]). For the use of MD as an optimization technique, it is convenient to use uniform masses mi = m for all i [39]. This, in combination with uniform energy constants in the force field, allows the use of larger time steps in the molecular dynamics, because differences in vibrational frequencies are avoided (the time step is determined by the highest vibrational frequency).
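The velocity assignment and temperature control described above can be sketched as follows. This is a minimal illustration under stated assumptions: reduced units with a Boltzmann constant of 1, uniform masses, and a weak-coupling velocity rescaling of the Berendsen type as one common form of heat bath coupling. The function names and the parameters `dt` and `tau` are hypothetical.

```python
import math
import random

K_B = 1.0  # Boltzmann constant in reduced units (assumption)

def maxwell_velocities(n_atoms, mass, temperature, rng):
    # Each Cartesian velocity component is Gaussian with
    # variance k_B * T / m (Maxwell distribution).
    sigma = math.sqrt(K_B * temperature / mass)
    return [[rng.gauss(0.0, sigma) for _ in range(3)] for _ in range(n_atoms)]

def instantaneous_temperature(velocities, mass):
    # T = 2 * E_kin / (N_dof * k_B), with 3 degrees of freedom per atom.
    kinetic = 0.5 * mass * sum(c * c for v in velocities for c in v)
    return 2.0 * kinetic / (3 * len(velocities) * K_B)

def berendsen_scale(t_current, t_target, dt, tau):
    # Velocity scaling factor for weak coupling to a bath at t_target;
    # tau is the coupling time constant.
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_current - 1.0))

if __name__ == "__main__":
    rng = random.Random(1)
    vel = maxwell_velocities(2000, 1.0, 300.0, rng)
    print("initial temperature:", instantaneous_temperature(vel, 1.0))
    lam = berendsen_scale(instantaneous_temperature(vel, 1.0), 100.0, 0.1, 1.0)
    vel = [[lam * c for c in v] for v in vel]
    print("after one rescaling:", instantaneous_temperature(vel, 1.0))
```

Because the kinetic energy scales with the square of the velocities, each rescaling moves the instantaneous temperature a fraction dt/tau of the way toward the bath temperature.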

Recently, MD constrained to torsion angle space [torsion angle dynamics (TAD)] was introduced to refinement calculations [33,57,58]. Earlier versions of the equations of motion for molecular dynamics in torsion angle space were very inefficient to solve owing to the need for a matrix inversion at every time step [59]. Newer algorithms break down the necessary operations into a series of multiplications of small matrices and are therefore much more efficient [60,61].

Figure 4 (a) Solving Newton’s equations of motion at constant energy allows the molecule to overcome energy barriers in Ehybrid. The quantities ri and mi are the coordinate vectors and masses, respectively, of atom i, and Ehybrid is the target function of the minimization problem, containing different contributions from experimental data and from a priori knowledge (i.e., the force field).

(b) With temperature variation, powerful minimization schemes can be implemented, allowing for large energy barriers to be crossed at high temperatures, ultimately leading to the identification of the ‘‘global’’ minimum.

Nilges

The application of TAD in standard MD calculations may require the development of dedicated force fields to emulate the missing flexibility by a reparametrization of the non-bonded potential. This is not necessary for its application in NMR structure calculation, because the energy parameters developed for this purpose already assume in most cases a rigid covalent geometry, either by employing high force constants or by using only torsion angles as degrees of freedom. The advantage of TAD is that the geometry of the molecule does not have to be maintained by high force constants, which lead to high vibrational frequencies. Therefore, longer time steps at higher temperatures can be used with TAD, and the refinement protocols are numerically more stable.

C. Folding Random Structures by Simulated Annealing

Various simulated annealing protocols have been suggested to fold random structures with experimental restraints. The choice of starting structure determines the optimal protocol. The most obvious choices are random distributions of dihedral angles (as indicated in Fig. 2). The minimization procedure has to try to avoid entanglement of the chain while properly relaxing large forces in the starting conformation, which could arise from overlapping atoms or distance restraints violated by a large amount. This is achieved by a combination of soft non-bonded interactions, a violation-tolerant form of the distance restraint potential, and high temperature dynamics.
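A violation-tolerant restraint term can be pictured as a flat-bottomed potential that is harmonic for small violations and grows only linearly for large ones, so that a badly violated restraint in a random starting structure does not produce an enormous force. The sketch below is a simplified form chosen for illustration; the functional form, the function name, and the parameters `k` and `switch` are assumptions and do not reproduce the potential of any particular program.

```python
def soft_restraint_energy(d, lower, upper, k=1.0, switch=1.0):
    """Energy of one distance restraint with bounds [lower, upper].

    Zero inside the bounds, harmonic up to a violation of `switch`,
    then linear with the slope the harmonic branch has at `switch`
    (value and first derivative are continuous there).
    """
    if d < lower:
        violation = lower - d
    elif d > upper:
        violation = d - upper
    else:
        return 0.0
    if violation <= switch:
        return k * violation ** 2
    return k * switch ** 2 + 2.0 * k * switch * (violation - switch)

if __name__ == "__main__":
    for d in (3.0, 4.5, 14.0):
        print(d, soft_restraint_energy(d, lower=2.0, upper=4.0))
```

For a violation of 10 distance units, this form gives an energy of 19k instead of the 100k of a purely harmonic term, and the force stays bounded at 2k times `switch`.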

To achieve convergence with an annealing protocol using Cartesian dynamics, multistage generalized annealing protocols were introduced (Fig. 5). The first stage is a high temperature search where the molecule adopts approximately the correct fold. In this stage, the non-bonded interactions are reduced to allow the chain to intersect itself, and the representation of the non-bonded interactions may be further simplified by computing them for only a fraction of the atoms. The protocol is also adaptable to ambiguous restraint lists by a specifically reduced weight wambig on the ADRs [20], which is varied independently of wunambig (see Fig. 5). A detailed description can be found elsewhere [20].
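A generalized annealing protocol of this kind amounts to a table of stages in which the temperature and each weight follow their own ramp. The following sketch shows one possible way to encode such a schedule. The stage names, step counts, temperatures, and weight values are placeholders chosen for illustration and are not the values of the published protocol [20].

```python
from dataclasses import dataclass

W_COVALENT = 1.0  # covalent terms kept at full weight throughout

@dataclass(frozen=True)
class Stage:
    name: str
    n_steps: int
    temperature: tuple  # (start, end), ramped linearly
    w_vdw: tuple        # (start, end), ramped geometrically
    w_unambig: tuple    # (start, end), ramped linearly
    w_ambig: tuple      # (start, end), ramped linearly

# Placeholder numbers, for illustration only.
PROTOCOL = [
    Stage("high-T search", 1000, (2000.0, 2000.0), (0.003, 0.003), (1.0, 1.0), (0.1, 0.1)),
    Stage("cooling 1", 1000, (2000.0, 1000.0), (0.003, 1.0), (1.0, 1.0), (0.1, 1.0)),
    Stage("cooling 2", 500, (1000.0, 100.0), (1.0, 1.0), (1.0, 1.0), (1.0, 1.0)),
]

def ramp(start, end, fraction, geometric=False):
    # Interpolate between start and end; fraction runs from 0 to 1.
    if geometric:
        return start * (end / start) ** fraction
    return start + (end - start) * fraction

def schedule(protocol=PROTOCOL):
    # Yield the temperature and all weights for every MD step.
    for stage in protocol:
        for step in range(stage.n_steps):
            frac = step / max(stage.n_steps - 1, 1)
            yield {
                "stage": stage.name,
                "temperature": ramp(*stage.temperature, frac),
                "w_covalent": W_COVALENT,
                "w_vdw": ramp(*stage.w_vdw, frac, geometric=True),
                "w_unambig": ramp(*stage.w_unambig, frac),
                "w_ambig": ramp(*stage.w_ambig, frac),
            }

if __name__ == "__main__":
    rows = list(schedule())
    for row in (rows[0], rows[1500], rows[-1]):
        print(row)
```

Keeping each weight as an independent column makes it straightforward to add further terms, or to vary only the temperature as in the simpler protocols.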

With mostly unambiguous data, this protocol has been successfully used for proteins with up to 160 residues [62]. Although virtually all structures converge to the correct fold for small proteins, we observe that approximately one-third of the structures are misfolded for larger proteins, for low data density, or for many ambiguities (see, e.g., Ref. 63). We have also used this protocol for most structure calculations with the automated NOE assignment method ARIA discussed in the next section.

Calculations starting from random Cartesian coordinates and using standard Newton dynamics illustrate the flexibility of the generalized annealing approach. The extremely bad geometry of the initial structures requires that the weights on the covalent geometry terms start with very low values, which are then slowly increased during the calculation. All torsion angle terms (dihedral angles, planarity, and chirality) are removed from Ehybrid because of the difficulty in calculating them for random Cartesian structures. Enantiomer selection and regularization are necessary with this protocol much as they are with MMDG embedded structures. The principal advantage of the use of random Cartesian coordinates over that of random dihedral coordinates is that the former give better sampling for highly ambiguous data. The initial structure does not bias toward intraresidue or sequential assignments of ambiguous NOEs.

Figure 5 Schematic representation of a Cartesian dynamics protocol starting from random torsion angles. The weights w for non-bonded (i.e., van der Waals) interactions, unambiguous distance restraints, and ambiguous distance restraints are varied independently. The covalent interactions are maintained with full weight, wcovalent, for the entire protocol. Weights for other experimental terms may be varied in an analogous way. Coupling constant restraints and anisotropy restraints are usually used only in a refinement stage.

A TAD protocol [58] may have a three-stage organization similar to that of the Cartesian MDSA protocol (Fig. 5), with two TAD stages (one high temperature, one cooling) and a final Cartesian cooling stage. The starting temperatures can be set to much higher values (up to 50,000 K). Weights on experimental and non-bonded terms differ in the different stages, with higher weights on the experimental terms in the high temperature stage, but the principal parameter that is varied during simulated annealing is the temperature. TAD protocols used with the program DYANA [64] are even simpler, with only temperature variation in the simulated annealing stage, which is followed by conjugate gradient minimization.

In general, TAD shows better convergence than Cartesian dynamics. For nucleic acid structures, for example, the convergence rate can be very low both for MMDG and for Cartesian dynamics owing to the low restraint density. The sampling of conformational space by TAD for very sparse data sets should be comparable to Cartesian dynamics protocols and better than for MMDG without metrization. Depending on the implementation, ambiguous distance restraints can be used throughout the protocol as with Cartesian dynamics. With its implementation in several NMR structure determination programs, including X-plor [65], CNS [66], and DYANA [33], the field seems to converge toward this calculation method.

IV. AUTOMATED INTERPRETATION OF NOE SPECTRA

The methods discussed in this section extend the original concept of deriving structures from experimental NMR data in two ways. First, during the structure calculation, part of the assignment problem is solved automatically. This allows specification of the NOE data in a form closer to the raw data, which makes the refinement similar to X-ray refinement. Second, the quality of the data is assessed. The methods have been recently reviewed in more detail [64,67].

A. Recognition of Incorrect Restraints: The Structural Consistency Hypothesis

Structure calculation algorithms in general assume that the experimental list of restraints is completely free of errors. This is usually true only in the final stages of a structure calculation, when all errors (e.g., in the assignment of chemical shifts or NOEs) have been identified, often in a laborious iterative process. Many effects can produce inconsistent or incorrect restraints, e.g., artifact peaks, imprecise peak positions, and insufficient error bounds to correct for spin diffusion.

Restraints due to artifacts may, by chance, be completely consistent with the correct structure of the molecule. However, the majority of incorrect restraints will be inconsistent with the correct structural data (i.e., the correct restraints and information from the force field). Inconsistencies in the data produce distortions in the structure and violations in some restraints. Structural consistency is often taken as the final criterion to identify problematic restraints. It is, for example, the central idea in the ‘‘bound-smoothing’’ part of distance geometry algorithms, and it is intimately related to the way distance data are usually specified: The error bounds are set wide enough that all data are geometrically consistent.

The problem in using violations to identify incorrect restraints is twofold. First, one has to distinguish between violations that appear because of insufficient convergence power of the structure calculation algorithm and violations due to incorrect restraints. Violations caused by incorrect restraints will be consistent (i.e., they will be present in the majority of structures), whereas insufficient convergence will produce violations that are randomly distributed. This reasoning has been formalized in the ‘‘self-correcting distance geometry’’ method [22,29], which calculates structures iteratively and modifies the list of restraints after each iteration. Consistent violations are identified by calculating the fraction of structures in which a particular restraint is violated by more than a threshold (e.g., 0.5 Å). If this fraction exceeds a certain value (e.g., 0.5), the restraint is removed from the list for the calculation in the next iteration.
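The bookkeeping for one iteration of this criterion can be sketched as follows. The data layout (a list of per-structure distance lists and a list of lower and upper bounds) and the function names are assumptions made for illustration; the thresholds are the example values quoted in the text.

```python
def violation_fractions(distances, bounds, threshold=0.5):
    """Fraction of structures violating each restraint by more than threshold.

    distances[s][r] is the distance (in Å) for restraint r in structure s;
    bounds[r] is the (lower, upper) bound of restraint r.
    """
    n_structures = len(distances)
    fractions = []
    for r, (lower, upper) in enumerate(bounds):
        n_violated = 0
        for structure in distances:
            d = structure[r]
            violation = max(lower - d, d - upper, 0.0)
            if violation > threshold:
                n_violated += 1
        fractions.append(n_violated / n_structures)
    return fractions

def consistent_restraints(distances, bounds, threshold=0.5, max_fraction=0.5):
    # Indices of restraints kept for the next iteration.
    fractions = violation_fractions(distances, bounds, threshold)
    return [r for r, f in enumerate(fractions) if f <= max_fraction]

if __name__ == "__main__":
    bounds = [(1.8, 5.0)] * 3
    distances = [
        [4.0, 6.0, 5.2],
        [4.5, 6.2, 5.8],
        [3.9, 5.9, 4.9],
        [5.3, 6.5, 5.0],
    ]
    print(violation_fractions(distances, bounds))
    print(consistent_restraints(distances, bounds))
```

In this made-up ensemble the second restraint is violated by more than 0.5 Å in every structure and is dropped, whereas the third is violated in only one of four structures, a pattern more suggestive of insufficient convergence, and is kept.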

Second, it is possible that an incorrect restraint produces a systematic violation of another restraint. Currently, this can be ruled out only by manually checking the results,