Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf

Modeling in NMR Structure Determination


inary structure (see, e.g., Ref. 19). Alternatively, it can be treated like an ambiguous NOE (see below) [20].

B. Distance Restraints

In an isolated two-spin system, the NOE (or, more accurately, the slope of its buildup) depends simply on $d^{-6}$, where d is the distance between two protons. The difficulties in the interpretation of the NOE originate in deviations from this simple distance dependence of the NOE buildup (due to spin diffusion caused by other nearby protons, and internal dynamics) and from possible ambiguities in its assignment to a specific proton pair. Molecular modeling methods to deal with these difficulties are discussed further below.

Usually, simplified representations of the data are used to obtain preliminary structures. Thus, lower and upper bounds on the interproton distances are estimated from the NOE intensity [10], using appropriate reference distances for calibration. The bounds should include the estimates of the cumulative error due to all sources such as peak integration errors, spin diffusion, and internal dynamics.
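The calibration step can be illustrated with a minimal sketch that uses the isolated-spin-pair relation (intensity proportional to $d^{-6}$) stated at the start of this section. The function name `noe_to_bounds`, the fractional error margin `rel_error`, and the floor `min_lower` on the lower bound are assumptions made for this example; they are not values taken from the text.

```python
def noe_to_bounds(intensity, ref_intensity, ref_distance,
                  rel_error=0.25, min_lower=1.8):
    """Estimate (lower, upper) distance bounds in angstroms from an NOE intensity.

    Assumes intensity ~ d**-6 and calibrates against one reference proton pair
    of known distance. rel_error is a hypothetical margin that stands in for
    the cumulative error (integration, spin diffusion, internal dynamics).
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    # d / d_ref = (I_ref / I)**(1/6) for an isolated spin pair
    d = ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)
    lower = max(min_lower, d * (1.0 - rel_error))
    upper = d * (1.0 + rel_error)
    return lower, upper
```

A peak 64 times weaker than the reference peak corresponds to twice the reference distance, so the bounds widen in absolute terms as the NOE gets weaker.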

The dispersion of proton chemical shifts is usually incomplete in the one-dimensional spectrum of a macromolecule, resulting in many degenerate resonances. As a result, few NOEs can be assigned only on the basis of resonance assignments and without any knowledge of the structure of the molecule [21,22]. Unless ambiguities can be resolved by using additional information, such as the peak shape or data from heteronuclear experiments, the remaining NOEs are ambiguous and cannot be converted into restraints on distances between proton pairs.

Nevertheless, the information from ambiguous NOEs can be converted directly into structural restraints. The structure calculation or refinement with ambiguous data can proceed in a way directly analogous to refinement with standard distance restraints, restraining a ‘‘$d^{-6}$-summed distance’’ D by means of a distance target function (see Fig. 2). By analogy with standard unambiguous distance restraints between atom pairs, we call these ‘‘ambiguous distance restraints’’ (ADRs) [21]. Similar methods can be applied to ambiguities in other experimental data, such as hydrogen bonds [23,24], disulfide bridges [21], and paramagnetic shift broadening and chemical shift differences [25,26].
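As a small illustration, the summed distance $D = (\sum_a d_a^{-6})^{-1/6}$ described in the caption of Fig. 2 can be computed as follows. The function name `summed_distance` and the coordinate layout (one row per proton) are assumptions of this sketch.

```python
import numpy as np

def summed_distance(coords, pairs):
    """Return D = (sum_a d_a**-6)**(-1/6) over all candidate proton pairs.

    coords: (n_atoms, 3) array of positions.
    pairs:  list of (i, j) index pairs, one per assignment possibility.
    """
    coords = np.asarray(coords, dtype=float)
    i, j = np.asarray(pairs).T
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    return float(np.sum(d ** -6.0) ** (-1.0 / 6.0))
```

Because of the $d^{-6}$ weighting, D is always slightly shorter than the shortest contributing distance, so one close proton pair is enough to satisfy the restraint and distant assignment possibilities contribute almost nothing.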

The distances in the structure are restrained to the upper and lower bounds derived from NOEs by ‘‘flat-bottom’’ potentials. The potential should be gradient-bounded and have an asymptotic region for large violations that is linear [27–29] (see Fig. 2). Then, for large restraint violations, the force approaches a maximum value or can even be decreased, depending on the parameters. This makes the optimization numerically more stable and seems to improve convergence by transiently allowing larger violations during the calculation, thus allowing the structure to gradually escape deep local minima.

The limitation of the gradient of the potential is particularly important for calculations with ADRs and for data sets that potentially contain noise peaks, since it facilitates the appearance of violations due to incorrect restraints. A standard harmonic potential would put a high penalty on large violations and would introduce larger distortions into the structure.
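A minimal sketch of such a gradient-bounded flat-bottom potential is given below, using the parameter names of the caption to Fig. 2. The asymptotic form $k_{NOE}(\alpha + \beta\Delta + \gamma/\Delta)$ with $\Delta = D - U$, and the harmonic wall below the lower bound, are assumptions of this example; the text only specifies that α and γ follow from continuity and differentiability at the switching point. Under that assumed form, matching value and slope at $\Delta = \sigma$ gives $\alpha = 3\sigma^2 - 2\beta\sigma$ and $\gamma = \sigma^2(\beta - 2\sigma)$.

```python
def soft_square_energy(D, L, U, k_noe=1.0, sigma=1.0, beta=0.0):
    """Flat-bottom restraint energy with a soft asymptote (assumed form).

    Zero for L <= D <= U, harmonic up to U + sigma, then
    k_noe * (alpha + beta*delta + gamma/delta) with delta = D - U.
    """
    if D < L:
        return k_noe * (L - D) ** 2          # assumed harmonic lower wall
    if D <= U:
        return 0.0                           # inside the bounds: no penalty
    delta = D - U
    if delta <= sigma:
        return k_noe * delta ** 2            # harmonic region
    # Coefficients from continuity of energy and gradient at delta = sigma
    alpha = 3.0 * sigma ** 2 - 2.0 * beta * sigma
    gamma = sigma ** 2 * (beta - 2.0 * sigma)
    return k_noe * (alpha + beta * delta + gamma / delta)
```

With `beta=0.0` the energy levels off for large violations, so the restoring force decreases toward zero; with a positive `beta` the force approaches the constant value `k_noe * beta`.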

C. The Hybrid Energy Approach

Even if the set of data from NMR experiments is as complete as possible, it is insufficient to define the positions of all the atoms in the molecule, simply because most of the data are measured for protons only. The positions of the other atoms have to be inferred, using values of bond lengths, bond angles, planarity, and van der Waals radii that are known a priori.

Nilges

Figure 2 Use of unambiguous or ambiguous distance restraints in an optimization calculation.

(a) The distance D that is restrained can be a distance measured between two protons in the molecule or a ‘‘$(\sum d^{-6})^{-1/6}$ summed distance’’ with contributions from many proton pairs, where the sum runs over all contributions to a cross-peak that are possible due to chemical shift degeneracy. The question marks indicate ambiguities in the assignment of the NOE. For clarity, a situation with only two assignment possibilities is shown. There can be many more possibilities with experimental data.

(b) The restraining potential is gradient bounded to avoid large forces for large violations. $k_{NOE}$ is the energy constant, and U and L are upper and lower bounds derived from the size of the NOE. The parameter σ determines the distance at which the potential switches from harmonic to asymptotic behavior, β is the asymptotic slope of the potential, and the coefficients α and γ are determined such that the potential is continuous and differentiable at U + σ. If D is between L and U, the energy and gradient are zero.

A molecular dynamics force field is a convenient compilation of these data (see Chapter 2). The data may be used in a much simplified form (e.g., in the case of metric matrix distance geometry, all data are converted into lower and upper bounds on interatomic distances, which all have the same weight). Similar to the use of energy parameters in X-ray crystallography, the parameters need not reflect the dynamic behavior of the molecule. The force constants are chosen to avoid distortions of the molecule when experimental restraints are applied. Thus, the force constants on bond angle and planarity are a factor of 10–100 higher than in standard molecular dynamics force fields. Likewise, a detailed description of electrostatic and van der Waals interactions is not necessary and may not even be beneficial in calculating NMR structures.

The problem of finding conformations of the molecule that satisfy the experimental data is then that of finding conformations that minimize a hybrid energy function $E_{hybrid}$, which contains different contributions from experimental data and the force field (see below). These contributions need to be properly weighted with respect to each other. However, if the chosen experimental upper and lower bounds are wide enough to avoid any geometrical inconsistencies between the force field and the data, this relative weight does not play a predominant role.

III. MINIMIZATION PROCEDURES

Finding the minimum of the hybrid energy function is very complex. Similar to the protein folding problem, the number of degrees of freedom is far too large to allow a complete systematic search in all variables. Systematic search methods need to reduce the problem to a few degrees of freedom (see, e.g., Ref. 30). Conformations of the molecule that satisfy the experimental bounds are therefore usually calculated with metric matrix distance geometry methods followed by optimization or by optimization methods alone.

Minimization is often not powerful enough for structure calculations of macromolecules unless it is used with an elaborate protocol (e.g., the ‘‘buildup method’’ [3]). More powerful approaches are based on global optimization of the hybrid energy function by molecular dynamics based simulated annealing [31–33]. Other optimization methods have been suggested for NMR structure calculation, notably Monte Carlo simulated annealing [34] and genetic algorithms [35]. Branch-and-bound algorithms have also been suggested for docking rigid monomers with ambiguous restraints [36] or with very sparse data sets [37]. An important feature of the latter is the addition of a hydrophobic potential [38] to the hybrid energy function, which serves to pack secondary structure elements.

Because the parameter-to-observable ratio is rather low, structures are calculated repeatedly with the same restraints. The aim is a random sampling of the conformational space consistent with the restraints. In metric matrix distance geometry, randomness is achieved by the random selection of distance estimates within the bounds. In optimization calculations, one achieves random searching by either selecting a starting conformation very far from the folded structure (e.g., an extended strand [4]) or by choosing starting conformations that are random (either in torsion angles or in Cartesian coordinates) [3,39].


A. Metric Matrix Distance Geometry

A distance geometry calculation consists of two major parts. In the first, the distances are checked for consistency, using a set of inequalities that distances have to satisfy (this part is called ‘‘bound smoothing’’); in the second, distances are chosen randomly within these bounds, and the so-called metric matrix $(M_{ij})$ is calculated. ‘‘Embedding’’ then converts this matrix to three-dimensional coordinates, using methods akin to principal component analysis [40].

There are many extensive reviews on metric matrix distance geometry [41–44], some of which provide illustrative examples [45,46]. In total, we can distinguish five steps in a distance geometry calculation:

1. Bound smoothing
2. Distance selection and metrization
3. Construction of the metric matrix
4. Embedding
5. Refinement (optimization)

Bound smoothing serves two purposes: to check consistency of the distances and to transfer information between atoms. Distances have to satisfy the triangle inequalities in a metric space of any dimension (the sum of two sides of a triangle has to be larger than the third; see Fig. 3). To ensure consistency of the distances in three-dimensional space, more inequalities would be necessary (the triangle, tetrangle, pentangle, and hexangle inequalities) [41,47]. Only the tetrangle inequality is of practical use, and it is usually not employed because of high computational costs. This inequality transfers information from one diagonal of a tetrangle to the other; in two dimensions this is the parallelogram equation |a b| |c d|.

The most important consequence of bound smoothing is the transfer of information from those atoms for which NMR data are available to those that cannot be observed directly in NMR experiments. Within the original experimental bounds, the minimal distance intervals are identified for which all $N^3$ triangle inequalities can be satisfied. A distance chosen outside these intervals would violate at least one triangle inequality. For example, an NOE between protons $p_i$ and $p_j$ and the covalent bond between $p_j$ and carbon $C_j$ impose upper and lower bounds on the distance between $p_i$ and $C_j$, although this distance is not observable experimentally nor is it part of $E_{chem}$.
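Triangle-inequality bound smoothing can be sketched as two all-pairs passes, one for the upper bounds ($u_{ij} \le u_{ik} + u_{kj}$) and one for the lower bounds ($l_{ij} \ge l_{ik} - u_{kj}$). The function name `smooth_bounds` and the numerical values in the example are assumptions chosen for illustration; production programs use more efficient schemes than this plain $N^3$ loop.

```python
import numpy as np

def smooth_bounds(lower, upper):
    """Tighten symmetric bound matrices with the triangle inequalities."""
    lo = np.array(lower, dtype=float)
    up = np.array(upper, dtype=float)
    n = up.shape[0]
    # Upper bounds: u_ij <= u_ik + u_kj (all-pairs shortest paths).
    for k in range(n):
        up = np.minimum(up, up[:, [k]] + up[[k], :])
    # Lower bounds: l_ij >= l_ik - u_kj and l_ij >= l_kj - u_ik.
    for k in range(n):
        lo = np.maximum(lo, np.maximum(lo[:, [k]] - up[[k], :],
                                       lo[[k], :] - up[:, [k]]))
    if np.any(lo > up + 1e-9):
        raise ValueError("bounds violate the triangle inequalities")
    return lo, up

# Atoms: 0 = proton p_i, 1 = proton p_j, 2 = carbon C_j (illustrative values).
lower = [[0.0, 1.8, 0.0], [1.8, 0.0, 1.09], [0.0, 1.09, 0.0]]
upper = [[0.0, 5.0, 999.0], [5.0, 0.0, 1.09], [999.0, 1.09, 0.0]]
lo, up = smooth_bounds(lower, upper)
print(lo[0, 2], up[0, 2])  # bounds on the unobserved p_i to C_j distance
```

In this example the NOE bounds of 1.8 to 5.0 and the fixed bond length of 1.09 restrict the unobserved $p_i$ to $C_j$ distance to roughly 0.71 to 6.09, which is the transfer of information described above.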

The second step concerns distance selection and metrization. Bound smoothing only reduces the possible intervals for interatomic distances from the original bounds. However, the embedding algorithm demands a specific distance for every atom pair in the molecule. These distances are chosen randomly within the interval, from either a uniform or an estimated distribution [48,49], to generate a trial distance matrix. Uniform distance distributions seem to provide better sampling for very sparse data sets [48].
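Independent distance selection from a uniform distribution can be sketched as follows; `sample_distances` is a hypothetical helper name, and the bound matrices are assumed to be symmetric with zero diagonals.

```python
import numpy as np

def sample_distances(lower, upper, rng=None):
    """Draw one trial distance matrix uniformly within [lower, upper]."""
    rng = np.random.default_rng() if rng is None else rng
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = lower.shape[0]
    iu = np.triu_indices(n, k=1)          # each atom pair once
    d = np.zeros((n, n))
    d[iu] = lower[iu] + rng.random(len(iu[0])) * (upper[iu] - lower[iu])
    return d + d.T                        # symmetric, zero diagonal
```

Each pair is drawn independently of all others, so nothing in this sketch prevents the resulting trial matrix from violating triangle inequalities.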

Note that although the bounds on the distances satisfy the triangle inequalities, particular choices of distances between these bounds will in general violate them. Therefore, if all distances are chosen within their bounds independently of each other (the method that is used in most applications of distance geometry for NMR structure determination), the final distance matrix will contain many violations of the triangle inequalities. The main consequence is a very limited sampling of the conformational space of the embedded structures for very sparse data sets [48,50,51] despite the intrinsic randomness of the technique. In spite of these limitations, the algorithm is remarkably stable in its simplest form.

Figure 3 Flow of a distance geometry calculation. On the left is shown the development of the data; on the right, the operations. $d_{ij}$ is the distance between atoms i and j; $L_{ij}$ and $U_{ij}$ are lower and upper bounds on the distance; $L_{ij}$ and $U_{ij}$ are the smoothed bounds after application of the triangle inequality; $d_{i0}$ is the distance between atom i and the geometric center; N is the number of atoms; $(M_{ij})$ is the metric matrix; $r_i$ is the positional vector of atom i; $e_1$ is the first eigenvector of $(M_{ij})$ with eigenvalue $\lambda_1$; $x_i$, $y_i$, and $z_i$ are the x-, y-, and z-coordinates of atom i. (1–5 correspond to the numbered list of steps above.)

Metrization guarantees that all distances satisfy the triangle inequalities by repeating a bound-smoothing step after each distance choice. The order of distance choice becomes important [48,49,51]; optimally, the distances are chosen in a completely random sequence [49]. Metrization is a very computer-intensive operation. Computer time can be saved by using a partially random sequence [43] and terminating the process after 4N distances [51] (a three-dimensional object is completely specified by 4N − 10 distances).

Metrization leads to a much better sampling of conformational space and dramatically improves the local quality of the structures when few long-range connectivities are present [48,51]. The better sampling of space comes at a certain price: The embedded structures may show errors in the topology that are not seen without metrization [31,43]. This may be due to the enforced propagation of an error in a distance choice to many other distances through the triangle inequality.
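Full metrization with a completely random order of distance choice can be sketched as below. The names `metrize` and `_smooth` are assumptions of this example, and the naive re-smoothing after every choice makes the cost grow steeply with the number of atoms, which is why partial metrization is used in practice. This sketch does not implement the shortcut of stopping after 4N distances.

```python
import numpy as np

def _smooth(lo, up):
    """Triangle-inequality smoothing of symmetric bound matrices."""
    n = up.shape[0]
    for k in range(n):
        up = np.minimum(up, up[:, [k]] + up[[k], :])
    for k in range(n):
        lo = np.maximum(lo, np.maximum(lo[:, [k]] - up[[k], :],
                                       lo[[k], :] - up[:, [k]]))
    return lo, up

def metrize(lower, upper, rng=None):
    """Choose all distances in random order, re-smoothing after each choice."""
    rng = np.random.default_rng() if rng is None else rng
    lo, up = _smooth(np.array(lower, dtype=float), np.array(upper, dtype=float))
    n = up.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for idx in rng.permutation(len(pairs)):
        i, j = pairs[idx]
        # Uniform choice inside the current (already tightened) interval;
        # max() guards against tiny negative widths from rounding.
        width = max(up[i, j] - lo[i, j], 0.0)
        d = lo[i, j] + rng.random() * width
        lo[i, j] = lo[j, i] = up[i, j] = up[j, i] = d
        lo, up = _smooth(lo, up)          # propagate the choice to other pairs
    return up                             # lower and upper bounds now coincide
```

The returned matrix satisfies all triangle inequalities by construction, in contrast to distances drawn independently within the smoothed bounds.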

The metric matrix is the matrix of all scalar products of position vectors of the atoms when the geometric center is placed in the origin. By application of the law of cosines, this matrix can be obtained from distance information only. Because it is invariant against rotation but not translation, the distances to the geometric center have to be calculated from the interatomic distances (see Fig. 3). The matrix allows the calculation of coordinates from distances in a single step, provided that all $N_{atom}(N_{atom} - 1)/2$ interatomic distances are known.
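The construction can be sketched in a few lines. The squared distance of atom i to the geometric center is $d_{i0}^2 = \frac{1}{N}\sum_j d_{ij}^2 - \frac{1}{N^2}\sum_{j<k} d_{jk}^2$, and the law of cosines then gives $M_{ij} = (d_{i0}^2 + d_{j0}^2 - d_{ij}^2)/2$. The function name `metric_matrix` is an assumption of this example.

```python
import numpy as np

def metric_matrix(d):
    """Build the metric matrix from a complete symmetric distance matrix."""
    d2 = np.asarray(d, dtype=float) ** 2
    n = d2.shape[0]
    # Squared distance of each atom to the geometric center;
    # d2.sum() counts every pair twice, hence the factor 2 in the divisor.
    d0_sq = d2.sum(axis=1) / n - d2.sum() / (2.0 * n * n)
    # Law of cosines: r_i . r_j = (d_i0^2 + d_j0^2 - d_ij^2) / 2
    return 0.5 * (d0_sq[:, None] + d0_sq[None, :] - d2)
```

For distances computed from real three-dimensional coordinates, the result equals the matrix of scalar products of the centered position vectors.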

Embedding is the calculation of coordinates from the metric matrix by methods akin to principal component analysis [40,52]. The eigenvectors of the metric matrix contain the principal coordinates of the atoms. If the distances correspond to a three-dimensional object, only three eigenvalues of the matrix are nonzero (see Refs. 41 and 53 for mathematical proofs), and the first eigenvector contains all x-coordinates, the second all y-coordinates, and the third all z-coordinates. If the distances are not consistent with a three-dimensional object (the usual situation with sparse NMR data, when the majority of distances come from the random number generator), there will be more than three positive eigenvalues. The eigenvector expansion is then truncated after the first three eigenvalues; this corresponds to a projection of a higher dimensional object into three-dimensional space.
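A minimal sketch of the embedding step, assuming a symmetric metric matrix as input, is shown below. The function name `embed` is hypothetical. The eigenvectors returned by `numpy.linalg.eigh` are normalized, so each one is scaled by the square root of its eigenvalue to obtain coordinates.

```python
import numpy as np

def embed(metric, dim=3):
    """Project a metric matrix onto its `dim` largest eigenvectors."""
    w, v = np.linalg.eigh(metric)            # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]          # indices of the largest ones
    w_top = np.clip(w[idx], 0.0, None)       # guard against negative values
    # Column k holds the k-th coordinate of every atom.
    return v[:, idx] * np.sqrt(w_top)
```

When the input distances are exactly three-dimensional, the embedded coordinates reproduce all interatomic distances, although the structure may come out rotated or mirrored.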

Refinement of the embedded structures is always necessary to remove distortions in the structure. One shortcoming of the embedding algorithm is that data cannot be weighted according to their certainty in any way. During the projection, bond lengths are distorted in the same way as long-range distances guessed by the random number generator within possibly very wide bounds. Also, chirality information is completely absent during bound smoothing and embedding. The first step in the refinement is the selection of the correct enantiomer, which may be achieved on the basis of the chirality of Cα atoms [1], secondary structure elements, or partial refinement of both enantiomers and choice of the enantiomer with lower energy [51].
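Enantiomer selection from local chirality can be sketched with a signed-volume test. Everything here is an assumption made for illustration: the function names, the representation of each chiral center as a tuple of four atom indices (center and three substituents), and the `expected_sign` parameter, whose correct value depends on the chosen atom ordering.

```python
import numpy as np

def signed_volume(coords, center, a, b, c):
    """Triple product of the three substituent vectors around a center."""
    o = coords[center]
    return float(np.dot(coords[a] - o, np.cross(coords[b] - o, coords[c] - o)))

def fix_handedness(coords, chiral_centers, expected_sign=1.0):
    """Mirror the structure if most chiral centers have the wrong sign."""
    coords = np.asarray(coords, dtype=float)
    votes = sum(np.sign(signed_volume(coords, *q)) * expected_sign
                for q in chiral_centers)
    if votes < 0:
        # Reflection through the yz-plane inverts every chiral volume
        # while leaving all interatomic distances unchanged.
        coords = coords * np.array([-1.0, 1.0, 1.0])
    return coords
```

A majority vote is used because individual centers may be distorted in the embedded structure.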

If the distances satisfy the triangle inequalities, they are embeddable in some dimension. One possible solution is therefore to try to start refinement in four dimensions and use the allowed deviation into the fourth dimension as an additional annealing parameter [43,54]. The advantages of refinement in higher dimensions are similar to those of soft atoms discussed below.

A time-saving variant of the distance geometry procedure described above is substructure embedding. Here, about a third of the atoms are chosen after the bound smoothing step and embedded. This procedure was originally used to improve the performance of the distance geometry algorithm by adding the distances from the embedded and partially refined structures back to the distance list [1]. The substructures can be refined directly with simulated annealing by filling in the missing atoms approximately in their correct positions [55].