Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.), Computational Biochemistry and Biophysics

Comparative Protein Structure Modeling

285

is that constraints or restraints derived from a number of different sources can easily be added to the homology-derived restraints. For example, restraints could be provided by rules for secondary structure packing [84], analyses of hydrophobicity [85] and correlated mutations [86], empirical potentials of mean force [87], nuclear magnetic resonance (NMR) experiments [88], cross-linking experiments, fluorescence spectroscopy, image reconstruction in electron microscopy, site-directed mutagenesis [89], intuition, etc. In this way, a comparative model, especially in the difficult cases, could be improved by making it consistent with available experimental data and/or with more general knowledge about protein structure.
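The restraint-based formulation described above can be sketched in a few lines. This is a minimal illustration, assuming simple harmonic distance restraints; the class `DistanceRestraint`, its fields, and all numbers are hypothetical and do not reproduce the functional forms used by any real modeling program. The point is only that restraints from different sources (homology, NMR, cross-linking, and so on) enter one objective function on an equal footing, so adding a new kind of data amounts to appending to a list.

```python
import math
from dataclasses import dataclass

Coord = tuple[float, float, float]


@dataclass
class DistanceRestraint:
    """Hypothetical harmonic restraint on the distance between atoms i and j."""
    i: int
    j: int
    target: float  # preferred distance in Å
    sigma: float   # tolerance in Å; a smaller sigma gives a stiffer restraint
    source: str    # where the restraint came from, e.g. "homology" or "NMR"

    def violation(self, coords: list[Coord]) -> float:
        d = math.dist(coords[self.i], coords[self.j])
        return ((d - self.target) / self.sigma) ** 2


def objective(coords: list[Coord], restraints: list[DistanceRestraint]) -> float:
    """Sum of restraint violations; every source enters on the same footing."""
    return sum(r.violation(coords) for r in restraints)


# Homology-derived restraints, later extended with an experimental one.
restraints = [
    DistanceRestraint(0, 1, target=3.8, sigma=0.1, source="homology"),
    DistanceRestraint(1, 2, target=3.8, sigma=0.1, source="homology"),
]
restraints.append(DistanceRestraint(0, 2, target=6.0, sigma=1.0, source="NMR"))

coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(objective(coords, restraints))
```

Here the toy coordinates satisfy the homology restraints exactly but violate the added NMR distance, so the objective is no longer zero; an optimizer minimizing it would be pushed toward a model consistent with both kinds of information.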

D. Loop Modeling

In comparative modeling, target sequences often have inserted residues relative to the template structures or have regions that are structurally different from the corresponding regions in the templates. Thus, no structural information about these inserted or conformationally variable segments can be extracted from the template structures. These regions frequently correspond to surface loops. Loops often play an important role in defining the functional specificity of a given protein framework, forming the active and binding sites. The accuracy of loop modeling is a major factor determining the usefulness of comparative models in applications such as ligand docking. Loop modeling can be seen as a mini protein folding problem: the correct conformation of a given segment of a polypeptide chain has to be calculated mainly from the sequence of the segment itself. However, loops are generally too short to provide sufficient information about their local fold. Even identical decapeptides do not always have the same conformation in different proteins [90,91]. Some additional restraints are provided by the core anchor regions that span the loop and by the structure of the rest of the protein that cradles the loop. Although many loop modeling methods have been described, it is still not possible to model loops longer than approximately eight residues correctly and with high confidence [239].

There are two main classes of loop modeling methods: (1) the database search approaches, where a segment that fits on the anchor core regions is found in a database of all known protein structures [62,94], and (2) the conformational search approaches [95–97]. There are also methods that combine these two approaches [92,98,99].

The database search approach to loop modeling is accurate and efficient when a database of specific loops is created to address the modeling of the same class of loops, such as β-hairpins [100], or loops on a specific fold, such as the hypervariable regions in the immunoglobulin fold [94,101]. For example, an analysis of the hypervariable immunoglobulin regions resulted in a series of rules that allowed a very high accuracy of loop prediction in other members of the family. These rules were based on the small number of conformations for each loop and the dependence of the loop conformation on its length and certain key residues. There have been attempts to classify loop conformations into more general categories, thus extending the applicability of the database search approach to more cases [102–105]. However, the database methods are limited by the fact that the number of possible conformations increases exponentially with the length of a loop. As a result, only loops up to four to seven residues long have most of their conceivable conformations present in the database of known protein structures [106,107]. Even according to the more optimistic estimate, approximately 30% and 60% of all the possible eight- and nine-residue loop conformations, respectively, are missing from the database [106]. This is made even worse by the requirement for an overlap of at least one residue

286 Fiser et al.

between the database fragment and the anchor core regions, which means that the modeling of a five-residue insertion requires at least a seven-residue fragment from the database [70]. Despite the rapid growth of the database of known structures, there is no possibility of covering most of the conformations of a nine-residue segment in the foreseeable future. On the other hand, most of the insertions in a family of homologous proteins are shorter than nine residues [108,239].
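The two scaling arguments in this paragraph can be made concrete with a little arithmetic. The sketch assumes, purely for illustration, that each loop residue independently adopts one of three discrete backbone states; real backbone conformations are neither discrete nor independent, so only the trend matters. The function names are hypothetical.

```python
def min_fragment_length(insertion: int, overlap: int = 1) -> int:
    """Shortest database fragment that covers an insertion when `overlap`
    residues must be superposed on each of the two anchor core regions."""
    return insertion + 2 * overlap


def conformation_count(loop_length: int, states_per_residue: int = 3) -> int:
    """Toy count of loop conformations, assuming each residue independently
    adopts one of `states_per_residue` discrete backbone states."""
    return states_per_residue ** loop_length


for n in range(4, 10):
    print(n, min_fragment_length(n), conformation_count(n))
```

The required fragment length grows only linearly with the insertion, whereas the number of conformations the database would need to cover is multiplied by the number of per-residue states for every added residue, which is why coverage collapses for longer loops.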

To overcome the limitations of the database search methods, conformational search methods were developed [95,96,109]. There are many such methods, exploiting different protein representations, objective function terms, and optimization or enumeration algorithms. The search algorithms include the minimum perturbation method [97], molecular dynamics simulations [92,110,111], genetic algorithms [112], Monte Carlo and simulated annealing [113,114], multiple copy simultaneous search [115–117], self-consistent field optimization [118], and enumeration based on graph theory [119].
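As an illustration of one of these search strategies, the following is a minimal simulated annealing sketch with the Metropolis acceptance rule. It assumes a loop described only by its dihedral angles and a hypothetical toy energy in which each angle prefers a fixed target value; a real loop energy would include steric, bonded, and statistical terms, and the cooling schedule and step size here are arbitrary choices.

```python
import math
import random


def toy_energy(angles, targets):
    """Hypothetical loop energy (radians): each dihedral prefers a target value."""
    return sum(1.0 - math.cos(a - t) for a, t in zip(angles, targets))


def anneal(targets, steps=20000, t_start=1.0, t_end=1e-3, max_step=0.3, seed=0):
    rng = random.Random(seed)
    angles = [rng.uniform(-math.pi, math.pi) for _ in targets]
    energy = toy_energy(angles, targets)
    best_angles, best_energy = list(angles), energy
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling schedule
    temperature = t_start
    for _ in range(steps):
        i = rng.randrange(len(angles))  # perturb one dihedral at a time
        trial = list(angles)
        trial[i] += rng.uniform(-max_step, max_step)
        trial_energy = toy_energy(trial, targets)
        delta = trial_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            angles, energy = trial, trial_energy
            if energy < best_energy:
                best_angles, best_energy = list(angles), energy
        temperature *= cooling
    return best_angles, best_energy


targets = [-1.0, 2.0, 0.5, -2.5, 1.2]
angles, energy = anneal(targets)
print(round(energy, 4))
```

The high early temperature lets the search cross barriers, while the low final temperature makes it behave almost like greedy descent. Running `anneal` with several seeds and keeping the lowest energy mirrors, in spirit, the practice of selecting the lowest-energy conformation from many independent optimizations.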

We now describe a new loop modeling protocol in the conformational search class [239]. It is implemented in the program MODELLER (Table 1). The modeling procedure consists of optimizing the positions of all non-hydrogen atoms of a loop with respect to an objective function that is a sum of many spatial restraints. Many different combinations of restraints were explored. The best set of restraints includes the bond length, bond angle, and improper dihedral angle terms from the CHARMM22 force field [80,81]; statistical preferences for the main chain and side chain dihedral angles [31]; and statistical preferences for non-bonded contacts that depend on the two atom types, their distance through space, and their separation in sequence [120]. The objective function was optimized with the method of conjugate gradients combined with molecular dynamics and simulated annealing. Typically, the loop prediction is the lowest-energy conformation out of 500 independent optimizations. The algorithm allows straightforward incorporation of additional spatial restraints, including those provided by template fragments, disulfide bonds, and ligand binding sites. To simulate comparative modeling problems, the loop modeling procedure was evaluated by predicting loops of known structure in only approximately correct environments. Such environments were obtained by using molecular dynamics simulations to distort the anchor regions, corresponding to the three residues at either end of the loop, and all the atoms within 10 Å of the native loop conformation by up to 2–3 Å. In the case of five-residue loops in the correct environments, the average error was 0.6 Å, as measured by local superposition of the loop main chain atoms alone (C, N, Cα, O). In the case of eight-residue loops in the correct environments, 90% of the loops had less than 2 Å main chain RMS error, with an average of less than 1.2 Å (Fig. 6).
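The accuracy figures above depend on how the model and the native loop are superposed before the RMSD is computed. The sketch below, assuming NumPy is available, uses the standard Kabsch least-squares fit; the function names and the random test coordinates are hypothetical. Fitting on the loop atoms only (`fit_idx` equal to `loop_idx`) corresponds to the local superposition used in the text, while fitting on all atoms mimics a superposition of the complete structure.

```python
import numpy as np


def superpose(mobile: np.ndarray, target: np.ndarray):
    """Least-squares rigid-body fit (Kabsch algorithm) of mobile onto target.

    Both arrays hold (N, 3) coordinates of equivalent atoms. Returns the
    rotation matrix and the two centroids."""
    mobile_c, target_c = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mobile_c).T @ (target - target_c)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate a reflection
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return rotation, mobile_c, target_c


def loop_rmsd(model, reference, fit_idx, loop_idx) -> float:
    """Fit on atoms `fit_idx`, then report the RMSD over atoms `loop_idx`."""
    rotation, model_c, ref_c = superpose(model[fit_idx], reference[fit_idx])
    moved = (model - model_c) @ rotation.T + ref_c
    diff = moved[loop_idx] - reference[loop_idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy demonstration with random coordinates (not real protein data).
rng = np.random.default_rng(0)
reference = rng.normal(scale=5.0, size=(20, 3))
theta = np.radians(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
model = reference @ rot_z.T + np.array([1.0, -2.0, 0.5])
loop, everything = np.arange(8, 16), np.arange(20)
model[loop] += np.array([2.0, 0.0, 0.0])  # displace the "loop" rigidly by 2 Å
local = loop_rmsd(model, reference, fit_idx=loop, loop_idx=loop)
global_ = loop_rmsd(model, reference, fit_idx=everything, loop_idx=loop)
print(f"local: {local:.3f} Å, global: {global_:.3f} Å")
```

In the toy example the displaced loop is perfect in its own frame (local RMSD near zero) but misplaced relative to the rest of the structure, which only the fit on all atoms reveals; this is why the two kinds of comparison carry different information.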

E. Side Chain Modeling

As for loops, side chain conformation is predicted from similar structures and from steric or energy considerations [5,121]. The geometry of disulfide bridges is modeled from disulfide bridges in protein structures in general [122,123] and from equivalent disulfide bridges in related structures [79]. Modeling the stability and conformation of point mutations by free energy perturbation simulations is not discussed here [124–127].

Vasquez [121] reviewed and commented on various approaches to side chain modeling, emphasizing two effects on side chain conformation: the coupling between the main chain and the side chains, and the continuous nature of the distributions of side chain dihedral angles. For example, 5–30% of side chains in crystal structures differ significantly from their rotamer conformations [128], and 6% of the χ1 or χ2 values are not within 40° of any rotamer conformation [129]. Both effects appear to be important when correlating packing energies and stability [130]. The correct energetics may be obtained for the wrong reasons; i.e., the side chains adopt distorted conformations to compensate for the rigidity of the backbone. Correspondingly, backbone shifts may hinder the use of these methods when the template structures are related at less than 50% sequence identity [131]. This is consistent with the X-ray structure of a variant of λ repressor, which reveals that the protein accommodates potentially disruptive residues with shifts in its α-helical arrangement and with only limited changes in side chain orientations [132]. Some attempts to include backbone flexibility in side chain modeling have been described [118,133,134], but these methods are not yet generally applicable.

Figure 6

Oxidoreductase (2nac), loop residues 28–35. Anchor distortion 1.2 Å. Sample models of varying accuracy for an eight-residue loop in an approximately correct protein environment. The calculated loops (shaded) are compared with the X-ray structure (black). Three levels of accuracy are illustrated: high accuracy, corresponding to a backbone RMSD below 1 Å (top); medium accuracy, corresponding to a backbone RMSD between 1 and 2 Å (middle); and low accuracy, corresponding to a backbone RMSD above 2 Å (bottom). The panels on the left compare the loop backbone conformations after least-squares superposition of the complete protein structure. The panels on the right compare the loop backbone conformations after local superposition of the loops. The RMSD values are quoted for the main chain atoms only. The fraction of the loops modeled at each accuracy level is given in the rightmost column. The figure was prepared using MOLSCRIPT [236].

Significant correlations were found between side chain dihedral angle probabilities and backbone Φ, Ψ values [129,135]. These correlations go beyond the dependence of side chain conformation on the secondary structure [136]. For example, the preferred rotamers can vary within the same secondary structure, with the changes in the Φ, Ψ dihedral angles as small as 20° [135]. Since these changes are smaller than the differences between closely related homologs, the prediction of the side chain conformation generally cannot be uncoupled from backbone prediction. This partly explains why the conformation of equivalent side chains in homologous structures is useful in side chain modeling [31]. A backbone-dependent rotamer library for amino acid side chains was developed and used to construct side chain conformations from main chain coordinates [135]. This automated method first places the side chains according to the rotamer library and then removes steric clashes by combinatorial energy minimization. It was also demonstrated that simple arguments based on conformational analysis could account for many features of the observed dependence of the side chain rotamers on the backbone [135]. Recently, the main chain–dependent side chain rotamer library was recalculated and extensively evaluated [129] (Table 1). The accuracy of the method was 82% for the χ1 dihedral angle and 72% for both χ1 and χ2 dihedral angles when the backbones of templates in the range from 30% to 90% sequence identity were used; a prediction was deemed correct when it was within 40° of the target crystal structure value.
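Two ingredients of this paragraph can be sketched directly: looking up the preferred rotamer for a given (Φ, Ψ) backbone, and scoring predictions with the 40° criterion, which must respect the periodicity of dihedral angles. The sketch assumes 10° (Φ, Ψ) bins and a tiny dictionary-based library with made-up probabilities; neither the bin width nor the data format is taken from the published library.

```python
def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two dihedral angles (degrees)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, observed, n_chi: int = 1, tolerance: float = 40.0) -> float:
    """Fraction of residues whose first n_chi angles are all within
    `tolerance` degrees of the crystal-structure values."""
    hits = sum(
        all(angular_difference(p, o) <= tolerance
            for p, o in zip(pred[:n_chi], obs[:n_chi]))
        for pred, obs in zip(predicted, observed)
    )
    return hits / len(predicted)


def phi_psi_bin(phi: float, psi: float, width: float = 10.0) -> tuple[int, int]:
    """Index of the (phi, psi) grid cell, wrapping angles into [-180, 180)."""
    return (int(((phi + 180.0) % 360.0) // width),
            int(((psi + 180.0) % 360.0) // width))


def most_probable_chi1(library, phi: float, psi: float) -> float:
    """Pick the chi1 rotamer with the highest probability for this backbone."""
    probabilities = library[phi_psi_bin(phi, psi)]
    return max(probabilities, key=probabilities.get)


# Hypothetical library entry for one helical (phi, psi) cell: chi1 -> probability.
library = {phi_psi_bin(-65.0, -45.0): {-60.0: 0.5, 180.0: 0.3, 60.0: 0.2}}
print(most_probable_chi1(library, -63.0, -42.0))

# (chi1, chi2) predictions versus crystal values for three residues.
predicted = [(60.0, 180.0), (-60.0, 60.0), (180.0, -60.0)]
observed = [(75.0, 100.0), (-170.0, 60.0), (-175.0, -50.0)]
print(chi_accuracy(predicted, observed, n_chi=1), chi_accuracy(predicted, observed, n_chi=2))
```

Note that 180° and −175° differ by only 5°; a naive subtraction would wrongly count such a prediction as a miss. Requiring both χ1 and χ2 to be correct can only lower the score, which matches the ordering of the two accuracies quoted in the text.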

Chung and Subbiah [131,137] gave an elegant structural explanation for the rapid decrease in the conservation of side chain packing as the sequence identity decreases below 30%. Although the fold is maintained, the pattern of side chain interactions is generally lost in this range of sequence similarity [138]. Two sets of computations were done for two sample protein sequences: the side chain conformation was predicted by maximizing packing on the fixed native backbone and on a fixed backbone with approximately 2 Å RMSD from the native backbone; a 2 Å RMSD generally corresponds to the differences between the conserved cores of two proteins related at 25–30% sequence identity. The side chain predictions based on the two kinds of backbone turned out to be unrelated. Thus, inasmuch as packing reflects the true laws determining side chain conformation, a backbone with less than 30% sequence identity to the sequence being modeled is no longer sufficiently restraining to result in the correct packing of the buried side chains.
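Predicting side chains by optimizing packing on a fixed backbone is a combinatorial problem: every position has a few candidate rotamers, and their choices interact. The sketch below, with entirely hypothetical energies, finds the best assignment by exhaustive enumeration, treating lower energy as better packing. Enumeration grows as (rotamers per position)^(positions), so practical programs rely on pruning or heuristic searches (dead-end elimination and Monte Carlo are common choices) rather than this brute-force loop.

```python
from itertools import product


def pack_side_chains(self_energy, pair_energy):
    """Exhaustively find the lowest-energy rotamer assignment on a fixed backbone.

    self_energy[i][r]: energy of rotamer r at position i against the backbone.
    pair_energy[(i, j)][(r, s)]: interaction between rotamer r at position i
        and rotamer s at position j; missing entries count as zero.
    """
    best_assignment, best_energy = None, float("inf")
    for assignment in product(*(range(len(e)) for e in self_energy)):
        energy = sum(self_energy[i][r] for i, r in enumerate(assignment))
        for (i, j), table in pair_energy.items():
            energy += table.get((assignment[i], assignment[j]), 0.0)
        if energy < best_energy:
            best_assignment, best_energy = assignment, energy
    return best_assignment, best_energy


# Hypothetical energies: rotamer 0 is best for each position in isolation,
# but rotamer 0 at position 0 and rotamer 0 at position 1 clash.
self_energy = [[0.0, 0.5], [0.0, 0.2], [0.0, 0.1]]
pair_energy = {(0, 1): {(0, 0): 5.0}}
print(pack_side_chains(self_energy, pair_energy))
```

Because both energy tables are computed against a particular backbone, recomputing them on a backbone about 2 Å away can change the optimal assignment entirely; this is the toy analogue of the comparison between native and perturbed backbones described above.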

The solvation term is important for the modeling of exposed side chains [139–142]. It was also demonstrated that treating hydrogen bonds explicitly could significantly improve side chain prediction [135,143]. Calculations that do not take into account the solvent, either implicitly or explicitly, introduce errors into the hydrogen-bonding patterns even in the core regions of a protein [142]. Residues with zero solvent accessibility area can still have a significant interaction energy with the solvent atoms [144].

A recent survey analyzed the accuracy of three different side chain prediction methods [134]. These methods were tested by predicting side chain conformations on near-native protein backbones with 4 Å RMSD to the native structures. The three methods included the packing of backbone-dependent rotamers [129], the self-consistent mean-field approach to positioning rotamers based on their van der Waals interactions [145],