
Modeling in NMR Structure Determination


Other approaches use complete relaxation matrix analysis to obtain spin diffusion corrected distances from the NOE intensities [71,72], which are then used in conventional distance-restrained optimization. This is more efficient in the use of CPU time, because the gradients do not have to be evaluated and a full relaxation matrix calculation is necessary only a few times in a refinement. These methods invert the calculation of the NOE intensities (Fig. 8) to calculate the relaxation matrix (Rij) from a complete spectrum (NOEij), including the diagonal peaks. Since a complete spectrum cannot be obtained experimentally for macromolecules, approximate iterative schemes are used. One method uses preliminary structures to calculate NOE intensities that are merged with the incomplete experimental data to obtain a complete spectrum [76]. A next generation of structures is then calculated with the resulting distances, and an improved estimate of the relaxation matrix can be obtained. Another approach “shortcuts” the structure calculation and uses properties of the relaxation matrix itself (e.g., the relation of the diagonal elements to the off-diagonal elements) to iteratively correct the relaxation matrix [77]. Integration errors in the data have to be properly taken into account; otherwise they can lead to incorrect distance estimates [78].
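The forward and inverse relations that these methods rely on can be illustrated numerically. In the standard full relaxation matrix formalism the NOE intensity matrix at mixing time τm is A = exp(−R τm), so a complete intensity matrix gives back R = −ln(A)/τm. The sketch below assumes a symmetric relaxation matrix, so that an eigendecomposition is sufficient; the function names, the three-spin rate values, and the mixing time are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

def noe_from_relaxation(R, tau_m):
    """Forward calculation: A = exp(-R * tau_m) for a symmetric matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)
    return eigvecs @ np.diag(np.exp(-eigvals * tau_m)) @ eigvecs.T

def relaxation_from_noe(A, tau_m):
    """Inverse calculation: R = -ln(A) / tau_m for a complete, symmetric A."""
    eigvals, eigvecs = np.linalg.eigh(A)
    if np.any(eigvals <= 0.0):
        # Noise or missing peaks can make the matrix logarithm undefined.
        raise ValueError("intensity matrix is not positive definite")
    return -(eigvecs @ np.diag(np.log(eigvals)) @ eigvecs.T) / tau_m

# Hypothetical three-spin system (rates in 1/s, mixing time in s).
R_true = np.array([[ 2.0, -0.5, -0.1],
                   [-0.5,  2.2, -0.4],
                   [-0.1, -0.4,  1.8]])
tau_m = 0.1

A = noe_from_relaxation(R_true, tau_m)    # "complete spectrum" incl. diagonal
R_back = relaxation_from_noe(A, tau_m)    # recovered relaxation matrix
```

With a complete, noise-free intensity matrix the round trip recovers R exactly (to numerical precision). The iterative schemes described above are needed precisely because experimental matrices are incomplete and noisy, in which case the logarithm step can fail or amplify errors.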

VI. INFLUENCE OF INTERNAL DYNAMICS ON THE EXPERIMENTAL DATA

Internal dynamics of the macromolecule influences all experimental data that can be measured by NMR. The d⁻⁶ weighting of the NOE makes the averaging very nonlinear, and the measured distance may appear much shorter than the average distance (see Fig. 9).
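A small numerical example may make the nonlinearity concrete. The sketch below assumes that the NOE-derived distance corresponds to the average of d⁻⁶ raised to the power −1/6; the function name and the two-state distances are hypothetical.

```python
def noe_effective_distance(distances, weights=None):
    """Return <d^-6>^(-1/6) for a set of instantaneous distances (in Å)."""
    if weights is None:
        # Assume equally populated states when no populations are given.
        weights = [1.0 / len(distances)] * len(distances)
    mean_inv6 = sum(w * d ** -6 for w, d in zip(weights, distances))
    return mean_inv6 ** (-1.0 / 6.0)

# Hypothetical proton pair that spends half its time at 2 Å and half at 6 Å.
distances = [2.0, 6.0]
arithmetic_mean = sum(distances) / len(distances)   # 4.0 Å
effective = noe_effective_distance(distances)        # about 2.24 Å
```

The short-distance state dominates the average, so the apparent distance lies close to 2 Å even though the arithmetic mean is 4 Å.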

The “distance geometry approach” to the problem is to use appropriately large error bounds for the distances and a rough estimate of dynamics from the diversity of the final ensemble of structures. Although this approach has given qualitatively satisfactory agreement with dynamics measurements [79] and theoretical calculations in some cases [63,80], it is somewhat unsatisfactory. The diversity reflects the distribution of experimental data. Internal dynamics does influence this distribution, but experimental artifacts and overlap are also important factors. In addition, the diversity will depend on exactly how the distance bounds are derived from the NOE data. Furthermore, local dynamics can result in locally conflicting data, while multiple conformations appear in the calculated structures predominantly in regions with little data.

The measured NOE is an average over time and a large ensemble of structures, whereas in a standard structure calculation the lower and upper bounds refer to instantaneous distances. Methods have been proposed to account for the averaging in the interproton distance by fitting to the measured distance either a dynamics trajectory, by means of time-averaged distance restraints [81], or an ensemble of structures [82]. Formally, an ensemble-averaged distance restraint is equivalent to an ambiguous distance restraint; the two differ only by a scale factor. Therefore, we can understand an ensemble-averaged NOE as an NOE that is ambiguous between different conformers in the ensemble.
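The scale factor between the two kinds of restraint can be checked directly. The sketch below assumes the usual d⁻⁶ summed form for an ambiguous restraint, (Σ d⁻⁶)^(−1/6), and the d⁻⁶ mean over C conformers for the ensemble average; under these assumptions the ratio is C^(1/6). The function names and the distances are hypothetical.

```python
def ambiguous_distance(distances):
    """Effective distance of an ambiguous restraint: (sum d^-6)^(-1/6)."""
    return sum(d ** -6 for d in distances) ** (-1.0 / 6.0)

def ensemble_averaged_distance(distances):
    """Effective distance averaged over C conformers: (1/C sum d^-6)^(-1/6)."""
    n_conformers = len(distances)
    return (sum(d ** -6 for d in distances) / n_conformers) ** (-1.0 / 6.0)

# Hypothetical distances (Å) for the same proton pair in C = 4 conformers.
conformer_distances = [2.5, 3.0, 4.5, 5.0]

ratio = (ensemble_averaged_distance(conformer_distances)
         / ambiguous_distance(conformer_distances))
expected_ratio = len(conformer_distances) ** (1.0 / 6.0)   # C^(1/6)
```

Because the ratio depends only on the number of conformers, the same restraint code can in principle serve both purposes once the target distance is rescaled.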

The most serious problem with ensemble average approaches is that they introduce many more parameters into the calculation, making the parameter-to-observable ratio worse. The effective number of parameters has to be restrained. This can be achieved by using only a few conformers in the ensemble and by determining the optimum number of conformers by cross-validation [83]. A more indirect way of restraining the effective number of parameters is to restrict the conformational space that the molecule can search during refinement. For example, with a full molecular dynamics force field and low temperatures, only a small fraction of the conformational space is accessible. A more direct way to restrict the number of parameters would be to use motional models. Normal modes have been used, for example, to model NMR order parameters obtained from relaxation studies [84].

Nilges

Figure 9 Treating internal dynamics during the refinement process. Due to dynamics and the d⁻⁶ weighting of the NOE, the measured distance may appear much shorter than the average distance. This can be accounted for by using ensemble refinement techniques. In contrast to standard refinement, an average distance is calculated over an ensemble of C structures (ensemble refinement) or a trajectory (time-averaged refinement). The time-averaged distance is defined with an exponential window over the trajectory. T is the total length of the trajectory, t is the time, and τ is a “relaxation time” characterizing the width of the exponential window.
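The time-averaged distance with an exponential window, as described in the caption of Figure 9, can be sketched in discrete form. The exact expression used in the figure is not reproduced in the text, so the version below is an assumption: frames are weighted by exp(−age/τ), the weights are normalized, and the averaging exponent is left as a parameter (6 here, although other exponents are also used in practice). All names and values are hypothetical.

```python
import math

def time_averaged_distance(distances, dt, tau, power=6):
    """Exponentially weighted <d^-power>^(-1/power), evaluated at the last frame.

    distances: instantaneous distances along a trajectory (Å)
    dt:        time between frames
    tau:       width of the exponential memory window (same units as dt)
    """
    n_frames = len(distances)
    total_weight = 0.0
    weighted_sum = 0.0
    for k, d in enumerate(distances):
        age = (n_frames - 1 - k) * dt      # how far in the past frame k lies
        weight = math.exp(-age / tau)      # older frames count less
        total_weight += weight
        weighted_sum += weight * d ** -power
    return (weighted_sum / total_weight) ** (-1.0 / power)

# Hypothetical trajectory: mostly 5 Å, with a recent excursion to 2.5 Å.
trajectory = [5.0] * 90 + [2.5] * 10
d_avg = time_averaged_distance(trajectory, dt=1.0, tau=50.0)
```

A restraint applied to `d_avg` rather than to the instantaneous distance lets the molecule satisfy the NOE through transient close contacts instead of forcing every frame to be short.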

Another principal difficulty is that the precise effect of local dynamics on the NOE intensity cannot be determined from the data. The dynamic correction factor [85] describes the ratio of the effects of distance and angular fluctuations. Theoretical studies based on NOE intensities extracted from molecular dynamics trajectories [86,87] are helpful to understand the detailed relationship between NMR parameters and local dynamics and may lead to structure-dependent corrections. In an implicit way, an estimate of the dynamic correction factor has been used in an ensemble relaxation matrix refinement by including order parameters for proton–proton vectors derived from molecular dynamics calculations [72]. One remaining challenge is to incorporate data describing the local dynamics of the molecule directly into the refinement, in such a way that an order parameter calculated from the calculated ensemble is similar to the measured order parameter.


VII. STRUCTURE QUALITY AND ENERGY PARAMETERS

The well-known difficulties in calculating three-dimensional structures of macromolecules from NMR data mentioned above (sparseness of the data, imprecision of the restraints due to spin diffusion and internal dynamics) also make the validation of the structures a challenging task. The quality of the data [88] and the energy parameters used in the refinement [89] can be expected to influence the quality of structures. Several principles can be used to validate NMR structures.

First, the structure should explain the data. Apart from the energy or target function value returned by the refinement program, this check can be performed with some independent programs (e.g., AQUA/PROCHECK-NMR [90], MOLMOL [91]). The analysis of the deviations from the restraints used in calculating the structures is very useful in the process of assigning the NOE peaks and refining the restraint list. As indicators of the quality of the final structure they are less powerful, because violations have been checked and probably removed. A recent statistical survey of the quality of NMR structures found weak correlations between deviations from NMR restraints and other indicators of structure quality [88].

A similar problem arises with present cross-validated measures of fit [92], because they also are applied to the final clean list of restraints. Residual dipolar couplings offer an entirely different and, owing to their long-range nature, very powerful way of validating structures against experimental data [93]. Similar to cross-validation, a set of residual dipolar couplings can be excluded from the refinement, and the deviations from this set are evaluated in the refined structures.
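The cross-validation idea can be sketched as a split into a working set and a free set, followed by a deviation measure evaluated on the free set only. The function names, the 20% free fraction, and the synthetic coupling values below are assumptions for illustration; in real use the calculated couplings would come from structures refined without the free set, a step that is not shown here.

```python
import math
import random

def split_free_set(n_restraints, free_fraction=0.1, seed=0):
    """Randomly partition restraint indices into (working, free) sets."""
    indices = list(range(n_restraints))
    random.Random(seed).shuffle(indices)
    n_free = max(1, round(free_fraction * n_restraints))
    return sorted(indices[n_free:]), sorted(indices[:n_free])

def rms_deviation(observed, calculated, indices):
    """RMS difference between observed and calculated values on a subset."""
    squared = [(observed[i] - calculated[i]) ** 2 for i in indices]
    return math.sqrt(sum(squared) / len(squared))

# Synthetic couplings (Hz); "calculated" stands in for back-calculated values.
observed = [float(i) for i in range(20)]
calculated = [value + 0.5 for value in observed]

working, free = split_free_set(len(observed), free_fraction=0.2)
free_rms = rms_deviation(observed, calculated, free)
```

Only the free-set deviation is an unbiased indicator, because the working-set deviation has been minimized directly by the refinement.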

Second, the structures should satisfy the a priori information used in the refinement in the form of the energy parameters. Programs like PROCHECK-NMR check for deviation from expected geometries and close non-bonded contacts.

Finally, structural properties that depend directly neither on the data nor on the energy parameters can be checked by comparing the structures to statistics derived from a database of solved protein structures. PROCHECK-NMR and WHAT IF [94] use, e.g., statistics on backbone and side chain dihedral angles and on hydrogen bonds. PROSA [95] uses potentials of mean force derived from distributions of amino acid–amino acid distances.
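As a rough illustration of how a potential of mean force can be derived from a distance distribution, the sketch below applies the inverse Boltzmann relation E(r) = −kT ln[f_pair(r) / f_ref(r)] to binned counts. This is a generic textbook form and is not claimed to be the implementation used by PROSA; the function name, the bin counts, and the choice of reference distribution are all hypothetical.

```python
import math

def potential_of_mean_force(pair_counts, reference_counts, kT=1.0):
    """Inverse Boltzmann energies per distance bin, in units of kT by default."""
    pair_total = sum(pair_counts)
    reference_total = sum(reference_counts)
    energies = []
    for n_pair, n_ref in zip(pair_counts, reference_counts):
        if n_pair == 0 or n_ref == 0:
            # Real applications need pseudocounts or sparse-data corrections.
            raise ValueError("empty bins are not handled in this sketch")
        f_pair = n_pair / pair_total
        f_ref = n_ref / reference_total
        energies.append(-kT * math.log(f_pair / f_ref))
    return energies

# Hypothetical counts per distance bin for one residue pair type
# and for all pair types taken together (the reference).
pair_counts = [2, 10, 30, 58]
reference_counts = [10, 20, 30, 40]
energies = potential_of_mean_force(pair_counts, reference_counts)
```

Bins where the pair is observed less often than in the reference receive positive (unfavorable) energies, and bins where it is observed more often receive negative ones.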

VIII. RECENT APPLICATIONS

Molecular modeling is an indispensable tool in the determination of macromolecular structures from NMR data and in the interpretation of the data. Thus, state-of-the-art molecular dynamics simulations can reproduce relaxation data well [9,96] and supply a model of the motion in atomic detail. Qualitative aspects of correlated backbone motions can be understood from NMR structure ensembles [63]. Additional data, in particular residual dipolar couplings, improve the precision and accuracy of NMR structures qualitatively [12].

Standard calculation methods developed for small proteins are sufficiently powerful to solve protein structures and complexes in the 30 kDa range and beyond [97,98] and protein–nucleic acid complexes [99]. Torsion angle dynamics offers increased convergence, in particular for nucleic acids, which are more difficult to calculate because of the sparseness of NMR data [100].

Examples of structures for which automated assignment methods were used from the start are still rare [69,101]. However, automated methods are being used increasingly as a powerful tool in structure determination in combination with manual assignment [102–105].

REFERENCES

1. T Havel, K Wüthrich. Bull Math Biol 46:673–698, 1984.

2. R Kaptein, ERP Zuiderweg, RM Scheek, WF van Gunsteren. J Mol Biol 182:179–182, 1985.

3. W Braun, N Go. J Mol Biol 186:611–626, 1985.

4. AT Brünger, GM Clore, AM Gronenborn, M Karplus. Proc Natl Acad Sci USA 83:3801–3805, 1986.

5. G Lipari, A Szabo, R Levy. Nature 300:197–198, 1982.

6. ET Olejniczak, CM Dobson, M Karplus, RM Levy. J Am Chem Soc 106:1923–1930, 1984.

7. R Brüschweiler, B Roux, M Blackledge, C Griesinger, M Karplus, R Ernst. J Am Chem Soc 114:2289–2302, 1992.

8. AG Palmer III. Curr Opin Struct Biol 7:732–737, 1997.

9. LM Horstink, R Abseher, M Nilges, CW Hilbers. J Mol Biol 287:569–577, 1999.

10. K Wüthrich. NMR of Proteins and Nucleic Acids. New York: Wiley, 1986.

11. JR Tolman, JM Flanagan, MA Kennedy, JH Prestegard. Proc Natl Acad Sci USA 92:9279–9283, 1995.

12. N Tjandra, A Bax. Science 278:1111–1114, 1997.

13. M Nilges. Curr Opin Struct Biol 6:617–621, 1996.

14. D Case. Curr Opin Struct Biol 8:624–630, 1998.

15. G Cornilescu, F Delaglio, A Bax. J Biomol NMR 13:289–302, 1999.

16. JG Pearson, JP Wang, JL Markley, B Le Hong, E Oldfield. J Am Chem Soc 117:8823–8829, 1995.

17. AJ Dingley, S Grzesiek. J Am Chem Soc 120:8293–8297, 1998.

18. F Cordier, S Grzesiek. J Am Chem Soc 121:1601–1602, 1999.

19. AM Gronenborn, GM Clore. Crit Rev Biochem Mol Biol 30:351–385, 1995.

20. M Nilges, MJ Macias, SI O’Donoghue, H Oschkinat. J Mol Biol 269:408–422, 1997.

21. M Nilges. J Mol Biol 245:645–660, 1995.

22. C Mumenthaler, W Braun. J Mol Biol 254:465–480, 1995.

23. A Kharrat, MJ Macias, T Gibson, M Nilges, A Pastore. EMBO J 14:3572–3584, 1995.

24. JG Omichinski, PV Pedone, G Felsenfeld, AM Gronenborn, GM Clore. Nature Struct Biol 4:122–132, 1997.

25. RHA Folmer, M Nilges, PJM Folkers, RNH Konings, CW Hilbers. J Mol Biol 240:341–357, 1994.

26. M Ubbink, M Ejdeback, BG Karlsson, DS Bendall. Structure 6:323–335, 1998.

27. M Nilges, AM Gronenborn, AT Brünger, GM Clore. Protein Eng 2:27–38, 1988.

28. J de Vlieg, RM Scheek, WF van Gunsteren, HJC Berendsen, R Kaptein, J Thomason. Proteins: Struct, Funct, Genet 3:209–218, 1988.

29. G Haenggi, W Braun. FEBS Lett 344:147–153, 1994.

30. RB Altman, O Jardetzky. Methods Enzymol 177:218–246, 1989.

31. AT Brünger, M Nilges. Quart Rev Biophys 26:49–125, 1993.

32. AT Brünger, PD Adams, LM Rice. Structure 5:325–336, 1997.

33. P Güntert, C Mumenthaler, K Wüthrich. J Mol Biol 273:283–298, 1997.

34. NB Ulyanov, U Schmitz, TL James. J Biomol NMR 3:547–568, 1993.


35. MJ Bayley, G Jones, P Willett, MP Williamson. Protein Sci 7:491–499, 1998.

36. CS Wang, T Lozano-Pérez, B Tidor. Proteins 32:26–42, 1998.

37. DM Standley, VA Eyrich, AK Felts, RA Friesner, AE McDermott. J Mol Biol 285:1691–1710, 1999.

38. G Casari, M Sippl. J Mol Biol 224:725–732, 1992.

39. M Nilges, GM Clore, AM Gronenborn. FEBS Lett 239:129–136, 1988.

40. GM Crippen, TF Havel. Acta Cryst A34:282–284, 1978.

41. TF Havel, ID Kuntz, GM Crippen. J Theor Biol 104:359–381, 1983.

42. ID Kuntz, JF Thomason, CM Oshiro. Methods Enzymol 177:159–204, 1989.

43. TF Havel. Prog Biophys Mol Biol 56:43–78, 1991.

44. GM Crippen, TF Havel. Distance Geometry and Molecular Conformation. Taunton, England: Research Studies Press, 1988.

45. AR Leach. Molecular Modeling: Principles and Applications. Harlow, UK: Longman, 1996, pp 426–434.

46. JM Blaney, JS Dixon. Distance geometry in molecular modeling. In: KB Lipkowitz, DB Boyd, eds. Reviews in Computational Chemistry, Vol 5. New York: VCH, 1994, pp 299–335.

47. GM Crippen. J Comput Phys 24:96–107, 1977.

48. TF Havel. Biopolymers 29(12–13):1565–1585, 1990.

49. ME Hodsdon, JW Ponder, DP Cistola. J Mol Biol 264:585–602, 1997.

50. W Metzler, D Hare, A Pardi. Biochemistry 28:7045–7052, 1989.

51. J Kuszewski, M Nilges, AT Brünger. J Biomol NMR 2:33–56, 1992.

52. JC Gower. Biometrika 53:325–338, 1966.

53. W Braun. Quart Rev Biophys 19:115–157, 1987.

54. RC van Schaik, HJ Berendsen, AE Torda, WF van Gunsteren. J Mol Biol 234:751–762, 1993.

55. M Nilges, GM Clore, AM Gronenborn. FEBS Lett 229:317–324, 1988.

56. HJC Berendsen, JPM Postma, WF van Gunsteren, A DiNola, J Haak. J Chem Phys 81:3684–3690, 1984.

57. LM Rice, AT Brünger. Proteins 19:277–290, 1994.

58. EG Stein, LM Rice, AT Brünger. J Magn Reson 124:154–164, 1997.

59. AK Mazur, RA Abagyan. J Biomol Struct Dyn 5:815–832, 1989.

60. DS Bae, EJ Haug. Mech Struct Mach 106:258–268, 1987.

61. A Jain, N Vaidehi, G Rodrigues. J Comput Phys 106:258–268, 1993.

62. RHA Folmer, RNH Konings, CW Hilbers, M Nilges. J Biomol NMR 9:245–258, 1997.

63. R Abseher, L Horstink, CW Hilbers, M Nilges. Proteins 31:370–382, 1998.

64. P Güntert. Quart Rev Biophys 31:145–237, 1998.

65. AT Brünger. X-PLOR. A System for X-Ray Crystallography and NMR. New Haven, CT: Yale Univ Press, 1992.

66. AT Brünger, PD Adams, GM Clore, WL DeLano, P Gros, RW Grosse-Kunstleve, J-S Jiang, J Kuszewski, M Nilges, NS Pannu, RJ Read, LM Rice, T Simonson, GL Warren. Acta Cryst D 54:905–921, 1998.

67. M Nilges, SI O’Donoghue. Prog NMR Spectrosc 32:107–139, 1998.

68. AT Brünger, J Kuriyan, M Karplus. Science 235:458–460, 1987.

69. M Sunnerhagen, M Nilges, G Otting, J Carey. Nature Struct Biol 4:819–826, 1997.

70. SI O’Donoghue, M Nilges. Calculation of symmetric oligomer structures from NMR data. In: R Krishna, JL Berliner, eds. Modern Techniques in Protein NMR, Vol 17 of Biological Magnetic Resonance. New York: Kluwer Academic/Plenum, 1999, pp 131–161.

71. TL James. Curr Opin Struct Biol 1:1042–1053, 1991.

72. AMJJ Bonvin, R Boelens, R Kaptein. Determination of biomolecular structures by NMR: Use of relaxation matrix calculations. In: WF van Gunsteren, PK Weiner, AJ Wilkinson, eds. Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol 2. Leiden: ESCOM, 1993, pp 407–440.


73. D Case. New directions in NMR spectral simulation and structure refinement. In: WF van Gunsteren, PK Weiner, AJ Wilkinson, eds. Computer Simulation of Biomolecular Systems: Theoretical and Experimental Applications, Vol 2. Leiden: ESCOM, 1993, pp 382–406.

74. L Zhou, HJ Dyson, PE Wright. J Biomol NMR 11:17–29, 1998.

75. P Yip. J Biomol NMR 3:361–365, 1993.

76. R Boelens, TMG Koning, R Kaptein. J Mol Struct 173:299–311, 1989.

77. BA Borgias, TL James. J Magn Reson 87:475–487, 1990.

78. H Liu, HP Spielmann, NB Ulyanov, DE Wemmer, TL James. J Biomol NMR 6:390–402, 1995.

79. C Redfield, J Boyd, LJ Smith, RAG Smith, CM Dobson. Biochemistry 31:10431–10437, 1992.

80. KD Berndt, P Güntert, K Wüthrich. Proteins 24:304–313, 1996.

81. WF van Gunsteren, RM Brunne, P Gros, RC van Schaik, CA Schiffer, AE Torda. Methods Enzymol 261:619–654, 1994.

82. J Kemmink, RM Scheek. J Biomol NMR 5:33–40, 1995.

83. AMJJ Bonvin, AT Brünger. J Mol Biol 250:80–93, 1995.

84. R Brüschweiler. J Am Chem Soc 114:5341–5344, 1992.

85. R Brüschweiler, D Case. Prog NMR Spectrosc 26:27–58, 1994.

86. R Brüschweiler, B Roux, M Blackledge, C Griesinger, M Karplus, RR Ernst. J Am Chem Soc 114:2289–2302, 1992.

87. T Schneider, AT Brünger, M Nilges. J Mol Biol 285:727–740, 1999.

88. JF Doreleijers, JA Rullmann, R Kaptein. J Mol Biol 281:149–164, 1998.

89. JP Linge, M Nilges. J Biomol NMR 13:51–59, 1999.

90. R Laskowski, J Rullmann, M MacArthur, R Kaptein, J Thornton. J Biomol NMR 8:477–486, 1996.

91. R Koradi, M Billeter, K Wüthrich. J Mol Graph 14:29–32, 51–55, 1996.

92. AT Brünger, GM Clore, AM Gronenborn, R Saffrich, M Nilges. Science 261:328–331, 1993.

93. G Cornilescu, JL Marquardt, M Ottiger, A Bax. J Am Chem Soc 120:6836–6837, 1998.

94. G Vriend, C Sander. J Appl Cryst 26:47–60, 1993.

95. MJ Sippl. Proteins 17:355–362, 1993.

96. M Philippopoulos, AM Mandel, AG Palmer III, C Lim. Proteins 28:481–493, 1997.

97. DS Garrett, YJ Seok, A Peterkofsky, AM Gronenborn, GM Clore. Nature Struct Biol 6:166–173, 1999.

98. M Caffrey, M Cai, J Kaufman, SJ Stahl, PT Wingfield, DG Covell, AM Gronenborn, GM Clore. EMBO J 17:4572–4584, 1998.

99. FH Allain, PW Howe, D Neuhaus, G Varani. EMBO J 16:5764–5772, 1997.

100. MH Kolk, M van der Graaf, SS Wijmenga, CW Pleij, HA Heus, CW Hilbers. Science 280:434–438, 1998.

101. Y Xu, J Wu, D Gorenstein, W Braun. J Magn Reson 136:76–85, 1999.

102. FY Luh, SJ Archer, PJ Domaille, BO Smith, D Owen, DH Brotherton, AR Raine, X Xu, L Brizuela, SL Brenner, ED Laue. Nature 389:999–1003, 1997.

103. B Aghazadeh, K Zhu, TJ Kubiseski, GA Liu, T Pawson, Y Zheng, MK Rosen. Nature Struct Biol 5:1098–1107, 1998.

104. SC Li, C Zwahlen, SJ Vincent, CJ McGlade, LE Kay, T Pawson, JD Forman-Kay. Nature Struct Biol 5:1075–1083, 1998.

105. HR Mott, D Owen, D Nietlispach, PN Lowe, E Manser, L Lim, ED Laue. Nature 399:384–388, 1999.

14

Comparative Protein Structure Modeling

András Fiser, Roberto Sánchez, Francisco Melo, and Andrej Šali

The Rockefeller University, New York, New York

I. INTRODUCTION

The aim of comparative or homology protein structure modeling is to build a three-dimensional (3D) model for a protein of unknown structure (the target) based on one or more related proteins of known structure (the templates) (Fig. 1) [1–6]. The necessary conditions for getting a useful model are that the similarity between the target sequence and the template structures is detectable and that the correct alignment between them can be constructed. This approach to structure prediction is possible because a small change in the protein sequence usually results in a small change in its 3D structure [7]. Although considerable progress has been made in ab initio protein structure prediction, comparative protein structure modeling remains the most accurate prediction method. The overall accuracy of comparative models spans a wide range. At the low end of the spectrum are the low resolution models whose only essentially correct feature is their fold. At the high end of the spectrum are the models with an accuracy comparable to medium resolution crystallographic structures [6]. Even low resolution models are often useful for addressing biological questions, because function can often be predicted from only coarse structural features of a model.

At this time, approximately one-half of all sequences are detectably related to at least one protein of known structure [8–11]. Because the number of known protein sequences is approximately 500,000 [12], comparative modeling could in principle be applied to over 200,000 proteins. This is an order of magnitude more proteins than the number of experimentally determined protein structures (approximately 13,000) [13]. Furthermore, the usefulness of comparative modeling is steadily increasing, because the number of different structural folds that proteins adopt is limited [14,15] and because the number of experimentally determined structures is increasing exponentially [16]. It is predicted that in less than 10 years at least one example of most structural folds will be known, making comparative modeling applicable to most protein sequences [6].

Fiser et al.

Figure 1 The basis of comparative protein structure modeling. Comparative modeling is possible because evolution resulted in families of proteins, such as the flavodoxin family, modeled here, which share both similar sequences and 3D structures. In this illustration, the 3D structure of the flavodoxin sequence from C. crispus (target) can be modeled using other structures in the same family (templates). The tree shows the sequence similarity (percent sequence identity) and structural similarity (the percentage of the Cα atoms that superpose within 3.8 Å of each other and the RMS difference between them) among the members of the family.

All current comparative modeling methods consist of four sequential steps (Fig. 2) [5,6]. The first step is to identify the proteins with known 3D structures that are related to the target sequence. The second step is to align them with the target sequence and pick those known structures that will be used as templates. The third step is to build the model for the target sequence given its alignment with the template structures. In the fourth step, the model is evaluated using a variety of criteria. If necessary, template selection, alignment, and model building are repeated until a satisfactory model is obtained. The main difference among the comparative modeling methods is in how the 3D model is calculated from a given alignment (the third step). For each of the steps in the modeling process, there are programs and servers available on the World Wide Web (Table 1).
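The four steps and the repeat-until-satisfactory loop can be sketched as a small driver function. Everything in the sketch is hypothetical: the four callables stand in for whatever search, alignment, model-building, and evaluation programs are used, and the `attempt` argument is just one assumed way of letting the alignment step make a different choice on each round.

```python
def comparative_modeling(target_sequence, find_templates, align,
                         build_model, evaluate, max_rounds=3):
    """Run the four-step cycle; return (model, number_of_rounds_used)."""
    # Step 1: identify known structures related to the target sequence.
    related = find_templates(target_sequence)
    if not related:
        raise ValueError("no related structures found for the target")

    model = None
    for attempt in range(max_rounds):
        # Step 2: align candidates with the target and select the templates.
        templates, alignment = align(target_sequence, related, attempt)
        # Step 3: build the model from the alignment.
        model = build_model(target_sequence, templates, alignment)
        # Step 4: evaluate; stop as soon as the model is satisfactory.
        if evaluate(model):
            return model, attempt + 1
    return model, max_rounds

# Toy stand-ins so that the sketch runs end to end.
def _find(sequence):
    return ["templateA", "templateB"]

def _align(sequence, related, attempt):
    chosen = related[attempt % len(related)]
    return [chosen], {"target": sequence, "template": chosen}

def _build(sequence, templates, alignment):
    return {"sequence": sequence, "templates": templates}

def _evaluate(model):
    return model["templates"] == ["templateB"]

model, rounds = comparative_modeling("MKV", _find, _align, _build, _evaluate)
```

Passing the steps in as callables mirrors the observation that methods differ mainly in the third step: only `build_model` needs to change to compare model-building approaches on the same alignment.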

We begin this chapter by describing the techniques for all the steps in comparative modeling (Section II). We continue by discussing the errors in model structures (Section IV) and methods for detecting these errors (Section V). We conclude by listing sample applications of comparative modeling to individual proteins (Section VI) and to whole genomes (Section VII). We emphasize our own work and experience, although we have profited greatly from the contributions of many others, cited in the list of references. The citations are not exhaustive, but exhaustive lists can be found in Refs. 5 and 6. The chapter highlights pragmatically the methods and tools for comparative modeling rather than the physical principles and rules on which the methods are based.