Becker O.M., MacKerell A.D., Roux B., Watanabe M. (eds.) Computational biochemistry and biophysic.pdf


Becker

It has been shown that similar conformations that belong to adjacent energy basins separated by high energy barriers are incorrectly grouped together by the straightforward cluster analysis [29].

C. Principal Component Analysis

An inherent problem associated with conformational analysis is the high dimensionality of the molecular conformational spaces. An N-atom molecule has 3N degrees of freedom, and its corresponding conformational space is (3N − 6)-dimensional. As a result, even relatively small molecules have very large conformational spaces, making them difficult to analyze. For example, a small heptapeptide may have a 100-dimensional or even 150-dimensional conformational space, depending on its precise amino acid composition. Principal component analysis (PCA) is a computational tool that reduces the effective dimensionality of molecular conformational spaces while retaining an accurate representation of the interconformational distances. This task is accomplished by projecting the original multidimensional data onto an optimal low-dimensional subspace, allowing visual inspection of conformational spaces and of dynamic trajectories that traverse these spaces. PCA was introduced to protein simulations under the name quasi-harmonic analysis by Ichiye and Karplus [31] and is becoming widely used for a variety of applications involving sampling and visualization of conformational spaces [32–35]. A review of principal component analysis can be found in Ref. 36.

How does principal component analysis work? Consider, for example, the two-dimensional distribution of points shown in Figure 7a. This distribution clearly has a strong linear component and is closer to a one-dimensional distribution than to a full two-dimensional distribution. However, from the one-dimensional projections of this distribution on the two orthogonal axes X and Y you would not know that. In fact, you would probably conclude, based only on these projections, that the data points are homogeneously distributed in two dimensions. A simple axes rotation is all it takes to reveal that the data points

Figure 7 (a) A two-dimensional distribution of points and their one-dimensional projections on the original axes. Judging just from the 1D projections one would probably conclude that the original distribution is homogeneously distributed in two dimensions. (b) The same distribution of points and their 1D projections on the new axes set obtained by PCA by using a similarity transformation. The new 1D projections highlight the strong 1D character of the distribution.

Conformational Analysis


are preferentially spread along one dimension, as reflected by the broad distribution along one of the new axes and a narrow distribution along the other in Figure 7b. The above procedure is what is done by PCA. Starting from a multidimensional distribution of data points (in our case, of molecular conformations), PCA performs a similarity transformation on the original axes to find a new set of axes that best fits the data. The first new axis is selected such that the variance of the distribution along it is the largest possible. The second axis is placed orthogonal to the first in the direction of the second largest variance of the distribution, and so forth. In this new axes set it is usually possible to identify a low-dimensional subspace that captures most of the relative distances between individual conformations.
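The axes rotation described above can be illustrated with a small numerical sketch. It is not taken from the chapter: the synthetic point cloud, the random seed, and all variable names are assumptions chosen only to mimic the situation of Figure 7, with NumPy assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D data (an assumption): broad spread along the diagonal,
# narrow spread across it, as in the distribution sketched in Figure 7a.
t = rng.normal(0.0, 3.0, size=500)   # coordinate along the diagonal
u = rng.normal(0.0, 0.3, size=500)   # coordinate across the diagonal
points = np.column_stack([(t - u) / np.sqrt(2), (t + u) / np.sqrt(2)])

# Variances of the 1D projections on the original X and Y axes.
var_xy = points.var(axis=0)

# PCA: diagonalize the covariance matrix and sort axes by decreasing variance.
centered = points - points.mean(axis=0)
cov = centered.T @ centered / len(points)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coordinates of the same points in the rotated (principal) axes.
rotated = centered @ eigvecs
var_pc = rotated.var(axis=0)

print("variance along X, Y:    ", var_xy)
print("variance along PC1, PC2:", var_pc)
```

On the original axes the two projections have comparable variances, whereas on the rotated axes almost all of the variance falls on the first axis. The total variance is unchanged by the rotation.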

In general, two related techniques may be used: principal component analysis (PCA) and principal coordinate analysis (PCoorA). Both methods start from the n × m data matrix M, which holds the m coordinates defining n conformations in an m-dimensional space. That is, each matrix element $M_{ij}$ is equal to $q_{ij}$, the jth coordinate of the ith conformation. From this starting point PCA and PCoorA follow different routes.

Principal component analysis (PCA) takes the m-coordinate vectors q associated with the conformation sample and calculates the square m × m matrix $M^{T}M$, reflecting the relationships between the coordinates. This matrix, also known as the covariance matrix C, is defined as

$$C = \langle (q - \langle q \rangle)(q - \langle q \rangle)^{T} \rangle \tag{14}$$

where the averaging is over the conformation sample (in Cartesian space m = 3N for an N-atom molecule). The covariance matrix C is diagonalized to obtain the eigenvectors that capture most of the variation in atomic position fluctuations.
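A minimal sketch of the covariance route follows, assuming the conformations have already been superimposed so that overall translation and rotation do not contribute. The function name `pca_covariance` and the synthetic sample are hypothetical; only standard NumPy calls are used.

```python
import numpy as np

def pca_covariance(M):
    """PCA of an n x m data matrix M (n conformations, m coordinates).

    Returns the normalized eigenvalues, the principal axes (columns),
    and the coordinates of each conformation along those axes.
    """
    M = np.asarray(M, dtype=float)
    dq = M - M.mean(axis=0)                # q - <q>, mean taken over the sample
    C = dq.T @ dq / M.shape[0]             # m x m covariance matrix, Eq. (14)
    eigvals, eigvecs = np.linalg.eigh(C)   # C is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]      # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    weights = eigvals / eigvals.sum()      # normalized eigenvalues
    return weights, eigvecs, dq @ eigvecs

# Hypothetical sample: 200 "conformations" of a 4-atom molecule (m = 3N = 12)
# generated from two collective modes plus small noise.
rng = np.random.default_rng(1)
modes = rng.normal(size=(2, 12))
amplitudes = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
M = amplitudes @ modes + 0.1 * rng.normal(size=(200, 12))

weights, axes, Q = pca_covariance(M)
print("weight of first two axes:", weights[:2].sum())
```

Because the sample was built from two collective modes, nearly all of the variance is expected on the first two principal axes.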

Principal coordinate analysis (PCoorA) [37], on the other hand, operates on the square n × n matrix $MM^{T}$, reflecting the relationships between the conformations. The elements of this matrix, also known as the distance matrix ∆, are distances $d_{ij}$ between two conformations i and j [such as those defined in Eqs. (12) and (13)]. Since the distances $d_{ij}$ can also be recovered from the latent roots and vectors (eigenvalues and eigenvectors) of an n × n matrix A, one can use this matrix for the projection, defining $A_{ij} = -\tfrac{1}{2} d_{ij}^{2}$ and $A_{ii} = 0$ (for i, j = 1, 2, . . . , n). To guarantee that the matrix A has a zero root (and thus guarantee that it corresponds to a real configuration) it is "centered," so that the sum of every row and the sum of every column of A is zero. This centering, which does not alter the distances $d_{ij}$, is defined as

$$A^{*}_{ij} = A_{ij} - \langle A_{ij} \rangle_{i} - \langle A_{ij} \rangle_{j} + \langle A_{ij} \rangle_{ij} \tag{15}$$

where $\langle \cdot \rangle_{k}$ denotes the mean over the specified indices k = i, j, or ij. The centered matrix A* is diagonalized using standard matrix algebra to obtain the latent eigenvectors and the diagonal matrix of eigenvalues. The resulting eigenvalues (normalized) give the percentage of the projection of the original distribution on the new coordinate set, and the eigenvectors (scaled by the square roots of their corresponding eigenvalues) give the new coordinates of the original points in the new axes set. For a more detailed description of this method, see Refs. 29 and 37.
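The principal coordinate procedure can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in Refs. 29 and 37. The function name `principal_coordinates` and the test data are hypothetical. Each eigenvector is scaled by the square root of its eigenvalue, the convention under which the Euclidean distances between the projected points reproduce the input distances.

```python
import numpy as np

def principal_coordinates(D):
    """Principal coordinate analysis of an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    A = -0.5 * D**2                                  # A_ij = -(1/2) d_ij^2, A_ii = 0
    row_mean = A.mean(axis=1, keepdims=True)         # mean of each row
    col_mean = A.mean(axis=0, keepdims=True)         # mean of each column
    A_star = A - row_mean - col_mean + A.mean()      # centering, Eq. (15)
    eigvals, eigvecs = np.linalg.eigh(A_star)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10 * eigvals[0]              # drop zero and round-off roots
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    Q = eigvecs * np.sqrt(eigvals)                   # new coordinates of the points
    return eigvals / eigvals.sum(), Q

# Hypothetical check: distances between random 3D points are reproduced
# by the principal coordinates.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
weights, Q = principal_coordinates(D)
D_rec = np.linalg.norm(Q[:, None, :] - Q[None, :, :], axis=-1)
print("max distance error:", np.abs(D - D_rec).max())
```

Only the distance matrix enters the calculation, so the same function could be applied to any conformational distance measure, although non-Euclidean distances may produce negative roots that this sketch simply discards.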

It should be stressed that PCA and PCoorA are dual methods that give the same analytical results. Using one or the other is simply a matter of convenience, whether one prefers to work with the covariance matrix C or with the distance matrix ∆.
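The duality can be checked numerically on a small example. The sketch below uses hypothetical random data and Euclidean distances, and works with the unnormalized matrix $M^{T}M$ of mean-centered coordinates (n times the covariance matrix) so that the two spectra can be compared directly.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(20, 6))      # hypothetical data: n = 20 conformations, m = 6
Mc = M - M.mean(axis=0)           # remove the mean conformation

# PCA route: m x m matrix built from the coordinates.
pca_vals = np.sort(np.linalg.eigvalsh(Mc.T @ Mc))[::-1]

# PCoorA route: n x n centered matrix built from the distances only.
D = np.linalg.norm(M[:, None, :] - M[None, :, :], axis=-1)
A = -0.5 * D**2
A_star = (A - A.mean(axis=1, keepdims=True)
            - A.mean(axis=0, keepdims=True) + A.mean())
pcoor_vals = np.sort(np.linalg.eigvalsh(A_star))[::-1]

# The m nonzero roots coincide; the remaining n - m roots of A* vanish.
same_spectrum = np.allclose(pca_vals, pcoor_vals[:6])
print("spectra agree:", same_spectrum)
```

For Euclidean distances the centered matrix A* equals the matrix of inner products of the mean-centered conformations, which is why both routes share the same nonzero eigenvalues.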

As stated earlier, the main motivation for using either PCA or PCoorA is to construct a low-dimensional representation of the original high-dimensional data. The notion behind this approach is that the effective (or essential, as some call it [33]) dimensionality of a molecular conformational space is significantly smaller than its full dimensionality (3N − 6 degrees of freedom for an N-atom molecule). Following the PCA procedure, each new


Figure 8 A joint principal coordinate projection of the occupied regions in the conformational spaces of linear (Ala)6 (triangles) and its conformationally constrained counterpart, cyclic-(Ala)6 (squares), onto the optimal 3D principal axes. The symbols indicate the projected conformations, and the ellipsoids engulf the volume occupied by the projected points. This projection shows that the conformational volume accessible to the cyclic analog is only a small subset of the conformational volume accessible to the linear peptide. (Adapted from Ref. 41.)

axis k is associated with a normalized eigenvalue λk that reflects the relative weight of that axis in reproducing the original data. An axis with a high λk value is significant for the projection, whereas axes with small λk values are insignificant. By sorting the new axes according to their λk weight it is possible to select a small subset of effective coordinates that capture most of the conformational relationships of the original high-dimensional space. The quality of such a projection can be estimated by the average difference between the conformation distances reconstructed in the low s-dimensional subspace, $d_{ij}^{(s)}$, and the original distances $d_{ij}$. The reconstructed distances are defined as

$$d_{ij}^{(s)2} = \sum_{k=1}^{s} (Q_{ik} - Q_{jk})^{2} \tag{16}$$

where $Q_{ik}$ is the coordinate of the ith conformation along the kth new (principal) axis. It can be shown that the average deviation of the distances in s dimensions from the exact distances is determined by the sum of the first s normalized eigenvalues,

$$\left\langle d_{ij}^{2} - d_{ij}^{(s)2} \right\rangle_{ij} = 1 - \sum_{k=1}^{s} \lambda_{k} \tag{17}$$

Thus by summing the normalized eigenvalues of the first s dimensions one can judge the quality of a projection onto that subspace.
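The relation between the eigenvalue sum and the distance deviation can be verified numerically. In the sketch below the deviation is expressed as a fraction of the total squared distance, which is an assumption about the normalization implied by Eq. (17). The function name `projection_quality` and the synthetic data are hypothetical.

```python
import numpy as np

def projection_quality(Q, eigvals, s):
    """Return (sum of first s normalized eigenvalues, relative distance loss)."""
    lam = eigvals / eigvals.sum()              # normalized eigenvalues

    def sq_dists(X):
        diff = X[:, None, :] - X[None, :, :]
        return (diff**2).sum(axis=-1)

    d2_full = sq_dists(Q)                      # d_ij^2 using all principal axes
    d2_s = sq_dists(Q[:, :s])                  # d_ij^(s)2, Eq. (16)
    lost = (d2_full - d2_s).sum() / d2_full.sum()
    return lam[:s].sum(), lost

# Hypothetical anisotropic sample: 100 conformations in a 10-dimensional space.
rng = np.random.default_rng(4)
scales = np.array([6, 4, 3, 1, 1, 0.5, 0.5, 0.5, 0.2, 0.2])
M = rng.normal(size=(100, 10)) * scales
Mc = M - M.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Mc.T @ Mc)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
Q = Mc @ eigvecs                               # coordinates along principal axes

quality, lost = projection_quality(Q, eigvals, s=3)
print("projection quality:", quality, "relative loss:", lost)
```

The two returned numbers sum to one, so the cumulative normalized eigenvalue can be read directly as the fraction of squared interconformational distance retained in the s-dimensional subspace.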

Fortunately, it was found that in polypeptide systems the effective dimensionality of conformational spaces is significantly smaller than the dimensionality of the full space, with only a few principal axes contributing to the projection [38–41]. In fact, in many cases a projection quality of 70–90% can be achieved in as few as three dimensions [42], opening the way for real 3D visualization of molecular conformational space. Figure 8


shows a 3D visualization of the conformational spaces of two hexapeptides, (Ala)6 and cyclic-(Ala)6, jointly projected on the same principal coordinate set. The comparison shows that the conformation volume occupied by the linear peptide is about 10 times larger than the conformation volume occupied by its conformationally constrained counterpart. Quantifying relative flexibility of analogous peptides through joint principal projections was shown to be useful, for example, in predicting their relative bioactivity [41].

V. CONCLUSION

In this chapter we surveyed a variety of computational methods that contribute to the "conformational analysis" of complex molecules. These include methods for searching and sampling the molecular conformation space, methods for local optimization of the sampled conformations, and basic analytical techniques. In practice, many variations of the basic methodologies are reported as researchers continuously try to improve and enhance these procedures. The different methods are often used in conjunction to form a complete conformational analysis study of a biomolecular system of interest. However, each of the different procedures can also be used separately as part of computational studies with other goals. The need for these analytical techniques is to a large extent brought about by the continuous increase in simulation times, which generates more data than ever before, requiring systematic ways to interpret and represent them.

REFERENCES

1. G Jolles, KRH Wooldridge. Drug Design: Fact or Fantasy? London: Academic Press, 1984.

2. DA Gschwend, AC Good, ID Kuntz. J Mol Recogn 9:175, 1996.

3. RE Bruccoleri, M Karplus. Conformational sampling using high-temperature molecular dynamics. Biopolymers 29:1847–1862, 1990.

4. N Metropolis, S Ulam. The Monte Carlo method. J Am Stat Assoc 44:335–341, 1949.

5. MP Allen, DJ Tildesley. Computer Simulations of Liquids. Oxford: Oxford Univ Press, 1989.

6. D Frenkel, B Smit. Understanding Molecular Simulation: From Algorithms to Applications. San Diego: Academic Press, 1996.

7. J Cao, BJ Berne. Monte Carlo methods for accelerating barrier crossing: Anti-force-bias and variable step algorithms. J Chem Phys 92:1980–1985, 1990.

8. I Andricioaei, JE Straub. On Monte Carlo and molecular dynamics methods inspired by Tsallis statistics: Methodology, optimization, and application to atomic clusters. J Chem Phys 107:9117–9124, 1997.

9. WH Press, BP Flannery, SA Teukolsky, WT Vetterling. Numerical Recipes: The Art of Scientific Computing. Cambridge, UK: Cambridge Univ Press, 1989.

10. DE Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley, 1989.

11. LD Davis. Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold, 1991.

12. M Vieth, JD Hirst, BN Dominy, H Daigler, CL Brooks III. Assessing search strategies for flexible docking. J Comput Chem 19:1623–1631, 1998.

13. RS Judson, EP Jaeger, AM Treasurywala. A genetic algorithm based method for docking flexible molecules. THEOCHEM 114:191–206, 1994.

14. CM Oshiro, ID Kuntz, JS Dixson. Flexible ligand docking using a genetic algorithm. J Comput-Aided Mol Des 9:113–130, 1995.

15. R Bruccoleri, M Karplus. Prediction of the folding of short polypeptide segments by uniform conformational sampling. Biopolymers 26:137–168, 1987.


16. GM Crippen, TF Havel. Stable calculations of coordinates from distance information. Acta Cryst A 34:282, 1978.

17. TF Havel. An evaluation of computational strategies for use in the determination of protein structure from distance constraints obtained by nuclear magnetic resonance. Prog Biophys Mol Biol 56:43, 1991.

18. RP Sheridan, R Nilakatan, JS Dixson, R Venkataraghavan. The ensemble approach to distance geometry: Application to the nicotinic pharmacophore. J Med Chem 29:899–906, 1986.

19. JM Blaney, JS Dixon. Distance geometry in molecular modeling. In: KB Lipkowitz, DB Boyd, eds. Reviews in Computational Chemistry, Vol 5. New York: VCH, pp 299–335.

20. DD Frantz, DL Freeman, JD Doll. Reducing quasi-ergodic behavior in Monte Carlo simulations by J-walking: Applications to atomic clusters. J Chem Phys 93:2769–2784, 1990.

21. E Marinari, G Parisi. Europhys Lett 19:451, 1992.

22. M Falcioni, MW Deem. A biased Monte Carlo scheme for zeolite structure solution. J Chem Phys 110:1754–1766, 1999.

23. JA McCammon, SC Harvey. Dynamics of Proteins and Nucleic Acids. Cambridge, UK: Cambridge Univ Press, 1987.

24. RH Boyd. J Chem Phys 49:2574, 1968.

25. BR Brooks, RE Bruccoleri, BD Olafson, DJ States, S Swaminathan, M Karplus. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations. J Comput Chem 4:187–217, 1983.

26. E Aarts, J Korst. Simulated Annealing and Boltzmann Machines. New York: Wiley, 1990.

27. C Wilson, S Doniach. Proteins: Struct Funct Genet 6:193, 1989.

28. Y Levy, OM Becker. Effect of conformational constraints on the topography of complex potential energy surfaces. Phys Rev Lett 81:1126–1129, 1998.

29. OM Becker. Geometrical versus topological clustering: An insight into conformation mapping. Proteins 27:213–226, 1997.

30. H Spath. Cluster-Analysis Algorithms for Data Reduction and Classification of Objects. Chichester: Ellis Horwood, 1980.

31. T Ichiye, M Karplus. Collective motions in proteins: A covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins: Struct Funct Genet 11:205–217, 1991.

32. ANE Garcia. Large-amplitude nonlinear motions in proteins. Phys Rev Lett 68:2696–2699, 1992.

33. A Amadei, ABM Linssen, HJC Berendsen. Essential dynamics of proteins. Proteins 17:412–425, 1993.

34. S Hayward, A Kitao, N Go. Harmonic and anharmonic aspects in the dynamics of BPTI: A normal mode analysis and principal component analysis. Protein Sci 3:936–943, 1994.

35. OM Becker. Quantitative visualization of a macromolecular potential energy funnel. J Mol Struct (THEOCHEM) 398–399:507–516, 1997.

36. DA Case. Curr Opin Struct Biol 4:285–290, 1994.

37. JC Gower. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53:325–338, 1966.

38. R Abagyan, P Argos. Optimal protocol and trajectory visualization for conformational searches of peptides and proteins. J Mol Biol 225:519–532, 1992.

39. JM Troyer, FE Cohen. Protein conformational landscapes: Energy minimization and clustering of a long molecular dynamics trajectory. Proteins 23:97–110, 1995.

40. LSD Caves, JD Evanseck, M Karplus. Locally accessible conformations of proteins: Multiple molecular dynamics simulations of crambin. Protein Sci 7:649–666, 1998.

41. OM Becker, Y Levy, O Ravitz. Flexibility, conformation spaces, and bioactivity. J Phys Chem B 104:2123–2135, 2000.

42. OM Becker. Principal coordinate maps of molecular potential energy surfaces. J Comput Chem 19:1255–1267, 1998.