Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Казанский национальный исследовательский технологический университет

Предмет:

Химия

Файл:

Friesner R.A. (ed.) - Advances in chemical physics, computational methods for protein folding (2002)(en)

.pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

6.52 Mб

Скачать

☆

<<< < Предыдущая 17 18 19 20 21 22 23 24 25 26 27 2829 / 5429 30 31 32 33 34 35 36 37 38 39 40 41 > Следующая >>>

ab initio protein structure prediction

245

TABLE VII

Comparison of Rankings for PDB Secondary Structure and DSSP Secondary Structure for Several Cases from the Test Set

Protein	˚	˚	˚	˚	Comments
Protein	< 4 A	< 5 A	< 6 A	< 7 A	Comments
1ag2	—	—	—	87	PDB secondary structure, terminal loops deleted
1ag2	—	—	—	11	DSSP secondary structure, terminal loops included
1hsn	—	—	19	19	PDB secondary structure, terminal loops deleted
1hsn	—	—	23	10	DSSP secondary structure, terminal loops deleted
1orc	8	6	6	6	PDB secondary structure, terminal loops deleted
1orc	1	1	1	1	DSSP secondary structure, terminal loops included
1ris	9	9	9	9	PDB secondary structure, terminal loops deleted
1ris	—	—	3	3	DSSP secondary structure, terminal loops included

typically thousands of energy units. This is in sharp contrast to the results obtained with our previous tertiary folding potential, which routinely generated energy gaps between misfolded and native-like structures that were 5–10 times larger than those seen here.

D.Effects of Secondary Structure Deﬁnition and Truncation of Terminal Loops

The results presented above employ PDB-deﬁned secondary structure and in some cases involve truncation of terminal loops, primarily carried out here to facilitate direct comparisons with the results of Ref. 2. However, the process of deﬁning secondary structure even with X-ray crystallographic or NMR coordinates in hand is not entirely unambiguous, and the effects of terminal loops could be favorable or unfavorable. To examine these issues, we selected several proteins in Table V for which the results with the new potential appeared less accurate than would have been expected given the difﬁculty of the case being considered. Table VII presents results for these selected cases, listing the protein and identifying what experiments were carried out. Most of the cases examined are mixed a/b because these displayed the most signiﬁcant problems. It can be seen that in some cases the use of a different secondary structure deﬁnition (e.g., DSSP rather than PDB) and the inclusion or deletion of a terminal loop has a substantial effect on the ranking of low RMSD structures. Clearly, more work needs to be done in understanding these effects.

E.Effects of Using Ideal Rather than PDB-Derived Three-Dimensional

Topologies for Secondary Structure Elements

Having established that our new size-dependent potential is quite effective for generating low-resolution structures of proteins below 100 residues using secondary structure derived from PDB coordinates, we next ask what the effect is of using ideal torsional angles for helices and strands as opposed to PDB-derived

246 volker a. eyrich, richard a. friesner, and daron m. standley

torsion angles. Tables V and VIc summarize results for the entire test set of proteins utilizing ideal secondary structure elements. The results are surprisingly good; while there are certainly cases in which quantitative degradation of the rank of the best low-RMSD structure occurs (particularly with a/b-proteins—for example, the proteins 2fdn, 1fwp, and 5pti), in general the simulations are able to ﬁnd such structures successfully and to rank them reasonably well in terms of total energy. Even in the case of 5pti, where there is severe distortion of the b- strands in the native structure, the use of ideal strands produces reasonable results. While incorporation of strand distortion is possible in our methodology [4], the reasonable predictive capability using ideal elements is likely to save considerable computational effort because one can carry out such simulations initially and then use the results as a starting point from which to incorporate distortions and other detailed effects.

IV. USE OF PREDICTED RATHER THAN PDB-DERIVED

SECONDARY STRUCTURE ELEMENTS

A.Overview

Secondary structure prediction methods, while they have improved signiﬁcantly over the past decade (principally via the use of multiple sequence analysis), still have nontrivial error rates. The best method at present appears to be the PSIPRED approach developed by Jones [26], which is claimed to achieve an accuracy between 76% and 78% on a reasonably large training set (it also outperformed other methods in the CASP3 contest). This level of reliability appears to be sufﬁcient for low-resolution ab initio structure prediction and suggested to us that it was now worth experimenting with tertiary folding calculations based entirely on predicted, rather than PDB-derived, secondary structure [27–32]. Using servers set up on the World Wide Web, we are able to obtain predictions from PSIPRED and other secondary structure prediction algorithms for proteins in our test set. We have obtained results from a variety of servers to see what happens in cases where their predictions disagree; it is likely that ab initio prediction will involve trying a number of secondary structures, because in some cases the tertiary fold will be critical in selecting among plausible secondary structures predicted exclusively from sequence data.

Our calculations in this section endeavor to answer the following questions:

1.Can we for some percentage of cases make a successful ab initio prediction? We explore two different approaches below.

2.What are the effects of small errors in secondary structure—for example, elimination or addition of small elements, incorrect lengths of major elements and so on?

ab initio protein structure prediction

247

3.What is the impact of a major error—for example, replacing a long helix by a similar strand or missing an important loop?

In the present chapter, we have chosen to focus our ab initio prediction efforts primarily on a-helical proteins, although one mixed a/b-protein is also examined. The ab initio prediction calculations presented below are considerably more computationally intensive than those using PDB-derived secondary structure, because we have investigated a substantial number of secondary structure predictions for each protein. By studying helical systems intensively, we are able to draw conclusions concerning the necessary and sufﬁcient conditions for success for such systems from a signiﬁcant database of results. In addition to the a-helical proteins in the 50to 100-residue range from the data set above, we also include two helical proteins from the CASP3 prediction contest. Our results for the CASP3 test cases are similar to those from the PDBderived test suite.

B.Secondary Structure Prediction Methods

We use the following secondary structure prediction methods in our ab initio predictions:

*PSIPRED [26]: A two-stage neural network that predicts protein secondary structure based on the position speciﬁc scoring matrices generated by PSI-BLAST (available at http: //insulin:brunel:ac:uk/psipred/Þ. Average three-state prediction accuracy is between 76.5% and 78.3%. Currently the most accurate method.

*PhD [33,34]: Secondary structure is predicted by a system of neural networks (available at http: //cubic:bioc:columbia:edu/pp/Þ. Overall threestate prediction accuracy is 72.1%. The default secondary structure prediction settings were used in all predictions.

*JPRED [35,36]: A methodology that combines a total of six secondary structure predictions into one consensus prediction (available at http:// jura.ebi.ac.uk:8888/ at the time of this writing). Average ‘‘real world’’ accuracy is 72.9%. Note that the PhD predictions generated by JPRED differ from the original PhD predictions (denoted: orig_phd) mentioned above. In addition to using the consensus prediction, we also report results for the six individual prediction methods included in the JPRED server.

By default, secondary structure prediction accuracies reported here are determined with DSSP as the reference (for details see Ref. 26). The secondary structure assignments used in the actual calculation differ from the original predictions in that helices and strands of less than three residues are eliminated. N- and C-terminal loops are deleted.

248 volker a. eyrich, richard a. friesner, and daron m. standley

C.Simulation Protocols

The amino acid sequence of the target represents the only input data for our methodology. We do not carry out explicit database searches (i.e., threading) of any sort. Secondary structure predictions from the sources listed above are parsed and used directly in the structure predictions. In the case of JPRED we examine individually the results of all predictions that contribute to the consensus prediction (DSC [37], PhD [33,34], PREDATOR [38,39], NNSSP [40], Mulpred, and Zpred [41]). Because we do not assume any knowledge of approximate radius of gyration of the target, which is important for the selection of the correct potential energy parameters, we predict the radius of gyration via a simple formula [22] and use this prediction to assign the size bin for the tertiary folding simulation.

The ﬁrst stage of our prediction algorithm applies the MCM-based approach described above to each of the nine secondary structure predictions for each target. Simulations are usually carried out on two to four nodes of a multiprocessor machine and take between 12 and 24 hours depending on protein size. To extract the structurally unique predictions, we apply the clustering algorithm discussed above. Table VIII shows the results of this procedure for the three targets discussed in more detail below. We list results for every secondary structure prediction (unless predictions consist only of loop or coil, in which case we did not believe it worthwhile to carry out the simulation).

Because it is quite possible that simulations utilizing different secondary structure predictions results in very similar representative low energy structures, we apply a second level of ﬁltering which basically tries to eliminate structurally similar predictions and ranks the resulting ‘‘unique’’ predictions on a absolute energy scale. The ﬁrst step in this process is the determination of the subset of residues common to all predictions (regardless of whether they belong to helices or strands). Secondary structure predictions for which the number of residues included in the simulation is substantially smaller than the average (due

TABLE VIII

Individual Clustering Results for the ab initio prediction Targets Discussed in More Detail in the Text (Stage 1 of the Composite Prediction Method)a

Protein	SSP	Q3	Nres	˚	˚	˚	˚
Protein	SSP	Q3	Nres	< 4 A	< 5 A	< 6 A	< 7 A
1aj3	cons	94.90	93	—	—	—	—
1aj3	dsc	87.76	93	—	—	—	—
1aj3	mul	86.73	92	—	—	—	—
1aj3	nnssp	95.92	98	—	—	2	2
1aj3	orig_phd	89.80	89	—	—	3	2
1aj3	phd	88.78	88	—	—	—	—
1aj3	pred	88.78	92	—	—	—	—

	ab initio protein structure prediction						249
		TABLE VIII		(Continued)

Protein	SSP	Q3	Nres	˚	˚	˚	˚
Protein	SSP	Q3	Nres	< 4 A	< 5 A	< 6 A	< 7 A
1aj3	psipred	93.88	94	—	—	1	1
1aj3	zpred	93.88	98	—	—	2	2
1am3	cons	92.86	58	—	—	24	3
1am3	dsc	88.57	57	—	—	11	2
1am3	mul	77.14	59	—	—	11	11
1am3	nnssp	88.57	60	—	8	8	8
1am3	orig_phd	92.86	58	—	22	8	6
1am3	phd	94.29	58	—	4	4	2
1am3	pred	72.86	57	—	—	—	—
1am3	psipred	88.57	57	—	35	1	1
1am3	zpred	67.14	68	—	—	—	—
1mzm	cons	44.09	44	—	—	2	2
1mzm	dsc	66.67	67	—	—	26	3
1mzm	mul	37.63	68	—	—	243	4
1mzm	nnssp	38.71	93	—	—	—	—
1mzm	orig_phd	59.14	74	—	—	50	3
1mzm	phd	38.71	49	—	—	9	1
1mzm	pred	55.91	44	—	—	18	6
1mzm	psipred	78.49	78	—	1	1	1
1mzm	zpred	34.41	82	—	—	—	—
1eh2	cons	87.37	68	—	12	5	5
1eh2	dsc	80.01	73	—	—	—	—
1eh2	mul	74.74	43	—	3	3	3
1eh2	nnssp	86.32	67	—	6	3	1
1eh2	orig_phd	86.32	68	—	15	1	1
1eh2	phd	85.26	67	15	4	4	4
1eh2	pred	88.42	68	—	1	1	1
1eh2	psipred	95.79	72	—	3	2	1
1eh2	zpred	66.32	72	—	—	32	13
1bg8.A	cons	57.89	57	—	—	11	3
1bg8.A	dsc	42.11	55	—	—	64	6
1bg8.A	mul	51.32	67	—	—	—	24
1bg8.A	nnssp	63.16	67	—	—	31	9
1bg8.A	orig_phd	57.89	57	—	—	52	5
1bg8.A	phd	57.89	57	—	—	34	8
1bg8.A	pred	38.16	56	—	—	321	3
1bg8.A	psipred	50.01	52	—	92	17	5
1bg8.A	zpred	46.05	68	—	—	—	—

aHere Nres refers to the number of residues actually considered for every prediction. (cons: JPRED consensus prediction; dsc: DSC; mul: MULPRED; nnssp: NNSSP; orig_phd: PhD in its most current implementation; phd: PhD as run by JPRED; pred: PREDATOR; psipred: PSIPRED; zpred: ZPRED). Q3 refers to the three-state accuracy of a given prediction.

250 volker a. eyrich, richard a. friesner, and daron m. standley

to deletion of terminal loops) are not considered at this stage. This set of residues is then extracted from the 50 clusters lowest in energy for every secondary structure prediction, and the energies of the resulting substructures are evaluated. After a second round of clustering, we obtain the ﬁnal set of clusters (Table IX). At this point the RMSDs with respect to the native structures are reevaluated over the subset of common residues to allow a fair comparison of the tertiary folding results obtained from different secondary structure predictions. We refer to this method below as the composite energy prediction method.

D. Final Rankings of Structures for Fully Ab Initio Predictions

We examine the use of two different approaches for producing fully ab initio predictions for the 22 proteins studied in this section. One approach is simply to use the secondary structure prediction with the highest calibrated prediction— accuracy—in this case, PSIPRED. Results for this approach are summarized in

TABLE IX

Final Clustering Results for the Subset of Common Residues for All Ab Initio Prediction Targets (Stage 2 of the Composite Energy Prediction Method)a

Protein	Nres	˚	˚	˚	˚
Protein	Nres	< 4 A	< 5 A	< 6 A	< 7 A
1acp	70	—	—	10	5
1aj3	88	—	89	89	89
1am3	56	—	17	17	2
1bg8.A	52	—	—	92	1
1c5a	57	—	—	4	4
1cc5	68	—	22	12	2
1ddf	85	—	—	—	7
1eh2	65	—	4	4	3
1hsn	61	—	—	—	46
1jvr	66	39	12	12	2
1lfb	55	—	114	22	4
1mzm	66	—	—	65	4
1nkl	63	—	—	31	1
1nre	65	—	89	50	50
1pgx	53	—	—	—	35
1pou	64	30	3	3	1
1r69	57	—	5	5	1
1utg	56	—	—	4	3
2ezh	57	—	—	7	4
2ezk	67	—	—	—	—
2hp8	49	—	58	8	7
2pac	53	—	25	7	3

aHere Nres refers to the number of residues for which RMSD and energy are evaluated. We omitted predictions that were too short as compared to all others and the length of the sequence (1eh2: mul; 1jvr: psipred; 1mzm: cons, phd, pred; 1r69: mul, orig_phd, pred; 2ezh: orig_phd).

ab initio protein structure prediction

251

TABLE X

Individual Clustering Results for All Ab Initio Prediction Targets Using the PSIPRED Secondary Structure Predictionsa

Protein	Q3	Nres	˚	˚	˚	˚
Protein	Q3	Nres	< 4 A	< 5 A	< 6 A	< 7 A
1acp	83.12	72	—	—	17	5
1aj3	93.88	94	—	—	1	1
1am3	88.57	57	—	35	1	1
1bg8.A	50.01	52	—	92	17	5
1c5a	93.94	63	4	1	1	1
1cc5	74.7	75	—	6	4	4
1ddf	81.1	86	—	—	43	9
1eh2	95.79	72	—	3	2	1
1hsn	87.34	62	—	—	—	20
1jvr	72.26	3	—	—	—	—
1lfb	58.97	59	—	—	—	34
1mzm	78.49	78	—	1	1	1
1nkl	94.87	71	—	—	4	1
1nre	60.49	65	—	—	—	380
1pgx	77.14	60	—	—	—	—
1pou	73.24	67	7	5	5	5
1r69	84.13	59	—	6	6	3
1utg	85.71	62	—	32	1	1
2ezh	81.54	57	—	15	1	1
2ezk	51.61	77	—	—	—	—
2hp8	64.71	53	—	6	2	1
2pac	70.73	77	—	19	2	2

aHere Nres refers to the number of residues actually considered for every prediction.

Table X. As above we list the rank of structures below a certain RMSD cutoff. The second is the composite energy prediction method discussed above. We summarize statistics for the success rate of each of these two approaches on the entire test set and CASP3 prediction targets below.

E.Results

1.Summary and Overall Success of Fully Ab Initio Prediction

We begin by summarizing the results for all of the secondary structure prediction methods (including the composite energy prediction method described above) and all of the target proteins. As in previous sections of this chapter, the ranks of

˚ ˚ ˚

the lowest-energy cluster with RMSDs from the native structure of 4 A, 5 A, 6 A,

and 7 A are reported for both approaches. The ﬁrst, and most striking, observation is that both approaches provide a surprisingly good success rate for ab initio prediction based on criteria used in CASP3. We have observed that for proteins

in the 50–100 residue range, an RMSD below 7 A typically provides a

252 volker a. eyrich, richard a. friesner, and daron m. standley

qualitatively reasonable folding topology at low resolution. Similar conclusions have been reached by Skolnick and co-workers [42] and by Cohen and Sternberg [43], whose estimates show that the probability of achieving a structure below 6

A RMSD by chance is vanishingly small. Note also that for a signiﬁcant fraction

of proteins, structures below 6 A are found; at this level, the correspondence with the native structure is quite satisfactory in agreement with the chapters cited above.

Both proposed fully ab initio prediction methods (composite energy method and exclusive use of PSIPRED predictions) yield a number of cases in which a low RMSD structure is ranked ﬁrst; this would count as a successful prediction under any criterion. Using the assessment criteria of CASP3—that is, a maximum of ﬁve predictions—the composite energy method would achieve

an RMSD of less than 7 A in 68% of the cases; there are also four cases where

the RMSD is less than 6 A. Reliance entirely on PSIPRED would lead to an

RMSD under 7 A in 64% of the cases; however, 11 of those would have an

RMSD under 6 A. Thus, the use of the composite energy method appears to succeed slightly more often, however, the use of PSIPRED exclusively generates highly accurate predictions in signiﬁcantly more cases.

We have employed the protocol described above in a completely automated fashion; but only in an actual blind test can one be sure that the results suffer from no unconscious bias. If these results hold up under truly blind test conditions, this would represent a signiﬁcant advance in ab initio prediction methodology as judged by other ab initio efforts in CASP3.

While our new potential energy function certainly represents a step forward, there are also obviously areas where more work needs to be done. Primarily, the causes of failure to routinely achieve a low-RMSD structure in the top ﬁve predictions in some cases must be analyzed and understood. These failures are thus more interesting at this point than the successes because they point the way to development of an improved methodology. We therefore analyze a number of these cases in detail below so as to reveal the underlying difﬁculties and directions in which solutions must be developed.

2.Detailed Analysis of Speciﬁc Cases

Figure 8 presents the detailed secondary structure predictions for each of the cases that we analyze below. In conjunction with the tertiary folding results summarized in Table VIII, as well as the results using PDB-derived secondary structure presented above, we can extract insight into how various types of errors in secondary structure prediction affect tertiary folding accuracy. Due to the large amount of data, we have selected a subset of interesting examples to analyze in detail, however, the conclusions, summarized in the discussion following consideration of individual examples, reﬂect an examination of the results for all 22 of the proteins studied.

253

Figure 8(a–e). Secondary structure predictions for 1aj3, 1am3, 1mzm, 1eh2, 1bg8 Chain A (cons: JPRED consensus prediction; dsc: DSC; mul: MULPRED; nnssp: NNSSP; orig_phd: PhD in its most current implementation; phd: PhD as run by JPRED; pred: PREDATOR; psipred: PSIPRED; zpred: ZPRED). References for the secondary prediction algorithms are given in the text.

254

Figure 8(a–e) (Continued)

<<< < Предыдущая 17 18 19 20 21 22 23 24 25 26 27 2829 / 5429 30 31 32 33 34 35 36 37 38 39 40 41 > Следующая >>>

Соседние файлы в предмете Химия

#
15.08.201311.09 Mб21Emerging Tools for Single-Cell Analysis.pdf
#
15.08.20135.93 Mб69Enzymes (Second Edition).pdf
#
15.08.20131.57 Mб45Fitts D.D. - Principles of Quantum Mechanics[c] As Applied to Chemistry and Chemical Physics (1999)(en).pdf
#
15.08.20135.12 Mб120Flow Cytometry - First Principles (Second Edition).pdf
#
15.08.20132.96 Mб20Frantisek Svec - Capillary Electrochromatography.pdf
#
15.08.20136.52 Mб12Friesner R.A. (ed.) - Advances in chemical physics, computational methods for protein folding (2002)(en).pdf
#
15.08.201347.76 Mб11From Biotechnology to Genomes-The Meaning of the Double Heli.pdf
#
15.08.20137.65 Mб89Fundamental processes of dye chemistry. 1949.pdf
#
15.08.201314.85 Mб38Furniss B.S., Hannaford A.J., Smith P.W.G. - Vogel[ap]s Textbook Of Practical Organic Chemistry (1989)(5th ed.)(en).pdf
#
15.08.20132.17 Mб26Gallup G.A. - Valence Bond Methods, Theory and applications (CUP, 2002).pdf
#
15.08.201324.81 Mб182Garrett R.H., Grisham C.M. - Biochemistry (1999)(2nd ed.)(en).pdf