Computational Methods for Protein Structure Prediction & Modeling, Vol. 1 (Xu, Xu, and Liang)

.pdf
Скачиваний:
60
Добавлен:
10.08.2013
Размер:
10.5 Mб
Скачать


Jianpeng Ma

Our study of topology determination supports an important hypothesis: for a given protein skeleton, the native topology is the one chosen by evolution to accommodate the largest structural variation, not merely the one trapped in a deep but narrow energy well. This hypothesis led us to rank topology candidates by the average energy of an ensemble of structures, slightly randomized in the vicinity of the native skeleton. The ensemble-averaging scheme appears to be an effective way of compensating for the inevitable errors in the artificially constructed structures and in the empirical potential functions.
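The ranking idea can be made concrete with a small sketch. Everything below is illustrative: the energy function, the Gaussian perturbation of coordinates, and the parameter values (`sigma`, `n_samples`) are our assumptions, not the procedure or potential used in the original topology-determination work. The toy landscape gives one candidate a deep but narrow well and the other a shallower but broad one.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List, Tuple

Coords = List[Tuple[float, float, float]]


def perturb(coords: Coords, sigma: float, rng: random.Random) -> Coords:
    """Copy of `coords` with independent Gaussian noise (std. dev. `sigma`) on every axis."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), z + rng.gauss(0.0, sigma))
            for x, y, z in coords]


def rank_by_ensemble_energy(candidates: Dict[str, Coords],
                            energy: Callable[[Coords], float],
                            n_samples: int = 200,
                            sigma: float = 0.5,
                            seed: int = 0) -> List[Tuple[str, float]]:
    """Rank candidates by mean energy over a randomized ensemble (lowest mean first)."""
    rng = random.Random(seed)
    scores = []
    for name, coords in candidates.items():
        ensemble_energies = [energy(perturb(coords, sigma, rng)) for _ in range(n_samples)]
        scores.append((name, mean(ensemble_energies)))
    return sorted(scores, key=lambda item: item[1])


def toy_energy(coords: Coords) -> float:
    """Toy landscape on each pseudo-atom's x coordinate: a deep, narrow well at x = 0
    (depth 2, width 0.1) and a shallower, broad well at x = 5 (depth 1, width 1.0)."""
    return sum(-2.0 * math.exp(-x ** 2 / (2 * 0.1 ** 2))
               - 1.0 * math.exp(-(x - 5.0) ** 2 / (2 * 1.0 ** 2))
               for x, _, _ in coords)


candidates = {"narrow": [(0.0, 0.0, 0.0)], "broad": [(5.0, 0.0, 0.0)]}
# Single-structure comparison: the narrow, deeper well wins.
print(toy_energy(candidates["narrow"]) < toy_energy(candidates["broad"]))  # True
# Ensemble-averaged comparison: the broad well wins.
print(rank_by_ensemble_energy(candidates, toy_energy)[0][0])               # broad
```

The ensemble average penalizes energies that rise steeply under small perturbations, so it prefers the candidate that tolerates structural variation, which is the behavior the hypothesis calls for.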

The contents of the sections in this chapter are adapted, with necessary modifications, from three seminal research papers (Kong and Ma, 2003; Kong et al., 2004; Wu et al., 2005a).

11.2 Sheetminer: Locating Sheets in Intermediate-Resolution Density Maps

Figure 11.1 shows the overall procedure of sheetminer, which does not rely on any 3D structure prediction method. Instead, it is based on a morphological analysis of intermediate-resolution density maps, i.e., shape recognition in 3D space. One of the most important features of sheetminer is the flat density map, on which most of the essential analyses are based. It allows the shape of a density map to be captured as fully as possible without being strongly influenced by fluctuations in local density values. Based on their distance to the surface of the flat density map, the voxels in the flat density map are divided into two groups: surface voxels and kernel voxels. Then, for each kernel voxel, a condensation scheme is used to increase the contrast at the edges of the density map. After that, sheets are identified primarily from the ratio of two competing parameters, the maximum disk inclusion number and the minimum local thickness, calculated for each kernel voxel. The identified sheet densities are then processed by a set of refinement steps before they are marked as the final output. The parameters used in sheetminer were chosen empirically through exhaustive trials, since there is no general rule for defining them.
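The surface/kernel split can be sketched as below, under stated assumptions: we take the flat map to be a binary mask obtained by thresholding, measure distance to the surface in 6-connected voxel steps, and call voxels within `surface_depth` steps of the outside surface voxels. The threshold, connectivity, and depth are hypothetical choices, not sheetminer's published parameters, and the later steps (condensation, disk inclusion number, local thickness) are not shown.

```python
from collections import deque
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]
NEIGHBORS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def flatten_map(density: Dict[Voxel, float], threshold: float) -> Set[Voxel]:
    """'Flat' map: keep only *which* voxels exceed the threshold, discarding their values."""
    return {v for v, rho in density.items() if rho >= threshold}


def surface_distance(mask: Set[Voxel]) -> Dict[Voxel, int]:
    """Distance (in 6-connected voxel steps) from each mask voxel to the outside.

    Voxels touching a non-mask neighbor get distance 1; a breadth-first search
    through the mask assigns every other voxel its shortest step count."""
    dist: Dict[Voxel, int] = {}
    queue = deque()
    for v in mask:
        x, y, z = v
        if any((x + dx, y + dy, z + dz) not in mask for dx, dy, dz in NEIGHBORS_6):
            dist[v] = 1
            queue.append(v)
    while queue:
        v = queue.popleft()
        x, y, z = v
        for dx, dy, dz in NEIGHBORS_6:
            n = (x + dx, y + dy, z + dz)
            if n in mask and n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist


def split_surface_kernel(mask: Set[Voxel], surface_depth: int = 1) -> Tuple[Set[Voxel], Set[Voxel]]:
    """Surface voxels lie within `surface_depth` steps of the outside; the rest are kernel voxels."""
    dist = surface_distance(mask)
    surface = {v for v, d in dist.items() if d <= surface_depth}
    return surface, mask - surface


# A 5 x 5 x 5 block of strong density plus one weak, isolated voxel.
density = {(x, y, z): 1.0 for x in range(5) for y in range(5) for z in range(5)}
density[(10, 10, 10)] = 0.2
mask = flatten_map(density, threshold=0.5)
surface, kernel = split_surface_kernel(mask)
print(len(surface), len(kernel))  # 98 27
```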

This section is adapted from the original research article (Kong and Ma, 2003) from which interested readers can find more technical details.

11.2.1 Locating Sheets in Simulated Density Maps

The algorithm sheetminer was first tested on intermediate-resolution density maps simulated from high-resolution crystal structures. A total of 12 structurally unrelated proteins were chosen because the number, size, and shape of their β-sheets vary widely, so they are expected to represent a reasonably complete sampling of known β-sheet morphology. They are roughly split into three groups: group I proteins contain a single β-sheet (Arnold and Rossmann, 1988; Hoover and Ludwig, 1997; Rees et al., 1983; Wittinghofer et al., 1991), group II proteins contain multiple independent β-sheets (Eklund et al., 1981; Khan et al., 2000; Mayer et al., 2002; Wang et al., 2000), and group III proteins contain typical β-rich motifs such as the β-barrel and the β-propeller (Gaudet et al., 1999; Steinbacher et al., 1994; Wilson et al., 1992; Zanotti et al., 1998).

11. Intermediate-Resolution Density Maps

Fig. 11.1 Flowchart of the entire computational procedure for β-sheet identification in intermediate-resolution density maps implemented in sheetminer.

11.2.1.1 Results at 8-Å Resolution

The selected PDB model was first blurred (Ludtke et al., 1999) to a resolution of 8 Å. At this resolution, visual identification of β-sheets is difficult, especially for those buried deep inside proteins. Sheetminer was then used to identify sheet densities. In the 12 proteins tested there were a total of 35 independent β-sheets, and sheetminer successfully located 34 of them (Fig. 11.2; only three examples are shown). One sheet was missed (circled in Fig. 11.3a), and two small regions were mistakenly identified as sheets (false positives; one such case is shown in Fig. 11.3b). Sheetminer is therefore very effective at defining sheet regions at this resolution.

Fig. 11.2 Sheet-searching results based on simulated density maps from a total of 12 protein crystal structures. There are 35 independent β-sheets in total, with a wide distribution of morphologies. Sheetminer successfully located 34 of them and missed only one, in the MoFe protein of nitrogenase. The schematic ribbon diagrams on the left show the crystal structures, with the β-sheets in darker color. The middle diagrams show the blurred structures at 8-Å resolution. In the diagrams on the right, the identified sheet regions are shown on top of the ribbon diagrams.
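The blurring step is what turns an atomic model into a simulated intermediate-resolution map. The original work used EMAN (Ludtke et al., 1999); the sketch below is a generic stand-in that sums one Gaussian per atom, with the width tied to resolution by σ = resolution/(π√2), the default convention of UCSF Chimera's `molmap` command. That convention, the grid spacing, and the truncation radius are assumptions and may differ from EMAN's.

```python
import math
from typing import Dict, List, Tuple

Atom = Tuple[float, float, float, float]  # x, y, z (in Å) and a scattering weight


def simulate_density(atoms: List[Atom], resolution: float,
                     spacing: float = 1.0, cutoff_sigmas: float = 3.0) -> Dict[Tuple[int, int, int], float]:
    """Blur atoms into a density grid by summing one isotropic Gaussian per atom.

    sigma = resolution / (pi * sqrt(2)) is an assumed width convention (Chimera molmap's
    default); each Gaussian is truncated beyond `cutoff_sigmas` standard deviations."""
    sigma = resolution / (math.pi * math.sqrt(2.0))
    reach = math.ceil(cutoff_sigmas * sigma / spacing)
    grid: Dict[Tuple[int, int, int], float] = {}
    for x, y, z, w in atoms:
        ci, cj, ck = round(x / spacing), round(y / spacing), round(z / spacing)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    d2 = (i * spacing - x) ** 2 + (j * spacing - y) ** 2 + (k * spacing - z) ** 2
                    grid[(i, j, k)] = grid.get((i, j, k), 0.0) + w * math.exp(-d2 / (2.0 * sigma ** 2))
    return grid


single_atom = [(0.0, 0.0, 0.0, 1.0)]
map_4 = simulate_density(single_atom, resolution=4.0)
map_8 = simulate_density(single_atom, resolution=8.0)
# Same peak height, but at 8 Å the density 2 Å from the atom is much higher (a blurrier map).
print(map_4[(0, 0, 0)], map_8[(0, 0, 0)])
print(round(map_4[(2, 0, 0)], 3), round(map_8[(2, 0, 0)], 3))
```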

To quantify how accurately sheetminer discerns sheet regions, we computed sensitivity and specificity. Sensitivity is defined as the probability of a positive identification among voxels that are true sheet voxels, and specificity as the probability of a negative identification among voxels that are not true sheet voxels. The average values of sensitivity and specificity are 87.1 and 73.3%, respectively. Thus, the agreement between the computationally identified sheet density maps and the real ones is very good at a resolution of 8 Å. The high sensitivity indicates that the method reliably outlines the rough size of the sheet regions. The good specificity indicates that the method seldom predicts false positives, i.e., seldom misidentifies nonsheet voxels as sheet voxels. From a practical point of view, specificity is arguably somewhat more important than sensitivity, because correctly identifying the overall locations of the sheets matters far more than defining their exact size. The latter is a variable quantity even among methods for assigning secondary structures at high resolution.

Fig. 11.3 Errors in sheet-searching results. (a) The β-sheet in the MoFe protein of nitrogenase (PDB code 1h1l) that was missed by sheetminer; the region of the sheet is circled. This is the only one missed out of the total of 35 sheets in 12 proteins. (b) One of only two small regions that were mistakenly identified as β-sheets (false positives), in aldose reductase (PDB code 1ads).
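The two voxel-level measures reduce to counts of true and false positives and negatives. A minimal sketch follows; which voxels form the comparison universe is not specified above, so taking all voxels of the protein density map is an assumption, and the toy voxel sets are made up.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def sensitivity_specificity(predicted: Set[Voxel], true_sheet: Set[Voxel],
                            universe: Set[Voxel]) -> Tuple[float, float]:
    """Voxel-level sensitivity and specificity.

    `universe` is the set of voxels over which the comparison is made (assumed here
    to be all voxels of the protein density map)."""
    non_sheet = universe - true_sheet
    tp = len(predicted & true_sheet)   # sheet voxels correctly identified
    fn = len(true_sheet - predicted)   # sheet voxels missed
    tn = len(non_sheet - predicted)    # nonsheet voxels correctly rejected
    fp = len(non_sheet & predicted)    # nonsheet voxels mistakenly called sheet
    return tp / (tp + fn), tn / (tn + fp)


universe = {(i, 0, 0) for i in range(10)}
true_sheet = {(i, 0, 0) for i in range(4)}      # voxels 0-3 are truly sheet
predicted = {(i, 0, 0) for i in range(1, 5)}    # prediction is shifted by one voxel
sens, spec = sensitivity_specificity(predicted, true_sheet, universe)
print(sens, round(spec, 3))  # 0.75 0.833
```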

11.2.1.2 Resolution Dependency

In our experience, sheetminer works best in the resolution range of 6 to 10 Å. However, the exact outcome also depends on the nature of the system: for large sheets, sheetminer can work at lower resolution, but for small sheets, lower resolution makes identification much harder. The results for a typical five-stranded sheet in p21ras (Wittinghofer et al., 1991) at resolutions of 6, 8, and 10 Å are shown in Fig. 11.4. At 6-Å resolution, the algorithm very accurately located not only the overall shape but also the detailed edge of the sheet. At 8-Å resolution, the result is still satisfactory. At 10-Å resolution, the map is significantly fuzzier, but sheetminer was still able to find the rough location of the sheet, despite large errors at the edge. The sensitivity values are 87.2, 78.7, and 55.3%, and the specificity values are 92.5, 93.3, and 95.0%, for 6, 8, and 10 Å, respectively. Specificity thus appears to be much less resolution-dependent than sensitivity. A certain degree of resolution dependence of the sensitivity is expected and should not undermine the applicability of sheetminer: state-of-the-art cryo-EM techniques can now provide structures at intermediate resolutions, many of them at or near 6 to 8 Å (Zhou et al., 2001a). The program helixhunter (Jiang et al., 2001) has a similar resolution dependency. Not surprisingly, the identification of β-sheets is more sensitive to resolution than that of α-helices.


Fig. 11.4 Sheet-searching results for a typical five-stranded sheet in p21ras (PDB code 121p) at resolutions of 6, 8, and 10 Å (panels labeled 6 Å, 8 Å, and 10 Å). The upper panels show the blurred structures at the three resolutions, and the lower panels show the corresponding results, in which the identified sheet density maps are superimposed on the ribbon representations of the crystal structures. At 6-Å resolution, sheetminer very accurately located not only the overall shape but also the detailed edge of the sheet. At 8-Å resolution, the result is also satisfactory. At 10-Å resolution, the morphology of the density map is significantly fuzzier, but sheetminer was still able to identify the rough location of the sheet, despite large errors at the edges.

11.2.2 Application to Real Experimental Cryo-EM Density Maps

To test its applicability to real experimental density maps, sheetminer was also examined on the F41 fragment of the bacterial flagellar filament, for which both an X-ray structure and an intermediate-resolution cryo-EM structure are available. The X-ray structure of the fragment of the Salmonella flagellar filament was available at 2.0 Å (PDB code 1io1) (Samatey et al., 2001), and the cryo-EM structure of the same fragment was obtained from the 9-Å cryo-EM structure of the R-type straight flagellar filament (Mimori et al., 1995). Sheetminer successfully located two β-sheet regions that encompass five of the six β-sheets observed in the X-ray structure (Samatey et al., 2001) and missed only one isolated, small two-stranded sheet (Fig. 11.5). It is worth pointing out that a 9-Å experimental cryo-EM density map is significantly noisier than a 9-Å simulated density map. Thus, the success in this case further confirms the applicability of sheetminer to actual experimental data.


Fig. 11.5 Sheet-searching results for the F41 fragment of the bacterial flagellar filament. The 2.0-Å X-ray structure of the F41 fragment of the Salmonella flagellar filament (PDB code 1io1) is shown on the left; the cryo-EM structure of the same fragment, obtained from the 9-Å cryo-EM structure of the R-type straight flagellar filament, is shown in the middle; and the sheet-searching results are shown on the right, superimposed on the ribbon diagram of the crystal structure. Five of the six β-sheets observed in the crystal structure were successfully located. Only an isolated, small two-stranded β-sheet was missed (behind the three longest helices).

11.2.3 Application to an 8-Å Experimental X-ray Density Map

Sheetminer was also tested on crystallographic electron density maps of equivalent resolution. An example is shown for flavodoxin (PDB code 1ag9). The X-ray electron density map was first generated from experimental diffraction data up to a resolution of 8 Å (the original structure has a resolution of 1.8 Å), and sheetminer was then applied to analyze the sheet density. The result is shown in Fig. 11.6, along with that from an 8-Å density map simulated from the atomic coordinates. The overall results are similar in both cases. One important point is that, with sheetminer, X-ray diffraction data in the 4–8 Å resolution range, which are conventionally of limited use, can be used to extract meaningful structural information.
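Generating a map "from experimental diffraction data up to 8 Å" amounts to keeping only reflections whose interplanar spacing d is at least 8 Å before computing the map. The sketch below shows only that truncation step, assuming an orthogonal unit cell (general cells need the full reciprocal metric); the cell and reflections are invented for illustration and are not the flavodoxin data.

```python
import math
from typing import List, Tuple

Reflection = Tuple[int, int, int, float]  # h, k, l, structure-factor amplitude


def d_spacing(h: int, k: int, l: int, a: float, b: float, c: float) -> float:
    """Interplanar spacing (Å) for an orthogonal cell (all angles 90°); (0, 0, 0) is undefined."""
    return 1.0 / math.sqrt((h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2)


def truncate_to_resolution(reflections: List[Reflection],
                           cell: Tuple[float, float, float], d_min: float) -> List[Reflection]:
    """Keep only reflections with spacing d >= d_min, i.e. data 'up to' d_min resolution."""
    a, b, c = cell
    return [r for r in reflections if d_spacing(r[0], r[1], r[2], a, b, c) >= d_min]


cell = (40.0, 40.0, 40.0)  # made-up cubic cell, for illustration only
reflections = [(1, 0, 0, 120.0), (4, 0, 0, 80.0), (2, 2, 0, 75.0),
               (6, 0, 0, 40.0), (4, 4, 4, 15.0)]
low_res = truncate_to_resolution(reflections, cell, d_min=8.0)
print([r[:3] for r in low_res])  # [(1, 0, 0), (4, 0, 0), (2, 2, 0)]
```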

Fig. 11.6 Comparison of sheet-searching results for X-ray and simulated density maps of flavodoxin at 8 Å. The X-ray density map was generated using the experimental diffraction data up to a resolution of 8 Å (the resolution of the original X-ray structure, PDB code 1ag9, was 1.8 Å). The 8-Å simulated density map was obtained by artificial blurring of the X-ray structure. The sheet-searching results are superimposed on the ribbon diagrams of the crystal structure (left for the simulated density map and right for the X-ray density map).


11.2.4 Concluding Discussion

It is clear that sheetminer works better for large sheets than for small ones. Usually, the locations of major sheets can be found consistently at intermediate resolutions, but the exact edges of the sheets can be fuzzy. This inaccuracy often makes it difficult to establish the exact length of strands even when their overall positions are well defined. Similar problems are also seen in the helix-hunting algorithm (Jiang et al., 2001). However, this should not be a severe problem in many respects, because even the exact lengths of secondary structural elements in high-resolution X-ray structures can vary with the assignment method used. More importantly, the identification of protein folds should be more sensitive to the overall spatial arrangement of secondary structural elements than to their exact lengths.

The final outputs of sheetminer's multistep processing of density maps are flat but continuous density maps corresponding to sheet regions. They can effectively narrow down the search space for further model building to a pseudo-2D space. The algorithms for building pseudo-Cα traces of the β-sheets identified in density maps are presented in the next section.

11.3 Sheettracer: Building Pseudo-traces for β-Strands in Intermediate-Resolution Density Maps

Sheettracer (Kong et al., 2004) is tightly coupled to the sheetminer method: it traces individual β-strands based on the relatively thin, but continuous, sheet density maps output by sheetminer (Kong and Ma, 2003). Figure 11.7 shows the overall procedure of sheettracer. A deconvolution method was also developed to enhance the features of secondary structures in intermediate-resolution density maps.

The morphological analysis of density maps used by sheettracer is based on two observations: protein main-chain density is relatively higher than side-chain density, and all neighboring β-strands are parallel or nearly parallel. The first observation enables the use of local peak filtering to select backbone voxels, whose geometric distribution helps define sheet morphology. The second observation makes it possible to condense the density by local first principal component axis projection without losing intrastrand connectivity. Unlike other thinning schemes that consider only contacting neighbors, this local projection scheme reinforces the linear distribution of voxels while simultaneously increasing the distance between voxels of different strands. This condensation significantly increases the efficiency of segment clustering.
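Local first principal component axis projection can be sketched as follows: for each voxel, fit the principal axis of its neighbors and move the voxel onto that axis. This is our own minimal reading of the description; the neighborhood radius, the power-iteration PCA, and the linearity score (largest covariance eigenvalue divided by the trace, a natural input for the later local linearity filter) are assumptions rather than sheettracer's published details.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def principal_axis(points: Sequence[Point], iterations: int = 200):
    """Return (centroid, unit first principal axis, linearity) of a 3D point set.

    The axis is the top eigenvector of the covariance matrix, found by power iteration;
    linearity = largest eigenvalue / trace (1.0 means perfectly collinear points)."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    d = [[p[i] - c[i] for i in range(3)] for p in points]
    cov = [[sum(q[i] * q[j] for q in d) / n for j in range(3)] for i in range(3)]
    trace = cov[0][0] + cov[1][1] + cov[2][2]
    if trace == 0.0:                     # a single point (or identical points): no axis
        return c, [1.0, 0.0, 0.0], 0.0
    v = [1.0, 0.7, 0.3]                  # arbitrary start vector for power iteration
    s = math.sqrt(sum(x * x for x in v))
    v = [x / s for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(3)) for i in range(3))  # Rayleigh quotient
    return c, v, lam / trace


def local_pc_projection(points: Sequence[Point], radius: float) -> List[Point]:
    """Move every point onto the first principal axis of its neighbours within `radius`."""
    projected = []
    for p in points:
        neighbours = [q for q in points if math.dist(p, q) <= radius]  # includes p itself
        c, u, _ = principal_axis(neighbours)
        t = sum((p[i] - c[i]) * u[i] for i in range(3))
        projected.append(tuple(c[i] + t * u[i] for i in range(3)))
    return projected


# Two parallel, zig-zagging "strands" along x, 4.8 Å apart in y (jitter of ±0.5 Å).
strand_a = [(float(x), 0.5 * (-1) ** x, 0.0) for x in range(11)]
strand_b = [(float(x), 4.8 + 0.5 * (-1) ** x, 0.0) for x in range(11)]
condensed = local_pc_projection(strand_a + strand_b, radius=3.0)
print(round(max(abs(y) for _, y, _ in condensed[:11]), 2))  # well below the original 0.5 Å
```

Because the neighborhood radius (3 Å) is smaller than the assumed 4.8 Å inter-strand spacing, each voxel is pulled toward its own strand's axis: scatter perpendicular to the strand shrinks, and the closest approach between the two strands widens from the original 3.8 Å.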

We tested the methods on simulated 6-Å density maps from 12 representative protein crystal structures, encompassing a wide range of sheet morphologies. Sheettracer successfully built pseudo-Cα models in the sheet densities output by sheetminer, with average values of 79.5%, 96.3%, and 1.54 Å for sensitivity, specificity, and rms deviation, respectively. For even lower-resolution (8 Å) simulated data, a deconvolution method was used to permit sheettracer to build pseudo-Cα models with average values of 71.3%, 93.8%, and 1.77 Å for sensitivity, specificity, and rms deviation, respectively. Furthermore, sheettracer and the deconvolution method were also tested on experimental maps of the 2 protein of reovirus at resolutions of 7.6 and 11.8 Å.

Fig. 11.7 Flowchart of the computational procedure of sheettracer in intermediate-resolution density maps.

This section is adapted from the original research article (Kong et al., 2004) from which interested readers can find more technical details.


Fig. 11.8 Stepwise processing of sheet density maps to discern individual β-strands, using the sheet in the GroEL minichaperone as an example. (a) Sheet density identified by sheetminer, shown as voxels. (b) Voxels selected by the local peak filter. (c) Surviving voxels after local first principal component axis projection, using the voxels in (b) as input. (d) Surviving voxels after local linearity filtering, using the voxels in (c) as input. (e) Clustered backbone voxels after k-segments processing. The lines are the fitted segments (the first principal component axes).

11.3.1 Stepwise Discerning of β-Strands on GroEL Minichaperone

Sheetminer (Kong and Ma, 2003) outputs clusters of voxels, each delineating a thin but continuous volume of density that represents a single β-sheet. Sheettracer then uses a multistep process to build pseudo-Cα traces in each identified sheet. We first illustrate the process with an example: a β-sheet of the apical domain of the molecular chaperonin GroEL, also known as the minichaperone (Wang et al., 2000) (PDB code 1fy9).

First, each cluster of voxels (Fig. 11.8a) was processed by a local peak filter to identify the voxels most likely to form the backbones of individual strands (Fig. 11.8b). The local peak-filtering algorithm enhances high local density values and thereby adjusts to variations in the magnitude of densities throughout the map, which permits effective selection of backbone voxels even in regions of relatively weak density. The next step was to condense the selected voxels by local first principal component axis projection, which concentrates the voxel distribution along the longest local axis, expected to coincide with a strand backbone (Fig. 11.8c). The outcome was a significantly narrowed distribution of voxels, which was then processed by a local linearity filter to pick backbone voxels with good local linearity (Fig. 11.8d). After that, k-segments clustering (Verbeek et al., 2002) was employed to group the voxels into smaller subsets, each representing one part of a β-strand (Fig. 11.8e). Finally, all subsets belonging to the same strand were merged so that each cluster of voxels represents an independent β-strand, and a pseudo-Cα trace was then built for each strand.
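The local peak filter can be illustrated with a relative-threshold rule: keep a voxel if its density is at least a fixed fraction of the maximum in its local neighborhood. The rule, the cubic neighborhood, and the 0.9 fraction are hypothetical; the text states only that the filter enhances high local density and adapts to variations in density magnitude, and this sketch shows why a locally relative criterion has that property. Densities are assumed non-negative.

```python
from itertools import product
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def local_peak_filter(density: Dict[Voxel, float], radius: int = 1,
                      fraction: float = 0.9) -> Set[Voxel]:
    """Keep voxels whose density is at least `fraction` of the maximum in their local cube.

    Because the comparison is relative to the local maximum, a voxel in a weak region
    survives as long as it is locally prominent."""
    offsets = list(product(range(-radius, radius + 1), repeat=3))  # includes (0, 0, 0)
    kept = set()
    for (x, y, z), rho in density.items():
        local_max = max(density.get((x + dx, y + dy, z + dz), float("-inf"))
                        for dx, dy, dz in offsets)
        if rho >= fraction * local_max:
            kept.add((x, y, z))
    return kept


# A strong ridge (backbone 10, flanks 5) and a weak ridge (backbone 1, flanks 0.5), both along x.
density = {}
for x in range(5):
    density.update({(x, -1, 0): 5.0, (x, 0, 0): 10.0, (x, 1, 0): 5.0})
    density.update({(x, 9, 0): 0.5, (x, 10, 0): 1.0, (x, 11, 0): 0.5})
kept = local_peak_filter(density)
global_cut = {v for v, rho in density.items() if rho >= 3.0}  # a single global threshold
print(sorted({y for _, y, _ in kept}))        # [0, 10]
print(sorted({y for _, y, _ in global_cut}))  # [-1, 0, 1]
```

A single global cutoff that keeps the strong ridge's backbone also keeps its flanks and loses the weak ridge entirely, whereas the local rule recovers both backbones.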

11.3.2 Discerning β-Strands and Building Pseudo-Cα Traces in 12 Proteins

Sheettracer was further tested on simulated density maps of 11 other structurally unrelated proteins, the same set of proteins used to test sheetminer in the previous section. Figure 11.9 shows the results, with the built pseudo-Cα traces superimposed on the crystal structures (one example from each group). The results were statistically analyzed using three separate measures: sensitivity, specificity, and rms deviation (Kong and Ma, 2003). The rms deviation was calculated as the average distance of each built pseudo-Cα atom from its closest sheet Cα atom in the superimposed crystal structure. The average sensitivity and specificity for the 12 proteins are 79.5 and 96.3%, respectively. The rms deviation is always smaller than 2.0 Å, with an average of 1.54 Å. Given the limited resolution, these trace-building statistics seem reasonable. Note that, in Fig. 11.9, strand directions were assigned according to the known X-ray structures, since sheettracer is unable to determine them.

Fig. 11.9 Sheet-tracing results for all 12 proteins based on 6-Å simulated density maps. The pseudo-Cα traces, depicted in darker color, are superimposed on the X-ray structures of the proteins, shown in lighter color. Only one protein from each group is shown: (a) carboxypeptidase A; (b) horse liver alcohol dehydrogenase; (c) phosducin. The arrows in the pseudo-Cα traces were assigned artificially, based on the crystal structures.
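The deviation measure described above can be written down directly. Because the text calls it an rms deviation but defines it as an average distance, the sketch computes both the mean and the root-mean-square of the nearest-neighbor distances; the coordinates are made up, and superposition of the trace onto the crystal structure is assumed to have been done already.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distances(pseudo_ca: Sequence[Point], sheet_ca: Sequence[Point]) -> List[float]:
    """For each built pseudo-Cα atom, the distance to its closest sheet Cα atom."""
    return [min(math.dist(p, q) for q in sheet_ca) for p in pseudo_ca]


def mean_deviation(pseudo_ca: Sequence[Point], sheet_ca: Sequence[Point]) -> float:
    """Average nearest distance, as the deviation is described in the text."""
    d = nearest_distances(pseudo_ca, sheet_ca)
    return sum(d) / len(d)


def rms_deviation(pseudo_ca: Sequence[Point], sheet_ca: Sequence[Point]) -> float:
    """Root-mean-square of the same nearest distances (an alternative reading of 'rms')."""
    d = nearest_distances(pseudo_ca, sheet_ca)
    return math.sqrt(sum(x * x for x in d) / len(d))


sheet_ca = [(3.8 * i, 0.0, 0.0) for i in range(5)]   # idealized strand, 3.8 Å Cα spacing
pseudo_ca = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 2.0, 0.0)]
print(round(mean_deviation(pseudo_ca, sheet_ca), 3),
      round(rms_deviation(pseudo_ca, sheet_ca), 3))  # 1.333 1.414
```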