Computational Methods for Protein Structure Prediction & Modeling V1 - Xu Xu and Liang


360	Jianpeng Ma

Our study of topology determination supports an important hypothesis: for a given protein skeleton, the native topology is the one chosen by evolution to accommodate the largest structural variation, not merely the one trapped in a deep but narrow energy well. This hypothesis led to the use of the average energy of an ensemble of structures, slightly randomized in the vicinity of the native skeleton, as the parameter for ranking the topology candidates. The ensemble-averaging scheme appears to be an effective way of compensating for the inevitable errors in the artificially constructed structures and in the empirical potential functions.

The contents of the sections in this chapter are adapted from three seminal research papers (Kong and Ma, 2003; Kong et al., 2004; Wu et al., 2005a) with necessary modifications.

11.2 Sheetminer: Locating Sheets in Intermediate-Resolution Density Maps

Figure 11.1 shows the overall procedure of sheetminer, which does not rely on any 3D structure prediction method. Rather, it is based on a morphological analysis of intermediate-resolution density maps, i.e., shape recognition in 3D space. One of the most important features of sheetminer is the flat density map on which most of the essential analyses are based. It captures the shape of the density map as fully as possible without being severely influenced by fluctuations in local density values. Based on their distance to the surface of the flat density map, the voxels are divided into two groups: surface voxels and kernel voxels. Then, for each kernel voxel, a condensation scheme is used to increase the contrast at the edge of the density map. After that, sheets are identified primarily from the ratio of two competing parameters, the maximum disk inclusion number and the minimum local thickness, calculated for each kernel voxel. The identified sheet densities are then processed by a set of refinement steps before they are marked as the final output. The parameters used in sheetminer were chosen empirically after exhaustive trials, since there is no general rule for defining them.
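The split into surface and kernel voxels can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the original article: the flat density map is modeled as a plain binary mask obtained with a hypothetical `threshold`, and a voxel counts as a surface voxel as soon as one of its six face neighbors lies outside the mask, whereas sheetminer uses a distance-to-surface criterion whose cutoff is not given here. The function names are illustrative only.

```python
# Six face-sharing neighbor offsets of a voxel on a cubic grid.
NEIGHBORS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def flatten_map(density, threshold):
    """Return the set of (x, y, z) voxels whose density exceeds `threshold`.

    Assumption: the "flat" map keeps only the shape (a binary mask) and
    discards the local density values.
    """
    return {
        (x, y, z)
        for x, plane in enumerate(density)
        for y, row in enumerate(plane)
        for z, value in enumerate(row)
        if value > threshold
    }


def split_surface_kernel(mask):
    """Split mask voxels into surface voxels and kernel (interior) voxels."""
    surface, kernel = set(), set()
    for (x, y, z) in mask:
        exposed = any(
            (x + dx, y + dy, z + dz) not in mask for dx, dy, dz in NEIGHBORS_6
        )
        (surface if exposed else kernel).add((x, y, z))
    return surface, kernel


# Demo: a solid 5 x 5 x 5 block whose density values vary from voxel to voxel.
density = [
    [[1.0 + 0.1 * (x + y + z) for z in range(5)] for y in range(5)]
    for x in range(5)
]
mask = flatten_map(density, threshold=0.5)
surface, kernel = split_surface_kernel(mask)
print(len(mask), len(surface), len(kernel))
```

In this toy block only the inner 3 x 3 x 3 voxels end up as kernel voxels, and the varying density values play no role once the map has been flattened.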

This section is adapted from the original research article (Kong and Ma, 2003), in which interested readers can find more technical details.

11.2.1 Locating Sheets in Simulated Density Maps

The algorithm sheetminer was first tested on intermediate-resolution density maps simulated from high-resolution crystal structures. A total of 12 structurally unrelated proteins were chosen because the number, size, and shape of their β-sheets vary widely, so they are expected to represent a reasonably complete sampling of known β-sheet morphology. They are roughly split into three groups: group I proteins contain a single β-sheet (Arnold and Rossmann, 1988; Hoover and Ludwig, 1997; Rees et al., 1983; Wittinghofer et al., 1991), group II proteins contain multiple independent β-sheets (Eklund et al., 1981; Khan et al., 2000; Mayer et al., 2002; Wang et al., 2000), and group III proteins contain typical heavy β-motifs such as the β-barrel and β-propeller (Gaudet et al., 1999; Steinbacher et al., 1994; Wilson et al., 1992; Zanotti et al., 1998).

11. Intermediate-Resolution Density Maps

Fig. 11.1 Flowchart for the entire computational procedure of β-sheet identification in intermediate-resolution density maps implemented in sheetminer.

11.2.1.1 Results at 8-Å Resolution

The selected PDB model was first blurred (Ludtke et al., 1999) to a resolution of 8 Å. At this resolution, visual identification of β-sheets is difficult, especially for those deeply buried inside proteins. Then sheetminer was used to identify sheet densities. The 12 proteins tested contain a total of 35 independent β-sheets, and 34 of them were successfully located by sheetminer (Fig. 11.2; only three examples are shown here). One sheet was missed by sheetminer (circled in Fig. 11.3a) and two small regions were mistakenly identified as sheets (false positives; one such case is shown in Fig. 11.3b). Therefore, sheetminer is very effective in defining sheet regions at this resolution.

Fig. 11.2 Sheet-searching results based on simulated density maps from a total of 12 protein crystal structures. There are 35 independent β-sheets in total, with a wide distribution of morphology. Sheetminer successfully located 34 of them and missed only one, in the MoFe protein of nitrogenase. The schematic ribbon diagrams on the left show the crystal structures with the β-sheets (in darker color). The middle diagrams show the blurred structures at 8-Å resolution. In the diagrams on the right, the identified sheet regions are shown on top of the ribbon diagrams.

Fig. 11.3 Errors in sheet-searching results. (a) The β-sheet in the MoFe protein of nitrogenase (PDB code 1h1l) that was missed by sheetminer. The region of the sheet is circled. This is the only one missed out of the total of 35 sheets in 12 proteins. (b) One of only two small regions that were mistakenly identified as β-sheets (false positives). The figure shows the one in aldose reductase (PDB code 1ads).

To investigate quantitatively the accuracy of sheetminer in discerning sheet regions, we computed the sensitivity and specificity. Sensitivity is defined as the probability of a positive identification among voxels that are true sheet voxels, and specificity is defined as the probability of a negative identification among voxels that are not true sheet voxels. The average values of sensitivity and specificity are 87.1 and 73.3%, respectively. Thus, the agreement between the computationally searched sheet density maps and the real ones is very good at a resolution of 8 Å. The high sensitivity indicates that the method reliably outlines the rough size of the sheet regions. The good specificity indicates that the method seldom produces false positives, i.e., seldom mistakes nonsheet voxels for sheet voxels. From a practical point of view, one would expect the specificity to be somewhat more important than the sensitivity, because it is far more important to identify the overall locations of the sheets correctly than to define their exact size. The latter is also a variable quantity, even between methods for assigning secondary structures at high resolution.
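The two measures defined above can be written out as a short sketch. Representing voxels as set members, the function name, and the toy voxel sets are assumptions made for illustration; the numbers are not data from the study.

```python
def sensitivity_specificity(predicted, truth, all_voxels):
    """Return (sensitivity, specificity) for a voxel-level sheet prediction.

    predicted  -- set of voxels identified as sheet
    truth      -- set of voxels that truly belong to a sheet
    all_voxels -- set of every voxel considered (sheet and nonsheet)
    """
    true_pos = len(predicted & truth)
    false_neg = len(truth - predicted)
    false_pos = len(predicted - truth)
    true_neg = len(all_voxels - predicted - truth)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one sheet voxel and one nonsheet voxel")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Toy example with voxels labeled by integer ids (hypothetical data).
all_voxels = set(range(30))
truth = set(range(0, 10))       # ids 0..9 are true sheet voxels
predicted = set(range(2, 12))   # ids 2..11 were identified as sheet
sens, spec = sensitivity_specificity(predicted, truth, all_voxels)
print(sens, spec)
```

Here 8 of the 10 true sheet voxels are found and 18 of the 20 nonsheet voxels are correctly rejected.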

11.2.1.2 Resolution Dependency

In our experience, sheetminer works best in the resolution range of 6 to 10 Å. However, the exact outcome also depends on the nature of the system: for large sheets, sheetminer can work at lower resolution, but small sheets become much harder to find as the resolution decreases. The results for a typical five-stranded sheet in p21ras (Wittinghofer et al., 1991) at resolutions of 6, 8, and 10 Å are shown in Fig. 11.4. At 6-Å resolution, the algorithm very accurately located not only the overall shape, but also the detailed edge of the sheet. At 8-Å resolution, the result is still satisfactory. At 10-Å resolution, the map is significantly fuzzier, but sheetminer was still able to find the rough location of the sheet, despite large errors on the edge. The sensitivity values are 87.2, 78.7, and 55.3%, while the specificity values are 92.5, 93.3, and 95.0%, for 6, 8, and 10 Å, respectively. The specificity seems to be much less resolution-dependent. A certain degree of resolution dependence of the sensitivity is expected and should not undermine the applicability of sheetminer. State-of-the-art cryo-EM techniques can now provide structures at intermediate resolutions, and many of them are at or near resolutions of 6 to 8 Å (Zhou et al., 2001a). The program helixhunter (Jiang et al., 2001) has a similar resolution dependency. Not surprisingly, the identification of β-sheets is more sensitive to resolution than is that of α-helices.

Fig. 11.4 Sheet-searching results of a typical five-stranded sheet in p21ras (PDB code 121p) at resolutions of 6, 8, and 10 Å (panels from left to right). The upper panels show the blurred structures at the three resolutions and the lower panels show the corresponding results, in which the found sheet density maps are superimposed on the ribbon representations of the crystal structures. At 6-Å resolution, sheetminer very accurately located not only the overall shape, but also the detailed edge of the sheet. At 8-Å resolution, the result is also satisfactory. At 10-Å resolution, the morphology of the density map is significantly fuzzier, but sheetminer was still able to identify the rough location of the sheet, despite large errors on the edges.

11.2.2 Application to Real Experimental Cryo-EM Density Maps

To test its applicability to real experimental density maps, sheetminer was also examined on the F41 fragment of the bacterial flagellar filament, for which both an X-ray structure and an intermediate-resolution cryo-EM structure are available. The X-ray structure of the fragment of the Salmonella flagellar filament was available at 2.0 Å (PDB code 1io1) (Samatey et al., 2001), and the cryo-EM structure of the same fragment was obtained from the 9-Å cryo-EM structure of the R-type straight flagellar filament (Mimori et al., 1995). Sheetminer successfully located two regions of β-sheets that encompass five of the six β-sheets observed in the X-ray structure (Samatey et al., 2001) and missed only one isolated, small two-stranded sheet (Fig. 11.5). It is worth pointing out that a 9-Å experimental cryo-EM density map is significantly noisier than a 9-Å simulated density map. Thus, the success in this case further confirmed the applicability of sheetminer to actual experimental data.

Fig. 11.5 Sheet-searching results for the F41 fragment of the bacterial flagellar filament. The 2.0-Å X-ray structure of the F41 fragment of the Salmonella flagellar filament is shown on the left (PDB code 1io1); the cryo-EM structure of the same fragment, obtained from the 9-Å cryo-EM structure of the R-type straight flagellar filament, is shown in the middle; and the sheet-searching results are shown on the right, superimposed on the ribbon diagram of the crystal structure. Five of the six β-sheets observed in the crystal structure were successfully located. Only an isolated, small two-stranded β-sheet was missed (behind the three longest helices).

11.2.3 Application to an 8-Å Experimental X-ray Density Map

Sheetminer was also tested on crystallographic electron density maps of equivalent resolution. An example is shown for flavodoxin (PDB code 1ag9). The X-ray electron density map was first generated from experimental diffraction data up to a resolution of 8 Å (the original structure has a resolution of 1.8 Å), and then sheetminer was applied to analyze the sheet density. The result is shown in Fig. 11.6 along with that from an 8-Å density map simulated from the atomic coordinates. The overall results are similar in both cases. One important point is that, with sheetminer, the conventionally not-so-useful X-ray diffraction data in the resolution range of 4–8 Å can be used to extract meaningful structural information.

Fig. 11.6 Comparison of sheet-searching results for X-ray and simulated density maps of flavodoxin at 8 Å. The X-ray density map was generated using the experimental diffraction data up to a resolution of 8 Å (the resolution of the original X-ray structure, PDB code 1ag9, was 1.8 Å). The 8-Å simulated density map was obtained by artificial blurring of the X-ray structure. The sheet-searching results were superimposed on the ribbon diagrams of the crystal structure (left for simulated density map and right for X-ray density map).


11.2.4 Concluding Discussion

It is clear that sheetminer works better for large sheets than for small ones. Usually, the locations of major sheets can be found consistently at intermediate resolutions, but the exact edges of the sheets can be fuzzy. Such inaccuracy often makes it difficult to establish the exact length of strands even if their overall positions are well defined. Similar problems are also seen in the helix-hunting algorithm (Jiang et al., 2001). However, this should not be a severe problem in many regards, because even the exact length of secondary structural elements in high-resolution X-ray structures can vary between assignment methods. More importantly, the identification of protein folds should be more sensitive to the overall spatial arrangement of secondary structural elements than to their exact length.

The final outputs after the multistep processing of density maps by sheetminer are flat, but continuous, density maps corresponding to sheet regions. They can effectively narrow the search space for further model building down to a pseudo-2D space. The algorithms for building pseudo-Cα traces of the β-sheets identified in density maps will be presented in the next section.

11.3 Sheettracer: Building Pseudo-Cα Traces for β-Strands in Intermediate-Resolution Density Maps

Sheettracer (Kong et al., 2004) is tightly coupled to the sheetminer method and traces individual β-strands based on the relatively thin, but continuous, sheet density maps output by sheetminer (Kong and Ma, 2003). Figure 11.7 shows the overall procedure of sheettracer. A deconvolution method was also developed to enhance the features of secondary structures in intermediate-resolution density maps.

The morphological analysis of density maps used by sheettracer is based on two observations: the density of the protein main chain is higher than that of the side chains, and all neighboring β-strands are parallel or nearly parallel. The first observation enables the use of local peak filtering to select backbone voxels, whose geometrical distribution helps define the sheet morphology. The second observation facilitates local first principal component axis projection, which condenses the density without losing intrastrand connectivity. Unlike other thinning schemes that consider only the contacting neighbors, this local projection scheme reinforces the linear distribution of voxels while simultaneously increasing the distance between voxels of different strands. This condensation significantly increases the efficiency of segment clustering.
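The idea of local first principal component axis projection can be sketched as follows. This is an illustrative reading of the description above, not the published implementation: the neighborhood `radius`, the use of power iteration to obtain the first principal axis, and the toy zigzag strands are all assumptions.

```python
import math


def first_principal_axis(points, iterations=50):
    """Return (centroid, unit axis) of a 3D point set via power iteration."""
    n = len(points)
    c = [sum(p[i] for p in points) / n for i in range(3)]
    cov = [
        [sum((p[i] - c[i]) * (p[j] - c[j]) for p in points) / n for j in range(3)]
        for i in range(3)
    ]
    # Arbitrary asymmetric start vector; it only needs a nonzero component
    # along the dominant eigenvector of the covariance matrix.
    v = [1.0, 0.7, 0.3]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate neighborhood (all points coincide)
            break
        v = [x / norm for x in w]
    return c, v


def project_onto_local_axis(points, radius):
    """Project each point onto the first principal axis of its neighborhood."""
    projected = []
    for p in points:
        nbrs = [q for q in points if math.dist(p, q) <= radius]
        c, v = first_principal_axis(nbrs)
        t = sum((p[i] - c[i]) * v[i] for i in range(3))
        projected.append(tuple(c[i] + t * v[i] for i in range(3)))
    return projected


# Two roughly parallel zigzag "strands" along x, centered at y = 0 and y = 4.8.
def strand(y_center):
    return [(float(x), y_center + (0.5 if x % 2 == 0 else -0.5), 0.0) for x in range(10)]


points = strand(0.0) + strand(4.8)
condensed = project_onto_local_axis(points, radius=2.5)

centers = [0.0] * 10 + [4.8] * 10
spread_before = max(abs(p[1] - yc) for p, yc in zip(points, centers))
spread_after = max(abs(p[1] - yc) for p, yc in zip(condensed, centers))
print(spread_before, spread_after)
```

Because each neighborhood contains voxels from a single strand, the projection pulls the points toward their own strand axis, so the within-strand scatter shrinks while the two strands stay apart.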

We tested the methods on simulated 6-Å density maps from 12 representative protein crystal structures, encompassing a wide range of sheet morphologies. Sheettracer successfully built pseudo-Cα models in the sheet densities output by sheetminer, with average values of 79.5%, 96.3%, and 1.54 Å for sensitivity, specificity, and rms deviation, respectively. For even lower-resolution (8 Å) simulated data, a deconvolution method was used to permit sheettracer to build pseudo-Cα models with average values of 71.3%, 93.8%, and 1.77 Å for sensitivity, specificity, and rms deviation, respectively. Furthermore, sheettracer and the deconvolution method were also tested on experimental maps of the λ2 protein of reovirus at resolutions of 7.6 and 11.8 Å.

Fig. 11.7 Flowchart for the computational procedure of sheettracer in intermediate-resolution density maps.

This section is adapted from the original research article (Kong et al., 2004), in which interested readers can find more technical details.


Fig. 11.8 Stepwise processing of sheet density maps to discern individual β-strands, using the sheet in the GroEL minichaperone as an example. (a) Sheet density identified by sheetminer shown in voxels. (b) Selected voxels by local peak filter. (c) Surviving voxels after local first principal component axis projection using the voxels in (b) as input. (d) Surviving voxels after local linearity filtering using the voxels in (c) as input. (e) Clustered backbone voxels after k-segments processing. The lines are the fitted segments (the first principal component axes).

11.3.1 Stepwise Discerning β-Strands on GroEL Minichaperone

Sheetminer (Kong and Ma, 2003) outputs clusters of voxels, each delineating a thin, but continuous, volume of density representing a single β-sheet. Sheettracer then uses a multistep process to build pseudo-Cα traces in each identified sheet. Here we first illustrate the process with an example, a β-sheet of the apical domain of the molecular chaperonin GroEL, also known as the minichaperone (Wang et al., 2000) (PDB code 1fy9).

First, each cluster of voxels was processed by a local peak filter (Fig. 11.8a) to identify voxels that are most likely involved in forming the backbones of individual strands (Fig. 11.8b). The local peak-filtering algorithm enhances high local density values and thereby adjusts to variations in the magnitude of densities throughout the map, which permits effective selection of backbone voxels even in regions of relatively weak density. The next step was to condense the selected voxels using local first principal component axis projection. This enforces the voxel distribution along the longest axis, which is meant to coincide with a strand backbone (Fig. 11.8c). The outcome was a significantly narrowed distribution of voxels, which were then processed by a local linearity filter to pick backbone voxels with good local linearity (Fig. 11.8d). After that, k-segments clustering (Verbeek et al., 2002) was employed to group voxels into smaller subsets, each of which was to represent one part of a β-strand (Fig. 11.8e). Finally, all subsets belonging to the same strand were merged so that each cluster of voxels represents an independent β-strand, and a pseudo-Cα trace was then built for each strand.
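Why a local filter can pick backbone voxels in weak regions where a single global cutoff fails can be shown with a small sketch. The criterion used here (keep a voxel if its density exceeds the mean over its immediate neighborhood) is an assumption chosen for illustration; the text above does not specify the actual filter, and the `reach` parameter, function name, and toy densities are hypothetical.

```python
def local_peak_filter(density, reach=1):
    """Keep voxels whose density exceeds the mean of their local window.

    density -- dict mapping (x, y, z) to a density value
    reach   -- half-width of the cubic window, in voxels
    """
    offsets = range(-reach, reach + 1)
    kept = set()
    for (x, y, z), value in density.items():
        window = [
            density[(x + dx, y + dy, z + dz)]
            for dx in offsets
            for dy in offsets
            for dz in offsets
            if (x + dx, y + dy, z + dz) in density
        ]
        if value > sum(window) / len(window):
            kept.add((x, y, z))
    return kept


# Toy sheet in the z = 0 plane: two ridges ("strands") running along x at
# y = 1 and y = 5, the second with only half the density of the first.
profile = {0: 0.5, 1: 1.0, 2: 0.5, 3: 0.2, 4: 0.25, 5: 0.5, 6: 0.25}
density = {(x, y, 0): profile[y] for x in range(6) for y in range(7)}

kept = local_peak_filter(density)
kept_rows = {y for (_, y, _) in kept}

# A single global cutoff that isolates the strong ridge loses the weak one.
global_rows = {y for (_, y, _), value in density.items() if value > 0.6}
print(sorted(kept_rows), sorted(global_rows))
```

In this toy map the local criterion retains both ridges, while the global cutoff keeps only the stronger one.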

11.3.2 Discerning β-Strands and Building Pseudo-Cα Traces in 12 Proteins

Sheettracer was further tested on simulated density maps of 11 other structurally unrelated proteins. Together with the minichaperone, they form the same set of proteins used to test sheetminer in the previous section. Figure 11.9 shows the results, with the built pseudo-Cα traces superimposed on the crystal structures (only one example for each group is given). The results were statistically analyzed based on three separate measures: sensitivity, specificity, and rms deviation (Kong and Ma, 2003). The rms deviation was calculated as the average distance of each built pseudo-Cα atom from its closest sheet Cα atom in the superimposed crystal structure. The average sensitivity and specificity for the 12 proteins are 79.5 and 96.3%, respectively. The rms deviation is always smaller than 2.0 Å, with an average of 1.54 Å. Given the limited resolution, such statistical results of trace building seem reasonable. Note that, in Fig. 11.9, strand directions were assigned according to the known X-ray structures, since sheettracer was unable to specify them.

Fig. 11.9 Sheet-tracing results for all 12 proteins based on 6-Å simulated density maps. The pseudo-Cα traces depicted in darker color are superimposed on the X-ray structures of the proteins shown in lighter color. Only one protein from each group is shown. They are (a) carboxypeptidase A; (b) horse liver alcohol dehydrogenase; (c) phosducin. The arrows in the pseudo-Cα traces are artificially assigned based on the crystal structures.
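The deviation measure described above, the distance from each built pseudo-Cα atom to its closest sheet Cα atom, can be sketched as follows. Since the text calls the quantity an rms deviation but describes it as an average distance, the sketch returns both the plain mean and the root mean square of the nearest-atom distances; which of the two the original study reports is not settled here. The function name and the coordinates are hypothetical.

```python
import math


def nearest_atom_deviation(built, reference):
    """Return (mean, rms) of distances from each built atom to its closest
    reference atom. Both arguments are lists of (x, y, z) coordinates in Å."""
    if not built or not reference:
        raise ValueError("both coordinate lists must be non-empty")
    nearest = [min(math.dist(b, r) for r in reference) for b in built]
    mean = sum(nearest) / len(nearest)
    rms = math.sqrt(sum(d * d for d in nearest) / len(nearest))
    return mean, rms


# Hypothetical coordinates: three reference C-alpha atoms spaced 3.8 Å apart
# and three built pseudo-C-alpha atoms displaced by 1, 2, and 0 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
built = [(0.0, 1.0, 0.0), (3.8, 0.0, 2.0), (7.6, 0.0, 0.0)]
mean_dev, rms_dev = nearest_atom_deviation(built, reference)
print(mean_dev, rms_dev)
```

Note that the measure is one-directional: it penalizes built atoms that lie far from the sheet but does not detect sheet residues that have no built atom nearby, which is why it is reported alongside sensitivity and specificity.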
