Computational Methods for Protein Structure Prediction & Modeling V1 - Xu Xu and Liang

9 Modeling Protein Aggregate Assembly and Structure

Jun-tao Guo, Carol K. Hall, Ying Xu, and Ronald Wetzel

9.1 Introduction

One might say that “protein science” got its start in the domestic arts, built around the abilities of proteins to aggregate in response to environmental stresses such as heating (boiled eggs), heating and cooling (gelatin), and pH (cheese). Characterization of proteins in the late nineteenth century likewise focused on the ability of proteins to precipitate in response to certain salts and to aggregate in response to heating. Investigations by Chick and Martin (Chick and Martin, 1910) showed that the inactivating response of proteins to heat or solvent treatment is a two-step process involving separate denaturation and precipitation steps. Monitoring the coagulation and flocculation responses of proteins to heat and other stresses remained a major approach to understanding protein structure for decades, with solubility, or susceptibility to aggregation, serving as a kind of benchmark against which results of other methods, such as viscosity, chemical susceptibility, immune activity, crystallizability, and susceptibility to proteolysis, were compared (Mirsky and Pauling, 1936; Wu, 1931). Toward the middle of the last century, protein aggregation studies were largely left behind, as improved methods allowed elucidation of the primary sequence of proteins, reversible unfolding studies, and ultimately high-resolution structures. Curiously, the field of protein science, and in particular protein folding, is now gravitating back to a closer look at protein aggregation and protein aggregates. Unfortunately, the means developed during the second half of the twentieth century for studying native, globular proteins have not proved immediately amenable to the study of aggregate structures. Great progress is being made, however, to modify classical methods, including NMR and X-ray diffraction, as well as to develop newer techniques, that together should continue to expand our picture of aggregate structure (Kheterpal and Wetzel, 2006; Wetzel, 1999).

The current situation presents opportunities and challenges for computational methods, but with an ironic contrast to the history of the development of these methods for solvated globular proteins. Computational modeling methods for globular proteins were developed against a background in which many protein structures and properties were well known and reasonably well understood, and could therefore provide target structures and fundamental design concepts. In contrast, there are no complete high-resolution structures of protein aggregates like amyloid. Furthermore,
at least some aggregate structures appear to be hierarchical, and it is not well understood how much the higher degrees of order, such as bundling of protofilaments into amyloid fibrils, might contribute to the overall stability of the fibril. This introduces an awkward uncertainty when attempting to model a fibril substructure; it is not clear how much stability should be expected within the substructure, and how much might be provided only through its superassembly into a higher order of structure. It does not seem unreasonable to assume that the folding and stabilization of aggregates is guided by the same general relationships that guide the folding and stabilization of globular proteins (Williams et al., 2004). At the same time, it is possible that protein aggregates might be stabilized by some factors that are not so well understood, owing to our ignorance of both aggregate structure and assembly characteristics.

This chapter provides an overview of this challenging field, at a time when computational approaches are just beginning to be utilized to model aggregate structures and assembly. The first part of the chapter gives an overview of folding and aggregation and a survey of the process and products of misfolding and aggregation. The second segment provides a brief description of some of the major physical techniques currently being used to characterize various aspects of aggregate structure. The final part describes computational methods for approaching aggregate structure and aggregate assembly.

9.2 Folding and Misfolding

The recognition that simple proteins in ideal circumstances achieve their native folded states through thermodynamic control, influenced simply by their amino acid sequences and necessary cofactors (Anfinsen, 1973), introduced the conundrum of how it might be possible for all possible folded states to be explored during protein folding, to identify and secure the free energy minimum, in a realistic time frame (Levinthal, 1969). The solution to this paradox is now thought to be a combination of two factors: the lack of complete randomness in the starting, unfolded state (Fleming and Rose, 2005), and a restricted folding surface, often characterized as a rough-textured funnel, that features many (but far from infinite) parallel and interconnected pathways leading from the large number of unfolded conformations to the native state (Dill and Chan, 1997). There are exceptions to the general rule of thermodynamic control, however. Fifteen years ago two proteins were described that fold to a kinetically controlled local free energy minimum, producing a relatively stable, isolatable folded state that can be induced to reengage the folding pathway to produce a more stably folded monomeric structure (Creighton, 1992). These are alpha-lytic protease (Baker et al., 1992) and the serpin PAI-1 (Levin and Santell, 1987).

Besides these rarely reported alternative folded states, some globular proteins exhibit additional alternative states: misfolded aggregates. Although the ability of proteins to convert, under stress, to irreversibly aggregated, insoluble structures has long been appreciated, protein aggregates have never seriously been included in
debates over whether protein folding is controlled kinetically or thermodynamically. This may be because aggregates were considered an unnatural aberration, or because it has never been clear whether aggregate formation itself is under kinetic or thermodynamic control.

Over the past two to three decades, the biological importance of protein aggregates as nontrivial, alternative folded states has been established. Misfolding/aggregation has been revealed to be a major side reaction during normal protein folding in the cell (Turner and Varshavsky, 2000), which is consistent with the huge investment made by molecular and cellular evolution in developing pathways to manage aggregation, such as molecular chaperones (Hartl and Hayer-Hartl, 2002) and the ubiquitin proteasome system (Petrucelli and Dawson, 2004; Vigouroux et al., 2004). Aggregates are associated with a wide variety of human diseases, including Alzheimer’s disease, Creutzfeldt-Jakob disease, Huntington disease, Type II diabetes, and Parkinson disease (Martin, 1999; Merlini and Bellotti, 2003). These manifestations of aggregation can be added to the more biotechnological challenges of protein stability (often synonymous with avoidance of aggregation), inclusion body formation during recombinant expression, and aggregate formation during refolding (Wetzel, 1994; Wetzel and Goeddel, 1983). Given the ubiquitous nature of protein aggregation, the importance of learning more about aggregate structure is clear.

The question of kinetic versus thermodynamic control of the formation of amyloid and other aggregates is only in the early stages of being addressed. For simple peptides that have no highly stable solution conformation, it is sometimes possible to establish an equilibrium between bulk phase monomer and amyloid, and to actually determine an equilibrium constant and free energy (O’Nuallain et al., 2005; Williams et al., 2004, 2006). For stably folded, globular proteins, which only form amyloid under conditions that disfavor the native state (Colon and Kelly, 1992; Hurle et al., 1994; McCutchen et al., 1993), it may be difficult to identify conditions where both native and fibril species can be populated simultaneously so that relative stabilities might be directly assessed. Kinetic partitioning appears to play a major role in aggregation during folding reactions of some proteins, where rapid transformation to an aggregation-committed intermediate can effectively take molecules out of the productive folding pathway (Finke et al., 2000; Goldberg et al., 1991; Haase-Pettingell and King, 1988). The above considerations are important for two major reasons. First, without having some idea of the relative thermodynamic stability of amyloid fibrils, it is difficult to validate computational models. Second, an awareness of both the thermodynamics and kinetics of aggregation will be important when evaluating computational simulations of the aggregation process.
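As an illustration of how an equilibrium constant and free energy might be extracted from a monomer–fibril equilibrium, the following minimal sketch assumes a simple reversible elongation model in which the association constant equals the reciprocal of the monomer concentration remaining in solution at equilibrium (the critical concentration, here called Cr). The model, the function name, and the input value are assumptions made for illustration and are not taken from the cited studies.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)

def elongation_free_energy(critical_conc_molar, temperature_k=310.15):
    """Standard free energy (kcal/mol) of adding one monomer to a fibril end.

    Assumption: simple reversible elongation, so K = 1 / Cr, where Cr is the
    molar monomer concentration left in solution at equilibrium
    (1 M standard state).
    """
    if critical_conc_molar <= 0:
        raise ValueError("critical concentration must be positive")
    k_assoc = 1.0 / critical_conc_molar
    return -R_KCAL * temperature_k * math.log(k_assoc)

# Hypothetical input: 1 micromolar monomer remaining at equilibrium, 37 C.
dg = elongation_free_energy(1e-6)
print(f"dG of elongation = {dg:.2f} kcal/mol")
```

Under this model, comparing the values obtained for a wild-type and a mutant peptide would give an estimate of how much a given side chain contributes to fibril stability.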

9.2.1 Types of Aggregates

Inclusion body formation in bacteria: Bacterial inclusion bodies (IBs) rich in protein were first observed when bacteria were fed amino acid analogues that were incorporated into protein (Prouty et al., 1975) and were later observed in recombinant expression of proteins, where they require elaborate methodologies for either
suppressing their formation or facilitating recovery of native protein (De Bernardez Clark et al., 1999; Marston and Hartley, 1990). Except for a general enrichment in β-sheet, we have very little knowledge of the intimate details of protein structure within IBs.

In spite of their different appearances under the EM, IBs and amyloid fibrils share quite a few similarities in their formation and properties (Wetzel, 1992), including a shared role of native state destabilization in their formation (Chan et al., 1996) and a shared susceptibility toward mutations (Wetzel, 1994). Monomeric nucleation, only recently described in the aggregation kinetics of polyglutamine amyloid formation (Chen et al., 2002), was very recently also observed in protein aggregation in bacteria (Ignatova and Gierasch, 2005). IBs and amyloid, along with other protein aggregates, including thermally induced aggregates and aggregates formed during folding in vitro, appear to consist largely of β-sheet structure (Oberg et al., 1994; Sunde and Blake, 1997). The ultrastructure of neuronal IBs formed in Huntington’s disease tissue shows that these large aggregates are collections of fibrillar structures (DiFiglia et al., 1997), presumably polyglutamine amyloid fibrils (Scherzinger et al., 1997).

Amyloid and other disease-related aggregation: Over the past 10–20 years, amyloid fibrils and related pathological protein aggregates have become increasingly important subjects of research. This is primarily because of the evidence that these aggregates are associated with human disease (Martin, 1999; Merlini and Bellotti, 2003). Relative to other aggregates, we know a considerable amount about amyloid structure. Some of these details will be discussed below in the review of biophysical and biochemical techniques for analyzing fibril structure. Nonamyloid aggregates, such as spherical oligomers and protofibrils (Caughey and Lansbury, 2003), may be even more important to human disease than are mature fibrils.

Amyloid fibrils are normally found outside the cell, but there are clearly aberrant aggregation reactions that occur inside the cell as well. This is not surprising, given the complex biochemical systems that have evolved to deal, at least in part, with protein misfolding and aggregation, including molecular chaperones (Hartl and Hayer-Hartl, 2002), the ubiquitin proteasome system (Petrucelli and Dawson, 2004; Vigouroux et al., 2004), aggresomes (Kopito, 2000), and autophagy (Glickman, 2000). We know much less about intracellular aggregates than we do about extracellular amyloid. Aggregation in the presence of chaperones, even when they cannot completely block aggregation, can lead to aggregates with substantially altered properties (Muchowski et al., 2000). The important cross-talk between fundamental biophysics and evolved biochemical pathways in the intracellular aggregation of proteins is poorly understood.

Aggregation during refolding: The denaturing conditions required for recovery of material from IBs yield proteins that must be refolded. Some proteins can be easily refolded from the denatured state in vitro; these include classical models for protein folding studies like ribonuclease and staph nuclease. More typically, proteins refold
inefficiently, leading to formation of aggregates that limit the yields of the folding reaction. An important area of biotechnology is the development of methods for improving refolding yields (De Bernardez Clark et al., 1999).

Aggregation of stressed native proteins: Aggregation in vitro induced by exposure of the native protein to stresses like heat or denaturing solvents is also important in biotechnological applications such as enzyme reactors. This is also the classic protein aggregation first studied as one of the few established means of characterizing protein structure (Chick and Martin, 1910). Aggregation often takes place in the thermal unfolding transition zone. If aggregates form at an elevated temperature, triggered by unfolding of the native state, they tend to remain stable and insoluble when the solution is returned to lower temperatures. This essentially irreversible removal of native protein from solution, either at an elevated temperature or during the return to the normal temperature range (which many times can occur without detectible chemical changes), is the basis for the thermal inactivation of many proteins. The ability of a particular protein to aggregate under defined conditions depends broadly on two features: (1) the stability of the native fold and (2) the aggregation tendency of the unfolded or misfolded states, including folding intermediates (Wetzel, 1994).

The above type of aggregate is presumably related to another major aggregation problem of the pharmaceutical industry, the aggregates that form in protein solutions during processing or during storage in the vial. Aggregates in injected proteins are a serious problem for several reasons, including (1) diminution of active molecules, (2) inflammation at the injection site, and (3) stimulation of an antibody response (Cleland et al., 1993; Hermeling et al., 2004; Shire et al., 2004). These aggregates can form in more subtle ways, not only during long-term storage, but also under the influence of shear forces (for example, during injection) or due to an inadequate lyophilization process. The study of how these aggregates form, their structures, their biological effects, and most importantly how to avoid them, is of critical importance to the pharmaceutical industry.

9.2.2 Structural Hierarchy of Amyloid and Conformational Isomerism

An added complication to thinking about amyloid structure is the number of levels of structure involved. Like the coiled coils of collagenlike molecules, amyloid fibrils exhibit a quaternary structure that is both a distinguishing feature of amyloid and at the same time a source of great variation in morphological forms (Goldsbury et al., 2000; Sunde and Blake, 1997). The fundamental assembly unit of amyloid is the protofilament, a structure normally not seen in isolation but rather as a component of higher structure in fibrils. The classical amyloid fibril appears in the EM as a twisted rope; the protofilaments are the major strands making up the rope. There can be two to six protofilaments bundled together to make an amyloid fibril, with the number tending to be characteristic of a particular protein fibril. Protofilaments
appear to be capable of assembling into other types of superstructures as well, such as ribbons containing multiple protofilaments in a flat array, or twisted bundles composed of multiple-protofilament ribbons (Goldsbury et al., 2000). These alternatively assembled states, often viewed in the same preparation of fibrils (Goldsbury et al., 2000), may arise from, or at least be associated with, different conformations of the component protein as it is packed into the fibril structure (Petkova et al., 2005). If, as it appears, all of these forms are based on the protofilament, it would appear that the simplest target of structure determination and prediction should be the protofilament.

It remains to be seen whether differences in quaternary structure (that is, protofilament packing) are the basis for self-propagating conformational variants of amyloid fibrils, or result from underlying differences in secondary and tertiary structure (such as formation of the amyloid filament core). Whatever their source in the assembly pathways and/or structural nuances of amyloid, different superassembled states of protofilaments can be very important biologically. Conformational variation among amyloid fibrils is now considered to be the underlying factor controlling strain phenomena and species barriers in mammalian (Horiuchi et al., 2000) and yeast (Tanaka et al., 2004) prion biology.

The recent recognition that a single amyloidogenic polypeptide can make multiple amyloid conformational states that can exhibit tangible structural differences (Petkova et al., 2005; Tanaka et al., 2004) introduces a further complication into the goal of approaching amyloid structure through a combined and iterative program of experiment and computational modeling. A necessary prerequisite to successful improvement of modeling approaches is the ability to validate methods through comparison of modeled structures with experimental data. Yet different approaches to characterizing some amyloid fibril structures have led to conflicting results. For example, solid-state NMR suggests that the C-terminal amino acids of Aβ(1–40) are involved in H-bonded β-sheet (Petkova et al., 2005) in the fibril, while scanning proline mutagenesis (Williams et al., 2004) and hydrogen exchange (Whittemore et al., 2005; Kheterpal et al., 2006) suggest that the C-terminal residues are disordered. It now seems possible that both results are correct: the solid-state NMR result was obtained on fibrils grown under conditions of agitation, while the proline and hydrogen exchange results were obtained on fibrils grown without stirring; these two methods of preparation are linked to fibrils with substantially differing structures (Petkova et al., 2005).

While the existence of amyloid isomerism is, in the short term, an unwelcome complication, in the long term it is an advantage for computational studies since it provides an opportunity to predict and account for relatively subtle variations among multiple fibril conformations.

9.3 Experimental Approaches to Aggregate Structure

X-ray diffraction: Determining aggregate structure is challenging mainly because intact fibrils are difficult to crystallize, so that detailed crystal
diffraction patterns cannot be obtained. Particularly well-ordered aggregates can exhibit a few key reflections in powder diffraction experiments consistent with dominant repeating structure. In particular, amyloid fibrils aligned in the X-ray beam typically generate what is called the cross-β pattern, in which the reflections associated with strand–strand repeats in the hydrogen-bonding direction and in the β-sheet packing direction are at 90° to each other. The pattern indicates that the β-extended chains are oriented in the fibril (and protofilament) such that they are perpendicular to the fibril axis, and held together by hydrogen bonds that are oriented parallel to the fibril axis (Sunde and Blake, 1997). In rare cases, high-resolution data on crystalline forms of short, amyloidogenic peptides have provided a number of potential insights into amyloid structure, such as extremely low water content (Balbirnie et al., 2001; Nelson et al., 2005) and intimate side chain packing interactions (Makin et al., 2005; Nelson et al., 2005). While these and other (Elam et al., 2003; Schiffer et al., 1985) examples of infinite β-sheet formation in crystal structures may offer important insights into amyloid fibril structure, it must be kept in mind that protein crystals and amyloid fibrils, while related, are different, for example in the hierarchical structure of fibrils discussed above. The question of whether crystals of amyloidogenic peptides accurately reflect fine details of amyloid structure has not been addressed experimentally.
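To make the geometry of the two characteristic reflections concrete, the sketch below applies Bragg's law to convert a repeat spacing into a scattering angle. The spacings used (about 4.7 Å for the strand–strand repeat and about 10 Å for the sheet packing repeat) are typical literature values rather than numbers given in this chapter, and the Cu Kα wavelength and the function name are likewise assumptions chosen for illustration.

```python
import math

CU_K_ALPHA = 1.5418  # Angstrom; assumed laboratory X-ray wavelength

def bragg_two_theta(d_spacing, wavelength=CU_K_ALPHA, order=1):
    """Scattering angle 2*theta (degrees) from Bragg's law, n*lambda = 2*d*sin(theta)."""
    s = order * wavelength / (2.0 * d_spacing)
    if not 0.0 < s <= 1.0:
        raise ValueError("no reflection possible for this spacing and wavelength")
    return 2.0 * math.degrees(math.asin(s))

# Assumed typical cross-beta spacings (not taken from the chapter).
reflections = [
    ("strand-strand repeat, along fibril axis", 4.7),
    ("sheet packing repeat, across fibril axis", 10.0),
]
for label, d in reflections:
    print(f"{label}: d = {d:.1f} A, 2theta = {bragg_two_theta(d):.2f} deg")
```

Because the smaller spacing scatters to the larger angle, the strand–strand reflection appears farther from the beam center than the sheet packing reflection in an aligned fibril pattern.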

Electron microscopy (EM): Our impressions of the hierarchical, filamentous structure of amyloid, and more globular structures of some other aggregates, come primarily from EM studies. Scanning transmission EM can be used to determine the mass per length of fibrils and other aggregates (Goldsbury et al., 2000). This analysis confirms objectively and quantitatively what is already apparent by visual inspection, that most fibril preparations are heterogeneous, containing a number of distinct types of fibrils that vary both by the amount of twist and by the number of protofilaments per fibril (Goldsbury et al., 2000). These morphological variations may, at least in some cases, correspond to different self-replicating structural variants that can exhibit different stabilities (Tanaka et al., 2004) and molecular substructures (Petkova et al., 2005). In particularly favorable cases, especially in cryo-EM, image averaging has provided significantly enhanced detail, such as the number of protofilaments and the overall cross-sectional density within the fibril (Jimenez et al., 1999, 2002; Wille et al., 2002).
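A mass-per-length measurement can be turned into a rough structural constraint with simple arithmetic. The sketch below estimates how many peptide molecules occupy each cross-β layer of a fibril, assuming one layer every 0.47 nm along the fibril axis. The spacing, the monomer mass, the measured value, and the function name are all hypothetical inputs chosen for illustration, not data from the cited work.

```python
BETA_LAYER_SPACING_NM = 0.47  # assumed cross-beta repeat along the fibril axis

def molecules_per_layer(mpl_kda_per_nm, monomer_mass_kda,
                        spacing_nm=BETA_LAYER_SPACING_NM):
    """Estimate the number of molecules per cross-beta layer from mass per length."""
    if monomer_mass_kda <= 0 or spacing_nm <= 0:
        raise ValueError("monomer mass and spacing must be positive")
    return mpl_kda_per_nm * spacing_nm / monomer_mass_kda

# Hypothetical measurement: 18.4 kDa/nm for a peptide of about 4.33 kDa.
n = molecules_per_layer(18.4, 4.33)
print(f"about {n:.1f} molecules per layer")
```

In a heterogeneous preparation, distinct fibril types would be expected to give distinct values, which is one way such data can help distinguish morphologies that differ in protofilament number.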

Atomic force microscopy (AFM): Scanning probe techniques can provide striking images that offer similar resolution to EM but with different strengths and sometimes superior results (Stine et al., 1996). Staining is not necessary in AFM, and precise height information can be obtained. The protofilament-based structural hierarchy of amyloid fibrils can be studied using AFM (Kanno et al., 2005). Nonfibrillar, potentially biologically relevant oligomeric states can also be visualized (Harper et al., 1997). In principle, scanning probe imaging also offers a wealth of probe types that might provide information on the chemical nature of the imaged surfaces. Force measurements can elucidate segmental stability (Kellermayer et al., 2005).
While technically challenging, another advantage of AFM over EM is the ability to monitor temporal changes in structure in situ (Goldsbury et al., 1999).

Circular dichroism (CD): Like most other optical spectroscopies, CD is generally considered to be ineffective in analysis of large, light-scattering aggregates such as most fibril preparations. In rare cases the technique works remarkably well, not only demonstrating the high β-sheet content expected of an amyloid preparation, but giving fibril formation kinetics that track with other measures of amyloid assembly (Chen et al., 2002). Conformational variants can also be distinguished by CD in favorable cases (Yamaguchi et al., 2005).

Vibrational spectroscopy: Fourier transform infrared spectroscopy (FTIR) is one of the few techniques available for analysis of the protein solid state, and has been used to advantage in amyloid studies to demonstrate the existence of β-sheet, other secondary structural elements, and the parallel/antiparallel nature of the sheet. The method is useful as a qualitative tool to follow changes in secondary structure as an assembly reaction proceeds, and, with appropriate cautions, can be used to extract secondary structure content. There are a number of technical problems that must be sorted out for the effective use of this method in analyzing protein aggregates (Nilsson, 2004).

Solid-state NMR: Great strides have been made in recent years in the collection and analysis of NMR data on protein aggregates in the solid state. Chemical shift information can be interpreted in terms of the φ, ψ angles of the alpha carbons, and interatomic distances up to about 6 Å can be deduced from dipole–dipole couplings (Tycko, 2000). In favorable cases, the collection of a large number of distance restraints can allow piecing together of a fairly high resolution structure within a single extended chain element (Jaroniec et al., 2004). Other key structural information obtained using solid-state NMR includes the parallel/antiparallel nature of adjacent strands in protofilaments (Benzinger et al., 1998), the identification of long-range, nonbonded interactions (Petkova et al., 2002), and the demonstration of altered intrastrand folding in self-propagating fibrils exhibiting different morphologies in EMs (Petkova et al., 2005). Since solid-state NMR data can be collected on bona fide amyloid fibrils, when high-resolution data can be obtained (Jaroniec et al., 2004) this method appears to offer the most complete insights available into amyloid fibril structure.
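The distance limit of roughly 6 Å follows from the steep inverse-cube dependence of the dipole–dipole coupling on internuclear distance. The sketch below evaluates the standard expression for the coupling constant of a 13C–13C pair; the choice of nuclei, the rounded physical constants, and the function name are assumptions made for illustration.

```python
import math

MU0_OVER_4PI = 1.0e-7      # T*m/A (rounded)
HBAR = 1.054571817e-34     # J*s
GAMMA_13C = 67.2828e6      # rad s^-1 T^-1, 13C gyromagnetic ratio

def dipolar_coupling_hz(r_angstrom, gamma1=GAMMA_13C, gamma2=GAMMA_13C):
    """Magnitude of the dipole-dipole coupling constant in Hz.

    d = (mu0 / 4 pi) * gamma1 * gamma2 * hbar / (2 pi r^3)
    """
    if r_angstrom <= 0:
        raise ValueError("distance must be positive")
    r_m = r_angstrom * 1.0e-10
    return MU0_OVER_4PI * gamma1 * gamma2 * HBAR / (2.0 * math.pi * r_m ** 3)

for r in (1.5, 3.0, 6.0):
    print(f"r = {r:.1f} A -> d = {dipolar_coupling_hz(r):.0f} Hz")
```

Doubling the distance reduces the coupling eightfold, so a pair separated by 6 Å is coupled by only a few tens of hertz, which suggests why longer distances become hard to measure reliably.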

Electron spin resonance: Judiciously chosen probes coupled to the thiol side chain of mutationally introduced Cys residues have been used to sense local environment in globular proteins (Hubbell et al., 2000), and this technique is now being applied to amyloid fibrils. In principle, one can observe distances between spin labels, and the mobility and solvent accessibility of the label at different positions in a protein structure. The method has been used to map out regions of parallelism and side-chain mobility in amyloid fibrils of Aβ(1–40) (Torok et al., 2002) and a number of other proteins.

Hydrogen–deuterium exchange: Hydrogen exchange (HX) has been used with great success to map structural features of globular proteins, in particular backbone secondary structure and sterically inaccessible sites. For globular proteins where data acquisition by real-time NMR and mass spectrometry (MS) is possible, methods can be adapted to examine the structure of the native state as well as dynamic aspects of protein structure and folding (Ferraro et al., 2004; Li and Woodward, 1999). Analysis of exchange into aggregates is significantly complicated by the requirement to break aggregate structure and restore the soluble, monomeric state to the precursor protein before it can be analyzed. The challenge in studies of aggregate structure has been to minimize and adjust for the loss of exchange information occurring during the sample processing. Loss of information due to HX during sample processing can be minimized in an MS approach using an in-line T-tube that makes sample dissolution and presentation to the MS probe one seamless operation (Kheterpal et al., 2000). Effective schemes for correcting data to account for exchange during sample preparation have been described (Kheterpal et al., 2003b). The HX-MS technique is rapid and sensitive enough to collect exchange protection data on the oligomeric/protofibrillar intermediates of Aβ(1–40) amyloid assembly; interestingly, this work showed that there is a class of highly protected H-bonds even in metastable protofibrils (Kheterpal et al., 2003a). Proteolytic fragmentation coincident with fibril dissolution can be used to obtain protection information on segments of the peptide in the fibril (Wang et al., 2003; Kheterpal et al., 2006). By using MS to analyze a large collection of overlapping proteolytic fragments from fibrils exposed to D₂O, it is possible to assign protection factors at the single residue level (Del Mar et al., 2005).
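The idea of adjusting for exchange that occurs during sample processing can be illustrated with a widely used two-reference normalization, in which the observed mass is scaled between an undeuterated control and a fully deuterated control processed in the same way. This is a generic sketch and not necessarily the correction scheme of the work cited above; the function names and all numerical values are hypothetical.

```python
def corrected_deuterium(m_obs, m_undeuterated, m_fully_deuterated, n_exchangeable):
    """Estimate deuterium content, normalized against two control samples.

    Assumption: both controls experience the same artifactual exchange
    during processing as the sample itself.
    """
    span = m_fully_deuterated - m_undeuterated
    if span <= 0:
        raise ValueError("fully deuterated mass must exceed undeuterated mass")
    return (m_obs - m_undeuterated) / span * n_exchangeable

def protected_hydrogens(m_obs, m_undeuterated, m_fully_deuterated, n_exchangeable):
    """Number of exchangeable hydrogens that remained protonated (protected)."""
    return n_exchangeable - corrected_deuterium(
        m_obs, m_undeuterated, m_fully_deuterated, n_exchangeable
    )

# Hypothetical peptide with 39 exchangeable backbone amide hydrogens.
n_protected = protected_hydrogens(4347.0, 4329.8, 4364.0, 39)
print(f"about {n_protected:.1f} protected amide hydrogens")
```

Applying the same calculation to each member of a set of overlapping proteolytic fragments would give segment-level protection values of the kind described above.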

When signal dispersion allows, and individual hydrogens have been assigned, NMR can be used to follow the fate of each hydrogen as signal loss follows deuterium exchange. The trick to obtaining exchange information by NMR is to identify an aprotic NMR solvent that dissolves the protein, minimizes artifactual exchange during data collection, and allows good dispersion of the amide proton resonances. An effective solvent has been dichloroacetic acid/dimethylsulfoxide mixtures, and this has allowed the NMR method to map exchange protection in amyloid fibrils composed of peptides (Kuwata et al., 2003) and even the small protein β2-microglobulin (Hoshino et al., 2002). Identical Aβ(1–40) fibrils have been exposed to HD exchange monitored by both MS (Kheterpal et al., 2000, 2003b, 2006) and NMR (Whittemore et al., 2005), and the results from the two methods are in very good agreement.

Limited proteolysis: Limited proteolysis has been used to understand globular protein structure, motility, and folding (Fontana et al., 1997). The major advantage of the technique is the ability to provide structural information on systems that are not readily amenable to other techniques. Major disadvantages are low resolution and a dependence for success on factors, such as proteolysis kinetics, often out of the control of the experimenter. The basic principle is that solvent-accessible regions of the polypeptide backbone are revealed by their ability to be cleaved by an endoproteinase such as trypsin. Potential limitations of the use of the method are: (1) proteolysis