
Computational Methods for Protein Structure Prediction & Modeling V1 - Xu Xu and Liang


7. Local Structure Prediction of Proteins


where any discernible sequence similarity has been lost as a result of mutation and insertion/deletion events. A classical example of this is chymotrypsin, where fusion of two duplicated genes, each coding for a separate β-barrel domain, has resulted in a two-domain enzyme. The active site consists of amino acids of both domains and shows a greatly enhanced activity as compared to a suspected ancestral active center within an individual ancestral barrel (Heringa, 1994). The amino acid sequences of the two barrels have diverged so much that the duplication event had to be inferred from the structural similarity (McLachlan, 1979).

7.5.3 Protein Repeats Detection

Considerable sequence divergence as well as the short lengths of many sequence repeats imply that repeats detection can be a particularly arduous task. The problem of recognizing internal sequence repeats in proteins has been tackled by many researchers. One of the pioneers in the automatic detection of repeats was McLachlan, who devised the first methods over three decades ago (McLachlan, 1972). These methods relied on Fourier analysis (McLachlan and Stewart, 1976; McLachlan, 1977) and this technique remained popular (Kolaskar and Kulkarni-Kale, 1992; Taylor et al., 2002). Although Fourier transforms are designed to detect periodic behavior, their application to protein sequence signals is compromised by the fact that many repeats have diverged considerably through mutations and insertions/deletions, and can be interrupted by irregular sequence stretches. Moreover, proteins can contain multiple repeat types, all with different base periodicities, which weakens the periodic signal for any one type. Finally, Fourier techniques require a relatively large number of repetitions, whereas many proteins contain only a few repeats.
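The Fourier idea can be illustrated with a minimal sketch. It assumes the sequence is first mapped to a numeric signal (here a hypothetical binary hydrophobicity encoding, chosen only for illustration) and then scans integer periods for the strongest Fourier component. The function names and the encoding are assumptions and do not reproduce the published methods.

```python
import cmath

# Hypothetical encoding (an assumption): 1.0 for hydrophobic residues, 0.0 otherwise.
HYDROPHOBIC = set("AVILMFWC")

def encode(seq):
    """Map a protein sequence to a numeric signal."""
    return [1.0 if aa in HYDROPHOBIC else 0.0 for aa in seq]

def power_at_period(signal, period):
    """Power of the mean-centered signal at the given period (direct DFT term)."""
    mean = sum(signal) / len(signal)
    total = sum((x - mean) * cmath.exp(-2j * cmath.pi * k / period)
                for k, x in enumerate(signal))
    return abs(total) ** 2 / len(signal)

def strongest_period(seq, min_period=2, max_period=20):
    """Return the integer period with the largest spectral power."""
    signal = encode(seq)
    return max(range(min_period, max_period + 1),
               key=lambda p: power_at_period(signal, p))
```

For an idealized heptad-like sequence such as `"LSGTEKD" * 6`, the scan returns a period of 7. Insertions between repeat copies shift the phase and flatten this peak, which is the weakness described above.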

Another approach to delineating repeats in protein sequences explores DP. The first attempt was made by McLachlan (1983), who applied the DP technique over fixed window lengths to myosin rod repeats. Boswell and McLachlan (1984) elaborated the method by incorporating dampening factors and allowing gaps. Argos (1987) also adopted the window technique but exploited physicochemical properties of amino acids in addition to the PAM250 residue exchange matrix (Dayhoff et al., 1983), and used the technique to detect repeats in, for example, frog transcription factor IIIA (TFIIIA), human hemopexin, and chick tropoelastin. Huang et al. (1990) used local alignments (Smith and Waterman, 1981) to find the repeats in rabbit globin genes. Their method SIM is a memory-optimized implementation of the approach introduced by Waterman and Eggert (1987), which calculates a list of top-scoring nonintersecting local alignments, meaning that no two alignments share a matched amino acid pair.
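As a rough illustration of the local-alignment route to repeats, the sketch below aligns a sequence against itself with a Smith-Waterman style recurrence, restricted to the upper triangle so that the trivial self-match on the main diagonal is excluded. It reports only the single best alignment, whereas the Waterman and Eggert approach continues to further nonintersecting alignments. The identity-based scores and the linear gap penalty are assumptions standing in for a real exchange matrix such as PAM250.

```python
def best_self_local_alignment(seq, match=2, mismatch=-1, gap=-2):
    """Return (score, end_i, end_j) of the best local alignment of seq with itself.

    Positions are 1-based; only cells with j > i are filled, so the
    sequence is never aligned to itself along the main diagonal.
    """
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,   # (mis)match
                          H[i - 1][j] + gap,     # gap in the second copy
                          H[i][j - 1] + gap)     # gap in the first copy
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best
```

For the toy sequence `"ACDEFGWWWACDEFG"`, the best-scoring cell ends at positions 6 and 15, that is, at the ends of the two `ACDEFG` copies.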

Following these initial developments, a number of methods of delineating internal repeats in protein sequences were reported. These include the early and popular REPRO method (Heringa and Argos, 1993), the fast but inexact method of Pellegrini et al. (1999), the RADAR method (Heger and Holm, 2000), and the TRUST method (Szklarczyk and Heringa, 2004). Taylor et al. (2002) devised a method based on Fourier analysis to automatically annotate repeating local 3D fragments in protein tertiary structures. These methods are discussed in more detail in Section 7.8.4. Links to various web interfaces to these methods are provided in Section 7.9.3.

V.A. Simossis and J. Heringa

7.6 Applications to Multiple Sequence Alignment

The prediction of protein local structure elements has increasingly infiltrated the field of sequence alignment in recent years. In this section we will discuss how structural features such as secondary structure, supersecondary structure, and repeats can enrich the information used in sequence-based alignment methods toward a more accurate detection of similarity.

7.6.1 Structure Is More Conserved Than Sequence

Most alignment methods, irrespective of whether they align two or more sequences, rely entirely on the residue information provided by the sequences they align. As would be expected, the detection of similarities between sequences becomes harder as the level of mutational change that has occurred through evolution increases (Rost et al., 1994). It has been known for many years that alignment quality suffers when the sequence identity of two sequences drops below 30%, the so-called “twilight zone”.

Unlike primary structure, the higher structural levels are more conserved through evolution (Chothia and Lesk, 1986). The reason for this is that function is mostly connected to the structure of a protein rather than its residue composition. Therefore, although mutations may alter individual residues in a protein, the structure remains relatively unchanged so that functionality is not lost or inhibited. As a result, structure is a better candidate for detection of homology in distant relatives. Consequently, the use of structural information has been integrated into many alignment methods (Heringa, 1999, 2000, 2002; Ginalski et al., 2003; Chung and Yona, 2004; Ginalski et al., 2004; Simossis and Heringa, 2004b; von Ohsen et al., 2004; Soding, 2005). However, the number of known structures compared to the number of protein sequences remains limiting because the rate at which sequences are added to databases is much faster than that at which protein structures are solved. As a result, in the absence of a crystal structure, predictions of protein local structures can be used to fill the gap.

7.6.2 Integrating Predicted Local Structure Information into an Alignment

The integration of known or predicted secondary structure information into an alignment algorithm can be done in several ways. Early on, the approach simply involved increasing the gap penalties in helical or strand regions in order to bias the algorithm to insert gaps in between SSE regions (Sander and Schneider, 1991). In more recent approaches, in addition to the generalized exchange matrices, secondary-structure-specific exchange matrices [e.g., the Lüthy series (Luthy et al., 1994)] have been used for scoring those sequence or profile positions that belong to the same secondary structure class (Heringa, 1999, 2000, 2002). Other researchers have combined the two types of matrices in different schemes (Yu et al., 1998; Hedman et al., 2002; Ginalski et al., 2003; Chung and Yona, 2004; Teodorescu et al., 2004).
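The two strategies just described can be sketched in a few lines. This is only an illustration under stated assumptions: the penalty values, the scaling factor, the function names, and the dictionary-based matrix layout are all hypothetical and are not taken from any of the cited methods.

```python
def gap_open_penalty(ss_state, base=10.0, sse_factor=2.0):
    """Raise the gap-opening penalty inside helices (H) and strands (E).

    base and sse_factor are hypothetical values chosen for illustration.
    """
    return base * sse_factor if ss_state in ("H", "E") else base

def pair_score(aa1, aa2, ss1, ss2, general, ss_specific):
    """Score a residue pair, preferring a secondary-structure-specific matrix.

    general: dict mapping (aa1, aa2) -> score
    ss_specific: dict mapping an SS class ('H', 'E', 'C') -> such a matrix
    The specific matrix is used only when both positions share the class.
    """
    if ss1 == ss2 and ss1 in ss_specific:
        matrix = ss_specific[ss1]
    else:
        matrix = general
    return matrix[(aa1, aa2)]
```

A DP alignment routine would call `pair_score` when filling each cell and `gap_open_penalty` when opening a gap, so that gaps are pushed toward loop regions between SSEs.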

In terms of other predicted local structure elements such as repeats and supersecondary structure, no systematic analysis has been done on how their incorporation might aid alignment quality. However, it is conceivable that the identification of repeats prior to alignment would greatly aid the correct positioning of repeated regions and avoid incorrect shifts of the alignment. Similarly and probably more importantly, the assignment of basic supersecondary structures to the known or predicted secondary structure would also greatly improve the alignment of sequences. At the least, it would help discern between secondary structure patterns that, although similar in two dimensions, do not actually fold the same way in the 3D structure and therefore may not align as tightly as would be assumed by current secondary structure-guided methods.

7.6.3 Local Structure Prediction and Alignment Interdependence

The majority of the current secondary structure-integrating strategies are limited to pairwise local alignment strategies implemented for homology detection (Ginalski et al., 2003, 2004; Chung and Yona, 2004; von Ohsen et al., 2004; Soding, 2005). Conversely, the use of predicted secondary structure to guide MSA has not been exhaustively investigated. Early on, Heringa used PREDATOR (Frishman and Argos, 1996, 1997) predictions to guide the alignments of the DP method PRALINE (Heringa, 1999) and found improvements in alignment quality when aligning 13 flavodoxins with cheY, a distant signal transduction protein that has very low sequence similarity but shares the same fold as the flavodoxins (Heringa, 2000). In this case, the secondary structure prediction program used did not depend on the MSA quality.

Later, Heringa (2002) also extended the MSA–secondary structure prediction interrelationship to an iterative scheme using SSPRED (Mehta et al., 1995), a more advanced MSA-dependent method of the time. In this scenario, an initial MSA is used for the prediction of the secondary structures of the sequences to be aligned and then these predictions are reintroduced to produce a new secondary structure-guided alignment. The new, more correct alignment is then used in the next iteration step to derive new, more accurate secondary structure predictions and so on. Simossis and Heringa have recently re-designed PRALINE (Heringa, 1999) to use the MSA-dependent secondary structure prediction methods PHD (Rost and Sander, 1993), PROFsec (Rost, personal communication), JNET (Cuff and Barton, 2000), and SSPro2 (Pollastri et al., 2002) in this iterative approach. Preliminary results show that for the alignment of the 13 flavodoxin sequences and cheY, the initial PHD prediction for the most difficult sequence (cheY) is vastly improved by this iterative scheme.
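The control flow of such an iterative scheme might be sketched as follows. The `align` and `predict_ss` callables are placeholders for an alignment program and an MSA-dependent predictor, and the stopping rule (the alignment no longer changes, or a maximum number of rounds is reached) is an assumption; the chapter does not state the criterion actually used.

```python
def iterate_alignment(sequences, align, predict_ss, max_iter=5):
    """Alternate secondary structure prediction and SS-guided alignment.

    align(sequences, ss) -> MSA; ss is None for the unguided first pass.
    predict_ss(msa)      -> one predicted SS string per sequence.
    Both callables are hypothetical stand-ins supplied by the caller.
    """
    msa = align(sequences, None)          # initial, sequence-only alignment
    for _ in range(max_iter):
        ss = predict_ss(msa)              # predictions depend on the current MSA
        new_msa = align(sequences, ss)    # realign, now guided by the predictions
        if new_msa == msa:                # assumed convergence test
            break
        msa = new_msa
    return msa, predict_ss(msa)
```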

7.7 Applications to Local Protein Tertiary Structure Prediction

Protein tertiary structure prediction is a vast and intense area of research. The ability to predict a protein’s 3D structure from the amino acid sequence is one of the outstanding grand challenges in molecular biology, despite almost 40 years of computational research on the subject. A multitude of approaches have been attempted over the years to predict tertiary structure, ranging from simplified lattice models to full-scale energy-based atomic modeling using complex force fields. These can be grouped into two fundamentally different classes of methods to predict 3D structure from amino acid sequence. The first is ab initio prediction, which attempts to predict the folding of an amino acid sequence without any direct reference to other known protein structures. Computer-based calculations are employed that attempt to minimize the free energy of a structure with a given amino acid sequence or to simulate the folding process. The utility of these methods is limited by the vast number of possible conformations, the marginal stability of proteins, and the subtle energetics of weak interactions in aqueous solution. For a detailed account of ab initio prediction, see Chapter 13. The second group of methods takes advantage of our growing knowledge of 3D structures of proteins. In these knowledge-based methods, an amino acid sequence of unknown structure is examined for compatibility with any known protein structures. These techniques are also referred to as threading (see Chapter 12). If a significant match is detected, the known structure can be used as an initial model. Knowledge-based methods have led to many insights into the 3D conformation of proteins of known sequence but unknown structure. To date, the most reliable way to predict a protein structure is homology modeling, where the sequence of an unknown protein is aligned to another homologous protein sequence for which the tertiary structure is known. 
Typically, for those parts of the query sequence that are aligned with core secondary structures of the template structure, the backbone topologies of these structures are taken. It is clear that for this transfer of information the quality of the alignment between query and template sequence is crucial. A recent survey suggested that the recent improvements in scope and quality of comparative models largely come from the increased number of available protein sequences, resulting in better multiple sequence alignments (Cozzetto and Tramontano, 2005). Techniques have also been created to optimize the alignment of query and template sequence(s) by incorporating information from the template structure (e.g., Kleinjung et al., 2004). Two tasks then remain: one is to model the sidechains of the core elements, while the other is to model the loops connecting the core SSEs. Loop modeling has been defined as finding the ensemble of possible backbone structures, associated with the sequence segment corresponding to the loop, that are geometrically consistent with preceding and following parts of the loop whose 3D structures are given. The latter task, also referred to as loop closure, is difficult to achieve.

A vast number of folding experiments suggest that two conformational states are present to any significant extent, folded and unfolded. Such observations demonstrate that protein folding and unfolding result from a cooperative transition (Bystroff et al., 2000). The ultimate consequence of cooperativity is that if a protein is placed in conditions under which some part of the protein structure is rendered thermodynamically unstable, the interactions between it and the remainder of the protein will be lost. The loss of these interactions, in turn, will then destabilize the remainder of the structure. However, the conclusion that conditions leading to the disruption of any part of a protein structure will unravel the protein completely, cannot be generally maintained given the recent observations of natively disordered protein regions or even complete proteins (see Section 7.4).

Applications such as homology modeling or protein docking are based on the assumption that a protein’s inner core is less prone to movement than surface residues. This notion is supported by the fact that within homologous families, variations of the basic 3D topology associated with a given family are normally located at loop regions, ranging from the extension of a loop by one or a few extra residues, via additional SSEs, to complete domain insertions.

The most important applications of local 3D structure modeling are recognizing and modeling protein ligand-binding sites (An et al., 2005) and protein–protein interaction (PPI) sites, where the ability to model the conformation of surface residues is a crucial issue. Particularly, PPI sites are notoriously difficult to predict (Bordner and Abagyan, 2005). Furthermore, the computational methods designed for these tasks are computationally intensive, such that web interfaces to available programs are largely absent.

Another application of local 3D protein structures is the use of local segments found in the PDB database of tertiary structures (Dutta and Berman, 2005). An important example of this is the Robetta server (Kim et al., 2004) for homology or ab initio modeling, which makes use of fragment libraries. Fragment libraries are the pieces of experimentally determined structures that Robetta uses to guide the search of conformational space when predicting structures using its ab initio protocol, as well as longer loop conformations in homology models.

Apart from improvements in force fields leading to enhanced and flexible docking approaches, further developments might come from new mesoscopic modeling approaches, in which protein structures are not described at the atomic level, but by means of mesoscopic quantities like the number of effective particles (“beads”) in a polymer and an effective potential between these particles. Such approaches aim to be more computationally efficient, allowing genomic pipeline screening modes, while preserving or even enhancing accuracy.

Another avenue to future improvements will be to utilize the ever-growing genomic sequence database and exploit evolutionary comparison methods, bridging for example multiple alignment information and structural descriptions of known binding sites and/or ligands. In such a knowledge-based approach, protein–ligand and protein–protein interactions might be delineated in the absence of three-dimensional modeling scenarios.

7.8 Software Packages

In the following section we describe local structure prediction software packages that at the time this chapter was written were either available for use as a web service or downloadable for local use. The examples cover secondary and supersecondary structure prediction tools using all machine learning approaches discussed (Sections 7.8.1 and 7.8.2), disordered region prediction (Section 7.8.3), and repeats detection (Section 7.8.4).

7.8.1 Secondary Structure Prediction

7.8.1.1 k-Nearest-neighbor

Software Package 1. PSSP/ APSSP/ APSSP2

In the original PSSP secondary structure prediction method (Raghava, 2000) the authors introduced the combination of an NN and a customised kNN technique on single sequence prediction. The principle behind this combination was that on the one hand, protein queries that had closely related examples of known secondary structure would get a better prediction using the kNN technique than solely using an NN, but on the other hand, in the event of example absence, the NN would provide a better prediction than the kNN approach. The combination of the predictions of the two techniques was based on per-residue state-specific probabilities of correct prediction calculated by the NN rather than a binary (one or the other) use of the techniques according to the query. In addition, the final prediction was further filtered using an extra NN, much like that operating in PHD, where single-residue strands and helices were corrected.

The customization of the kNN technique used in PSSP consisted of two parts. First, the existing database of known examples, which had earlier been derived from 126 proteins (Raghava, 2000), was extended to a much larger training dataset by using all of the proteins in the 1998 version of the PDB database (Berman et al., 2000). This way, the number of possible examples was greatly increased, leading to an increase in the number of proteins the technique could correctly handle. Second, due to the increase in examples, the authors developed a way to minimize the computational time of comparing the query to the example database, reporting an 800-fold increase in computational speed (Raghava, 2000).

The most recent versions of PSSP are APSSP (Raghava, 2002b) and APSSP2 (Raghava, 2002a). APSSP and APSSP2 are both three-step methods that use an MSA as input to a combined NN and Example-Based Learning (EBL) system. The main difference between the two is the way the initial step is carried out. In APSSP, the first step is performed automatically by the external secondary structure prediction method Jnet (Cuff and Barton, 2000). The Jnet method is described in detail later on in the consensus approach section. Conversely, in its first step APSSP2 generates an MSA using the PSI-BLAST search tool (Altschul et al., 1997) and an initial prediction is produced using the same standard NN as PSSP (Raghava, 2000). In the second step, a customized EBL technique has replaced the kNN approach and is used to generate a second separate prediction. As in PSSP, in the third step, the secondary structures predicted from the first two steps are combined based on prediction reliability scores.

The PSSP and APSSP methods have now been replaced by APSSP2, which is available online as an automatic prediction server and as part of the EVA assessment server (Koh et al., 2003) (see Section 7.9.1).

7.8.1.2 Neural Networks

Software Package 1. PHD/PHDpsi/PROFsec

The secondary structure prediction method PHD (Rost and Sander, 1993) was the first algorithm to employ NNs and database searching. At a time when prediction accuracy was stuck in the high 60% range, it provided a groundbreaking boost to about 73% (Rost and Sander, 1993). The modus operandi of PHD was to search the SWISS-PROT database (Bairoch and Boeckmann, 1991; Boeckmann et al., 2003) using the MAXHOM MSA method (Sander and Schneider, 1991). The resulting MSA was passed into the PHD three-layer network, which generated the prediction [more details can be found in Rost and Sander (1993) and a review in Heringa (2000)]. In later years, PHD was updated to PHDpsi (Przybylski and Rost, 2002), which used the iterative homology search engine PSI-BLAST on the much larger BIG database, a nonredundant merge of the PDB (Berman et al., 2000), TrEMBL, and SWISS-PROT. More recently, although still unpublished, PHD has evolved even more and now takes advantage of bidirectional recurrent NNs (BRNNs) in PROFsec (Rost, personal communication).

All three PHD flavors can be used online as part of the Predict Protein Server (Rost, personal communication), which is one of the first members of the EVA assessment server (Koh et al., 2003) (see Section 7.9.1).

Software Package 2. PSIPRED

The PSIPRED method incorporates MSA information and NNs (Jones, 1999). The alignment information used is represented by a position-specific scoring matrix (PSSM) generated by the PSI-BLAST algorithm (Altschul et al., 1997; Altschul and Koonin, 1998) and is used as input to a two-layered NN.

The accuracy of the PSIPRED method is 76.5%, as evaluated by the author (Jones, 1999), and the method continues to rank among the best according to the EVA assessment server (Koh et al., 2003) (see Section 7.9.1).

Software Package 3. SSPro

SSPro is an NN prediction method that employs 11 BRNNs (bidirectional recurrent neural networks) to generate its predictions, instead of the commonly used feedforward networks. The first version of SSPro (Baldi et al., 1999) used BLAST (Altschul et al., 1997) to generate multiple alignments as input, while in the second and current SSPro version (Pollastri et al., 2002), multiple alignments of homologue sequences are obtained using PSI-BLAST (Altschul et al., 1997; Altschul and Koonin, 1998).

The authors have quoted SSPro2 to have an average prediction accuracy (Q3) of 78% (Pollastri et al., 2002). In addition, the SSPro algorithm has also been experimentally implemented to predict eight-state secondary structure (H: α-helix, G: 3/10-helix, I: π-helix, E: extended strand, B: β-bridge, T: turn, S: bend, C: coil) from primary sequence. In the same paper as SSPro2, the authors present SSpro8 trained using BLAST and PSI-BLAST profiles. The quoted overall performance (Q8) of SSpro8 was 62–63%. An automatic server is available online and is part of the EVA assessment server (Koh et al., 2003) (see Section 7.9.1).
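The Q3 and Q8 figures quoted throughout this section are per-residue accuracies: the percentage of residues whose predicted state matches the observed one. A minimal sketch follows. The eight-to-three state reduction shown is one common convention and is an assumption here, since the chapter does not specify which reduction each method uses.

```python
# One common 8-to-3 state reduction (an assumption; other conventions exist).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H",
                  "E": "E", "B": "E",
                  "T": "C", "S": "C", "C": "C"}

def reduce_to_three(states8):
    """Collapse an eight-state string to helix/strand/coil."""
    return "".join(EIGHT_TO_THREE[s] for s in states8)

def q_accuracy(predicted, observed):
    """Percentage of positions where prediction and observation agree.

    Gives Q3 for three-state strings and Q8 for eight-state strings.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```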

Software Package 4. YASPIN

YASPIN (Lin et al., 2005) uses a feedforward perceptron NN with one hidden layer to predict the SSEs (Bishop, 1995). These predictions are then filtered by an HMM.

The YASPIN NN uses the soft-max transition function (Bishop, 1995) with a window of 15 residues. For each residue in that window, 20 units are used for the scores in the PSSM and 1 unit is used to mark where the window spans termini of protein chains. In total, the input layer has 315 units (21 × 15). For the hidden layer we have used 15 units. The output layer has 7 units, corresponding to 7 local structure states: helix beginning (Hb), helix (H), helix end (He), beta beginning (Eb), beta (E), beta end (Ee), and coil (C).
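The input layout described above (15 window positions, each with 20 PSSM scores and one terminus unit, giving 315 inputs) can be sketched as follows. How YASPIN actually fills the window positions that fall outside the chain is not stated here, so the zero PSSM scores with the terminus unit set to 1 are an assumption, as are the function and constant names.

```python
WINDOW = 15
PSSM_COLUMNS = 20   # one score per amino acid type

def encode_window(pssm, center):
    """Flatten a 15-residue window of PSSM rows into a 315-value input vector.

    pssm: list of rows, each a list of 20 scores.
    Window positions beyond either chain terminus get zero scores and a
    terminus marker of 1.0 (assumed convention).
    """
    half = WINDOW // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            vector.extend(pssm[pos])
            vector.append(0.0)                    # inside the chain
        else:
            vector.extend([0.0] * PSSM_COLUMNS)
            vector.append(1.0)                    # window spans a terminus
    return vector
```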

The seven-state output of the NN is then filtered through an HMM, which uses the Viterbi algorithm (Durbin, 1998) to optimally segment the seven-state predictions. The HMM defines the transition probabilities between the seven local structure states. The final output is a three-state secondary structure prediction (H: helix, E: beta strand, C: coil). The YASPIN server has recently been added to the EVA assessment server (Koh et al., 2003) (see Section 7.9.1).
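The filtering step can be illustrated with a small Viterbi decoder over the seven states. The transition table below is hypothetical: it only encodes the obvious ordering (a helix must start with Hb and end with He, and likewise for strands) with uniform probabilities, whereas the real model uses its own transition probabilities. The NN outputs are treated as emission scores.

```python
import math

STATES = ["Hb", "H", "He", "Eb", "E", "Ee", "C"]
# Hypothetical allowed transitions, each given uniform probability.
ALLOWED = {"C": ["C", "Hb", "Eb"], "Hb": ["H"], "H": ["H", "He"], "He": ["C"],
           "Eb": ["E"], "E": ["E", "Ee"], "Ee": ["C"]}
TO_THREE = {"Hb": "H", "H": "H", "He": "H",
            "Eb": "E", "E": "E", "Ee": "E", "C": "C"}
NEG_INF = float("-inf")

def safe_log(p):
    return math.log(p) if p > 0 else NEG_INF

def viterbi(nn_probs):
    """nn_probs: one dict per residue mapping state -> NN output probability."""
    trans = {(a, b): safe_log(1.0 / len(ALLOWED[a]))
             for a in STATES for b in ALLOWED[a]}
    score = {s: safe_log(nn_probs[0].get(s, 0.0)) for s in STATES}
    back = []
    for probs in nn_probs[1:]:
        new_score, pointers = {}, {}
        for b in STATES:
            # Best predecessor of state b at this position.
            prev = max(STATES, key=lambda a: score[a] + trans.get((a, b), NEG_INF))
            new_score[b] = (score[prev] + trans.get((prev, b), NEG_INF)
                            + safe_log(probs.get(b, 0.0)))
            pointers[b] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state, then map to three states.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join(TO_THREE[s] for s in path)
```

With these constraints, a lone residue that the NN scores as helix inside a coil region is relabeled as coil, because no legal path can contain a helix shorter than Hb, H, He.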

7.8.1.3 Hidden Markov Models

Software Package 1. SAM-T99/SAM-T02

The SAM (Sequence Alignment and Modelling system) software package (Hughey and Krogh, 1996) is a collection of tools that use linear HMMs for sequence analysis. Integrated into the package are the SAM-T99 (Karplus et al., 1998; 1999) and SAM-T02 (Karplus et al., 2002; 2003) structure prediction methods. Both methods predict the fold and secondary structure of a target protein sequence using multitrack HMMs and NNs.

The SAM-T02 method is an updated version of SAM-T99 and also incorporates secondary structure information into the scoring schemes it uses. Its HMMs and NNs have been trained on MSAs generated by the SAM-T2K iterated search procedure (Karplus et al., 2001).


The procedure SAM-T02 follows is to build an MSA of homologues to the target sequence using SAM-T2K and then employ NNs to make local structure predictions. The refinement and analysis of the HMM alignments returned can be performed by additional software found in the SAM package. Online servers for both SAM-T99 and SAM-T02 are available, but the authors recommend the use of the most updated SAM-T02 method (see Section 7.9.1). Although SAM-T02 has not yet been added to the EVA assessment server (Koh et al., 2003), SAM-T99 is still one of the highest performing methods (Section 7.9.1).

7.8.1.4 Support Vector Machines

Software Package 1. Hua and Sun

Hua and Sun (2001) were the first to apply SVMs to predict the secondary structure at each position along a protein chain. In their method SSEs fall into three categories: helix (H), sheet (E), or coil (C). Accordingly, this multiclass recognition problem was addressed by training three separate SVMs, one per SSE. The protein sequence is encoded in redundant binary fashion, using an 11-residue sliding window. The final classification of a given amino acid is the label associated with the SVM that assigns the discriminant score that is farthest from zero. The per-residue accuracy (Q3) and segment overlap (SOV) (Zemla et al., 1999) scores quoted by the authors are 73.5 and 76.2%, respectively, which are comparable to existing NN-based methods. Unfortunately, no SVM methods are currently part of an automated assessment server.
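A minimal sketch of the window encoding and the final decision is given below. It assumes 21 binary units per window position (20 amino acid types plus one unit for positions outside the chain or unknown residues) and reads the decision rule as picking the class whose one-per-class SVM returns the largest discriminant score. Both points are interpretations made for illustration, and the SVMs themselves are not implemented.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_window(seq, center, window=11):
    """Binary encoding of an 11-residue window: 21 units per position (assumed)."""
    half = window // 2
    bits = []
    for pos in range(center - half, center + half + 1):
        unit = [0] * 21
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            unit[AMINO_ACIDS.index(seq[pos])] = 1
        else:
            unit[20] = 1          # off-chain position or unknown residue
        bits.extend(unit)
    return bits

def classify(scores):
    """scores: dict mapping 'H', 'E', 'C' to that class's SVM discriminant value.

    Returns the class with the largest value (assumed reading of the rule).
    """
    return max(scores, key=scores.get)
```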

7.8.1.5 Consensus Prediction

Software Package 1. Jpred/Jnet

The initial implementation of Jpred was a purely majority voting consensus-deriving method (Cuff et al., 1998; Cuff and Barton, 1999) (for review see Heringa, 2000). The current Jpred (Jnet) server (Cuff and Barton, 2000) uses a refined and processed PSI-BLAST-generated alignment together with the PSI-BLAST and HMM profiles of that alignment, and performs its predictions through two fully connected three-layer NNs.

The alignments Jnet uses are generated using PSI-BLAST to scan a Seg- (Wootton and Federhen, 1996) and helixfilt- (D. Jones, unpublished) filtered version of the combined SWISS-PROT (Bairoch and Boeckmann, 1991) and TrEMBL protein sequence database. After three iterations of PSI-BLAST, all sequence pairs in the generated alignment are compared and the sequence percentage identities are used to cluster them. All sequences in the alignment with more than 75% identity are removed. The alignment is then further processed by removing all gaps from the target sequence including the corresponding column beneath that gap. This type of alignment processing is also observed in PHD, PHDpsi, and PROFsec and it is essential for the NN to work.
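The two processing steps, redundancy removal and deletion of columns that are gaps in the target, can be sketched as below. The greedy filter is a simplification of the clustering described above, and computing identity over positions where neither sequence has a gap is an assumption. The target is taken to be the first row of the alignment.

```python
def percent_identity(a, b):
    """Identity over aligned positions where neither sequence has a gap (assumed)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def filter_redundant(alignment, cutoff=75.0):
    """Keep the target (first row); drop rows above the cutoff to any kept row."""
    kept = [alignment[0]]
    for row in alignment[1:]:
        if all(percent_identity(row, k) <= cutoff for k in kept):
            kept.append(row)
    return kept

def drop_target_gap_columns(alignment):
    """Remove every column in which the target sequence has a gap."""
    keep = [i for i, c in enumerate(alignment[0]) if c != "-"]
    return ["".join(row[i] for i in keep) for row in alignment]
```

After `drop_target_gap_columns`, every row has exactly the length of the ungapped target, so each alignment column maps onto one target residue and one window position of the network input.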

In addition to the alignment, Jnet also uses the PSI-BLAST-generated PSSM (PSI-BLAST profile) and the HMM profile of the alignment (using HMMER). The three input files are each used as input to the two neural networks. The first neural network uses a 17-residue sliding window and predicts the per-residue propensity of helix, strand, and coil. The second network acts as a filter for each per-residue prediction from the first network, using a 19-residue sliding window. Finally, the three predictions are used to generate a per-residue consensus, which is the final prediction.

The authors of Jnet have quoted an average prediction accuracy of 76.4% when using all three input types, 71.6% when only using the refined and processed PSI-BLAST alignment, 74.4% when using the alignment and the HMM profile of the alignment, and 75.2% when using the alignment and the PSI-BLAST profile of the alignment. The EVA server has stopped the assessment of the Jpred method due to a move in URLs. The quoted Q3 of 72.8% over 167 proteins refers to the original Jpred method (see Section 7.9.1).

Software Package 2. SymSSP/SymPRED

The original “majority voting” technique was improved upon by the use of dynamic programming (DP) (Needleman and Wunsch, 1970) in the SymSSP (Simossis and Heringa, 2004a) and soon after in the SymPRED (Simossis and Heringa, submitted) methods. In both applications, an alignment of secondary structures is reduced to a weighted profile that describes helix, strand, or coil content of each position. The profile is scanned in windows of increasing length, from single position (window of 1) up to the length of the whole sequence, for each secondary structure type. Each secondary structure type segment is scored as the sum of all of its positions. These equivalent secondary structure-specific window scores are compared and the highest one is used to fill a search matrix. Finally, the DP routine finds the optimal path through the search matrix and thus provides an optimally segmented consensus prediction.
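The segmentation idea can be sketched as a DP over profile positions, where each candidate segment of a single state is scored by the sum of that state's weights over the segment. If segments of any length were allowed, the optimum would reduce to a per-position maximum, so this sketch adds hypothetical minimum segment lengths to make the segmentation meaningful. These minimum lengths, the profile layout, and the function name are assumptions and are not details of SymSSP or SymPRED.

```python
def segment_consensus(profile, min_len=None):
    """Optimal segmentation of a weighted secondary structure profile.

    profile: list of dicts {'H': w, 'E': w, 'C': w}, one per position.
    min_len: minimum segment length per state (hypothetical defaults).
    """
    min_len = min_len or {"H": 3, "E": 2, "C": 1}
    n = len(profile)
    # prefix[t][i] = summed weight of state t over positions 0..i-1
    prefix = {t: [0.0] * (n + 1) for t in "HEC"}
    for i, col in enumerate(profile):
        for t in "HEC":
            prefix[t][i + 1] = prefix[t][i] + col.get(t, 0.0)
    # best[i] = best total score of a segmentation of the first i positions
    best = [float("-inf")] * (n + 1)
    best[0] = 0.0
    choice = [None] * (n + 1)
    for end in range(1, n + 1):
        for t in "HEC":
            for start in range(0, end - min_len[t] + 1):
                score = best[start] + prefix[t][end] - prefix[t][start]
                if score > best[end]:
                    best[end], choice[end] = score, (start, t)
    # Trace back the chosen segments from the end of the profile.
    labels, end = [], n
    while end > 0:
        start, t = choice[end]
        labels.append(t * (end - start))
        end = start
    return "".join(reversed(labels))
```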

In the SymSSP method, the strategy was applied to the alignment-based predictions of a single method that preprocesses the input alignment prior to prediction by removing whole alignment positions that show a gap in the top sequence. When most of the discarded information was recovered for the methods PHD (Rost and Sander, 1993), PROFsec (B. Rost, personal communication), and SSPro (Pollastri et al., 2002), the results showed a consistent, modest improvement in the prediction quality of these methods based on Q3 and SOV (Zemla et al., 1999) scores. In the case of SymPRED, the DP strategy was applied to the predictions of various prediction methods and the results were compared to recent simple “majority voting” investigations (Albrecht et al., 2003; McGuffin and Jones, 2003; Ward et al., 2003). In both investigations the predictions produced were of higher quality than those produced by the simple “majority voting” strategy.

7.8.2 Supersecondary Structure Prediction

7.8.2.1 Software Package 1. COILS2

Closely related to secondary structure prediction is the prediction of coiled-coil structures. If a soluble protein is predicted to contain α-helices, higher-order information