Computational Methods for Protein Structure Prediction & Modeling V1 - Xu Xu and Liang

7. Local Structure Prediction of Proteins

as well as increased confidence in predictions made could be gained from testing the possibility of it containing a coiled-coil supersecondary structure.

The program COILS2 (Lupas et al., 1991; Lupas, 1996) compares a query sequence with a database of known parallel two-stranded coiled coils. A similarity score is derived and compared to two score distributions, one for globular proteins (without coiled coils) and one for known coiled-coil structures. The position of the score relative to these two distributions is then converted to a probability that the query sequence adopts a coiled-coil conformation. Since the program assumes the presence of heptad repeats, probabilities are derived using default window lengths of 14, 21, and 28 amino acids (two, three, and four heptads). The program can also use user-defined window lengths for the prediction of extreme coiled-coil lengths. A recently updated scoring matrix, based on data from recent coiled-coil structures and containing amino acid type propensities for the various positions in the heptad repeat, shows improved recognition of coiled-coil elements. The COILS2 method accurately recognizes left-handed two-stranded coiled coils but loses sensitivity for coiled-coil structures consisting of more than two strands. It is also unable to recognize right-handed or buried coiled-coil helices and is therefore not applicable to transmembrane coiled-coil structures, which are known to adopt conformations broadly similar to those of soluble coiled coils, albeit with markedly different and more hydrophobic constituent amino acids (Langosch and Heringa, 1998).
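The conversion from a window score to a probability can be sketched as follows. This is a minimal illustration, not the COILS2 code: it assumes that both score distributions are modeled as Gaussians and that globular residues outnumber coiled-coil residues by a fixed prior ratio. The function names, the Gaussian parameters, and the ratio of 30 are assumptions chosen for the example.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def coiled_coil_probability(score,
                            coiled_coil=(1.6, 0.25),   # hypothetical (mean, sd)
                            globular=(0.8, 0.20),      # hypothetical (mean, sd)
                            prior_ratio=30.0):         # assumed globular : coiled-coil ratio
    """Probability that a window score comes from the coiled-coil distribution."""
    g_cc = gaussian_pdf(score, *coiled_coil)
    g_gl = gaussian_pdf(score, *globular)
    return g_cc / (prior_ratio * g_gl + g_cc)

# One probability per default window length would be computed this way,
# each from the score obtained with a window of 14, 21, or 28 residues.
for window, score in [(14, 1.0), (21, 1.4), (28, 2.0)]:
    print(window, round(coiled_coil_probability(score), 4))
```

The prior ratio pulls the probability down for scores that lie between the two distributions, so only scores well inside the coiled-coil distribution yield a high probability.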

7.8.2.2 Software Package 2. WD-repeats Prediction

The server “WD repeat Family of Proteins” (see http://bmerc-www.bu.edu/wdrepeat/) is able to recognize putative WD-repeat sequences associated with 4- to 9-bladed 3D WD-repeat structures. The underlying models combine a particular so-called Type-1 structural model with sequence-specific pattern information. Multidomain proteins can be submitted to the server intact; the region containing the WD-repeat domain will be identified by the server automatically.

The analysis algorithm is based on probabilistic Discrete State-space Models (DSMs), and optimal filtering and smoothing algorithms (Stultz et al., 1993). The mathematical basis for the models and algorithms is described in White et al. (1994).

A protein sequence submitted to the server is first classified as “generic” or “wd repeat.” The class “generic” is designed for proteins not containing WD repeats. The superclass “wd repeat” is designed for the WD-repeat family of proteins. Under this superclass, there are six macroclasses for WD-repeat proteins, each of which contains a different number of WD repeats. Sequences containing fewer than four WD repeats will not be reported as WD-repeat proteins. This is due to the assumption that all WD-repeat proteins adopt a β-propeller fold, which must have at least four blades to form a circular structure. The 4- to 9-bladed (WD4 to WD9) models that can be produced by the server correspond to sequence length ranges of 187–279, 233–332, 278–385, 323–437, 368–489, and 413–541 residues, respectively. To handle longer sequences, the algorithm is able to add a leader and a trailer to the models on the fly. Therefore, all models can recognize WD repeats within sequences longer than their maximum domain length, up to an upper limit on sequence length of 1000 residues.
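The length ranges quoted above overlap, so a domain of a given length can be compatible with more than one blade count. The following sketch simply tabulates the ranges and looks up the compatible models; the names `WD_MODEL_LENGTHS`, `candidate_models`, and `within_server_limit` are hypothetical helpers for illustration and are not part of the server.

```python
# Domain length range (in residues) for each blade count, as quoted in the text.
WD_MODEL_LENGTHS = {
    4: (187, 279),
    5: (233, 332),
    6: (278, 385),
    7: (323, 437),
    8: (368, 489),
    9: (413, 541),
}

MAX_SEQUENCE_LENGTH = 1000  # upper limit on the submitted sequence length

def candidate_models(domain_length):
    """Blade counts whose length range contains the given WD-repeat domain length."""
    return [blades for blades, (low, high) in WD_MODEL_LENGTHS.items()
            if low <= domain_length <= high]

def within_server_limit(sequence_length):
    """True if a full sequence (domain plus leader and trailer) is short enough."""
    return 0 < sequence_length <= MAX_SEQUENCE_LENGTH

print(candidate_models(300))      # a 300-residue domain fits the 5- and 6-bladed ranges
print(within_server_limit(1200))  # longer than the stated upper limit
```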

V.A. Simossis and J. Heringa

Each WD repeat has two conserved profiles denoted “profile 1” and “profile 2” (which may be approximated as “GHXXXVXXVXFX” and “XLASGSXDXTIKVWD,” respectively, as shown at http://bmerc-www.bu.edu/wdrepeat/) that are used in the DSM prediction. The probabilities of occurrence of each of these profiles will be reported if WD repeats are identified in the sequence. In addition, the strands within each of the aligned putative WD repeats will be designated, although individual β-strand probabilities will not be reported. To give the user insight into the 3D orientation of the WD repeats, a skeleton coordinate file in PDB format is included.
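To make the two approximate consensus strings concrete, the sketch below treats each “X” as a wildcard and scans a sequence for exact matches. This is only an illustration of what the consensus strings encode: the server itself uses probabilistic DSMs rather than exact pattern matching, and real WD repeats will often deviate from the consensus. The function names and the test sequence are invented for the example.

```python
import re

PROFILE_1 = "GHXXXVXXVXFX"     # approximate consensus quoted in the text
PROFILE_2 = "XLASGSXDXTIKVWD"  # approximate consensus quoted in the text

def profile_to_regex(profile):
    """Turn an X-wildcard consensus string into a regular expression."""
    return "".join("." if ch == "X" else re.escape(ch) for ch in profile)

def find_profile(sequence, profile):
    """Return 0-based start positions of all (possibly overlapping) exact matches."""
    lookahead = re.compile("(?=" + profile_to_regex(profile) + ")")
    return [m.start() for m in lookahead.finditer(sequence)]

# Made-up sequence that contains one instance of each consensus.
toy_sequence = "AAA" + "GHTDAVTSVAFS" + "KLASGSADKTIKVWD"
print(find_profile(toy_sequence, PROFILE_1))
print(find_profile(toy_sequence, PROFILE_2))
```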

7.8.3 Disordered Region Prediction

7.8.3.1 Software Package 1. PONDR

The PONDR suite contains several disorder prediction methods (Obradovic et al., 2003). The predictions from the methods VL2 and VLXT come from ensembles of feedforward NNs trained on combinations of amino acid composition, flexibility, and sequence complexity. Sequence information is parsed using windows of generally 21 amino acids. The amino acid attributes are calculated over this window, and these values are used as inputs for the NNs, which calculate a value for the central amino acid in each window. These prediction values are then smoothed over a sliding window of 9 amino acids. If the smoothed value for a residue exceeds a threshold, the residue is declared disordered. Another predictor, VL3, was trained using ordinary least squares regression with partitioning of the training set to cluster various “flavors” of disorder (Vucetic et al., 2003). Recently, a new disorder predictor, VSL1, was added to the PONDR suite. The VSL1 predictor obtained the best results in a comparison of 20 different disorder prediction methods presented at the CASP6 structure prediction meeting in December 2004. The methods in the PONDR suite are not freely available.
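The post-processing step described above (smoothing the per-residue outputs over 9 residues, then thresholding) can be sketched as follows. This is not PONDR code: the raw scores stand in for the neural network outputs, the threshold of 0.5 and the shortened windows at the sequence ends are assumptions, and the function names are hypothetical.

```python
def moving_average(values, window=9):
    """Smooth values with a centered sliding window, truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        low = max(0, i - half)
        high = min(len(values), i + half + 1)
        smoothed.append(sum(values[low:high]) / (high - low))
    return smoothed

def call_disorder(raw_scores, window=9, threshold=0.5):
    """Return one boolean per residue: True if the smoothed score exceeds the threshold."""
    return [score > threshold for score in moving_average(raw_scores, window)]

# Toy input: ten ordered-looking residues followed by ten disordered-looking ones.
raw = [0.0] * 10 + [1.0] * 10
print(call_disorder(raw, window=9))
```

Smoothing suppresses isolated high or low outputs, so a single residue is called disordered only when its neighbors support the call.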

7.8.3.2 Software Package 2. FoldIndex

The FoldIndex program is based on the calculations developed by Uversky et al. (2000) and predicts whether a sequence will fold by computing its mean net charge and mean hydrophobicity. The window parameter for the FoldIndex classifier was set to 31 residues, as this value achieved the highest accuracy on a validation set. The underlying data show that the combination of low mean hydrophobicity and relatively high net charge represents an important prerequisite for the absence of regular structure in proteins under physiological conditions, thus leading to “natively unfolded” proteins.
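A charge-hydrophobicity score of this kind can be sketched in a few lines. The sketch below assumes the Kyte-Doolittle hydropathy scale rescaled to the range 0 to 1, a net charge of +1 for K and R and −1 for D and E, and a linear boundary of the form 2.785·⟨H⟩ − |⟨R⟩| − 1.151. The coefficients are quoted from memory of the commonly cited form of this boundary and should be checked against the original publications before use; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # assumed charges near neutral pH

def fold_index(segment):
    """Positive values suggest a folded segment, negative values an unfolded one."""
    n = len(segment)
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1] before averaging.
    mean_hydrophobicity = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in segment) / n
    mean_net_charge = abs(sum(CHARGE.get(aa, 0) for aa in segment)) / n
    return 2.785 * mean_hydrophobicity - mean_net_charge - 1.151  # assumed coefficients

def windowed_fold_index(sequence, window=31):
    """Score every full window of the given length along the sequence."""
    return [fold_index(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]

print(fold_index("I" * 31) > 0)  # strongly hydrophobic, uncharged segment
print(fold_index("E" * 31) < 0)  # highly charged, hydrophilic segment
```

Residues outside the 20 standard amino acids raise a `KeyError` in this sketch; a real tool would need a policy for ambiguous or nonstandard codes.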

7.8.3.3 Software Package 3. DisEMBL

Linding et al. (2003a) developed the NN-based method DisEMBL. The authors carefully selected a number of protein sets, including a coil set and a “hot loop” set, to train the neural nets using 5-fold cross validation, while the best parameter settings were selected based on ROC curves. The optimal network architecture had a window size of 19 residues and 30 hidden units. For the coil and hot loop NN ensembles, the score distributions of positive and negative test examples were estimated using Gaussian kernel density estimation. Based on these distributions, a calibration curve for converting NN output scores to probabilities was constructed. To predict disorder for an unknown query sequence, the network output is smoothed and the resulting per-residue disorder probabilities are plotted.
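The calibration idea (estimate the score densities of positive and negative examples, then turn a raw score into a probability) can be sketched as follows. This is a minimal illustration rather than the DisEMBL implementation: it assumes equal prior weight for the two classes and a fixed kernel bandwidth, and the example scores are invented.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with Gaussian kernels."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm
    return density

def make_calibration(positive_scores, negative_scores, bandwidth=0.1):
    """Build a function mapping a raw network score to a probability."""
    f_pos = gaussian_kde(positive_scores, bandwidth)
    f_neg = gaussian_kde(negative_scores, bandwidth)
    def probability(score):
        p, n = f_pos(score), f_neg(score)
        # Equal class priors are assumed; fall back to 0.5 where both densities vanish.
        return p / (p + n) if (p + n) > 0.0 else 0.5
    return probability

# Invented network outputs for disordered (positive) and ordered (negative) residues.
to_probability = make_calibration([0.7, 0.8, 0.9], [0.1, 0.2, 0.3])
for raw_score in (0.15, 0.5, 0.85):
    print(raw_score, round(to_probability(raw_score), 3))
```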

7.8.3.4 Software Package 4. GLOBPLOT

The GLOBPLOT method (Linding et al., 2003b) is based on the hypothesis that the tendency for disorder can be expressed as P = RC − SS, where RC and SS are the propensities for a given amino acid to be in “random coil” and regular “secondary structure,” respectively. The RC and SS propensity values were derived by the authors from a data set containing a single representative of each superfamily in the SCOP database (version 1.59). The two types of propensities were then combined in a single “Russell/Linding” amino acid propensity set, which is able to discriminate between disorder and globular packing.

7.8.3.5 Software Package 5. DISOPRED

The DISOPRED2 method (Ward et al., 2004) exploits an SVM classifier based on a linear kernel function and compares favorably to the above methods across the range of decision thresholds. Ward et al. (2004) also noted that using homologous sequences improves disorder prediction slightly as compared to single sequence prediction, but the beneficial effect is clearly lower than that for secondary structure prediction.

7.8.3.6 Software Package 6. PDISORDER

The PDISORDER method (Softberry, Inc.) exploits a combination of machine learning techniques comprising NNs, linear discriminant functions, and an acute smoothing procedure. At the recent CASP6 prediction assessment workshop, the method scored high in terms of the correlations it yields with crystallographic B-factors, which are included as evidence for disorder.

7.8.3.7 Software Package 7. DISpro

Cheng et al. (2005) reported a state-of-the-art disorder prediction accuracy of 92.8% with a false positive rate of 5% on large cross-validated tests. Their method DISpro uses evolutionary information in the form of profiles, predicted secondary structure and relative solvent accessibility, and ensembles of 1D-recursive NNs. The method shows an improved performance over previous methods, for example using the CASP5 data set (Cheng et al., 2005).

7.8.4 Internal Repeats Recognition

7.8.4.1 Software Package 1. REPRO

Heringa and Argos (1993) adapted the basic Waterman and Eggert algorithm to repeats within a single protein by demanding that, in addition to top-scoring alignments being nonintersecting, locally aligned fragments do not overlap. They introduced a graph-based iterative clustering mechanism that takes the resulting list of top-scoring nonoverlapping local alignments for a single query sequence, declares the N-terminal matched amino acid pair in each top alignment to be the start sites of a repeat pair, and then attempts to delineate further start sites within the top alignments (i.e., to find more repeats internal to a top alignment) that match the repeat type, based on alignment consistency with already clustered members of that repeat type. If such new repeats are found, the clustering procedure is iterated. The cluster consistency criterion assesses the number of established repeats that align with a putative repeat, and accepts the putative repeat only if three or more such top-scoring alignments can be found and at least one of these alignments has already contributed one or more repeat members to the current repeat type, and can therefore be trusted to be “in phase” with that repeat type. After the clustering phase, the repeats can be multiply aligned and turned into a profile, which can then be slid over the query sequence to verify the repeats already found and possibly detect new instances missed by the preceding algorithmic steps (Heringa and Argos, 1993). If new repeats are found, the profile can be updated and the procedure iterated. The REPRO algorithm is able to detect multiple repeat types independently, and is a sensitive but slow technique. A web server for the REPRO algorithm is available at http://ibivu.cs.vu.nl (George and Heringa, 2000).

7.8.4.2 Software Package 2. Pellegrini et al.

A quick algorithm for calculating the length and copy number of internal repeat sets has been devised by Pellegrini et al. (1999). The method uses the Waterman and Eggert algorithm and converts the scores of the selected top alignments to probabilities. An N × N path matrix, where N is the length of the protein sequence, is then filled with ones for matrix cells corresponding to local nonintersecting alignments that score above a preset threshold value for the probabilities, and with zeros elsewhere. Two simple summing protocols are then applied to this matrix to obtain an approximate notion of the repeat length and copy number, although the repeat boundaries are not determined. Marcotte et al. (1999) used the algorithm to derive a general census of repeats in proteins using the SWISS-PROT protein sequence database.
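The path matrix itself is easy to picture in code. The sketch below fills an N × N matrix from a list of accepted local alignments, each given as a list of aligned residue pairs, and then sums along each diagonal. The diagonal sum is only one plausible summary, shown here for illustration; the text does not specify the two summing protocols used by the authors, and the function names and toy alignment are hypothetical.

```python
def path_matrix(n, accepted_alignments):
    """Build an n x n matrix with ones at cells covered by accepted local alignments.

    accepted_alignments is a list of alignments, each a list of 0-based (i, j)
    residue pairs; alignments are assumed to have passed the probability threshold.
    """
    matrix = [[0] * n for _ in range(n)]
    for alignment in accepted_alignments:
        for i, j in alignment:
            matrix[i][j] = 1
    return matrix

def diagonal_sums(matrix):
    """Number of ones on each upper diagonal, keyed by the offset j - i."""
    n = len(matrix)
    return {offset: sum(matrix[i][i + offset] for i in range(n - offset))
            for offset in range(1, n)}

# Toy case: a 12-residue sequence in which residue i aligns with residue i + 4.
toy_alignment = [(i, i + 4) for i in range(8)]
sums = diagonal_sums(path_matrix(12, [toy_alignment]))
best_offset = max(sums, key=sums.get)
print(best_offset, sums[best_offset])
```

In this toy case the offset with the largest sum coincides with the spacing between the aligned copies, which is the kind of approximate repeat-length signal the summing step is meant to expose.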

7.8.4.3 Software Package 3. RADAR

The method RADAR (Heger and Holm, 2000) basically follows the algorithmic steps of the REPRO method (Heringa and Argos, 1993). It calculates nonintersecting local alignments, and then uses these in an iterative procedure to determine the shortest nonreducible repeat unit and its associated boundaries. A profile is constructed from a multiple alignment of a repeat set, and slid over the query sequence to capture more repeats. The whole procedure is then iterated in an attempt to find multiple repeat types. The RADAR step to find the shortest possible repeat unit includes an iterative wraparound dynamic programming (DP) algorithm to detect the smallest repeat unit within a potentially reducible set of repeats. The RADAR method is sensitive and sufficiently fast for genomic application.

Table 7.1 A list of all prediction methods independently assessed by the EVA server and their corresponding overall scores and test set sizes, until the end of 2004. Methods whose names are in boldface have been covered in Section 7.8

| Method | Test set | Score | Server URL (assume “http://” at the start of each address) |
|---|---|---|---|
| APSSP2 | 393 | 75.1 | www.imtech.res.in/raghava/apssp2/ |
| Jpred | 167 | 72.8 | www.compbio.dundee.ac.uk/~www-jpred/submit.html |
| JUFO | 133 | 68.9 | www.jens-meiler.de/jufo.html |
| PHD | 446 | 72.2 | cubic.bioc.columbia.edu/predictprotein/ |
| PHDpsi | 440 | 73.3 | cubic.bioc.columbia.edu/predictprotein/ |
| PROF king | 443 | 72.7 | www.aber.ac.uk/~phiwww/prof/ |
| PROFsec | 443 | 75.3 | cubic.bioc.columbia.edu/predictprotein/ |
| Prospect | 315 | 71.7 | compbio.ornl.gov/cgi-bin/PROSPECT/ |
| PSIPRED | 443 | 76.2 | bioinf.cs.ucl.ac.uk/psipred/psiform.html |
| SABLE | 156 | 76.0 | sable.cchmc.org/ |
| SABLE2 | 99 | 76.9 | sable.cchmc.org/ |
| SAM-T99sec | 396 | 76.0 | www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html |
| SCRATCH | 217 | 75.7 | www.igb.uci.edu/tools/scratch/ |
| SSPRO2 | 257 | 74.3 | www.igb.uci.edu/tools/scratch/ |
| SSPRO4 | 68 | 78.7 | www.igb.uci.edu/tools/scratch/ |
| YASPIN | 80 | 71.0 | ibivu.cs.vu.nl/programs/yaspinwww/ |

7.8.4.4 Software Package 4. REP

Andrade et al. (2000) produced a supervised repeats detection method, REP, which searches the query sequence using a number of profiles, each profile containing the information of a multiple alignment of a known repeat family. The user can scan the query sequence for the following repeat types: Ankyrin, Armadillo, HAT, HEAT, HEAT AAA, HEAT ADB, HEAT IMB, Kelch, Leucine-Rich Repeats, PFTA, PFTB, RCC1, TPR, and WD40.

Table 7.2 A list of methods for predicting coiled-coil and WD-repeat protein regions from sequence

| Method | Server URL (assume “http://” at the start of each address) | Reference |
|---|---|---|
| COILS2 | iubio.bio.indiana.edu:7780/archive/00000527/ | Lupas et al., 1991 |
| WD-repeat Prediction | bmerc-www.bu.edu/psa/request.htm | The same URL |

Table 7.3 A list of methods for predicting disordered protein regions from sequence

| Method | Server URL (assume “http://” at the start of each address) | Methodology | Reference |
|---|---|---|---|
| FoldIndex | bioportal.weizmann.ac.il/fldbin/findex | Charge-hydrophobicity patterns | Priluski et al., 2005 |
| DISpro | www.ics.uci.edu/~baldig/diso.html | Neural net | Cheng et al., 2005 |
| DISEMBL | dis.embl.de/ | Neural net | Linding et al., 2003a |
| GLOBPLOT | globplot.embl.de/ | Amino acid propensities | Linding et al., 2003b |
| DISOPRED2 | bioinf.cs.ucl.ac.uk/disopred/disopred.html | SVM | Ward et al., 2004 |
| PONDR | www.pondr.com | Neural net | Obradovic et al., 2003 |
| DRIPPRED | sbcweb.pdc.kth.se/cgi-bin/maccallr/disorder/submit.pl | Kohonen self-organizing maps | http://www.forcasp.org/paper2127.html |
| PDISORDER | www.softberry.com/berry.phtml?topic=pdisorder&group=programs&subgroup=propt | Neural network; linear discriminant function; acute smoothing procedure | http://www.forcasp.org/upload/2197.28.pdf |

Table 7.4 List of methods for internal repeats recognition

| Method | Server URL (assume “http://” at the start of each address) | Methodology | Reference |
|---|---|---|---|
| Pellegrini et al. | www.doe-mbi.ucla.edu/Services/Repeats/ | Waterman and Eggert algorithm | Pellegrini et al., 1999 |
| RADAR | www.ebi.ac.uk/Radar/ | Local alignment | Heger and Holm, 2000 |
| REP | www.embl-heidelberg.de/~andrade/papers/rep/search.html | Checking known repeat types | Andrade et al., 2000 |
| REPRO | ibivu.cs.vu.nl/programs/reprowww/ | Local alignment, graph clustering | George and Heringa, 2000 |
| TRUST | ibivu.cs.vu.nl/programs/trustwww/ | Transitivity | Szklarczyk and Heringa, 2004 |

7.8.4.5 Software Package 5. TRUST

Szklarczyk and Heringa (2004) developed TRUST, a method for detecting internal repeats in proteins based on the transitivity of repeats. The authors reported increased sensitivity and accuracy for the method. This is achieved by exploiting the concept of transitivity of alignments, which relies on mutual reinforcement (or attenuation) of repeat signals, and thus can be used as a noise filter. Starting from local suboptimal alignments, the application of transitivity allows (1) identification of distant repeat homologues for which no alignments were found; (2) gaining confidence about consistently well-aligned regions; and (3) reducing the contribution of nonhomologous repeats. The resulting increase in consistency generally leads to a virtually noise-free profile representing a generalized repeat with high fidelity. The TRUST method also employs a rigid statistical test for self-sequence and profile-sequence alignments.
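The basic idea of transitivity (if position A aligns with B, and B aligns with C, then A and C are taken to be related even if no direct alignment was found) can be sketched with a union-find structure. This is a deliberately simplified illustration: it treats every aligned pair as equally reliable and does not model the weighting, reinforcement, or attenuation of signals that the text describes. The function name and the toy input are hypothetical.

```python
def transitive_groups(sequence_length, aligned_pairs):
    """Group residue positions connected directly or indirectly by aligned pairs."""
    parent = list(range(sequence_length))

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in aligned_pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for position in range(sequence_length):
        groups.setdefault(find(position), []).append(position)
    # Positions that were never aligned form singleton groups and are dropped.
    return [members for members in groups.values() if len(members) > 1]

# Positions 0 and 3 are aligned, and 3 and 6 are aligned; 0 and 6 were never
# aligned directly but end up in the same group through transitivity.
print(transitive_groups(10, [(0, 3), (3, 6)]))
```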

7.9 Resources

This section contains useful resources available at the time this chapter was written for online software applications and other useful material.

7.9.1 Secondary Structure Prediction

7.9.2 Supersecondary Structure Prediction

7.10 Summary

This chapter presents an overview of issues in predicting local structural features of proteins. The inherent hierarchical order of protein structure is discussed in a bottom-up fashion, from secondary structure via supersecondary structure to prediction aspects of local three-dimensional structure, the latter including detection of disordered protein regions and recognition of internal repeats. Some approaches to using these structural features in multiple sequence alignment are also discussed. State-of-the-art prediction methods are described and the addresses of their web interfaces, if available, are provided.

References

Albrecht, M., Tosatto, S.C., Lengauer, T., and Valle, G. 2003. Simple consensus procedures are effective and sufficient in secondary structure prediction. Protein Eng. 16:459–462.

Altschul, S.F., and Koonin, E.V. 1998. Iterated profile searches with PSI-BLAST—A tool for discovery in protein databases. Trends Biochem. Sci. 23:444–447.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

An, J., Totrov, M., and Abagyan, R. 2005. Pocketome via comprehensive identification and classification of ligand binding envelopes. Mol. Cell. Proteomics 4:752–761.

An, Y., and Friesner, R.A. 2002. A novel fold recognition method using composite predicted secondary structures. Proteins 48:352–366.

Andrade, M.A., Ponting, C.P., Gibson, T.J., and Bork, P. 2000. Homology-based method for identification of protein repeats using statistical significance estimates. J. Mol. Biol. 298:521–537.

Argos, P. 1987. Analysis of sequence-similar pentapeptides in unrelated protein tertiary structures. Strategies for protein folding and a guide for site-directed mutagenesis. J. Mol. Biol. 197:331–348.

Bagos, P.G., Liakopoulos, T.D., and Hamodrakas, S.J. 2005. Evaluation of methods for predicting the topology of beta-barrel outer membrane proteins and a consensus prediction method. BMC Bioinformatics 6:7.

Bairoch, A., and Boeckmann, B. 1991. The SWISS-PROT protein sequence data bank. Nucleic Acids Res. 19 (Suppl.):2247–2249.

Baldi, P., Brunak, S., Frasconi, P., Soda, G., and Pollastri, G. 1999. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics 15:937–946.

Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235–242.

Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Oxford, Clarendon Press.

Blanco, F.J., Rivas, G., and Serrano, L. 1994. A short linear peptide that folds into a native stable beta-hairpin in aqueous solution. Nat. Struct. Biol. 1:584–590.

Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E., Martin, M.J., Michoud, K., O’Donovan, C., Phan, I., Pilbout, S., and Schneider, M. 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31:365–370.

Bordner, A.J., and Abagyan, R. 2005. Statistical analysis and prediction of protein–protein interfaces. Proteins Struct. Funct. Bioinf. 60:353–366.

Boswell, D.R., and McLachlan, A.D. 1984. Sequence comparison by exponentially damped alignment. Nucleic Acids Res. 12:457–464.

Bracken, C. 2001. NMR spin relaxation methods for characterization of disorder and folding in proteins. J. Mol. Graph. Model 19:3–12.

Bystroff, C., Thorsson, V., and Baker, D. 2000. HMMSTR: A hidden Markov model for local sequence–structure correlations in proteins. J. Mol. Biol. 301:173–190.

Byvatov, E., and Schneider, G. 2003. Support vector machine applications in bioinformatics. Appl. Bioinf. 2:67–77.

Cai, Y.D., Feng, K.Y., Li, Y.X., and Chou, K.C. 2003. Support vector machine for predicting alpha-turn types. Peptides 24:629–630.

Capriotti, E., Fariselli, P., Rossi, I., and Casadio, R. 2004. A Shannon entropy-based filter detects high-quality profile–profile alignments in searches for remote homologues. Proteins 54:351–360.

Chandonia, J.M., and Karplus, M. 1999. New methods for accurate prediction of protein secondary structure. Proteins 35:293–306.

Cheng, J., Sweredoski, M.J., and Baldi, P. 2005. Accurate prediction of protein disordered regions by mining protein structure data. Data Mining Knowledge Discovery 11:213–222.

Chothia, C. 1984. Principles that determine the structure of proteins. Annu. Rev. Biochem. 53:537–572.

Chothia, C., and Lesk, A.M. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J 5:823–826.

Chou, P.Y., and Fasman, G.D. 1974. Prediction of protein conformation. Biochemistry 13:222–245.

Chung, R., and Yona, G. 2004. Protein family comparison using statistical models and predicted structural information. BMC Bioinformatics 5:183.

Churchill, G.A. 1989. Stochastic models for heterogeneous DNA sequences. Bull. Math. Biol. 51:79–94.

Cozzetto, D., and Tramontano, A. 2005. Relationship between multiple sequence alignments and quality of protein comparative models. Proteins 58:151–157.

Cregut, D., Civera, C., Macias, M.J., Wallon, G., and Serrano, L. 1999. A tale of two secondary structure elements: When a beta-hairpin becomes an alpha-helix. J. Mol. Biol. 292:389–401.

Crippen, G.M. 1978. The tree structural organization of proteins. J. Mol. Biol. 126:315–332.

Cristianini, N., and Shawe-Taylor, J. 2000. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York, Cambridge University Press.

Cuff, J.A., and Barton, G.J. 1999. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34:508–519.

Cuff, J.A., and Barton, G.J. 2000. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins 40:502–511.

Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., and Barton, G.J. 1998. JPred: A consensus secondary structure prediction server. Bioinformatics 14:892–893.

Dayhoff, M.O., Barker, W.C., and Hunt, L.T. 1983. Establishing homologies in protein sequences. Methods Enzymol. 91:524–545.

de la Cruz, X., Hutchinson, E.G., Shepherd, A., and Thornton, J.M. 2002. Toward predicting protein topology: An approach to identifying beta hairpins. Proc. Natl. Acad. Sci. USA. 99:11157–11162.

de la Cruz, X., and Thornton, J.M. 1999. Factors limiting the performance of prediction-based fold recognition methods. Protein Sci. 8:750–759.

Derreumaux, P. 2001. Evidence that the 127–164 region of prion proteins has two equi-energetic conformations with beta or alpha features. Biophys. J. 81:1657–1665.

Dickerson, R.E., Timkovich, R., and Almassy, R.J. 1976. The cytochrome fold and the evolution of bacterial energy metabolism. J. Mol. Biol. 100:473–491.

Dunker, A.K., Brown, C.J., Lawson, J.D., Iakoucheva, L.M., and Obradovic, Z. 2002. Intrinsic disorder and protein function. Biochemistry 41:6573–6582.

Dunker, A.K., Lawson, J.D., Brown, C.J., Williams, R.M., Romero, P., Oh, J.S., Oldfield, C.J., Campen, A.M., Ratliff, C.M., Hipps, K.W., Ausio, J., Nissen, M.S., Reeves, R., Kang, C., Kissinger, C.R., Bailey, R.W., Griswold, M.D., Chiu, W., Garner, E.C., and Obradovic, Z. 2001. Intrinsically disordered protein. J. Mol. Graph. Model 19:26–59.

Dunker, A.K., Obradovic, Z., Romero, P., Garner, E.C., and Brown, C.J. 2000. Intrinsic protein disorder in complete genomes. Genome Inform. Ser. Workshop Genome Inform. 11:161–171.

Durbin, R. 1998. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. New York, Cambridge University Press.

Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. 2000. Markov chains and hidden Markov models. In Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. New York, Cambridge University Press, pp. 46–79.

Dutta, S., and Berman, H.M. 2005. Large macromolecular complexes in the Protein Data Bank: A status report. Structure 13:381–388.

Dyson, H.J., and Wright, P.E. 2002. Insights into the structure and dynamics of unfolded proteins from nuclear magnetic resonance. Adv. Protein Chem. 62:311–340.

Eddy, S.R. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6:361–365.

Edgar, R.C., and Sjolander, K. 2004. COACH: Profile–profile alignment of protein families using hidden Markov models. Bioinformatics 20:1309–1318.

Forcellino, F., and Derreumaux, P. 2001. Computer simulations aimed at structure prediction of supersecondary motifs in proteins. Proteins 45:159–166.

Frenkel, D., and Smit, B. 2002. Monte Carlo simulations. In: Understanding Molecular Simulation: From Algorithms to Applications (D. Frenkel, M. Klein, M. Parrinello, and B. Smit, Eds.). San Diego, Academic Press, pp. 23–58.

Friedberg, I., Kaplan, T., and Margalit, H. 2000. Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci. 9:2278–2284.

Frishman, D., and Argos, P. 1996. Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng. 9:133–142.

Frishman, D., and Argos, P. 1997. Seventy-five percent accuracy in protein secondary structure prediction. Proteins 27:329–335.