
Peptide Binding: Background

Milik M, Sauer D, Brunmark AP, et al. Nature Biotechnology, 16:753-756, 1998.

Predict the amino acid sequences of peptides that bind to the particular MHC class I molecule, Kb.

The peptides of interest are 8-mers which may result from proteolysis of invading viral particles.

Some bind to class I MHC molecules.

These complexes are presented on the infected cell surface, where they are recognized by cytotoxic T lymphocytes, which destroy the infected cell.

Hence, MHC binding is an essential prerequisite for any peptide to induce an immune response, so the task of identifying peptides that bind to MHC molecules is immunologically important.

Peptide Binding: Problem

Studies have shown that binding peptides typically have specific amino acids at specific anchor positions.

Rules for predicting binding based solely on anchor-position preferences (motifs) are inadequate.

Binding is also known to be influenced by

(i) the presence of secondary anchor positions, and

(ii) between-position amino acid interactions.

It is the search for this more complex structure that constitutes the problem of interest.

Complex structure ⇒ artificial neural networks (Milik et al.).

[Figure: amino acid distributions for binders and non-binders at each of positions 1 to 8 (one panel per position); x-axis: amino acid (one-letter code).]
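Position-specific frequency profiles like those in the figure can be tabulated directly from aligned 8-mers. A minimal sketch, assuming peptides are stored as equal-length one-letter strings and split into binders and non-binders; the function name `position_profiles` and the toy sequences are hypothetical and are not the Milik et al. data.

```python
from collections import Counter

def position_profiles(peptides, length=8):
    """Per-position amino acid frequencies.

    Returns a list of `length` dicts, one per position, mapping each
    amino acid (one-letter code) to the proportion of peptides that
    carry it at that position.
    """
    if any(len(p) != length for p in peptides):
        raise ValueError(f"all peptides must have length {length}")
    profiles = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        profiles.append({aa: n / total for aa, n in sorted(counts.items())})
    return profiles

# Made-up toy sequences, for illustration only.
binders = ["SIINFEKL", "AIINFEKM", "SVYNFEKL"]
non_binders = ["SIINAEKD", "GGGGGGGG"]

binder_profiles = position_profiles(binders)
non_binder_profiles = position_profiles(non_binders)
print(binder_profiles[7])      # position 8 among binders
print(non_binder_profiles[7])  # position 8 among non-binders
```

Plotting each position's two dicts side by side as bar charts gives one panel of a figure like the one above.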

Peptide Binding: Data Structure, Issues

Binary outcome: Binding (yes/no).

8 unordered categorical covariates: the amino acids at the respective positions.

Highly polymorphic data: positions 1 to 8 have, respectively, 18, 20, 20, 20, 20, 20, 19, 20 distinct amino acids.

Key concerns: the large number of corresponding indicator variables, and between-position interactions.

To avert related difficulties, Milik et al. use selected biophysical and biochemical properties of the amino acids. Is this adequate? It entails potential information loss.

This structure is representative of a vast class of problems: Genotype ↦ Phenotype.
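To enter a regression model, each categorical position must be expanded into indicator (dummy) variables. A minimal sketch, assuming treatment (reference-level) contrasts with the alphabetically first amino acid as the baseline; that coding is an assumption, but it reproduces the 149 main-effect coefficients quoted on the regression slide. The level sets are also an assumption: the slide gives only the counts, and the absent residues listed below were read off the figure axes. `ABSENT`, `LEVELS` and `encode` are hypothetical names.

```python
ALL_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Assumed level sets: the slide gives only the counts 18, 20, 20, 20, 20, 20, 19, 20.
# The absent residues (M and W at position 1, C at position 7) are read off the
# figure axes and are an assumption.
ABSENT = ["MW", "", "", "", "", "", "C", ""]
LEVELS = [sorted(set(ALL_AA) - set(absent)) for absent in ABSENT]

def encode(peptide, levels=LEVELS):
    """Treatment-contrast indicator encoding of one peptide.

    A position with k levels contributes k - 1 columns; the alphabetically
    first level (here always "A") is the baseline and is coded as all zeros.
    """
    if len(peptide) != len(levels):
        raise ValueError("peptide length does not match number of positions")
    row = []
    for aa, lev in zip(peptide, levels):
        if aa not in lev:
            raise ValueError(f"amino acid {aa!r} not among the observed levels")
        row.extend(1 if aa == level else 0 for level in lev[1:])
    return row

n_columns = sum(len(lev) - 1 for lev in LEVELS)
print(n_columns)  # 149
```

Every peptide becomes a 149-long 0/1 vector, and that width is before any interaction columns are added.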

Peptide Binding: Regression Difficulties

Problems occur irrespective of outcome type.

Regression modelling of binding:

The default starting model includes each position as a main effect. This entails estimating 149 coefficients; just assimilating the output will be difficult.

And this is for a simple model in a small (8-mer) setting.

Adjacent and/or second-nearest-neighbor amino acids affect the ability to bind to MHC, which suggests including third-order interactions.

But problems arise even for second-order interactions: SAS and S-Plus break down for lack of dynamic memory. This is not remedied by expansion or forward selection.
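The growth in coefficient counts can be checked with simple arithmetic. Assuming treatment contrasts, a position with k observed amino acids contributes k - 1 main-effect coefficients, and an interaction among a group of positions contributes the product of their k - 1 values. The helper `n_coefficients` is hypothetical, and reading "adjacent and/or second nearest neighbor" effects as interactions among runs of three consecutive positions is an interpretation, not something the slide specifies.

```python
from itertools import combinations
from math import prod

# Distinct amino acids observed at positions 1..8 (from the slide).
LEVEL_COUNTS = [18, 20, 20, 20, 20, 20, 19, 20]

def n_coefficients(level_counts, order, adjacent_only=False):
    """Count treatment-contrast coefficients for all interactions of a given order.

    order=1 gives main effects. Each interaction among a group of positions
    contributes the product of their (k - 1) values. With adjacent_only=True,
    only runs of `order` consecutive positions are counted.
    """
    dfs = [k - 1 for k in level_counts]
    if adjacent_only:
        groups = [tuple(range(i, i + order)) for i in range(len(dfs) - order + 1)]
    else:
        groups = combinations(range(len(dfs)), order)
    return sum(prod(dfs[i] for i in g) for g in groups)

print(n_coefficients(LEVEL_COUNTS, 1))                      # 149 main effects
print(n_coefficients(LEVEL_COUNTS, 2))                      # 9711 pairwise terms
print(n_coefficients(LEVEL_COUNTS, 2, adjacent_only=True))  # 2451 adjacent pairs
print(n_coefficients(LEVEL_COUNTS, 3, adjacent_only=True))  # 39710 consecutive triples
```

Under these assumptions, all pairwise interactions alone add 9,711 coefficients, which helps explain why standard regression software runs out of memory.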

[Figure: Full Tree // Training data. Classification tree grown on the training peptides (root node 92/223, predicted class 1). Root split on position 8: pos8:A,C,D,E,G,H,K,N,P,Q,R,S,T,V,W (17/101, class 0) versus pos8:F,I,L,M,Y (8/122, class 1). The first branch splits next on pos1 (A,C,D,E,F,G,H,I,K,L,N,P,R,V versus Q,S,T,Y), the second on pos5 (E,P,S,T,V versus A,F,I,L,M,N,Y); deeper splits use positions 5, 2 and 6.]

[Figure: Tree Deviance versus Tree Size // Test data. Deviance on the test data (y-axis, roughly 80 to 120) against tree size (x-axis, 2 to 8 terminal nodes).]
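For a classification tree, the deviance in such a plot is commonly the multinomial deviance -2 Σ n_mk log p̂_mk, summed over terminal nodes m and classes k, where n_mk are the class counts of the data being scored (here the test peptides) and p̂_mk are node proportions estimated on the training data. A minimal sketch under that assumption (the slides do not state the exact formula); the floor `eps`, which keeps a class unseen in training from producing an infinite deviance, is an illustrative safeguard, and all function names are hypothetical.

```python
import math

def node_deviance(counts, probs, eps=1e-12):
    """Deviance of one terminal node: -2 * sum_k n_k * log(p_k).

    counts: class counts of the observations landing in the node
            (e.g. test peptides: non-binders, binders);
    probs:  class proportions estimated for that node from training data.
    eps floors p_k so an unseen class gives a large but finite value.
    """
    return -2.0 * sum(n * math.log(max(p, eps)) for n, p in zip(counts, probs) if n > 0)

def tree_deviance(leaves):
    """Total deviance over terminal nodes, given (counts, probs) pairs."""
    return sum(node_deviance(counts, probs) for counts, probs in leaves)

def best_size(deviance_by_size):
    """Tree size (number of terminal nodes) with the smallest deviance."""
    return min(deviance_by_size, key=deviance_by_size.get)

# Hypothetical leaves, not the data behind the plot.
leaves = [([5, 5], [0.5, 0.5]), ([9, 1], [0.8, 0.2])]
print(tree_deviance(leaves))
```

Computing `tree_deviance` for each pruned subtree and passing the results to `best_size` picks the tree size at the minimum of a curve like the one plotted.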

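[Figure: Predictions: test data. A four-terminal-node tree applied to the test peptides (root 37/87, class 1). Root split: pos8:A,C,D,E,G,H,K,N,P,Q,R,S,T,V,W (7/37, class 0), which splits on pos1:A,C,D,E,F,G,H,I,K,L,N,P,R,V versus pos1:Q,S,T,Y; and pos8:F,I,L,M,Y (7/50, class 1), which splits on pos5:E,P,S,T,V versus pos5:A,F,I,L,M,N,Y. Terminal-node labels: 1/23, 6/14, 0/1, 2/44.]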

Peptide Binding: Tree Attributes

A salient feature of trees with respect to unordered categorical covariates (amino acids) is their flexible (exhaustive) and automated handling of groups of levels: there is no need to compute or examine individual coefficients, and the integrity of each covariate is preserved.

Interactions are readily accommodated.

Interpretation and prediction are easy via the tree schematic.

An oft-cited deficiency of tree methods is that piecewise-constant response surfaces provide poor (inefficient) approximations to smooth response surfaces; this motivated the MARS modifications (HTF, Section 9.4).

Here such concerns are moot: the notion of a smooth response surface requires ordered covariates, since otherwise there is nothing to be smooth with respect to.
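The exhaustive handling of groups of levels can be made concrete. For a position with k observed amino acids there are 2^(k-1) - 1 ways to split the levels into two non-empty groups. For a binary outcome such as binding, a known result (Breiman et al., 1984; see also HTF, Section 9.2.4) is that the optimal split under the Gini index or deviance is found by ordering the levels by their proportion of class 1 and trying only the k - 1 cut points of that ordering. The sketch below uses the Gini index and a hypothetical count table; it is an illustration, not the routine used by the software that produced the trees above, and all names are hypothetical.

```python
from itertools import combinations

def gini(n0, n1):
    """Gini index 2p(1 - p) of a node with n0 class-0 and n1 class-1 cases."""
    n = n0 + n1
    if n == 0:
        return 0.0
    p = n1 / n
    return 2 * p * (1 - p)

def split_score(group, table):
    """Weighted Gini impurity of splitting the levels into `group` and the rest."""
    left = [sum(table[aa][c] for aa in group) for c in (0, 1)]
    right = [sum(table[aa][c] for aa in table if aa not in group) for c in (0, 1)]
    n = sum(left) + sum(right)
    return (sum(left) * gini(*left) + sum(right) * gini(*right)) / n

def exhaustive_split(table):
    """Search all 2^(k-1) - 1 two-group partitions of the k levels."""
    levels = sorted(table)
    first, rest = levels[0], levels[1:]
    best = None
    # Keep the first level on the left so each partition is visited once;
    # r < len(rest) keeps the right-hand group non-empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            group = {first, *extra}
            score = split_score(group, table)
            if best is None or score < best[0]:
                best = (score, group)
    return best

def ordered_split(table):
    """Order levels by their class-1 proportion and try only the k - 1 cut points."""
    levels = sorted(table, key=lambda aa: table[aa][1] / sum(table[aa]))
    best = None
    for i in range(1, len(levels)):
        group = set(levels[:i])
        score = split_score(group, table)
        if best is None or score < best[0]:
            best = (score, group)
    return best

# Hypothetical (non-binder, binder) counts for five amino acids at one position.
table = {"A": (8, 2), "F": (1, 9), "L": (0, 10), "S": (6, 4), "Y": (2, 8)}
print(exhaustive_split(table))
print(ordered_split(table))
print(2 ** (20 - 1) - 1)  # 524287 candidate splits for a 20-level position
```

Both searches return the same best grouping here; for a 20-level position the exhaustive search examines 2^19 - 1 = 524,287 candidate splits, whereas the ordered search examines 19.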