
Peptide Binding: Background

Milik M, Sauer D, Brunmark AP et al., Nature Biotechnology, 16:753-6, 1998.

Predict the amino acid sequences of peptides that bind to the particular MHC class I molecule, Kb.

The peptides of interest are 8-mers which may result from proteolysis of invading viral particles.

Some bind to class I MHC molecules.

These complexes are presented on the infected cell surface, where they are recognized by cytotoxic T lymphocytes, which destroy the infected cell.

Hence, MHC binding is an essential prerequisite for any peptide to induce an immune response

⇒ the task of identifying peptides that bind to MHC molecules is immunologically important.

Peptide Binding: Problem

Studies have shown that binding peptides typically have specific amino acids at specific anchor positions.

Rules for predicting binding based solely on anchor position preferences (motifs) are inadequate.

Binding is also known to be influenced by

(i) the presence of secondary anchor positions, and

(ii) between-position amino acid interactions.

It is the search for this more complex structure that constitutes the problem of interest.

Complex structure ⋈ Artificial Neural Networks.

[Figure: amino acid distributions at each position (panels Position 1 to Position 8), shown separately for Non-Binders and Binders. Horizontal axes list the amino acids observed at each position: 18 at Position 1 (no M or W), 19 at Position 7 (no C), and all 20 (A C D E F G H I K L M N P Q R S T V W Y) at the other positions.]

Peptide Binding: Data Structure, Issues

Binary outcome: Binding (yes/no).

8 unordered categorical covariates:

the amino acids at the respective positions.

Highly polymorphic data: 18, 20, 20, 20, 20, 20, 19, 20 distinct amino acids at positions 1 to 8, respectively.

Key concerns: large number of corresponding indicator variables, between-position interactions.

To avert related difficulties, Milik et al. use selected biophysical and biochemical properties of amino acids. Adequacy? ⇒ potential information loss.

This structure is representative of a vast class of problems: Genotype ↦ Phenotype.
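The indicator-variable representation mentioned above can be sketched as follows. This is a minimal illustration, not the encoding used by Milik et al. or by any particular package: the function name `encode`, the choice of the first level as reference category, and the input peptide are assumptions, and the per-position level sets assume that position 1 lacks M and W and position 7 lacks C, consistent with the counts 18 and 19.

```python
ALL = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Levels per position. Assumption: position 1 lacks M and W, position 7 lacks C,
# which matches the counts 18, 20, 20, 20, 20, 20, 19, 20.
LEVELS = (
    [ALL.replace("M", "").replace("W", "")]
    + [ALL] * 5
    + [ALL.replace("C", ""), ALL]
)


def encode(peptide):
    """Return the 0/1 indicator vector for an 8-mer.

    The first level at each position is dropped as the reference category,
    so a position with k levels contributes k - 1 indicators.
    """
    if len(peptide) != 8:
        raise ValueError("expected an 8-mer")
    row = []
    for levels, aa in zip(LEVELS, peptide):
        if aa not in levels:
            raise ValueError(f"{aa!r} is not a level at this position")
        row.extend(1 if aa == lev else 0 for lev in levels[1:])
    return row


x = encode("SIINFEKL")  # illustrative input
print(len(x), sum(x))
```

Each peptide becomes a long, sparse vector with at most one nonzero entry per position, which is the source of the "large number of indicator variables" concern.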

Peptide Binding: Regression Difficulties

Problems occur irrespective of outcome type.

Regression modelling of binding:

Default starting model includes each position. This entails estimating 149 coefficients (the sum over positions of the number of amino acids minus one: 17 + 19 × 5 + 18 + 19);

just assimilating the output will be difficult.

This is for a simple model in a small (8-mer) setting.

Adjacent and/or second-nearest-neighbor amino acids impact the ability to bind to MHC:

this suggests including third-order interactions.

But there are problems even for second-order interactions: SAS and S-Plus break for lack of dynamic memory. This is not remedied by expansion or forward selection.
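To give a sense of scale for the regression difficulties above, the following sketch counts indicator coefficients from the per-position level counts. It assumes reference-cell coding (k - 1 indicators for a k-level factor, intercept not counted) and, for the third-order terms, only windows of three consecutive positions; both choices are assumptions made for illustration.

```python
from itertools import combinations
from math import prod

# Distinct amino acids observed at positions 1..8.
levels = [18, 20, 20, 20, 20, 20, 19, 20]

# Indicator coefficients per position under reference-cell coding.
df = [k - 1 for k in levels]

# Main effects: one block of indicators per position.
main_effects = sum(df)

# Second-order interactions over all pairs of positions.
pairwise = sum(a * b for a, b in combinations(df, 2))

# Third-order interactions restricted to three consecutive positions
# (a position together with its nearest and second-nearest neighbor).
adjacent_triples = sum(prod(df[i:i + 3]) for i in range(len(df) - 2))

print(main_effects)      # 149
print(pairwise)          # 9711
print(adjacent_triples)
```

With only 223 training peptides, even the pairwise count exceeds the sample size many times over.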

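[Figure: Full Tree // Training data. Classification tree; nodes are labelled with count pairs, root 92/223.
Level 1 split: pos8: A,C,D,E,G,H,K,N,P,Q,R,S,T,V,W | pos8: F,I,L,M,Y. Node labels: 17/101, 8/122.
Level 2 splits: pos1: A,C,D,E,F,G,H,I,K,L,N,P,R,V | pos1: Q,S,T,Y; pos5: E,P,S,T,V | pos5: A,F,I,L,M,N,Y. Node labels: 17/41, 3/10, 1/112, 0/60.
Level 3 splits: pos5: A,C,D,G,I,L,N,P,Q,R,S,T,V | pos5: F,H,M,Y; pos2: F,L,M | pos2: A,D,H,T; pos6: D,E,L,V | pos6: A,G,H,I,N,P,Q,R,S,T,Y. Node labels: 4/27, 1/14, 2/5, 0/5, 1/5, 0/107.
Level 4 splits: pos6: D,E,H,L,M,P,Q,R,T,V | pos6: S,Y; pos2: A,N,P | pos2: G,S,T. Node labels: 0/22, 1/5, 0/9.]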

[Figure: Tree Deviance versus Tree Size // Test data. Horizontal axis: size; vertical axis: deviance (ticks from 90 to 120).]

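[Figure: Predictions: test data. Tree with four terminal nodes, root 37/87.
Level 1 split: pos8: A,C,D,E,G,H,K,N,P,Q,R,S,T,V,W | pos8: F,I,L,M,Y. Predicted classes: 0, 1. Node labels: 7/37, 7/50.
Level 2 splits: pos1: A,C,D,E,F,G,H,I,K,L,N,P,R,V | pos1: Q,S,T,Y; pos5: E,P,S,T,V | pos5: A,F,I,L,M,N,Y. Predicted classes: 0, 0, 0, 1. Node labels: 1/23, 6/14, 0/1, 2/44.]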
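The test-data prediction tree above can be read as a short decision rule. The sketch below is one reading of the flattened figure, under these assumptions: the pos8 node containing F,I,L,M,Y is the one split on pos5; the single terminal node with predicted class 1 is pos5: A,F,I,L,M,N,Y; and a peptide whose pos5 residue appears in neither group falls back to the class of its parent node (1). The function name `predict_binding` and the example inputs are hypothetical.

```python
POS8_BINDER_SIDE = set("FILMY")     # pos8 group whose node has predicted class 1
POS5_NONBINDER = set("EPSTV")       # terminal node with predicted class 0
POS5_BINDER = set("AFILMNY")        # terminal node with predicted class 1


def predict_binding(peptide: str) -> int:
    """Return 1 (binder) or 0 (non-binder) for an 8-mer under the assumed tree."""
    if len(peptide) != 8:
        raise ValueError("expected an 8-mer")
    pos5, pos8 = peptide[4], peptide[7]
    if pos8 not in POS8_BINDER_SIDE:
        # Both pos1 terminal nodes on this side have predicted class 0,
        # so the pos1 split does not change the prediction.
        return 0
    if pos5 in POS5_NONBINDER:
        return 0
    if pos5 in POS5_BINDER:
        return 1
    # Assumption: residues unseen at this split take the parent node's class.
    return 1


print(predict_binding("SIINFEKL"))  # pos5 = F, pos8 = L
print(predict_binding("SIINEEKL"))  # pos5 = E, pos8 = L
```

Under this reading the rule depends only on positions 5 and 8, which is why the tree is so much easier to assimilate than a table of regression coefficients.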

Peptide Binding: Tree Attributes

A salient feature of trees with respect to unordered categorical covariates (amino acids) is the flexible (exhaustive) and automated handling of groups of levels: this avoids computing or examining individual coefficients, and covariate integrity is preserved.

Interactions are readily accommodated.

Easy interpretation/prediction via tree schematic.

An oft-cited deficiency of tree methods is that piecewise constant response surfaces provide poor or inefficient approximations to smooth response surfaces; this motivated the MARS modifications (HTF, Section 9.4).

Here such concerns are moot. The notion of a smooth response surface requires ordered covariates; otherwise there is nothing to be smooth with respect to.
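The automated grouping of levels described above can be sketched for a single position. For a two-class outcome, a standard shortcut is to order the levels by their proportion of binders and consider only the splits along that ordering, rather than every partition of the levels into two groups. The code below is a minimal sketch using the Gini index as the impurity measure; the function names and the toy data are hypothetical and are not taken from the peptide data set.

```python
from collections import defaultdict


def gini(n_pos, n):
    """Two-class Gini impurity for a node with n cases, n_pos of them binders."""
    if n == 0:
        return 0.0
    p = n_pos / n
    return 2 * p * (1 - p)


def best_level_split(levels, outcomes):
    """Find the best two-group split of a categorical covariate.

    Levels are sorted by binder proportion; each cut point along that
    ordering is scored by size-weighted Gini impurity.
    Returns (impurity, left_levels, right_levels), or None if there is
    only one level.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [n, n_binders]
    for lev, y in zip(levels, outcomes):
        counts[lev][0] += 1
        counts[lev][1] += y
    ordered = sorted(counts, key=lambda lev: counts[lev][1] / counts[lev][0])
    n_tot = sum(c[0] for c in counts.values())
    pos_tot = sum(c[1] for c in counts.values())

    best = None
    n_left = pos_left = 0
    for i, lev in enumerate(ordered[:-1]):
        n_left += counts[lev][0]
        pos_left += counts[lev][1]
        n_right, pos_right = n_tot - n_left, pos_tot - pos_left
        impurity = (n_left * gini(pos_left, n_left)
                    + n_right * gini(pos_right, n_right)) / n_tot
        if best is None or impurity < best[0]:
            best = (impurity, set(ordered[:i + 1]), set(ordered[i + 1:]))
    return best


# Hypothetical toy data: residue at one position and binding outcome (1 = binder).
residues = list("LLLMMIAAKKDY")
binds = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
impurity, left, right = best_level_split(residues, binds)
print(impurity, sorted(left), sorted(right))
```

The result is a pair of level groups rather than one coefficient per amino acid, which is the sense in which covariate integrity is preserved.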
