Добавил:

fench Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Казанский национальный исследовательский технологический университет

Предмет:

Химия

Файл:

1Foundation of Mathematical Biology / The Elements of Statistical Learning

.pdf

Скачиваний:

Добавлен:

15.08.2013

Размер:

287.66 Кб

Скачать

☆

<<< < Предыдущая 12 / 62 3 4 5 6 > Следующая >>>

Prostate Cancer Example

Subjects: 97 potential radical prostatectomy pts

Outcome: log prostate speciﬁc antigen ( lpsa)

Covariates: log cancer volume (lcavol), log prostate weight (lweight), age, log amount of benign hyperplasia (lbph), seminal vesicule invasion (svi), Gleason score (gleason), log capsular penetration (lcp), percent Gleason scores 4 or 5 (pgg45).

Term	Value StdError tvalue Pr(>\|t\|)
Intercept	0.6694	1.2964	0.5164	0.6069
lcavol	0.5870	0.0879	6.6768	0.0000
lweight	0.4545	0.1700	2.6731	0.0090
age -0.0196		0.0112	-1.7576	0.0823
lbph	0.1071	0.0584	1.8316	0.0704
svi	0.7662	0.2443	3.1360	0.0023
lcp -0.1055		0.0910	-1.1589	0.2496
gleason	0.0451	0.1575	0.2866	0.7751
pgg45	0.0045	0.0044	1.0236	0.3089

Prostate Cancer: Correlation Matrix

	lcv	lwt	age	lbh	svi	lcp	gle	pgg lpsa
lcavol	1.00	0.194	0.2	0.027	0.54	0.675	0.432	0.43	0.7
lweight	0.19	1.000	0.3	0.435	0.11	0.100	-0.001 0.05		0.4
age	0.22	0.308	1.0	0.350	0.12	0.128	0.269	0.28	0.2
lbph	0.03	0.435	0.4	1.000	-0.09 -0.007		0.078	0.08	0.2
svi	0.54	0.109	0.1	-0.086	1.00	0.673	0.320	0.46	0.6
lcp	0.68	0.100	0.1	-0.007	0.67	1.000	0.515	0.63	0.5
gleason	0.43	-0.001	0.3	0.078	0.32	0.515	1.000	0.75	0.4
pgg45	0.43	0.051	0.3	0.078	0.46	0.632	0.752	1.00	0.4
lpsa	0.73	0.354	0.2	0.180	0.57	0.549	0.369	0.42	1.0

Prostate Cancer: Forward Stepwise Selection

	lcavol lweight age lbph svi lcp gleason pgg45
1	T	F	F	F	F	F	F	F
2	T	T	F	F	F	F	F	F
3	T	T	F	F	T	F	F	F
4	T	T	F	T	T	F	F	F
5	T	T	T	T	T	F	F	F
6	T	T	T	T	T	F	F	T
7	T	T	T	T	T	T	F	T
8	T	T	T	T	T	T	T	T

Residual sum of squares:
1	2	3	4	5	6	7	8

58.952.9 47.7 46.4 45.5 44.8 44.2 44.1

F-statistics for inclusion:

111.2 10.5 10.0 2.5 1.9 1.3 1.3 0.1

Prostate Cancer: Forward Stepwise Selection

	•
	55
residual sum of squares	•
residual sum of squares	50
		•
		•
	45		•
	45		•
			•	•
	2	4	6	8
			size

Prostate Cancer: Backward Stepwise Selection

	•
	120
squares	100
residual sum of	80
	60	•
		•
		•
			•	•	•	•
				•			•
							•
	0	2		4		6
				size

Coefﬁcient Shrinkage (Secn 3.4)

Selection procedures interpretable. But – due to in / out nature – variable ) high prediction error. Shrinkage continuous: reduces prediction error.

Ridge Regression: shrinks coefs by penalizing size:

β		arg min		N		p	βjxi j)	p	βj )
β	ridge =	arg min		∑(yi β0 ∑			βjxi j)	+ λ ∑	βj )
ˆ	ridge =						2		2
		β	(	i=1		j=1		j=1
				i=1		j=1		j=1
Center X : xi j			xi j x¯j		ˆ	= y¯) X		N p.
Center X : xi j			xi j x¯j		(β0	= y¯) X		N p.

Minimize	RSS(β; λ) = (y Xβ)T (y Xβ)+ λβT β
Solution	ˆ ridge = (	T	1 T
Solution	β	X X + λI)	X y

Now nonsingular even if XT X not full rank. Interpretation via SVD: pp 60 - 63.

Choice of λ?? Microarray applications??

Coefﬁcient Shrinkage ctd (Secn 3.4)

The Lasso: like ridge but with L1 penalty:

β		arg min		N	p	βjxi j)	p
β	lasso =	arg min		∑(yi β0 ∑		βjxi j)	+ λ ∑ jβjj)
ˆ	lasso =					2
		β	(	i=1	j=1		j=1
				i=1	j=1		j=1

The L1 penalty makes the solution nonlinear in y

) quadratic programming algorithm.

Why use? – small λ will cause some coefs to be exactly zero ) synthesizes selection and shrinkage: interpretation and prediction error beneﬁts.

Choice of λ?? Microarray applications??

Model Assessment and Selection

Generalization performance of a model pertains to its predictive ability on independent test data.

Crucial for model choice and quality evaluation.

These represent distinct goals:

Model Selection: estimate the performance of a series of competing models in order to choose the best.

Model Assessment: having chosen a best model, estimate its prediction error on new data.

Numerous criteria, strategies.

Bias, Variance, Complexity Secn 7.2

Outcome Y (assume continuous); input vector X ; prediction model fˆ(X ).

L(Y; fˆ(X )): loss function for measuring errors between Y and fˆ(X ). Common choices are:

(Y fˆ(X ))2 squared error

L(Y; fˆ(X ) = <8

:jY fˆ(X )j absolute error

Test or generalization error: expected prediction error over independent test sample

Err = E[L(Y; fˆ(X )] where X ;Y drawn randomly from their joint distribution.

Training error: average loss over training sample:

	=	1	N	L(y	; fˆ(x ))
err			∑
		N
				i	i

			i=1

Bias, Variance, Complexity ctd

Typically, training error < test error because same data is being used for ﬁtting and error assessment. Fitting methods usually adapt to training data so err overly optimistic estimate of Err.

Part of discrepancy due to where evaluation points occur. To assess optimism use in-sample error:

1 N

Errin = N ∑ EY new[L(Yinew; fˆ(xi)]

i=1

Interest is in test or in-sample error of fˆ

) Optimal model minimizes these.

Assume Y = f (X ) + ε; E(ε) = 0; Var(ε) = σ2ε.

<<< < Предыдущая 12 / 62 3 4 5 6 > Следующая >>>

Соседние файлы в папке 1Foundation of Mathematical Biology

#
15.08.2013248.78 Кб46Foundation of Mathematical Biology Statistics Lecture 3-4.pdf
#
15.08.20132.11 Mб45Foundation of Mathematical Biology.pdf
#
15.08.2013287.66 Кб48The Elements of Statistical Learning.pdf