UCSF What happened when we applied the t test naively?

We compute 6817 t-statistics (one for each gene)

What is the critical value?

♦P = 0.05

♦N = 27

♦M = 11

♦Degrees of freedom = 27+11-2 = 36

♦Critical value (two-tailed test): 2.03

Of the 6817 genes, 1636 are “significant”

Less than 40% of these are significant on the test set!

What happened?

We made 6817 independent tests of a statistic at a significance level of 0.05

We should expect about 341 genes to show up even if we have no real effect, assuming that our statistical assumptions are OK
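The arithmetic behind these figures can be checked in a few lines. This is only a sketch; the variable names are illustrative and the numbers are the ones quoted on the slide.

```python
# Expected number of false positives if every null hypothesis were true.
n_tests = 6817   # one t test per gene (from the slide)
alpha = 0.05     # per-test significance level (from the slide)

expected_false_positives = n_tests * alpha
print(round(expected_false_positives))  # 341

# Degrees of freedom for the two-sample t test with N = 27 and M = 11.
n, m = 27, 11
degrees_of_freedom = n + m - 2
print(degrees_of_freedom)  # 36
```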

How can we use permutation to do a better job?

Permutation analysis in array data: Conservative approach is to take the max statistic

We are defining our new statistic to be one computed over the vector of all genes coupled to the class information

We define our statistic to be the maximum of a particular statistic, computed for each gene

We will use two statistics

♦Kendall’s Tau, measuring the rank correlation of gene expression levels against the AML/ALL classes represented as 0 and 1

♦The t statistic, functionally implemented on paired data of gene expression levels and classes represented as 0 and 1

♦For each case, we define our new statistic as the maximum, over all genes, of the per-gene statistic
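As a rough illustration of the max statistic, the sketch below computes a pooled-variance two-sample t statistic for each gene and keeps the largest magnitude. The function names and the toy data are hypothetical and do not come from the lecture; the lecture does not say exactly how its t statistic was implemented.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(values, classes):
    """Two-sample t statistic (pooled variance), class 1 versus class 0."""
    a = [v for v, c in zip(values, classes) if c == 1]
    b = [v for v, c in zip(values, classes) if c == 0]
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

def max_statistic(expression, classes):
    """Largest |t| over all genes; expression[g] holds one value per sample."""
    return max(abs(pooled_t(gene, classes)) for gene in expression)

# Hypothetical toy data: 3 genes x 6 samples (not taken from the lecture).
classes = [1, 1, 1, 0, 0, 0]
expression = [
    [1.0, 1.1, 0.9, 1.0, 1.2, 0.8],  # no class difference
    [2.0, 2.1, 1.9, 1.0, 1.1, 0.9],  # strong class difference
    [1.0, 1.3, 0.7, 1.1, 0.9, 1.2],  # weak class difference
]
print(max_statistic(expression, classes))
```

The single number returned summarizes the whole gene-by-sample matrix, which is what allows one permutation test to cover all genes at once.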

Permutation analysis in array data: Conservative approach is to take the max statistic

Sample     G1    G2    G3    G4    G5    G6    G7    G8    G9    Class
1          0.99  0.98  0.98  0.97  0.97  0.95  0.95  0.95  0.96  1
2          1.15  1.11  1.07  1.04  1.01  0.99  0.98  0.96  0.96  1
3          1.11  1.14  1.22  1.3   1.37  1.39  1.39  1.39  1.37  1
4          1     1.01  1.01  0.99  0.96  0.93  0.91  0.89  0.88  1
5          1.04  1.01  0.97  0.94  0.93  0.92  0.9   0.9   0.91  1
6          1.17  1.25  1.32  1.38  1.43  1.46  1.5   1.53  1.55  0
7          1.12  1.16  1.2   1.26  1.34  1.42  1.49  1.54  1.53  0
8          0.96  0.97  0.97  0.97  0.96  0.96  0.97  0.98  0.98  0
9          1.03  1.04  1.05  1.06  1.07  1.09  1.1   1.12  1.17  0
10         1.16  1.19  1.21  1.23  1.25  1.25  1.26  1.27  1.28  0
Statistic  0.16  0.24  0.18  0.27  0.27  0.27  0.38  0.38  0.42

Columns G1 to G9 are genes 1…9. The bottom row is the statistic for each gene.

Maximum magnitude statistic: 0.42 (gene 9)

Permutation 1: Bogus correlation

Sample     G1    G2    G3    G4    G5    G6    G7    G8    G9    Class
1          0.99  0.98  0.98  0.97  0.97  0.95  0.95  0.95  0.96  1
2          1.15  1.11  1.07  1.04  1.01  0.99  0.98  0.96  0.96  1
3          1.11  1.14  1.22  1.3   1.37  1.39  1.39  1.39  1.37  1
4          1     1.01  1.01  0.99  0.96  0.93  0.91  0.89  0.88  1
5          1.04  1.01  0.97  0.94  0.93  0.92  0.9   0.9   0.91  1
6          1.17  1.25  1.32  1.38  1.43  1.46  1.5   1.53  1.55  0
7          1.12  1.16  1.2   1.26  1.34  1.42  1.49  1.54  1.53  0
8          0.96  0.97  0.97  0.97  0.96  0.96  0.97  0.98  0.98  0
9          1.03  1.04  1.05  1.06  1.07  1.09  1.1   1.12  1.17  0
10         1.16  1.19  1.21  1.23  1.25  1.25  1.26  1.27  1.28  0
Statistic  0.15  0.09  0.09  0.04  0.02  0.02  0.02  0.07  0.04

Columns G1 to G9 are genes 1…9. The bottom row is the statistic for each gene.

Maximum magnitude statistic: 0.15 (gene 1)

Repeated permutation yields a cumulative distribution

Unadjusted critical value

♦τ = 0.17

♦Yields 1751 genes as “significant”

♦Less than half confirmed on the test set

Adjusted critical value

♦τ = 0.354

♦51 genes significant

♦90% of these are confirmed on the test set

[Figure: Permutation Based Estimation of Significance. Cumulative proportion (0 to 1) plotted against Max(τ) (0.24 to 0.4).]

From the cumulative distribution, we observe that τ = 0.354 corresponds to p = 0.05.
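The procedure behind such a cumulative distribution might be sketched as follows: shuffle the class labels, recompute the maximum statistic over all genes, repeat, and read the adjusted critical value off the 95th percentile of the resulting maxima. The code uses Kendall's tau-a (no tie correction), which is an assumption because the slides do not say which variant was used. The function names, the permutation count, and the noise-only data are all hypothetical.

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def kendall_tau_a(values, classes):
    """Kendall's tau-a between expression values and 0/1 class labels."""
    n = len(values)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            score += sign(values[i] - values[j]) * sign(classes[i] - classes[j])
    return score / (n * (n - 1) / 2)

def max_abs_tau(expression, classes):
    """Largest |tau| over all genes; expression[g] holds one value per sample."""
    return max(abs(kendall_tau_a(gene, classes)) for gene in expression)

def adjusted_critical_value(expression, classes, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of max |tau| under shuffled class labels."""
    rng = random.Random(seed)
    shuffled = list(classes)  # copy, so the caller's labels are untouched
    null_max = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real gene/class association
        null_max.append(max_abs_tau(expression, shuffled))
    null_max.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return null_max[index]

# Hypothetical noise-only data: 50 genes x 12 samples (not the lecture data).
data_rng = random.Random(1)
classes = [1] * 6 + [0] * 6
expression = [[data_rng.gauss(0, 1) for _ in classes] for _ in range(50)]
critical = adjusted_critical_value(expression, classes)
print(critical)
```

Because whole label vectors are shuffled, every gene sees the same permuted labels in each round, so correlations between genes are preserved in the null distribution.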

We get similar results using the t test

Unadjusted critical value

♦t = 2.03

♦Yields 1636 genes as “significant”

♦Less than half confirmed on the test set

Adjusted critical value

♦t = 5.16

♦40 genes significant

♦80% of these are confirmed on the test set

Is it safe to conclude anything about more than just the gene with the max statistic?

♦Yes.

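♦If we were to generate the null distribution of the mth best gene, its 95th percentile would be no higher than the critical value we derived from the max statistic, so that critical value is conservative for every gene that exceeds it.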

Is this estimate better than Bonferroni?

♦It can be.

♦If there are strong cross-correlations in the data, this procedure is not penalized by the redundancy.

♦The Bonferroni correction is calibrated as if all variables were independent, so it becomes overly conservative when they are strongly correlated.
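For comparison, the per-test threshold that a Bonferroni correction would impose on this data set can be worked out directly. This is an illustrative calculation using the gene count and significance level quoted earlier; the lecture itself does not report this number.

```python
# Bonferroni: divide the family-wise significance level by the number of tests.
n_tests = 6817   # genes tested (from the lecture)
alpha = 0.05     # desired family-wise significance level

bonferroni_alpha = alpha / n_tests
print(f"{bonferroni_alpha:.2e}")  # 7.33e-06
```

Each gene would need a p-value below this threshold regardless of how correlated the genes are, whereas the permutation-based threshold adapts to the correlation structure actually present in the data.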

CGH Analysis: Visualization and Correlation with Outcome

Data (J. Gray, K. Chin)

♦ 60 CGH profiles
•	1225 “observables”
•	52 tumor profiles
•	8 normal profiles
♦ Patient information
•	Age of onset
•	Overall survival
•	Disease free survival
•	Alive or dead
♦ Tumor status
•	Size/Stage
•	Estrogen receptor
•	Progesterone receptor
•	p53

Is there a statistically significant correlation between CGH profile similarity and outcome (e.g. survival)?

Are there relationships among the measured variables?

[Figure: Tumor and Normal CGH Profiles. Log(relative copy number), -0.4 to 0.4, plotted against genomic position (chromosomes 1 to 22 and X).]

We can visualize complex profile data using 3D virtual worlds

[Figure: 3D view of the CGH profiles. Axes: Survival and Genome location. Profiles are labeled Alive and Dead.]

By sliding the opaque XZ plane, we can select peaks above background

Normals shown in white at survival = -1 month

One remaining background peak from normals

One particular locus sticks out

CHR 9

♦The center of this valley is on chromosome 9

♦The normal profiles show a slight depression there as well

♦Is this locus significant?
