- •Contents
- •Preface to the 2nd Edition
- •Preface to the 1st Edition
- •Introduction
- •Learning Objectives
- •Variables and Data
- •The Good, the Bad, and the Ugly – Types of Variable
- •Categorical Variables
- •Metric Variables
- •How can I Tell what Type of Variable I am Dealing with?
- •2 Describing Data with Tables
- •Learning Objectives
- •What is Descriptive Statistics?
- •The Frequency Table
- •3 Describing Data with Charts
- •Learning Objectives
- •Picture it!
- •Charting Nominal and Ordinal Data
- •Charting Discrete Metric Data
- •Charting Continuous Metric Data
- •Charting Cumulative Data
- •4 Describing Data from its Shape
- •Learning Objectives
- •The Shape of Things to Come
- •5 Describing Data with Numeric Summary Values
- •Learning Objectives
- •Numbers R us
- •Summary Measures of Location
- •Summary Measures of Spread
- •Standard Deviation and the Normal Distribution
- •Learning Objectives
- •Hey ho! Hey ho! It’s Off to Work we Go
- •Collecting the Data – Types of Sample
- •Types of Study
- •Confounding
- •Matching
- •Comparing Cohort and Case-Control Designs
- •Getting Stuck in – Experimental Studies
- •7 From Samples to Populations – Making Inferences
- •Learning Objectives
- •Statistical Inference
- •8 Probability, Risk and Odds
- •Learning Objectives
- •Calculating Probability
- •Probability and the Normal Distribution
- •Risk
- •Odds
- •Why you can’t Calculate Risk in a Case-Control Study
- •The Link between Probability and Odds
- •The Risk Ratio
- •The Odds Ratio
- •Number Needed to Treat (NNT)
- •Learning Objectives
- •Estimating a Confidence Interval for the Median of a Single Population
- •10 Estimating the Difference between Two Population Parameters
- •Learning Objectives
- •What’s the Difference?
- •Estimating the Difference between the Means of Two Independent Populations – Using a Method Based on the Two-Sample t Test
- •Estimating the Difference between Two Matched Population Means – Using a Method Based on the Matched-Pairs t Test
- •Estimating the Difference between Two Independent Population Proportions
- •Estimating the Difference between Two Independent Population Medians – The Mann–Whitney Rank-Sums Method
- •Estimating the Difference between Two Matched Population Medians – Wilcoxon Signed-Ranks Method
- •11 Estimating the Ratio of Two Population Parameters
- •Learning Objectives
- •12 Testing Hypotheses about the Difference between Two Population Parameters
- •Learning Objectives
- •The Research Question and the Hypothesis Test
- •A Brief Summary of a Few of the Commonest Tests
- •Some Examples of Hypothesis Tests from Practice
- •Confidence Intervals Versus Hypothesis Testing
- •Nobody’s Perfect – Types of Error
- •The Power of a Test
- •Maximising Power – Calculating Sample Size
- •Rules of Thumb
- •13 Testing Hypotheses About the Ratio of Two Population Parameters
- •Learning Objectives
- •Testing the Risk Ratio
- •Testing the Odds Ratio
- •Learning Objectives
- •15 Measuring the Association between Two Variables
- •Learning Objectives
- •Association
- •The Correlation Coefficient
- •16 Measuring Agreement
- •Learning Objectives
- •To Agree or not Agree: That is the Question
- •Cohen’s Kappa
- •Measuring Agreement with Ordinal Data – Weighted Kappa
- •Measuring the Agreement between Two Metric Continuous Variables
- •17 Straight Line Models: Linear Regression
- •Learning Objectives
- •Health Warning!
- •Relationship and Association
- •The Linear Regression Model
- •Model Building and Variable Selection
- •18 Curvy Models: Logistic Regression
- •Learning Objectives
- •A Second Health Warning!
- •Binary Dependent Variables
- •The Logistic Regression Model
- •19 Measuring Survival
- •Learning Objectives
- •Introduction
- •Calculating Survival Probabilities and the Proportion Surviving: the Kaplan-Meier Table
- •The Kaplan-Meier Chart
- •Determining Median Survival Time
- •Comparing Survival with Two Groups
- •20 Systematic Review and Meta-Analysis
- •Learning Objectives
- •Introduction
- •Systematic Review
- •Publication and other Biases
- •The Funnel Plot
- •Combining the Studies
- •Solutions to Exercises
- •References
- •Index
The power of a test
We can now return to the three questions above. To answer the first: the power of a test is defined as (1 − β); it measures the test's capacity to reject the null hypothesis when it is false, in other words, to detect an effect if one is present. In practice, β is typically set at 0.2 or 0.1, which gives power values of 0.80 (or 80 per cent) and 0.90 (or 90 per cent) respectively. So if there is an effect, the probability that the test will detect it is 0.80 or 0.90.
The power of a test is a measure of its capacity to reject the null hypothesis when it is false. In other words, its capacity to detect an effect if one is present.
Although you would like to minimise both α and β, unfortunately they are, for a given sample size, linked: you cannot make β smaller without making α larger, and vice versa. Thus when you decide on a value for α, you are also inevitably fixing the value of β. To answer the second question: the only way to reduce both simultaneously (and so increase the power of a test) is to increase the sample size.
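The link between sample size and power can be made concrete with a short simulation. The sketch below (plain Python; the function name, effect size and trial count are illustrative choices, not from the text) estimates the power of a simple two-sample z-test with a known standard deviation by counting how often the null hypothesis is rejected across many simulated experiments in which a real effect is present:

```python
import random

def simulated_power(n, effect, sd=1.0, z_crit=1.96, trials=2000, seed=42):
    """Estimate power by simulation: the fraction of simulated experiments
    in which a two-sided z-test (alpha = 0.05) rejects the null hypothesis,
    when the true difference between the group means is `effect`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sd) for _ in range(n)]
        b = [rng.gauss(effect, sd) for _ in range(n)]
        mean_a = sum(a) / n
        mean_b = sum(b) / n
        se = (sd * sd / n + sd * sd / n) ** 0.5  # standard error of the difference
        if abs(mean_b - mean_a) / se > z_crit:
            rejections += 1
    return rejections / trials

# The same effect is detected far more reliably with the larger sample.
print(simulated_power(n=20, effect=0.5))
print(simulated_power(n=80, effect=0.5))
```

Running this shows the estimated power rising sharply as n grows, which is exactly the behaviour the sample size calculations below exploit.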
To answer the third question – is there a more powerful test? Briefly, parametric tests are more powerful than non-parametric tests (see p. 127 for the meaning of these terms). For example, the Mann-Whitney test has 95 per cent of the power of the two-sample t test.7 Similarly, the Wilcoxon matched-pairs test has 95 per cent of the power of the matched-pairs t test. As for the chi-squared test, there is usually no obvious alternative for categorical data, so comparisons of power are less relevant, but it is known to be a powerful test. In general, you should of course use the most powerful test that the type of data, and its distributional shape, will allow.
An example from practice
The following is an extract from the RCT of epidural analgesia in the prevention of stump and phantom pain after amputation, referred to in Table 5.3. The authors of the study outline their thinking on power thus:
The natural history of phantom pain after amputation shows rates of about 70%, and in most patients the pain is not severe. Since epidural treatment is an invasive procedure, we decided that a clinically relevant treatment should reduce the incidence of phantom pain to less than 30% at week 1 and then at 3, 6, and 12 months after amputation. Before the start of the study, we estimated that a sample size of 27 patients per group would be required to detect a between-group difference of 40% in the rate of phantom pain (type I error rate 0.05; type II error rate 0.2; power = 0.8).
7 In view of the restrictions associated with the two-sample t test, the Mann-Whitney test seems an excellent alternative!
152 CH 12 TESTING HYPOTHESES ABOUT THE DIFFERENCE BETWEEN TWO POPULATION PARAMETERS
Exercise 12.5 (a) Explain, with the help of a few clinical examples, why you would normally want to minimise α when testing a hypothesis. (b) α is conventionally set at 0.05 or 0.01. Why, if you want to minimise it, don't you set it at 0.001 or 0.000001, or even 0?
Maximising power – calculating sample size
Generally, the bigger the sample, the more powerful the test.8 The minimum sample size for a given power is determined by both the chosen significance level α and the power required. The sample size calculation can be summarised thus:
- Decide on the minimum size of the effect that would be clinically useful (or otherwise of interest).
- Decide the significance level α, usually 0.05.
- Decide the power required, usually 80 per cent.
- Do the sample size calculation, using appropriate software or the rule of thumb described below.
Minitab has an easy-to-use sample size calculator for the most commonly used tests. Machin et al. (1987) is a comprehensive collection of sample size calculations for a large number of different test situations.
Rules of thumb9
Comparing the means of two independent populations (metric data)
The required sample size n is given by the following expression:
n = (2 × s.d.² × k) / E²
where s.d. is the population standard deviation (assumed equal in both populations). This can be estimated using the sample standard deviations, if these are available from a pilot study, say; otherwise the s.d. will have to be guessed using whatever information is available. E is the minimum change in the mean that would be clinically useful or otherwise interesting. k is a "magic number" which depends on the power and significance levels required, and is obtained from Table 12.3.
8 These sample size calculations also apply if you are calculating confidence intervals. Samples that are too small produce wide confidence intervals, sometimes too wide to enable a real effect to be identified.
9 I am indebted to Andy Vail for this material.
Table 12.3 Table of magic numbers for sample size calculations (columns give the power, (1 − β))

| Significance level, α | 70% | 80% | 90% | 95% |
|---|---|---|---|---|
| 0.05 | 6.2 | 7.8 | 10.5 | 13.0 |
| 0.01 | 9.6 | 11.7 | 14.9 | 17.8 |
For example, suppose you propose to use a case-control study to examine the efficacy of a program of regular exercise, as an alternative to your current drug of choice, in treating moderately hypertensive patients. The minimal difference in mean systolic blood pressures between the cases (given the exercise program), and the controls (given the existing drug), that you think clinically worthwhile is 10 mmHg. You will have to make an intelligent guess as to the standard deviation of systolic blood pressure (assumed the same in both groups – see above). Information on this, and many other measures, is likely to be available from reference sources, from the research literature, from colleagues, etc. Let’s assume systolic blood pressure s.d. = 12 mmHg. If power required is 80 per cent, with a significance level of 0.05, then from Table 12.3, k = 7.8, and the sample size required per group is:
n = (2 × 12² × 7.8) / 10² = 22.5
So you will need at least 23 subjects in each of the two groups (always round up to next highest integer) to detect a difference between the means of 10 mmHg. Note that these sample sizes will also be large enough for two matched populations since these require smaller sample sizes for the same power.
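This rule of thumb translates directly into a few lines of code. Below is a minimal sketch in Python; the function name and the lookup dictionary are my own illustrative choices, but the k values are those of Table 12.3:

```python
import math

# Magic numbers k from Table 12.3, keyed by (significance level, power).
K_TABLE = {
    (0.05, 0.70): 6.2, (0.05, 0.80): 7.8, (0.05, 0.90): 10.5, (0.05, 0.95): 13.0,
    (0.01, 0.70): 9.6, (0.01, 0.80): 11.7, (0.01, 0.90): 14.9, (0.01, 0.95): 17.8,
}

def sample_size_two_means(sd, E, alpha=0.05, power=0.80):
    """Per-group sample size for comparing two independent means:
    n = (2 * sd^2 * k) / E^2, rounded up to the next whole subject."""
    k = K_TABLE[(alpha, power)]
    return math.ceil(2 * sd ** 2 * k / E ** 2)

# The blood pressure example: s.d. = 12 mmHg, minimum difference E = 10 mmHg.
print(sample_size_two_means(sd=12, E=10))  # 23 subjects per group
```

Rounding is always upwards, since fewer than the calculated number of subjects per group would leave the test underpowered.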
Comparing the proportions in two independent populations (binary data)
The required sample size, n, is given by:
n = {[Pa × (1 − Pa)] + [Pb × (1 − Pb)]} × k / (Pa − Pb)²
where Pa is the proportion with treatment a, Pb is the proportion with treatment b, so (Pa − Pb) is the effect size; and k is the magic number from Table 12.3.
For example, suppose the percentage of elderly patients in a large district hospital with pressure sores is currently around 40 per cent, or 0.40. You want to test a new pressure-sore- reducing mattress, and you would like the percentage with pressure sores to decrease to at least 20 per cent, or 0.20. So Pa = 0.40, and (1 − Pa ) = 0.60; Pb = 0.20, and (1 − Pb ) = 0.80; therefore (Pa − Pb ) = (0.40 − 0.20) = 0.20. If power required is 80 per cent and significance
level α = 0.05, then required sample size per group is:
n = {(0.40 × 0.60) + (0.20 × 0.80)} × 7.8 / 0.20² = 78.0
Thus you would need at least 78 subjects in each group, which would also be big enough for matched proportions.
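The proportions formula can be sketched in the same way. Again the function name is my own, and k defaults to 7.8, the Table 12.3 value for 80 per cent power at α = 0.05:

```python
import math

def sample_size_two_proportions(pa, pb, k=7.8):
    """Per-group sample size for comparing two independent proportions:
    n = {pa(1 - pa) + pb(1 - pb)} * k / (pa - pb)^2, rounded up."""
    n = (pa * (1 - pa) + pb * (1 - pb)) * k / (pa - pb) ** 2
    # The tiny tolerance guards against floating-point error pushing the
    # result just above a whole number when the formula lands exactly on one.
    return math.ceil(n - 1e-9)

# The pressure-sore example: 40% with sores currently, 20% hoped for.
print(sample_size_two_proportions(0.40, 0.20))  # 78 subjects per group
```

Swapping in a different k from Table 12.3 answers the same question at other power and significance levels.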
Exercise 12.6 For the examples above, (a) hypertension and (b) pressure sores, what sample sizes would be required if the power and significance level were, respectively: (i) 90 per cent and 0.05; (ii) 90 per cent and 0.01; (iii) 80 per cent and 0.01?
Exercise 12.7 Suppose you are proposing to use a randomised controlled trial to study the effectiveness of St John's Wort as an alternative to an existing drug for the treatment of mild to moderate depression. The percentage of patients reporting an improvement in mood three months after existing drug treatment is 70 per cent. You would be satisfied if the percentage reporting mood improvement after three months of St John's Wort was 80 per cent. How big a sample would you require to detect this improvement if you wanted your test to have (a) 80 per cent power and an α of 0.05, or (b) 90 per cent power and an α of 0.01?