- •Contents
- •Preface to the 2nd Edition
- •Preface to the 1st Edition
- •Introduction
- •Learning Objectives
- •Variables and Data
- •The good, the Bad, and the Ugly – Types of Variable
- •Categorical Variables
- •Metric Variables
- •How can I Tell what Type of Variable I am Dealing with?
- •2 Describing Data with Tables
- •Learning Objectives
- •What is Descriptive Statistics?
- •The Frequency Table
- •3 Describing Data with Charts
- •Learning Objectives
- •Picture it!
- •Charting Nominal and Ordinal Data
- •Charting Discrete Metric Data
- •Charting Continuous Metric Data
- •Charting Cumulative Data
- •4 Describing Data from its Shape
- •Learning Objectives
- •The Shape of Things to Come
- •5 Describing Data with Numeric Summary Values
- •Learning Objectives
- •Numbers R us
- •Summary Measures of Location
- •Summary Measures of Spread
- •Standard Deviation and the Normal Distribution
- •Learning Objectives
- •Hey ho! Hey ho! It’s Off to Work we Go
- •Collecting the Data – Types of Sample
- •Types of Study
- •Confounding
- •Matching
- •Comparing Cohort and Case-Control Designs
- •Getting Stuck in – Experimental Studies
- •7 From Samples to Populations – Making Inferences
- •Learning Objectives
- •Statistical Inference
- •8 Probability, Risk and Odds
- •Learning Objectives
- •Calculating Probability
- •Probability and the Normal Distribution
- •Risk
- •Odds
- •Why you can’t Calculate Risk in a Case-Control Study
- •The Link between Probability and Odds
- •The Risk Ratio
- •The Odds Ratio
- •Number Needed to Treat (NNT)
- •Learning Objectives
- •Estimating a Confidence Interval for the Median of a Single Population
- •10 Estimating the Difference between Two Population Parameters
- •Learning Objectives
- •What’s the Difference?
- •Estimating the Difference between the Means of Two Independent Populations – Using a Method Based on the Two-Sample t Test
- •Estimating the Difference between Two Matched Population Means – Using a Method Based on the Matched-Pairs t Test
- •Estimating the Difference between Two Independent Population Proportions
- •Estimating the Difference between Two Independent Population Medians – The Mann–Whitney Rank-Sums Method
- •Estimating the Difference between Two Matched Population Medians – Wilcoxon Signed-Ranks Method
- •11 Estimating the Ratio of Two Population Parameters
- •Learning Objectives
- •12 Testing Hypotheses about the Difference between Two Population Parameters
- •Learning Objectives
- •The Research Question and the Hypothesis Test
- •A Brief Summary of a Few of the Commonest Tests
- •Some Examples of Hypothesis Tests from Practice
- •Confidence Intervals Versus Hypothesis Testing
- •Nobody’s Perfect – Types of Error
- •The Power of a Test
- •Maximising Power – Calculating Sample Size
- •Rules of Thumb
- •13 Testing Hypotheses About the Ratio of Two Population Parameters
- •Learning Objectives
- •Testing the Risk Ratio
- •Testing the Odds Ratio
- •Learning Objectives
- •15 Measuring the Association between Two Variables
- •Learning Objectives
- •Association
- •The Correlation Coefficient
- •16 Measuring Agreement
- •Learning Objectives
- •To Agree or not Agree: That is the Question
- •Cohen’s Kappa
- •Measuring Agreement with Ordinal Data – Weighted Kappa
- •Measuring the Agreement between Two Metric Continuous Variables
- •17 Straight Line Models: Linear Regression
- •Learning Objectives
- •Health Warning!
- •Relationship and Association
- •The Linear Regression Model
- •Model Building and Variable Selection
- •18 Curvy Models: Logistic Regression
- •Learning Objectives
- •A Second Health Warning!
- •Binary Dependent Variables
- •The Logistic Regression Model
- •19 Measuring Survival
- •Learning Objectives
- •Introduction
- •Calculating Survival Probabilities and the Proportion Surviving: the Kaplan-Meier Table
- •The Kaplan-Meier Chart
- •Determining Median Survival Time
- •Comparing Survival with Two Groups
- •20 Systematic Review and Meta-Analysis
- •Learning Objectives
- •Introduction
- •Systematic Review
- •Publication and other Biases
- •The Funnel Plot
- •Combining the Studies
- •Solutions to Exercises
- •References
- •Index
smoked one to nine cigarettes a day, and 473 had mothers who had smoked 10 or more cigarettes a day. The figure shows the 95 per cent confidence intervals for differences in mean weight according to sex of baby and smoking habits of mothers: at birth, and at three and six months.
The results show, for example, that at birth the difference between the sample mean weight of female babies born to non-smoking mothers and that of babies born to mothers smoking 10 or more cigarettes a day was (3220 − 3052) = 168 g. That is, the infants of smoking mothers are on average lighter by 168 g. Is this difference statistically significant in the population, or due simply to chance? The 95 per cent confidence interval of (−234 to −102) g (negative because it is expressed as smokers minus non-smokers) does not include zero, so you can be 95 per cent confident that the difference is real, i.e. is statistically significant.
Exercise 10.1 Interpret the sample mean and confidence intervals shown in Table 10.2 for all four differences in weights at six months.
Estimating the difference between two matched population means – using a method based on the matched-pairs t test
If the data within each of the two groups whose means you are comparing is widely spread compared to the difference in the spreads between the groups,⁵ this can make it more difficult to detect any difference in their means. When data is matched (see Chapter 7 for an explanation of matching), much of the within-group variation is removed, which, for a given sample size, makes it easier to detect any difference between the groups. As a consequence, you can achieve better precision (narrower confidence intervals) without having to increase sample size. The disadvantage of matching is that it is sometimes difficult to find a sufficiently large number of matches (as you saw in the case-control discussion earlier).
In the independent groups case, the mean of each group is computed separately, and then a confidence interval for the difference in these means is calculated. In the matched groups case, we use a method based on the matched-pairs t test, in which the difference between each pair of values is computed first and then a confidence interval for the mean of these differences is calculated.
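The two steps just described (difference each pair first, then build a confidence interval for the mean of those differences) can be sketched as follows. This is only an illustration under stated assumptions: the ten pairs of bone mineral densities are hypothetical and are not taken from any table in this chapter, the function name is invented for the example, and the critical value 2.262 is the tabulated two-sided 95 per cent t value for 9 degrees of freedom.

```python
import math
import statistics


def paired_mean_difference_ci(first, second, t_critical):
    """Return (mean difference, lower, upper) for matched pairs.

    Differences are taken as second - first, pair by pair, and the
    interval is mean +/- t_critical * standard error of the differences.
    """
    if len(first) != len(second):
        raise ValueError("matched data must come in pairs")
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    # Sample standard deviation (n - 1 divisor) of the pairwise differences
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    margin = t_critical * standard_error
    return mean_diff, mean_diff - margin, mean_diff + margin


# Hypothetical densities (g/cm^2) for 10 individually matched pairs
depressed = [1.00, 0.95, 1.10, 0.90, 1.05, 0.98, 1.02, 0.93, 1.07, 1.00]
normal = [1.08, 1.00, 1.15, 1.01, 1.10, 1.09, 1.07, 1.02, 1.12, 1.10]

T_CRIT_95_DF9 = 2.262  # assumed: t table value for 95% confidence, 9 df

mean_diff, lower, upper = paired_mean_difference_ci(
    depressed, normal, T_CRIT_95_DF9
)
print(f"mean difference = {mean_diff:.3f}, "
      f"95% CI = ({lower:.3f} to {upper:.3f})")
```

Because the interval is built from the spread of the differences rather than the spread within each group, any variation shared by both members of a pair cancels out, which is where the gain in precision comes from.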
An example from practice
Table 10.3 shows the 95 per cent confidence intervals for the difference in bone mineral density in two matched groups of women, one group depressed and one ‘normal’ (Michelson et al. 1995). (Ignore the ‘SD from expected peak’ rows.) Only one of the density confidence intervals, that for the radius, contains zero, indicating that there is no statistically significant difference in population mean bone mineral density at the radius, but that there is at each of the other five sites.
5 Called ‘between-group’ variation.
CH 10 ESTIMATING THE DIFFERENCE BETWEEN TWO POPULATION PARAMETERS |
Table 10.3 Confidence intervals for the differences between the population mean bone mineral densities in two individually matched groups of women, one group depressed, the other ‘normal’, using a method based on the matched-pairs t test. Reproduced from NEJM, 335, 1176–81, by permission of Massachusetts Medical Society
| Bone measured† | Depressed women | Normal women | Mean difference (95% CI) | P value |
|---|---|---|---|---|
| Lumbar spine (anteroposterior) | | | | |
| Density (g/cm²) | 1.00 ± 0.15 | 1.07 ± 0.09 | 0.08 (0.02 to 0.14) | 0.02 |
| SD from expected peak | −0.42 ± 1.28 | 0.26 ± 0.82 | 0.68 (0.13 to 1.33) | |
| Lumbar spine (lateral)‡ | | | | |
| Density (g/cm²) | 0.74 ± 0.09 | 0.79 ± 0.07 | 0.05 (0.00 to 0.09) | 0.03 |
| SD from expected peak | −0.88 ± 1.07 | −0.36 ± 0.80 | 0.50 (0.04 to 1.03) | |
| Femoral neck | | | | |
| Density (g/cm²) | 0.76 ± 0.11 | 0.88 ± 0.11 | 0.11 (0.06 to 0.17) | <0.00 |
| SD from expected peak | −1.30 ± 1.07 | −0.22 ± 0.99 | 1.08 (0.55 to 1.61) | |
| Ward’s triangle | | | | |
| Density (g/cm²) | 0.70 ± 0.14 | 0.81 ± 0.13 | 0.11 (0.06 to 0.17) | <0.00 |
| SD from expected peak | −0.93 ± 1.24 | 0.18 ± 1.22 | 1.11 (0.60 to 1.62) | |
| Trochanter | | | | |
| Density (g/cm²) | 0.66 ± 0.11 | 0.74 ± 0.08 | 0.08 (0.04 to 0.13) | <0.001 |
| SD from expected peak | −0.70 ± 1.22 | 0.26 ± 0.91 | 0.97 (0.46 to 1.47) | |
| Radius | | | | |
| Density (g/cm²) | 0.68 ± 0.04 | 0.70 ± 0.04 | 0.01 (−0.01 to 0.04) | 0.25 |
| SD from expected peak | −0.19 ± 0.67 | 0.03 ± 0.67 | 0.21 (−0.21 to 0.64) | |
*Plus-minus values are means ± SD. CI denotes confidence interval.
†Values for “SD from expected peak” are the numbers of standard deviations from the expected peak density derived from a population-based study of normal white women.3
‡This measurement was made in 23 depressed women and 23 normal women.
Exercise 10.2 In Table 10.3, which population difference in bone mineral density is estimated with the greatest precision?
You can also calculate a confidence interval for the difference in two population percentages provided they derive from two metric variables. For the difference between two population proportions, however, a different approach is needed. This is an extension of the single proportion case discussed in Chapter 9, as you will now see.
Estimating the difference between two independent population proportions
Suppose you want to calculate a 95 per cent confidence interval for the difference between the population proportion of women having maternity unit births who smoked during pregnancy and the proportion having home births who smoked. The sample data on smoking status for the sample of 60 mothers is shown in Table 10.1.
There are 10 mothers who smoked among the 30 giving birth in the maternity unit and six among the 30 giving birth at home. This gives sample proportions of 10/30 = 0.3333 and 6/30 = 0.2000, respectively. You can check whether this difference is statistically significant or likely to be due to chance alone by calculating a 95 per cent confidence interval for the difference in the corresponding population proportions.⁶ Doing this by hand is a bit long-winded, and you would normally use a computer program to do the calculation for you.
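As a rough sketch of what such a program does, the calculation below uses the simple normal-approximation (Wald) interval for a difference between two independent proportions. The text does not say which method its software uses, so the choice of method and the function name are assumptions; the counts (10 of 30 and 6 of 30) are those given above.

```python
import math
from statistics import NormalDist


def proportion_difference_ci(x1, n1, x2, n2, confidence=0.95):
    """Wald confidence interval for p1 - p2 from two independent samples.

    x1, x2 are the numbers with the characteristic; n1, n2 the sample sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Two-sided critical value from the standard Normal distribution
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Standard error of the difference between independent proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Smokers: 10 of 30 maternity-unit births, 6 of 30 home births
lower, upper = proportion_difference_ci(10, 30, 6, 30)
print(f"95% CI: ({lower:.3f} to {upper:.3f})")
```

With these counts the sketch gives an interval of about (−0.088 to 0.355), which agrees with the interval quoted in footnote 6. For small samples or proportions near 0 or 1, other interval methods may behave better than this approximation.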
An example from practice
If you look back at Table 9.1, the randomised trial of integrated versus conventional care for asthma patients, the last column shows the 95 per cent confidence intervals for the difference in population percentages between the two groups, for a number of patient perceptions of the scheme. As you can see, none of the confidence intervals include zero, so you can be 95 per cent confident that the difference in population percentages between the groups of patients is statistically significant in each case.
Estimating the difference between two independent population medians – the Mann–Whitney rank-sums method
As you know from Chapter 5, the mean may not be the most representative measure of location if the data is skewed, and is not appropriate anyway if the data is ordinal. In these circumstances, you can compare the population medians rather than the means, and in place of the two-sample t test (a parametric procedure), use a method based on the Mann–Whitney test (a non-parametric procedure).
Parametric versus non-parametric methods
A parametric procedure can be applied to data which is metric and which also has some particular distribution, most commonly the Normal distribution. A non-parametric procedure does not impose these distributional requirements. So if you are analysing data that is either metric but not Normal, or is ordinal, then you need to use a non-parametric approach. The Mann–Whitney procedure requires only that the two population distributions have approximately the same shape; it does not require either to be Normal. It is the non-parametric equivalent of the two-sample t test.
Briefly, the Mann–Whitney method starts by combining the data from both groups, which are then ranked. The rank values for each group are then separated and summed. If the medians of the two groups are the same, then the sums of the ranks of the two groups should be similar. However, if the rank sums are different, you need to know whether this difference could simply be due to chance, or is because there really is a statistically significant difference in the population medians. A Mann–Whitney confidence interval for the difference will help you decide between these alternatives.
6 The 95 per cent confidence interval is (−0.088 to 0.355). Since this interval includes 0, we conclude that there is no statistically significant difference between the proportions of mothers who smoked in the maternity unit and home birth populations.
Mann-Whitney Test and CI: Apgar matn, Apgar home
Apgar ma   N = 30   Median = 7.000
Apgar ho   N = 30   Median = 8.000
Point estimate for ETA1-ETA2 is -1.000
95.2 Percent CI for ETA1-ETA2 is (-2.000,0.000)
W = 790.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.0668
The test is significant at 0.0616 (adjusted for ties)
Cannot reject at alpha = 0.05
(Annotation on the 95.2 Percent CI row: Confidence interval for the difference in the two medians.)
Figure 10.3 Minitab’s Mann–Whitney output for a 95 per cent confidence interval for the difference between two independent median Apgar scores – for infants born in maternity units and at home (raw data in Table 10.1). Note that Minitab uses Greek ‘ETA’ to denote the population median
As an illustration, let’s estimate the difference in the population median Apgar scores for the maternity unit and home birth infants, using the sample data in Table 10.1. These are independent groups, but since the data is ordinal, we cannot use the two-sample t test; we can, however, use the Mann–Whitney method. The output from Minitab is shown in Figure 10.3, with the 95 per cent confidence interval in the fourth row.⁷ Since the confidence interval of (−2 to 0) contains zero, you must conclude that the difference in the population median Apgar scores is not statistically significant. Notice that the confidence level is given as 95.2 per cent, not 95 per cent. Confidence intervals for medians cannot always achieve the precise confidence level you asked for, because of the way in which a median is calculated.
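To make the ranking step concrete, here is a minimal sketch of how the two rank sums are formed, with tied values sharing the average of the ranks they occupy. The scores are hypothetical (they are not the Table 10.1 data) and the function names are invented for the example. The sketch also computes the median of all pairwise differences between the groups, a commonly used point estimate of the difference in medians; that this matches the package's own point estimate is an assumption here, and the confidence limits themselves are not computed.

```python
import statistics


def midranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sums(group_a, group_b):
    """Combine both groups, rank them together, then sum ranks per group."""
    ranks = midranks(list(group_a) + list(group_b))
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])


def median_pairwise_difference(group_a, group_b):
    """Median of every a - b difference (assumed point estimate)."""
    return statistics.median(a - b for a in group_a for b in group_b)


# Hypothetical ordinal scores for two small independent groups
group_a = [6, 7, 7, 8]
group_b = [7, 8, 9, 9]

sum_a, sum_b = rank_sums(group_a, group_b)
print("rank sums:", sum_a, sum_b)
print("median pairwise difference:",
      median_pairwise_difference(group_a, group_b))
```

With equal group sizes, similar rank sums suggest similar medians; here the second group's larger rank sum reflects its generally higher scores.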
An example from practice
Table 10.4 is from a randomised controlled double-blind trial to compare the cost effectiveness of two treatments in relieving pain after blunt instrument injury in an A&E department (Rainer et al. 2000). It shows the median times spent by two groups of patients in various clinical situations. One group received ketorolac, the other group morphine. The penultimate column contains the 95 per cent confidence intervals for the difference in various median treatment times (minutes), between the groups (ignore the last column). As the footnote to the table indicates, these results were obtained using the Mann–Whitney method.
The only confidence interval not containing zero is that for the difference in median ‘time between receiving analgesia and leaving A&E’, for which the difference in the sample medians is 20.0 minutes. So this is the only treatment time for which the difference in population median
7 As far as I am aware, SPSS does not appear to calculate a confidence interval for two independent medians.
Table 10.4 Mann–Whitney confidence intervals for the difference between two independent groups of patients in their median times spent in several clinical situations. One group received ketorolac, the other morphine. Values are median number (interquartile range) of minutes relating to participants’ treatment. Reproduced from BMJ, 321, 1247–51, courtesy of BMJ Publishing Group
| Variable | Ketorolac group (n = 75) | Morphine group (n = 73) | Median difference (95% confidence interval) | P value* |
|---|---|---|---|---|
| Interval between arrival in emergency department and doctor prescribing analgesia | 38.0 (30.0 to 54.0) | 39.0 (29.0 to 53.0) | 1.0 (−5.0 to 7.0) | 0.72 |
| Preparation for analgesia | 5.0 (5.0 to 10.0) | 10.0 (5.5 to 12.5) | 2.0 (0 to 5.0) | 0.0002 |
| Undergoing radiography | 5.0 (5.0 to 10.0) | 5.0 (4.0 to 10.0) | 0 (−1.0 to 0) | 0.75 |
| Total time spent in emergency department | 155.0 (112.0 to 198.0) | 171.0 (126.0 to 208.5) | 15.0 (−4.0 to 33.0) | 0.11 |
| Interval between receiving analgesia and leaving emergency department | 115.0 (75.0 to 149.0) | 130.0 (95.0 to 170.0) | 20.0 (4.0 to 39.0) | 0.02 |
*Mann–Whitney U test.
Table 10.5 Confidence interval estimates from the Wilcoxon signed-ranks method for the difference in population food intakes per day, for a number of substances, from a study of the dietary habits of schizophrenics. Values are median (range). Reproduced from BMJ, 317, 784–5, courtesy of BMJ Publishing Group
| Intake/day | Men: patients (n = 17) | Men: controls (n = 17) | Women: patients (n = 13) | Women: controls (n = 13) | All: patients (n = 30) | All: controls (n = 30) | Wilcoxon signed ranks test: median difference (95% CI) | P |
|---|---|---|---|---|---|---|---|---|
| Energy (MJ) | 11.84 (7.67–17.93) | 14.19 (6.94–23.22) | 8.87 (5.07–13.02) | 9.99 (5.25–16.25) | 9.71 (5.07–17.94) | 11.98 (5.25–23.22) | 2.06 (0.26–4.23) | 0.04 |
| Protein (g) | 92.5 (65.1–157.4) | 114.2 (74–633) | 68.7 (38.4–104.2) | 82.5 (40.5–142.7) | 84.5 (38.4–157.4) | 96.0 (40.5 to 633.0) | 15.9 (−1.1 to 32.8) | 0.07 |
| Total fibre (g) | 13.0 (8.5–20.8) | 22.0 (8.7–86.2) | 10.7 (7.3–18.0) | 15.5 (10.7–22.9) | 12.6 (7.3–20.8) | 18.9 (8.7–86.2) | 7.0 (3.6 to 10.6) | 0.0001 |
| Retinol (μg) | 647 (294–1498) | 817 (134–12341) | 533 (288–7556) | 817 (201–11585) | 590 (288–7556) | 817 (134–12341) | 310 (93 to 1269) | 0.02 |
| Carotene (μg) | 783 (219–3638) | 2510 (523–11313) | 2048 (550–4657) | 3079 (956–6188) | 1443 (219–4657) | 2798 (523–11313) | 1376 (549 to 2452) | 0.004 |
| Vitamin C (mg) | 41.0 (4.0–204) | 81.0 (14.0–262) | 40.0 (3–165) | 61.0 (27.0–291.0) | 40.5 (3.0–204) | 80.5 (14.0–219) | 33.5 (2.0 to 64.0) | 0.03 |
| Vitamin E (mg) | 4.8 (3.4–18.0) | 10.26 (2.23–32.0) | 4.5 (2.3–6.0) | 5.38 (3.6–14.7) | 4.7 (2.3–18.0) | 7.8 (2.2–32.0) | 2.9 (1.45 to 5.35) | 0.0002 |
| Alcohol (g) | 3.8 (0–19.4) | 11.7 (0–80) | 0 (0–5.6) | 1.8 (0–12) | 0 (0–19.4) | 5.7 (0–80) | 5.4 (1.2 to 9.9) | 0.009 |