Study Design and Statistical Analysis: A Practical Guide for Clinicians (Katz, 2006)


 

Analyzing repeated observations of the same subject

 

 

Table 5.23. Presence of gastritis among patients with gastroesophageal reflux and H. pylori infection

                         Follow-up
Baseline      Normal      Gastritis    Total
Normal        7           17*          24 (41%)
Gastritis     4*          31           35 (59%)
Total         11 (19%)    48 (81%)     59 (100%)

* Cells that indicate a change in status between baseline and follow-up (bolded in the original table).

P = 0.007 by McNemar’s test.

Data from Kuipers, E.J., et al. Atrophic gastritis and Helicobacter pylori in patients with reflux esophagitis treated with omeprazole or fundoplication. New Engl. J. Med. 1996; 334: 1018–22.

At baseline 41% of patients had normal mucosa and 59% had gastritis (Table 5.23, last column) while at follow-up only 19% had normal mucosa and 81% had gastritis (bottom row).

For calculating McNemar’s test the only two cells that matter are the two cells that indicate change (i.e., going from normal at baseline to having gastritis at follow-up or going from gastritis at baseline to having a normal examination at follow-up). I have bolded these two cells in Table 5.23. Adding these two cells together, we find that 21 patients had a change in status. If there were no tendency toward gastritis in patients with gastroesophageal reflux we would expect that about half of these 21 patients (10.5 patients) would go in each direction (half from normal to gastritis and half from gastritis to normal).

However, our distribution (4 and 17) is clearly different from 10.5 in each direction. Is it possible that the difference is due to chance? We use McNemar’s test to determine the probability of obtaining these results if the null hypothesis (no difference) were true. In the case of this example, the probability is equal to 0.007. We can therefore reject the null hypothesis and consider alternative hypotheses, such as that long-term acid suppression leads to the development of gastritis.
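The calculation described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the exact (binomial) form of McNemar’s test; the text does not say which form was used, but this one reproduces the published P-value. The function name `mcnemar_exact` is hypothetical.

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar P-value from the two discordant cell counts."""
    n = b + c
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely to
    # change in either direction, so the smaller count is Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Discordant cells from Table 5.23
normal_to_gastritis = 17
gastritis_to_normal = 4

p_value = mcnemar_exact(normal_to_gastritis, gastritis_to_normal)
print(f"patients who changed: {normal_to_gastritis + gastritis_to_normal}")
print(f"P = {p_value:.3f}")  # prints P = 0.007
```

Only the 21 discordant patients enter the calculation; the 38 patients whose status did not change have no influence on the result.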

To appreciate the importance of using an analytic tool that incorporates the pairing of observations, let us assume that you analysed the data shown in Table 5.23 as if the observations were unpaired. Feeling smart from having read Section 5.2 you bypass chi-squared because you recognize that the expected number of subjects in one of the cells (normal at baseline and normal at follow-up) is < 5 (0.41 × 11 = 4.5). The P-value associated with a two-tailed Fisher’s exact test is 0.10. Therefore, you would wrongly assume that the null hypothesis is correct: that there is no significant difference between the baseline and the follow-up examination.
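To see where the misleading unpaired result comes from, the sketch below applies a two-tailed Fisher’s exact test to the four cells of Table 5.23 as if they were an ordinary 2 × 2 table. It assumes the common definition of the two-tailed P-value (the sum of the probabilities of all tables no more likely than the observed one); the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Unnormalized hypergeometric weight of each possible top-left count,
    # holding the row and column totals fixed.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    observed = weights[a]
    # Add up every table that is no more probable than the observed one.
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / sum(weights.values())

# Table 5.23 treated (incorrectly) as if the observations were unpaired
p_unpaired = fisher_exact_two_sided(7, 17, 4, 31)

# Expected count for the cell "normal at baseline and normal at follow-up"
expected_cell = 24 * 11 / 59

print(f"expected count = {expected_cell:.1f}")  # prints 4.5
print(f"Fisher P = {p_unpaired:.2f}")           # prints 0.10
```

The same 59 patients give P = 0.10 when the pairing is ignored and P = 0.007 when it is respected.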


 

Bivariate statistics

 

 

 

 

Table 5.24. Changes in behaviors associated with HIV transmission*

Behavior                          Baseline (%)   6 months (%)   12 months (%)   18 months (%)   P-value
HIV-positive sex partner          20             18             18              10              0.001
Unprotected receptive anal sex    32             27             28              29              0.02
Condom failure                    19             13             10              12              0.001
Urethritis                        9              3              2               2               0.001

* Behaviors are coded as yes or no.

Data from Buchbinder, S.P., et al. Feasibility of human immunodeficiency virus vaccine trials in homosexual men in the United States: risk behavior, seroincidence, and willingness to participate. J. Infect. Dis. 1996; 174: 954–61.

5.10.B Three or more repeated measurements of a dichotomous variable

Use Cochran’s Q to assess multiple observations of a dichotomous variable on the same subjects.

When you have multiple observations of a dichotomous variable on the same subjects use Cochran’s Q.

Cochran’s Q follows a chi-squared distribution. When the value is large you can reject the null hypothesis that there are no differences among the repeated observations of the subjects.97

For example, Buchbinder and colleagues analysed changes in behaviors associated with HIV transmission among 1256 HIV-negative men who have sex with men.98 Subjects were assessed at baseline and at 6, 12, and 18 months. The investigators compared the percent of subjects engaging in each behavior at the four different times by calculating Cochran’s Q-test for each of the four behaviors. In other words, they calculated four Cochran’s Q-tests, each of which has a P-value, as shown in Table 5.24. They found that there were significant differences in the percentages of subjects engaging in the four behaviors over time.
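A minimal sketch of Cochran’s Q for one behavior measured at four time points follows. The eight subjects and their yes/no responses are invented for illustration (they are not the Buchbinder data), and the helper `chi2_sf` is a hypothetical stand-in for a statistics package’s chi-squared upper-tail function; it uses the closed forms that hold for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return total

def cochran_q(rows):
    """rows: one list of 0/1 responses per subject, one entry per occasion."""
    k = len(rows[0])
    col_totals = [sum(r[j] for r in rows) for j in range(k)]
    row_totals = [sum(r) for r in rows]
    n = sum(row_totals)
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    q_stat = numerator / denominator
    return q_stat, k - 1, chi2_sf(q_stat, k - 1)

# Hypothetical data: 1 = behavior reported, 0 = not reported,
# at baseline, 6, 12, and 18 months
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

q, df, p = cochran_q(responses)
print(f"Q = {q:.2f}, df = {df}, P = {p:.3f}")
```

Subjects who give the same answer at every time point (all yes or all no) add nothing to Q, just as concordant pairs add nothing to McNemar’s test.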

5.10.C Paired measurements of a normally distributed interval variable

Use a paired t-test to compare paired observations of a normally distributed interval variable.

When you have a paired observation of a normally distributed interval variable use the paired t-test.

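The paired t-test for repeated observations is calculated as the mean change between the paired observations divided by the standard error of that mean change (the standard deviation of the changes divided by the square root of the number of pairs, n).

$$\text{paired } t = \frac{\text{mean change in the pair}}{\text{standard deviation of the change} / \sqrt{n}}$$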
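As a hedged illustration, the paired t statistic is conventionally the mean of the within-pair differences divided by the standard error of that mean (the standard deviation of the differences divided by the square root of the number of pairs). The sketch below computes it with the Python standard library. The six pairs of measurements are hypothetical and are not the raw data from any study cited here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the changes / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical diameters (mm) for six subjects before and after an exposure
before = [1.30, 1.15, 1.22, 1.10, 1.25, 1.24]
after = [1.15, 1.05, 1.12, 1.02, 1.10, 1.10]

t, df = paired_t(before, after)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

The statistic is then referred to a t distribution with n − 1 degrees of freedom to obtain the P-value, which is what a statistics package reports.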

97For more on Cochran’s Q, see: Fleiss, J.L., Levin, B., Paik, M.C. Statistical Methods for Rates and Proportions (3rd edition). Hoboken, New Jersey: Wiley & Sons, 2003, pp. 126–33.

98Buchbinder, S.P., Douglas, J.M., McKirnan, D.J., et al. Feasibility of human immunodeficiency virus vaccine trials in homosexual men in the United States: risk behavior, seroincidence, and willingness to participate. J. Infect. Dis. 1996; 174: 954–61.

Table 5.25. Effect of cocaine use, cigarettes, and cocaine plus cigarettes on coronary artery stenosis

Exposure                           Pre-exposure stenosis   Post-exposure stenosis   P-value*
                                   diameter (mm)           diameter (mm)
Cocaine use (n = 6)                1.21                    1.09                     0.01
Cigarette (n = 12)                 1.13                    1.07                     0.32
Cocaine plus cigarette (n = 12)    1.20                    0.96                     0.001

* P-value based on a paired t-test.

Data from Moliterno, D.J., Willard, J.E., Lange, R.A., et al. Coronary-artery vasoconstriction induced by cocaine, cigarette smoking, or both. New Engl. J. Med. 1994; 330: 454–9.

As with the unpaired t-test, large t-values are associated with small P-values. When the P-value is small the likelihood that the observed difference is due to chance is low.

For example, Moliterno and colleagues assessed the effect of cocaine and cigarette smoking on coronary artery vasoconstriction.99 They compared the degree of coronary artery stenosis (measured using coronary angiography) before and after three different exposures received by three different groups of subjects. One group (n = 6) was exposed to intranasal cocaine. A second group (n = 12) was exposed to cigarette smoke; a third group (n = 12) was exposed to both intranasal cocaine and cigarette smoke. For each group, the investigators used a paired t-test to compare the diameter of the stenosis at baseline to the diameter following exposure.

In all three groups there was a narrowing of the stenosis (vasoconstriction); the narrowing was statistically significant (as indicated by the P-value of the paired t-test) with cocaine use and with cocaine use plus smoking (Table 5.25).

5.10.D Multiple (≥ 3) repeated observations of a normally distributed interval variable

Use repeated-measures ANOVA to assess repeated observations of a normally distributed interval variable.

When you have repeated observations of a normally distributed interval variable use repeated-measures ANOVA.

The simplest type of repeated-measures ANOVA is the comparison of the response of a single group of subjects on three or more occasions. It is analogous to a paired t-test (Section 5.10.C) except you have more than two measurements.

99Moliterno, D.J., Willard, J.E., Lange, R.A., et al. Coronary artery vasoconstriction induced by cocaine, cigarette smoking, or both. New Engl. J. Med. 1994; 330: 454–9.

Figure 5.10 Comparison of weight loss among participants receiving internet education to those who received internet behavioral therapy. [Line graph: mean body weight (kg) on the vertical axis, from 68 to 82, plotted at baseline, 3 months, and 6 months for the internet education group and the internet behavior therapy group.] Reprinted with permission from: Tate, D.F., et al. Using internet technology to deliver a behavioral weight loss program. J. Am. Med. Assoc. 2001; 285: 1172–7. Copyright 2001 American Medical Association. All rights reserved.


As with standard ANOVA (Section 5.6.A), repeated-measures ANOVA produces an F-value. If the value of F is large, and the P-value is small, you can reject the null hypothesis (that the means of the different observations are the same).
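For the simplest case, a single group measured on three or more occasions, the F-value can be sketched as follows. This is an illustrative calculation under the usual one-way repeated-measures layout, in which variation between subjects is removed before the occasion effect is compared with the residual variation. The body weights are hypothetical and the function name is an assumption.

```python
def repeated_measures_f(data):
    """data: one row per subject, one column per occasion.
    Returns F and its numerator and denominator degrees of freedom."""
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    occasion_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in occasion_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # What is left after removing occasion and subject effects
    ss_error = ss_total - ss_occasions - ss_subjects

    df_occasions = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_occasions / df_occasions) / (ss_error / df_error)
    return f_value, df_occasions, df_error

# Hypothetical body weights (kg) at baseline, 3 months, and 6 months
weights = [
    [80, 78, 76],
    [90, 87, 86],
    [70, 69, 66],
    [84, 82, 80],
]

f, df1, df2 = repeated_measures_f(weights)
print(f"F = {f:.1f} on {df1} and {df2} degrees of freedom")
```

Because each subject serves as his or her own control, the large differences in weight between subjects do not dilute the comparison across occasions. That is why a steady fall of about 2 kg per visit yields a very large F in this small example.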

 

Repeated-measures ANOVA can also be used to compare repeated observations of two or more groups of subjects. In this case, we are testing the null hypothesis that the repeated means are not different between the groups. If the value of F is large, and the P-value is small, we can reject the null hypothesis.

For example, Tate and colleagues compared two behavioral weight loss programs.100 Participants were randomized to receive either education or behavioral therapy (both over the internet!). Weight was assessed at baseline, 3 months, and 6 months. The null hypothesis was that there was no difference in the measured weight of the two groups over time.

As you can see in Figure 5.10, participants who received the behavioral therapy had greater weight loss over time than those who received the education. Repeated-measures ANOVA indicated that the difference in weight loss between the two groups over time was significant at P = 0.005.

 

Unfortunately, repeated-measures ANOVA has several limitations. Specifically, you must have the same number of observations of each subject and the observations must be made at the same time. Since these conditions are not usually met with clinical studies, investigators more commonly use generalized estimating equations or mixed-effects models in these situations. These techniques are beyond the scope of this book.101

100Tate, D.F., Wing, R.R., Winett, R.A. Using internet technology to deliver a behavioral weight loss program. J. Am. Med. Assoc. 2001; 285: 1172–7.

5.10.E Paired observations of a non-normally distributed interval variable or an ordinal variable

Use the Wilcoxon signed rank test to compare paired observations of a non-normally distributed interval variable or of an ordinal variable.

When you have paired observations of a non-normally distributed interval variable or of an ordinal variable use the Wilcoxon signed rank test (also known as the Wilcoxon matched pairs test).

The test is based on ranking the differences between the paired observations. Specifically, for each pair of observations, you can calculate a difference. In some cases that difference will be positive (if the first observation is higher than the second observation). In other cases that difference will be negative (if the first observation is lower than the second observation). The differences are ranked by their absolute size, and each rank is then given the sign of its difference. If similar numbers of pairs have positive rankings as negative rankings, then when you add up the ranks you will get zero (or a value close to zero). If the rankings are overwhelmingly positive or negative, then when you add up the rankings, you will get a large absolute number. (An absolute number is the number without consideration of a positive or negative sign.)

If the absolute value is large for a given sample size, then the P-value will be small and you can reject the null hypothesis that there is no difference between the two sets of observations.

For example, Davi and colleagues compared the urinary 11-dehydro-thromboxane B2 excretion levels among 11 obese women before and after weight loss.102 (11-dehydro-thromboxane B2 is a marker of platelet activation; platelet activation may be one of the intervening factors between obesity and increased risk of cardiovascular disease.) You can see from Figure 5.11 that the level of 11-dehydro-thromboxane B2 declined in the vast majority of women.

Although the urinary 11-dehydro-thromboxane B2 excretion levels are measured on an interval scale, the distribution was skewed. With a skewed distribution and a small sample size, a paired t-test would not be valid. Instead, the authors used a Wilcoxon signed rank test and found that there was a statistically significant (P < 0.05) decrease in urinary 11-dehydro-thromboxane B2 excretion levels after weight loss.
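The ranking procedure can be sketched as below. The eight before-and-after values are hypothetical (they are not the Davi data), and the exact P-value is obtained by enumerating every possible assignment of signs to the ranks, which is an assumption that is practical only for small samples; statistical packages use tables or a normal approximation for larger ones.

```python
from itertools import product

def wilcoxon_signed_rank(first, second):
    """Return (sum of positive ranks, sum of negative ranks, exact two-sided P)."""
    # Pairs with no difference carry no information and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Tied absolute differences share the average of their ranks.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    w_pos = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_neg = sum(r for r, d in zip(ranks, diffs) if d < 0)
    smaller = min(w_pos, w_neg)
    # Under the null hypothesis every rank is equally likely to be + or -.
    as_extreme = sum(
        1
        for signs in product((0, 1), repeat=len(ranks))
        if sum(r for r, keep in zip(ranks, signs) if keep) <= smaller
    )
    return w_pos, w_neg, min(1.0, 2 * as_extreme / 2 ** len(ranks))

# Hypothetical excretion levels (pg/mg creatinine) before and after an intervention
before = [1200, 950, 1500, 800, 1100, 1700, 900, 1300]
after = [700, 600, 900, 850, 650, 1000, 500, 760]

w_plus, w_minus, p_value = wilcoxon_signed_rank(before, after)
print(f"positive ranks = {w_plus}, negative ranks = {w_minus}")
print(f"signed rank sum = {w_plus - w_minus}, P = {p_value:.3f}")
```

Seven of the eight hypothetical subjects declined, and the one increase was also the smallest change, so nearly all of the rank total falls on one side.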

101Katz, M.H. Multivariable Analysis: A Practical Guide for Clinicians (2nd edition). Cambridge University Press, 2005, Chapter 12.

102Davi, G., Guagnano, M.T., Ciabattoni, G., et al. Platelet activation in obese women: role of inflammation and oxidant stress. J. Am. Med. Assoc. 2002; 288: 2008–14.

Figure 5.11 Urinary 11-dehydro-thromboxane B2 excretion levels among 11 obese women before and after weight loss. The dotted lines indicate the range of 11-dehydro-thromboxane B2 excretion levels among non-obese women. [Paired plot: urinary 11-dehydro-thromboxane B2 (pg/mg creatinine) on the vertical axis, from 0 to 2100, before and after weight loss; P-value annotation on the plot: 0.005.] Reprinted with permission from Davi, G., et al. Platelet activation in obese women: role of inflammation and oxidant stress. J. Am. Med. Assoc. 2002; 288: 2008–14. Copyright 2002 American Medical Association. All rights reserved.


Herrstedt and colleagues used the Wilcoxon signed rank test to compare paired observations of an ordinal variable.103 The ordinal variable was severity of nausea and the study was a randomized, double-blinded crossover trial of two different antiemetic regimens: ondansetron or ondansetron plus metopimazine. All patients were receiving chemotherapy. Each patient received one regimen on a round of chemotherapy and the other regimen on the next round of chemotherapy.

Patients who received ondansetron plus metopimazine were more likely to have no or mild nausea and less likely to have moderate or severe nausea than patients who received only ondansetron (Table 5.26). The Wilcoxon signed rank test was significant (P = 0.006).

5.10.F Repeated (≥ 3) observations of a non-normally distributed interval variable or an ordinal variable

Use the Friedman statistic to assess repeated observations of a non-normally distributed interval variable or an ordinal variable.

When you have repeated observations of a non-normally distributed interval variable or an ordinal variable use the Friedman statistic.

The Friedman statistic, like the Wilcoxon signed rank test, is based on rankings. The test ranks the multiple observations of each subject, sums the rankings under each condition (or time point) and compares the observed rank sums for each condition to what would be expected by chance.

103Herrstedt, J., Sigsgaard, T., Boesgaard, M., Jensen, T.P., Dombernowsky, P. Ondansetron plus metopimazine compared with ondansetron alone in patients receiving moderately emetogenic chemotherapy. New Engl. J. Med. 1993; 328: 1076–80.

Table 5.26. Response of 30 chemotherapy patients to two different regimens

                        Antiemetic regimen
Severity of nausea      Ondansetron    Ondansetron plus metopimazine
None                    6 (20)         12 (40)
Mild                    9 (30)         10 (33)
Moderate                10 (33)        8 (27)
Severe                  5 (17)         0 (0)

Values are represented as n (%).

P-value of Wilcoxon signed rank test = 0.006.

Data from Herrstedt, J., et al. Ondansetron plus metopimazine compared with ondansetron alone in patients receiving moderately emetogenic chemotherapy. New Engl. J. Med. 1993; 328: 1076–80.

What you would expect by chance is that the observed rank sums for the different conditions would be about the same. However, if the rankings come out substantially higher (or lower) for one of the conditions, then the Friedman statistic will yield a small P-value and you can reject the null hypothesis that there are no differences in the observations at the different conditions/time points.
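The rank-sum comparison can be sketched as follows. The scores for six subjects under three conditions are hypothetical, the function name is an assumption, and the statistic shown is the basic form without a correction for tied ranks. With three conditions the statistic has 2 degrees of freedom, for which the chi-squared upper-tail probability is simply exp(−x/2).

```python
import math

def friedman_statistic(rows):
    """rows: one list of scores per subject, one entry per condition.
    Returns the rank sum for each condition and the Friedman statistic."""
    n, k = len(rows), len(rows[0])
    rank_totals = [0.0] * k
    for row in rows:
        for j, x in enumerate(row):
            lower = sum(1 for v in row if v < x)
            equal = sum(1 for v in row if v == x)
            # Rank within this subject; tied scores share the average rank.
            rank_totals[j] += lower + (equal + 1) / 2
    statistic = (
        12 / (n * k * (k + 1)) * sum(r * r for r in rank_totals)
        - 3 * n * (k + 1)
    )
    return rank_totals, statistic

# Hypothetical intensity-of-care scores chosen by six respondents for
# an unfamiliar patient, a family member, and themselves
scores = [
    [4, 3, 1],
    [4, 2, 1],
    [3, 2, 1],
    [4, 3, 2],
    [2, 3, 1],
    [4, 2, 1],
]

rank_sums, stat = friedman_statistic(scores)
p_value = math.exp(-stat / 2)  # chi-squared upper tail, valid for df = 2 only

print(f"rank sums = {rank_sums}")
print(f"Friedman statistic = {stat:.2f}, P = {p_value:.4f}")
```

If the condition made no difference, each rank sum would be close to 12 (the total of 36 shared equally among three conditions). The further the observed sums spread from that value, the larger the statistic.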

For example, Darzins and colleagues assessed differences in the type of care subjects preferred depending on whether the care was for an unfamiliar person, a family member, or for himself or herself.104 The type of care was an ordinal scale ranging from palliative care to intensive care. The sample included doctors, nurses, health professionals, high school students and members of the general public. Respondents were given a clinical vignette about an 82-year-old man with dementia who arrives in the emergency department with life-threatening gastrointestinal bleeding. No guidance on the type of care that the patient would want is available from the patient or his family. The respondent is asked to say how the patient should be treated under three different conditions: that the vignette patient is unfamiliar to the subject, that the patient is a family member of the subject, or that the subject is the patient, himself or herself.

There were dramatic differences in the subjects’ choice of care depending on whether they were choosing it for an unfamiliar person, a family member, or for themselves (Table 5.27). The Friedman statistic was significant at P < 0.001.105

104Darzins, R., Molloy, D.W., Harrison, C. Treatment for life-threatening illness. New Engl. J. Med. 1993; 329: 736.

Table 5.27. Choice of type of care for life-threatening condition in a patient with dementia

Type of care    Unfamiliar patient (%)    Family member (%)    Self (%)
Palliative      37                        54                   68
Limited         39                        31                   23
Surgical        12                        8                    4
Intensive       13                        7                    5

Friedman statistic is significant, P < 0.001.

Data from Darzins, R., et al. Treatment for life-threatening illness. New Engl. J. Med. 1993; 329: 736.

5.11 How do I test bivariate associations with matched data?

Matching is a useful strategy for eliminating confounding, especially for small case–control studies (Section 2.6.C). However, when you individually match cases and controls you need to use an analytic method that incorporates the matching. Table 5.28 compares tests used to assess associations with unmatched and matched data (two groups).

As I hope you notice, most of the tests for matched data shown in Table 5.28 are the same as the tests for paired observations of the same subject shown in Table 5.22. That’s because repeated observations of the same subject are essentially matched analyses where the subject is serving as his or her own control.

5.11.A Matched comparisons of a dichotomous variable

Use McNemar’s test to compare matched data on a dichotomous outcome.

When you wish to compare the matched pairs on a dichotomous variable use McNemar’s test (just as you would with paired observations of the same person on a dichotomous variable, Section 5.10.A).

Just as with paired observations of the same subject, only the discordant pairs contribute to the calculation of McNemar’s test.

105As this study asks subjects to make hypothetical decisions you may feel that it does not qualify as repeated “observations” of the same subjects. But it is repeated decision-making by subjects under different hypothetical conditions.


 

Bivariate associations with matched data

 

 

 

Table 5.28. Comparison of bivariate tests for unmatched and matched data

                                          Unmatched data       Matched data
Dichotomous variable                      Chi-squared          McNemar’s test
                                          Odds ratio           Matched odds ratio
Normally distributed interval variable    t-test               Paired t-test
Non-normally distributed variable         Mann-Whitney test    Wilcoxon signed rank test
Ordinal variable                          Mann-Whitney test    Wilcoxon signed rank test
Survival time                             Log-rank             No readily available test

For example, Kujala and colleagues examined the effect of smoking on mortality among twin pairs.106 There were 84 twin pairs that were discordant on both the risk factor (one twin smoked and one twin did not smoke) and the outcome (one twin died and one did not). If smoking were unrelated to mortality then we would expect that for half of the pairs it would be the smoker who died and for the other half it would be the nonsmoking twin who died. In fact, in 67 of the pairs it was the smoker in the pair who died, and in only 17 of the pairs was it the nonsmoker who died. McNemar’s test was significant at P < 0.001.

Besides McNemar’s test, the matched odds ratio is often used to report the results for matched studies. It is easy to compute. It is:

$$\text{matched odds ratio} = \frac{\text{number of pairs where the one with the risk factor experiences the outcome}}{\text{number of pairs where the one without the risk factor experiences the outcome}}$$

In the case of Kujala and colleagues’ study of smoking and mortality among twins, the matched odds ratio is:

$$\text{matched odds ratio} = \frac{67}{17} = 3.9$$
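The twin example can be checked with a few lines of Python. The sketch below computes the matched odds ratio from the two kinds of discordant pairs, and also the large-sample (chi-squared, 1 degree of freedom) form of McNemar’s test without a continuity correction. Which form of the test the investigators used is not stated in the text, so that choice is an assumption.

```python
from math import erfc, sqrt

# Discordant twin pairs from the smoking and mortality example
smoker_died = 67      # the twin with the risk factor experienced the outcome
nonsmoker_died = 17   # the twin without the risk factor experienced the outcome

matched_odds_ratio = smoker_died / nonsmoker_died

# McNemar chi-squared statistic: squared difference of the discordant
# counts divided by their sum
chi_squared = (smoker_died - nonsmoker_died) ** 2 / (smoker_died + nonsmoker_died)

# Upper-tail probability of a chi-squared variable with 1 degree of freedom
p_value = erfc(sqrt(chi_squared / 2))

print(f"matched odds ratio = {matched_odds_ratio:.1f}")  # prints 3.9
print(f"McNemar chi-squared = {chi_squared:.2f}")
print(f"P < 0.001: {p_value < 0.001}")                   # prints True
```

Pairs in which both twins or neither twin died do not appear anywhere in the calculation, which is why the study reports only the 84 discordant pairs.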

106Kujala, U.M., Kaprio, J., Koskenvuo, M. Modifiable risk factors as predictors of all-cause mortality: the roles of genetics and childhood environment. Am. J. Epidemiol. 2002; 156: 985–93.

Table 5.29. Comparison of sleep characteristics of 1st-degree relatives of patients with sleep apnea/hypopnea compared to controls

                            Cases* (mean)    Controls (mean)    P-value
Slow wave (deep) sleep      78 min           91 min             0.03
Minutes of light sleep      209 min          179 min            0.006
2% desaturations            6/h              3/h                0.04
3% desaturations            4/h              2/h                0.04

* Cases are 1st-degree relatives of patients with sleep apnea.

P-value is based on paired t-test.

Data from Mathur, R. and Douglas, N.J. Family studies in patients with the sleep apnea–hypopnea syndrome. Ann. Intern. Med. 1995; 122: 174–8.

5.11.B Matched measurements of a normally distributed interval variable

When you have matched data on a normally distributed interval variable use a paired t-test (just as you would with paired measures of the same person on a normally distributed interval variable, Section 5.10.C).

For example, Mathur and colleagues conducted a case–control study of whether familial factors were associated with development of sleep apnea/hypopnea.107 The investigators matched 51 1st-degree relatives of patients with sleep apnea (cases) to 51 controls of similar age, sex, height, and weight. They used paired t-tests to compare cases to their matched controls on several normally distributed interval variables.

The paired t-test for comparing matched results is the same as the paired t-test for comparing two measurements of the same person. The only difference is that with the former the “pair” is the case and the control and with the latter the pair is the two measurements.

As you can see, 1st-degree relatives had significantly less slow wave (deep) sleep, more light sleep, and more 2% and 3% oxyhemoglobin desaturations per hour than controls (Table 5.29).

5.11.C Matched measurements of a non-normally distributed interval or ordinal variable

When you have matched data on a non-normally distributed interval or ordinal variable use the Wilcoxon signed rank test (just as you would with

107Mathur, R., Douglas, N.J. Family studies in patients with the sleep apnea–hypopnea syndrome. Ann. Intern. Med. 1995; 122: 174–8.