
15 Measuring the association between two variables

Learning objectives

When you have finished this chapter you should be able to:

Explain the meaning of association.

Draw and interpret a scatterplot, and from it assess the linearity, direction and strength of an association.

Distinguish between negative and positive association.

Explain what a correlation coefficient is.

Describe Pearson’s correlation coefficient r and its distributional requirements, and interpret a given value of r.

Describe Spearman’s correlation coefficient r_s and interpret a given value of r_s.

Describe the circumstances under which Pearson’s r or Spearman’s r_s is appropriate.

Association

When we say that two ordinal or metric variables are associated, we mean that they behave in a way that makes them appear ‘connected’: changes in either variable seem to coincide with changes in the other variable. It’s important to note (at this point anyway) that we are not suggesting that change in either variable is causing the change in the other variable, simply that they exhibit this commonality. As you will see, association, if it exists, may be positive (low values of one variable coincide with low values of the other variable, and high values with high values) or negative (low values with high values and vice versa).

Medical Statistics from Scratch, Second Edition, David Bowers. © 2008 John Wiley & Sons, Ltd

In this chapter, I want to discuss two alternative methods of detecting an association. The first method relies on a plot of the sample data, called a scatterplot, in which values of one variable are plotted on the vertical axis and values of the other on the horizontal axis. The second approach is numeric, making both comparison and inference possible.

The scatterplot

A scatterplot will enable you to see if there is an association between the variables, and if there is, its strength and direction. But the scatterplot will only provide a qualitative assessment, and thus has obvious limitations. First, it’s not always easy to say which of two sample scatterplots indicates the stronger association; second, a scatterplot doesn’t allow us to make inferences about possible associations in the population.
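To make the construction concrete, here is a minimal Python sketch of a scatterplot drawn on a character grid, with one variable on the horizontal axis and the other on the vertical axis. The function name `text_scatterplot`, its parameters, and the data are hypothetical illustrations, not taken from the chapter or from any study it cites. A plotting library would normally be used instead.

```python
def text_scatterplot(xs, ys, width=40, height=12):
    """Return a crude character-grid scatterplot: xs horizontal, ys vertical."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first grid row is printed at the top, so flip vertically
        # to put the largest y value at the top of the plot.
        grid[height - 1 - row][col] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width


# Hypothetical illustrative data (assumption, not from the text).
x_values = [1, 2, 3, 4, 5, 6, 7, 8]
y_values = [2, 3, 3, 5, 6, 6, 8, 9]
plot = text_scatterplot(x_values, y_values)
print(plot)
```

With these made-up values the points run from the bottom left to the top right of the grid, which is the visual signature of a positive association.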

An example from practice

As part of a study of the possible association between Crohn’s disease (CD) and ulcerative colitis (UC), researchers in Canada (Blanchard et al. 2001) produced the scatterplot shown in Figure 15.1. It doesn’t matter which variable is plotted on which axis for the scatterplot itself, but in the study of causal relationships between variables (which I will discuss in Chapter 17), the choice of axis becomes more important.

Looking at the scatterplot, it’s not difficult to see that something is going on here. The scatter is not just a random cloud of points, but appears to display a pattern: low CD levels seem to be associated with low UC levels, and high CD levels with high UC levels. You could justly claim that the two variables appear to be positively associated.

As a second example, Figure 15.2 shows a scatterplot taken from a study into the possible relationship between percentage mortality from aortic aneurysm, and the number of aortic aneurysm episodes dealt with per year, in each of 22 hospitals (McKee and Hunter 1995). This scatterplot displays a negative association between the two variables: low values for number of episodes seem to be associated with high values for percentage mortality, and vice versa.

As a final example from practice, Figure 15.3 shows a scatterplot taken from the cross-section study into the possible contribution of calcium channel blockers (drugs prescribed mainly for cardiovascular conditions such as hypertension) to the suicide rate in 284 Swedish municipalities (Lindberg et al. 1998), first referred to in Figure 3.10. The scatterplot here is much fuzzier than the two previous plots, and it would be hard to claim, merely from eyeballing it, that there is any notable association between the two variables (although admittedly there is some evidence of a rather weak positive association).
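The positive and negative patterns described in these examples can also be checked numerically. The following is a minimal Python sketch, not a method from the chapter: the function name `direction_of_association` and both data sets are hypothetical. It reports only the sign of the co-movement of the two variables around their means, and says nothing about how strong the association is.

```python
def direction_of_association(xs, ys):
    """Classify direction from the sign of the summed products of deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # A pair lying on the same side of both means (low with low, or high
    # with high) contributes a positive product; a pair on opposite sides
    # (low with high) contributes a negative product.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_movement > 0:
        return "positive"
    if co_movement < 0:
        return "negative"
    return "none"


# Hypothetical data: low values pair with low, high with high.
rate_a = [2, 4, 6, 8, 10]
rate_b = [3, 5, 4, 9, 11]
positive_example = direction_of_association(rate_a, rate_b)

# Hypothetical data: low values pair with high, and vice versa.
episodes = [5, 10, 20, 30, 40, 60]
mortality = [80, 70, 55, 50, 40, 30]
negative_example = direction_of_association(episodes, mortality)

print(positive_example, negative_example)
```

A fuzzy cloud like the third example would still return a sign, which is one reason a direction check of this kind is no substitute for looking at the plot.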

[Figure 15.1: scatterplot. Vertical axis: UC incidence rate per 100,000 (0 to 35). Horizontal axis: CD incidence rate per 100,000 (0 to 35). Annotation on plot: r = 0.49, p < 0.001.]

Figure 15.1 Scatterplot of the age-standardised incidence rates of Crohn’s disease (CD) and ulcerative colitis (UC) by Manitoba postal area, Canada, 1987–1996. The scatterplot suggests a positive association between the two variables. Reproduced from American Journal of Epidemiology 2001, 154: 328–33, Fig. 3 p. 331, by permission of OUP

[Figure 15.2: scatterplot. Vertical axis: % mortality (0 to 100). Horizontal axis: episodes/year (0 to 70).]

Figure 15.2 A scatterplot of percentage mortality from aortic aneurysm, and number of aortic aneurysm episodes dealt with per year, in 22 hospitals. The plot suggests a negative association between the two variables. Reproduced from Quality in Health Care, 4, 5–12, courtesy of BMJ Publishing Group

When you set out to investigate a possible association between two variables, a scatterplot is almost always worthwhile, and will often produce an insight into the way the two variables co-behave. In particular, it may reveal whether an association between them is linear. The property of linearity is important in some branches of statistics and we’ll meet it again ourselves in Chapter 17. Put simply, a linear association is one in which the points in the scatterplot seem to cluster around a straight line. The two scatterplots in Figure 15.4 illustrate the difference between a linear and a non-linear association. The scatter in Figure 15.4a seems to be linear, whereas in Figure 15.4b it shows some curviness.
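One rough way to probe whether points "cluster around a straight line" is to fit a line and look at which side of it each point falls on, moving from left to right. The sketch below is a heuristic of my own for illustration, not a procedure from the chapter. It assumes a least-squares line (the text defers its definition of the best line to Chapter 17), and the function names and data are hypothetical.

```python
def straight_line_residuals(xs, ys):
    """Fit a least-squares line and return the residuals ordered by x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed y minus the y predicted by the fitted line.
    return [y - (intercept + slope * x) for x, y in sorted(zip(xs, ys))]


def residual_sign_pattern(residuals):
    """Show each residual as '+' (above the line) or '-' (on or below it)."""
    return "".join("+" if r > 0 else "-" for r in residuals)


# Hypothetical data (assumption): one roughly linear set, one curved set.
x_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
roughly_linear_y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1]
curved_y = [x ** 2 for x in x_values]

linear_pattern = residual_sign_pattern(
    straight_line_residuals(x_values, roughly_linear_y))
curved_pattern = residual_sign_pattern(
    straight_line_residuals(x_values, curved_y))
print(linear_pattern)
print(curved_pattern)
```

For the roughly linear data the signs alternate with no systematic run, while for the curved data the points sit above the line at both ends and below it in the middle. Long runs of the same sign are a hint, not a proof, that a straight line is the wrong shape.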

[Figure 15.3: scatterplot. Vertical axis: number of suicides per 10 000 inhabitants/year (0 to 4). Horizontal axis: use of calcium channel blockers, defined daily doses/1000 inhabitants/year (0 to 50).]

Figure 15.3 A scatterplot taken from a cross-section study into the possible contribution of calcium channel blockers (drugs prescribed mainly for cardiovascular conditions such as hypertension) to the suicide rate, in 284 Swedish municipalities. The plot suggests a weak, if any, relationship between the variables. Reproduced courtesy of BMJ Publishing Group

Exercise 15.1 Draw a scatterplot of Apgar score against birthweight for the 30 infants born in a maternity unit, using the data in Table 2.5, and comment on what it shows about any association between the two variables.

Exercise 15.2 The scatterplot in Figure 15.5 is from a study into the effect of passive smoking on respiratory symptoms (Janson et al. 2001). In addition, the ‘best’ straight line has been drawn through the points.[1] Comment on what the scatterplot suggests about the nature and strength of any association between the two variables.

Exercise 15.3 The scatterplot of percentage body fat against body mass index (bmi) in Figure 15.6 is from a cross-section study into the relationship between body mass index and body fat, in black populations in Nigeria, Jamaica and the USA (Luke et al. 1997). The aim of the study was to investigate whether per cent body fat rather than bmi could be used as a measure of obesity. What does the scatterplot tell you about the nature and strength of any association between these two variables?

[1] I’ll have more to say about what constitutes the best straight line in Chapter 17, but loosely speaking, it’s the line which passes as close as possible to all the points.
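One common way to make "as close as possible" precise is ordinary least squares. This is an assumption here, since the text leaves its own definition to Chapter 17, and the symbols are introduced only for this sketch: for n points (x_i, y_i), choose the intercept a and slope b that minimise the sum of squared vertical distances between the points and the line.

```latex
\min_{a,\,b} \; \sum_{i=1}^{n} \bigl( y_i - (a + b\,x_i) \bigr)^2
```

Squaring stops distances above and below the line from cancelling each other out, and penalises points far from the line more heavily than points near it.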