File: Medical Statistics from Scratch_Bowers_2008.pdf


THE CORRELATION COEFFICIENT


Figure 15.4 (a) A linear association (b) A non-linear association

[Figure 15.5: scatterplot. x-axis: Proportion of current smokers (%); y-axis: Proportion exposed to environmental tobacco smoke at work (%); annotation: r = 0.80, p < 0.0001]

Figure 15.5 Scatterplot from a study into the effect of passive smoking on respiratory symptoms. Reprinted courtesy of Elsevier (The Lancet 2001, 358, 2103–9, Fig. 1, p. 2105)

The correlation coefficient

The principal limitation of the scatterplot in assessing association is that it does not provide us with a numeric measure of the strength of the association; for this we have to turn to the correlation coefficient. Two correlation coefficients are widely used: Pearson’s and Spearman’s.

Pearson’s correlation coefficient

Pearson’s product-moment correlation coefficient, denoted ρ (Greek rho) in the population and r in the sample, measures the strength of the linear association between two variables. Loosely speaking, the correlation coefficient is a measure of the average distance of all of the points from an imaginary straight line drawn through the scatter of points (analogous to the standard deviation measuring the average distance of each value from the mean).

CH 15 MEASURING THE ASSOCIATION BETWEEN TWO VARIABLES

[Figure 15.6: scatterplot, panel shown: United States. x-axis: bmi; y-axis: Percent Body Fat]

Figure 15.6 Scatterplot of per cent body fat against body mass index from a cross-section study into the relationship between bmi and body fat, in black population samples from Nigeria, Jamaica and the USA. Reproduced from Amer. J. Epid., 145, 620–8, courtesy of Oxford University Press

For Pearson’s correlation coefficient to be appropriately used, both variables must be metric continuous and, if a confidence interval is to be determined, also approximately Normally distributed. The value of Pearson’s correlation coefficient can vary as follows: from −1, indicating a perfect negative association (all the points lie exactly on a straight line); through 0, indicating no association; to +1, indicating a perfect positive association (all points exactly on a line). In practice, with real sample data, you will almost never see values of exactly −1, 0 or +1. Calculation of r by hand is very tedious and prone to error, so we will avoid it here. But it can be done in a flash with a computer statistics program, such as SPSS or Minitab.
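For readers without SPSS or Minitab to hand, the calculation can be sketched in a few lines of Python. This is only an illustration of the standard product-moment formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name `pearson_r` and the small data set are assumptions made for the example; the values are hypothetical and are not taken from any study in this chapter.

```python
import math

def pearson_r(x, y):
    """Sample Pearson product-moment correlation coefficient r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative values (not data from the studies discussed here)
bmi = [19.0, 22.5, 24.0, 27.5, 31.0, 35.5]
body_fat = [18.0, 24.0, 23.0, 30.0, 33.0, 41.0]
r = pearson_r(bmi, body_fat)
print(f"r = {r:.3f}")
```

Points lying exactly on a rising straight line give r = +1, and points on a falling line give r = −1, which is a quick way to check the function.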

Is the correlation coefficient statistically significant in the population?

To assess the statistical significance of a population correlation coefficient and hence decide whether there is a statistically significant association between the two variables, you can either perform a hypothesis test (is the p-value less than 0.05?), or calculate a confidence interval (does it include zero?). For the hypothesis test, the hypotheses are:

H0: ρ = 0

H1: ρ ≠ 0

For example, for the data shown in the scatterplot in Figure 15.1, the sample r = 0.49, with a p-value < 0.001. This indicates a statistically significant positive association in the population between incidence rate of Crohn’s disease and ulcerative colitis.
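The text does not say how the confidence interval is calculated. One common approach, assumed here, is Fisher's z-transformation: transform r with the inverse hyperbolic tangent, add and subtract a Normal critical value times 1/√(n − 3), and transform back. The sketch below uses r = 0.49 from the example above, but the sample size n = 50 is a hypothetical value chosen for illustration, because the sample size is not given here. The function names are likewise assumptions.

```python
import math
from statistics import NormalDist

def pearson_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's z-transformation.

    Assumes both variables are approximately Normally distributed.
    """
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and +1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                 # Fisher's z
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def interval_excludes_zero(r, n, level=0.95):
    """True if the confidence interval for rho does not include zero."""
    lower, upper = pearson_ci(r, n, level)
    return not (lower <= 0 <= upper)

# r is taken from the text; n = 50 is a hypothetical sample size
lower, upper = pearson_ci(0.49, 50)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
print("Excludes zero:", interval_excludes_zero(0.49, 50))
```

If the interval excludes zero, the association is statistically significant at the corresponding level, which matches the "does it include zero?" check described above.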


A useful rule of thumb if you have a value for r but no confidence interval or p-value, is that to be statistically significant, r must be greater than 2/√n, where n is the sample size. For example, if n = 100, then r has to be greater than 2/10 = 0.200 to be statistically significant.
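The rule of thumb is easy to express as a small helper. The sketch below is illustrative only; the function names are assumptions, and comparing the absolute value of r (so that negative correlations are handled too) is an added assumption rather than something stated in the rule itself. The rule is an approximation and does not replace a proper p-value or confidence interval.

```python
import math

def rule_of_thumb_threshold(n):
    """Approximate smallest |r| that is statistically significant: 2 / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return 2 / math.sqrt(n)

def looks_significant(r, n):
    """Apply the rule of thumb; abs() is an assumption to cover negative r."""
    return abs(r) > rule_of_thumb_threshold(n)

print(rule_of_thumb_threshold(100))   # 0.2, as in the worked example
print(looks_significant(0.49, 100))   # True
print(looks_significant(0.15, 100))   # False
```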

An example from practice

Table 15.1 is taken from the same cross-section study as Exercise 15.3, and shows the sample Pearson’s correlation coefficient for the association between bmi and per cent body fat, with blood pressure, and waist and hip measurements, along with an indication of whether each coefficient is statistically significant.

Unfortunately, the authors have not given the actual p-values, but only indicated whether they were less than 0.05 or less than 0.01. This is not good practice; the actual p-values should always be provided. As you can see, the population correlation coefficient between both bmi and per cent body fat, with waist and hip circumference, is positive and statistically significant in every case. However, bmi is more closely associated (higher r values) than body fat, except in Jamaican men. Apart from the association with systolic blood pressure in US males, there is no statistically significant association with either of the blood pressure measurements.

Exercise 15.4 Table 15.2 is from a case-control study of medical record validation (Olson et al. 1997), and shows the value of Pearson’s r, and the 98 per cent confidence intervals, for the correlation between gestational age, as estimated by the mother, and as determined from medical records, for a number of demographic sub-groups (ignore the last column). The cases were the mothers of child leukaemia patients; the matched controls were selected by random telephone calling. Identify: (a) any correlation coefficients not statistically significant; (b) the strongest correlation; (c) the weakest correlation.

Spearman’s rank correlation coefficient

If either (or both) of the variables is ordinal, then Spearman’s rank correlation coefficient (usually denoted ρ_s in the population and r_s in the sample) is appropriate. This is a non-parametric measure. As with Pearson’s correlation coefficient, Spearman’s correlation coefficient varies from −1, through 0, to +1, and its statistical significance can again be assessed with a p-value or a confidence interval. The null hypothesis is that the population correlation coefficient ρ_s = 0. Spearman’s r_s is not quite as bad to calculate by hand as Pearson’s r but bad enough, and once again you would want to do it with the help of a computer program.
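As an illustration of what such a program does, Spearman's coefficient can be obtained by replacing each value with its rank and then applying the Pearson formula to the ranks. The sketch below assumes tied values share the average of their ranks, which is the usual convention; the function names and the ordinal scores are hypothetical and are not taken from any study in this chapter.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r, used here on ranks rather than raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical ordinal scores for eight patients (illustration only)
pain_score = [1, 2, 2, 3, 4, 4, 5, 5]
mobility_score = [5, 4, 5, 3, 3, 2, 1, 2]
print(f"r_s = {spearman_rs(pain_score, mobility_score):.3f}")
```

Because only the ranks are used, any perfectly monotonic relationship gives r_s = +1 or −1, even when it is not a straight line.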

Table 15.1 Correlation coefficients from a cross-section study into the relationship between body mass index (bmi) and body fat, in black population samples from Nigeria, Jamaica, and the USA. The aim of the study was to investigate whether body fat rather than bmi could be used as a measure of obesity.‡,§ Reproduced from Amer. J. Epid., 145, 620–8, courtesy of Oxford University Press

	Women						Men
	Nigeria		Jamaica		United States		Nigeria		Jamaica		United States
Variable	BMI	% fat	BMI	% fat	BMI	% fat	BMI	% fat	BMI	% fat	BMI	% fat
Waist circumference	0.90**	0.77**	0.87**	0.77**	0.91**	0.85**	0.89**	0.79**	0.69**	0.76**	0.93**	0.83**
Hip circumference	0.93**	0.81**	0.91**	0.82**	0.93**	0.87**	0.89**	0.76**	0.64**	0.72**	0.93**	0.82**
Systolic blood pressure	[8 of 12 values legible, in reading order: 0.24, 0.16, 0.15, 0.21, 0.09, 0.24, 0.24*, 0.23*]
Diastolic blood pressure	0.16	0.14	0.20	0.16	0.07	0.10	0.31	0.24	0.16	0.11	0.22	0.20

*p < 0.05; **p < 0.01.

† Weight (kg)/height (m)². ‡ Data were adjusted for age.

§ No significant difference was found between correlation coefficients for body mass index and percentage of body fat.

An example from practice

Table 15.3 is from the same cross-section study first referred to in Figure 4.3, into the use of the Ontario mammography services. The authors wanted to know whether the variation in the ranked utilisation rates (number of visits per 1000 women) was similar across the age groups. They did this by measuring the strength of the association between the ranked rates for each pair of different age groups. When the association was strong and significant, they concluded that the variation in the usage rate was similar.

The results show that the r_s for the association between the ranked usage rates for 30–39 year-olds and 40–49 year-olds across the 33 districts was 0.6496 (first row of the table), with a p-value of less than 0.0001. So this association is positive and statistically significant in these two age group populations. Indeed, the correlation coefficients between all pairs of age groups are statistically significant, with all p-values < 0.05. The authors thus concluded that variation in usage rate was similar for the four age groups across the 33 health districts. However, whether association is the correct way to measure similarity in two sets of values is a question I will return to in the next chapter.

Table 15.2 Pearson’s r and 98 per cent confidence intervals for the association between gestational age, as estimated by the mother and from medical records, for a number of demographic sub-groups. Reproduced from Amer. J. Epid., 145, 58–67, courtesy of Oxford University Press

Sub-group	Correlation of gestational age	98% CI*	Kappa statistic†
All gestational ages	0.839	0.817–0.859	0.62
Case/control status
Cases	0.849	0.813–0.878	0.63
Controls	0.835	0.805–0.861	0.61
Education
<High school	0.694	0.553–0.797	0.51
High school	0.833	0.790–0.868	0.63
>High school	0.835	0.804–0.861	0.62
Household income
<$22,000	0.791	0.734–0.837	0.59
$22,000–$34,999	0.882	0.849–0.908	0.62
≥$35,000	0.843	0.800–0.877	0.65
Unknown	0.745	0.641–0.823	0.60
Time (years) from delivery to interview
<2	0.896	0.862–0.921	0.64
2–3.9	0.821	0.784–0.852	0.63
4–5.9	0.828	0.775–0.869	0.61
6–8	0.852	0.734–0.920	0.42
Maternal age (years)
<25	0.822	0.773–0.861	0.64
25–29	0.889	0.862–0.912	0.63
30–34	0.760	0.694–0.813	0.57
≥35	0.888	0.824–0.930	0.64
Birth order
First born	0.880	0.853–0.903	0.67
Second born	0.815	0.778–0.846	0.57
≥Third born	0.632	0.416–0.781	0.52
Maternal race
White	0.846	0.822–0.866	0.64
Other	0.782	0.680–0.855	0.42

*CI, confidence interval.

† Three categories, <38, 38–41, ≥42 weeks.

Table 15.3 Spearman correlation coefficients from a cross-section study of the use of the Ontario mammography services in relation to age. Each correlation coefficient measures the strength of the association in the variation between the ranked usage rate across the 33 health districts for each pair of age groups. Reproduced from J. Epid. Comm. Health, 51, 378–82, courtesy of BMJ Publishing Group

Age group (y)	30–39	40–49	50–69	70+
30–39	1.0000	0.6496 (p < 0.0001)	0.5949 (p = 0.0005)	0.5488 (p = 0.0014)
40–49		1.0000	0.9021 (p < 0.0001)	0.8985 (p < 0.0001)
50–69			1.0000	0.9513 (p < 0.0001)
70+				1.0000

Two other correlation coefficients can only be mentioned briefly. Kendall’s rank-order correlation coefficient, denoted τ (tau), is appropriate in the same circumstances as Spearman’s r_s, i.e. with ranked data (which may be ordinal or continuous). Tau is available in SPSS, but not in Minitab. The point-biserial correlation coefficient is appropriate if one variable is metric continuous and the other is truly dichotomous (which means that the variable can take only two values; alive or dead, male or female, etc.). Unfortunately, this latter measure of association is not available in either SPSS or Minitab.
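Both measures are simple enough to sketch directly, which may help where a package does not offer them. The code below is a minimal illustration under stated assumptions: it computes the simplest form of Kendall's tau (often called tau-a, with no correction for ties, so packages that adjust for ties may give slightly different values), and the point-biserial coefficient from the difference between the two group means. The latter is numerically the same as Pearson's r when the dichotomous variable is coded 0 and 1. Function names and data are hypothetical.

```python
import math
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1      # pair ordered the same way on both variables
        elif s < 0:
            discordant += 1      # pair ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)

def point_biserial(group, values):
    """Point-biserial correlation; group must be coded 0 or 1."""
    g1 = [v for g, v in zip(group, values) if g == 1]
    g0 = [v for g, v in zip(group, values) if g == 0]
    n1, n0, n = len(g1), len(g0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups must contain at least one value")
    mean_all = sum(values) / n
    # Standard deviation of all values, using n (not n - 1) in the divisor
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (sum(g1) / n1 - sum(g0) / n0) / sd * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical data for illustration only
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
print(point_biserial([0, 0, 0, 1, 1, 1], [4.1, 5.0, 4.6, 5.9, 6.3, 5.5]))
```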

If you plan to use a correlation coefficient you should ensure that the assumptions referred to above are satisfied, in particular that the association is linear, which can be checked by a scatterplot. Moreover, with Pearson’s correlation coefficient you should interpret any results with suspicion if there are outliers present in either data set, since these can distort the results.
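The effect of an outlier on Pearson's r is easy to demonstrate. The sketch below uses a small hypothetical data set (not from any study in this chapter) in which six points show only a weak association, and then adds a single extreme point. The helper `pearson_r` is an assumed name for the standard product-moment formula.

```python
import math

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values: six points with only a weak positive association
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 3]
r_without = pearson_r(x, y)

# The same data plus one extreme point far from the rest
r_with = pearson_r(x + [30], y + [30])

print(f"r without outlier: {r_without:.2f}")
print(f"r with outlier:    {r_with:.2f}")
```

A single distant point pulls r up sharply here, which is why a scatterplot should be inspected before a Pearson coefficient is trusted.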

Finally, it is worth noting again that the fact that two variables are significantly associated does not mean that there is a cause–effect relationship between them.
