- Contents
- Preface to the 2nd Edition
- Preface to the 1st Edition
- Introduction
- Learning Objectives
- Variables and Data
- The Good, the Bad, and the Ugly – Types of Variable
- Categorical Variables
- Metric Variables
- How can I Tell what Type of Variable I am Dealing with?
- 2 Describing Data with Tables
- Learning Objectives
- What is Descriptive Statistics?
- The Frequency Table
- 3 Describing Data with Charts
- Learning Objectives
- Picture it!
- Charting Nominal and Ordinal Data
- Charting Discrete Metric Data
- Charting Continuous Metric Data
- Charting Cumulative Data
- 4 Describing Data from its Shape
- Learning Objectives
- The Shape of Things to Come
- 5 Describing Data with Numeric Summary Values
- Learning Objectives
- Numbers R us
- Summary Measures of Location
- Summary Measures of Spread
- Standard Deviation and the Normal Distribution
- Learning Objectives
- Hey ho! Hey ho! It’s Off to Work we Go
- Collecting the Data – Types of Sample
- Types of Study
- Confounding
- Matching
- Comparing Cohort and Case-Control Designs
- Getting Stuck in – Experimental Studies
- 7 From Samples to Populations – Making Inferences
- Learning Objectives
- Statistical Inference
- 8 Probability, Risk and Odds
- Learning Objectives
- Calculating Probability
- Probability and the Normal Distribution
- Risk
- Odds
- Why you can’t Calculate Risk in a Case-Control Study
- The Link between Probability and Odds
- The Risk Ratio
- The Odds Ratio
- Number Needed to Treat (NNT)
- Learning Objectives
- Estimating a Confidence Interval for the Median of a Single Population
- 10 Estimating the Difference between Two Population Parameters
- Learning Objectives
- What’s the Difference?
- Estimating the Difference between the Means of Two Independent Populations – Using a Method Based on the Two-Sample t Test
- Estimating the Difference between Two Matched Population Means – Using a Method Based on the Matched-Pairs t Test
- Estimating the Difference between Two Independent Population Proportions
- Estimating the Difference between Two Independent Population Medians – The Mann–Whitney Rank-Sums Method
- Estimating the Difference between Two Matched Population Medians – Wilcoxon Signed-Ranks Method
- 11 Estimating the Ratio of Two Population Parameters
- Learning Objectives
- 12 Testing Hypotheses about the Difference between Two Population Parameters
- Learning Objectives
- The Research Question and the Hypothesis Test
- A Brief Summary of a Few of the Commonest Tests
- Some Examples of Hypothesis Tests from Practice
- Confidence Intervals Versus Hypothesis Testing
- Nobody’s Perfect – Types of Error
- The Power of a Test
- Maximising Power – Calculating Sample Size
- Rules of Thumb
- 13 Testing Hypotheses About the Ratio of Two Population Parameters
- Learning Objectives
- Testing the Risk Ratio
- Testing the Odds Ratio
- Learning Objectives
- 15 Measuring the Association between Two Variables
- Learning Objectives
- Association
- The Correlation Coefficient
- 16 Measuring Agreement
- Learning Objectives
- To Agree or not Agree: That is the Question
- Cohen’s Kappa
- Measuring Agreement with Ordinal Data – Weighted Kappa
- Measuring the Agreement between Two Metric Continuous Variables
- 17 Straight Line Models: Linear Regression
- Learning Objectives
- Health Warning!
- Relationship and Association
- The Linear Regression Model
- Model Building and Variable Selection
- 18 Curvy Models: Logistic Regression
- Learning Objectives
- A Second Health Warning!
- Binary Dependent Variables
- The Logistic Regression Model
- 19 Measuring Survival
- Learning Objectives
- Introduction
- Calculating Survival Probabilities and the Proportion Surviving: the Kaplan-Meier Table
- The Kaplan-Meier Chart
- Determining Median Survival Time
- Comparing Survival with Two Groups
- 20 Systematic Review and Meta-Analysis
- Learning Objectives
- Introduction
- Systematic Review
- Publication and other Biases
- The Funnel Plot
- Combining the Studies
- Solutions to Exercises
- References
- Index
16 Measuring agreement
Learning objectives
When you have finished this chapter you should be able to:
Explain the difference between association and agreement.
Describe Cohen’s kappa, calculate its value and assess the level of agreement. Interpret published values for kappa.
Describe the idea behind ordinal kappa.
Outline the Bland–Altman approach to measuring agreement between metric variables.
To agree or not agree: that is the question
Association is a measure of the inter-connectedness of two variables: the degree to which they tend to change together, either positively or negatively. Agreement is the degree to which the values in two sets of data actually agree. To illustrate this idea, look at the hypothetical data in Table 16.1, which shows the decision by a psychiatrist and by a psychiatric social worker (PSW) whether to section (Y), or not to section (N), each of 10 individuals with mental ill-health. We would say that the two variables were in perfect agreement if every pair of values was the same.
Medical Statistics from Scratch, Second Edition, David Bowers. © 2008 John Wiley & Sons, Ltd
Table 16.1 Decision by a psychiatrist and a psychiatric social worker whether or not to section 10 individuals suffering mental ill-health

| Patient      | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|--------------|---|---|---|---|---|---|---|---|---|----|
| Psychiatrist | Y | Y | N | Y | N | N | N | Y | Y | Y  |
| PSW          | Y | N | N | Y | N | N | Y | Y | Y | N  |
In practical situations this won’t happen, and here you can see that only seven out of the 10 decisions are the same, so the observed level of proportional agreement is 0.70 (70 per cent).
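The observed proportional agreement can be checked with a few lines of Python (a minimal sketch using the decisions in Table 16.1; the variable names are mine, not the book's):

```python
# Decisions from Table 16.1 (Y = section, N = do not section)
psychiatrist = ["Y", "Y", "N", "Y", "N", "N", "N", "Y", "Y", "Y"]
psw          = ["Y", "N", "N", "Y", "N", "N", "Y", "Y", "Y", "N"]

# Observed proportional agreement: the fraction of pairs with identical decisions
agreements = sum(a == b for a, b in zip(psychiatrist, psw))
observed_agreement = agreements / len(psychiatrist)

print(observed_agreement)  # 0.7
```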
Cohen’s kappa
However, if you had asked each clinician simply to toss a coin to make the decision (heads – section, tails – don’t section), some of their decisions would probably still have been the same – by chance alone. You need to adjust the observed level of agreement for the proportion you would have expected to occur by chance alone. This adjustment gives us the chance-corrected proportional agreement statistic, known as Cohen’s kappa, κ :
κ = (proportion of observed agreement − proportion of expected agreement) / (1 − proportion of expected agreement)
We can calculate the expected values using a contingency table in exactly the same way as we did for chi-squared (row total × column total ÷ overall total – see Chapter 14). Table 16.2 shows the data in Table 16.1 expressed in the form of a contingency table, with the psychiatrist’s scores in the rows, the PSW’s scores in the columns, and with row and column totals added. The expected values are shown in brackets in each cell.
Table 16.2 Contingency table showing observed (and expected) decisions by a psychiatrist and a psychiatric social worker on whether to section 10 patients (data from Table 16.1)

|                  | PSW: Yes | PSW: No | Totals |
|------------------|----------|---------|--------|
| Psychiatrist Yes | 4 (3)    | 2 (3)   | 6      |
| Psychiatrist No  | 1 (2)    | 3 (2)   | 4      |
| Totals           | 5        | 5       | 10     |

Example of an expected value: for the (Yes, Yes) cell, (5 × 6)/10 = 3.
We have seen that the observed agreement is 0.70, and we can calculate the expected agreement to be 5 out of 10, or 0.50.¹ Therefore:

κ = (0.70 − 0.50)/(1 − 0.50) = 0.20/0.50 = 0.40

¹ We can expect the two clinicians to agree on ‘Yes’ three times and on ‘No’ two times, making five agreements in total.
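The whole calculation can be sketched in Python, reproducing the expected values and the chance-corrected agreement for the data in Table 16.2 (an illustration of the arithmetic, not a general-purpose routine):

```python
# Observed counts from Table 16.2: rows = psychiatrist, columns = PSW
observed = [[4, 2],   # psychiatrist Yes: PSW Yes, PSW No
            [1, 3]]   # psychiatrist No:  PSW Yes, PSW No

n = sum(sum(row) for row in observed)                # 10 patients
row_totals = [sum(row) for row in observed]          # [6, 4]
col_totals = [sum(col) for col in zip(*observed)]    # [5, 5]

# Expected counts, exactly as for chi-squared:
# row total x column total / overall total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Agreements sit on the main diagonal of the table
p_obs = sum(observed[i][i] for i in range(2)) / n    # 0.70
p_exp = sum(expected[i][i] for i in range(2)) / n    # 0.50

kappa = (p_obs - p_exp) / (1 - p_exp)
print(round(kappa, 2))  # 0.4
```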
Table 16.3 How good is the agreement – assessing kappa

| Kappa     | Strength of agreement |
|-----------|-----------------------|
| ≤0.20     | Poor                  |
| 0.21–0.40 | Fair                  |
| 0.41–0.60 | Moderate              |
| 0.61–0.80 | Good                  |
| 0.81–1.00 | Very good             |
So after allowing for chance agreements, agreement is reduced from 70 per cent to 40 per cent. Kappa can vary between zero (agreement no better than chance) and one (perfect agreement), and you can use Table 16.3 to assess the quality of agreement. It’s possible to calculate a confidence interval for kappa, but it will usually be too narrow (except for quite small samples) to add much insight to your result.
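If you find yourself assessing many kappa values, the bands in Table 16.3 are easy to encode (a small helper function; the name is mine):

```python
def kappa_strength(kappa: float) -> str:
    """Assess the strength of agreement using the bands in Table 16.3."""
    if kappa <= 0.20:
        return "Poor"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Good"
    return "Very good"

# The sectioning example, kappa = 0.40, falls in the 'Fair' band
print(kappa_strength(0.40))  # Fair
```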
An example from practice
Table 16.4 is from a study into the development of a new quality of life scale for patients with advanced cancer and their families – the Palliative Care Outcome scale (POS) (Hearn et al. 1998). It shows agreement between the patient and staff (who also completed the scale questionnaires) for a number of items on the POS scale. The table also contains values of Spearman’s r_s, and the proportion of agreements within one point on the POS scale. The level of agreement between staff and patient is either fair or moderate for all items, and agreement within one point is either good or very good.
Table 16.4 From a palliative care outcome scale (POS) study showing levels of agreement between the patient and staff assessment for a number of items on the POS scale. Reproduced from Quality in Health Care, 8, 219–27, courtesy of BMJ Publishing Group

At first assessment: 145 matched assessments

| Item           | No of patients | Patient score (% severe) | Staff score (% severe) | κ    | Spearman correlation | Proportion agreement within 1 score |
|----------------|----------------|--------------------------|------------------------|------|----------------------|-------------------------------------|
| Pain           | 140            | 24.3                     | 20.0                   | 0.56 | 0.67                 | 0.87                                |
| Other symptoms | 140            | 27.2                     | 26.4                   | 0.43 | 0.60                 | 0.86                                |
| Patient anxiety| 140            | 23.6                     | 30.0                   | 0.37 | 0.56                 | 0.83                                |
| Family anxiety | 137            | 49.6                     | 46.0                   | 0.28 | 0.37                 | 0.72                                |
| Information    | 135            | 12.6                     | 13.4                   | 0.39 | 0.36                 | 0.79                                |
| Support        | 135            | 10.4                     | 14.1                   | 0.22 | 0.32                 | 0.79                                |
| Life worthwhile| 133            | 13.6                     | 16.5                   | 0.43 | 0.54                 | 0.82                                |
| Self worth     | 132            | 15.9                     | 23.5                   | 0.37 | 0.53                 | 0.82                                |
| Wasted time    | 135            | 5.9                      | 6.7                    | 0.33 | 0.32                 | 0.95                                |
| Personal affairs| 129           | 7.8                      | 13.2                   | 0.42 | 0.49                 | 0.96                                |
Table 16.5 Injury Severity Scale (ISS) scores given from case notes by two experienced trauma clinicians to 16 patients in a major trauma unit. Reproduced from BMJ, 307, 906–9, by permission of BMJ Publishing Group

| Case no.   | 1 | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|------------|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| Observer 1 | 9 | 14 | 29 | 17 | 34 | 17 | 38 | 13 | 29 | 4  | 29 | 25 | 4  | 16 | 25 | 45 |
| Observer 2 | 9 | 13 | 29 | 17 | 22 | 14 | 45 | 10 | 29 | 4  | 25 | 34 | 9  | 25 | 8  | 50 |
Exercise 16.1 Do the highest and the lowest levels of agreement in Table 16.4 coincide with the highest and lowest levels of correlation? Will this always be the case?
Exercise 16.2 Table 16.5 is from a study in a major trauma unit into the variation between two experienced trauma clinicians in assessing the degree of injury of 16 patients from their case notes (Zoltie et al. 1993). The table shows the Injury Severity Scale (ISS) score awarded to each patient.2 Categorise the scores into two groups: ISS scores of less than 16, and of 16 or more. Express the results in a contingency table, and calculate: (a) the observed and expected proportional agreement; (b) kappa. Comment on the level of agreement.
A limitation of kappa is that it is sensitive to the proportion of subjects in each category (i.e. to prevalence), so caution is needed when comparing kappa values from different studies – these are only helpful if prevalences are similar. Moreover, Cohen’s kappa as described above is only appropriate for nominal data, as in the sectioning example above, although most data can be ‘nominalised’, like the ISS values above. In the next paragraph, however, I describe, briefly, a version of kappa which can handle ordinal data.
Measuring agreement with ordinal data – weighted kappa
The idea behind weighted kappa is best illustrated by referring back to the data in Table 16.5. The two clinicians’ ISS scores agree for only five patients, so the observed proportional agreement is only 5/16 = 0.3125 (31.25 per cent). However, several of the score pairs are ‘near misses’; patient 2, for example, has scores of 14 and 13. Other pairs of scores are further apart; patient 15 is given scores of 25 and 8! Weighted kappa gives credit for near misses, but its calculation is too complex for this book.
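Although the hand calculation is beyond the scope of the book, a short sketch shows the idea, assuming the common choice of linear weights (disagreements penalised in proportion to how far apart the two categories are); other weighting schemes exist:

```python
from collections import Counter

def weighted_kappa(ratings, k):
    """Linear-weighted kappa for ordinal categories 0..k-1.

    ratings: list of (rater1, rater2) category pairs.
    A 'near miss' attracts a smaller penalty than a wide miss.
    """
    n = len(ratings)
    obs = Counter(ratings)                # observed pair counts
    row = Counter(a for a, _ in ratings)  # rater 1 marginal totals
    col = Counter(b for _, b in ratings)  # rater 2 marginal totals

    w = lambda i, j: abs(i - j) / (k - 1)  # linear disagreement weights
    sum_w_obs = sum(w(i, j) * obs[(i, j)]
                    for i in range(k) for j in range(k))
    sum_w_exp = sum(w(i, j) * row[i] * col[j] / n
                    for i in range(k) for j in range(k))
    return 1 - sum_w_obs / sum_w_exp

# With only two categories the weights reduce to 0/1, so we recover
# ordinary kappa: coding Y = 0, N = 1 for the sectioning data of
# Table 16.1 gives 0.40 again.
pairs = [(0, 0), (0, 1), (1, 1), (0, 0), (1, 1),
         (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
print(round(weighted_kappa(pairs, 2), 2))  # 0.4
```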
Measuring the agreement between two metric continuous variables
When it comes to measuring agreement between two metric continuous variables the obvious problem is the large number of possible values – it’s quite possible that none of them will be the same. One solution is to use a Bland–Altman chart (Bland and Altman 1986). This involves plotting, for each pair of measurements, the difference between the two scores (on the vertical axis) against the mean of the two scores (on the horizontal axis).

² The ISS is used for the assessment of severity of injury, with a range from 0 to 75. ISS scores of 16 or above indicate potentially life-threatening injury, and survival with ISS scores above 51 is considered unlikely.
A pair of tramlines, called the 95 per cent limits of agreement, are drawn a distance of two sd above and below the zero difference line (where sd = the standard deviation of the differences). If all of the points on the graph fall between the tramlines, then agreement is ‘acceptable’, but the more points there are outside the tramlines, the less good the agreement. Moreover, the spread of the points should be reasonably horizontal, indicating that the differences are not increasing (or decreasing) as the values of the two variables increase.

[Figure: Bland–Altman chart. Vertical axis: HP − ABPM (DBP; mmHg), from −30 to 30; horizontal axis: (HP + ABPM)/2 (DBP; mmHg), from 60 to 120. Annotations: ‘The 95 % limits of agreement are drawn 2 s.d.s either side of the zero difference line (s.d. = standard deviation of the difference between the two measures) ... and for reasonable agreement, most of the points should lie between them.’]

Figure 16.1 A Bland–Altman chart to measure agreement between two metric continuous variables; diastolic blood pressure as measured by patients at home with a cuff-measuring device (HP), and as measured by the same patients using an ambulatory device (ABPM). Reproduced from Brit. J. General Practice, 48, 1585–9, courtesy of the Royal College of General Practitioners
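The arithmetic behind the limits of agreement is simple enough to sketch, here on made-up paired measurements (not the blood pressure data from the study below):

```python
from statistics import mean, stdev

# Hypothetical paired measurements of the same quantity by two methods
method_a = [82, 90, 76, 88, 95, 70, 84, 79, 91, 86]
method_b = [80, 93, 75, 85, 96, 72, 83, 81, 90, 84]

diffs = [a - b for a, b in zip(method_a, method_b)]         # y-axis of the chart
means = [(a + b) / 2 for a, b in zip(method_a, method_b)]   # x-axis of the chart

bias = mean(diffs)   # the horizontal line through the middle of the chart
sd = stdev(diffs)    # standard deviation of the differences

# The 95 per cent 'tramlines', two sd either side of the mean difference
lower, upper = bias - 2 * sd, bias + 2 * sd

# For reasonable agreement, most points should lie between the tramlines
outside = sum(d < lower or d > upper for d in diffs)
print(f"bias = {bias:.2f}, limits = ({lower:.2f}, {upper:.2f}), outside = {outside}")
```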
An example from practice
The idea is illustrated in Figure 16.1, for agreement between two methods of measuring diastolic blood pressure (Brueren et al. 1998). In this example, there are only a few points outside the ±2 standard deviation tramlines and the spread of points is broadly horizontal. We would assess this chart as suggesting reasonably good agreement between the two methods of blood pressure measurement.
The continuous horizontal line across the middle of the chart represents the mean of the differences between the two measures. Note that this is below the zero mark, indicating some bias in the measures: it looks as if the ABPM values are, on the whole, greater than the HP values.
To sum up, two variables that are in reasonable agreement will be strongly associated, but the opposite is not necessarily true. The two measures are not equivalent; association does not measure agreement.
VIII
Getting into a Relationship