CH 3 DESCRIBING DATA WITH CHARTS

bands of the APACHE II score. Quite clearly, for those less severely ill, percentage mortality among the medical emergency group is noticeably higher than among the post-operative group. For those patients classified as the most severely ill (scores of 35+), the situation is reversed.

Figure 3.6 A stacked bar chart of hair colour by sex

The stacked bar chart

Figure 3.6 shows a stacked bar chart for the same hair colour and sex data shown in Table 3.1. Instead of appearing side by side, as in the clustered bar chart of Figure 3.5, the bars are now stacked on top of each other.1 Stacked bar charts are appropriate if you want to compare the total number of subjects in each group (total number of boys and girls for example), but not so good if you want to compare category sizes between groups, e.g. redheaded girls with redheaded boys.
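To see why a stacked chart favours comparisons of totals, it can help to work out where each segment of a bar starts and ends. The sketch below is only an illustration: the counts and the colour names are made up (they are not the values in Table 3.1), and `stacked_segments` is a hypothetical helper name.

```python
# Hypothetical counts for illustration only; these are NOT the values in Table 3.1.
counts = {
    "Boys":  {"Black": 4, "Blonde": 6, "Brown": 9, "Red": 1},
    "Girls": {"Black": 5, "Blonde": 8, "Brown": 6, "Red": 2},
}


def stacked_segments(categories):
    """Return (category, bottom, top) for each segment of one stacked bar."""
    segments = []
    bottom = 0
    for name, count in categories.items():
        segments.append((name, bottom, bottom + count))
        bottom += count  # the next segment starts where this one ends
    return segments


for group, categories in counts.items():
    segments = stacked_segments(categories)
    total = segments[-1][2]  # top of the last segment is the full bar height
    print(group, "total:", total, segments)
```

Only the first segment in each bar sits on the common baseline. Every other segment starts at a different height in each bar, which is why the totals are easy to compare but the individual categories are not.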

Exercise 3.6 Draw a stacked bar chart showing the same data as in Figure 3.6, but grouped by hair colour (i.e. hair colour on the horizontal axis).

Charting discrete metric data

We can use bar charts to graph discrete metric data in the same way as with ordinal data.2

1 We could, alternatively, have used four columns for the four colour categories, with two groups per column (boys and girls). As with the clustered bar chart, the most appropriate arrangement depends on what aspects of the data you want to compare.

2 In theory we should represent the discrete metric values with vertical lines and not bars, since they are ‘point’ values, but most common computer statistics packages don’t offer this facility.

[Figure 3.7 axes: vertical, Number of schools (n = 37), scale 0 to 25; horizontal, Number of cases per school, scale 0 to 23]

Figure 3.7 Bar chart used to represent discrete metric data on numbers of measles cases in 37 schools. Reproduced from Amer. J. Epid., 146, 881–2, courtesy of OUP

An example from practice

Figure 3.7 is an example of a bar chart used to present numbers of measles cases (discrete metric data), in 37 schools in Kentucky in a school year (Prevots et al. 1997).

Exercise 3.7 What does Figure 3.7 tell you about the distribution of measles cases in these 37 schools?

Charting continuous metric data

The histogram

A continuous metric variable can take a very large number of values, so it is usually impractical to plot them without first grouping the values. The grouped data is plotted using a frequency histogram, which has frequency plotted on the vertical axis and the groups of values (the class intervals) on the horizontal axis.

A histogram looks like a bar chart but without any gaps between adjacent bars. This emphasises the continuous nature of the underlying variable. If the groups in the frequency table are all of the same width, then the bars in the histogram will also all be of the same width.3 Figure 3.8 shows a histogram of the grouped birthweight data in Table 2.6.
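If you have no charting program to hand, a rough histogram can be printed as text. The sketch below uses the grouped birthweight frequencies as they are listed in Table 3.3; the function name `text_histogram` and the choice of `#` as the bar symbol are my own assumptions, not anything a statistics package provides.

```python
# Grouped birthweight frequencies, as listed in Table 3.3.
groups = [
    ("2700-2999", 2),
    ("3000-3299", 3),
    ("3300-3599", 9),
    ("3600-3899", 9),
    ("3900-4199", 4),
    ("4200-4499", 3),
]


def text_histogram(rows, symbol="#"):
    """Return one line per group; the bar length equals the group frequency."""
    width = max(len(label) for label, _ in rows)
    return [f"{label:<{width}} {symbol * freq} ({freq})" for label, freq in rows]


for line in text_histogram(groups):
    print(line)
```

The bars run sideways rather than upwards, but the shape of the distribution, with most infants in the two middle groups, is the same as in a drawn histogram.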

One limitation of the histogram is that it can represent only one variable at a time (like the pie chart), and this can make comparisons between two histograms difficult, because, if you try to plot more than one histogram on the same axes, invariably parts of one chart will overlap the other.

3 But if one group is twice as wide as the others, then the height of its bar must be half its frequency, and so on, so that the area of each bar stays proportional to the frequency.
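The adjustment for unequal group widths can be sketched in a few lines. This is a minimal illustration under my own assumptions: the grouped data are invented, and `bar_heights` and `standard_width` are hypothetical names.

```python
def bar_heights(groups, standard_width):
    """Scale each frequency by (standard width / group width)."""
    heights = []
    for lower, upper, frequency in groups:
        width = upper - lower
        heights.append(frequency * standard_width / width)
    return heights


# Hypothetical grouped data: (lower limit, upper limit, frequency).
# The last group is twice as wide as the other two.
example = [(0, 10, 4), (10, 20, 6), (20, 40, 8)]
print(bar_heights(example, standard_width=10))  # [4.0, 6.0, 4.0]
```

The last group has a frequency of 8 but is drawn with a height of 4, because it is twice as wide as the standard group.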

[Figure 3.8 axes: vertical, Number of infants, scale 0 to 10; horizontal, Birthweight (g), in groups 2700-2999, 3000-3299, 3300-3599, 3600-3899, 3900-4199, 4200-4499]

Figure 3.8 Histogram of the grouped birthweight data in Table 2.6

Exercise 3.8 The histogram in Figure 3.9 is from the British Regional Heart Study and shows the serum potassium levels (mmol/l) of 7262 men aged 40–59 not receiving treatment for hypertension (Wannamethee et al. 1997). Comment on what the histogram reveals about serum potassium levels in this sample of 7262 British men.

Exercise 3.9 The grouped age data in Table 3.2 is from a study to identify predictive factors for suicide, and shows the age distribution by sex of 974 subjects who attempted suicide unsuccessfully, and those among them who were later successful (Nordentoft et al. 1993). Sketch separate percentage histograms of age for the male attempters and for the later succeeders. Comment on what the charts show.

 

[Figure 3.9 axes: vertical, No. of men, scale 0 to 800; horizontal, Serum potassium (mmol/l), from < 3.5 to 5.8]

Figure 3.9 Histogram of the serum potassium levels of 7262 British men aged 40–59 years. Reproduced from Amer. J. Epid., 145, 598–607, courtesy of OUP

Table 3.2 Grouped age data from a follow-up cohort study to identify predictive factors for suicide. Reproduced from BMJ, 1993, 306, 1637–1641, by permission of BMJ Publishing Group

 

                  No (%) attempting suicide           No (%) later successful
                  Men (n = 412)   Women (n = 562)     Men (n = 48)   Women (n = 55)
Age (years)
15–24              57 (13.8)       80 (14.2)           3 (6.3)        3 (5.5)
25–34             131 (31.8)      132 (23.5)          10 (20.8)      12 (21.8)
35–44             103 (25.0)      146 (26.0)          16 (33.3)      16 (29.1)
45–54              62 (15.0)       90 (16.0)          11 (22.9)       9 (16.4)
55–64              38 (9.2)        58 (10.3)           4 (8.3)        4 (7.3)
65–74              18 (4.4)        43 (7.7)            3 (6.3)        8 (14.5)
75–84               1 (0.2)        11 (2.0)            0              2 (3.6)
>85                 2 (0.5)         2 (0.4)            1 (2.1)        1 (1.8)
Living alone       96 (23.3)       85 (15.1)          17 (35.4)      14 (25.5)
Employed          139 (33.7)      185 (32.9)          14 (29.2)      13 (23.6)

Charting cumulative data

The step chart

You can chart cumulative ordinal data or cumulative discrete metric data (data for both types of variables are integers) with a step chart. In a step chart the total height of each step above the horizontal axis represents the cumulative frequency, up to and including that category or value. The height of each individual step is the frequency of the corresponding category or value.
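The relationship between the frequencies and the steps can be sketched as follows. The scores and frequencies are made up purely for illustration, and `step_points` is a hypothetical helper that returns the corner points you would join up to draw the chart.

```python
from itertools import accumulate

# Hypothetical frequencies for an integer-valued score; made up for illustration.
scores = [3, 4, 5, 6, 7]
frequencies = [1, 2, 4, 6, 3]

# Running total up to and including each score.
cumulative = list(accumulate(frequencies))


def step_points(values, cum):
    """Return the (x, y) corner points of a step chart that rises at each value."""
    points = []
    previous = 0
    for value, height in zip(values, cum):
        points.append((value, previous))  # bottom of the riser
        points.append((value, height))    # top of the riser
        previous = height
    return points


print(cumulative)
print(step_points(scores, cumulative))
```

Each riser starts at the previous cumulative total and ends at the new one, so its length is the frequency of that score, and its top is the cumulative frequency.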

An example from practice

Figure 3.10 is a step chart of the cumulative rate of suicide (number per 1000 of the population), in 152 Swedish municipalities, taken from a study into the use of calcium channel blockers (prescribed for hypertension) and the risk of suicide (Lindberg et al. 1998). So, for example, in year 4 the suicide rate per 1000 of the population was (7 − 5.2) = 1.8 (the approximate height of the step). And over the course of the first four years, the cumulative suicide rate had risen to seven per thousand. You can produce step charts for numeric ordinal data, such as cumulative Apgar scores, in exactly the same way, although not, as far as I am aware, with Word or Excel, or with SPSS or Minitab.

[Figure 3.10 axes: vertical, Suicide rate per 1000 population, scale 0 to 10; horizontal, Follow up (years), scale 0 to 7]

Figure 3.10 A step chart of the cumulative rate of suicide (number per 1000 of the population) in 152 Swedish municipalities. 617 users (continuous line) and 2780 non-users (dotted line). Reproduced from BMJ, 316, 741–5, courtesy of BMJ Publishing Group

Table 3.3 Cumulative and relative cumulative frequency for the grouped birthweight from the data in Table 2.6

Birthweight (g)   No of infants (frequency)   Cumulative frequency   % cumulative frequency
2700–2999          2                            2                       6.67
3000–3299          3                            5                      16.67
3300–3599          9                           14                      46.67
3600–3899          9                           23                      76.67
3900–4199          4                           27                      90.00
4200–4499          3                           30                     100.00

Exercise 3.10 Draw a step chart for the percentage cumulative Apgar scores in Table 3.3.

The cumulative frequency curve or ogive

With continuous metric data, there is assumed to be a smooth continuum of values, so you can chart cumulative frequency with a correspondingly smooth curve, known as a cumulative frequency curve, or ogive.4 If you add columns for cumulative and relative cumulative frequency to the grouped birthweight data in Table 2.6, you get Table 3.3.
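The two extra columns in Table 3.3 are easy to reproduce. The sketch below starts from the six group frequencies and builds the cumulative and percentage cumulative columns; the variable names are my own.

```python
from itertools import accumulate

# Frequencies for the six birthweight groups (2700-2999 g up to 4200-4499 g).
frequencies = [2, 3, 9, 9, 4, 3]

# Running totals, then the same totals as a percentage of all infants.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
percent_cumulative = [round(100 * c / total, 2) for c in cumulative]

print(cumulative)          # [2, 5, 14, 23, 27, 30]
print(percent_cumulative)  # [6.67, 16.67, 46.67, 76.67, 90.0, 100.0]
```

The last cumulative frequency is always the total number of subjects, and the last percentage is always 100, which is a handy check on the arithmetic.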

If you want to draw an ogive by hand, you plot, for each group or class, the group cumulative frequency value against the lower limit of the next higher group. So, for example, 16.67 is plotted against 3300, 46.67 against 3600, and so on. The points should be joined with a smooth curve.5 The result is shown in Figure 3.11. Notice that I have put a percentage cumulative frequency of zero in the imaginary group 2400–2699 g. This enables me to close the ogive at the left-hand end.

The ogive can be very useful if you want to estimate the cumulative frequency for any value on the horizontal axis which is not one of the original group values. For example, suppose you want to know what percentage of infants had a birthweight of 3750 g or less. By drawing a line vertically upwards from a value of 3750 g on the horizontal axis to the ogive, and then horizontally to the vertical axis, you can see that about 63 per cent of the infants weighed 3750 g or less. You can of course ask such questions in reverse, for example, what birthweight marks the lowest 50 per cent of birthweights? This time you would start with a value of 50 per cent on the vertical axis, move right to the ogive, then down to the value of about 3700 g on the horizontal axis.

4 The ‘g’ in ogive is pronounced as the j in ‘jive’.

5 Unfortunately, I couldn’t find a program that would allow me to join the points with a smooth curve.

[Figure 3.11 axes: vertical, % cumulative frequency, scale 0 to 100; horizontal, Birthweight (g), scale 2700 to 4500]

Figure 3.11 The relative cumulative frequency curve (or ogive) for the percentage cumulative birthweight data in Table 3.3
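Reading values off the ogive can be approximated numerically. The sketch below is not the smooth curve itself: it assumes straight lines between the plotted points (linear interpolation), so its answers will differ a little from readings taken by eye from a smooth curve. The function names are hypothetical.

```python
# Points plotted for the ogive: (birthweight in g, % cumulative frequency).
# Each percentage is plotted against the lower limit of the next higher group,
# and the curve is closed with 0 per cent at 2700 g.
ogive = [
    (2700, 0.0),
    (3000, 100 * 2 / 30),
    (3300, 100 * 5 / 30),
    (3600, 100 * 14 / 30),
    (3900, 100 * 23 / 30),
    (4200, 100 * 27 / 30),
    (4500, 100.0),
]


def interpolate(points, x):
    """Linearly interpolate y at x from (x, y) points sorted by x."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")


def percent_at(weight):
    """Percentage of infants at or below the given birthweight."""
    return interpolate(ogive, weight)


def weight_at(percent):
    """Birthweight below which the given percentage of infants fall."""
    # Swap the coordinates to read the chart 'in reverse'.
    return interpolate([(y, x) for x, y in ogive], percent)


print(round(percent_at(3750), 1))
print(round(weight_at(50)))
```

With straight lines between the points, this gives roughly 61.7 per cent at 3750 g and roughly 3633 g for the lowest 50 per cent. Swapping the coordinates works here only because the cumulative percentages rise strictly from one point to the next.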

An example from practice

Figure 3.12 shows two percentage ogives for total cholesterol concentration in two groups, taken from a study into the effectiveness of health checks conducted by nurses in primary care (Imperial Cancer Fund OXCHECK Study Group 1995).

 

[Figure 3.12 axes: vertical, Cumulative frequency (%), scale 0 to 100; horizontal, Total cholesterol (mmol/l), scale 3 to 11; curves: Control, Intervention]

Figure 3.12 Percentage cumulative frequency curves for total cholesterol concentration in two groups. Reproduced from BMJ, 310, 1099–104, courtesy of BMJ

Exercise 3.11 (a) Comment on what Figure 3.12 reveals about the cholesterol levels in the two groups. (b) Sketch percentage cumulative frequency curves for the age of the male suicide attempters and later succeeders, shown in Table 3.2. For each of the two groups, half of the subjects are older than what age?

Charting time-based data – the time series chart

If the data you have collected are from measurements made at regular intervals of time (minutes, weeks, years, etc.), you can present the data with a time series chart. Usually these charts are used with metric data, but may also be appropriate for ordinal data. Time is always plotted on the horizontal axis, and data values on the vertical axis.

 

[Figure 3.13 axes: vertical, Rate per million, scale 0 to 160; horizontal, Year, 1974 to 1999; lines: M, F]

Figure 3.13 Suicide rates for males and females aged 15–29 years in England and Wales

 

 

Table 3.4 Choosing an appropriate chart

 

 

 

 

 

 

 

 

 

 

Data type           Pie chart   Bar chart   Histogram (if grouped)   Step chart         Ogive
Nominal             yes         yes         no                       no                 no
Ordinal             no          yes         no                       yes (cumulative)   no
Metric discrete     no          yes         yes                      yes (cumulative)   yes (cumulative)
Metric continuous   no          no          yes                      no                 yes (cumulative)

An example from practice

Figure 3.13 shows the suicide rates (number of suicides per one million of population), for males and females aged 15–29 years in England and Wales, between 1974 and 1999. The contrasting patterns in the male/female rates are noticeable, more perhaps in this chart form than if shown in a table.

There is one other useful chart, the boxplot, but that will have to wait until we meet some new ideas in the next two chapters. Meanwhile Table 3.4 may help you to decide on the most appropriate chart for any given set of data.
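If you prefer to keep Table 3.4 to hand in a program, it can be written as a simple lookup. This is only a sketch of one way to encode the table; the names `CHART_CHOICES` and `charts_for` are hypothetical.

```python
# Table 3.4 encoded as a lookup: data type -> charts marked 'yes' in the table.
CHART_CHOICES = {
    "nominal": ["pie chart", "bar chart"],
    "ordinal": ["bar chart", "step chart (cumulative)"],
    "metric discrete": [
        "bar chart",
        "histogram (if grouped)",
        "step chart (cumulative)",
        "ogive (cumulative)",
    ],
    "metric continuous": ["histogram (if grouped)", "ogive (cumulative)"],
}


def charts_for(data_type):
    """Return the charts that Table 3.4 marks 'yes' for the given data type."""
    key = data_type.strip().lower()
    if key not in CHART_CHOICES:
        raise ValueError(f"unknown data type: {data_type!r}")
    return CHART_CHOICES[key]


print(charts_for("Ordinal"))
```

The lookup holds no more than the table does: it says which charts are suitable for a type of data, not which one best answers the question you are asking of it.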