
4

Describing data from its shape

Learning objectives

When you have finished this chapter you should be able to:

Explain what is meant by the ‘shape’ of a frequency distribution.

Sketch and explain: negatively skewed, symmetric and positively skewed distributions. Sketch and explain a bimodal distribution.

Describe the approximate shape of a frequency distribution from a frequency table or chart.

Sketch and describe a Normal distribution.

The shape of things to come

I have said previously that the choice of the most appropriate procedures for summarising and analysing data will depend on the type of variable involved. Variable type is the most important consideration. In addition, however, the way the data are distributed – the shape of the distribution – can also be influential. By ‘shape’ I mean:

Are the values fairly evenly spread throughout their possible range? This is a uniform distribution.

Medical Statistics from Scratch, Second Edition David Bowers

© 2008 John Wiley & Sons, Ltd


Are most of the values concentrated towards the bottom of the range, with progressively fewer values towards the top of the range? This is a right or positively skewed distribution. . .

. . . or towards the top of the range, with progressively fewer values towards the bottom of the range? This is a left or negatively skewed distribution.

Do most of the values clump together around one particular value, with progressively fewer values both below and above this value? This is a symmetric or mound-shaped distribution.

Do most of the values clump around two or more particular values? This is a bimodal or multimodal distribution.

One simple way to assess the shape of a frequency distribution is to plot a bar chart, or a histogram. Here are some examples of the shapes described above.

Negative skew¹

Figure 4.1 shows the age distribution of 2454 patients with acute pulmonary embolism, drawn from 52 hospitals in seven countries (Goldhaber et al. 1999). You can see that most values lie towards the top end of the range, with progressively fewer lower values. This distribution is negatively skewed.

Exercise 4.1 In Figure 4.1, which age group has: (a) the highest number of patients? (b) the lowest number?

Positive skew

The histogram in Figure 4.2 shows serum E2 levels from a study of hormone replacement therapy for osteoporosis prevention (Rodgers and Miller 1999). This distribution has most of its values in the lower end of the range with progressively fewer towards the upper end. There is a single high valued outlier. This distribution is positively skewed.

Exercise 4.2 In Figure 4.2, if the outlier were removed, would the distribution be less or more skewed?

¹ Skewness is the primary measure used to describe the asymmetry of frequency distributions, and many computer programs will calculate a skewness coefficient for you. Negative values indicate negative skew and positive values indicate positive skew; as a rough guide, values beyond about –1 or +1 suggest strong skew. Values of zero, or close to it, indicate little skew, but do not necessarily mean that the distribution is symmetric.
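A minimal sketch of how a skewness coefficient might be calculated, assuming the moment-based definition (mean cubed deviation divided by the cube of the standard deviation). Statistical programs use several slightly different formulas, so their output may not match this exactly. The function name and the three small data sets are invented for illustration only.

```python
from statistics import fmean

def skewness(values):
    """Moment coefficient of skewness (population form).

    Mean cubed deviation from the mean, divided by the
    standard deviation cubed.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Hypothetical data, not taken from any of the studies in this chapter
long_upper_tail = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
long_lower_tail = [-x for x in long_upper_tail]  # mirror image
symmetric = [1, 2, 3, 4, 5]

print(skewness(long_upper_tail))   # positive: positively skewed
print(skewness(long_lower_tail))   # negative: negatively skewed
print(skewness(symmetric))         # zero for this symmetric set
```

The sign of the result gives the direction of the skew, and its size gives a rough idea of how pronounced the skew is.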

[Figure 4.1: bar chart. Vertical axis: number of patients (0 to 600); horizontal axis: age (years), in groups <15, 15–29, 30–39, 40–49, 50–59, 60–69, 70–79 and >80. The bars are labelled with percentages ranging from 0.2% to 25%.]

Figure 4.1 An example of negative skew. The age distribution of 2454 patients with acute pulmonary embolism. Reproduced with permission from Elsevier (The Lancet, 1999, Vol. 353, pp. 1386–9)

[Figure 4.2: histogram. Vertical axis: no. of patients (0 to 12); horizontal axis: serum E2 (pmol/l), 100 to 500.]

Figure 4.2 An example of positive skew. Serum E2 levels in 45 patients in a study of HRT for the prevention of osteoporosis. Reproduced with permission of the British Journal of General Practice (1997, Vol. 47, pages 161–165)

[Figure 4.3: chart. Vertical axis: rates per 1000 women (0 to 250); horizontal axis: age group, in five-year bands from 30–34 to 85–89, then 90+.]

Figure 4.3 Histogram of mammography utilisation rates (per 1000 women), by broad age group, in 33 health districts in Ontario. Reproduced from J. Epid. Comm. Health, 51, 378–82, courtesy of BMJ Publishing Group

Symmetric or mound-shaped distributions

The bar chart in Figure 4.3 is from a study into the use of the mammography service by women in the 33 health districts of Ontario, from mid-1990 to end-1991 (Goel et al. 1997). It shows the variation in the utilisation rates² by women for a number of age groups. You can see that the distribution is reasonably symmetric and mound-shaped, and has only one peak.

Exercise 4.3 (a) What sort of skew is exhibited by the Apache scores in Figure 3.5? (b) The simple bar chart in Figure 4.4 is from a study describing the development of a new scale to measure psychiatric anxiety, called the Psychiatric Symptom Frequency scale (PSF) (Lindelow et al.). Describe the shape of the distribution of PSF in terms of symmetry, skewness, etc. Does this chart tell the whole story?

Exercise 4.4 Comment on the shapes of the age distributions shown in Table 3.2, for male and female suicide attempters, and later succeeders (you may also want to look at the histograms you drew in Exercise 3.9).

² The utilisation rate is the number of consultations per 1000 women.

[Figure 4.4: bar chart. Vertical axis: frequency (0 to 300); horizontal axis: PSF score (0 to 35).]

Figure 4.4 Simple bar chart showing the lowest 95 per cent of values of the Psychiatric Symptom Frequency scale. Reproduced from J. Epid. Comm. Health, 51, 549–57, courtesy of BMJ Publishing Group

Bimodal distributions

A bimodal distribution is one with two distinct humps. These are less common than the shapes described above, and are sometimes the result of two separate distributions that have not been disentangled. Figure 4.5 shows a hypothetical bimodal distribution of systolic blood pressure. The upper peak could be due to a sub-group of hypertensive patients whose presence in the group has not been separately identified.

[Figure 4.5: histogram. Vertical axis: frequency (no. women), 0 to 10; horizontal axis: systolic blood pressure (mmHg), in groups 110–119, 120–129, 130–139, 140–149 and 150–159.]

Figure 4.5 A bimodal frequency distribution


Normal-ness

There is one particular symmetric bell-shaped distribution, known as the Normal distribution, which has a special place in the heart of statisticians.³ Many human clinical features are distributed Normally, and the Normal distribution has a very important role to play in what is to come later in this book.

An example from practice

Figure 4.6 shows a histogram of the cord platelet count (× 10⁹/l) in 4382 Finnish infants, from a study of the prevalence and causes of thrombocytopenia⁴ in full-term infants (Sainio et al. 2000). You can see, even without the help of the Normal curve superimposed upon it, that the distribution has a very regular, symmetric bell shape – in fact it is pretty well as Normal as it gets with real data.
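To give a feel for what a bell shape implies, the sketch below takes the mean (308) and SD (69) printed on Figure 4.6 and asks how much of a Normal curve with those values lies close to the mean. It assumes the curve is exactly Normal, so the numbers are properties of the curve, not results reported by the study. The helper name `share_within` is invented for illustration.

```python
from statistics import NormalDist

# Mean and SD as printed on Figure 4.6 (units: x 10^9 per litre).
# Assumption: the counts follow this Normal curve exactly.
cord_platelets = NormalDist(mu=308, sigma=69)

def share_within(dist, k):
    """Proportion of a Normal curve lying within k SDs of its mean."""
    upper = dist.mean + k * dist.stdev
    lower = dist.mean - k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

print(round(share_within(cord_platelets, 1), 3))  # 0.683
print(round(share_within(cord_platelets, 2), 3))  # 0.954
```

In other words, under a Normal curve most values clump around the mean, with progressively fewer values at equal distances above and below it.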

Although the Normal distribution is one of the most important in a health context, you may also encounter the binomial and Poisson distributions. As an example of the former, suppose you need to choose a sample of 20 patients from a very large list of patients, which contains equal numbers of males and females. The chance of choosing a male patient is thus 1 in 2. Provided that the probability of picking a male patient each time remains fixed at 1 in 2, the binomial equation will tell you the probability of getting any given number of males (or females) in your 20 selected patients. For example, the probability of getting eight males in a sample of 20 patients is 0.1201 – about 12 chances in 100.
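The figure of 0.1201 can be checked with a short calculation. This is a minimal sketch of the standard binomial probability formula; the function name `binomial_pmf` is chosen here for illustration and does not come from the book.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent picks,
    each with the same success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Eight males in a sample of 20, with a 1 in 2 chance each time
prob_eight_males = binomial_pmf(8, 20, 0.5)
print(round(prob_eight_males, 4))  # 0.1201
```

Changing the first argument gives the probability for any other number of males, and the probabilities for 0 to 20 males add up to 1.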

³ Note the capitalised ‘N’, to distinguish this statistical usage from that of the word ‘normal’ meaning usual, ordinary, etc.

⁴ Thrombocytopenia is deemed to exist when the cord platelet count is less than 150 × 10⁹/l. It is a risk factor for intraventricular haemorrhage and contributes to the high neurological morbidity in infants affected.

[Figure 4.6: histogram with a superimposed Normal curve. N = 4382; mean = 308 × 10⁹/l; SD = 69 × 10⁹/l. Vertical axis: 0 to 800; horizontal axis: cord platelet count (× 10⁹/l), 25 to 725.]

Figure 4.6 A Normal frequency curve superimposed on a histogram of cord platelet count (× 10⁹/l) in 4382 infants. Reproduced from Obstetrics and Gynecology, 95, 441–4, courtesy of Lippincott Williams & Wilkins

The Poisson distribution is appropriate for calculating chance or probability when events occur in a seemingly random and unpredictable fashion. It describes the probability of a given number of events occurring in a fixed period of time. For example, suppose that the average number of children with burns arriving at an Emergency Department in any given 24-hour period is 12, which is an average of 0.5 per hour. Then the Poisson equation indicates that the probability of one child with burns arriving in the next hour is about 30 in 100, and the probability of two is about 8 in 100.
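The burns example can be worked through in a few lines. This is a minimal sketch of the standard Poisson probability formula, assuming arrivals are spread evenly across the day so that 12 per 24 hours becomes 0.5 per hour. The function and variable names are chosen here for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events in a period whose
    average number of events is `rate`."""
    return rate ** k * exp(-rate) / factorial(k)

# Assumption: 12 arrivals per 24 hours means 0.5 per hour on average
hourly_rate = 12 / 24

print(round(poisson_pmf(1, hourly_rate), 3))  # 0.303
print(round(poisson_pmf(2, hourly_rate), 3))  # 0.076
```

The same function gives the chance of no arrivals in the next hour with `poisson_pmf(0, hourly_rate)`.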

To sum up so far. You have seen that you can describe the principal features of a set of data using tables and charts. A description of the shape of the distribution is also an important part of the picture. In the next chapter you will meet a way of describing data using numeric summary values.