1

First things first – the nature of data

Learning objectives

When you have finished this chapter, you should be able to:

Explain the difference between nominal, ordinal, and metric discrete and metric continuous variables.

Identify the type of a variable.

Explain the non-numeric nature of ordinal data.

Variables and data

A variable is something whose value can vary. For example, age, sex and blood type are variables. Data are the values you get when you measure¹ a variable: for example, 32 years (for the variable age), or female (for the variable sex). I have illustrated the idea in Table 1.1.

¹ I am using ‘measure’ in the broadest sense here. We wouldn’t measure the sex or the ethnicity of someone, for example. We would instead usually observe it or ask the person or get the value from a questionnaire. But we would measure their height or their blood pressure. More on this shortly.

Medical Statistics from Scratch, Second Edition David Bowers

© 2008 John Wiley & Sons, Ltd

Table 1.1 Variables and data

The variables ...    ... and the data.
                     Mrs Brown    Mr Patel    Ms Manda
Age                  32           24          20
Sex                  Female       Male        Female
Blood type           O            O           A
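The distinction in Table 1.1 can also be seen in a small data structure, where the names are the variables and the stored values are the data. The following is a minimal Python sketch; the names `patients` and `values_of` are illustrative only and do not come from the text.

```python
# Table 1.1 as a mapping: each patient maps to a record of variable -> value.
patients = {
    "Mrs Brown": {"age": 32, "sex": "Female", "blood_type": "O"},
    "Mr Patel": {"age": 24, "sex": "Male", "blood_type": "O"},
    "Ms Manda": {"age": 20, "sex": "Female", "blood_type": "A"},
}


def values_of(variable):
    """Return the data (observed values) for one variable, in table order."""
    return [record[variable] for record in patients.values()]


# The variables are the names; the data are the values observed for them.
variables = list(next(iter(patients.values())))
ages = values_of("age")

print(variables)
print(ages)
```

Here `variables` holds the three variable names, while `values_of("age")` returns the data for one of them.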

 

The good, the bad, and the ugly – types of variable

There are two major types of variable – categorical variables and metric² variables. Each of these can be further divided into two sub-types, as shown in Figure 1.1, which also summarises their main characteristics.

Type           Sub-type      Values                                                  Units
Categorical    Nominal       Values in arbitrary categories                          (no units)
Categorical    Ordinal       Values in ordered categories                            (no units)
Metric         Discrete      Integer values on proper numeric line or scale          (counted units)
Metric         Continuous    Continuous values on proper numeric line or scale       (measured units)

Figure 1.1 Types of variable
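The characteristics summarised in Figure 1.1 can be read as a short decision rule. The sketch below is one possible way to write that rule in Python, not a definitive procedure. The function name `variable_type` and its three yes/no parameters are assumptions made for illustration, and classifying height as measured is my reading of the chapter's footnotes rather than a statement in the figure.

```python
def variable_type(has_units, ordered=False, counted=False):
    """Name the type of a variable from the characteristics in Figure 1.1.

    has_units: True if the values sit on a proper numeric scale with units.
    ordered:   for variables without units, True if the categories can be
               put in a meaningful order.
    counted:   for variables with units, True if the values are counted
               (integers), False if they are measured.
    """
    if not has_units:
        return "ordinal categorical" if ordered else "nominal categorical"
    return "metric discrete" if counted else "metric continuous"


# Blood type and the Glasgow Coma Scale are the chapter's own examples.
# Height in cm is an assumed example of a measured variable.
examples = {
    "blood type": variable_type(has_units=False, ordered=False),
    "Glasgow Coma Scale score": variable_type(has_units=False, ordered=True),
    "height in cm": variable_type(has_units=True, counted=False),
}

for name, kind in examples.items():
    print(f"{name}: {kind}")
```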

Categorical variables

Nominal categorical variables

Consider the variable blood type. Let’s assume for simplicity that there are only four different blood types: O, A, B, and A/B. Suppose we have a group of 100 patients. We can first determine the blood type of each and then allocate the result to one of the four blood type categories. We might end up with a table like Table 1.2.

² You will also see metric data referred to as interval/ratio data. The computer package SPSS uses the term ‘scale’ data.

Table 1.2 Blood types of 100 patients (fictitious data)

Blood type    Number of patients (or frequency)
O             65
A             15
B             12
A/B           8

 

 

By the way, a table like Table 1.2 is called a frequency table, or a contingency table. It shows how the number, or frequency, of the different blood types is distributed across the four categories. So 65 patients have blood type O, 15 have blood type A, and so on. We’ll look at frequency tables in more detail in the next chapter.
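A frequency table is simply a count of how many observations fall into each category. As a minimal sketch, the Python below rebuilds 100 fictitious observations that match Table 1.2 and counts them with the standard library's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Rebuild 100 fictitious observations that match Table 1.2.
observations = ["O"] * 65 + ["A"] * 15 + ["B"] * 12 + ["A/B"] * 8

# Counting how often each category occurs gives the frequency table.
frequency_table = Counter(observations)

for blood_type, frequency in frequency_table.items():
    print(f"{blood_type:<6}{frequency:>4}")

# The frequencies must add up to the number of patients.
total = sum(frequency_table.values())
print("Total", total)
```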

The variable ‘blood type’ is a nominal categorical variable. Notice two things about this variable, which are typical of all nominal variables:

The data do not have any units of measurement.³

The ordering of the categories is completely arbitrary. In other words, the categories cannot be ordered in any meaningful way.⁴

For example, we could just as easily write the blood type categories as A/B, B, O, A, or B, O, A, A/B, or B, A, A/B, O, or whatever. We can’t say that being in any particular category is better, or shorter, or quicker, or longer, than being in any other category.

Exercise 1.1 Suggest a few other nominal variables.

Ordinal categorical variables

Let’s now consider another variable some of you may be familiar with – the Glasgow Coma Scale, or GCS for short. As the name suggests, this scale measures the degree of brain injury following head trauma. A patient’s Glasgow Coma Scale score is judged by their responsiveness, as observed by a clinician, in three areas: eye opening response, verbal response and motor response. The GCS score can vary from 3 (death or severe injury) to 15 (mild or no injury). In other words, there are 13 possible values or categories of brain injury.
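The count of 13 categories follows from the range of the score: 15 - 3 + 1 = 13. The short Python sketch below checks this by adding up the three component responses. The component ranges used here (eye 1 to 4, verbal 1 to 5, motor 1 to 6) are the standard Glasgow Coma Scale definitions and are an assumption on my part, since the text gives only the overall range of 3 to 15.

```python
from itertools import product

# Assumed component ranges (standard GCS definitions, not given in the text).
EYE = range(1, 5)      # eye opening response: 1 to 4
VERBAL = range(1, 6)   # verbal response: 1 to 5
MOTOR = range(1, 7)    # motor response: 1 to 6

# Every total obtainable by adding one value from each component.
totals = sorted({e + v + m for e, v, m in product(EYE, VERBAL, MOTOR)})

lowest, highest = totals[0], totals[-1]
number_of_categories = len(totals)

print(lowest, highest, number_of_categories)
```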

Imagine that we determine the Glasgow Coma Scale scores of the last 90 patients admitted to an Emergency Department with head trauma, and we allocate the score of each patient to one of the 13 categories. The results might look like the frequency table shown in Table 1.3.

³ For example, cm, or seconds, or ccs, or kg, etc.

⁴ We are excluding trivial arrangements such as alphabetic.

Table 1.3 A frequency table showing the (hypothetical) distribution of 90 Glasgow Coma Scale scores

Glasgow Coma Scale score    Number of patients
3                           8
4                           1
5                           6
6                           5
7                           5
8                           7
9                           6
10                          8
11                          8
12                          10
13                          12
14                          9
15                          5

 

 

 

 

 

 

The Glasgow Coma Scale is an ordinal categorical variable. Notice two things about this variable, which are typical of all ordinal variables:

The data do not have any units of measurement (the same as for nominal variables).

The ordering of the categories is not arbitrary as it was with nominal variables. It is now possible to order the categories in a meaningful way.

In other words, we can say that a patient in the category ‘15’ has less brain injury than a patient in category ‘14’. Similarly, a patient in the category ‘14’ has less brain injury than a patient in category ‘13’, and so on.

However, there is one additional and very important feature of these scores (or any other set of ordinal scores). Namely, the difference between any pair of adjacent scores is not necessarily the same as the difference between any other pair of adjacent scores.
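One way to see this is to relabel the categories with any other set of increasing numbers. The order of the patients is unchanged, but the gaps between adjacent categories are not, which suggests that only the order carries information. The Python sketch below illustrates the idea under stated assumptions: the four patient scores and the squared relabelling in `recode` are hypothetical choices made purely for illustration.

```python
# Hypothetical Glasgow Coma Scale scores for four patients.
scores = [13, 15, 6, 14]

# A hypothetical order-preserving relabelling of the 13 categories:
# score 3 -> 1, 4 -> 4, 5 -> 9, ..., 15 -> 169 (position squared).
recode = {s: i * i for i, s in enumerate(range(3, 16), start=1)}


def ranking(values):
    """Return patient positions ordered from lowest to highest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


recoded = [recode[s] for s in scores]

# The ordering of the patients survives the relabelling...
same_order = ranking(scores) == ranking(recoded)

# ...but the gaps between adjacent categories do not.
gaps_original = (14 - 13, 15 - 14)
gaps_recoded = (recode[14] - recode[13], recode[15] - recode[14])

print(same_order, gaps_original, gaps_recoded)
```

Because both labellings rank the patients identically, neither set of gaps can be treated as the true distance between categories.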

For example, the difference in the degree of brain injury between Glasgow Coma Scale scores of 5 and 6, and scores of 6 and 7, is not necessarily the same. Nor can we say that a patient with a score of, say, 6 has exactly twice the degree of brain injury as a patient with a score of 12. The direct consequence of this is that ordinal data are not real numbers. They cannot be placed on the number line.⁵ The reason is, of course, that the Glasgow Coma Scale data, and

⁵ The number line can be visualised as a horizontal line stretching from minus infinity on the left to plus infinity on the right. Any real number, whether negative or positive, decimal or integer (whole number), can be placed somewhere on this line.