3
Describing data with charts
Learning objectives
When you have finished this chapter you should be able to:
Choose the most appropriate chart for a given data type.
Draw pie charts, and simple, clustered and stacked bar charts.
Draw histograms.
Draw step charts and ogives.
Draw time series charts.
Interpret and explain what a chart reveals.
Picture it!
In terms of describing data, of seeing ‘what’s going on’, an appropriate chart is almost always a good idea. What ‘appropriate’ means depends primarily on the type of data, as well as on what particular features of it you want to explore. In addition, if you are writing a report, a chart will always give you an ‘impact’ factor. Finally, a chart can often be used to illustrate or explain a complex situation for which a form of words or a table might be clumsy, lengthy or otherwise inadequate. In this chapter I am going to examine some of the commonest charts available for describing data, and indicate which charts are appropriate for each type of data.
Medical Statistics from Scratch, Second Edition David Bowers
© 2008 John Wiley & Sons, Ltd
Figure 3.1 Pie chart: children receiving Malathion in nit lotion study, percentage by hair colour. Data in Table 2.1
Charting nominal and ordinal data
The pie chart
You will all know what a pie chart is, so just a few comments here. The size of each segment (slice) of a pie chart should be proportional to the frequency of the category it represents. For example, Figure 3.1 is a pie chart of hair colour for the children receiving Malathion in the nit lotion study in Table 2.1. I have chosen to display the percentage values, which are often more helpful than the raw frequencies. A disadvantage of a pie chart is that it can only represent one variable (in Figure 3.1, hair colour). You will therefore need a separate pie chart for each variable you want to chart. Moreover, a pie chart can lose clarity if it is used to represent more than four or five categories.
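As a rough illustration of the proportionality rule, the sketch below turns category frequencies into the percentage and the angle of each segment. The counts are the Malathion hair colour frequencies obtained by adding the boys and girls columns of Table 3.1, and the function name `pie_segments` is my own invention, not part of any library.

```python
def pie_segments(frequencies):
    """Return {category: (percentage, angle_in_degrees)} for a pie chart."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to a positive number")
    segments = {}
    for category, count in frequencies.items():
        share = count / total  # proportion of all observations in this category
        segments[category] = (100 * share, 360 * share)
    return segments


# Hair colour counts for the Malathion group (boys + girls, from Table 3.1).
hair_colour = {"Blonde": 15, "Brown": 49, "Red": 4, "Dark": 27}

segments = pie_segments(hair_colour)
for category, (percent, angle) in segments.items():
    print(f"{category:<7}{percent:5.1f}%  {angle:6.1f} degrees")
```

Whatever the frequencies, the percentages add up to 100 and the angles to 360 degrees, which is a handy check on a hand-drawn chart.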
Exercise 3.1 The two pie charts in Figure 3.2 are from a study to investigate the types of stroke in patients with asymptomatic internal-carotid-artery stenosis (Inzitari et al. 2000). They show the types (in percentages) of disabling and non-disabling ipsilateral strokes, among two categories of patients: those with < 60 per cent stenosis, and those with 60–99 per cent stenosis. What is the most common type of stroke in each of the two categories of stenosis? What is the second most common type?
Exercise 3.2 Sketch a pie chart for the patient satisfaction data in Table 2.4.
[Pie chart panels: <60% Stenosis; 60–99% Stenosis. Legend: Disabling cardioembolic; Disabling lacunar; Disabling large-artery; Nondisabling cardioembolic; Nondisabling lacunar; Nondisabling large-artery.]
Figure 3.2 Pie charts showing the types (by percentages) of disabling and non-disabling ipsilateral strokes, among two categories of patients, those with < 60 per cent stenosis, and those with 60–99 per cent stenosis. Reproduced from NEJM, 342, 1693–9, by permission of New England Journal of Medicine
The simple bar chart
An alternative to the pie chart for nominal data is the bar chart. This is a chart with frequency on the vertical axis and category on the horizontal axis. The simple bar chart is appropriate if only one variable is to be shown. Figure 3.3 is a simple bar chart of hair colour for the group of children receiving Malathion in the nit lotion study. Note that the bars should all be the same width, and there should be (equal) spaces between bars. These spaces emphasise the categorical nature of the data.
Figure 3.3 Simple bar chart of hair colour of children receiving Malathion in nit lotion study (data in Table 2.1)
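If you have no charting software to hand, the idea can be sketched in plain text. The snippet below is only an illustration: it turns the chart on its side (category down the page, frequency along it), the counts are again the boys plus girls totals from Table 3.1, and `text_bar_chart` is a made-up helper name.

```python
def text_bar_chart(frequencies, symbol="#"):
    """Return the chart as a list of text lines, one bar per category."""
    label_width = max(len(label) for label in frequencies)
    lines = []
    for label, count in frequencies.items():
        # Every bar is one line thick, so all bars have the same width.
        lines.append(f"{label:<{label_width}} | {symbol * count} ({count})")
        # A blank line between bars stresses that the categories are separate.
        lines.append("")
    return lines[:-1]  # drop the trailing blank line


# Hair colour counts for the Malathion group (boys + girls, from Table 3.1).
hair_colour = {"Blonde": 15, "Brown": 49, "Red": 4, "Dark": 27}

chart = text_bar_chart(hair_colour)
print("\n".join(chart))
```

The length of each bar is the frequency itself, so the bars can be compared by eye just as in Figure 3.3.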
Exercise 3.3 Use the data in Table 1.8 to sketch a simple bar chart, showing the hair colour of the children receiving d-phenothrin.
Exercise 3.4 Draw a simple bar chart for the patient satisfaction data in Table 2.4. In Exercise 3.2, you drew a pie chart for this data. Which chart do you think works best? Why?
The clustered bar chart
If you have more than one group you can use the clustered bar chart. Suppose you also know the sex of the children receiving Malathion in the above example. This gives us two sub-groups, boys and girls, with the data shown in Table 3.1.
There are two ways of presenting a clustered bar chart. Figure 3.4 shows one possibility, with hair colour categories on the horizontal axis. This arrangement is helpful if you want to compare the relative sizes of the groups within each category (e.g. redheaded boys versus redheaded girls).
Table 3.1 Frequency distribution of hair colour by sex of Malathion children in nit lotion study

| Hair colour | Frequency: Boys | Frequency: Girls |
|-------------|-----------------|------------------|
| Blonde      | 4               | 11               |
| Brown       | 29              | 20               |
| Red         | 1               | 3                |
| Dark        | 14              | 13               |

Figure 3.4 Clustered bar chart of hair colour by sex for children in Table 3.1
Alternatively, the chart could have been drawn with the categories boys and girls on the horizontal axis. This format would be more useful if you wanted to compare category sizes within each group, for example, red-haired girls compared with dark-haired girls. Which chart is more appropriate depends on what aspect of the data you want to examine.
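The two arrangements hold exactly the same frequencies; only the grouping changes. The sketch below shows this with the data from Table 3.1, printing each cluster as rows of text. The helper names `swap_grouping` and `print_clusters` are hypothetical, chosen just for this illustration.

```python
# Frequencies from Table 3.1, keyed by hair colour, then by sex.
by_colour = {
    "Blonde": {"Boys": 4, "Girls": 11},
    "Brown": {"Boys": 29, "Girls": 20},
    "Red": {"Boys": 1, "Girls": 3},
    "Dark": {"Boys": 14, "Girls": 13},
}


def swap_grouping(table):
    """Turn {outer: {inner: n}} into {inner: {outer: n}}."""
    swapped = {}
    for outer, row in table.items():
        for inner, count in row.items():
            swapped.setdefault(inner, {})[outer] = count
    return swapped


def print_clusters(table):
    """Print one cluster per outer key, with one text bar per inner key."""
    for cluster, bars in table.items():
        print(cluster)
        for bar, count in bars.items():
            print(f"  {bar:<7}{'#' * count} ({count})")
        print()


by_sex = swap_grouping(by_colour)

print_clusters(by_colour)  # layout of Figure 3.4: boys versus girls within each colour
print_clusters(by_sex)     # alternative layout: hair colours within each sex
```

In the first printout the bars for boys and girls sit side by side within each hair colour; in the second, the four hair colours sit side by side within each sex.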
Exercise 3.5 Use the data in Table 3.1 to sketch a clustered percentage bar chart showing the hair colour of children receiving Malathion and d-phenothrin. There are two possible formats. Explain why you chose the one you did.
An example from practice
The clustered bar chart in Figure 3.5 is from a study describing the development of the APACHE II scale, used to assess risk of death, and used mainly in ICUs (Knaus et al. 1985). APACHE II has a range of 0 (least risk of death) to 71 (greatest risk). Data was available on two groups of patients, one group admitted to ICU for medical emergencies, the second admitted directly to ICU following surgery. The bar chart shows the percentage death rate (vertical axis), against
[Clustered bar chart titled "APACHE II and Hospital Death: Nonoperative and Postoperative Patients". Vertical axis: Death Rate, 0.0% to 100.0%. Horizontal axis: APACHE II Score (0–4, 5–9, 10–14, 15–19, 20–24, 25–29, 30–34, 35+). Legend: Nonoperative; Postoperative.]
Figure 3.5 Clustered bar chart of APACHE II scores. Data on two groups of patients, one group admitted to ICU for medical emergencies, the second admitted directly to ICU following surgery. The vertical axis is death rate (per cent). Reproduced from Critical Care Medicine, 13, 818–29, courtesy of Lippincott Williams & Wilkins
