6 Doing it right first time – designing a study
Learning objectives
When you have finished this chapter you should be able to:
Explain what a sample is, and what the difference between study and target populations is.
Explain why it is important for a sample to be as representative of the population from which it is taken as possible.
Define a random sample, and explain what a sampling frame is.
Briefly outline what is meant by a contact sample, and by stratified and systematic samples.
Explain the difference between observational and experimental studies.
Explain the difference between matched and independent groups.
Briefly describe case-series, cross-section, cohort and case-control studies, and their limitations and advantages.
Explain the problem of confounding.
Medical Statistics from Scratch, Second Edition David Bowers
© 2008 John Wiley & Sons, Ltd
Outline the general idea of the clinical trial.
Explain the concept of randomisation, and why it is important, and demonstrate that you can use a random number table to perform a simple block randomisation.
Describe the concept of blinding, and what it is intended to achieve.
Outline and compare the design of the parallel and cross-over randomised controlled trials, and summarise their respective advantages and shortcomings.
Explain what intention-to-treat means.
Be able to choose an appropriate study design to answer some given research question.
Hey ho! Hey ho! It’s off to work we go
There are two main threads here. First, the study design question, and second, the data collection question. Study design embraces issues like:
What is the research question? What are we hypothesising?
Which variables do we need to measure?
Which is our main outcome variable (the variable we are most interested in)?
How many subjects need to be included in the study?
Who exactly are the subjects? How should we select them?
How many groups do we need?
Are we going to make some form of clinical intervention or simply observe?
Do we need a comparison group?
At what stage are we going to take measurements? Before, during, after, etc.?
How long will the study take? And so on.
Study design is a systematic way of dealing with these issues, and offers a good-practice blueprint that is applicable in almost all research situations.
Second, the data collection question. Having decided on an appropriate study design, we then have to consider the following:
How are we going to collect the data from the subjects?
How do we ensure that the sample is as representative as possible?
I want to start with the data collection question. First, though, a brief mention of what we mean by a population.
Figure 6.1 The target population, the study population and the sample. Target population: all low-birthweight babies born in the UK in 2007. Study population: all low-birthweight babies born in three maternity units in Birmingham in 2007. Sample: the last 300 babies born in these three maternity units.
Samples and populations
In clinical research, we usually study a sample of individuals who are assumed to be representative of a wider group, to whom (with a good research design and appropriate sampling) the research might apply. This wider group is known as the target population, for example ‘all low-birthweight babies born in the UK in 2007’.
It would be impossible to study every single baby in such a large target population (or every member of any population). So instead, we might choose to take a sample from a (hopefully) more accessible group. For example, ‘all low-birthweight babies born in three maternity units in Birmingham in 2007’. This more restricted group is the study population. Suppose we take as our sample the last 300 babies born in these three maternity units. What we find out from this sample we hope will also be true of the study population, and ultimately of the target population. The degree to which this will be the case depends largely on the representativeness of our sample. These ideas are shown schematically in Figure 6.1. I’ll have more to say about this process in Chapter 7.
Exercise 6.1 Explain the differences between a target population, a study population and a sample. Explain, with an example, why it is almost never possible to study every member of a population.
Sampling error
Needless to say, samples are never perfect replicas of their populations, so when we draw a conclusion about a population based on a sample, there will always be what is known as sampling error. For example, if the percentage of women in the UK population with genital chlamydia is 3.50 per cent (we wouldn’t know this of course), and a sample produces a sample percentage of 2.90 per cent, then the difference between these two values, 0.60 per cent, is the sampling error. We can never completely eliminate sampling error, since this is an inherent feature of any sample.
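The arithmetic in the chlamydia example can be sketched in a few lines of Python. The first part simply reproduces the subtraction from the text. The second part is an illustration of my own, not something taken from a real study: it assumes a made-up population of 100,000 women with a true percentage of 3.5, and draws five samples of 1,000 to show that each sample produces its own sampling error. The function names and the seed are arbitrary choices.

```python
import random

def sampling_error(population_pct, sample_pct):
    """Difference between the true population percentage and the sample percentage."""
    return round(population_pct - sample_pct, 2)

# The worked example from the text: 3.50 per cent in the population, 2.90 in the sample.
example_error = sampling_error(3.50, 2.90)

# Hypothetical population (an assumption for illustration): 100,000 women,
# of whom 3,500 (3.5 per cent) are infected. 1 = infected, 0 = not infected.
rng = random.Random(1)  # fixed seed, only so that the run is repeatable
population = [1] * 3500 + [0] * 96500

def sample_percentage(sample_size):
    """Percentage infected in one random sample drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return 100 * sum(sample) / sample_size

# Five separate samples of 1,000 women, each with its own sampling error.
errors = [sampling_error(3.50, sample_percentage(1000)) for _ in range(5)]
print(example_error, errors)
```

In practice, of course, we would not know the population percentage, so we could not calculate the sampling error directly like this. The sketch is only meant to show that the error changes from sample to sample.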
Collecting the data – types of sample
Now the data collection question. There are many books wholly dedicated to the various methods of collecting sample data. I am going to do little more than mention a couple of these methods by name. Those interested in more details of the methods referred to should consult other readily available sources.
The simple random sample and its offspring
The most important consideration is that any sample should be representative of the population from which it is taken. For example, if your population has equal numbers of male and female babies, but your sample consists of twice as many male babies as female, then any conclusions you draw are likely to be, at least, misleading. Generally, the most representative sample is a simple random sample. A simple random sample will differ from the population through chance alone.
For a sample to be truly random, every member of the population must have an equal chance of being included in the sample. Unfortunately, this is rarely possible in practice, since this would require a complete and up-to-date list (name and contact details) of, for example, every low-birthweight baby born in the UK in 2007. Such a list is called a sampling frame. In practice, compiling an accurate sampling frame for any population is hardly ever going to be feasible!
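If a sampling frame were available, drawing a simple random sample from it would be straightforward. The following is a minimal sketch, assuming a made-up frame of 5,000 baby identifiers (the identifiers, the frame size and the seed are all hypothetical). It uses `random.sample` from Python's standard library, which selects without replacement, so that no baby can be chosen twice.

```python
import random

# Hypothetical sampling frame (an assumption): one identifier per baby in the population.
sampling_frame = [f"baby_{i:04d}" for i in range(1, 5001)]

rng = random.Random(2007)  # fixed seed, only so that the example is repeatable

# Draw 300 babies without replacement; every member of the frame
# has the same chance of ending up in the sample.
simple_random_sample = rng.sample(sampling_frame, k=300)

print(len(simple_random_sample), simple_random_sample[:3])
```

The hard part in practice is not this step but the first one: building a complete and accurate sampling frame to begin with.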
This same problem applies also to two close relatives of simple random sampling – systematic random sampling, and stratified random sampling. In the former, some fixed fraction of the sampling frame is selected, say every 10th or every 50th member, until a sample of the required size is obtained. Provided there are no hidden patterns in the sampling frame, this method will produce samples as representative as a random sample. In stratified sampling, the sampling frame is first broken down into strata relevant to the study, for example men and women; or nonsmokers, ex-smokers and smokers. Then each separate stratum is sampled using a systematic sampling approach, and finally these strata samples are combined. But both methods require a sampling frame.
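The two methods just described might be sketched as follows. This is an illustration under stated assumptions rather than a definitive procedure: the frame of 1,000 people, their smoking statuses, the step of 10, and the function names `systematic_sample` and `stratified_sample` are all made up for the example. The random starting position in the systematic sample is also my own addition; the text only specifies taking every 10th or 50th member.

```python
import random

def systematic_sample(frame, step, rng):
    """Take every `step`-th member of the frame, from a random starting position."""
    start = rng.randrange(step)  # assumption: random start somewhere in the first `step` members
    return frame[start::step]

def stratified_sample(frame, stratum_of, step, rng):
    """Split the frame into strata, sample each stratum systematically, then combine."""
    strata = {}
    for member in frame:
        strata.setdefault(stratum_of(member), []).append(member)
    combined = []
    for members in strata.values():
        combined.extend(systematic_sample(members, step, rng))
    return combined

rng = random.Random(42)  # fixed seed, only so that the example is repeatable

# Hypothetical sampling frame: 1,000 people, each tagged with a smoking status.
statuses = ["non-smoker", "ex-smoker", "smoker"]
frame = [(i, rng.choice(statuses)) for i in range(1000)]

every_tenth = systematic_sample(frame, step=10, rng=rng)
by_status = stratified_sample(frame, stratum_of=lambda m: m[1], step=10, rng=rng)

print(len(every_tenth), len(by_status))
```

Notice that both functions take the sampling frame as their first argument, which is the point made in the text: without a frame, neither method can get started.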
Contact or consecutive samples
The need for an accurate sampling frame makes random sampling impractical in any realistic clinical setting. One common alternative is to take as a sample individuals who are in current or recent contact with the clinical services, such as consecutive attendees at a clinic. For example, in the study of stress as a risk factor for breast cancer (Table 1.6), the researchers took as their sample 332 women attending a clinic at Leeds General Infirmary for a breast lump biopsy.
Alternatively, researchers may study a group of subjects in situ, for example on a ward, or in some other setting. In the nit lotion study (Table 2.1), researchers took as their sample all infested children from a number of Parisian primary schools, based on the high rates of infestation in those same schools the previous year.
If your sample is not a random sample, then the obvious question is, ‘How representative is it of the population?’ And, moreover, which population are we talking about here? In the breast cancer study, if the researchers were confident that their sample of 332 women was
