8
Probability, risk and odds
Learning objectives
When you have finished this chapter you should be able to:
Define probability, explain what an event is and calculate simple probabilities.
Explain the proportional frequency approach to calculating probability.
Explain how probability can be used with the area properties of the Normal distribution.
Define and explain the idea of risk and its relationship with probability.
Calculate the risk of some outcome from a contingency table and interpret the result.
Define and explain the idea of odds.
Calculate odds from a case-control 2 × 2 table and interpret the result.
State the equation linking probability and odds and be able to calculate one given the other.
Explain what the risk ratio of some outcome is, calculate a risk ratio and interpret the result.
Explain what the odds ratio for some outcome is, calculate an odds ratio and interpret the result.
Medical Statistics from Scratch, Second Edition David Bowers
© 2008 John Wiley & Sons, Ltd
Explain why it’s not possible to calculate a risk ratio in a case-control study.
Define number needed to treat, explain its use and calculate NNT in a simple example.
Chance would be a fine thing – the idea of probability
Probability is a measure of the chance of getting some outcome of interest from some event. The event might be rolling a dice and the outcome of interest might be getting a six; or the event might be performing a biopsy with the outcome of interest being evidence of malignancy and so on. Some basic ideas about probability:
The probability of a particular outcome from an event will lie between zero and one.
The probability of an event that is certain to happen is equal to one. For example, the probability that everybody dies eventually is one.
The probability of an event that is impossible is zero. For example, the probability of throwing a seven with a normal dice is zero.
If an event has as much chance of happening as of not happening (like tossing a coin and getting a head), then it has a probability of 1/2 or 0.5.
If the probability of an event happening is p, then the probability of the event not happening is 1 – p.
Table 8.1 Frequency table showing causes of blunt injury to limbs in 75 patients
Cause of injury        Frequency (number of patients), n = 75    Proportional frequency
Falls                  46                                        0.613 (= 46/75)
Crush                  20                                        0.267
Motor vehicle crash    6                                         0.080
Other                  3                                         0.040
Calculating probability
You can calculate the probability of a particular outcome from an event with the following expression:
The probability of a particular outcome from an event is equal to the number of outcomes that favour that event, divided by the total number of possible outcomes.
To take a simple example: What is the probability of getting an even number when you roll a dice?
Total number of possible outcomes = 6 (1 or 2 or 3 or 4 or 5 or 6)
Total number of outcomes favouring the event ‘an even number’ = 3 (i.e. 2 or 4 or 6)
So probability of getting an even number = 3/6 = 1/2 = 0.5
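The dice calculation above can be sketched in a few lines of Python. This is only an illustration of the counting rule for equally likely outcomes; the function name `classical_probability` is a hypothetical helper introduced here, not something from the text.

```python
from fractions import Fraction


def classical_probability(favourable, possible):
    """Favourable outcomes divided by all possible outcomes.

    Only valid when every possible outcome is equally likely
    (a fair dice, a fair coin and so on).
    """
    favourable = set(favourable)
    possible = set(possible)
    if not favourable <= possible:
        raise ValueError("every favourable outcome must also be a possible outcome")
    return Fraction(len(favourable), len(possible))


die_faces = range(1, 7)                                      # 1, 2, 3, 4, 5, 6
even_faces = [face for face in die_faces if face % 2 == 0]   # 2, 4, 6

p_even = classical_probability(even_faces, die_faces)
print(p_even, float(p_even))  # 1/2 0.5
```

Using `Fraction` keeps the result as the exact ratio 1/2 before it is converted to the decimal 0.5.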
The above method for determining probability works well with experiments where all of the outcomes have the same probability, e.g. rolling dice, tossing a coin, etc. In the real world you will often have to use what is called the proportional frequency approach, which uses existing frequency data as the basis for probability calculations.
As an example, look at Table 8.1 (which is Table 2.3 reproduced for convenience) which shows the causes of blunt injury to limbs. I have added an extra column showing the proportional frequency (category frequency divided by total frequency). Notice that the proportional frequencies sum to one.
Exercise 8.1 Table 1.6 shows the basic characteristics of the two groups of women receiving a breast lump diagnosis in the stress and breast cancer study. What is the probability that a woman chosen at random: (a) will have had her breast lump diagnosed as (i) benign? (ii) malignant?; (b) will be post-menopausal?; (c) will have had three or more children?
Exercise 8.2 Table 1.7 is from a study of thrombotic risk during pregnancy. What is the probability (under classification 1) that a subject chosen at random will be aged: (a) less than 30?; (b) more than 29?
Now ask the question, ‘What is the probability that if you choose one of these 75 patients at random their injury will have been caused by a fall?’. The answer is the proportional frequency for the ‘fall’ category, i.e. 0.613. In other words, we can interpret proportions as equivalent to probabilities. Probability is a huge subject with many textbooks devoted to it, but for our purposes in this book we don’t really need to know any more.
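As a rough sketch of the proportional frequency approach, the short Python example below recomputes the proportional frequency column of Table 8.1 from the counts in that table. The variable names are illustrative choices made here.

```python
import math

# Frequencies from Table 8.1 (causes of blunt injury to limbs, n = 75)
injury_counts = {
    "Falls": 46,
    "Crush": 20,
    "Motor vehicle crash": 6,
    "Other": 3,
}

total = sum(injury_counts.values())  # total number of patients

# Proportional frequency = category frequency / total frequency
proportions = {cause: count / total for cause, count in injury_counts.items()}

for cause, proportion in proportions.items():
    print(f"{cause:<20} {proportion:.3f}")

# The proportional frequencies should sum to one (allowing for rounding error)
print(math.isclose(sum(proportions.values()), 1.0))
```

The value printed for Falls, 0.613, is the probability that a patient chosen at random from these 75 was injured in a fall.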
Probability and the Normal distribution
We know that if data is Normally distributed then about 95 per cent of the values will lie no further than two standard deviations from the mean (see Figure 5.5). In probability terms, we can say that there is a probability of 0.95 that a single value chosen at random will lie no further than two standard deviations from the mean. In the case of the Normally distributed birthweight data, this means that there is a probability of 0.95 that the birthweight of one of these infants chosen at random will be between 2890 g and 4398 g.
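The birthweight statement can be checked with a short Python sketch. It assumes that the interval 2890 g to 4398 g is exactly the mean plus or minus two standard deviations, so the mean and standard deviation used below are back-calculated from those two limits rather than quoted from the text.

```python
from statistics import NormalDist

# Limits quoted in the text, assumed here to be mean - 2 SD and mean + 2 SD
lower, upper = 2890, 4398

mean = (lower + upper) / 2   # midpoint of the interval: 3644 g
sd = (upper - lower) / 4     # the interval spans four SDs: 377 g

birthweight = NormalDist(mu=mean, sigma=sd)

# Probability that a randomly chosen birthweight lies between the two limits
p_within = birthweight.cdf(upper) - birthweight.cdf(lower)
print(round(p_within, 4))  # 0.9545
```

The result is slightly above 0.95 because "two standard deviations" is a convenient rounding; the interval that holds exactly 95 per cent of a Normal distribution is closer to 1.96 standard deviations either side of the mean.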
Exercise 8.3 Using the information on cord platelet count in Figure 4.6, determine the probability that one infant chosen at random from this sample will have a cord platelet count: (a) between 101 × 10^9/l and 515 × 10^9/l; (b) less than 239 × 10^9/l.
Risk
As I mentioned earlier, a risk is the same as a probability, but the former word tends to be favoured in the clinical arena. So the definition of probability given earlier applies equally here to risk. In other words, the risk of any particular outcome from an event is equal to the number of favourable outcomes divided by the total number of outcomes. Risk accordingly can vary between zero and one.
As an example, and also to re-visit the contingency table, look again at Table 6.1 from the cohort study of coronary heart disease (CHD) in adult life and the risk factor ‘weighing 18 lbs or less at one year’. The risk (or probability) that those adults who as infants weighed 18 lbs or less at one year will have CHD, is equal to the number who weighed 18 lbs or less at one year and had CHD, divided by the total number who weighed 18 lbs or less. This is equal to 4/15 = 0.2667.
Similarly, the risk (or probability) that those who weighed more than 18 lbs at one year will have CHD equals the number who weighed more than 18 lbs at one year and had CHD, divided by the total number who weighed more than 18 lbs. This is equal to 38/275 = 0.1382, which is only about half the risk of those weighing 18 lbs or less.
The risk for a single group, as described above, is also known as the absolute risk, mainly to distinguish it from relative risk, which is the risk for one group compared to the risk for some other group (which we’ll come to shortly).
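The two absolute risks quoted from Table 6.1 can be reproduced with a minimal Python sketch. The function name `absolute_risk` and the variable names are hypothetical, chosen here for illustration; the counts 4/15 and 38/275 are the ones given in the text.

```python
def absolute_risk(with_outcome, group_total):
    """Risk for a single group: number with the outcome / number in the group."""
    if group_total <= 0 or not 0 <= with_outcome <= group_total:
        raise ValueError("need 0 <= with_outcome <= group_total and group_total > 0")
    return with_outcome / group_total


# Counts quoted in the text from Table 6.1 (CHD cohort study)
risk_18lbs_or_less = absolute_risk(4, 15)      # weighed 18 lbs or less at one year
risk_over_18lbs = absolute_risk(38, 275)       # weighed more than 18 lbs at one year

print(round(risk_18lbs_or_less, 4), round(risk_over_18lbs, 4))  # 0.2667 0.1382

# How the two risks compare: the heavier group has roughly half the risk
print(round(risk_over_18lbs / risk_18lbs_or_less, 2))  # 0.52
```

The final line simply divides one absolute risk by the other to check the "about half" comparison; the formal treatment of comparing risks between groups comes later in the chapter.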
