Introduction to Statistics for Biomedical Engineers - Kristina M. Ropella.pdf

CHAPTER 3

Data Summary and Descriptive Statistics

We assume now that we have collected our data through the use of good experimental design. We now have a collection of numbers, observations, or descriptions, and we would like to summarize these data to make decisions, test a hypothesis, or draw a conclusion.

3.1 WHY DO WE COLLECT DATA?

The world is full of uncertainty, in the sense that there are random or unpredictable factors that influence every experimental measure we make. The unpredictable aspects of the experimental outcomes also arise from the variability in biological systems (due to genetic and environmental factors) and manufacturing processes, human error in making measurements, and other underlying processes that influence the measures being made.

Despite the uncertainty regarding the exact outcome of an experiment or occurrence of a future event, we collect data to try to better understand the processes or populations that influence an experimental outcome so that we can make some predictions. Data provide information to reduce uncertainty and allow for decision making. When properly collected and analyzed, data help us solve problems. It cannot be stressed enough that the data must be properly collected and analyzed if the data analysis and subsequent conclusions are to have any value.

3.2 WHY DO WE NEED STATISTICS?

We have three major reasons for using statistical data summary and analysis:

1. The real world is full of random events that cannot be described by exact mathematical expressions.

2. Variability is a natural and normal characteristic of the natural world.

3. We like to make decisions with some confidence. This means that we need to find trends within the variability.

10  introduction to statistics for bioMEDical engineers

3.3 WHAT QUESTIONS DO WE HOPE TO ADDRESS WITH OUR STATISTICAL ANALYSIS?

There are several basic questions we hope to address when using numerical and graphical summary of data:

1. Can we differentiate between groups or populations?

2. Are there correlations between variables or populations?

3. Are processes under control?

Finding physiological differences between populations is probably the most frequent aim of biomedical research. For example, researchers may want to know if there is a difference in life expectancy between overweight and underweight people. Or a pharmaceutical company may want to determine if one type of antibiotic is more effective in combating bacteria than another. Or a physician may wonder if diastolic blood pressure is reduced in a group of hypertensive subjects after the consumption of a pressure-reducing drug. Most often, biomedical researchers are comparing populations of people or animals that have been exposed to two or more different treatments or diagnostic tests, and they want to know if there is a difference between the responses of the populations that have received different treatments or tests. Sometimes, we draw multiple samples from the same group of subjects or experimental units. A common example is when physiological data are taken from one group of patients before and after some treatment, such as drug intake or electronic therapy. We call this type of data collection blocking in the experimental design. This concept of blocking is discussed more fully in Chapter 2.

Another question that is frequently the target of biomedical research is whether there is a correlation between two physiological variables. For example, is there a correlation between body build and mortality? Or, is there a correlation between fat intake and the occurrence of cancerous tumors? Or, is there a correlation between the size of the ventricular muscle of the heart and the frequency of abnormal heart rhythms? These types of questions involve collecting two sets of data and performing a correlation analysis to determine how well one set of data may be predicted from the other. When we speak of correlation analysis, we are referring to the linear relation between two variables and the ability to predict one set of data by modeling it as a linear function of the second set of data. Because correlation analysis quantifies only the linear relation between two processes or data sets, nonlinear relations between the two processes may not be evident. A more detailed description of correlation analysis may be found in Chapter 7.
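As a rough illustration of why nonlinear relations may go unnoticed, the following Python sketch computes the Pearson correlation coefficient for two small data sets. The choice of the Pearson coefficient is an assumption here (the text defers the details to Chapter 7), the function name pearson_r is hypothetical, and all values are made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative (made-up) values, not data from the text.
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * v + 1 for v in x]   # exact linear relation
y_quadratic = [v ** 2 for v in x]   # exact but nonlinear relation

r_linear = pearson_r(x, y_linear)
r_quadratic = pearson_r(x, y_quadratic)
print(r_linear, r_quadratic)
```

In this sketch the linear data give a coefficient of 1, whereas the quadratic data give a coefficient of 0 even though y is completely determined by x. A scatterplot of the second data set would reveal the relation that the coefficient misses.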

Finally, a biomedical engineer, particularly the engineer involved in manufacturing, may be interested in knowing whether a manufacturing process is under control. Such a question may arise if there are tight controls on the manufacturing specifications for a medical device. For example, if the engineer is trying to ensure quality in producing intravascular catheters that must have diameters between 1 and 2 cm, the engineer may collect samples of catheters from the assembly line at random intervals during the day, measure their diameters, determine how many of the catheters meet specifications, and determine whether there is a sudden change in the number of catheters that fail to meet specifications. If there is such a change, the engineer may look for elements of the manufacturing process that change over time, changes in environmental factors, or user errors. The engineer can use control charts to assess whether the processes are under control. These methods of statistical analysis are not covered in this text, but may be found in a number of references, including [3].
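The bookkeeping in the catheter example can be sketched in a few lines of Python. This is only a tally of the fraction of sampled catheters outside the 1 to 2 cm specification at each sampling time, not a control chart. The diameters, the sampling times, the function name fraction_out_of_spec, and the decision to treat the limits as inclusive are all assumptions made for illustration.

```python
LOWER_CM, UPPER_CM = 1.0, 2.0  # specification limits from the catheter example

def fraction_out_of_spec(diameters_cm):
    """Fraction of sampled diameters outside the limits (limits assumed inclusive)."""
    failures = sum(1 for d in diameters_cm if not (LOWER_CM <= d <= UPPER_CM))
    return failures / len(diameters_cm)

# Made-up diameters (cm) for three sampling times during one day.
samples = {
    "09:00": [1.4, 1.5, 1.6, 1.5, 1.3],
    "12:00": [1.5, 1.7, 1.4, 1.6, 1.5],
    "15:00": [1.9, 2.2, 2.1, 1.8, 2.3],
}

rates = {time: fraction_out_of_spec(d) for time, d in samples.items()}
for time, rate in rates.items():
    print(f"{time}: {rate:.0%} out of specification")
```

With these made-up numbers, the failure fraction jumps at the last sampling time, which is the kind of sudden change that would prompt the engineer to examine the process.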

3.4 HOW DO WE GRAPHICALLY SUMMARIZE DATA?

We can summarize data in graphical or numerical form. The numerical form is what we refer to as statistics. Before blindly applying a statistical analysis, it is always good to look at the raw data, usually in graphical form, and then use graphical methods to summarize the data in an easy-to-interpret format.

The types of graphical displays that are most frequently used by biomedical engineers include the following: scatterplots, time series, box-and-whisker plots, and histograms.

Details for creating these graphical summaries are described in [3–6], but we will briefly describe them here.

3.4.1 Scatterplots

The scatterplot simply graphs the occurrence of one variable with respect to another. In most cases, one of the variables may be considered the independent variable (such as time or subject number), and the second variable is considered the dependent variable. Figure 3.1 illustrates an example of a scatterplot for two sets of data. In general, we are interested in whether there is a predictable relationship that maps our independent variable (such as respiratory rate) into our dependent variable (such as heart rate). If there is a linear relationship between the two variables, the data points should fall close to a straight line.
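One way to check numerically whether points "fall close to a straight line" is to fit a least-squares line and inspect the residuals. The sketch below is an assumption about how one might do this, not a method prescribed by the text. The function name fit_line is hypothetical, and the respiratory-rate and heart-rate values are made up.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x

# Made-up paired measurements: respiratory rate (breaths/min), heart rate (beats/min).
resp_rate = [12, 14, 16, 18, 20]
heart_rate = [62, 66, 71, 74, 79]

slope, intercept = fit_line(resp_rate, heart_rate)
# Residual = observed value minus the value predicted by the fitted line.
residuals = [hr - (slope * rr + intercept) for rr, hr in zip(resp_rate, heart_rate)]
max_abs_residual = max(abs(r) for r in residuals)
print(slope, intercept, max_abs_residual)
```

Small residuals relative to the spread of the dependent variable suggest that a straight line describes the scatter well. Large or systematically patterned residuals suggest that it does not.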

3.4.2 Time Series

A time series is used to plot the changes in a variable as a function of time. The variable is usually a physiological measure, such as electrical activation in the brain or hormone concentration in the bloodstream, that changes with time. Figure 3.2 illustrates an example of a time series plot. In this figure, we are looking at a simple sinusoid function as it changes with time.
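A time series is simply a sequence of (time, value) pairs. The following Python sketch builds one for a sampled sinusoid. The amplitude, period, and sample spacing are assumed values chosen for illustration and are not taken from Figure 3.2.

```python
import math

AMPLITUDE = 2.0        # assumed peak amplitude (arbitrary units)
PERIOD_MS = 10.0       # assumed period in milliseconds
SAMPLE_STEP_MS = 0.5   # assumed spacing between samples in milliseconds

# Each entry pairs a sample time with the signal value at that time.
time_series = [
    (k * SAMPLE_STEP_MS,
     AMPLITUDE * math.sin(2 * math.pi * k * SAMPLE_STEP_MS / PERIOD_MS))
    for k in range(41)  # sample times from 0 to 20 ms inclusive
]

# Print the first few samples; plotting value against time gives the time series plot.
for t, value in time_series[:6]:
    print(f"t = {t:4.1f} ms, amplitude = {value:+.3f}")
```

Plotting the second element of each pair against the first, with any plotting tool, gives a time series plot of the kind described above.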


FIGURE 3.1: Example of a scatterplot. (Axes: Dependent Variable, 0 to 10, versus Independent Variable, 0 to 20.)

3.4.3 Box-and-Whisker Plots

These plots illustrate the first, second, and third quartiles as well as the minimum and maximum values of the data collected. The second quartile (Q2) is also known as the median of the data. This quantity, as defined later in this text, is the middle data point or sample value when the samples are listed in descending order. The first quartile (Q1) can be thought of as the median value of the samples that fall below the second quartile. Similarly, the third quartile (Q3) can be thought of as the median value of the samples that fall above the second quartile. Box-and-whisker plots are useful in that they highlight whether there is skew to the data or any unusual outliers in the samples (Figure 3.3).
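The five values summarized by a box-and-whisker plot can be computed directly from the description above. The Python sketch below uses the median-of-halves rule as described in the text. The function name five_number_summary is hypothetical and the sample values are made up. Note that statistical packages may use other quartile conventions and can give slightly different Q1 and Q3 values.

```python
from statistics import median

def five_number_summary(samples):
    """Return (minimum, Q1, Q2, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(samples)
    n = len(ordered)
    half = n // 2
    # Split by position; for odd n the middle sample is excluded from both halves.
    lower = ordered[:half]
    upper = ordered[half + n % 2:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

# Made-up sample values, not data from the text.
data = [7, 1, 3, 9, 5, 11, 13, 4, 40]
summary = five_number_summary(data)
print(summary)
```

For these values, the maximum lies much farther from Q3 than the minimum lies from Q1. That asymmetry is what a box-and-whisker plot makes visible as skew or a possible outlier.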

 

FIGURE 3.2: Example of a time series plot. The amplitude of the samples is plotted as a function of time. (Axes: Amplitude, -2 to 2, versus Time (msec), 5 to 20.)