Добавил:
kiopkiopkiop18@yandex.ru t.me/Prokururor I Вовсе не секретарь, но почту проверяю Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Ординатура / Офтальмология / Английские материалы / Principles Of Medical Statistics_Feinstein_2002

.pdf
Скачиваний:
0
Добавлен:
28.03.2026
Размер:
25.93 Mб
Скачать

become widespread, however, and most elementary statistical instruction, as well as a large amount of discourse in this text, still employs the traditional “parametric” statistical strategies.

Nevertheless, the revolution is happening, and you should be prepared for its eventual results. Respectable statisticians today are even proposing such heresies as, “All techniques designed before the advent of automatic computing must be abandoned.”8 Until the revolution becomes totally established, however, the traditional statistical strategies will continue to be used and published. Consequently, the traditional strategies will be discussed here to prepare you for today’s realities. The new strategies will regularly be mentioned, however, to indicate things that can not only be done today, but routinely expected within the foreseeable future.

6.5Demarcating Zones of Probability

Regardless of whether a sample is constructed empirically or theoretically, zones of probability in the distribution have a particularly important role in both statistical description and statistical inference. Descriptively, the zones are used to demarcate the “bulk” of a set of data or to form a “range of normal.” Inferentially, analogous zones lead to decisions that a particular observation is too “uncommon” or “far out” to be accepted as a regular part of the distribution. For these purposes, boundary markers must be set for zones of probability.

6.5.1Choosing Zones

In gambling, we may want to know the probabilities for a single event, such as drawing a spade in cards or tossing a 7 in dice. In statistical activities, however, we often want to know the probabilities for a zone of events or values. We would seldom ask about the probability that a single chloride value will be exactly 101, but we might regularly want to know the chances that the value will lie in a zone below 93 or above 107.

A zone almost always refers to a series of directly contiguous dimensional (or ordinal) values in a distribution. In the chloride example just cited, the selected zone would extend from 93 to 107. Sometimes, however, a zone can be a cluster of selected noncontiguous values. For example, when you make a “field” bet in tosses of two dice, the successful zone consists of any member of the items 2, 3, 4, 9, 10, 11, or 12.

6.5.2Setting Markers

One way to set markers for a zone is from the cumulative relative frequencies associated with an actual or resampled distribution. The array of frequencies could be examined to note the values that demarcate zones of either chloride measurements in Table 4.1 or resampled means in Table 6.1.

A direct check of individual frequencies can be avoided, however, if the distribution has certain mathematical characteristics, such as the distributions eponymically cited with such names as Gauss, “Student,” and Bernouilli. If we know (or can assume) that a collection of data has one of these model distributions, we can easily find markers for the desired probabilities, without resorting to specific tabulations of frequencies.

This mathematical attribute has given the eponymic model distributions their statistical popularity. To determine probabilites, we first calculate a special index, called a “test statistic,” from the observed data. We then assume, based on mathematical justifications discussed later, that the test statistic is associated with a particular mathematical model such as the standard Gaussian distribution. The value of the test statistic then becomes the marker for a zone of probability. For example, the Z-score is often used as a test statistic that has specific probability values, shown in Section 4.8.3, for each of its locations in a standard Gaussian distribution. [Other well-known test statistics are referred to a t (or “Student”) distribution or to a chi-square distribution.]

© 2002 by Chapman & Hall/CRC

The percentiles and standard deviations discussed in Chapters 4 and 5 can thus be used in two ways: descriptively, to indicate specific zones for an empirical set of data, and inferentially, to demarcate zones of probability in a mathematical model.

6.5.3Internal and External Zones

The total cumulative probability in any distribution always has a sum of 1. This total can be divided into different sets of zones. An internal zone of data contains values that are within or smaller than the demarcated boundary. An external zone contains values that are outside or beyond the demarcated boundary. For example, the ipr95 has an internal cumulative probability of .95 and an external cumulative probability of .05. At the 97.5 percentile point, the internal cumulative probability is .975 (for values of data smaller than the associated percentile value), and the external cumulative probability is .025.

6.5.4Two-Zone and Three-Zone Boundaries

A single percentile point, such as P.95, demarcates two zones — one internal and the other external. Boundaries established by two percentiles, such as the 2.5 and 97.5 percentile points that form the ipr 95, will demarcate three zones of cumulative probability: a central internal zone, surrounded by two external zones. The distinctions are shown in Figure 6.1.

FIGURE 6.1

Cumulative probability zones for ipr95. Upper drawing shows the 95 percentile point as one boundary with two zones. Lower drawing shows the 2.5 and 97.5 percentile points as two boundaries with three zones.

 

Internal Zone

= .95

External

 

Zone

 

 

 

= .05

 

 

 

P.95

External

 

 

External

Zone

Internal Zone

= .95

Zone

= .025

= .025

 

 

P.025

 

 

P.975

6.5.5P-Values and Tails of Probability

In many appraisals of probability, the focus is on the external zone, i.e., the chance of observing something more extreme than what is contained within the selected boundary. This external zone of cumulative probability is called a P-value. The external zone is often demarcated as the area located in the “tail” of the curve of relative frequencies for a mathematical model.

The cumulative aspect of the frequencies is omitted when the P-values or other cumulative results are cited as probabilities, but the cumulative feature is implicit in the word tail, which describes the location of the zone. Thus, if only two zones have been demarcated with a single boundary, the single external zone is called a one-tailed probability. A key distinction in one-tailed probabilities is stipulating a direction to indicate whether the focus of the external zone is on values that are above or below the selected boundary. If three zones have been formed from two boundaries, the two external zones are called a two-tailed probability. Two-tailed probabilities are always bi-directional, demarcating external zones that occupy both extremes of the distribution.

6.5.6Confidence Intervals and Internal Zones

For the entity called the confidence interval for a central index, the focus is on an internal zone of probability. It usually has two boundaries, set around the proposed location of the central index, for a

© 2002 by Chapman & Hall/CRC

zone within which the location might vary under circumstances discussed earlier and also later in Chapter 7.

6.5.7To-and-from Conversions for Observations and Probabilities

Observations in a set of data can readily be converted to probabilities, and vice versa. Beginning with observed (or cited) values of data, we can find probabilities; or beginning with probabilities, we can find observed values.

6.5.7.1 From Observations to Probabilities — For the data of Table 4.1, suppose we want to know the probability that a chloride value is below 93. The single cited boundary of 93 establishes a directional two-zone arrangement. For the extreme values below 93, the cumulative relative frequency is external to the boundary. For values of 93 or higher, the cumulative relative frequency is internal. In Table 4.1, the external probability for values below 93 is .0598 and the internal probability is .9402 (= l .0598).

For chloride levels bounded by values of 93 and 107, the three zones have an internal probability for the zone of 93–107, an external probability for the zone below 93, and another external probability for the zone above 107. The respective values for these three probabilities in Table 4.l are .0598, .8598, and .0804.

Note that the internal and external probabilities in a frequency distribution always have a sum of 1. This important distinction is often disregarded, however, when people make subjective estimates about probabilities for a particular choice (such as a diagnosis or murder suspect) among a series of candidates.10 The total of the probability values estimated for individual candidates may exceed 1. Other human inconsistencies in using probabilities (and magnitudes) are discussed in Section 6.8.

6.5.7.2 From Probabilities to Observed Values — To determine cumulative relative frequencies or probabilities, we read horizontally rightward in Table 4.l beginning with the observed chloride values. We could also, however, go leftward, from cumulative frequencies toward chloride values. For example, from the definition of percentiles, we know that the 25th percentile occurs where the ascending cumulative frequency (or lower external probability) passes 0.25. Table 4.1 shows that the 25th percentile occurs at the chloride level of 99. The table also shows that the 50th percentile is at the chloride level of 102, and the 75th percentile at 105. An ipr95, extending from the 2.5 to the 97.5 percentile in Table 4.l, could be marked by the chloride levels of 89 to 110.

6.6Gaussian Z-Score Demarcations

Despite all the difficulties in non-Gaussian distributions, the standard deviation is a splendid way of demarcating inner zones when a distribution is truly Gaussian. In particular, the Z-scores determined from the standard deviation of Gaussian data will correspond directly to probability values discerned from the mathematical formula for the Gaussian curve.

One of these probability values, often called a probability density, is the height of the curve at a particular value, i.e., the relative frequency of items having the individually cited Z-score. These were the probability values shown with the Gaussian formula and small table in Section 4.8.2. (In a tabulation of either dimensional or categorical data, the probability density represents the proportion or relative frequency of the distribution occurring at a particular individual value.)

A different set of probability values refers to cumulative relative frequencies rather than the focal density at a cited value of Z. These cumulative proportions were shown for one side of the curve in Table 4.3 and for both sides in the in-text table in Section 4.8.3. Because of the symmetry and versatility of the Gaussian curve, a single value of Z can be associated with probabilities that represent several different zones of cumulative relative frequencies. These cumulative relative frequencies are the areas formed under the Gaussian curve.

© 2002 by Chapman & Hall/CRC

6.6.1One-Tailed Probabilities

A percentile value demarcates a one-tailed probability for the zone of data spanned as the distribution moves upward from its lowest value.

The top part of Figure 6.2 shows the demarcation for a positive value of Z. The zone of internal probability is the percentile value associated with Z, counting from the lower end of the distribution. The rest of the distribution contains the zone of external probability. As shown in the middle part of Figure 6.2, if Z has the same but negative value, the zone of internal probability again starts from the lower end of the distribution, but is smaller than in the first situation. The zones of internal and external probability in each situation are opposite counterparts.

In many appraisals, however, the data analyst is interested in external probabilities for values of Z that are more extreme, in the same direction, than the observed value. Thus, if Z = 1.3, we want external probabilities for values of Z that are Š 1.3. If Z is 1.3, we want external probabilities for values of Z that are ≤ − 1.3. In the latter situation, therefore, the external probability would be redesignated on the lower side of the Gaussian curve, as shown in the bottom part of Figure 6.2. The upper and lower figures become mirror images around the central value where Z = 0.

If the external one-tailed probability value is called P, it corresponds to a percentile value of 1 P. For example, a Z-score of 1.0 in a Gaussian curve represents a one-tailed exterior probability (or P value) of .16. The corresponding interior probability or percentile point is 1 .16 = .84. This same result can be obtained by noting in Table 4.3 that a Z-score of 1.0 has a cumulative relative probability of .34 on the right side of the curve. Because the left side of the curve contains a cumulative probability of .50, the total will produce an interior probability of .84 for the percentile point.

 

Z

Zone of Internal

Zone of External

Probability

Probability

-Z

 

Zone of Internal

Zone of External

Probability

Probability

-Z

 

Zone of External

Corresponding Zone

Probability for

of Internal

More Extreme

Probability

Values of Z

 

FIGURE 6.2

Zones of internal and external probability for a single value of Z in a Gaussian curve.

6.6.2Two-Tailed Probabilities

The proportion of data contained in the symmetrical inner zone formed between the positive and negative values of a Gaussian Z has an inner probability. The remaining data, on the two outside portions beyond the inner zone, have a total external probability, which is called a two-tailed

Pvalue.

The distinction is shown in Figure 6.3. In this instance,

if the Z-score is ±1, the external probability is .16 on one side of the curve and .16 on the other, so that the two-tailed P value is .32. The inner zone of probability is 1 .32 =

.68. (The latter value should look familiar. It was mentioned in Section 5.3.2.2 with the statement that a zone of X ± s, i.e., one standard deviation around the mean, occupies 68% of a set of Gaussian data.)

Zone of

Internal

Probability

-Z

+Z

Zone of External

Zone of External

Probability

Probability

FIGURE 6.3

Zones of internal and external probability for the customary value of± Z in a Gaussian curve.

© 2002 by Chapman & Hall/CRC

6.6.3Distinctions in Cumulative Probabilities for Z-Scores

In special tables or in computer print-outs for a Gaussian distribution, the Z-scores and the corresponding cumulative probabilities are almost always shown as two-tailed (external) P values. These can easily be halved to denote the one-tailed P values, but the distinctions must be carefully considered if you want to determine percentiles, to find internal rather than external probabilities, or to go back and forth from Z values to P values. Table 6.3 shows the corresponding one-tailed and two-tailed cumulative probability values of Z-scores in a standard Gaussian distribution.

To find internal probabilities or percentiles, the cited external probabilities must be suitably subtracted. With a two-tailed P value, symbolized as 2P, the internal probability (for the zone of inner values) is 1 2P. For a one-tailed inner zone of probability, however, the two-tailed P value must be halved before it is subtracted from 1. Because percentile points represent one-tailed probability values, the Z-score corresponding to the 95th percentile would be 1.645, because it has an external one-tailed probability of .05.

TABLE 6.3

Cumulative Relative Frequencies and P Values Associated with Z Values in a Standard Gaussian Distribution

 

One-Tailed

 

Two-Tailed

 

Z

External Probability*

 

External Probability

Internal Probability

 

 

 

 

0

.5000

1

0

±0.500

.3090

.6170

.3830

±0.674

.2500

.5000

.5000

±1.000

.1585

.3170

.6830

±1.282

.1000

.2000

.8000

±1.500

.0670

.1340

.8660

±1.645

.0500

.1000

.9000

±1.960

.0250

.0500

.9500

±2.000

.0230

.0460

.9540

±2.240

.0125

.0250

.9750

±2.500

.0060

.0120

.9880

±2.576

.0050

.0100

.9900

±3.000

.0073

.0027

.9973

±3.290

.0005

.0010

.9990

*For positive Z values only. For negative Z values, the external probability (for values higher than Z) is 1 minus the cited result. For example, at Z = − 1.282, the one-tailed external probability is 1 .1 = .9.

6.6.3.1“Decreasing Exponential” Z/P Relationship — As Z gets larger, the external twotailed P values get smaller. Since P is an area under the curve, the Z and P values have a “decreasing exponential” rather than linear relationship in Gaussian curves. The distinctions are shown in Figure 6.4. A 19% increase in Z, from 1.645 to 1.96, will halve the two-tailed P value, from .1 to .05. A 40% decrease in Z, from 3.29 to 1.96, will produce a 50-fold increase in the two-tailed P, from .001 to .05. Consequently, whenever Z exceeds 1, small changes in Z can have a major proportionate impact on P in the zones used for critical decisions about “significance.”

If interested in one-tailed P values, we would find the Z-score associated with twice the one-tailedP. This one-tailed value for Z will be smaller, but not half as small, as the two-tailed value. Thus, for a two-tailed P of .025, Z is 2.24; but for a one-tailed P of .025, Z is 1.96, which corresponds to a twotailed P of .05. A Z-score of 1.645 has a P of .1 when two-tailed and .05 when one-tailed.

6.6.3.2Ambiguous Interpretations — These Z/P relationships are particularly noteworthy when external probabilites are cited for P values that represent statistical inferences rather than direct descriptions. In tests of statistical inference, as noted later, many investigators prefer to use one-tailed

©2002 by Chapman & Hall/CRC

Two-tailed

P Values

.65

.60

.55

.50

.45

.40

.35

.30

.25

.20

.15

.10

.05

0

.5

1.0

1.5

2.0

2.5

3.0

3.5

Z-scores

FIGURE 6.4

Relationship of 2- tailed P values and Z-Scores in a Gaussian curve.

P values because, for the same value of Z, the one-tailed P will be lower (and hence more “significant”) than the two-tailed value. A currently unresolved dispute, discussed later in Chapter 11, has arisen between advocates of the two opposing policies: to always demand 2-tailed or sometimes accept 1-tailed P values for expressing “significance.” Regardless of which policy is adopted, however, a reader will need to know how the decisions were made by the investigator (or editor).

If the inferential result is merely cited as “P < .05,” however, the reader will not know whether it is one-tailed or two-tailed. To avoid ambiguity, P values can be reported as “2P < .05” when two-tailed and “P < .05” when one-tailed. This convention has begun to appear in published medical literature, but is not yet common. Until it becomes universal, writers should specify (in the associated text) whether P values are one-tailed or two-tailed.

6.6.4Designated and Observed Zones of Probability

Special symbols are used to distinguish a previously designated zone of probability from zones that are observed in the data. For example, we might decide to give the accolade of high honors to anyone who ranks in or above the 95th percentile in a certification test. Someone who scores in the 97th percentile would receive this acclaim, but not someone who is in the 89th.

The Greek symbol α is used for designated levels of these zones. For the preceding percentile example, Pα was set at .95. In many statistical decisions, α ≤ .05 is commonly set as the boundary level for P values that will be called “significant.” In statistical decisions, the Z value that corresponds to α is marked Zα . For a two-tailed α = .05, Z.05 = 1.96.

6.6.4.1Inner Percentile Zones and Symbols — For inner percentile ranges, the correspond-

ing zones are based on two-tailed internal probabilities. If α represents the two-tailed external probability, the inner zone is determined as 1 − α .

To find an ipr95, the corresponding zone is 1 − α = .95, and the two-tailed α

is .05. If you want an

ipr90, α = .1 and the Gaussian Z.1 = 1.645. For an ipr50, or interquartile range, α =

.5 and Z.5 = .674.

© 2002 by Chapman & Hall/CRC

6.6.4.2Observed (Calculated) Values — If we start with observed data and want to determine probabilities, no Greek symbols are used. The value of Z is calculated as (Xi – X) /s, and is simply marked Z or Zi. We then find the probability (or P value) associated with that value of Z. (Later in the text, to distinguish the observed value of Z from other possibilities, it will be cited as Z0 and the corresponding P value will be P0 .)

6.6.4.3Limited Tables of Z/P Values — With modern computer programs, a precise value of P, such as .086, is printed for each calculated Z value, such as 1.716. Before this computational convenience became available, however, Z values had to be interpreted with published tables showing

the P values that correspond to selected values of Zα . These tables, which resemble Table 4.3, are still needed and used if Z is determined with a hand calculator or other apparatus that does not immediately

produce the corresponding P value.

If the available tables do not show all possibilities, the observed Z value will often lie between the

tabular values for cited levels of Zα . Consequently, a Z value such as 1.716 is often reported with the interpolated statement .05 < P < .1, indicating that P lies between two pertinent Z α values, which in this instance are Z.05 = 1.96 and Z.1 = 1.645. If a particular boundary of Zα has been set for “statistical significance,” as discussed later, only the most pertinent level of α is usually indicated, with statements such as P < .1 or P > .05.

6.6.4.4Wrong Expression of Intermediate Values — When a P value lies between two values of α , such as .1 and .05, the > and < signs are sometimes placed incorrectly, so that the expression becomes an impossible .1 < P < .05. To avoid this problem, always put the lowest value on the left, as it would appear on a graph. The best expression for the intermediate P is then .05 < P < .1.

6.6.5Problems and Inconsistencies in Symbols

As long as α , Zα , and P always represent two-tailed values and selections, the symbols are clear and unambiguous. The α , Zα , and P symbols, however, traditionally refer to two-tailed probabilities. When the symbols (as well as the corresponding values) are modified for one-tailed applications, confusion can be produced by the inverse relationship that makes Z and P go in opposite directions. If 2P and P respectively represent two-tailed and one-tailed external probabilities determined from the data, the modified symbols are clear, except perhaps for readers who do not realize that P (alone) will then specifically represent one tail rather than the conventional two. To avoid this problem, the solo P can be modified with the parenthetical remark “(one-tailed).”

The big problems arise in choosing subscripts for Z, particularly when it demarcates boundaries against which the values of P or 2P will be compared for decisions about “statistical significance.” In most special tabulations of Z values, the associated P values are two-tailed. For a two-tailed α , the total external zone of α is split into two parts, each containing α /2, as shown in Figure 6.5. A 2P value is therefore compared against a boundary that has α /2 externally on each side.

In a one-tailed procedure, all of the external probability is unilateral, occupying the full value of α rather than α /2. To allow the external probability on one side of the curve to be α , the selected demarcation point for the ordinarily two-tailed Z would be Z2α . With this demarcation in Figure 6.6, an α /2 probability zone is transferred from the left to the right side of the curve. Having been placed under a higher point in the curve, the additional α /2 area is achieved with a smaller horizontal distance for Z than was previously needed at the other end of the curve. Thus, the values for Zα and Z2α do not correspond proportionately to the relative magnitudes of α and α /2.

The probability zones, demarcations, and symbols can be cited as follows for two-tailed and onetailed decisions:

 

Probability in

Demarcation

Symbol for

Decisions

Pertinent Zones

Point

P Value

 

 

 

 

Two-tailed

α /2

Zα

2P

One-tailed

α

Z2α

P (one-tailed)

© 2002 by Chapman & Hall/CRC

α

/2

α /2

α /2

 

 

α /2

 

Z

Zα

+ Zα

 

 

FIGURE 6.5

FIGURE 6.6

 

Demarcation of two α /2 zones of probability by ± Zα .

Demarcation of a one-tailed probability zone for α .

For two-tailed decisions with α = .05, Z.05 is set at 1.96; the inner 95% zone has two exterior zones that each have a probability of .025; and the corresponding internal probability is 1 − α = .95. For onetailed decisions with α = .05, the 2α demarcation point is set at Z.10 = 1.645; the one-tailed external probability is .05, and the internal probability is still l − α = .95.

A reasonably complete set of values in a Z/P tabulation is shown in Table 6.4, where values of 2P are shown for positive values of Z extending from 0 to about 3.9. Because almost all published tables of probability show two-tailed values, the Z boundaries are sometimes labelled as Zα /2, to clarify their connotation. Regardless of the label used for Z, however, the corresponding value of P in any two-tailed Z/P table is halved to get the one-tailed P value.

To try to avoid confusion, the symbols P, α , and Zα will be used here in an uncommitted general way to denote observed or specified levels of probability. Most discussions will refer to the conventional two-tailed approach, but one-tailed directions will often be considered. To allow this flexibility, the symbols will be “P” for the general probability, “2P” when it is specifically two-tailed, and “(one-tailed) P” when it is one-tailed. The α and Zα symbols will be used as such, without recourse to the confusing forms of α /2, 2α , Zα /2 or Z2α . The oneor two-tailed direction of α or Zα will always be cited in the associated text. Although perhaps undesirable for mathematical rigor, the apparent inconsistency has the pedagogic advantage of being reasonably clear and permitting an unconstrained discussion. If not otherwise specified, P and Zα can be considered as two-tailed.

6.6.6Determining Gaussian Zones

After Zα is chosen, Gaussian zones of data are easy to determine when Zα is multiplied by the observed standard deviation, s, to form Zα s. The latter value is then added and subtracted from the mean, X , to form X ± Zα s. In Chapter 4, the value of 2 was used instead of the more precise 1.96 to produce an inner 95% Gaussian zone as X ± 2s. The Zα multiplying factor can be changed, when desired, to form Gaussian zones of different sizes, such as ipr90 or ipr50. The corresponding values of α will be .10 and

.50; and the corresponding values of Zα (derived from Table 6.4) will be 1.645 and .674.

With smaller values of Zα , the inner zones will shrink as 1 − α gets smaller. Thus, in a set of data where X = 10 and s = 3, the corresponding Gaussian zones will be as follows:

ipr

α

Zα

Zα s

Extent of

 

± Z α s

X

95

.05

1.906

5.880

4.120 to 15.880

90

.10

1.645

4.935

5.065 to 14.935

50

.50

.674

2.022

7.978 to 12.022

© 2002 by Chapman & Hall/CRC

TABLE 6.4

Values of 2P for Positive Values of Z

Z

0.000

0.002

0.004

0.006

0.008

 

 

 

 

 

 

0.00

99840

99681

99521

99362

0.04

98609

96650

96490

96331

96172

0.08

93624

93465

93306

93147

92988

0.12

90448

90290

90132

89974

89815

0.16

87288

87131

86973

86816

86658

0.20

84148

83992

83835

83679

83523

0.24

81033

80878

80723

80568

80413

0.28

77948

77794

77641

77488

77335

0.32

74897

74745

74594

74442

74291

0.36

71885

71735

71586

71437

71287

0.38

70395

70246

70098

69950

69802

0.40

68916

68768

68621

68474

68327

0.42

67449

67303

67157

67011

66865

0.44

65994

65849

65704

65560

65415

0.46

64552

64408

64265

64122

63978

0.48

63123

62981

62839

62697

62555

0.50

61708

61567

61426

61286

61145

0.52

60306

60167

60028

59889

59750

0.54

58920

58782

58644

58507

58369

0.56

57548

57412

57275

57139

57003

0.58

56191

56057

55922

55788

55653

0.60

54851

54717

54584

54451

54319

0.62

53526

53394

53263

53131

53000

0.64

52217

52087

51958

51828

51698

0.66

50925

50797

50669

50541

50413

0.68

49650

49524

49398

49271

49145

0.7

48393

47152

45930

44725

43539

0.8

42371

41222

40091

38979

37886

0.9

36812

35757

34722

33706

32709

1.0

31731

30773

29834

28914

28014

1.1

27133

26271

25429

24605

23800

1.2

23014

22246

21498

20767

20055

1.3

19360

18684

18025

17383

16759

1.4

16151

15561

14987

14429

13887

1.5

13361

12851

12356

11876

11411

1.6

10960

10523

10101

09691

09296

1.7

08913

08543

08186

07841

07508

1.8

07186

06876

06577

06289

06011

1.9

05743

05486

05238

05000

04770

2.0

04550

04338

04135

03940

03753

2.1

03573

03401

03235

03077

02926

2.2

02781

02642

02509

02382

02261

2.3

02145

02034

01928

01827

01731

2.4

01640

01552

01469

01389

01314

2.5

01242

01174

01109

01047

00988

2.6

00932

00879

00829

00781

00736

2.7

00693

00653

00614

00578

00544

2.8

00511

00480

00451

00424

00398

2.9

00373

00350

00328

00308

00288

3.0

00270

00253

00237

00221

00207

3.2

00137

00128

00120

00111

00104

 

 

 

 

 

 

3.4: 00067

3.5: 00047

3.6: 00032

3.7: 00022

3.8: 00014

3.891: 00010

Note: The three decimal places in the top headings become reduced to two when Z reaches 0.7. Thus, for Z = .82, the 2P value is found as .41222 in the column marked 0.002. For Z = 1.96, the value of 2P in the 0.006 column is 0.5000. [Table derived from page 28 of Geigy Scientific Tables, Vol. 2, ed. by C. Lentner, 1982. Ciba-Geigy Limited. Basle, Switzerland.]

© 2002 by Chapman & Hall/CRC

6.7Disparities Between Theory and Reality

In a Gaussian distribution, the two sides of the curve are symmetrical around the center. No matter what boundary is chosen, the external two-tailed probabilities will be identical and equally divided. Thus, if a boundary of Zα is set at 1.96 for 2P = .05, the internal probability contains .475 of the data in the inner zone from Z = 0 to Z = 1.96, and .475 of the data in the zone from Z = 0 to Z = −1.96. The total inner probability is .475 + .475 = .95. The external probability is .05, representing the sum of the upper zone of .025 beyond Z = 1.96, and the lower zone of .025 below Z = − 1.96.

If only a one-tailed probability is desired, however, the internal probability contains the entire 50% of items in the zone below the mean, plus the demarcated group in the corresponding zone above the mean. Thus, for a one-tailed probability with Zα set at 1.96, the internal probability is 0.975, calculated as 0.5 (for the lower half of the data) + 0.475 for the zone from Z = 0 to Z = 1.96. The external probability is 0.025 for the zone beyond Z = 1.96.

This perfect symmetry is seldom present, alas, in the empirical data of the real world. For example, the values in Table 4.1 show ascending cumulative frequencies of 0.0255 when the chloride level is 89 and .9765 when the level is 110. These two chloride boundaries would offer a reasonably symmetrical inner zone (the ipr95) for the inner 95% of the data, but the chloride levels of 89 and 110 are not symmetrically located around either the median of 102 or the mean of 101.6.

The chloride values of Table 4.1 appear in this text by courtesy of the laboratories of Yale-New Haven Hospital. In response to a request seeking a Gaussian set of data, the laboratory produced a print-out of distributions for about 30 of the many chemical substances it measures. To ensure that the “sample size” would be large enough to show the true shape of the distribution, at least 2000 measurements were included in the set of data for each variable. The main point of this story is that none of the distributions was Gaussian. The chloride values were chosen for display here because they had the most “Gaussianoid” appearance of all the examined results.

The moral of the story is a warning to beware of using Gaussian statistics to describe medical data. The mean may be dislocated from the center of the distribution; the standard deviations may be affected by outliers or asymmetrical distributions; and the Gaussian Z-scores may yield misleading values for percentiles. The absence of a truly Gaussian pattern in most biologic data has been entertainingly described by Micceri9 as “the unicorn, the normal curve, and other improbable creatures.”

Gaussian statistics are most effectively used, as discussed later, for roles in statistical inference, not description. Gaussian confidence intervals can be employed for inferences about possible variation in a central index, but not for describing the data set itself. For the variations in a central index, as we shall see in the next two chapters, a reasonable mathematical justification can be offered for using the Gaussian strategy.

6.8Verbal Estimates of Magnitudes

The last aspect of probability to be considered in this chapter is the quantitative inconsistency with which different people attach numbers to verbal terms describing magnitudes such as amounts, frequen - cies, and probabilities. In 1976, after consulting with various authorities, U.S. President Gerald R. Ford said that a swine flu epidemic in the next winter was “a very real possibility.” When asked by a science writer,11 four experts estimated this probability as 2%, 10%, 35%, and “less than even.” Ever since this revelation, disagreement has often been studied for the quantitites implied when people use such verbal terms as a little, a lot, often, unlikely, and usually.

6.8.1Poetic License

One of the earliest recorded complaints about verbal imprecision was made by Charles Babbage, renowned as the inventor of a 19th century “calculating engine” that is often regarded as the earliest ancestor of today’s digital computers.

© 2002 by Chapman & Hall/CRC