Ординатура / Офтальмология / Английские материалы / Principles Of Medical Statistics_Feinstein_2002
.pdf
become widespread, however, and most elementary statistical instruction, as well as a large amount of discourse in this text, still employs the traditional “parametric” statistical strategies.
Nevertheless, the revolution is happening, and you should be prepared for its eventual results. Respectable statisticians today are even proposing such heresies as, “All techniques designed before the advent of automatic computing must be abandoned.”8 Until the revolution becomes totally established, however, the traditional statistical strategies will continue to be used and published. Consequently, the traditional strategies will be discussed here to prepare you for today’s realities. The new strategies will regularly be mentioned, however, to indicate things that can not only be done today, but routinely expected within the foreseeable future.
6.5Demarcating Zones of Probability
Regardless of whether a sample is constructed empirically or theoretically, zones of probability in the distribution have a particularly important role in both statistical description and statistical inference. Descriptively, the zones are used to demarcate the “bulk” of a set of data or to form a “range of normal.” Inferentially, analogous zones lead to decisions that a particular observation is too “uncommon” or “far out” to be accepted as a regular part of the distribution. For these purposes, boundary markers must be set for zones of probability.
6.5.1Choosing Zones
In gambling, we may want to know the probabilities for a single event, such as drawing a spade in cards or tossing a 7 in dice. In statistical activities, however, we often want to know the probabilities for a zone of events or values. We would seldom ask about the probability that a single chloride value will be exactly 101, but we might regularly want to know the chances that the value will lie in a zone below 93 or above 107.
A zone almost always refers to a series of directly contiguous dimensional (or ordinal) values in a distribution. In the chloride example just cited, the selected zone would extend from 93 to 107. Sometimes, however, a zone can be a cluster of selected noncontiguous values. For example, when you make a “field” bet in tosses of two dice, the successful zone consists of any member of the items 2, 3, 4, 9, 10, 11, or 12.
6.5.2Setting Markers
One way to set markers for a zone is from the cumulative relative frequencies associated with an actual or resampled distribution. The array of frequencies could be examined to note the values that demarcate zones of either chloride measurements in Table 4.1 or resampled means in Table 6.1.
A direct check of individual frequencies can be avoided, however, if the distribution has certain mathematical characteristics, such as the distributions eponymically cited with such names as Gauss, “Student,” and Bernouilli. If we know (or can assume) that a collection of data has one of these model distributions, we can easily find markers for the desired probabilities, without resorting to specific tabulations of frequencies.
This mathematical attribute has given the eponymic model distributions their statistical popularity. To determine probabilites, we first calculate a special index, called a “test statistic,” from the observed data. We then assume, based on mathematical justifications discussed later, that the test statistic is associated with a particular mathematical model such as the standard Gaussian distribution. The value of the test statistic then becomes the marker for a zone of probability. For example, the Z-score is often used as a test statistic that has specific probability values, shown in Section 4.8.3, for each of its locations in a standard Gaussian distribution. [Other well-known test statistics are referred to a t (or “Student”) distribution or to a chi-square distribution.]
© 2002 by Chapman & Hall/CRC
The percentiles and standard deviations discussed in Chapters 4 and 5 can thus be used in two ways: descriptively, to indicate specific zones for an empirical set of data, and inferentially, to demarcate zones of probability in a mathematical model.
6.5.3Internal and External Zones
The total cumulative probability in any distribution always has a sum of 1. This total can be divided into different sets of zones. An internal zone of data contains values that are within or smaller than the demarcated boundary. An external zone contains values that are outside or beyond the demarcated boundary. For example, the ipr95 has an internal cumulative probability of .95 and an external cumulative probability of .05. At the 97.5 percentile point, the internal cumulative probability is .975 (for values of data smaller than the associated percentile value), and the external cumulative probability is .025.
6.5.4Two-Zone and Three-Zone Boundaries
A single percentile point, such as P.95, demarcates two zones — one internal and the other external. Boundaries established by two percentiles, such as the 2.5 and 97.5 percentile points that form the ipr 95, will demarcate three zones of cumulative probability: a central internal zone, surrounded by two external zones. The distinctions are shown in Figure 6.1.
FIGURE 6.1
Cumulative probability zones for ipr95. Upper drawing shows the 95 percentile point as one boundary with two zones. Lower drawing shows the 2.5 and 97.5 percentile points as two boundaries with three zones.
|
Internal Zone |
= .95 |
External |
|
|
Zone |
|||
|
|
|
= .05 |
|
|
|
|
P.95 |
|
External |
|
|
External |
|
Zone |
Internal Zone |
= .95 |
Zone |
|
= .025 |
= .025 |
|||
|
|
|||
P.025 |
|
|
P.975 |
6.5.5P-Values and Tails of Probability
In many appraisals of probability, the focus is on the external zone, i.e., the chance of observing something more extreme than what is contained within the selected boundary. This external zone of cumulative probability is called a P-value. The external zone is often demarcated as the area located in the “tail” of the curve of relative frequencies for a mathematical model.
The cumulative aspect of the frequencies is omitted when the P-values or other cumulative results are cited as probabilities, but the cumulative feature is implicit in the word tail, which describes the location of the zone. Thus, if only two zones have been demarcated with a single boundary, the single external zone is called a one-tailed probability. A key distinction in one-tailed probabilities is stipulating a direction to indicate whether the focus of the external zone is on values that are above or below the selected boundary. If three zones have been formed from two boundaries, the two external zones are called a two-tailed probability. Two-tailed probabilities are always bi-directional, demarcating external zones that occupy both extremes of the distribution.
6.5.6Confidence Intervals and Internal Zones
For the entity called the confidence interval for a central index, the focus is on an internal zone of probability. It usually has two boundaries, set around the proposed location of the central index, for a
© 2002 by Chapman & Hall/CRC
zone within which the location might vary under circumstances discussed earlier and also later in Chapter 7.
6.5.7To-and-from Conversions for Observations and Probabilities
Observations in a set of data can readily be converted to probabilities, and vice versa. Beginning with observed (or cited) values of data, we can find probabilities; or beginning with probabilities, we can find observed values.
6.5.7.1 From Observations to Probabilities — For the data of Table 4.1, suppose we want to know the probability that a chloride value is below 93. The single cited boundary of 93 establishes a directional two-zone arrangement. For the extreme values below 93, the cumulative relative frequency is external to the boundary. For values of 93 or higher, the cumulative relative frequency is internal. In Table 4.1, the external probability for values below 93 is .0598 and the internal probability is .9402 (= l − .0598).
For chloride levels bounded by values of 93 and 107, the three zones have an internal probability for the zone of 93–107, an external probability for the zone below 93, and another external probability for the zone above 107. The respective values for these three probabilities in Table 4.l are .0598, .8598, and .0804.
Note that the internal and external probabilities in a frequency distribution always have a sum of 1. This important distinction is often disregarded, however, when people make subjective estimates about probabilities for a particular choice (such as a diagnosis or murder suspect) among a series of candidates.10 The total of the probability values estimated for individual candidates may exceed 1. Other human inconsistencies in using probabilities (and magnitudes) are discussed in Section 6.8.
6.5.7.2 From Probabilities to Observed Values — To determine cumulative relative frequencies or probabilities, we read horizontally rightward in Table 4.l beginning with the observed chloride values. We could also, however, go leftward, from cumulative frequencies toward chloride values. For example, from the definition of percentiles, we know that the 25th percentile occurs where the ascending cumulative frequency (or lower external probability) passes 0.25. Table 4.1 shows that the 25th percentile occurs at the chloride level of 99. The table also shows that the 50th percentile is at the chloride level of 102, and the 75th percentile at 105. An ipr95, extending from the 2.5 to the 97.5 percentile in Table 4.l, could be marked by the chloride levels of 89 to 110.
6.6Gaussian Z-Score Demarcations
Despite all the difficulties in non-Gaussian distributions, the standard deviation is a splendid way of demarcating inner zones when a distribution is truly Gaussian. In particular, the Z-scores determined from the standard deviation of Gaussian data will correspond directly to probability values discerned from the mathematical formula for the Gaussian curve.
One of these probability values, often called a probability density, is the height of the curve at a particular value, i.e., the relative frequency of items having the individually cited Z-score. These were the probability values shown with the Gaussian formula and small table in Section 4.8.2. (In a tabulation of either dimensional or categorical data, the probability density represents the proportion or relative frequency of the distribution occurring at a particular individual value.)
A different set of probability values refers to cumulative relative frequencies rather than the focal density at a cited value of Z. These cumulative proportions were shown for one side of the curve in Table 4.3 and for both sides in the in-text table in Section 4.8.3. Because of the symmetry and versatility of the Gaussian curve, a single value of Z can be associated with probabilities that represent several different zones of cumulative relative frequencies. These cumulative relative frequencies are the areas formed under the Gaussian curve.
© 2002 by Chapman & Hall/CRC
6.6.1One-Tailed Probabilities
A percentile value demarcates a one-tailed probability for the zone of data spanned as the distribution moves upward from its lowest value.
The top part of Figure 6.2 shows the demarcation for a positive value of Z. The zone of internal probability is the percentile value associated with Z, counting from the lower end of the distribution. The rest of the distribution contains the zone of external probability. As shown in the middle part of Figure 6.2, if Z has the same but negative value, the zone of internal probability again starts from the lower end of the distribution, but is smaller than in the first situation. The zones of internal and external probability in each situation are opposite counterparts.
In many appraisals, however, the data analyst is interested in external probabilities for values of Z that are more extreme, in the same direction, than the observed value. Thus, if Z = 1.3, we want external probabilities for values of Z that are Š 1.3. If Z is −1.3, we want external probabilities for values of Z that are ≤ − 1.3. In the latter situation, therefore, the external probability would be redesignated on the lower side of the Gaussian curve, as shown in the bottom part of Figure 6.2. The upper and lower figures become mirror images around the central value where Z = 0.
If the external one-tailed probability value is called P, it corresponds to a percentile value of 1 − P. For example, a Z-score of 1.0 in a Gaussian curve represents a one-tailed exterior probability (or P value) of .16. The corresponding interior probability or percentile point is 1 − .16 = .84. This same result can be obtained by noting in Table 4.3 that a Z-score of 1.0 has a cumulative relative probability of .34 on the right side of the curve. Because the left side of the curve contains a cumulative probability of .50, the total will produce an interior probability of .84 for the percentile point.
|
Z |
Zone of Internal |
Zone of External |
Probability |
Probability |
-Z |
|
Zone of Internal |
Zone of External |
Probability |
Probability |
-Z |
|
Zone of External |
Corresponding Zone |
Probability for |
of Internal |
More Extreme |
Probability |
Values of Z |
|
FIGURE 6.2
Zones of internal and external probability for a single value of Z in a Gaussian curve.
6.6.2Two-Tailed Probabilities
The proportion of data contained in the symmetrical inner zone formed between the positive and negative values of a Gaussian Z has an inner probability. The remaining data, on the two outside portions beyond the inner zone, have a total external probability, which is called a two-tailed
Pvalue.
The distinction is shown in Figure 6.3. In this instance,
if the Z-score is ±1, the external probability is .16 on one side of the curve and .16 on the other, so that the two-tailed P value is .32. The inner zone of probability is 1 − .32 =
.68. (The latter value should look familiar. It was mentioned in Section 5.3.2.2 with the statement that a zone of X ± s, i.e., one standard deviation around the mean, occupies 68% of a set of Gaussian data.)
Zone of
Internal
Probability
-Z |
+Z |
Zone of External |
Zone of External |
Probability |
Probability |
FIGURE 6.3
Zones of internal and external probability for the customary value of± Z in a Gaussian curve.
© 2002 by Chapman & Hall/CRC
6.6.3Distinctions in Cumulative Probabilities for Z-Scores
In special tables or in computer print-outs for a Gaussian distribution, the Z-scores and the corresponding cumulative probabilities are almost always shown as two-tailed (external) P values. These can easily be halved to denote the one-tailed P values, but the distinctions must be carefully considered if you want to determine percentiles, to find internal rather than external probabilities, or to go back and forth from Z values to P values. Table 6.3 shows the corresponding one-tailed and two-tailed cumulative probability values of Z-scores in a standard Gaussian distribution.
To find internal probabilities or percentiles, the cited external probabilities must be suitably subtracted. With a two-tailed P value, symbolized as 2P, the internal probability (for the zone of inner values) is 1 − 2P. For a one-tailed inner zone of probability, however, the two-tailed P value must be halved before it is subtracted from 1. Because percentile points represent one-tailed probability values, the Z-score corresponding to the 95th percentile would be 1.645, because it has an external one-tailed probability of .05.
TABLE 6.3
Cumulative Relative Frequencies and P Values Associated with Z Values in a Standard Gaussian Distribution
|
One-Tailed |
|
Two-Tailed |
|
Z |
External Probability* |
|
External Probability |
Internal Probability |
|
|
|
|
|
0 |
.5000 |
1 |
0 |
|
±0.500 |
.3090 |
.6170 |
.3830 |
|
±0.674 |
.2500 |
.5000 |
.5000 |
|
±1.000 |
.1585 |
.3170 |
.6830 |
|
±1.282 |
.1000 |
.2000 |
.8000 |
|
±1.500 |
.0670 |
.1340 |
.8660 |
|
±1.645 |
.0500 |
.1000 |
.9000 |
|
±1.960 |
.0250 |
.0500 |
.9500 |
|
±2.000 |
.0230 |
.0460 |
.9540 |
|
±2.240 |
.0125 |
.0250 |
.9750 |
|
±2.500 |
.0060 |
.0120 |
.9880 |
|
±2.576 |
.0050 |
.0100 |
.9900 |
|
±3.000 |
.0073 |
.0027 |
.9973 |
|
±3.290 |
.0005 |
.0010 |
.9990 |
|
*For positive Z values only. For negative Z values, the external probability (for values higher than −Z) is 1 minus the cited result. For example, at Z = − 1.282, the one-tailed external probability is 1 − .1 = .9.
6.6.3.1“Decreasing Exponential” Z/P Relationship — As Z gets larger, the external twotailed P values get smaller. Since P is an area under the curve, the Z and P values have a “decreasing exponential” rather than linear relationship in Gaussian curves. The distinctions are shown in Figure 6.4. A 19% increase in Z, from 1.645 to 1.96, will halve the two-tailed P value, from .1 to .05. A 40% decrease in Z, from 3.29 to 1.96, will produce a 50-fold increase in the two-tailed P, from .001 to .05. Consequently, whenever Z exceeds 1, small changes in Z can have a major proportionate impact on P in the zones used for critical decisions about “significance.”
If interested in one-tailed P values, we would find the Z-score associated with twice the one-tailedP. This one-tailed value for Z will be smaller, but not half as small, as the two-tailed value. Thus, for a two-tailed P of .025, Z is 2.24; but for a one-tailed P of .025, Z is 1.96, which corresponds to a twotailed P of .05. A Z-score of 1.645 has a P of .1 when two-tailed and .05 when one-tailed.
6.6.3.2Ambiguous Interpretations — These Z/P relationships are particularly noteworthy when external probabilites are cited for P values that represent statistical inferences rather than direct descriptions. In tests of statistical inference, as noted later, many investigators prefer to use one-tailed
©2002 by Chapman & Hall/CRC
Two-tailed
P Values
.65
.60
.55
.50
.45
.40
.35
.30
.25
.20
.15
.10
.05
0 |
.5 |
1.0 |
1.5 |
2.0 |
2.5 |
3.0 |
3.5 |
Z-scores
FIGURE 6.4
Relationship of 2- tailed P values and Z-Scores in a Gaussian curve.
P values because, for the same value of Z, the one-tailed P will be lower (and hence more “significant”) than the two-tailed value. A currently unresolved dispute, discussed later in Chapter 11, has arisen between advocates of the two opposing policies: to always demand 2-tailed or sometimes accept 1-tailed P values for expressing “significance.” Regardless of which policy is adopted, however, a reader will need to know how the decisions were made by the investigator (or editor).
If the inferential result is merely cited as “P < .05,” however, the reader will not know whether it is one-tailed or two-tailed. To avoid ambiguity, P values can be reported as “2P < .05” when two-tailed and “P < .05” when one-tailed. This convention has begun to appear in published medical literature, but is not yet common. Until it becomes universal, writers should specify (in the associated text) whether P values are one-tailed or two-tailed.
6.6.4Designated and Observed Zones of Probability
Special symbols are used to distinguish a previously designated zone of probability from zones that are observed in the data. For example, we might decide to give the accolade of high honors to anyone who ranks in or above the 95th percentile in a certification test. Someone who scores in the 97th percentile would receive this acclaim, but not someone who is in the 89th.
The Greek symbol α is used for designated levels of these zones. For the preceding percentile example, Pα was set at ≥ .95. In many statistical decisions, α ≤ .05 is commonly set as the boundary level for P values that will be called “significant.” In statistical decisions, the Z value that corresponds to α is marked Zα . For a two-tailed α = .05, Z.05 = 1.96.
6.6.4.1Inner Percentile Zones and Symbols — For inner percentile ranges, the correspond-
ing zones are based on two-tailed internal probabilities. If α represents the two-tailed external probability, the inner zone is determined as 1 − α .
To find an ipr95, the corresponding zone is 1 − α = .95, and the two-tailed α |
is .05. If you want an |
ipr90, α = .1 and the Gaussian Z.1 = 1.645. For an ipr50, or interquartile range, α = |
.5 and Z.5 = .674. |
© 2002 by Chapman & Hall/CRC
6.6.4.2Observed (Calculated) Values — If we start with observed data and want to determine probabilities, no Greek symbols are used. The value of Z is calculated as (Xi – X) /s, and is simply marked Z or Zi. We then find the probability (or P value) associated with that value of Z. (Later in the text, to distinguish the observed value of Z from other possibilities, it will be cited as Z0 and the corresponding P value will be P0 .)
6.6.4.3Limited Tables of Z/P Values — With modern computer programs, a precise value of P, such as .086, is printed for each calculated Z value, such as 1.716. Before this computational convenience became available, however, Z values had to be interpreted with published tables showing
the P values that correspond to selected values of Zα . These tables, which resemble Table 4.3, are still needed and used if Z is determined with a hand calculator or other apparatus that does not immediately
produce the corresponding P value.
If the available tables do not show all possibilities, the observed Z value will often lie between the
tabular values for cited levels of Zα . Consequently, a Z value such as 1.716 is often reported with the interpolated statement .05 < P < .1, indicating that P lies between two pertinent Z α values, which in this instance are Z.05 = 1.96 and Z.1 = 1.645. If a particular boundary of Zα has been set for “statistical significance,” as discussed later, only the most pertinent level of α is usually indicated, with statements such as P < .1 or P > .05.
6.6.4.4Wrong Expression of Intermediate Values — When a P value lies between two values of α , such as .1 and .05, the > and < signs are sometimes placed incorrectly, so that the expression becomes an impossible .1 < P < .05. To avoid this problem, always put the lowest value on the left, as it would appear on a graph. The best expression for the intermediate P is then .05 < P < .1.
6.6.5Problems and Inconsistencies in Symbols
As long as α , Zα , and P always represent two-tailed values and selections, the symbols are clear and unambiguous. The α , Zα , and P symbols, however, traditionally refer to two-tailed probabilities. When the symbols (as well as the corresponding values) are modified for one-tailed applications, confusion can be produced by the inverse relationship that makes Z and P go in opposite directions. If 2P and P respectively represent two-tailed and one-tailed external probabilities determined from the data, the modified symbols are clear, except perhaps for readers who do not realize that P (alone) will then specifically represent one tail rather than the conventional two. To avoid this problem, the solo P can be modified with the parenthetical remark “(one-tailed).”
The big problems arise in choosing subscripts for Z, particularly when it demarcates boundaries against which the values of P or 2P will be compared for decisions about “statistical significance.” In most special tabulations of Z values, the associated P values are two-tailed. For a two-tailed α , the total external zone of α is split into two parts, each containing α /2, as shown in Figure 6.5. A 2P value is therefore compared against a boundary that has α /2 externally on each side.
In a one-tailed procedure, all of the external probability is unilateral, occupying the full value of α rather than α /2. To allow the external probability on one side of the curve to be α , the selected demarcation point for the ordinarily two-tailed Z would be Z2α . With this demarcation in Figure 6.6, an α /2 probability zone is transferred from the left to the right side of the curve. Having been placed under a higher point in the curve, the additional α /2 area is achieved with a smaller horizontal distance for Z than was previously needed at the other end of the curve. Thus, the values for Zα and Z2α do not correspond proportionately to the relative magnitudes of α and α /2.
The probability zones, demarcations, and symbols can be cited as follows for two-tailed and onetailed decisions:
|
Probability in |
Demarcation |
Symbol for |
Decisions |
Pertinent Zones |
Point |
P Value |
|
|
|
|
Two-tailed |
α /2 |
Zα |
2P |
One-tailed |
α |
Z2α |
P (one-tailed) |
© 2002 by Chapman & Hall/CRC
α |
/2 |
α /2 |
α /2 |
|
|
α /2
|
Z2α |
Zα |
+ Zα |
|
|
FIGURE 6.5 |
FIGURE 6.6 |
|
Demarcation of two α /2 zones of probability by ± Zα . |
Demarcation of a one-tailed probability zone for α . |
|
For two-tailed decisions with α = .05, Z.05 is set at 1.96; the inner 95% zone has two exterior zones that each have a probability of .025; and the corresponding internal probability is 1 − α = .95. For onetailed decisions with α = .05, the 2α demarcation point is set at Z.10 = 1.645; the one-tailed external probability is .05, and the internal probability is still l − α = .95.
A reasonably complete set of values in a Z/P tabulation is shown in Table 6.4, where values of 2P are shown for positive values of Z extending from 0 to about 3.9. Because almost all published tables of probability show two-tailed values, the Z boundaries are sometimes labelled as Zα /2, to clarify their connotation. Regardless of the label used for Z, however, the corresponding value of P in any two-tailed Z/P table is halved to get the one-tailed P value.
To try to avoid confusion, the symbols P, α , and Zα will be used here in an uncommitted general way to denote observed or specified levels of probability. Most discussions will refer to the conventional two-tailed approach, but one-tailed directions will often be considered. To allow this flexibility, the symbols will be “P” for the general probability, “2P” when it is specifically two-tailed, and “(one-tailed) P” when it is one-tailed. The α and Zα symbols will be used as such, without recourse to the confusing forms of α /2, 2α , Zα /2 or Z2α . The oneor two-tailed direction of α or Zα will always be cited in the associated text. Although perhaps undesirable for mathematical rigor, the apparent inconsistency has the pedagogic advantage of being reasonably clear and permitting an unconstrained discussion. If not otherwise specified, P and Zα can be considered as two-tailed.
6.6.6Determining Gaussian Zones
After Zα is chosen, Gaussian zones of data are easy to determine when Zα is multiplied by the observed standard deviation, s, to form Zα s. The latter value is then added and subtracted from the mean, X , to form X ± Zα s. In Chapter 4, the value of 2 was used instead of the more precise 1.96 to produce an inner 95% Gaussian zone as X ± 2s. The Zα multiplying factor can be changed, when desired, to form Gaussian zones of different sizes, such as ipr90 or ipr50. The corresponding values of α will be .10 and
.50; and the corresponding values of Zα (derived from Table 6.4) will be 1.645 and .674.
With smaller values of Zα , the inner zones will shrink as 1 − α gets smaller. Thus, in a set of data where X = 10 and s = 3, the corresponding Gaussian zones will be as follows:
ipr |
α |
Zα |
Zα s |
Extent of |
|
± Z α s |
X |
||||||
95 |
.05 |
1.906 |
5.880 |
4.120 to 15.880 |
||
90 |
.10 |
1.645 |
4.935 |
5.065 to 14.935 |
||
50 |
.50 |
.674 |
2.022 |
7.978 to 12.022 |
||
© 2002 by Chapman & Hall/CRC
TABLE 6.4
Values of 2P for Positive Values of Z
Z |
0.000 |
0.002 |
0.004 |
0.006 |
0.008 |
|
|
|
|
|
|
0.00 |
• |
99840 |
99681 |
99521 |
99362 |
0.04 |
98609 |
96650 |
96490 |
96331 |
96172 |
0.08 |
93624 |
93465 |
93306 |
93147 |
92988 |
0.12 |
90448 |
90290 |
90132 |
89974 |
89815 |
0.16 |
87288 |
87131 |
86973 |
86816 |
86658 |
0.20 |
84148 |
83992 |
83835 |
83679 |
83523 |
0.24 |
81033 |
80878 |
80723 |
80568 |
80413 |
0.28 |
77948 |
77794 |
77641 |
77488 |
77335 |
0.32 |
74897 |
74745 |
74594 |
74442 |
74291 |
0.36 |
71885 |
71735 |
71586 |
71437 |
71287 |
0.38 |
70395 |
70246 |
70098 |
69950 |
69802 |
0.40 |
68916 |
68768 |
68621 |
68474 |
68327 |
0.42 |
67449 |
67303 |
67157 |
67011 |
66865 |
0.44 |
65994 |
65849 |
65704 |
65560 |
65415 |
0.46 |
64552 |
64408 |
64265 |
64122 |
63978 |
0.48 |
63123 |
62981 |
62839 |
62697 |
62555 |
0.50 |
61708 |
61567 |
61426 |
61286 |
61145 |
0.52 |
60306 |
60167 |
60028 |
59889 |
59750 |
0.54 |
58920 |
58782 |
58644 |
58507 |
58369 |
0.56 |
57548 |
57412 |
57275 |
57139 |
57003 |
0.58 |
56191 |
56057 |
55922 |
55788 |
55653 |
0.60 |
54851 |
54717 |
54584 |
54451 |
54319 |
0.62 |
53526 |
53394 |
53263 |
53131 |
53000 |
0.64 |
52217 |
52087 |
51958 |
51828 |
51698 |
0.66 |
50925 |
50797 |
50669 |
50541 |
50413 |
0.68 |
49650 |
49524 |
49398 |
49271 |
49145 |
0.7 |
48393 |
47152 |
45930 |
44725 |
43539 |
0.8 |
42371 |
41222 |
40091 |
38979 |
37886 |
0.9 |
36812 |
35757 |
34722 |
33706 |
32709 |
1.0 |
31731 |
30773 |
29834 |
28914 |
28014 |
1.1 |
27133 |
26271 |
25429 |
24605 |
23800 |
1.2 |
23014 |
22246 |
21498 |
20767 |
20055 |
1.3 |
19360 |
18684 |
18025 |
17383 |
16759 |
1.4 |
16151 |
15561 |
14987 |
14429 |
13887 |
1.5 |
13361 |
12851 |
12356 |
11876 |
11411 |
1.6 |
10960 |
10523 |
10101 |
09691 |
09296 |
1.7 |
08913 |
08543 |
08186 |
07841 |
07508 |
1.8 |
07186 |
06876 |
06577 |
06289 |
06011 |
1.9 |
05743 |
05486 |
05238 |
05000 |
04770 |
2.0 |
04550 |
04338 |
04135 |
03940 |
03753 |
2.1 |
03573 |
03401 |
03235 |
03077 |
02926 |
2.2 |
02781 |
02642 |
02509 |
02382 |
02261 |
2.3 |
02145 |
02034 |
01928 |
01827 |
01731 |
2.4 |
01640 |
01552 |
01469 |
01389 |
01314 |
2.5 |
01242 |
01174 |
01109 |
01047 |
00988 |
2.6 |
00932 |
00879 |
00829 |
00781 |
00736 |
2.7 |
00693 |
00653 |
00614 |
00578 |
00544 |
2.8 |
00511 |
00480 |
00451 |
00424 |
00398 |
2.9 |
00373 |
00350 |
00328 |
00308 |
00288 |
3.0 |
00270 |
00253 |
00237 |
00221 |
00207 |
3.2 |
00137 |
00128 |
00120 |
00111 |
00104 |
|
|
|
|
|
|
3.4: 00067 |
3.5: 00047 |
3.6: 00032 |
3.7: 00022 |
3.8: 00014 |
3.891: 00010 |
Note: The three decimal places in the top headings become reduced to two when Z reaches 0.7. Thus, for Z = .82, the 2P value is found as .41222 in the column marked 0.002. For Z = 1.96, the value of 2P in the 0.006 column is 0.5000. [Table derived from page 28 of Geigy Scientific Tables, Vol. 2, ed. by C. Lentner, 1982. Ciba-Geigy Limited. Basle, Switzerland.]
© 2002 by Chapman & Hall/CRC
6.7Disparities Between Theory and Reality
In a Gaussian distribution, the two sides of the curve are symmetrical around the center. No matter what boundary is chosen, the external two-tailed probabilities will be identical and equally divided. Thus, if a boundary of Zα is set at 1.96 for 2P = .05, the internal probability contains .475 of the data in the inner zone from Z = 0 to Z = 1.96, and .475 of the data in the zone from Z = 0 to Z = −1.96. The total inner probability is .475 + .475 = .95. The external probability is .05, representing the sum of the upper zone of .025 beyond Z = 1.96, and the lower zone of .025 below Z = − 1.96.
If only a one-tailed probability is desired, however, the internal probability contains the entire 50% of items in the zone below the mean, plus the demarcated group in the corresponding zone above the mean. Thus, for a one-tailed probability with Zα set at 1.96, the internal probability is 0.975, calculated as 0.5 (for the lower half of the data) + 0.475 for the zone from Z = 0 to Z = 1.96. The external probability is 0.025 for the zone beyond Z = 1.96.
This perfect symmetry is seldom present, alas, in the empirical data of the real world. For example, the values in Table 4.1 show ascending cumulative frequencies of 0.0255 when the chloride level is 89 and .9765 when the level is 110. These two chloride boundaries would offer a reasonably symmetrical inner zone (the ipr95) for the inner 95% of the data, but the chloride levels of 89 and 110 are not symmetrically located around either the median of 102 or the mean of 101.6.
The chloride values of Table 4.1 appear in this text by courtesy of the laboratories of Yale-New Haven Hospital. In response to a request seeking a Gaussian set of data, the laboratory produced a print-out of distributions for about 30 of the many chemical substances it measures. To ensure that the “sample size” would be large enough to show the true shape of the distribution, at least 2000 measurements were included in the set of data for each variable. The main point of this story is that none of the distributions was Gaussian. The chloride values were chosen for display here because they had the most “Gaussianoid” appearance of all the examined results.
The moral of the story is a warning to beware of using Gaussian statistics to describe medical data. The mean may be dislocated from the center of the distribution; the standard deviations may be affected by outliers or asymmetrical distributions; and the Gaussian Z-scores may yield misleading values for percentiles. The absence of a truly Gaussian pattern in most biologic data has been entertainingly described by Micceri9 as “the unicorn, the normal curve, and other improbable creatures.”
Gaussian statistics are most effectively used, as discussed later, for roles in statistical inference, not description. Gaussian confidence intervals can be employed for inferences about possible variation in a central index, but not for describing the data set itself. For the variations in a central index, as we shall see in the next two chapters, a reasonable mathematical justification can be offered for using the Gaussian strategy.
6.8Verbal Estimates of Magnitudes
The last aspect of probability to be considered in this chapter is the quantitative inconsistency with which different people attach numbers to verbal terms describing magnitudes such as amounts, frequen - cies, and probabilities. In 1976, after consulting with various authorities, U.S. President Gerald R. Ford said that a swine flu epidemic in the next winter was “a very real possibility.” When asked by a science writer,11 four experts estimated this probability as 2%, 10%, 35%, and “less than even.” Ever since this revelation, disagreement has often been studied for the quantitites implied when people use such verbal terms as a little, a lot, often, unlikely, and usually.
6.8.1Poetic License
One of the earliest recorded complaints about verbal imprecision was made by Charles Babbage, renowned as the inventor of a 19th century “calculating engine” that is often regarded as the earliest ancestor of today’s digital computers.
© 2002 by Chapman & Hall/CRC
