
Principles of Medical Statistics (Feinstein, 2002)


rate. The time duration becomes more evident when a streptococcal infection incidence rate of .60 over 3 years becomes reported as an average annual incidence of .20. [This type of simple average can be erroneous for mortality rates. If .60 of a cohort dies over 3 years, and if the annual mortality rate is said to be .20, the survival rate each year will be .80. At the end of 3 years, the proportion of people still alive will be .80 × .80 × .80 = .51. The 3-year mortality rate will be .49, not .60.]
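The bracketed caution can be checked numerically. The following is a minimal sketch, assuming a constant annual mortality rate that compounds from year to year; the function names are illustrative and do not come from the text. Unrounded, the 3-year survival is .512, which the text rounds to .51.

```python
def cumulative_mortality(annual_rate: float, years: int) -> float:
    """Cumulative mortality when a constant annual rate compounds over `years`."""
    return 1.0 - (1.0 - annual_rate) ** years


def constant_annual_rate(cumulative: float, years: int) -> float:
    """Constant annual rate that would compound to `cumulative` over `years`."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)


# The text's example: an annual mortality of .20 applied for 3 years.
three_year = cumulative_mortality(0.20, 3)

# The reverse question: which constant annual rate is consistent with .60 over 3 years?
implied_annual = constant_annual_rate(0.60, 3)

print(f"3-year mortality at .20 per year: {three_year:.3f}")
print(f"annual rate implied by .60 over 3 years: {implied_annual:.3f}")
```

Under this constant-rate assumption, the annual mortality consistent with a 3-year rate of .60 is noticeably higher than the simple average of .20.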

Time durations also become an evident and integral part of the computation process when actuarial or other methods, discussed in Chapter 22, are used to adjust incidence rates for numerator losses.

17.3.3 Repetitions

In many incidence rates, the numerator condition is a “failure event” — such as death, occurrence of myocardial infarction, or unwanted pregnancy — that ends each person’s period of observation. Sometimes, however, the numerator is a streptococcal or urinary tract infection that can occur repeatedly.

The repeated-occurrence events produce two types of problems in deciding how to express the attack rates. The first problem is whether to count the number of events or the number of persons in whom events occurred. For example, suppose 45 urinary tract infections are noted when 300 women are followed for an interval of time. Does the “45” represent 45 infections in different women, or perhaps 10 infections in one woman, 5 infections in each of 3 women, 2 infections in each of 8 women, and 1 infection in each of 4 women? Either approach can be used, but if the numerator refers to infected women rather than number of infections, the attack rate under the second breakdown is 16/300 rather than 45/300.

A second problem involves a decision about how to express unequal durations for people who have been followed for different lengths of time. If the numerator is a single “failure” event, unequal durations of observation can be managed with the actuarial methods of “survival” adjustment described in Chapter 22. Because these adjustments cannot easily be used for repeated events, the customary procedure is to convert the denominator from a count of persons to a sum of person-durations. Thus, if 20 women have been followed for 0.5 years, 30 for 1 year, 50 for 1.5 years, 60 for 2 years, 90 for 2.5 years, and 50 for 3 years, the denominator would contain 610 person-years of follow-up, calculated as (20 × .5) + (30 × 1) + (50 × 1.5) + (60 × 2) + (90 × 2.5) + (50 × 3). If 45 urinary tract infections occurred in that group during that duration, the attack rate would be 45/610 = 7.4% per patient-year.
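The two choices of numerator and the person-years denominator can be reproduced with a short sketch. The per-woman infection counts below are the hypothetical breakdown given in the text; the variable names are illustrative.

```python
# Hypothetical breakdown from the text: 10 infections in one woman,
# 5 in each of 3 women, 2 in each of 8 women, and 1 in each of 4 women.
infections_per_woman = [10] + [5] * 3 + [2] * 8 + [1] * 4
women_followed = 300

events = sum(infections_per_woman)            # count of infections
persons_with_event = len(infections_per_woman)  # count of infected women

rate_by_events = events / women_followed
rate_by_persons = persons_with_event / women_followed

# Follow-up groups from the text: (number of women, years followed).
follow_up = [(20, 0.5), (30, 1.0), (50, 1.5), (60, 2.0), (90, 2.5), (50, 3.0)]
person_years = sum(n * years for n, years in follow_up)
rate_per_person_year = events / person_years

print(f"events/persons followed: {events}/{women_followed} = {rate_by_events:.3f}")
print(f"infected women/persons followed: {persons_with_event}/{women_followed} = {rate_by_persons:.3f}")
print(f"events per person-year: {events}/{person_years:g} = {rate_per_person_year:.3f}")
```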

17.4 Issues in Eligibility

When the problem of quasi-risks was discussed in Section 17.2.4, persons were regarded as ineligible for the denominator if they could not possibly develop the numerator event, or if it had already occurred.

17.4.1 Exclusions for New Development

The previous existence of the event seems an appropriate reason for exclusion if the study is concerned with conditions such as cancer, coronary disease, cerebrovascular disease, or pregnancy, whose new development is the main focus of the research. If no attempt is made to remove the ineligible persons, the denominator will be wrong, and the subsequent incidence rates will be inaccurate.

For example, suppose we institute a program to prevent a relatively common ailment, such as coronary disease, in a community of 100,000 people. If 2000 people have coronary disease that is already present (but perhaps unrecognized), the eligible denominator should be 98,000. At the end of the program, suppose coronary disease has become identified in 1500 of the 2000 people who already had it, and in 1000 new cases. The correct incidence rate would be 1000/98,000 = .0102. The incidence rate may be incorrectly cited, however, as 2500/100,000 = .025.
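The effect of leaving ineligible persons in the denominator can be shown with the numbers of this example. This is only an arithmetic sketch; the variable names are illustrative.

```python
community = 100_000
already_diseased = 2_000      # coronary disease present before the program began
new_cases = 1_000             # disease that newly developed during the program
rediscovered_cases = 1_500    # pre-existing disease identified during the program

# Only people free of the disease at baseline can develop a new case.
eligible = community - already_diseased
correct_rate = new_cases / eligible

# The erroneous version counts every identified case over the whole community.
incorrect_rate = (new_cases + rediscovered_cases) / community

print(f"correct incidence:   {new_cases}/{eligible} = {correct_rate:.4f}")
print(f"incorrect incidence: {new_cases + rediscovered_cases}/{community} = {incorrect_rate:.4f}")
```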

17.4.2 Inclusions for Repetitions

A different type of problem arises if the numerator event can occur repeatedly, because persons who have had it once may get it again. Clinical examples of such events are recurrent episodes (often called “attacks”) of streptococcal infection, rheumatic fever, asthma, acute pulmonary edema, epileptic seizure, migraine headache, urinary tract infection, transient ischemic attack, syncope, severe depression, pregnancy, or dysmenorrhea. In these circumstances, persons who have already had the event once are not merely eligible for inclusion in the denominator; they may sometimes be the main focus of the research.

© 2002 by Chapman & Hall/CRC

For example, continuous prophylactic medication would seldom be given to prevent rheumatic fever, epilepsy, asthma, migraine, or urinary tract infection in someone who has never had an episode, but might be used for persons whose susceptibility to recurrence has been demonstrated by an initial attack. On the other hand, continuous prophylactic medication to lower blood sugar, blood pressure, or blood lipids might be used in persons who may or may not have already manifested a “vascular complication.”

In the absence of a standard set of guidelines, the decisions about eligibility for repetition are arbitrary. Regardless of what is chosen, however, the criteria should always be clearly stated in the published report, and the results should always be analyzed separately for persons with and without previous episodes of the numerator event.

17.5 The Odds Ratio

The odds ratio is a unique statistical index of risk. Calculated as (pA/pB) (qB/qA), it is applicable only to a contrast of two proportions (pA vs. pB) and, in the world of medicine, is used almost exclusively for epidemiologic statistics.

As discussed earlier throughout Section 10.7, the odds ratio in a fourfold table is constructed as ad/bc. It is a contrast not of the actual proportions, but of the numbers in the four cells that are components of the proportions. The two denominators for the proportions themselves would be constructed either with (a + b) and (c + d) or with (a + c) and (b + d). Thus, a simple ratio that contrasts the two proportions might be [a/(a + b)]/[c/(c + d)]. Because the denominators are ignored or “ablated,” the odds ratio has an appeal that makes it sometimes regarded as the best single index for summarizing results of a 2 × 2 table.

In the world of epidemiology, however, the main attraction of the odds ratio is that it makes etiologic research easy to do. In suitable circumstances, the odds ratio obtained from a quick, inexpensive case-control study can approximate the risk ratio obtained from a protracted, expensive cohort study. A specific numerical example of this accomplishment was not cited earlier and is offered now.

17.5.1 Illustration of Numbers in a Cohort Study

Suppose omphalosis usually occurs in about one person per thousand, and suppose we suspect that tea drinking helps cause omphalosis, raising its rate of occurrence to about five per thousand. To get supportive evidence for this etiologic suspicion, we might assemble a cohort of 10,000 people, of whom 2000 are tea drinkers and 8000 are non-tea drinkers. After following these people for the next 20 years, we might find that omphalosis had developed as shown in Table 17.1.

TABLE 17.1
Occurrence of Omphalosis in a Hypothetical Cohort Study

                      Omphalosis    No Omphalosis    Total
Tea Drinkers              10            1990          2000
Non-Tea Drinkers           8            7992          8000
TOTAL                     18            9982        10,000


For the people in Table 17.1, the risk (i.e., the rate of occurrence) of omphalosis is 10/2000 = .005 in tea drinkers and 8/8000 = .001 in non-tea drinkers. These two risks can be contrasted in a single index, the direct risk ratio, which is .005/.001 = 5.


Alternatively, we can calculate an odds ratio from these data by determining the odds for tea drinking in omphalotic patients (= 10/8 = 1.25) and in non-omphalotic patients (= 1990/7992 = .249). The odds ratio will be 1.25/.249 = 5.02 and (in this instance) will be quite similar to the risk ratio.

To do the cohort study, however, would require assembling 10,000 people — tea drinkers and non-tea drinkers — and following them for the next 20 years to determine which persons develop omphalosis. This type of huge, long-term project does not attract many potential investigators, even if ample funds were available to collect the 10,000 people and maintain them under observation for the next 20 years. As a substitute research structure, the “retrospective” case-control study offers an alternative simpler strategy for exploring the etiologic question.

17.5.2 Illustration of Numbers in a Case-Control Study

Instead of starting at the beginning of the causal pathway, with “exposed” tea drinkers and “non-exposed” non-tea drinkers, the investigator starts at the end, assembling a group of people who have already developed omphalosis. A comparative group, called controls, is chosen from persons without omphalosis.

After being selected, members of the case and control groups are asked about their antecedent intake of tea. From this interview, each person might be classified as a tea-drinker or non-tea drinker, and a fourfold table is created. If no distortions (or biases) have occurred in the itinerary between tea-drinking (or non-tea-drinking) and the subsequent occurrence (or nonoccurrence) of omphalosis, the proportion of tea drinkers noted in the cases and controls will be the same as what would have been found in the larger, longer, forward-directed cohort study of 10,000 people.

For example, suppose the investigator assembles 90 cases of patients with omphalosis, and 90 controls who lack omphalosis. In the omphalosis group, the odds for tea drinking should be 10/8, and so 10 of every 18 cases should be tea drinkers. Of 90 cases, 50 would be tea drinkers. In the non-omphalosis group, the odds for tea drinking should be 1990/7992. For only 90 controls, the appropriate number would be about 18 tea drinkers. The investigator’s fourfold table would produce the results shown in Table 17.2.

TABLE 17.2
Case-Control Study of the Same Topic Presented in Table 17.1

                      Cases    Controls    Total
Tea Drinkers            50        18         68
Non-Tea Drinkers        40        72        112
TOTAL                   90        90        180


The results in Table 17.2 do not allow a risk or incidence rate to be determined because the patients were not assembled as tea drinkers or non-tea drinkers. If the “risk” of omphalosis were calculated in the tea drinkers of this table, the result would yield the unbelievably high 50/68 = .74, instead of the correct rate of .005 found in Table 17.1. Similarly, the “risk” of omphalosis in non-tea drinkers would be 40/112 = .36, which is about 360 times higher than the true risk of .001. Furthermore, if a “risk ratio” were calculated from these two “risks,” the result would be .74/.36 = 2.06, which is substantially lower than the true risk ratio of 5.

On the other hand, if the odds ratio for this table is calculated as (50/40)/(18/72) or (50 × 72)/(40 × 18), the result is 5 and is identical to the true risk ratio. [The algebra that accounts for this similarity was shown earlier in Section 10.7.4.1 and in Appendix A.10.1 of Chapter 10.]
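The comparison between Tables 17.1 and 17.2 can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative, and each table is passed as the four cells a, b, c, d read row by row.

```python
def risk_ratio(a, b, c, d):
    """Ratio of row proportions a/(a+b) and c/(c+d); meaningful only for a cohort table."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/bc for a fourfold table."""
    return (a * d) / (b * c)


cohort = (10, 1990, 8, 7992)      # Table 17.1: rows are tea drinkers, non-tea drinkers
case_control = (50, 18, 40, 72)   # Table 17.2: columns are cases, controls

rr_cohort = risk_ratio(*cohort)
or_cohort = odds_ratio(*cohort)
or_case_control = odds_ratio(*case_control)

# Not a valid risk ratio: the row totals of Table 17.2 are artifacts of sampling.
pseudo_rr = risk_ratio(*case_control)

print(f"cohort risk ratio:        {rr_cohort:.2f}")
print(f"cohort odds ratio:        {or_cohort:.2f}")
print(f"case-control odds ratio:  {or_case_control:.2f}")
print(f"case-control 'risk ratio' (misleading): {pseudo_rr:.2f}")
```

The odds ratio survives the change in sampling because it depends only on the cross-product of the cells, whereas the row proportions depend on how many cases and controls the investigator chose to assemble.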

17.5.3 Ambiguities in Construction

Just as odds can always be stated for or against a particular occurrence, the odds ratio is always ambiguous because it can always be constructed in two ways. In a table that has $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ as its four cells, the columns could equally well have been reversed, so that the same four numbers would be $\begin{pmatrix} b & a \\ d & c \end{pmatrix}$. The rows might also have been reversed to make the table either $\begin{pmatrix} c & d \\ a & b \end{pmatrix}$ or $\begin{pmatrix} d & c \\ b & a \end{pmatrix}$. The first and fourth of these four arrangements will produce ad/bc as the odds ratio. The second and third will produce the reciprocal value, bc/ad.

The choice of arrangements and expressions depends on what point is being communicated. If we think that exposure helps promote a disease, the odds ratio is usually constructed with the exposed-and-diseased cell in the upper left corner. The ad/bc odds ratio will then exceed 1. If exposure protects against disease (as in tests of vaccines), this arrangement will produce an odds ratio below 1.

On the other hand, if exposure is believed to protect against disease, we may want to cite the risk ratio for the unexposed group. In the latter instance, the cell for the unexposed-and-diseased group is placed in the upper left corner, and the ad/bc odds ratio will exceed 1. Thus, an odds ratio that indicates a reduced or “protective” risk of 0.23 in the exposed group also indicates an increased or “promoted” risk of 1/.23 = 4.35 in the unexposed group.

17.5.4 Incalculable Results

The word incalculable is often used to denote something too large to be precisely quantified. For odds ratios, however, this word literally means “cannot be calculated” — an event that happens whenever any one of the four cells has a value of 0. Because we must be able to determine both ad/bc and bc/ad, a zero cell would produce an unacceptable division by 0.

Reluctant to give up the cherished odds ratio, biostatisticians have proposed that 0.5 be added to each cell when this problem arises. Consider the controversial case-control study9 that associated clear-cell adenocarcinoma of the vagina or cervix (CCVC) with intrauterine gestational exposure to diethylstilbestrol (DES). The results were as follows:

Antecedent Exposure to DES    Cases of CCVC    Control Group
Exposed                             7                0
Non-Exposed                         1               32


With the “add 0.5 to each cell” tactic, the odds ratio in these data would be calculated as (7 + .5) (32 + .5)/[(0 + 0.5)(1 + .5)] = 325.
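The tactic can be sketched as follows. The function name and the choice to apply the correction only when a zero cell is present are assumptions made for illustration; the text specifies only that 0.5 is added to each cell when the problem arises.

```python
def odds_ratio_with_correction(a, b, c, d, correction=0.5):
    """Odds ratio ad/bc; if any cell is zero, add `correction` to every cell first."""
    if 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return (a * d) / (b * c)


# DES table from the text: exposed (7 cases, 0 controls), non-exposed (1 case, 32 controls).
des_or = odds_ratio_with_correction(7, 0, 1, 32)

# Table 17.2 has no zero cell, so it is left uncorrected here.
tea_or = odds_ratio_with_correction(50, 18, 40, 72)

print(f"DES odds ratio with 0.5 added to each cell: {des_or:.0f}")
print(f"Table 17.2 odds ratio: {tea_or:.0f}")
```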

17.5.5 Stochastic Appraisals

Like any other index of contrast, the odds ratio can be appraised stochastically with a P value or confidence interval.

17.5.5.1 P Values — As in any 2 × 2 table, a P value for the results of an odds ratio can be obtained with a Fisher exact test or a chi-square test. In many epidemiologic studies, the numbers are small enough to warrant the Fisher test.

17.5.5.2 Confidence Intervals — Determining a confidence interval for the odds ratio is tricky because it does not have a symmetrical span of values, and because it is constructed from two quotients, not one.

If the odds ratio exceeds 1, its value can range in the interval from 1 to almost infinity. If the ratio is <1, however, the values can range only in the relatively small interval between 1 and 0. To avoid this inappropriate asymmetry, confidence intervals are usually calculated for the natural logarithm (“ln”) of the odds ratio. Because ln 1 = 0, these intervals will be spread symmetrically around zero. Thus, if the odds ratio is 4, ln 4 = 1.39; if the ratio is 0.25 (= 1/4), ln .25 = −1.39. When the symmetric logarithmic calculation is converted to ordinary numbers, however, the interval will then be asymmetrical.

Because the confidence interval involves two quotients that can often have small numerical constituents, the calculation has evoked several mathematical strategies. They are mentioned here mainly so you will have heard of them and be able to recognize that the different results do not always agree. You need not try to learn (or even understand) the formulas. The three most prominent methods are eponymically named for Woolf,10 Cornfield,11 and Miettinen.12 Generically, they are sometimes called respectively the simple, exact, and test-based methods. In several mathematical comparisons,13,14 Cornfield’s exact method was the preferred approach, particularly for small sample sizes.

17.5.5.2.1 Woolf Simple Method. Using o as the symbol for odds ratio, a standard error can be calculated for ln o in a relatively simple method proposed by Woolf.10 The method relies on calculating standard error of ln o with the formula

$$SE = \sqrt{(1/a) + (1/b) + (1/c) + (1/d)} \qquad [17.1]$$

The a, b, c, d symbols are the frequency counts in each cell after 0.5 has been added to the cell. After the confidence interval is prepared in the usual manner as $\ln o \pm Z_\alpha (SE)$, the limits of the interval are converted to customary units by taking the antilogarithm, i.e., $e^{\ln(o)}$, for the lower and upper values.

For example, in Table 17.2, $SE = \sqrt{(1/50.5) + (1/18.5) + (1/40.5) + (1/72.5)} = \sqrt{.112} = .335$. The logarithmic 95% confidence interval will be ln 5 ± (1.96)(.335) = 1.609 ± .657. This confidence interval will extend from .953 to 2.266. Because $e^{.953} = 2.59$ and $e^{2.266} = 9.64$, the boundaries for the 95% confidence interval around the odds ratio of 5 will be 2.59 and 9.64. For the table in Section 17.5.4, $SE = \sqrt{(1/7.5) + (1/0.5) + (1/1.5) + (1/32.5)} = \sqrt{2.83} = 1.68$; and ln 325 ± (1.96)(1.68) = 5.784 ± 3.298.

Suitable calculations and conversions will then show that the 95% confidence interval for this odds ratio extends from 12 to 8794.
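The Woolf calculation can be sketched in Python. The function name and argument layout are illustrative assumptions; following the worked examples, 0.5 is added to every cell for the standard error, while the odds ratio itself is supplied as quoted in the text (5 and 325).

```python
import math


def woolf_interval(a, b, c, d, odds_ratio, z=1.96, correction=0.5):
    """Return (SE, lower, upper) using ln(o) +/- z * SE with SE from Formula [17.1]."""
    cells = [x + correction for x in (a, b, c, d)]
    se = math.sqrt(sum(1.0 / x for x in cells))
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return se, lower, upper


se_tea, low_tea, high_tea = woolf_interval(50, 18, 40, 72, odds_ratio=5.0)
se_des, low_des, high_des = woolf_interval(7, 0, 1, 32, odds_ratio=325.0)

print(f"Table 17.2: SE = {se_tea:.3f}, 95% CI = {low_tea:.2f} to {high_tea:.2f}")
print(f"DES table:  SE = {se_des:.2f}, 95% CI = {low_des:.0f} to {high_des:.0f}")
```

Because the sketch carries unrounded intermediate values, its limits can differ slightly in the last digits from the printed ones, which were obtained from rounded steps.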

Breslow and Day15 are willing to construct confidence intervals this way, but Gart and Thomas16 say the limits will generally be too narrow, especially for small-sized groups. Fleiss17 recommends using Formula [17.1] to gauge “precision” for the odds ratio but not to calculate the actual confidence interval.

17.5.5.2.2 Cornfield Exact Method. In a tactic originally suggested by Fisher18 and reminiscent of the array of tables used for the Fisher exact test, Cornfield11 developed an exact method for getting an odds ratio’s confidence interval. The calculations are complex and computer-intensive because iterative numerical solutions are required for quartic equations. Because of the calculational complexity, Cornfield’s method is seldom used. If you do a lot of this work, however, you might want to get and use the appropriate computer program. [The Cornfield method is not available in standard SAS, BMDP, or other packages, but can be found in the EPIINFO Version 5 available from the Centers for Disease Control.19]

17.5.5.2.3 Miettinen Test-Based Method. Probably the most common approach today is Miettinen’s “test-based” method,12 which uses the $X^2$ test statistic calculated for the 2 × 2 table, preferably15 without Yates correction. The confidence interval will be

$$\ln o \pm (\ln o)(Z_\alpha / X) \qquad [17.2]$$

where X is the square root of the $X^2$ statistic, and where $Z_\alpha$ would be 1.96 for a two-tailed α = .05. Because the formula can be rewritten as $\ln o\,[1 \pm (Z_\alpha / X)]$, the confidence limits could be expressed in a power of o as

$$o^{[1 \pm (Z_\alpha / X)]} \qquad [17.3]$$

For example, for the values of $\begin{pmatrix} 50 & 18 \\ 40 & 72 \end{pmatrix}$ in Table 17.2, o = 5.0; $X^2$ = 24.2; and X = 4.92, so that 1.96/4.92 = .398 and 1 ± .398 yields .602 and 1.398. The value of $5.0^{.602}$ is 2.63 and $5.0^{1.398}$ is 9.49. (With Woolf’s method, the corresponding values were 2.59 and 9.64.) For the table in Section 17.5.4, the uncorrected $X^2$ is 33.94 and X = 5.83, so that 1.96/5.83 = .336. With 1 − .336 becoming .664, the lower confidence limit is $325^{.664}$ = 46.33; and the upper limit is $325^{1.336}$ = 2269.22. (With Woolf’s method, the corresponding limits were 12 and 8794.) Thus, in both of the two cited examples, the Miettinen method produced narrower intervals than the Woolf method.
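A minimal sketch of the test-based limits follows. The function names are illustrative; the chi-square statistic is the ordinary uncorrected one for a 2 × 2 table, and the ordering of the returned limits assumes an odds ratio above 1.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Uncorrected (no Yates) chi-square statistic for a 2 x 2 table."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def miettinen_interval(a, b, c, d, odds_ratio, z=1.96):
    """Limits o ** (1 - z/X) and o ** (1 + z/X), where X = sqrt(chi-square)."""
    x = math.sqrt(chi_square_2x2(a, b, c, d))
    return odds_ratio ** (1 - z / x), odds_ratio ** (1 + z / x)


x2_tea = chi_square_2x2(50, 18, 40, 72)
low_tea, high_tea = miettinen_interval(50, 18, 40, 72, odds_ratio=5.0)

x2_des = chi_square_2x2(7, 0, 1, 32)
low_des, high_des = miettinen_interval(7, 0, 1, 32, odds_ratio=325.0)

print(f"Table 17.2: X2 = {x2_tea:.1f}, limits = {low_tea:.2f} to {high_tea:.2f}")
print(f"DES table:  X2 = {x2_des:.2f}, limits = {low_des:.1f} to {high_des:.0f}")
```

For the table with the large odds ratio, the limits are sensitive to rounding of 1.96/X, so the unrounded results of this sketch differ somewhat from the printed 46.33 and 2269.22.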

17.5.5.3 Tests of Fragility — A rapid crude (or “commonsense”) method to appraise stability for an odds ratio is to do a unit fragility test. For this test, relocate one unit of the data while keeping the marginal values constant, and then recalculate the odds ratio. For example, the table in Section 17.5.4 is already at a maximum value in one direction. The unit relocation can move things only in the other direction toward $\begin{pmatrix} 6 & 1 \\ 2 & 31 \end{pmatrix}$. The odds ratio for the latter table is (6 × 31)/(1 × 2) = 93, which is a drop of 71% [= (325 − 93)/325] from the original value. This demonstration of substantial descriptive “instability” can be compared against the stochastic result of the Fisher test, which would produce P = 4.0 × 10⁻⁷ for the extreme data in the observed original table.

 

 

 

 

For $\begin{pmatrix} 50 & 18 \\ 40 & 72 \end{pmatrix}$ in Table 17.2, however, the odds ratio of 5 has much less fragility. A shift of one unit produces $\begin{pmatrix} 49 & 19 \\ 41 & 71 \end{pmatrix}$ in one direction and $\begin{pmatrix} 51 & 17 \\ 39 & 73 \end{pmatrix}$ in the other. The corresponding odds ratio would become 4.47 or 5.62. The proportionate fragility would be a drop of .53/5 = 11% or a rise of .62/5 = 12%. The result is still not impressively stable, but with $X^2$ = 24.2 (X = 4.92), the 2P value is <.05.
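The unit fragility test is easy to automate. The sketch below is one possible reading of the procedure, with illustrative function names: a unit is moved along each diagonal so that all four marginal totals stay fixed, and any shift that would create a negative cell is discarded.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio ad/bc."""
    return (a * d) / (b * c)


def unit_shifts(a, b, c, d):
    """Tables reached by relocating one unit while keeping every marginal total constant."""
    candidates = [(a - 1, b + 1, c + 1, d - 1), (a + 1, b - 1, c - 1, d + 1)]
    return [t for t in candidates if min(t) >= 0]


# Table 17.2: both directions are possible.
base_tea = odds_ratio(50, 18, 40, 72)
shifted_tea = [odds_ratio(*t) for t in unit_shifts(50, 18, 40, 72)]
changes_tea = [(o - base_tea) / base_tea for o in shifted_tea]

# Table of Section 17.5.4: only one direction is possible because one cell is 0.
shifted_des = [odds_ratio(*t) for t in unit_shifts(7, 0, 1, 32)]

print("Table 17.2 shifted odds ratios:", [round(o, 2) for o in shifted_tea])
print("Table 17.2 proportionate changes:", [round(ch, 2) for ch in changes_tea])
print("DES table shifted odds ratio:", shifted_des)
```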

17.5.6 Quantitative Appraisal of Odds Ratio

When boundaries of “quantitative significance” were discussed in Chapter 10, the odds ratio received relatively little discussion among the other indexes of contrast. The quantitative importance of an odds ratio is particularly difficult to evaluate because the appraisal strongly depends on the context, rather than the magnitude itself. Furthermore, the magnitude may be directly distorted by the context.

17.5.6.1 “Inflation” of “High” Rates — Consider the “high” rates (usually ≥ .1) that are found in many clinical studies, such as a randomized trial in which the compared success rates are pA = .5 and pB = .3. The simple ratio of pA/pB = .5/.3 = 1.67 would indicate the higher “risk” of success with treatment A. The simple ratio of qB/qA = .7/.5 = 1.4 would indicate the higher risk of failure with treatment B. The odds ratio, however, multiplies these two “risks” to produce (pA/pB) (qB/qA) = (.5/.3) (.7/.5) = 2.33, which substantially inflates the value for either simple ratio.

The inflation is the mathematical reason why the odds ratio should not be used if the smaller of the two rates is not small enough. If not close to 1, the value of qB = 1 − pB will be divided by a smaller qA, and the product will inflate the value of pA/pB. Scientifically, of course, there is no reason to use the odds ratio when the actual values of pA and pB have been determined in a clinical trial or cohort study. In the latter two situations, the desired “risk” ratio can be expressed directly either as pA/pB or as qB/qA, but not as their product.

17.5.6.2 No Effect on “Low” Rates — In most public health studies of etiologic “risk factors,” the odds ratio will not produce substantial inflation because the compared rates of disease occurrence are relatively small, i.e., < .01. For example, in Table 17.1, pB = .001 and pA = .005, and qB/qA = .999/.995 = 1.004. The odds ratio of 5.02 would be hardly changed from the simple risk ratio of 5. If the risk ratio were still 5, but obtained from pA = .05 and pB = .01, the value of qB/qA would be .99/.95 = 1.04 and the odds ratio would become 5.2, which is moderately but not substantially inflated above 5. If the risk ratio of 5 came from pA = .5 and pB = .1, however, qB/qA would be .9/.5 = 1.8, and the odds ratio would increase to 9.
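The dependence of the inflation on the size of the rates can be tabulated directly. This is a small sketch with illustrative names; it uses the (pA/pB)(qB/qA) form of the odds ratio and the three pairs of rates cited in the text, all of which have a risk ratio of 5.

```python
def odds_ratio_from_rates(p_a, p_b):
    """Odds ratio (pA/pB)(qB/qA) computed from two proportions."""
    return (p_a / p_b) * ((1 - p_b) / (1 - p_a))


pairs = [(0.005, 0.001), (0.05, 0.01), (0.5, 0.1)]
results = []
for p_a, p_b in pairs:
    rr = p_a / p_b
    inflation = (1 - p_b) / (1 - p_a)   # the qB/qA factor
    results.append((rr, inflation, odds_ratio_from_rates(p_a, p_b)))
    print(f"pA={p_a}, pB={p_b}: risk ratio={rr:.2f}, qB/qA={inflation:.3f}, "
          f"odds ratio={results[-1][2]:.2f}")
```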

17.5.6.3 Criteria for Quantitative Significance — For the reasons just cited, odds ratios are particularly difficult to interpret out of context. In clinical studies of therapy, the odds ratio usually is mathematically undesirable because of potential inflation and is also unnecessary because the compared results can be effectively (and better) evaluated with more conventional indexes of contrast, such as the direct increment or number needed to treat. When etiology is investigated with an epidemiologic case-control study, however, the “sampling method” usually precludes the calculation of direct increments or ratios. The odds ratio then becomes the only index of contrast available for evaluation.

Although the evaluation is highly judgmental, various authorities20,21 have stated that odds ratios < 3 represent “weak associations” that often may not warrant serious attention because the small elevations above 1 hardly exceed the “noise” that can be expected from ordinary sources of inaccuracy and bias in the raw data. In a study of quantitative conclusions for published odds ratios, Burnand et al.22 found that 2.2 seemed to be an average minimum level for quantitative significance.


Because odds ratios are so difficult to interpret, they are not well understood by workers in the healthcare field, and they can be seriously misleading or deceptive if presented to laymen when investigators seek either to obtain “informed consent” or to advocate a particular belief. A much more effective and “honest” approach is to determine or estimate an incidence rate for the control-group “risk,” use the odds ratio to calculate the corresponding rate for the “exposed” or “treated” group, and convert the increment in incidence rates to an NNE for the number needed for one event. An opportunity to do this conversion appears in Exercise 17.6 at the end of the chapter.
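The conversion described above can be sketched as follows. The function name is illustrative, and the input values (a control-group risk of .001 and an odds ratio of 5) are borrowed from the omphalosis example purely for demonstration; they are not the data of Exercise 17.6. The sketch assumes the odds ratio applies to the odds of the event, so the exposed-group odds are the control-group odds multiplied by the odds ratio.

```python
def exposed_risk_and_nne(control_risk, odds_ratio):
    """Return (exposed-group risk, risk increment, NNE) from a control risk and an odds ratio."""
    control_odds = control_risk / (1 - control_risk)
    exposed_odds = odds_ratio * control_odds
    exposed_risk = exposed_odds / (1 + exposed_odds)
    increment = exposed_risk - control_risk
    nne = 1 / increment   # number of exposed persons per one additional event
    return exposed_risk, increment, nne


exposed_risk, increment, nne = exposed_risk_and_nne(control_risk=0.001, odds_ratio=5.0)

print(f"exposed-group risk: {exposed_risk:.5f}")
print(f"increment in risk:  {increment:.5f}")
print(f"NNE: about {nne:.0f}")
```

Stated this way, an odds ratio of 5 on a baseline risk of one per thousand corresponds to roughly one additional event for every 250 exposed persons, which is usually easier for a lay reader to weigh than the ratio itself.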

17.5.6.4 Lod Scores — Odds have received an entirely new use in molecular genetics, to quantify the strength of linkage for an observed association between two traits in families. The odds are determined as a ratio of posterior and prior probabilities. The numerator is the posterior probability of observing the distributional pattern of the two traits’ frequency of recombination under the hypothesis of linkage. The denominator is the analogous prior probability, assumed to be 0.5, under the null hypothesis of no linkage. The decimal logarithm of this ratio is called a lod score; and a lod value of 3, i.e., odds of 1000 to 1, is required to assert that linkage exists.23,24

17.5.7 Combination of Stratified Odds Ratios

A simple 2 × 2 table is sometimes stratified, according to an additional variable, into a series of 2 × 2 tables. For example, Table 17.1 might be divided into separate 2 × 2 tables for men and women, or for different durations of tea drinking. The table in Section 17.5.4 might be partitioned according to whether the women did or did not have pregnancy problems such as cramping, bleeding, or spotting. The separate odds ratios calculated for each stratum can then be appropriately combined to form an odds ratio that has been “adjusted” for imbalances in the strata. The adjustment method, called the Mantel–Haenszel procedure, is discussed in Chapter 26.

17.5.8 Scientific Hazards of Odds Ratios

Despite all the mathematical cavils, the main hazard of the odds ratio is scientific, not statistical. It is regularly used for a cause–effect inference about an etiologic agent or “risk factor,” but the research structure itself may contain few or sometimes no scientific precautions against biased or inaccurate results.

Randomized trials were developed to provide these precautions in cause–effect reasoning about the action of therapeutic agents; and the precautions are equally pertinent in cause–effect reasoning for etiologic agents. Although the precautions could be applied in case-control studies, suitable efforts are seldom made. Consequently, case-control studies become an easy prey for the susceptibility bias, performance bias, detection bias, and transfer bias — discussed elsewhere6 — that can occur innately in any form of cause–effect relationship. Beyond the innate biases, the investigator has the opportunity to introduce two additional biases — exclusion bias and ascertainment bias — when the groups of cases and controls are chosen and interviewed.

Thus, in exchange for the ease of doing etiologic research with the mathematical splendor of the odds ratio, case-control studies create formidable scientific problems in deciding whether the result is valid or credible.

17.6 Specialized Terms for Contrasting Risks

As noted in Chapter 10, two central indexes are most commonly contrasted as an increment or ratio. Exactly those same ideas occur in a contrast of two epidemiologic rates. Because the rates are regarded as risks, however, an elaborate specialized jargon has been developed for the citations. Emanating from the world of public health, which emphasizes preventing disease by reducing risk factors, the jargon uses terms such as attributable risk and etiologic fraction, which indicate “editorial” conclusions about the data rather than scientific descriptions of “news.”


An additional complication in the nomenclature is that the risks — which usually refer to the rate of death or the incidence rate for a particular disease — can be considered and compared for three sets of people, rather than two. The three sets are: a regional community population, a group of persons exposed to a particular etiologic agent or risk factor, and a group of persons who are unexposed. The risks are then contrasted in ways that imply the quantitative consequences of exposure or non-exposure. To avoid the threat of clarity amid all the other complexity, the same ideas have received many different names, and occasionally the same name is given to different ideas.

Card-carrying epidemiologists themselves often have difficulty remembering all the terms and distinctions, which are listed here mainly so you will have met them and know roughly what they mean when they appear in published literature. The “glossary” that follows is intended merely to cite the nomenclature and quantitative constituents, not to justify, praise, question, or condemn the associated assumptions and reasoning about cause–effect relationships. Whenever the literary exposure becomes too risky to your intellectual tranquillity, advance promptly to Section 17.7.

17.6.1 Glossary of Symbols and Terms

The three rates under discussion as risks can be represented with the following symbols:

pE = rate of event (e.g., occurrence of disease or death) in an exposed group of people

pU = rate in an unexposed group of people

pC = rate in a community (regional) population

The proportion of the community exposed to a risk factor is e, and the proportion of unexposed people will be 1 − e. The rate of the event in the community will be

pC = epE + (1 − e)pU

17.6.2 Increments

The increment in risk represents the alleged excess of the disease that is caused by exposure. The increment can be expressed in two ways. Attributable risk is used for the increment pE − pU. It is also called the rate difference or risk difference. Attributable community risk is used for the increment pE − pC. It is also called the attributable population risk. Substituting pC = epE + (1 − e)pU and working out the algebra produces (1 − e)(pE − pU). The attributable community risk is thus the attributable risk multiplied by the proportion of unexposed people.
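The algebra mentioned in this paragraph takes only three lines. The steps below are a reconstruction from the definitions in Section 17.6.1, using the same symbols:

```latex
\begin{align*}
p_E - p_C &= p_E - \left[ e\,p_E + (1 - e)\,p_U \right] \\
          &= (1 - e)\,p_E - (1 - e)\,p_U \\
          &= (1 - e)\,(p_E - p_U)
\end{align*}
```

As a check with illustrative numbers, if e = .2, pE = .005, and pU = .001, then pC = .0018, and pE − pC = .0032 = (.8)(.004).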

17.6.3 Ratios

A single simple ratio expresses the contrast of rates for exposed/unexposed risk as r = pE/pU. To compensate for the simplicity, this ratio has an exuberance of names. It has been called risk ratio, relative risk, rate ratio, incidence rate ratio, and relative rate. All of these statistical names refer to essentially the same entity. The term that will be used here is risk ratio.

17.6.3.1 Nonstatistical Sources of Confusion in Nomenclature — The reason that odds ratios are so attractive in epidemiology is that under the suitable conditions discussed earlier for a retrospective case-control study, the odds ratio, o, can be used to approximate the risk ratio, r, that might be obtained in a cohort study. (In fact, many case-control investigators improperly use the term risk ratio or relative risk when reporting results of an odds ratio.)

In the customary use of pertinent words, however, the term cases refers to persons with the focal disease, and controls refers to persons who do not have it. The case-control structure is thus used for diagnostic marker studies (see Chapter 21) and for other cross-sectional clinical investigations6 in which a disease and a focal entity are associated in a concurrent, forward, or uncertain direction of timing, rather than in the retrospective direction of etiologic research. To separate the backward from the other directions of inquiry, the term trohoc (which is cohort spelled backward) was proposed25 for retrospective case-control studies. The term has been “deprecated”26 or actively resented, however, by many public-health epidemiologists, who insist that case-control is a satisfactory name, despite the directional ambiguity.

At least six alternative names have been proposed for retrospective case-control studies, not to clarify the problem of direction, but to remove the confusion produced when “control” is used for an outcome event rather than for the customary scientific designation of a comparative agent.6 Four of these terms are case-comparison study, case-compeer study, case-history study, and case-referent study. A fifth alternative, case-base study, can indicate the choice of controls from the regional “base” population rather than from the available other clinical conditions that are often used as “controls.” A sixth term, case-exposure study, has been proposed27 when “the controls who are obtained concurrently with cases are representative of the exposure experience of the population from which the cases are drawn.”

The last term can add to the confusion produced when case-control is also used (erroneously) for a cohort study in which the exposed persons are called “cases,” and the non-exposed are called “controls.” Despite all the peer-reviewing and editing, this flagrant error occasionally appears in prestigious medical journals.28,29

17.6.3.2 Confidence Interval for Risk Ratio — When a cohort study has actually been done, with the outcome events found in a/nE exposed people and in c/nU unexposed people, the risk ratio is r = (a/nE)/(c/nU). A confidence interval can be obtained30 for the logarithmic transformation of r by calculating

SE(ln r) = √[(1/a) – (1/nE) + (1/c) – (1/nU)]

This result is then managed like the standard error of the logarithm of the odds ratio, and the final confidence limits are the exponentiated values of ln r ± Zα [SE (ln r)].
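The procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the cohort counts are hypothetical, the function name is introduced here, and the default Zα = 1.96 assumes a two-sided 95% interval.

```python
import math


def risk_ratio_ci(a, n_e, c, n_u, z=1.96):
    """Risk ratio and confidence limits from the log transformation.

    a / n_e: events / total in the exposed group
    c / n_u: events / total in the unexposed group
    z:       Z-alpha multiplier (1.96 is assumed here for a 95% interval)
    """
    r = (a / n_e) / (c / n_u)
    # Standard error of ln r: square root of (1/a - 1/nE + 1/c - 1/nU).
    se_ln_r = math.sqrt(1 / a - 1 / n_e + 1 / c - 1 / n_u)
    lower = math.exp(math.log(r) - z * se_ln_r)
    upper = math.exp(math.log(r) + z * se_ln_r)
    return r, se_ln_r, lower, upper


# Hypothetical cohort: 30 events in 100 exposed, 10 events in 100 unexposed.
r, se_ln_r, lower, upper = risk_ratio_ci(30, 100, 10, 100)
print(f"r = {r:.2f}, SE(ln r) = {se_ln_r:.3f}")
print(f"confidence limits: {lower:.2f} to {upper:.2f}")
```

Because the limits are computed on the logarithmic scale and then exponentiated, they are not symmetric around r itself; they are symmetric around ln r.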

17.6.4 Cumulative Incidence and Incidence Density

Another source of confusing jargon about rates is that the term incidence rate can be applied to two different mathematical phenomena. One of them is the conventional, standard scientific idea: the proportion of people in a specified population who have developed the cited event during the course of a specified time period. Mathematicians call this the cumulative incidence rate. The second usage refers to the proportionate number of new cases during a very small or “instantaneous” time period. This idea has different names: incidence density, instantaneous incidence rate, and hazard rate. The two approaches are used later in Chapter 22 when interval mortality rates indicate the hazards from which the cumulative incidence, or survival rate, is then calculated.
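The link between interval rates and cumulative incidence can be sketched as follows. The example reuses the annual mortality rate of .20 over 3 years that was discussed earlier in the chapter; the function name is introduced here, and the sketch assumes that each interval rate applies only to the people still at risk at the start of that interval.

```python
def cumulative_incidence(interval_rates):
    """Cumulative incidence from a sequence of interval (hazard) rates.

    Each rate is assumed to apply to those still at risk when the
    interval begins, so the surviving proportions are multiplied.
    """
    surviving = 1.0
    for rate in interval_rates:
        surviving *= 1.0 - rate
    return 1.0 - surviving


# Annual mortality rate of .20 for each of 3 years.
three_year = cumulative_incidence([0.20, 0.20, 0.20])
print(f"3-year cumulative mortality = {three_year:.2f}")
print(f"3-year survival = {1 - three_year:.2f}")
```

The result reproduces the earlier arithmetic: survival is .80 × .80 × .80 = .51, so the 3-year mortality is .49 rather than .60.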

Most epidemiologic case-control studies claim to use “incidence-density” arrangements, in which exposure histories are compared for “incident cases” and for control “noncases who are still at risk of becoming cases.” In a “cumulative-density” arrangement, however, the control group is “no longer at risk of becoming cases.” The latter situation is pertinent for investigating reproductive outcomes such as congenital deformities or vaccine efficacy during an epidemic. Hogue et al.31 have pointed out that the case-control odds ratio can suitably approximate the risk ratio in incidence-density studies, but must receive special checks and modification in cumulative-incidence studies.

17.6.5 Proportionate Increments

Proportionate increments are particularly popular because, as discussed in Chapter 10, they make the results seem more impressive than ratios alone, and also because they allow additional editorial claims about etiology and prevention. The calculations and titles will depend on what is chosen for comparison in the reference group or denominator.

When the attributable risk is divided by the risk in the non-exposed group, the proportionate increment is formed as (pE – pU)/pU = r – 1. This excess relative risk is also called the relative effect.

Prevented fraction reports the same comparison as the excess relative risk, but the exposure is expected to prevent rather than promote the disease. Since pU should exceed pE, the proportionate increment is expressed as (pU – pE)/pU = 1 – r. With exposure having a protective effect, r will be < 1, and the value of 1 – r will range from 0 to 1. This term, often used for evaluating vaccines, is also called protective efficacy.

For etiologic fraction (for exposed group), the attributable risk, pE – pU, is divided by the risk in the exposed group to form (pE – pU)/pE = 1 – (1/r) = (r – 1)/r. It is also called the attributable proportion or attributable risk percent. It indicates the proportion of disease that would presumably be removed from the exposed group if everyone became unexposed.

Etiologic fraction (for community) uses a different increment, formed by the excess of the community rate beyond the unexposed rate. This increment is then divided by the community rate to produce (pC – pU)/pC. Substituting pC = epE + (1 – e)pU, and carrying out the rest of the algebra, this expression reduces to [e(r – 1)]/[e(r – 1) + 1]. The result, which is also called the population attributable risk percent, is regarded as the proportion of community cases attributable to the exposure. The implication is that these cases would be prevented if the exposure were eliminated.
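The four proportionate increments can be collected in a short Python sketch. The function names are introduced here for illustration, and the values of r and e are hypothetical assumptions rather than data from the text.

```python
def excess_relative_risk(r):
    """(pE - pU)/pU = r - 1, also called the relative effect."""
    return r - 1


def prevented_fraction(r):
    """(pU - pE)/pU = 1 - r, intended for a protective exposure (r < 1)."""
    return 1 - r


def etiologic_fraction_exposed(r):
    """(pE - pU)/pE = (r - 1)/r."""
    return (r - 1) / r


def etiologic_fraction_community(r, e):
    """(pC - pU)/pC = e(r - 1) / [e(r - 1) + 1]."""
    return e * (r - 1) / (e * (r - 1) + 1)


# Hypothetical harmful exposure: risk ratio 3, with 40% of the community exposed.
r, e = 3.0, 0.40
print(f"excess relative risk = {excess_relative_risk(r):.2f}")
print(f"etiologic fraction (exposed) = {etiologic_fraction_exposed(r):.2f}")
print(f"etiologic fraction (community) = {etiologic_fraction_community(r, e):.2f}")

# Hypothetical protective exposure (e.g., a vaccine) with risk ratio 0.25.
print(f"prevented fraction = {prevented_fraction(0.25):.2f}")
```

Note that every function takes only r (and, in one case, e) as input; none of them needs the actual rates pE, pU, or pC.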

The calculations can be illustrated with data presented by Cole and McMahon32 for a case-control study of smoking and bladder cancer in men. The data were as follows:

              Cases   Controls
Smokers        284       261
Nonsmokers      59       114
TOTAL          343       375

Because bladder cancer is a relatively rare disease, the odds ratio, calculated as (284 × 114)/(261 × 59) = 2.10, can be used as a surrogate for the risk ratio. The proportion of exposure in the general population can be estimated from the control group as e = 261/375 = .696. Inserting this information into the foregoing formula, the community etiologic fraction is found to be (.696)(1.10)/[(.696)(1.10) + 1] = .43. Thus, if the underlying reasoning and data are correct, the men of a smoke-free society should achieve an apparently impressive 43% reduction in the former incidence rate of bladder cancer. Because the former rate is not stated, however, the actual magnitude of the reduction is unknown.
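The arithmetic of this example can be reproduced with a brief Python sketch. The cell labels a, b, c, and d follow the usual 2 × 2 layout (exposed cases, exposed controls, unexposed cases, unexposed controls); the variable names are introduced here for illustration.

```python
# Cell counts from the smoking and bladder cancer table.
a, b = 284, 261  # smokers: cases, controls
c, d = 59, 114   # nonsmokers: cases, controls

# Odds ratio ad/bc, used as a surrogate for the risk ratio r.
odds_ratio = (a * d) / (b * c)

# Proportion of exposure in the community, estimated from the controls.
e = b / (b + d)

# Community etiologic fraction: e(r - 1) / [e(r - 1) + 1].
community_fraction = e * (odds_ratio - 1) / (e * (odds_ratio - 1) + 1)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"e = {e:.3f}")
print(f"community etiologic fraction = {community_fraction:.2f}")
```

The calculation uses only the four cell counts; no actual incidence rate of bladder cancer enters at any step.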

17.7 Advantages of Risk Ratios and Odds Ratios

If you have survived the tour through all the foregoing formulas, you may have noted a fascinating thing about r, the risk ratio: it avoids the need for knowing any of the basic values for pE, pU, or pC. Except for the direct increments cited in Section 17.6.2, none of the other formulas requires getting any scientific evidence or doing any specific research to find the actual occurrence rates for the disease (or death) in the exposed, unexposed, or community populations. Everything is taken care of by the mathematical arrangements. Such impressive information as “excess relative risk,” “etiologic fractions,” and “prevented fractions” can all be determined from a single item of information, r, which is the risk ratio pE/pU. (In one instance, we also need an estimate for e.)

Consequently, an easy, inexpensive, retrospective case-control study can avoid the laborious cohort research necessary to find pE, pU, or pC. With the risk ratio approximated by the ad/bc odds ratio of the 2 × 2 table and with the value of e estimated from results in the “control” group [e.g., as b/(b + d)], we can produce a gigantic mathematical buffet that extends from risk ratios to etiologic fractions. As noted earlier in the text, Francis Galton once wrote33 that the ancient Greeks, had they but known it, would
