have worshipped the Gaussian curve as a god. The retrospective case-control study and odds ratio, if available, might have received the same deification from ancient Greek epidemiologists.
17.8 Adjustments for Numerators, Denominators, and Transfers
A quite different statistical strategy for epidemiologic rates is to “adjust” them mathematically for additional major problems, not hitherto cited, in either the numerator or denominator constituents.
The incidence data in a longitudinal study must often be adjusted for “numerator losses.” The “lost” people were counted when they entered the denominator, and their status was known at some subsequent time of observation; but they then became “lost to follow-up” and their ultimate status is unknown for counting the outcome event in the numerator as dead/alive, success/failure, etc. To adjust the ongoing depletion of the cohort, the denominators are successively decremented over time, using either conventional actuarial or the currently popular Kaplan-Meier “life-table” methods. The methods of adjustment, often called survival analysis, are discussed in Chapter 22.
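As a minimal sketch of how a denominator is decremented when people die or are lost to follow-up, the Python fragment below computes a Kaplan-Meier style product of survival proportions. The follow-up data are hypothetical, and the function name `kaplan_meier` is an illustrative choice; the methods themselves are presented in Chapter 22.

```python
def kaplan_meier(records):
    """Cumulative survival proportions at each death time.

    records: list of (time, event) pairs, where event=1 is a death and
    event=0 is a person lost to follow-up at that time.
    """
    at_risk = len(records)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in records}):
        deaths = sum(1 for time, event in records if time == t and event == 1)
        losses = sum(1 for time, event in records if time == t and event == 0)
        if deaths > 0:
            # People lost at time t still count in the denominator at time t.
            survival *= (at_risk - deaths) / at_risk
            curve.append((t, survival))
        # Deaths and losses both leave the denominator for later times.
        at_risk -= deaths + losses
    return curve


# Hypothetical cohort of 6 people; times are months of follow-up.
followup = [(2, 1), (3, 0), (5, 1), (5, 0), (8, 1), (9, 0)]
km_curve = kaplan_meier(followup)
for time, proportion in km_curve:
    print(f"month {time}: cumulative survival {proportion:.4f}")
```

In this made-up cohort, the three losses never enter the numerator as deaths, but each one shrinks the denominator used for the next death.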
A different type of adjustment, used in both longitudinal and cross-sectional studies, is done for “denominator imbalances.” This problem arises when we want to compare incidence rates or prevalence in two groups, but we know that the composition of the groups is biased by a “confounding” factor that affects the rates. For example, the comparison of birth rates in two groups of people would be unfair if one group contained mainly men and postmenopausal women, whereas the other group contained mainly women aged 18 to 45. An adjustment called “standardization” is often done to help “equalize” the disproportionate denominator components that would distort the comparison of the unadjusted “crude” rates. This process, discussed in Chapter 26, leads to the “age–sex adjustments,” “stratum-specific rates,” and other adaptations for which epidemiologic data are famous (or infamous).
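The effect of a denominator imbalance can be sketched with a direct standardization, one of several approaches discussed in Chapter 26. All counts below are hypothetical, and the equal weights for the "standard" population are an assumption made only for illustration.

```python
def crude_rate(strata):
    """Overall rate: total events divided by total persons."""
    events = sum(e for e, _ in strata.values())
    persons = sum(n for _, n in strata.values())
    return events / persons


def directly_standardized_rate(strata, standard_weights):
    """Weighted sum of stratum-specific rates, using a common set of weights."""
    return sum(standard_weights[name] * e / n for name, (e, n) in strata.items())


# Hypothetical data: stratum -> (events, persons).
# The stratum-specific rates are identical in the two groups (0.01 and 0.04),
# but the groups contain different proportions of young and old people.
group_1 = {"young": (10, 1000), "old": (80, 2000)}
group_2 = {"young": (20, 2000), "old": (40, 1000)}
standard = {"young": 0.5, "old": 0.5}  # assumed standard population

crude_1, crude_2 = crude_rate(group_1), crude_rate(group_2)
adjusted_1 = directly_standardized_rate(group_1, standard)
adjusted_2 = directly_standardized_rate(group_2, standard)
print(crude_1, crude_2, adjusted_1, adjusted_2)
```

The crude rates (0.03 vs. 0.02) differ only because the groups have different compositions; after both groups are weighted by the same standard, the adjusted rates coincide at 0.025.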
A third type of problem, which is difficult to adjust, occurs when epidemiologic cohorts are assembled from volunteers, or from the people who are still available many years after the initial serial time at which the cohort observations began. The problem, a scientific issue in assembly of groups, is beyond the scope of a mainly statistical discussion.
17.9 Interdisciplinary Problems in Rates
As might be expected when the same ideas and names are used for rates in substantially different groups, major interdisciplinary problems have occurred about the propriety of the activities. Public-health researchers may contend that clinicians should not apply the name cohort or talk about rates if the denominator refers to clinical groups, rather than regional populations. Clinicians may respond that the public-health cohorts are studied mathematically, rather than with direct examinations, and that the population-based rates of “disease” are too inaccurate to warrant scientific credibility, particularly when results in different regions or eras are used for major decisions in public policy.
Because the problems are currently unresolved (with few attempts having been made to achieve resolution), both the public-health and the clinical investigators continue their unchanged use (or abuse) of the process. Although public-health mortality rates are regularly adjusted or “standardized” for unbalanced demographic composition of denominators, clinical mortality rates are seldom adjusted for corresponding imbalances caused by differences in severity of the denominator conditions. Although extensive variations in accuracy and consistency of diagnostic citation5,6 make death-certificate data untrustworthy, a single selected “cause of death” continues to be used for public-health tabulations of the incidence of different diseases.
A striking example of scientific defects in both the clinical and public-health approaches is the use of infant mortality rates as indicators of national or regional quality of health care. The denominator of these rates consists of infants who were born alive. The numerator consists of live-births who died in
© 2002 by Chapman & Hall/CRC
the next year. What is ignored in both denominators and numerators are the fates of the products of conception. No adjustments are made for spontaneous or induced abortions; and in particular, no adjustments are made for stillbirths or for infants born in a precarious state of life.
If left alone, the precarious infants may promptly die, be recorded as stillbirths, and appear in neither numerators nor denominators of the infant mortality rates. If given vigorous resuscitation and excellent care, however, many of the precarious births will survive, but those who do not will augment the numerator of deaths. In this way, excellent care in the delivery room and in special neonatal nurseries can help salvage life for many babies who formerly would have been “stillbirths.” The result, however, can also lead to a paradoxical increase in the infant mortality rates.
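The paradox can be illustrated with a small arithmetic sketch. Every count below is hypothetical and was chosen only to show the direction of the effect; none comes from real vital statistics.

```python
def infant_mortality_rate(live_births, infant_deaths):
    """Deaths in the first year per 1000 live births."""
    return 1000 * infant_deaths / live_births


# Hypothetical region: 980 robust live births, of whom 5 die in the first
# year, plus 20 infants born in a precarious state.
robust_births, robust_deaths = 980, 5
precarious_births = 20

# Policy 1: the precarious infants are left alone and recorded as stillbirths,
# so they appear in neither the numerator nor the denominator.
rate_left_alone = infant_mortality_rate(robust_births, robust_deaths)
survivors_left_alone = robust_births - robust_deaths

# Policy 2: vigorous resuscitation. All 20 are recorded as live births;
# assume 12 survive and 8 die within the year.
precarious_deaths = 8
rate_resuscitated = infant_mortality_rate(
    robust_births + precarious_births, robust_deaths + precarious_deaths
)
survivors_resuscitated = (robust_births + precarious_births) - (
    robust_deaths + precarious_deaths
)

print(f"left alone:   {rate_left_alone:.1f} per 1000, {survivors_left_alone} survivors")
print(f"resuscitated: {rate_resuscitated:.1f} per 1000, {survivors_resuscitated} survivors")
```

Under these assumptions, the second policy saves 12 more infants, yet the recorded infant mortality rate rises from about 5.1 to 13.0 per 1000 live births.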
Despite these problems, few attempts have been made to adjust the infant mortality rates for “precarious” states (which include very low birth weights) or for the local clinical customs used to identify a precarious baby who is resuscitated but who dies soon afterward.34 The rates of infant mortality can rise or fall according to whether the clinician decides to list such babies either as stillbirths (thereby filling out one official certificate) or as live births followed by deaths (thereby having to fill out two official certificates).
References
1. Elandt-Johnson, 1975; 2. Freedman, 1991; 3. Ericksen, 1985; 4. Hamilton, 1990; 5. Gittlesohn, 1982; 6. Feinstein, 1985; 7. McFarlane, 1987; 8. Burnand, 1992; 9. Herbst, 1972; 10. Woolf, 1955; 11. Cornfield, 1956; 12. Miettinen, 1976; 13. Gart, 1982; 14. Brown, 1981; 15. Breslow, 1980; 16. Gart, 1972; 17. Fleiss, 1981, pg. 64; 18. Fisher, 1934; 19. Dean, 1990; 20. Wynder, 1987; 21. Cornfield, 1987; 22. Burnand, 1990; 23. Bale, 1989; 24. Risch, 1992; 25. Feinstein, 1973; 26. Last, 1988; 27. Hogue, 1981; 28. Koivisto, 1984; 29. Eagles, 1990; 30. Katz, 1978; 31. Hogue, 1983; 32. Cole, 1971; 33. Galton, 1889; 34. Howell, 1994; 35. Greenwald, 1971; 36. Labarthe, 1978; 37. McFarlane, 1986.
Exercises
17.1. A major controversy has occurred about apparent contradictions in biostatistical data as researchers try to convince Congress to allocate more funds for intramural and extramural investigations supported by the NIH. Citing improved survival rates for conditions such as cervical cancer, breast cancer, and leukemia, clinicians claim we are “winning the war” against cancer. Citing increased incidence rates for these (and other cancers), with minimal change in mortality rates, public-health experts claim that the “war” has made little progress, and we should focus on prevention rather than cure.
17.1.1. What explanation would you offer to suggest that the rising incidence of cancer is a statistical consequence of “winning” rather than “losing” the battle?
17.1.2. What explanation would you offer to reconcile the contradictory trends for survival and mortality rates, and to suggest that both sets of results are correct?
17.1.3. What focus of intervention would you choose for efforts to prevent breast cancer, cervical cancer, or leukemia?
17.2. Practicing pediatricians constantly make use of “growth charts” to show the range of normal growth for children. The charts are constructed in the general format shown for weight in Figure E.17.2. The data for these charts are obtained as follows. A collection of “normal” children is assembled, measured, and divided into groups according to age. For each age group, the distribution of weight is noted and converted into percentiles. The percentile points are then entered on the graph for each age group and the points are joined to form the lines.
Pediatricians use these graphs to follow their cohorts of well children and to determine whether the children are growing in a normal manner.
17.2.1. Were the data on the graph obtained from cohort research? If not, what designation would you give to the data structures?
17.2.2. Do you perceive any clinical biostatistical problems arising from any disparity you have noted?
FIGURE E.17.2
Format of pediatric growth curves. [Vertical axis: WEIGHT; horizontal axis: AGE IN YEARS (1, 2, 3, 4, 5, 6, etc.); curves shown for the 5th, 50th, and 95th percentiles.]
17.3. The drawing shown in Figure E.17.3 gives a diagrammatic representation of the occurrence and course of instances of a particular disease in six members of a group of 300 persons. The other 294 persons remained free of the disease. For this group of 300 persons, calculate
17.3.1. Point prevalence on July 1, 1993.
17.3.2. Incidence rate, July 1, 1993 to June 30, 1994.
17.3.3. Period prevalence, July 1, 1993 to June 30, 1994. [Epidemiologists use the term period prevalence for the sum (i.e., point prevalence plus incidence) of all encountered instances of disease.]
17.4. In patients receiving radiotherapy for lung cancer, the 6-month survival rate is found to be 32% at the West Haven VA Hospital, and 54% at Yale-New Haven Hospital. A newspaper reporter discovers this difference and has prepared an article
about the incompetent physicians working at the VA. The managing editor, after a few inquiries, discovers that the same physicians design and supervise radiotherapy at both institutions. The newspaper reporter then plans a major “exposure” of the unsatisfactory radiotherapy equipment purchased by the Veterans Administration, but the managing editor discourages the work after learning that the VA patients are transported by shuttle bus and usually receive their radiotherapy at Yale-New Haven Hospital. The reporter, who has been told to subside by the irritated managing editor, now alerts and urges the district congressman to conduct public hearings about the fatal fumes or other lethal ambiance of the shuttle bus. A day before the public hearings are due to begin, the congressman’s assistant, at a cocktail party, meets a quantitative clinical epidemiologist. The QCE person, when told about the impending excitement, makes a few statistically oriented remarks that persuade the congressman to call off the investigation immediately and to abandon further pursuit of the subject. What do you think might have been said?
17.5. After the original report9 of the case-control study mentioned in Section 17.5.4, a second case-control study35 was published with the following results:

Antecedent In-Utero     Cases of Clear-Cell
Exposure to DES         Vaginal Cancer          Control Group
Exposed                 5                       0
Non-Exposed             0                       8

17.5.1. What is the odds ratio for this table?
17.5.2. Although DES is now almost universally regarded as a transplacental carcinogen, and although millions of women who were exposed to DES in utero now live in cancerophobic terror waiting for the clear cells to strike, the two cited case-control studies are the only “controlled” epidemiologic evidence that supports the carcinogenic belief. In a subsequent large cohort study (the “DESAD” project36), no clear-cell cancers were found after about 25 years since birth in any of about 2000 adult women exposed in utero to DES or in a corresponding number of women in the non-exposed matched group, born at the same time as the exposed cohort.
Several cantankerous clinical epidemiologists37 (from New Haven) have disputed the “established wisdom,” contending that the case-control studies were poorly conducted and that a cause–effect relationship has not been proved. What do you think were some of the main arguments offered by the “heretics”?
17.5.3. Assuming that the DES/CCVC relationship is real, causal, and has odds ratios as large as those you have noted and/or calculated, why do you think no cases of clear-cell cancer were found in the exposed DESAD cohort?
FIGURE E.17.3
Occurrence patterns of disease in 6 cases. [Timelines for case numbers 1 to 6 between July 1, 1993 and June 30, 1994; symbols mark the date of onset of disease, the date of death or termination of disease, and R = date of recurrence of disease.]
17.6. In a case-control study, the odds ratio for development of endometrial cancer was found to be 4.3 for postmenopausal women taking replacement estrogen therapy. (The results have been disputed because of a problem in “detection bias,” but can be accepted as correct for this exercise.) To inform a patient of the risk of this cancer if she decides to take replacement therapy, you want to convert the odds ratio to a value of NNE. You can assume that the customary incidence of endometrial cancer is .001. What calculations would you do, and what result do you get for NNE?
17.7. In an immaculately conducted case-control study, the investigators found an odds ratio of 6 for the development of an adverse event after exposure to Agent X. You have been invited, as an expert consultant, to comment about establishing public policy for management of the problem revealed by this research. Assuming that the study was indeed “immaculate” (so that research architecture need not be further evaluated), you have been allowed to ask no more than four individual questions before you reach your judgment. What sequence of four questions would you choose, and why?
Part III
Evaluating Associations
All of the statistical strategies discussed so far were intended to evaluate one group or to compare two groups of data. When we now begin “comparing” two variables rather than two groups, the bad news is that the different types of variables will require diverse new arrangements and many new descriptive indexes for the results. The good news, however, is that only a few indexes are needed for most of the common descriptive activities; the additional arrangements involve no new forms of statistical inference, which is used with the same basic principles as before.
The arrangements of two variables are generally called associations, but the associations can have many different goals and structures. The aim might be to discern trend in the relationship of two different variables or to note concordance for two variables that describe the same entity. The trends can be an interdependent correlation or a dependent regression; the concordances can refer to conformity or to agreement.
The diverse associations that can be formed between two variables are discussed in the next few chapters. In more advanced activities, when the associations become multivariate, the statistical strategies are more complicated, but they should not be too hard to grasp if you clearly understand what happens for only two variables.
18
Principles of Associations
CONTENTS
18.1 Two-Group Contrasts
18.2 Distinguishing Features of Associations
18.2.1 Goals of Evaluation
18.2.2 Orientation of Relationship
18.2.3 Patterns Formed by Constituent Variables
18.3 Basic Mathematical Strategies for Associations
18.3.1 Basic Mathematical Principles
18.3.2 Choice of Principles
18.4 Concept and Strategy of Regression
18.4.1 Historical Background
18.4.2 Regression to the Mean
18.4.3 Straight-Line Models and Slopes
18.4.4 Reasons for Straight-Line Models
18.4.5 Disadvantages of Straight-Line Models
18.5 Alternative Categorical Strategies
18.5.1 Double Dichotomous Partitions
18.5.2 Ordinal Strategies
18.5.3 Choosing Summaries for Y
18.5.4 Comparison of Linear-Model and Categorical Strategies
18.5.5 Role of Categorical Examination
References
Exercises
Suppose we have measured weight and serum cholesterol in each member of Group A and Group B. The results could be summarized with the univariate and the two-group contrast indexes discussed in Parts I and II of the text. Indexes of location and dispersion could express the univariate values of weight and cholesterol separately in each group. Indexes of contrast could compare the weights and the cholesterol values in Group A vs. Group B. If we wanted to know, however, whether cholesterol tends to rise or fall with increasing levels of weight, we currently do not have a suitable method of expression. To summarize the trend in two variables, we need a new approach, using indexes of association.
Association is a heavy-duty word in statistics. Except for univariate results in a single group of data, all statistical arrangements can be regarded as associations of either two variables or more than two.
18.1 Two-Group Contrasts
Although not so designated, the two-group contrasts in Chapters 10 through 17 were really associations of two variables. One of them had a binary scale, identifying the two contrasted groups as A or B, exposed or nonexposed, treated or untreated. The second variable, which was the analytic focus for the two-group comparison, was a dimension (such as age), a binary attribute (such as success/failure or alive/dead), an ordinal grade, or a nominal category. When the results of the second variable were summarized with means, medians, standard deviations, or proportions, the group identity, cited in the first variable, became a subscript (such as A and B, or 1 and 2) in the symbolic expressions $\bar{X}_A$ and $\bar{X}_B$, or $p_1$ and $p_2$. The use of two variables is readily apparent if we consider the way the original data would have been coded. An X variable would have identified group membership as A or B (or in some other binary code), and a Y variable would have identified the “result,” which would then be summarized for two means as $\bar{Y}_A$ and $\bar{Y}_B$.
The arrangement of data is illustrated in the following layout for four variables that can each receive a two-group contrast. The first column identifies each person; the second column is the X variable, identifying group membership as A or B; and the remaining four columns show Y variables that can be binary, dimensional, ordinal, or nominal.
                                      Y Variable
Person   X Variable   Success   Age   Urine Sugar   Color of Eyes
1        B            0         28    Trace         Brown
2        A            1         34    1+            Blue
3        A            0         19    3+            Hazel
4        B            1         42    None          Brown
.        .            .         .     .             .
.        .            .         .     .             .
.        .            .         .     .             .
The two-group contrasts could compare Group A vs. Group B for proportions of success, mean (or median) age, or distributions of urine sugar or color of eyes.
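The coding described above can be sketched in a few lines of Python. The four records are the ones shown in the layout; the field names and the helper `summarize_by_group` are illustrative assumptions, not part of the text.

```python
from statistics import mean

# The four persons shown in the layout; "x" is the binary group variable.
rows = [
    {"person": 1, "x": "B", "success": 0, "age": 28},
    {"person": 2, "x": "A", "success": 1, "age": 34},
    {"person": 3, "x": "A", "success": 0, "age": 19},
    {"person": 4, "x": "B", "success": 1, "age": 42},
]


def summarize_by_group(rows, y_name):
    """Mean of one Y variable within each category of the X variable."""
    groups = {}
    for row in rows:
        groups.setdefault(row["x"], []).append(row[y_name])
    return {group: mean(values) for group, values in groups.items()}


# The mean of a 0/1 variable is the proportion of successes.
success_by_group = summarize_by_group(rows, "success")
age_by_group = summarize_by_group(rows, "age")
print(success_by_group)
print(age_by_group)
```

The X variable never appears in the summary values themselves; it survives only as the key (the "subscript") that labels each group's mean or proportion.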
In most instances, however, the idea of association is used for situations in which the X variable is dimensional, ordinal, or nominal rather than binary. According to the scale of the corresponding Y variable, the bivariate arrangement will then have a bi-dimensional, bi-ordinal, multi-group, or other format that cannot be summarized with a relatively simple index of contrast, and will require new statistical indexes.
18.2 Distinguishing Features of Associations
The indexes that describe associations are prepared for different goals, orientations, and patterns. The goals can be to evaluate trends or concordances; the orientation of variables can be dependent or nondependent; and the patterns can have the diverse formats produced when each variable is expressed in four possible types of scale. These basic distinctions are discussed in the next few sections before we turn to the corresponding statistical principles.
18.2.1 Goals of Evaluation
All of the associations to be discussed here and in the next three chapters refer to a relationship between two variables, but the bivariate relationships can be examined for diverse goals and cited in diverse indexes that reflect the types of variables under examination.
The goals can be aimed at describing concordances for the agreement between two similar variables or trends for gradients of change, contrast, or correlation between two different variables.
18.2.1.1 Concordances — The distinguishing feature of a concordance is that the two associated variables describe exactly the same substantive entity and are cited in exactly the same (i.e., “commensurate”) scales. Despite the similarity of scales and measurements, however, the two variables come from different sources, such as different observers, different systems of observation, or two sets of measurements by the same observer. The goal of the analyses is to summarize the agreement (or disagreement) between the two sets of measurements.
For example, we might check for inter-observer variability when two pathologists each give diagnostic “readings” to the same set of slides, for intra-observer variability when the same pathologist reads the
slides on a second occasion, for sensitivity and specificity when diagnostic marker tests are checked against the “gold standard” diagnoses, or for quality control when the same set of chemical specimens is measured at two different laboratories.
18.2.1.2 Trends — In an evaluation of trend, the two associated variables describe different entities, such as body weight and serum cholesterol. If each variable is ranked in a dimensional or ordinal scale, the goal is to determine the corresponding “movement” for the two sets of rankings. As one variable changes, the trend is shown by the gradient that occurs as the second variable rises, falls, or stays the same. Does serum cholesterol get lower as people get thinner? Does income go up as educational level increases? Is severity of pain related to the amount of fever? All of these questions are answered by examining trends in ranked variables that have dimensional or ordinal scales.
Unless each variable can be ranked, however, the trend cannot always be assessed as a pattern of movement. Because binary and nominal scales do not have successive ups or downs, the idea of rising or falling cannot be used to express trend in bivariate relationships between religion and occupation, height and choice of hospital, or level of pain and a nominal set of therapeutic agents A, B, C, and D. Nevertheless, even for non-ranked variables, gradients can often be identified in a second variable when changes occur in the first. These gradients will be further discussed in Section 18.2.2.3.
18.2.2 Orientation of Relationship
The two associated variables can be oriented in a dependent or nondependent direction. In a dependent relationship, one variable is believed to affect or influence the other. Thus, we may think that body weight affects serum cholesterol or that treatment produces a successful outcome. These orientations go in one direction because we are not likely to believe that a successful outcome influences the preceding treatment or (in most instances) that serum cholesterol influences weight.
A somewhat confusing jargon has been developed for describing the directional distinction. The variable that does the influencing can be called independent, predictive, or explanatory. The affected variable can be called the dependent, target, or outcome variable. In the usual graphic arrangement, shown in Figure 18.1, the independent variable is marked X, and labeled as the abscissa in a horizontal direction. The dependent variable is called Y, labeled as the ordinate, and placed in a vertical direction. (If you have trouble remembering which is which, a good mnemonic is that alphabetically X precedes Y; and abscissa precedes ordinate.)
Figure 18.2 shows the collection of data points for a dependent relationship of the dimensional variables, serum cholesterol and body weight.
Figure 18.3 shows an analogous set of data points for the dependent relationship of two binary variables, success and treatment. In this instance, success is coded as 0 if absent, and 1 if present; and treatment is coded 0 for A and 1 for B. The clusters of points at the graph locations of (0, 0), (0, 1), (1, 0), and (1, 1) correspond to the frequency counts that would appear in a 2 × 2 contingency table.
FIGURE 18.1
Graphic outline for data of an independent and dependent variable. [Vertical axis: Y (ordinate; dependent variable); horizontal axis: X (abscissa; independent variable).]
FIGURE 18.2
Relationship of serum cholesterol vs. body weight as dimensional variables. [Vertical axis: serum cholesterol; horizontal axis: body weight.]
FIGURE 18.3
Relationship of outcome (success/failure) vs. treatment (A/B) as binary variables. [Vertical axis: success (0, 1); horizontal axis: treatment A (0), treatment B (1).]
As discussed earlier (Section 9.3.1), the axis of contingency tables is usually shifted and rotated, so that the independent variable appears in the rows and the dependent (or outcome) variable is in the columns. Table 18.1 shows the customary tabular arrangement that would express the data in Figure 18.3.
TABLE 18.1
Tabular Arrangement of Data in Figure 18.3

                     Outcome
Treatment     Success     Failure     TOTAL
A                   9          11        20
B                  20           6        26
TOTAL              29          17        46
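The correspondence between the clusters of coded points and the cells of the 2 × 2 table can be checked with a short sketch. The counts are those of Table 18.1; the variable names are illustrative assumptions.

```python
from collections import Counter

# Table 18.1: treatment -> (successes, failures).
table_18_1 = {"A": (9, 11), "B": (20, 6)}
treatment_code = {"A": 0, "B": 1}  # coding used for the graph

# Expand the table into one (treatment, success) point per person.
points = []
for treatment, (successes, failures) in table_18_1.items():
    x = treatment_code[treatment]
    points += [(x, 1)] * successes + [(x, 0)] * failures

# Counting the points at each graph location recovers the table cells.
clusters = Counter(points)

# Proportion of success in each treatment group.
p_success = {t: s / (s + f) for t, (s, f) in table_18_1.items()}
print(clusters)
print(p_success)
```

The success proportions are 9/20 = .45 for Treatment A and 20/26 ≈ .77 for Treatment B, which is the two-group contrast that the table expresses.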
In a nondependent relationship, we simply see how the two variables go together, without necessarily implying that one of them is independent and the other dependent. A nondependent association might be examined for the relationship of hematocrit and hemoglobin, or for serum cholesterol and white blood count. Nondependent relationships are sometimes called interdependent if the two variables seem distinctly associated — such as hemoglobin and hematocrit — without having a specific dependent orientation.
In evaluations of concordance, the relationship is dependent if one of the variables is regarded as the “gold standard”; and the analysis is concerned with accuracy or conformity rather than mere agreement. Thus, indexes of sensitivity, specificity, and predictive accuracy all have a directional orientation, whereas other indexes of concordance, such as proportional agreement, do not.
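The difference between directional and nondirectional indexes of concordance can be sketched as follows. The four counts are hypothetical, and the function names are illustrative; the indexes themselves are standard.

```python
def proportional_agreement(both_pos, first_only, second_only, both_neg):
    """Proportion of cases in which the two ratings coincide."""
    total = both_pos + first_only + second_only + both_neg
    return (both_pos + both_neg) / total


def sensitivity(both_pos, standard_only):
    """Proportion of gold-standard positives that the test also calls positive."""
    return both_pos / (both_pos + standard_only)


def specificity(both_neg, test_only):
    """Proportion of gold-standard negatives that the test also calls negative."""
    return both_neg / (both_neg + test_only)


# Hypothetical cross-classification of 100 cases by a test and a gold standard.
both_pos, test_only, standard_only, both_neg = 40, 10, 5, 45

agreement = proportional_agreement(both_pos, test_only, standard_only, both_neg)
agreement_swapped = proportional_agreement(both_pos, standard_only, test_only, both_neg)

sens = sensitivity(both_pos, standard_only)
spec = specificity(both_neg, test_only)
# Reversing the roles (treating the test as the "standard") changes the result.
sens_swapped = sensitivity(both_pos, test_only)

print(agreement, agreement_swapped, sens, spec, sens_swapped)
```

With these counts, proportional agreement is .85 whichever variable is listed first, whereas sensitivity is 40/45 in one direction and 40/50 in the other.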
18.2.2.1 Trends in Ranked Variables — If both variables can be ranked, trends can easily be examined either nondependently or in a specific directional orientation.
18.2.2.2 Trends for Binary Variables — Although binary variables do not seem to have ranks, their 0/1 characteristics can readily be used for expressing trends. The two categories of an independent binary variable delineate two groups, such as A and B for treatment or men and women for sex. The binary, dimensional, or ordinal results of the dependent variable will then indicate the “trend” as the independent variable “moves” from one binary category to the other. These “trends” are the contrasts that were discussed throughout Chapters 10 to 17 when the independent variable identified Groups A and B, and the results were compared for such outcomes as the success rates, $p_A$ vs. $p_B$, for a binary dependent variable, and as the means, $\bar{X}_A$ vs. $\bar{X}_B$, for a dimensional variable, such as blood sugar.
For an independent ordinal variable, the results of a binary or dimensional dependent variable can be summarized and compared according to changes in rank of the independent variable. For example, suppose the dependent variable is the outcome state of being alive or dead at a particular point in time. Figure 18.4 shows this state for 5 people at four time intervals after zero time. Figure 18.5 is a “survival curve” that summarizes the results for these five people, showing binary proportions of survival at different ranked points in time.
If the ranks refer to severity of illness rather than time, the binary proportions of survival can be shown in a table called a prognostic stratification for the clinical staging system. Table 18.2 displays results for this type of arrangement.
FIGURE 18.4
Data for survival in a group of 5 persons. [Vertical axis: ALIVE / DEAD; horizontal axis: TIME INTERVALS (0 to 4).]
FIGURE 18.5
“Survival Curve” (survival proportions over time) for data in Figure 18.4. [Vertical axis: PROPORTION ALIVE (0 to 1); horizontal axis: TIME INTERVALS (0 to 4).]
TABLE 18.2
Prognostic Stratification Showing Relationship of 5-Year Survival to Clinical Stage of Disease

Variable X                      Variable Y
(Clinical Stage of Disease)     (5-Year Survival Proportions)
I                               16/20 (80%)
II                              23/50 (46%)
III                             21/70 (30%)
IV                              6/60 (10%)
TOTAL                           66/200 (33%)
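The gradient in Table 18.2 can be verified with a brief sketch that recomputes each stratum's survival proportion from the counts in the table. The variable names are illustrative assumptions.

```python
# Table 18.2: (clinical stage, 5-year survivors, persons in stage).
stages = [("I", 16, 20), ("II", 23, 50), ("III", 21, 70), ("IV", 6, 60)]

# Binary proportion of survival within each ordinal stage.
proportions = {stage: alive / n for stage, alive, n in stages}

# Overall proportion, ignoring the stratification.
total_alive = sum(alive for _, alive, _ in stages)
total_n = sum(n for _, _, n in stages)
overall = total_alive / total_n

# The trend is a consistent decline as the ordinal stage rises.
values = [proportions[stage] for stage, _, _ in stages]
declines_with_stage = all(a > b for a, b in zip(values, values[1:]))

for stage, value in proportions.items():
    print(f"Stage {stage}: {value:.0%}")
print(f"Total: {total_alive}/{total_n} ({overall:.0%})")
```

The overall proportion of 33% conceals a gradient that runs from 80% in Stage I to 10% in Stage IV.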
18.2.2.3 Special Arrangements for Nominal Variables — Because ordinal and dimensional variables can be ranked and binary variables can acquire magnitudes when summarized as proportions, the main problems in describing trend occur for nominal variables. A set of unranked categories, A, B, C, D,…, cannot be put into an ordered arrangement for discerning a specific trend.
If used as the dependent outcome event, a nominal variable is sometimes compressed into a binary variable, and the trend is shown with binary proportions. For example, consider a set of diseases — in heart, lungs, brain, liver, etc. — that can be the outcome associated with such independent variables as a binary sex, an ordinal social class, a dimensional age, or a nominal ethnic group. If the nominal dependent variable is dichotomized as cardiovascular disease vs. other, the binary proportions of cardiovascular disease could promptly be examined in relation to each of the independent variables.
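A minimal sketch of this compression, using hypothetical records, is shown below. The nominal site of disease is dichotomized as heart vs. other, and the binary proportions are then compared across the categories of a binary independent variable (sex). The data and the helper name `binary_proportions` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records: (sex, site of disease).
records = [
    ("M", "heart"), ("M", "lungs"), ("M", "heart"), ("M", "heart"),
    ("F", "brain"), ("F", "heart"), ("F", "liver"), ("F", "lungs"),
]


def binary_proportions(records, target):
    """Proportion of each group whose nominal category equals the target."""
    counts = defaultdict(lambda: [0, 0])  # group -> [target count, total]
    for group, category in records:
        counts[group][0] += int(category == target)
        counts[group][1] += 1
    return {group: hits / total for group, (hits, total) in counts.items()}


heart_by_sex = binary_proportions(records, "heart")
print(heart_by_sex)
```

Once the nominal outcome has been reduced to a proportion, it has a magnitude that can be contrasted between groups in the same way as any other binary result.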
If the independent variable is nominal, a similar type of compressed dichotomization would allow simple comparisons. For example, suppose the independent variable, religion, contains the four categories Christian, Hindu, Jewish, and Moslem. The dependent variables might be the dimensional weight, the ordinal stage of clinical severity, or the binary college graduate. The results of the dependent variables could be contrasted in simple arrangements such as Christian vs. All Others or in pairs of categories such as Jewish vs. Moslem.
Sometimes, however, the independent nominal variable is kept intact, and the results of the dependent variable are examined simultaneously in all three (or more) of the nominal categories. The indexes for this type of multicategorical arrangement are discussed in Chapter 27.