FIGURE 18.14
Best-fitting straight lines for the patterns shown in Figure 18.13.
failing to capture the large gradient, at the beginning of the X variable, that later reversed into another large gradient going in the opposite direction.
18.4.4 Reasons for Straight-Line Models
Looking at the obvious curves in the middle and right side of Figures 18.13 and 18.14, you may wonder why the data were fitted with a straight line. With alternative algebraic formats, a curve could have been created to fit each set of data almost perfectly. For example, the middle set of data could be fit exactly with a curve such as ln[Y/(1 − Y)] = a + bX. The right-hand set of data could be fit exactly with a curve such as Y = a − (X − b)². Why not use curves rather than straight lines?
The five main reasons for routine use of straight-line models are: (1) the curvilinear models are usually difficult to choose; (2) the straight-line model is standard and easy to interpret; (3) it produces indexes for both estimation and co-relation; (4) in bivariate data, the straight-line model can be used for both the dependent relationship of regression and the interdependent relationship of correlation; and (5) in bivariate data, the correlation coefficient is the standardized regression coefficient. These distinctions are discussed in the next five subsections.
18.4.4.1 Problems in Choosing Curves — The curvilinear patterns in the data of Figures 18.13 and 18.14 were reasonably obvious. If you are familiar with the shapes of different curves, you will have promptly recognized the S-shaped “logistic” pattern of the middle set of data and the inverted-U parabola of the right-hand set.
In many instances, however, the corresponding correct shape is not immediately apparent. For example,
consider the pattern of points in Figure 18.15. This set of points could be fit by many different curves. The pattern could be part of a descending exponential curve having the form Y = e^(a−bX), but could also be part of a polynomial expression such as Y = a + bX + cX² + dX³ + fX⁴, or a segment of a giant oscillating curve, such as a sine or cosine wave. From the limitless number of curves that might be fitted to these data, an appropriately programmed computer could readily find one that fits excellently, but we would not know whether the selected curve is correct for the true relationship between X and Y. The excellent fit might be an artifact in which the “best” fit may not be the right fit.
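This point can be sketched numerically. In the snippet below, the five data values are hypothetical, chosen only for illustration: a fourth-degree polynomial (five coefficients for five points) fits them “perfectly,” while the straight line leaves residual error, yet nothing guarantees that the polynomial reflects the true relationship.

```python
import numpy as np

# Five hypothetical points with a descending, curving pattern (values illustrative).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.0, 6.5, 4.8, 3.9, 3.5])

# A straight line leaves residual error...
line = np.polyfit(x, y, 1)
line_resid = np.sum((np.polyval(line, x) - y) ** 2)

# ...but a fourth-degree polynomial (5 coefficients for 5 points) fits "perfectly",
# even though nothing guarantees it reflects the true relationship.
quartic = np.polyfit(x, y, 4)
quartic_resid = np.sum((np.polyval(quartic, x) - y) ** 2)

print(line_resid > quartic_resid)  # True: the "best" fit is not necessarily the right fit
```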
A more substantial problem arises when the pattern of points does not have an obvious shape. Consider the set of data shown in Figure 18.16. This pattern looks more like the customary results of medical research than any of the illustrations in Figures 18.14 or 18.15. The data set of Figure 18.16 does not have an obvious rectilinear or curvilinear pattern. Because these data could be assigned to almost any selected pattern, the straight-line format has the advantage of being “standard.” It avoids the arbitrary choice of a candidate from one of the many possible curves.
18.4.4.2 Problems in Interpreting Curvilinear Results — If the goal is to evaluate trend, the result of a straight line is easy to interpret: the trend is the slope. With the line expressed as Y = a + bX,
its slope, b, is called the regression coefficient. It indicates that Y changes by the amount of b units for every unitary change in X. With curvilinear models, however, trends are much more difficult and
©2002 by Chapman & Hall/CRC
FIGURE 18.15
Pattern of points for which the appropriate curve has many possibilities.

FIGURE 18.16
Patterns of points for which no linear pattern seems immediately apparent.
sometimes impossible to determine. Suppose the curve is expressed as Y = a + bX − cX² − dX³ + eX⁴. The trend, i.e., the change in Y as X changes, could eventually be determined from this curve, but the process would not be easy. The trends would differ in different sectors of the X values, and a great deal of computation would be needed to determine the locations of the sectors and the magnitude of the trends.
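A brief sketch, with hypothetical coefficients (not from the text), shows how the trend of such a quartic differs from sector to sector:

```python
import numpy as np

# Hypothetical coefficients (not from the text) for Y = a + bX - cX^2 - dX^3 + eX^4.
a, b, c, d, e = 2.0, 3.0, 0.5, 0.1, 0.01
curve = np.poly1d([e, -d, -c, b, a])  # numpy orders powers from highest down
slope = curve.deriv()                 # dY/dX = b - 2cX - 3dX^2 + 4eX^3

# The trend differs in different sectors of the X values:
for x0 in (0.0, 5.0, 8.0):
    print(x0, slope(x0))  # positive near 0, negative in later sectors
```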
Consequently, a straight-line model has a second major advantage: The single regression coefficient for slope is easy to interpret as an index of trend for the co-relationship.
18.4.4.3The “Two-Fer” Bonus — All regression models, whether straight lines or curves, are mathematically constructed (as noted in Chapter 19) to produce a “best fit” for the estimated values of Y. While producing this estimate, however, the straight-line model has the “two-fer” advantage of offering a bonus: the regression coefficient, b, is an easy-to-interpret index of the co-relationship between Y and X.
18.4.4.4 Indexes for Both Correlation and Regression — For any two variables, X and Y, we can assume either that Y depends on X or that X depends on Y. For the first assumption, the regression line takes the form
Y = a + bX
For the second assumption, the regression line has different coefficients expressed as
X = a′ + b′Y
As noted later in Chapter 19, these two lines are seldom identical. They will usually have different values for a and a′ and for b and b′; and the trend that is manifested by the regression coefficient will differ if we regress Y on X or X on Y. As both regression lines are legitimate, what do we do if interested not in regression, but in correlation? Suppose we want to know the interdependent trend between the two variables, rather than the slope of Y on X or the slope of X on Y?
The answer to this question involves another act of mathematical elegance that will be demonstrated in Chapter 19. The correlation coefficient, r, which indicates interdependent trend, is obtained like a geometric mean, as the square root of the product of the two slopes, b and b′. In other words,
r = √[(b)(b′)]
Thus, the fourth advantage of the straight-line model is that the bivariate correlation coefficient is the geometric mean of the two possible regression coefficients.
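A quick numerical check of this relationship can be made on simulated data (the variables below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)   # simulated positively co-related data

b = np.polyfit(x, y, 1)[0]        # slope of Y regressed on X
b_prime = np.polyfit(y, x, 1)[0]  # slope of X regressed on Y

r_from_slopes = np.sqrt(b * b_prime)   # geometric mean of the two slopes
r_direct = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_from_slopes, r_direct))  # True (for a positive relationship)
```

For a negative relationship, both slopes are negative, so the square root yields the magnitude of r and the common sign of b and b′ supplies its sign.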
18.4.4.5 Correlation and Standardized Regression — The advantage just mentioned may not immediately seem impressive. To get the correlation coefficient symmetrically, as the square root of bb′, is a neat mathematical feat, but why should anyone be excited about it? The answer to this question, as demonstrated later in Chapter 19, is that the ordinary regression coefficient is affected by the units of measurement. In quantifying the dependent trend between Y and X, the regression coefficient will be larger or smaller according to whether X is measured in days, weeks, or years and Y in pounds or kilograms. In bivariate data, however, the correlation coefficient, as discussed throughout Section
18.3.1.4, is analogous to a standardized Z-score that is unaffected by units of measurement. The correlation coefficient, r, is a standardized regression coefficient, having exactly the same value whether Y is regressed on X or X is regressed on Y.
The fifth advantage of the bivariate straight-line model, therefore, is that the correlation coefficient can be used in a standardized way to compare the strengths of different co-relationships, regardless of the units in which the variables were measured.
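The contrast between the unit-dependent b and the unit-free r can be verified on simulated data (the weight/age variables and conversion below are illustrative assumptions, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
weight_kg = rng.normal(70, 10, size=100)
age_years = rng.normal(40, 12, size=100) + 0.5 * (weight_kg - 70)

# The regression coefficient depends on the units of measurement...
b_kg = np.polyfit(age_years, weight_kg, 1)[0]
b_lb = np.polyfit(age_years, weight_kg * 2.2046, 1)[0]   # pounds instead of kilograms

# ...but the correlation coefficient is unaffected by them.
r_kg = np.corrcoef(age_years, weight_kg)[0, 1]
r_lb = np.corrcoef(age_years, weight_kg * 2.2046)[0, 1]

print(b_lb / b_kg)   # the slope scales by the conversion factor, 2.2046
print(r_kg - r_lb)   # essentially zero
```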
18.4.5 Disadvantages of Straight-Line Models
In exchange for these many advantages, straight-line models have the flaws that have already been cited. The actual pattern of the data may be distorted by its transmogrification into a linear pattern, and the single, constant index of trend, although correct on average, may be wrong in many zones of the data.
The problem is not altered by using the correlation coefficient, r, rather than the regression coefficient, b, as an index of trend. Because the correlation coefficient seems to have been constructed as a product of standardized deviates for X and Y, without invoking a linear model such as Y = a + bX, we might expect the correlation coefficient to be unaffected by linearity in the data. That expectation is wrong, because the “linear” deviates of Xi − X̄ and Yi − Ȳ were used to form the products calculated as covariance. These products can give a distorted account of the co-relationship if X and Y do not have a rectilinear pattern. Besides, as noted later in Chapter 19, r will have the same flaws as b because it can be calculated simply as a “standardized” transformation of b.
The distortion of non-rectilinear dimensional patterns was shown earlier in Figures 18.13 and 18.14. The distortion of zones can be seen in the previous results of Table 18.2, where the survival proportions were 80% in Stage I, 46% in Stage II, 30% in Stage III, and 10% in Stage IV. Without any special mathematical concepts, the trend of this relationship is easily shown by the incremental gradient in survival rates from one stage (or zone) to the next. The overall survival gradient is 70% (= 80 − 10) between Stages I and IV, but the intermediate zonal gradients are distinctly unequal. They have a drop of 34% between Stages I and II, 16% between Stages II and III, and 20% from III to IV. These inequalities show that the trend is not the same in different zones of data, although the average gradient for the three increments would be about 70%/3 ≈ 23%. With certain techniques discussed later, a straight line could be fitted to these data. The slope of that line would show the average decline in survival proportions between Stages I and IV. The constant value of the slope, however, would not indicate the distinct differences in component gradients for the zones. The question about overall trend in the data would get a reasonable, respectable answer, but the answer would not reveal the striking differences in the inter-zonal trends.
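The zonal arithmetic can be retraced directly from the survival proportions of Table 18.2:

```python
# Survival proportions by stage, from Table 18.2.
survival = {"I": 0.80, "II": 0.46, "III": 0.30, "IV": 0.10}
values = list(survival.values())

zonal_gradients = [round(values[i] - values[i + 1], 2) for i in range(3)]
overall = round(values[0] - values[-1], 2)

print(zonal_gradients)        # [0.34, 0.16, 0.2] -- distinctly unequal
print(overall)                # 0.7 between Stages I and IV
print(round(overall / 3, 2))  # 0.23, the average gradient per increment
```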
A crucial basic issue for the investigator, therefore, is: What kind of trend do we want to know about? Is it the average overall trend or the distinctive differences that can occur in the constituent zones of the data? The answer to this question is pertinent not only when bivariate data are fitted with straight-line models but particularly in multivariable analysis, where everything may depend on the results achieved with “linear models.” According to what we really want to know, these models can be entirely correct statistically, while producing gross distortions scientifically.
18.5 Alternative Categorical Strategies
Despite the possible problems and distortions, the straight-line mathematical models have generally been quite successful. In fact, this approach was used (without being deliberately identified) when two central indexes were contrasted in Chapter 10. If we code X = 0 for all members of Group A and X = 1 for all members of Group B, the pattern of points for dimensional data is as shown in Figure 18.17. The means for these two groups could be expressed as ȲA for Group A and ȲB for Group B. The increment of ȲB − ȲA is really the slope of the line joining the mean values of Y as X moves from 0 (for Group A) to 1 (for Group B).
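A small sketch (with hypothetical group values) confirms that the fitted slope equals the increment in means under 0/1 coding:

```python
import numpy as np

# Hypothetical Y values for the two groups (illustrative only).
group_a = np.array([4.0, 5.5, 6.0, 5.0])
group_b = np.array([7.0, 8.5, 8.0, 9.0])

x = np.concatenate([np.zeros(group_a.size), np.ones(group_b.size)])
y = np.concatenate([group_a, group_b])

b, a = np.polyfit(x, y, 1)  # slope and intercept of the fitted line
print(np.isclose(b, group_b.mean() - group_a.mean()))  # True: slope = mean increment
print(np.isclose(a, group_a.mean()))                   # True: intercept = mean of Group A
```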
This process is easy to conceive and illustrate because we could arbitrarily set the values of X at 0 and 1 for the two groups. If X has its own dimensional values, however, the data might have the bi-dimensional pattern shown in Figure 18.16. Despite the many cited virtues of the straight-line model, the pattern of data in Figure 18.16 does not intuitively suggest that a straight line (or any other linear shape) should be fitted. What can be done instead?
The main alternative strategy is to abandon the idea of fitting the data with straight lines or any other arbitrary mathematical shape. Instead, the data are divided into categories (or zones), and the results are examined directly within those zones. Diverse mathematical tactics can be used during the examinations — but no attempt is made to fit the data into the overall “shape” of a mathematical model.
18.5.1 Double Dichotomous Partitions
In one approach, the bi-dimensional data of Figure 18.16 are converted into two groups. A dichotomous partition at the median value of X will produce two groups of equal size, as shown in Figure 18.18. The mean values of Y can then be determined in each group and compared as an increment.
In another possible approach, the data of Figure 18.16 become a “double dichotomy” when another partition is added at the median of the total set of Y values. This partition would divide Figure 18.18 into the four categories shown in Figure 18.19. The values above and below the median could then be called high and low, and binary proportions could be compared for the occurrence of high values in the two groups. With the median arbitrarily assigned here to the higher group, the data in Figure 18.19 would form the following 2 × 2 table.
                          Value of Y
Value of X      < Median    ≥ Median    Total    Proportion of “High” Values of Y
< Median            6           3         9                  .33
≥ Median            3           6         9                  .67
These results immediately show that Group B (with X values ≥ median) has a larger proportion of high values of Y than Group A.
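The double-dichotomy tabulation can be sketched on simulated data (the values and seed below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=18)
y = 0.8 * x + rng.normal(size=18)   # simulated bi-dimensional data

x_med, y_med = np.median(x), np.median(y)

# Dichotomous partition at the median of X gives two equal-sized groups;
# a second partition at the median of Y marks each value as "high" or "low".
low_x = y[x < x_med]
high_x = y[x >= x_med]
p_low = np.mean(low_x >= y_med)    # proportion of high Y values in Group A
p_high = np.mean(high_x >= y_med)  # proportion of high Y values in Group B

print(len(low_x), len(high_x))
print(p_low, p_high)
```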
FIGURE 18.17
Pattern of points and means for two-group contrast where X = 0 for Group A and X = 1 for Group B.
FIGURE 18.18
Median drawn through the X variable to divide the data of Figure 18.16 into two groups.
18.5.2 Ordinal Strategies
The main objection to a dichotomous split for the X variable is that the result shows a contrast rather than a distinctive trend. Whenever only two groups are compared, one set of values will almost always (on average) be higher or lower than the other, but the contrast does not convey a rising or falling sense of movement.
FIGURE 18.19
Medians drawn through the X and Y variables to divide the data of Figure 18.16 into four groups.

The trend in movement of the data can be explored, however, if the X variable is partitioned into more than two categories. For dimensional values of X, this partition can produce a set of three, four, five, or more ordinal categories. The trend can then be examined for summaries of the Y variable as a mean, median, or proportion in each of those categories.

To produce a definite central category, the X variable is often split into an odd number of zones. To avoid invidious choices of boundaries and to distribute the data into relatively equal-sized groups, the split is usually done with quantiles of X. The simplest partition divides X at its tertiles. As shown in Figure 18.20, a tertile split for the X variable of Figure 18.16 produces two boundaries and divides the abscissa into three categorical zones. The proportions of “high” Y values in those three categories of X are .33 (2/6), .50 (3/6), and .67 (4/6). Although the group sizes are small, the equal gradients are compatible with a “linear” relationship.

If relatively abundant, the data are often partitioned at the quintiles, which will divide X into five equal-sized ordinal groups. The trend can readily be discerned from the summary value of Y in each group as X “moves” from the lowest to highest category.

FIGURE 18.20
Tertile partition for X and median partition for Y in the data of Figure 18.16.

18.5.3 Choosing Summaries for Y

In choosing a summary value to show the trend of Y in each category, we could use a mean or median, but neither of these values would immediately indicate the stability of the data. With dimensional summaries, a second index, such as a standard deviation or ipr95, would be needed to help denote stability. Consequently, although many data analysts will check the trend in Y by examining medians or means, many other analysts prefer to inspect the binary proportions in each zone, expressed as p = t/n, where t is the count in the “high” or other category demarcated for the binary split of n members. The single value of p immediately shows the magnitude of the binary proportion, and the constituent elements, t/n, will immediately indicate the stability. With the latter approach, the Y variable would be split at its median for the total data (as shown in Figure 18.20), and each of the X groups would be summarized, as in Section 18.5.2, for the proportion of values that lie either above or below the median.

18.5.4 Comparison of Linear-Model and Categorical Strategies

For another set of bi-dimensional data, shown in Figure 18.21, a straight line could be fitted (by methods to be discussed in Chapter 19) as Ŷi = 6.39 + 0.118Xi. An alternative categorical approach, with quintile partitions for X and a binary split for Y, is shown in Figure 18.22. Using I, II, …, V for the five zones of X, the binary proportions for high values of Y in each zone are I: .20 (1/5); II: .33 (2/6); III: .50 (3/6); IV: .60 (3/5); and V: .80 (4/5). The numbers are small, and all of the proportions are unstable. Nevertheless, the upward trend in the data seems quite clear.

The overall gradient across the five quintiles is .80 − .20 = .60, which produces an average gradient of .60/4 = .15 for each of the four changes in the 5 categories of X. Because X ranges from 0 to 50, the
average span in each zone is 10 units. The average gradient in each zone is thus .15/10 = .015. When Y changes from low to high values, the mean of the 14 low (i.e., ≤10) values of Y is 6.07, and the mean of the 13 high values is 12.92. Thus, the average change from a low to high value is 6.85 units (= 12.92 − 6.07) in Y. Accordingly, the average change in Y is (6.85)(.015) = .103 per unit change in X. This average value, obtained with only a crude binary-quintile split, is reasonably close to the coefficient of 0.118 obtained when the average slope was calculated with the regression-model equation.

FIGURE 18.21
Set of bi-dimensional data.

FIGURE 18.22
Quintile partition for X and median partition for Y in the data of Figure 18.21.
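The binary-quintile arithmetic of this comparison can be retraced step by step:

```python
# Retracing the binary-quintile arithmetic for Figure 18.22.
proportions = {"I": 1/5, "II": 2/6, "III": 3/6, "IV": 3/5, "V": 4/5}

overall_gradient = proportions["V"] - proportions["I"]  # .80 - .20 = .60
per_category = overall_gradient / 4                     # .15 per change of category
per_unit_x = per_category / 10                          # .015 (each zone spans ~10 X units)
mean_change_y = 12.92 - 6.07                            # 6.85 units from "low" to "high" Y

estimate = mean_change_y * per_unit_x
print(round(estimate, 3))  # 0.103, vs. the regression coefficient of 0.118
```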
18.5.5 Role of Categorical Examination
You can decide for yourself whether you prefer to use a line or a series of central indexes to discern the trend of data in such Figures as 18.16 or 18.21. In this instance, both approaches lead to essentially the same conclusion: a generally rising trend. (The denominators are too small in each of the five zones to draw any firm conclusions about major changes in trend between adjacent zones.)
The alternative categorical strategies often play a role analogous to that of a history and physical examination in clinical practice, but they can also be the “gold standard” for showing exactly what is happening. They allow you to inspect the data directly, without being constrained by whatever emerges from application of a straight-line or other arbitrary algebraic model. The alternative categorical strategy may sometimes be just as arbitrary as a straight line, but the arbitrariness is chosen by you rather than by mathematical doctrines. One of the magnificent intellectual contributions of modern electronic computation is not the capacity for complex calculation, but the ability to let you easily explore your own data. With these explorations, you can do commonsense “clinical” evaluations that might otherwise be lost amid the mathematical elegance and computational splendor.
This type of exploration can promptly reveal the distortions that may occur in relatively simple bivariate analyses. For example, if you did not plot a graph of points for the collections of data in the middle and right sides of Figure 18.13, you would not know about the curving patterns. If you then examined only the value of b for the slope of the regression lines, you would totally miss the changing trends in different zones. A categorical examination of trends in tertile or quintile splits of the X variables, however, would promptly indicate the distinctions that were missed by the straight-line model.
A further advantage of categorical evaluations occurs, as discussed elsewhere,4 in multivariable analyses. The algebraic models rely on a series of mathematical assumptions about straight-line or other shapes for the data, but the basic assumptions are often difficult to check or confirm. If you really want to know about trends in different zones of the data, the exploration of categorical groups becomes the “gold standard” for determining whether a mathematical model has distorted the results.
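A sketch with a hypothetical inverted-U pattern (like the right-hand data of Figure 18.13) shows how a tertile examination catches what the regression coefficient misses:

```python
import numpy as np

# Hypothetical inverted-U pattern, like the right-hand data of Figure 18.13.
x = np.linspace(0, 10, 30)
y = -(x - 5) ** 2 + 25

# The fitted straight line is essentially flat: b is close to zero,
# hiding the strong curvilinear relationship.
b = np.polyfit(x, y, 1)[0]
print(abs(b) < 1e-6)

# A tertile split of X immediately reveals the rise and fall.
t1, t2 = np.percentile(x, [100 / 3, 200 / 3])
means = [y[x < t1].mean(),
         y[(x >= t1) & (x < t2)].mean(),
         y[x >= t2].mean()]
print([round(m, 2) for m in means])  # the middle zone is distinctly highest
```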
References
1. Galton, 1885.
2. Davis, 1976.
3. Healy, 1978.
4. Feinstein, 1996.
Exercises
Because this chapter is intended mainly to describe background strategy and ideas, the exercises do not involve many statistical operations or calculations.
18.1. Here are two opportunities to improve your skill in the “physical examination” of graphs.
18.1.1. Figure E.18.1.1 shows a straight line drawn for the points of two variables. Without knowing how the line was determined and without trying to verify its coefficients, what is one simple thing you can do mentally, without any formal calculations or specific manipulations, to determine that the line is drawn correctly on the graph?
18.1.2. The intercept seems to be wrong for the straight line in Fig. E.18.1.2, because the line does not meet the Y-axis at a value of −0.91. What feature of the graph suggests that the cited value is correct? How could you check it? Does the slope seem correct? Does the line seem to fit properly?
18.2. In any published literature at your disposal, find a paper where the authors used a straight-line pattern
to summarize the trend in a set of bivariate dimensional data. The points of data should have been shown in a graph, together with an equation for the corresponding line. Although you do not yet know how that line was formulated, submit a replicate copy of the graph (indicating its publication source) and describe your comfort or discomfort in accepting the line as a “summary” of the bivariate relationship.
After describing your response, make “guesstimates” or other arrangements of the data points to prepare an alternative analysis that might help confirm or refute your reaction. You get extra credit if you find a graph that left you uncomfortable, give good reasons for the discomfort, and prove your point with the alternative analysis.
18.3. The most common associations that appear in medical literature involve either bi-dimensional data or the doubly dichotomous “contrast” shown in a 2 × 2 table. Examining the tables or graphs of any published papers that are conveniently available, find two examples of associations that are not either bi-dimensional or “double dichotomies.” Show and label the skeleton structure of categories and/or dimensions (not the actual data) in the table or graph that connects the two variables.
FIGURE E.18.1.1
Relationship between Variable Y and Variable X in 11 persons. A significant correlation was found (y = 1.07x + 0.02, r = 0.79, 2P < 0.01). The graph’s internal label reads y = 1.07x + 1.6, r = 0.79, p < 0.01; Variable Y is plotted from 0 to 100 against Variable X from 0 to 80.
FIGURE E.18.1.2
Data and regression analysis for relationship of Variable W vs. Variable V. Variable W was appraised as a logarithm because of an “uneven distribution.” The ordinate, ln(Variable W), runs from −1 to 2; the graph’s internal label reads y = 0.04x − 0.91, r = 0.55, p < 0.001.
19
Evaluating Trends
CONTENTS
19.1 Linear Model of Regression
19.1.1 Regression Coefficient
19.1.2 The “Other” Regression Line
19.1.3 Correlation and Standardized Regression
19.1.4 Alternative Approaches
19.2 Accomplishments in Estimation
19.2.1 Components of Estimations
19.2.2 Sum of Partitioned Variance
19.2.3 r² and Proportionate Reduction of Variance
19.2.4 Interpretation of r²
19.2.5 F Ratio of Mean Variances
19.3 Indications of Trend
19.3.1 Magnitude of b
19.3.2 Standardized Regression Coefficient
19.3.3 “Quantitative Significance” for r
19.4 Stochastic Procedures
19.4.1 P Values
19.4.2 Confidence Intervals
19.5 Pragmatic Applications and Problems
19.5.1 “Screening” Tests for Trend
19.5.2 Predictive Equations
19.5.3 Other Applications
19.6 Common Abuses
19.6.1 Stochastic Distortions
19.6.2 Checks for Linear Nonconformity of Quantified Trends
19.6.3 Potential Influence of Outliers
19.6.4 Causal Implications
19.6.5 Additional “Sins”
19.7 Other Trends
References
Appendix: Additional Formulas and Proofs for Assertions
A.19.1 Sr = Σ(Yi − Ŷi)² is a minimum when b = Sxy/Sxx
A.19.2 a = Ȳ − bX̄
A.19.3 Standardized Regression Equation is (Yi − Ȳ)/sy = r(Xi − X̄)/sx
A.19.4 Degrees of Freedom for Ŷi
A.19.5 Parametric Variances of Regression Line
A.19.5.1 Variance of σy·x for Yi − Ŷi
A.19.5.2 Variance of the Slope β
A.19.6 Critical Ratio for t or Z Test on b or r
A.19.7 Confidence Intervals for r and b
A.19.8 Confidence Interval for Individual Points
A.19.9 Confidence Interval for Intercept
A.19.10 Comparison of Two Regression Lines
A.19.11 Transformations of r
Exercises
Like any set of strategic principles, the estimations and covariations discussed in Chapter 18 need operational methods to obtain specific results and to evaluate the accomplishments. This chapter is devoted to the most commonly used “classical” methods that produce regression and correlation analyses for bi-dimensional data.
The first four main sections of the chapter contain many statistical concepts and indexes that “set the stage” before the pragmatic “show” of applications and abuses arrives in Sections 5 and 6. The lengthy discourse is justified because everything previously discussed in the text (for one and two groups) and almost everything that follows (for more complex statistical activities later and even for multivariable analyses) can be regarded as extensions of the basic principles used for correlation and regression.
19.1 Linear Model of Regression
Anyone who has seen straight-line graphs for expressing the data of physics or chemistry will be immediately familiar with using the slope of the line as an index of a targeted relationship. If Y depends on X, and if the straight line is expressed as Y = a + bX, the slope of the line is b. If Y tends to rise as X rises, the line will slope upward and b will be positive; if Y tends to fall as X rises, the line will slope downward and b will be negative. The steeper the slope, the greater will be the absolute magnitude of b, and the stronger the relationship. If the line is essentially horizontal, showing no substantial upward or downward slope, b will be close to zero, suggesting that Y has little or no relationship to X.
When this same principle is used in bivariate analyses, we first make the assumption that X and Y have a rectilinear relationship, and we then find the slope of the straight line that best fits the data. There will actually be two possible lines: one in which Y depends on X, and the other in which X depends on Y.
19.1.1 Regression Coefficient
If Y depends on X, the line expressed as

Ŷi = a + bXi

represents the regression of Y on X. The Ŷi estimates the value of Y that lies on the line for each observed value of Xi. The Yi symbol, without the hat, represents the corresponding observed value in the data.
As shown later (in Appendix A.19.2) the value of the intercept, a, is calculated as Ȳ − bX̄. Accordingly, the line is regularly written in the form of deviations as

Ŷi − Ȳ = b(Xi − X̄)        [19.1]

When Xi = X̄, Ŷi = Ȳ. Therefore, the line always passes through the mean of the two variables. When Xi = 0, Ŷi = Ȳ − bX̄ = a, and so the line intersects the y-axis at the intercept value of a. The slope of the line, b, is called the regression coefficient.
19.1.1.1 Method of Calculation — Of the various approaches that might be used for determining the appropriate values of a and b for the data, the traditional and still most popular strategy is the “principle of least squares.” According to this principle, b is chosen to give the smallest possible (i.e., the “least”) value to the quantity Σ(Yi − Ŷi)². Each Yi − Ŷi value represents the residual deviation between the observed Yi and the estimated Ŷi; and the goal is to minimize the sum of squares for the residual deviations. With the symbol r (for regression or residual), this sum is Sr = Σ(Yi − Ŷi)².
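As a sketch, the least-squares formulas can be checked numerically on simulated data (the symbols follow Appendix A.19.1 and A.19.2):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 1.5 + 0.7 * x + rng.normal(scale=0.5, size=50)

# Least-squares estimates from the deviation sums:
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
Sxx = np.sum((x - x.mean()) ** 2)
b = Sxy / Sxx                 # slope: b = Sxy/Sxx (A.19.1)
a = y.mean() - b * x.mean()   # intercept: a = Ybar - b*Xbar (A.19.2)

# They agree with numpy's own least-squares fit:
b_np, a_np = np.polyfit(x, y, 1)
print(np.isclose(b, b_np), np.isclose(a, a_np))
```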