
MULTIPLE INDICATOR MODELS

(Squaring a correlation of .2 between income and life satisfaction produces an r2 of .04—i.e., income explains 4 percent of the variance in life satisfaction.) In sum, given a correlation of .2, we can predict life satisfaction better by knowing someone’s income than if we did not have this information (we reduce our errors by 4 percent), but we will still make a lot of errors in our prediction (96 percent of the variance in life satisfaction remains unexplained).

These errors in prediction stem in part from the less-than-perfect measures of income and life satisfaction being analyzed (a topic covered in the next section). However, they also occur because there are many other causes of life satisfaction (e.g., physical health) in addition to income. The more of these additional causes there are, and the stronger their effects (i.e., the stronger their correlation with life satisfaction), the weaker the ability of a single construct such as income to predict life satisfaction. The same principles apply, of course, to any construct used to predict other constructs (e.g., using people’s level of stress to predict the amount of aggression they will display).

Correlation coefficients and ‘‘path coefficients’’ are part of the ‘‘language’’ of causal modeling, including multiple indicator models. Like a correlation coefficient, a path coefficient describes the strength of the relationship between two variables. One can interpret a (standardized) path coefficient in a manner roughly similar to a correlation coefficient. Readers will increase their understanding of the material to follow if they familiarize themselves with these measures of strength of association. (For more information on interpreting correlation and path coefficients, see Blalock [1979]; Kline [1998].)

RELIABILITY AND VALIDITY OF MEASURES

As noted earlier, measurement errors can bias estimates of the true causal associations between constructs of interest to sociologists and other researchers. Accordingly, it is important to use measures of high quality. Specialists in the field of measurement (often labeled ‘‘psychometricians’’) describe high-quality measures as having strong reliability and validity (Nunnally and Bernstein 1994). Reliability concerns the consistency with which an indicator measures a given construct; validity assesses whether one is measuring what one intends to measure or something else.

Reliability. A common method of assessing reliability is to determine the strength of the correlation (consistency) between alternative (multiple) indicators of the same construct—for example, the correlation between two different measures of life satisfaction. A correlation of 1.0 would suggest that both indicators are perfectly reliable measures of life satisfaction. Conversely, a correlation of, say, .3 would suggest that the two indicators are not very reliable measures of life satisfaction.

Given the subjective nature of indicators of life satisfaction (and many other constructs found in the social and behavioral sciences), we should not be surprised to find fairly low correlations (consistency) among their many possible multiple indicators. The ambiguity inherent in agreeing or disagreeing with statements like ‘‘I am satisfied with my life,’’ ‘‘The conditions of my life are excellent,’’ and ‘‘In most ways my life is close to my ideal’’ should introduce considerable measurement error. Furthermore, we might anticipate that much of this measurement error would be random. That is, relative to respondents’ actual (‘‘true’’) scores for life satisfaction, the answers (observed scores) they provide to subjective questions regarding life satisfaction are likely to be nonsystematic. For example, depending on the degree of ambiguity in the multiple indicators for a subjective construct like life satisfaction, a person is likely to display a random pattern of giving too high and too low scores relative to the person’s true score across the set of measures.

This ‘‘noise’’—unreliability due to random measurement error—will reduce the correlation of a given indicator with another indicator of the same construct. Indeed, ‘‘pure’’ noise (e.g., completely random responses to questions concerning respondents’ life satisfaction) should not correlate with anything (i.e., r = 0). To the extent that researchers can reduce random noise in the indicators (e.g., by wording self-report measures of a construct as clearly as possible), the reliability and corresponding correlations among multiple indicators should increase. Even where researchers are careful, however, to select the best available indicators of constructs that represent subjective states (like life satisfaction), correlations between indicators frequently do not exceed r’s of .3 to .5.
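A small simulation can make this attenuation concrete. The sketch below is a hypothetical illustration (Python with NumPy; the noise level and sample size are arbitrary assumptions, not values from the text): two indicators built from the same ‘‘true’’ life satisfaction scores plus independent random error end up correlating in the .3 to .5 range, even though they measure the identical construct.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical respondents

# Latent ("true") life satisfaction scores, standardized.
true_ls = rng.normal(0.0, 1.0, n)

# Two indicators of the same construct: true score plus independent
# random noise (noise SD of 1.2 is an arbitrary illustrative choice).
ls1 = true_ls + rng.normal(0.0, 1.2, n)
ls2 = true_ls + rng.normal(0.0, 1.2, n)

r = np.corrcoef(ls1, ls2)[0, 1]
print(round(r, 2))  # roughly .4, far below 1.0, despite a common cause
```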

Not only does less-than-perfect reliability reduce correlations among multiple indicators of a given construct, but, more importantly, this random measurement error also reduces the degree to which indicators for one latent construct correlate with the indicators for another latent construct. That is, unreliable measures (such as each of the multiple indicators of life satisfaction) will underestimate the true causal linkages between constructs of interest (e.g., the effect of income on life satisfaction). These biased estimates can, of course, have adverse consequences for advancing our scientific knowledge (e.g., perceiving income as a less important source of life satisfaction than it might actually be). (Although unreliable measures will always underestimate the true relationship between two constructs, the bias of unreliable measures is more complex in ‘‘multivariate’’ situations where important control variables may exhibit as much unreliability as [or more unreliability than] do the predictor and outcome variables of interest.)
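The size of this attenuation bias can be illustrated with the classic psychometric correction-for-attenuation formula, which divides the observed correlation by the square root of the product of the two measures’ reliabilities. A minimal sketch (plain Python; the reliability values are illustrative assumptions, chosen to match the figures used later in this article):

```python
import math

def disattenuate(r_observed, rel_x, rel_y):
    # Correction for attenuation: the latent correlation equals the
    # observed correlation divided by the square root of the product
    # of the two measures' reliabilities.
    return r_observed / math.sqrt(rel_x * rel_y)

# Observed income/life-satisfaction correlation of .2; income treated
# as perfectly reliable, life satisfaction as quite unreliable (.25,
# the square of a .5 loading -- illustrative assumptions).
print(round(disattenuate(0.20, 1.0, 0.25), 2))  # 0.4
```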

Psychometricians have long been aware of this problem of ‘‘attenuated correlations’’ from unreliable measures. In response, traditional practice is to combine each of the multiple indicators for a given construct into a single composite scale (e.g., sum a person’s score across each life satisfaction indicator). The random errors contained in each individual indicator tend to ‘‘cancel each other out’’ in the composite scale (cf. Nunnally and Bernstein 1994), and the overall reliability (typically measured with Cronbach’s alpha, on a scale ranging from 0 to 1.0) can improve substantially relative to the reliability of individual items within the scale. Although composite scales are a definite step in the right direction, they are still less than perfectly reliable, and often much less. Consequently, researchers are still faced with the problem of biased estimates of the causal linkages among constructs of interest.
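For readers who want to see the computation, here is a minimal sketch of Cronbach’s alpha (Python with NumPy; the five respondents and their item scores are invented for illustration):

```python
import numpy as np

def cronbach_alpha(items):
    # items: (n_respondents, k_items) array of item scores.
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_vars = items.var(axis=0, ddof=1).sum()  # item variances
    total_var = items.sum(axis=1).var(ddof=1)        # composite variance
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Invented responses of five people to three life satisfaction items
# scored 1-5 ("strongly disagree" to "strongly agree").
data = [[4, 5, 4],
        [2, 2, 3],
        [5, 4, 5],
        [3, 3, 2],
        [1, 2, 1]]
print(round(cronbach_alpha(data), 2))  # about .93 for these made-up data
```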

Validity. Unreliable indicators are not the only source of measurement error that can bias estimates of causal linkages among constructs. Invalid indicators can also create biased estimates. As we shall see in subsequent sections, bias from invalid measures stems from different sources and is more complex and difficult to detect and control than bias from unreliable measures.

Valid measures require at least modest reliability (i.e., correlations among indicators of a given construct cannot be r = 0); but reliable measures are not necessarily valid measures. One can have multiple indicators that are moderately to highly reliable (e.g., r’s = .5 to .8), but they may not measure the construct they are supposed to measure (i.e., may not be valid). For example, life satisfaction indicators may display at least moderate reliability, but no one would claim that they are valid measures of, say, a person’s physical health.

This example helps clarify some differences between reliability and validity, but at the risk of obscuring the difficulty that researchers typically encounter in establishing valid measures of many latent constructs. Continuing with our example, researchers may select multiple indicators of life satisfaction that critics could never plausibly argue actually measure physical health. Critics might make a very plausible argument, however, that some or all of the indicators of life satisfaction also measure a more closely related concept—such as ‘‘optimism.’’

Note, too, that if the life satisfaction indicators do, in fact, also measure optimism, then the correlation that income has with the life satisfaction indicators could stem entirely from income’s causal links with an optimistic personality, rather than from income’s effect on life satisfaction itself. In other words, in this hypothetical situation, invalid (‘‘contaminated’’) measures of life satisfaction could lead to overestimating income’s effect on life satisfaction (though, as we will see in later sections, one can construct examples where invalid measures of life satisfaction could also lead to underestimating income’s effect).

Given the many subjective, loosely defined constructs that form the core concepts of sociology and other social and behavioral sciences, the issue of what the indicators are actually measuring (i.e., their validity) is a common and often serious problem. Clearly, our scientific knowledge is not advanced where researchers claim a relationship between constructs using invalid measures of one or more of the constructs.

The sections below on single-indicator and multiple-indicator models will elaborate on the bias introduced by measurement error stemming from unreliability and invalidity, and on how to use the multiple indicators and ‘‘path analysis’’ of structural equation modeling (SEM) to test and correct for the bias. This discussion continues to use the example of estimating the effect of income on life satisfaction in the face of measurement error.

SINGLE-INDICATOR MODELS

Figure 1 depicts the hypothesized causal link (solid arrow labeled ‘‘x’’) between the latent (unobservable) constructs of income and life satisfaction (represented with circles), and the hypothesized causal link (solid arrows labeled ‘‘a’’ and ‘‘b’’) between each latent construct (circle) and its respective empirical (observed) indicator (box). The D (disturbances) in Figure 1 represents all potential causes of life satisfaction that the researcher has not included in the causal model (stressful life events, personality characteristics, family relationships, etc.). The E’s in Figure 1 represent random measurement error (and any unspecified latent constructs that have a ‘‘unique’’ effect on a given indicator).

Because each latent construct (circled ‘‘I’’ and ‘‘LS’’) in Figure 1 has only one indicator (boxed ‘‘I1’’ or ‘‘LS1’’) to measure the respective construct, researchers describe the diagram as a causal model with single (as opposed to multiple) indicators. Additionally, Figure 1 displays a dashed, double-headed arrow (labeled ‘‘r1’’) between the box for income and the box for life satisfaction. This dashed, double-headed arrow represents the empirical or observed correlation between the empirical indicators of income and life satisfaction. Following the logic diagramed in Figure 1, this observed correlation is the product of actual income affecting both measured income through path a, and actual life satisfaction through path x. In turn, actual life satisfaction affects measured life satisfaction via path b. Stated as a more formal ‘‘path equation,’’ a*x*b = r1.

Note that researchers can never directly observe the true causal effect (i.e., path x) of actual income (I) on actual life satisfaction (LS). Researchers can only infer such a relationship based on the observed correlation (r1) between the empirical indicators for income and life satisfaction—I1 and LS1. In other words, social scientists use an observed correlation—r1—to estimate an unobservable true causal effect—path x.

Notice also that in the presence of random measurement error, the observed correlation r1 will always be an underestimate of the unobservable path x representing the hypothesized true effect of income on life satisfaction. Of course, researchers hope that r1 will equal path x. But r1 will only equal x if our empirical measures of income and life satisfaction (I1 and LS1) are perfectly reliable—that is, have no random measurement error.

The phrase ‘‘completely reliable measures’’ implies that each person’s observed scores on the income and life satisfaction indicators reflect exactly that person’s actual or ‘‘true’’ scores for income and life satisfaction. If the indicators for income and life satisfaction are indeed perfect measures, then researchers can attach a (standardized) path coefficient of 1.0 to each path (i.e., a and b) between the latent constructs and their indicators. Likewise, researchers can attach a path coefficient of 0 to each path (i.e., d and e) representing the effects of random measurement errors (E1 and E2) on the respective indicators for income and life satisfaction.

The path coefficient of 1.0 for a and b signifies a perfect relationship between the latent construct (i.e., true score) and the measure or indicator for the latent construct (i.e., recorded score). In other words, there is no ‘‘slippage’’ between the actual amount of income or life satisfaction people have and the amount of income or life satisfaction that a researcher records for each person (i.e., there is no random measurement error). Therefore, people who truly have the highest income will report the most income, those who truly have the lowest income will report the lowest income, and so on. Likewise, people who, in fact, have the most life satisfaction will always record a life satisfaction score (e.g., ‘‘5’’) higher than those a little less satisfied (e.g., ‘‘4’’), individuals a little less satisfied will always record a life satisfaction score higher than those a little bit less satisfied yet (e.g., ‘‘3’’); and so on.

Under the assumption that the measures of income and life satisfaction are perfectly reliable, social scientists can use the observed correlation (r1) between the indicators I1 and LS1 to estimate the true causal effect (path x) of actual income (I) on actual life satisfaction (LS). Specifically, r1 = a*x*b; hence, r1 = 1.0*x*1.0, or r1 = x.


[Figure 1. Single-Indicator Model for Estimating the Effect of Income on Life Satisfaction. The diagram shows the latent constructs Actual Income (I) and Actual Life Satisfaction (LS) linked by the true effect, path x. Path a connects I to its indicator I1, Measured Income (response to ‘‘How much money do you earn yearly?’’); path b connects LS to its indicator LS1, Measured Life Satisfaction (response to ‘‘I am satisfied with my life.’’). D represents unmeasured causal variables affecting LS; E1 and E2 represent random measurement error affecting I1 and LS1 through paths d and e; r1 is the observed correlation between I1 and LS1.]

Thus, if the observed correlation between income and life satisfaction (i.e., r1) is, say, .2, then the true (unobservable) causal effect of income on life satisfaction (i.e., path x) would also be .2. (For more detailed explanations of how to interpret and calculate path coefficients, see Sullivan and Feldman [1979]; Loehlin [1998].)

Of course, even if researchers were to measure income and life satisfaction with perfect reliability (i.e., paths a and b each equal 1.0), there are other possible errors (‘‘misspecifications’’) in the model shown in Figure 1 that could bias researchers’ estimates of how strong an effect income truly has on life satisfaction. That is, Figure 1 does not depict other possible ‘‘misspecifications’’ in the model such as ‘‘reverse causal order’’ (e.g., amount of life satisfaction determines a person’s income) or ‘‘spuriousness’’ (e.g., education determines both income and life satisfaction and hence only makes it appear that income causes life satisfaction). (For more details on these additional sources of potential misspecification, see also Blalock [1979].)

How realistic is the assumption of perfect measurement in single-indicator causal models? The answer is, ‘‘It depends.’’ For example, to assume a path coefficient of 1.0 for path a in Figure 1 would not be too unrealistic (the actual coefficient is likely a little less than 1.0—say, .90 or .95). That is, we would not expect many errors in measuring a person’s true income. Likewise, we would expect few measurement errors in recording, say, a person’s age, sex, or race. As noted earlier, however, the measurement of subjective states (including satisfaction with life) is likely to occur with considerable error. Therefore, researchers would likely underestimate the true causal link between income and life satisfaction if they assumed no random measurement error for the life satisfaction indicator—that is, if they assumed a coefficient of 1.0 for path b in Figure 1.

How badly researchers underestimate the true causal effect (i.e., path x) would depend, of course, on how much less than 1.0 was the value for path b. For the sake of illustration, assume that path b equals .5. Assume also that income is perfectly measured (i.e., path a = 1.0) and the observed correlation (r1) between the indicators for income and life satisfaction is .2. Under these conditions, 1.0*x*.5 = .2, and x = .4. In other words, researchers who report the observed correlation between income and life satisfaction (r1 = .2) would substantially underestimate the strength of the true effect (x = .4) that income has on life satisfaction. Recall that an r of .2 represents an r2 of .04, or 4 percent of the variance in life satisfaction explained by income, whereas an r of .4 represents an r2 of .16, or 16 percent of the variance explained by income. Accounting for 16 percent of the variance in life satisfaction represents a much stronger effect for income than does 4 percent explained variance. Based on this hypothetical example, income goes from a weak to a relatively strong predictor of life satisfaction, if we have the appropriate information to allow us to correct for the bias of the unreliable measure of life satisfaction.
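The correction described in this paragraph amounts to solving the path equation r1 = a*x*b for x. A sketch in plain Python, using the hypothetical values from the text:

```python
# Figure 1 path equation: r1 = a*x*b, so x = r1 / (a*b).
a, b = 1.0, 0.5   # measurement paths (b = .5: unreliable LS indicator)
r1 = 0.20         # observed correlation

x = r1 / (a * b)
print(round(x, 2))                       # 0.4, the corrected estimate
print(round(r1**2, 2), round(x**2, 2))   # variance explained: .04 vs. .16
```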

But how do researchers know what values to assign to the unobservable paths (such as a and b in Figure 1) linking a given latent construct to its single indicator? Unless logic, theory, or prior empirical evidence suggests that the constructs in a given model are measured with little error (i.e., the path between the circle and the box = 1.0) or indicate what might be an appropriate value less than 1.0 to assign for the path between the circle and box, researchers must turn from single indicator models to multiple indicator models. As noted earlier, multiple indicator models allow one to make corrections for measurement errors that would otherwise bias estimates of the causal relationships between constructs.

MULTIPLE INDICATOR MODELS

For researchers to claim that they are using multiple indicator models, at least one of the concepts in a causal model must have more than one indicator. ‘‘Multiple indicators’’ means simply that a causal model contains alternative measures of the same thing (same latent construct). Figure 2 depicts a multiple indicator model in which income still has a single indicator but life satisfaction now has three indicators (i.e., three alternative measures of the same life satisfaction latent construct). In addition to the original indicator for life satisfaction—‘‘I am satisfied with my life’’—there are two new indicators; namely, ‘‘In most ways my life is close to my ideal’’ and ‘‘The conditions of my life are excellent.’’ (Recall that the possible response categories range from ‘‘strongly agree’’ to ‘‘strongly disagree.’’ See also Diener et al. [1985] for a fuller description of this measurement scale.)

As in Figure 1, the dashed, double-headed arrows represent the observed correlations between each pair of indicators. Recall that these observed correlations stem from the assumed operation of the latent constructs inferred in the causal model. Specifically, an increase in actual income (I) should produce an increase in measured income (I1) through (causal) path a. Moreover, an increase in actual income should also produce an increase in actual life satisfaction (LS) through (causal) path x, which in turn should produce an increase in each of the measures of life satisfaction (LS1, LS2, and LS3) through (causal) paths b, c, and d. In other words, these hypothesized causal pathways should produce observed correlations between all possible pairs of the measured variables (I1, LS1, LS2, and LS3). (We are assuming here that the underlying measurement model is one in which the unobserved constructs have a causal effect on their respective indicators. There are some types of multiple indicators, however, where a more plausible measurement model would suggest the reverse causal order—i.e., that the multiple indicators each drive the latent construct common to the set of indicators. See Kline [1998] for a discussion of these ‘‘cause’’ indicator measurement models, as opposed to the more traditional ‘‘effect’’ indicator measurement model described here.)
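Tracing these pathways by hand is straightforward: each implied correlation is the product of the standardized path coefficients along the route connecting two indicators. A short sketch (plain Python, using the path values assigned in Figure 2, below):

```python
# Standardized path values assigned in Figure 2 (below).
a = 1.0                    # I  -> I1
x = 0.4                    # I  -> LS (true effect)
b, c, d = 0.5, 0.7, 0.7    # LS -> LS1, LS2, LS3

implied = {
    "r1 (I1, LS1)":  a * x * b,
    "r2 (I1, LS2)":  a * x * c,
    "r3 (I1, LS3)":  a * x * d,
    "r4 (LS1, LS2)": b * c,
    "r5 (LS1, LS3)": b * d,
    "r6 (LS2, LS3)": c * d,
}
for pair, value in implied.items():
    print(pair, round(value, 2))   # .20, .28, .28, .35, .35, .49
```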

[Figure 2. Multiple-Indicator Model for Estimating the Effect of Income on Life Satisfaction. Actual Income (I) affects Actual Life Satisfaction (LS) through the true effect x = .4. Path a = 1.0 links I to its single indicator I1, Measured Income (response to ‘‘How much money do you earn yearly?’’). Paths b = .5, c = .7, and d = .7 link LS to its three indicators: LS1 (‘‘I am satisfied with my life.’’), LS2 (‘‘In most ways my life is close to my ideal.’’), and LS3 (‘‘The conditions of my life are excellent.’’). E1 through E4 represent random measurement error. The observed correlations among the indicators are r1 = .20, r2 = .28, r3 = .28, r4 = .35, r5 = .35, and r6 = .49.]

The use of a single indicator for income means that researchers must use logic, theory, or prior empirical evidence to assign a path coefficient to represent ‘‘slippage’’ between the true score (I) and the measured score (I1). For income, path a = 1.0 (i.e., no random measurement error) seems like a reasonable estimate, and makes it easier to illustrate the path equations. (Although a coefficient of, say, .95 might be more realistic, whether we use 1.0 or .95 will make little difference in the calculations that follow.) As noted earlier, however, indicators of life satisfaction are not as easily assigned path coefficients of 1.0. That is, there is likely to be considerable random error (unreliability) in measuring a subjective state such as life satisfaction. Fortunately, however, the use of multiple indicators for the life satisfaction construct permits researchers to provide reasonable estimates of random measurement error based on the empirical data in the current study—namely, the observed correlations (i.e., consistency) among the multiple indicators. Because measurement error can vary so much from one research setting to another, it is always preferable to provide estimates of reliability based on the current rather than previous empirical studies. Likewise, reliability estimates based on the current study are much better than those estimates obtained from logic or theory, unless the latter sources can provide a compelling case for a highly reliable single indicator (such as the measure of income used in the present example).

If the multiple indicator model in Figure 2 is correctly specified, then the observed correlations among the several pairs of indicators should provide researchers with information to calculate estimates for the hypothesized (unobservable) causal paths b, c, and d (i.e., estimates of how much ‘‘slippage’’ there is between actual life satisfaction and each measure of life satisfaction). Researchers can use hand calculations involving simple algebra to estimate the causal paths for such simple multiple indicator models as depicted in Figure 2 (for examples, see Sullivan and Feldman 1979 and Loehlin 1998). But more complicated models are best left to ‘‘structural equation modeling’’ (SEM) computer software programs such as LISREL (Linear Structural Relationships; Joreskog and Sorbom 1993), EQS (Equations; Bentler 1995), or AMOS (Analysis of Moment Structures; Arbuckle 1997). Kline (1998) provides a particularly comprehensive introduction to the topic. Two annotated bibliographies cover almost all work related to SEM up to about 1996 (Austin and Wolfe 1991; Austin and Calderon 1996); Marcoulides and Schumacker (1996) and Schumacker and Marcoulides (1998) cover more recent advances. The Smallwaters software company maintains a Web site with a wealth of information, including links to other relevant sites: http://www.smallwaters.com/weblinks.

In essence, these SEM computer programs go through a series of trial-and-error ‘‘iterations’’ in which different values are substituted for the hypothesized causal paths—in Figure 2, paths b, c, d, and x. (Recall that we assigned a value of 1.0 for path a, so the SEM program does not have to estimate a value for this hypothesized path.) Ultimately, the program reaches (‘‘converges on’’) a ‘‘final solution.’’ This solution will reproduce as closely as possible the observed correlations—in Figure 2, r1, r2, r3, r4, r5, and r6—among each of the indicators in the proposed causal model. In the final solution depicted in Figure 2, the path estimates for b, c, d, and x (when combined with the ‘‘assigned’’ or ‘‘fixed’’ value for path a) exactly reproduce the observed correlations. (Note that the final solution will reproduce the observed correlations better than will the initial solutions, unless, of course, the SEM program finds the best solution on its first attempt, which is not likely in most ‘‘real-world’’ data analyses.)

More technically, the SEM program builds a series of (simultaneous) equations that represent the various hypothesized causal paths that determine each observed correlation. In Figure 2, the correlation (r1 = .20) for I1 and LS1 involves the ‘‘path’’ equation: a*x*b = .20; for the correlation (r2 = .28) of I1 and LS2: a*x*c = .28; for the correlation (r3 = .28) of I1 and LS3: a*x*d = .28; for the correlation (r4 = .35) of LS1 and LS2: b*c = .35; for the correlation (r5 = .35) of LS1 and LS3: b*d = .35; and for the correlation (r6 = .49) of LS2 and LS3: c*d = .49. The SEM program then uses the known values—observed correlations and, in the causal model for Figure 2, the fixed (predetermined) value of 1.0 for path a—to simultaneously solve the set of equations to obtain a value for each of the causal paths that initially have unknown values.
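For a model this simple, the set of equations can indeed be solved by hand. A sketch of that algebra in plain Python (the closed-form expressions follow from dividing and multiplying the three loading equations, a standard manipulation for three effect indicators):

```python
import math

# Observed correlations from Figure 2.
r1, r2, r3 = 0.20, 0.28, 0.28   # I1 with LS1, LS2, LS3
r4, r5, r6 = 0.35, 0.35, 0.49   # LS1-LS2, LS1-LS3, LS2-LS3
a = 1.0                         # fixed a priori

# From r4 = b*c, r5 = b*d, and r6 = c*d: r4*r5/r6 = b*b, so
b = math.sqrt(r4 * r5 / r6)     # 0.5
c = r4 / b                      # 0.7
d = r5 / b                      # 0.7
x = r1 / (a * b)                # 0.4

print(round(b, 2), round(c, 2), round(d, 2), round(x, 2))
```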

Except in artificial examples (like Figure 2), however, the SEM program is unlikely to obtain final values for the causal paths such that the path equations exactly reproduce the observed correlations. More specifically, the program will attempt through its iterative trial-and-error procedures to find a final value for each of the causal paths b, c, d, and x that will minimize the ‘‘average discrepancy’’ across each of the six model-implied correlations (i.e., predicted by the path equations) versus the six empirically observed correlations.
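One way to picture this iterative search is with a generic least-squares optimizer. The following is only a rough analogy to what SEM software does, not the actual fitting procedure of LISREL, EQS, or AMOS (which rely on maximum likelihood or related estimators); it assumes Python with NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import least_squares

observed = np.array([0.20, 0.28, 0.28, 0.35, 0.35, 0.49])  # r1..r6
a = 1.0  # fixed path

def discrepancies(params):
    # Implied-minus-observed correlations for candidate path values.
    x, b, c, d = params
    implied = np.array([a * x * b,   # r1: I1-LS1
                        a * x * c,   # r2: I1-LS2
                        a * x * d,   # r3: I1-LS3
                        b * c,       # r4: LS1-LS2
                        b * d,       # r5: LS1-LS3
                        c * d])      # r6: LS2-LS3
    return implied - observed

# Start from a deliberately poor guess and let the optimizer iterate.
fit = least_squares(discrepancies, x0=[0.1, 0.9, 0.9, 0.9])
print(np.round(fit.x, 2))  # approximately [0.4, 0.5, 0.7, 0.7]
```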

For example, to reproduce the observed correlation (r4 = .35) between LS1 and LS2 (recall that empirical correlations among indicators of subjective states like life satisfaction often range between .3 and .5), the SEM program would have to start with values for causal paths b and c (i.e., estimates of ‘‘slippage’’ between actual life satisfaction and measured life satisfaction) considerably lower than 1.0. In other words, the software program would need to allow for some random measurement error. If, instead, the initial solution of the SEM program assumed perfect reliability, there would be a substantial discrepancy between at least some of the implied versus observed correlations among the indicators. That is, multiplying the path b = 1.0 by the path c = 1.0 (i.e., assuming perfect reliability of each indicator) would imply an observed correlation of 1.0 (i.e., perfect consistency) between LS1 and LS2—an implied (i.e., predicted) correlation that far exceeds the r4 = .35 correlation we actually observe. (Keep in mind that, as depicted in Figure 2, the best values for b and c are .5 and .7, respectively, which the SEM program will eventually converge on as it ‘‘iterates’’ to a final solution.)

If, at the next iteration, the SEM program were to substitute equal values for b and c of about .59 each (to allow for less than perfect reliability), this solution would exactly reproduce the observed correlation of .35 (i.e., .59*.59 ≈ .35) between LS1 and LS2. But using a value of .59 for both b and c would not allow the program to reproduce the observed correlations for LS1 and LS2 with I1 (the indicator for income). To obtain the observed correlation of .20 between I1 and LS1, the program needs to multiply the paths a*x*b. Accordingly, r1 (.20) should equal a*x*b—that is, 1.0*x*.59 = .20. Solving for x, the program would obtain a path value of about .35. Likewise, to obtain the observed correlation of .28 between I1 and LS2, the program needs to multiply the paths a*x*c. Accordingly, r2 (.28) must equal a*x*c—that is, 1.0*x*.59 = .28. Solving for x, the program would obtain a path value of about .47. In other words, the program cannot find a solution for the preceding two equations that uses the same value for x. That is, for the first equation x ≈ .35, but for the second equation x ≈ .47.

Given the SEM program’s need to come up with a unique (i.e., single) value for x (and for all the other causal paths the program must estimate), a possible compromise might be to use a value of .41. Substituting this value into the preceding two equations—a*x*b and a*x*c—would provide an implied correlation of about .24 (i.e., 1.0*.41*.59 ≈ .24) for both equations. Comparing this implied correlation with the observed correlations of .20 and .28 for I1/LS1 and I1/LS2, respectively, the discrepancy in these two situations is +/- .04.

Although this is not a large discrepancy between the implied and the observed correlations, the SEM program can do better (at least in this hypothetical example). If the SEM program subsequently estimates values of .50 and .70 for causal paths b and c, respectively, then it is possible to use the same value of x (specifically, .40) for each path equation involving I1/LS1 and I1/LS2, and reproduce exactly the observed correlations for r1 (i.e., 1.0*.40*.50 = .20) and r2 (i.e., 1.0*.40*.70 = .28). By using these estimated values of .5 and .7 for paths b and c in place of the .59 values initially estimated, the program can also reproduce exactly the observed correlation (r4 = .35) between LS1 and LS2—that is, .5*.7 = .35. Furthermore, by using an estimated value of .7 for causal path d, the program can exactly reproduce all the remaining observed correlations in Figure 2—r3 = .28, r5 = .35, and r6 = .49—that involve path d, that is, a*x*d, b*d, and c*d, respectively. (We leave to the reader the task of solving the equations; a quick numerical check appears below.)
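A quick check of the remaining equations (plain Python, using the final estimates from Figure 2):

```python
a, x = 1.0, 0.4
b, c, d = 0.5, 0.7, 0.7

print(round(a * x * d, 2))   # r3 = .28
print(round(b * d, 2))       # r5 = .35
print(round(c * d, 2))       # r6 = .49
```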

In sum, by using the fixed (a priori) value of a = 1.0 and the estimated values of x = .4, b = .5, c = .7, and d = .7, the six implied correlations exactly match the six observed correlations depicted in Figure 2. In other words, the hypothesized causal paths in our model provide a ‘‘perfect fit’’ to the ‘‘data’’ (empirical correlations).

Reproducing the observed correlations among indicators does not, however, establish that the proposed model is correct. In the logic of hypothesis testing, one can only disconfirm models, not prove them. Indeed, there is generally a large number of alternative models, often with entirely different causal structures, that would reproduce the observable correlations just as well as the original model specified (see Kim and Mueller 1978 for examples). It should be apparent, therefore, that social scientists must provide rigorous logic and theory in building multiple indicator models, that is, in providing support for one model among a wide variety of possible models. In other words, multiple indicator procedures require that researchers think very carefully about how measures are linked to latent constructs, and how latent constructs are linked to other latent constructs.

Additionally, it is highly desirable that a model contain more observable correlations among indicators than unobservable causal paths to be estimated—that is, the model should be ‘‘overidentified.’’ For example, Figure 2 has six observed correlations (r1, r2, r3, r4, r5, and r6) but only four hypothesized causal paths (x, b, c, and d) to estimate. Thus, Figure 2 is overidentified, with two ‘‘degrees of freedom’’ (df). By having an excess of observed correlations versus hypothesized causal paths (i.e., by having at least one and preferably many degrees of freedom), a researcher can provide tests of ‘‘model fit’’ to assess the probability that there exist alternative causal paths not specified in the original model. (Where the fit between the implied vs. observed correlations is poor, researchers typically seek to revise their causal model to better fit the data.)

Conversely, ‘‘just-identified’’ models will contain exactly as many observable correlations as hypothesized causal paths to be estimated (i.e., will have 0 degrees of freedom). Such models will always produce estimates (solutions) for the causal paths that exactly reproduce the observable correlations—no matter how badly misspecified the proposed causal pathways may be. In other words, ‘‘perfect’’ fit is inevitable and provides no useful information regarding whether the model is correctly specified or not. This result occurs because, unlike an overidentified model, a just-identified model does not have any degrees of freedom with which to detect alternative causal pathways to those specified in the original model. Accordingly, just-identified models are not very interesting to SEM practitioners.

Finally, the worst possible model is one that is ‘‘underidentified,’’ that is, has fewer observable correlations than unobservable causal paths to be estimated. Such models can provide no single (unique) solution for the unobservable paths. In other words, an infinite variety of alternative estimates for the causal paths is possible. For example, if we restricted Figure 2 to include only LS1 and LS2 (i.e., dropping LS3 and I1), the resulting two-indicator model of life satisfaction would be underidentified. That is, we would have two causal paths to estimate—from the latent construct to each of the two indicators (i.e., paths b and c)—but only one observed correlation (r4 = .35). Under this situation, there is no unique solution. We can literally substitute an infinite set of values for paths b and c to exactly reproduce the observed correlation (e.g., given b*c = .35, we can use .7*.5 or .5*.7, or two values slightly less than .6 each, and so on).
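A few lines of code make the indeterminacy visible (plain Python; the candidate values for b are arbitrary):

```python
r4 = 0.35  # the only observed correlation in the two-indicator model

# Any pair of loadings whose product is .35 fits perfectly; a few of
# the infinitely many solutions:
for b in (0.5, 0.7, 0.59, 0.35, 0.9):
    c = r4 / b
    print(round(b, 2), round(c, 2), round(b * c, 2))  # product always .35
```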

The number of indicators per latent construct helps determine whether a model will be overidentified or not. In general, one should have at least three and preferably four or more indicators per latent construct—unless one can assume a single indicator, such as income in Figure 2, has little measurement error. Adding more indicators for a latent construct rapidly increases the ‘‘overidentifying’’ pieces of empirical information (i.e., degrees of freedom). That is to say, observable correlations (between indicators) grow faster than the unobservable causal paths (between a given latent construct and indicator) to be estimated.

For example, adding a fourth indicator for life satisfaction in Figure 2 would require estimating one additional causal path (linking the life satisfaction latent construct to the fourth indicator), but would also produce four more observed correlations (LS4 with LS3, LS2, LS1, and I1). The modified model would thus have three more degrees of freedom, and correspondingly greater power to determine how well the model fits the empirical data (observed correlations). Including a fifth indicator for life satisfaction would produce four more degrees of freedom, and even more power to detect a misspecified model. (The issue of model identification is more complicated than outlined here. The requirement that an identified model have at least as many observed correlations as causal paths to be estimated is a necessary but not sufficient condition; cf. Kline [1998].)
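The bookkeeping behind these counts can be scripted. The sketch below (plain Python) mirrors only the simplified counting rule used in the text, one fixed income indicator plus k life satisfaction loadings and the structural path x; it is not a full identification check:

```python
def degrees_of_freedom(k_ls_indicators):
    # One fixed income indicator plus k life satisfaction indicators;
    # free paths are x plus one loading per life satisfaction item.
    p = 1 + k_ls_indicators          # total indicators
    observed = p * (p - 1) // 2      # inter-indicator correlations
    free_paths = 1 + k_ls_indicators
    return observed - free_paths

for k in (3, 4, 5):
    print(k, "LS indicators:", degrees_of_freedom(k), "df")
# 3 -> 2 df, 4 -> 5 df, 5 -> 9 df: gains of 3 and then 4, as in the text
```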

Some additional points regarding multiple indicator models require clarification. For instance, in ‘‘real life’’ a researcher would never encounter such a perfect reproduction of the (noncausal) observable correlations from the unobservable (causal) paths as Figure 2 depicts. (We are assuming here that the causal model tested in ‘‘real life,’’ like that model tested in Figure 2, is ‘‘overidentified.’’ Recall that a ‘‘just-identified’’ model always exactly reproduces the observed correlations.) Indeed, even if the researcher’s model is correctly specified, the researcher should expect at least some minor discrepancies in comparing the observed correlations among indicators with the correlations among indicators predicted (implied) by the hypothesized causal paths.

Researchers can dismiss as ‘‘sampling error’’ (i.e., ‘‘chance’’) any discrepancies that are not too large (given a specific sample size). At some point, however, the discrepancies do become too large to dismiss as ‘‘chance.’’ At that point, researchers may determine that they have not specified a proper model. Poor model fit is strong grounds for reevaluating and respecifying the original causal model—typically by adding and (less often) subtracting causal paths to obtain a better fitting model. Just because an overidentified model can detect that a model is misspecified does not mean, however, that it is easy to then tell where the misspecification is occurring. Finding and correcting misspecification is a complex ‘‘art form’’ that we cannot describe here (but see Kline 1998 for an overview).

The next section will describe how nonrandom measurement error can create a misfitting multiple indicator model and corresponding bias in estimates of causal pathways. Additionally, we will demonstrate how a just-identified model, in contrast to an overidentified model, will fail to detect and thus correct for this misspecification, resulting in considerable bias in estimating the true effect of income on life satisfaction.

MULTIPLE-INDICATOR MODELS WITH NONRANDOM MEASUREMENT ERROR

We have discussed how poor quality measures— low reliability and low validity—can bias the estimates of the true effects of one latent construct on another. Our specific modeling examples (Figures 1 and 2), however, have focused on the bias introduced by unreliable measures only. That is, our causal diagrams have assumed that all measurement error is random. For example, Figure 2 depicts the error terms (E’s) for each of three multiple indicators of life satisfaction to be unconnected. Such random measurement error can occur for any number of reasons: ambiguous questions, coding errors, respondent fatigue, and so forth. But none of these sources of measurement error should increase correlations among the multiple indicators. Indeed, as noted in previous sections, random measurement error should reduce the correlations (consistency) among multiple indicators of a given latent construct—less-than-perfect correlations that researchers can then use to estimate reliability and thereby correct for the bias that would otherwise occur in underestimating the true effect of one latent construct on another (e.g., income on life satisfaction).

Conversely, where measurement error increases correlations among indicators, social scientists describe it as systematic or nonrandom. Under these conditions, the measurement errors of two or more indicators have a common source (a latent construct) other than or in addition to the concept that the indicators were supposed to measure. The focus now becomes the validity of measures. Are you measuring what you claim to measure or something else? (See the section above entitled ‘‘Reliability and Validity of Measures’’ for a more general discussion.)

Failure to include nonrandom—linked or ‘‘correlated’’—errors in a multiple-indicator model will bias the estimates of other causal paths in the model. Figure 3 depicts such a linkage of error terms through the personality variable ‘‘optimism’’ (O). Based on the hypothetical model in Figure 3, the observed correlation (r6) between the indicators LS2 and LS3 would not be entirely the consequence of the effects of life satisfaction (LS) operating through the causal paths c and d. In fact, part of this observed correlation would occur as a consequence of the causal paths e and f (which, in this hypothetical example, we have constrained to be equal). In other words, the indicators LS2 and LS3 measure some of life satisfaction but also optimism. That is, they measure something in addition to what they were intended to measure. Stated in still other words, the two indicators are not ‘‘pure’’ (completely valid) measures of life satisfaction because they are ‘‘contaminated’’ by also tapping optimism.

Note that r6 is the only observed correlation that differs in strength in comparing Figures 2 and 3. For Figure 2, this correlation equals .49; for Figure 3, this correlation equals .85. The higher observed correlation for r6 in Figure 3 stems from the ‘‘inflated’’ correlation produced by the effects of optimism through paths e (.6) and f (.6). Note, also, that .6*.6 equals .36. If we add .36 to the original observed correlation for r6 (i.e., .49) in Figure 2, we obtain the observed correlation of .85 in Figure 3. All other path estimates and observed correlations remain the same across the two figures. Furthermore, like Figure 2, Figure 3 depicts path estimates for a final solution that exactly reproduce all observed correlations.

Note, too, that if we had not added the causal paths (e and f) to represent the hypothesized effect of optimism on two measures of life satisfaction, we could not obtain a ‘‘good fit’’ for the observed correlations in Figure 3. Indeed, without these additional paths, the SEM computer program would have to increase the path coefficients for c and d—say, to about .92 each—in order to reproduce the observed correlation of .85 for r6. But then the program would fail to reproduce the observed correlations involving LS2 and LS3 with the other indicators in the model (LS1 and I1). For example, the observed correlation between I1 and LS2 (r2 = .28) would now be overestimated, based on the product of the causal paths—a*x*c—that the model in Figure 3 suggests determines r2. That is, 1.0*.4*.92 results in an implied (predicted) correlation of about .37, which leaves a discrepancy of .09 relative to the correlation (.28) actually observed.
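The arithmetic of this misfit is easy to verify (plain Python, using the values from the text):

```python
a, x = 1.0, 0.4
c, d = 0.7, 0.7
e = f = 0.6     # paths from optimism to LS2 and LS3 in Figure 3

# Correlated error inflates r6: c*d plus e*f.
print(round(c * d + e * f, 2))   # .85 rather than .49

# Absorbing the inflation by raising c and d instead...
c_bad = d_bad = 0.92
print(round(c_bad * d_bad, 2))   # about .85, so r6 alone now "fits"
print(round(a * x * c_bad, 2))   # ...but implied r2 is about .37,
                                 # overshooting the observed .28
```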
