
British Journal of Mathematical and Statistical Psychology (2002), 55, 125–143
© 2002 The British Psychological Society
www.bps.org.uk
Cross-validation by downweighting influential cases in structural equation modelling
Ke-Hai Yuan1 *, Linda L. Marshall2 and Rebecca Weston2
1University of Notre Dame, USA
2University of North Texas, USA
In the social and behavioural sciences, structural equation modelling has been widely used to test a substantive theory or causal relationship among latent constructs. Cross-validation (CV) is a valuable tool for selecting the best model among competing structural models. Influential cases or outliers are often present in practical data. Therefore, even the correct model for the majority of the data may not cross-validate well. This paper discusses various drawbacks of CV based on sample covariance matrices, and develops a procedure for using robust covariance matrices in the model calibration and validation stages. Examples illustrate that the CV index based on sample covariance matrices is very sensitive to influential cases, and even a single outlier can cause the CV index to support a wrong model. The CV index based on robust covariance matrices is much less sensitive to influential cases and thus leads to a more valid conclusion about the practical value of a model structure.
1. Introduction
Structural equation modelling (SEM) is one of the most popular methods in multivariate analysis, with extensive applications in the social and behavioural sciences (Bentler & Dudgeon, 1996). The advantage of SEM is that manifest variables, latent variables and measurement errors can be modelled and tested simultaneously. However, due to the complexity of the structural relationship and measurement errors, an SEM model is, at best, only an approximation of the real world. When the sample size is large, most models will be rejected because of the high power of the commonly used goodness-of-fit chi-square test statistics. On the other hand, when the same data are used for both model estimation and testing, a non-significant chi-square statistic does not necessarily imply
* Requests for reprints should be addressed to Ke-Hai Yuan, Dept of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA (e-mail: kyuan@nd.edu).
that the model will fit a new sample equally well. Rather than using the current sample for both model estimation and testing, prudent researchers should use cross-validation (CV) for model assessment. In addition to determining how the model would fit a different sample, CV prevents the misuse of empirical model modification in SEM, which is a universal practice (Kaplan, 1991; MacCallum, Roznowski, & Necowitz, 1992) facilitated by the modification index in LISREL (Jöreskog & Sörbom, 1993) and the Lagrange multiplier test in EQS (Bentler, 1995).
CV was first studied as a valuable tool for evaluating prediction errors by Stone (1974). Its relationship with other model selection procedures was studied in Stone (1977) and Li (1987). Bentler (1980) discussed various CV strategies for causal modelling. Cudeck and Browne (1983) proposed the use of CV for selecting among competing models. Let sample c be the calibration sample with sample covariance matrix S_c and sample v be the validation sample with sample covariance matrix S_v. For a structural model Σ(θ), let θ̂_c be the parameter value of θ that minimizes F_ML(S_c, Σ(θ)), where
F_ML(S_c, Σ(θ)) = tr[S_c Σ^{-1}(θ)] − log|S_c Σ^{-1}(θ)| − p    (1)

is the Wishart likelihood function. The CV index

t = F_ML(S_v, Σ(θ̂_c))    (2)
corresponds to the tight replication strategy in Bentler (1980). This is the discrepancy between S_v and Σ̂_c = Σ(θ̂_c) measured by F_ML. Among competing models (e.g., the independence model, the saturated model and models supported by substantive theory), the best model should lead to the smallest CV index (Cudeck & Browne, 1983). A justification for a model selection procedure based on CV is that a useful model should be applicable to different samples within the same population. In practice, often only one sample is available. One suggestion in such a situation is to split the sample into two, with one half (the calibration sample) used for model estimation and the other half (the validation sample) used for validation (Bentler, 1980). Browne and Cudeck (1989) also proposed a CV index, based on a single sample, which is closely related to the AIC criterion (Akaike, 1973, 1987; Bozdogan, 1987). Using extensive simulation with factor models, Bandalos (1993) studied factors that influence various CV indices.
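As a concrete illustration of (1) and (2), the discrepancy and the CV index take only a few lines to compute. The following is a minimal sketch (in Python rather than the authors' SAS IML; the function names are ours), assuming Sigma_hat_c stands for the model-implied covariance matrix Σ(θ̂_c) already fitted to the calibration sample:

import numpy as np

def f_ml(S, Sigma):
    # Wishart likelihood discrepancy F_ML(S, Sigma) of equation (1)
    p = S.shape[0]
    A = S @ np.linalg.inv(Sigma)
    sign, logdet = np.linalg.slogdet(A)   # numerically stable log|S Sigma^{-1}|
    return np.trace(A) - logdet - p

def cv_index(S_v, Sigma_hat_c):
    # CV index t = F_ML(S_v, Sigma(theta_hat_c)) of equation (2)
    return f_ml(S_v, Sigma_hat_c)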
In the social and behavioural sciences, surveys are the most common method of data collection. This methodology results in outliers and non-normal data. Classical methods for SEM are developed under the assumption of multivariate normal data (Bollen, 1989; Jöreskog, 1969). However, practical data sets rarely meet this assumption. Among 440 large-sample studies taken from journal articles, research projects, and tests using achievement and psychometric measures, Micceri (1989) reported that all data were significantly non-normally distributed. Yet, methods based on normal theory such as maximum likelihood (ML) and generalized least squares are frequently used to analyse these data sets (Breckler, 1990). The sample covariance matrix is very sensitive to influential cases or outliers (Hampel, Ronchetti, Rousseeuw, & Stahel, 1986; Huber, 1981; Wilcox, 1997). Therefore, when samples contain outliers or come from a distribution with heavy tails, two sample covariance matrices based on samples from the same population may be quite different. Consequently, the CV index in (2) can still be substantial even when Σ(θ) is approximately correct. This phenomenon is clearly illustrated in MacCallum et al. (1992). Actually, model selection based on (2) is more vulnerable to bad data than the commonly used fit indices or chi-square test statistics. This is because influential cases in either the calibration sample or the validation sample
will render model selection based on (2) meaningless. Although this seems a simple fact, social scientists and even psychometricians have not fully realized the danger of trusting the result of using (2) for model selection. This is evidenced by numerous articles emphasizing the importance of CV (see Anderson & Gerbing, 1988; Bentler, 1980; Breckler, 1990; Browne & Cudeck, 1989; Cudeck & Browne, 1983; MacCallum & Austin, 2000; MacCallum et al., 1992) and the single-sample-based CV index implemented in LISREL (Jöreskog & Sörbom, 1993), one of the most widely used programs. However, not a single paper mentions the risk of using the CV index when outliers or influential cases exist in the samples involved. Our purpose here is to demonstrate the drawbacks of using CV based on sample covariance matrices and to propose a more appropriate procedure.
For a data set from a distribution with heavy tails, robust estimation of the population covariance (or dispersion) matrix has been studied by a variety of authors in the statistical literature (Huber, 1977; Maronna, 1976; Tyler, 1983) and also widely applied to data analysis (Ammann, 1989; Birch & Myers, 1982; Campbell, 1980, 1982; Devlin, Gnanadesikan, & Kettenring, 1981; Gabriel & Odoroff, 1984; Heiser, 1987; Kharin, 1996; Lange, Little, & Taylor, 1989; Verboon & Heiser, 1994; Yuan & Bentler, 1998a, 1998b). In contrast to the sample covariance matrix, where every case gets equal weight, in a robust covariance matrix estimator each case gets its weight according to its position in the data space. The further away a case is from the majority of the data, the less weight it has. As the most influential cases get the smallest weights, their effect on the covariance matrix estimator is controlled. Therefore, a robust covariance matrix estimator is generally more efficient if the data set is from a distribution with heavy tails. If a data set contains outlying cases, a robust covariance matrix estimator is much less sensitive to these outliers. When the sample covariance matrix is replaced by a robust counterpart in SEM, these desirable properties will be inherited by the parameter estimates (Yuan, Bentler, & Chan, 2002). Because the purpose of CV is to identify a valuable model as determined by the majority of the data, using S_c and S_v that are less sensitive to outliers is especially important for CV.
Specifically, we demonstrate the danger of using CV based on sample covariance matrices and discuss its non-robust nature in Section 2. In Section 3, we propose to replace the sample covariance matrices by robust estimates of the corresponding covariance matrices in the CV procedure for SEM. Some necessary conditions for this new procedure are also discussed. In Section 4, this procedure is evaluated using several examples. The procedure is then applied to a data set on family relationships of low-income women in Section 5. Finally, limitations and topics for future research are discussed.
2. Drawbacks of cross-validation index based on sample covariance matrices
When a practical data set contains influential cases, these can be classified into two categories. In the first category, the data cloud spreads out continuously in various directions. In such a situation the influential cases are due to the heavy tails of the sampling distribution. The sample covariance matrix is not the most efficient estimator of the population covariance matrix if the sampling distribution has heavy tails (Tyler, 1983). For example, when data follow a multivariate t distribution with two degrees of freedom, the sample covariance matrix does not converge. In the second category, the influential cases are just a few isolated points, which are commonly called outliers

[Figure 1. Regression (a) for normal data with ρ = 0; (b) for normal data with ρ = 0 and one outlier.]
(Barnett & Lewis, 1994). In such a situation, because the influence function is quadratic (Hampel, 1974), the sample covariance matrix can assume any position. In either of these situations, conclusions based on the CV index in (2) can be misleading, as discussed below.
When the calibration sample, the validation sample, or both possess heavy tails, the two sample covariance matrices may not necessarily be near each other, even when the two corresponding population covariance matrices are identical. This phenomenon may occur when sample sizes are not large enough, if the tails of the samples are moderately heavy. However, when the tails are heavy enough, regardless of how large the sample sizes are, the two sample covariance matrices S_c and S_v can still be far apart. It is obvious that the CV index in (2) will inherit the properties of the sample covariance matrices. In such a case, t will not predict the model validity of Σ(θ). A smaller t indicates either of the following two situations: the model structure is approximately correct; or the model structure is not of much practical value, but the two sample covariance matrices happen to be near each other because of favourable fluctuations. A parallel conclusion can be reached for a larger t.
The worst situation arises when heavy tails of a data set are created by outliers. As demonstrated in Devlin et al. (1981) and Yuan and Bentler (1998b), when a sample contains a single extreme outlier, regardless of the underlying structure of the majority of the data, classical procedures for principal components or factor analysis may only identify one component or factor, which is decided by the outlying case. With CV, if either the calibration sample or the validation sample contains an outlier, the t in (2) can become useless. This can be illustrated through the Pearson correlation coefficient in Fig. 1, where Fig. 1(a) is based on a sample from a normal population with ρ = 0 and Fig. 1(b) is based on an independent sample from the same population but with one outlier. The sample correlation coefficients are respectively r_a = 0.092 and r_b = 0.438, and we would not expect a correlation with a magnitude of about 0.438 to be obtained in other samples. When both the calibration and validation samples contain an extreme outlier in the same direction, regardless of the structure of the majority of the data in the two samples, the factor model decided by the outliers will be the best model when judged by (2). The problem is more complicated and can be more serious with multiple outliers, as illustrated in Section 4. With the commonly used model modification procedures in popular software, outliers can also lead to paths that are not supported by substantive theory.
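The sensitivity illustrated in Fig. 1 is easy to reproduce. Below is a minimal simulation (our own illustrative sample size, seed and outlier position, not the data behind Fig. 1):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((50, 2))                 # bivariate normal sample with rho = 0
r_clean = np.corrcoef(x, rowvar=False)[0, 1]

x_out = x.copy()
x_out[0] = [6.0, 6.0]                            # replace one case by a single extreme outlier
r_outlier = np.corrcoef(x_out, rowvar=False)[0, 1]

print(r_clean, r_outlier)                        # the outlier pulls r far away from 0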
We demonstrate in later sections that, regardless of what types of influential cases exist, once they are downweighted the robust covariance matrix will be decided by the majority of the data cloud in a sample. Thus, two robust covariance matrix estimates will be more comparable if they represent the same population covariance matrix. Consequently, the resulting CV index reflects the model structure decided by the majority of a data set rather than that decided by a few influential cases. This is the desired property of t in (2) when using it to rank competing models.
3. Cross-validation based on robust covariance matrices
Robust estimation originates from modelling data by ML by finding a proper density function (Huber, 1964). Since data sets, in practice, may not follow multivariate normal distributions, various attempts to generate multivariate non-normal distributions have been made (Fang, Kotz, & Ng, 1990; Olkin, 1994; Yuan & Bentler, 1999). Among generalizations of multivariate non-normal distributions, the class of elliptical distributions has been well studied and found applicable in many different disciplines (Kano, Berkane, & Bentler, 1993; Lange et al., 1989; Little, 1988). The density of a p-variate elliptical distribution is given by

f(x) = |Σ|^{-1/2} h{(x − μ)′Σ^{-1}(x − μ)},    (3)

where h(·) is a scalar function that does not depend on μ and Σ. The multivariate normal distribution corresponds to h(r) = (2π)^{-p/2} exp(−r/2). By choosing different h(·), a variety of distributions with heavier or lighter tails than those of a normal distribution can be obtained (Fang et al., 1990). An important point is that the sample covariance matrix is not the most efficient estimate of Σ unless the distribution is normal. Several methods have been proposed to estimate μ and Σ within the class of elliptical distributions. For a sample x_1, . . . , x_n, let
d(x_i, μ, Σ) = [(x_i − μ)′Σ^{-1}(x_i − μ)]^{1/2}
be the Mahalanobis distance and u_1(t) and u_2(t) be non-negative scalar functions. Maronna (1976) defined robust M-estimates of μ and Σ by solving the equations
μ = ∑_{i=1}^n u_1{d(x_i, μ, Σ)} x_i / ∑_{i=1}^n u_1{d(x_i, μ, Σ)}    (4a)

and

Σ = (1/n) ∑_{i=1}^n u_2{d²(x_i, μ, Σ)} (x_i − μ)(x_i − μ)′.    (4b)
If the sample is from distribution (3) and we choose u_1(t) = −2ḣ(t²)/h(t²) and u_2(t) = −2ḣ(t)/h(t), where ḣ(·) is the derivative of h(·), we obtain the ML estimates μ̂ and Σ̂. Unfortunately, a data set may not exactly follow an elliptical distribution, and we
may not know the form of h(·) even if it does. With such a limitation, a variety of weight functions (Campbell, 1982; Hampel, 1974; Huber, 1977; Maronna, 1976) have been proposed, especially for modelling data sets from distributions with heavy tails. Different weight functions lead to different estimates. Prudent choice of a weight function will result in a more efficient estimate of the covariance matrix, and this efficiency will also be reflected in the covariance parameter estimates in a structural model (Tyler, 1983; Yuan et al., 2002). Among the many weight functions, that of Huber type is commonly used to model practical data sets with heavy tails (Campbell, 1980, 1982; Devlin et al., 1981; Yuan & Bentler, 1998a, 1998b). The Huber-type weight function is given by
u_1(d) = 1 if d ≤ r,  and  u_1(d) = r/d if d > r,    (5)

and u_2(d²) = {u_1(d)}²/β (Tyler, 1983), where r² is given by P(χ²_p > r²) = α, α is the proportion of outliers one wants to control assuming the massive data cloud follows a p-variate normal distribution, and β is a constant such that E{χ²_p u_2(χ²_p)} = p, which makes the corresponding estimate Σ̂ unbiased if x_i ∼ N(μ, Σ). Notice that only the tuning parameter α needs to be decided in applying the Huber-type weight function; r and β are just functions of α.
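For a given α and dimension p, both constants can be computed directly. A sketch (the function name is ours; the closed form for β uses the identity E[χ²_p; χ²_p ≤ r²] = p F_{p+2}(r²), which follows from the χ² density and is our derivation, not a formula stated in the paper):

from scipy.stats import chi2

def huber_constants(alpha, p):
    # r^2 satisfies P(chi2_p > r^2) = alpha
    r2 = chi2.ppf(1.0 - alpha, df=p)
    # beta = E{chi2_p u_1(sqrt(chi2_p))^2} / p = [p F_{p+2}(r^2) + r^2 alpha] / p
    beta = (p * chi2.cdf(r2, df=p + 2) + r2 * alpha) / p
    return r2 ** 0.5, beta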
In addition to the Huber-type weight function, there are a variety of other types of weight functions (see Table 11-1 of Hoaglin, Mosteller, & Tukey, 1983). Examples in Yuan and Bentler (1998a, 1998b) indicate that the difference among the various weight functions is minimal if proper tuning parameters are chosen. With the tuning parameter α, the Huber-type weight function is very flexible in controlling the percentage of cases to be downweighted. A recent study by Yuan et al. (2002) found that the most efficient parameter estimates in SEM models are often associated with the Huber-type weight function for several practical data sets. Based on these considerations, we will only apply the Huber-type weight function in our examples. Applications with other types of

weights can be performed in essentially the same way. The solution to (4) can be obtained through the iterative process
μ^{(t+1)} = ∑_{i=1}^n u_1{d(x_i, μ^{(t)}, Σ^{(t)})} x_i / ∑_{i=1}^n u_1{d(x_i, μ^{(t)}, Σ^{(t)})},    (6a)

and

Σ^{(t+1)} = (1/n) ∑_{i=1}^n u_2{d²(x_i, μ^{(t)}, Σ^{(t)})} (x_i − μ^{(t)})(x_i − μ^{(t)})′,    (6b)
until convergence. The estimation procedure in (6) is called the iteratively reweighted least squares algorithm; convergence properties of this algorithm can be found in Holland and Welsch (1977) and Green (1984). The sample mean x̄ and the sample covariance matrix S can be used as starting values for this process, which works satisfactorily in our empirical experience with a variety of data sets.
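A minimal sketch of this iteratively reweighted least squares scheme with the Huber-type weights of (5) follows (our Python reading of (6a) and (6b), not the authors' SAS IML code; huber_constants is the helper sketched after equation (5), repeated here for self-containment):

import numpy as np
from scipy.stats import chi2

def huber_constants(alpha, p):
    r2 = chi2.ppf(1.0 - alpha, df=p)
    beta = (p * chi2.cdf(r2, df=p + 2) + r2 * alpha) / p
    return r2 ** 0.5, beta

def robust_mu_sigma(X, alpha=0.1, tol=1e-8, max_iter=500):
    n, p = X.shape
    r, beta = huber_constants(alpha, p)
    mu = X.mean(axis=0)                  # start at x-bar and S
    Sigma = np.cov(X, rowvar=False)
    for _ in range(max_iter):
        diff = X - mu
        d = np.sqrt(np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff))
        u1 = np.where(d > r, r / d, 1.0)                     # equation (5)
        u2 = u1 ** 2 / beta
        mu_new = (u1[:, None] * X).sum(axis=0) / u1.sum()    # (6a)
        Sigma_new = (u2[:, None] * diff).T @ diff / n        # (6b), divisor n
        if (np.abs(mu_new - mu).max() < tol
                and np.abs(Sigma_new - Sigma).max() < tol):
            break
        mu, Sigma = mu_new, Sigma_new
    return mu_new, Sigma_new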
Let x_1, x_2, . . . , x_n follow an elliptical distribution as in (3) and S_n = Σ̂ be a robust covariance matrix estimator. S_n generally does not converge to the population covariance matrix. Instead, it converges to a constant times the population covariance matrix, kΣ. Because of this, terms such as 'dispersion matrix' or 'scatter matrix' are sometimes used. The positive scalar k depends on the weight functions u_1(t) and u_2(t) used in the estimation process as well as on the unknown underlying distribution of the data. As discussed below, we do not need to estimate k in order to apply a robust procedure.
A robust covariance matrix S_n is generally different from the sample covariance matrix S. The estimator θ̂ that minimizes F_ML(S_n, Σ(θ)) is also different from that which minimizes F_ML(S, Σ(θ)). We need a condition introduced by Browne (1984) to clarify the difference. A covariance structure Σ(θ) is invariant under a constant scaling factor (ICSF) if, for any parameter vector θ and positive constant k, there exists a parameter vector θ* such that Σ(θ*) = kΣ(θ). Therefore, if a structural model Σ(θ) is ICSF, the model that holds for a covariance matrix Σ also holds for a rescaled version of the covariance matrix, kΣ. As noted by Browne (1984), almost all models for covariance matrices in current use are ICSF; all the models used in the next section are ICSF. With an ICSF model, even though θ̂ based on S is generally different from that based on S_n, the test statistic for model inference will be exactly the same, i.e., F_ML(S, Σ(θ̂)) = F_ML(S_n, Σ(θ̂)) when S_n and S are proportional. Because of sampling errors, a robust estimator S_n is generally not a rescaling of the sample covariance matrix S. However, the population counterparts of S_n and S will be proportional if the sample is from (3). This implies that statistical conclusions based on S and S_n will be the same within the class of elliptical distributions. Similarly, the constant k will be the same in the population quantities of S_v and S_c if they are estimated by the same weighting scheme (e.g., the same α in the Huber-type weight function). Consequently, the effect of k will be cancelled because the CV index involves both S_v and Σ̂_c. Thus, whatever α is used in the Huber-type weight function, the t will estimate the same population quantity. The difference among ts by different downweighting schemes only reflects sampling errors and the effect of influential cases or outliers.
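The ICSF property is easy to check numerically for a given structure. The sketch below (our illustrative one-factor model, random data and generic optimizer, not the authors' code) verifies that rescaling the input covariance matrix by a constant k leaves the minimized F_ML unchanged:

import numpy as np
from scipy.optimize import minimize

def f_ml(S, Sigma):
    A = S @ np.linalg.inv(Sigma)
    return np.trace(A) - np.linalg.slogdet(A)[1] - S.shape[0]

def min_f_ml_one_factor(S):
    # Sigma(theta) = lambda lambda' + diag(psi): an ICSF structure
    p = S.shape[0]
    def loss(theta):
        lam, psi = theta[:p], np.exp(theta[p:])   # exp keeps psi positive
        return f_ml(S, np.outer(lam, lam) + np.diag(psi))
    theta0 = np.concatenate([np.sqrt(np.diag(S) / 2), np.log(np.diag(S) / 2)])
    return minimize(loss, theta0, method='Nelder-Mead',
                    options={'maxiter': 50000, 'fatol': 1e-12, 'xatol': 1e-10}).fun

rng = np.random.default_rng(1)
S = np.cov(rng.standard_normal((200, 4)), rowvar=False)
print(min_f_ml_one_factor(S), min_f_ml_one_factor(2.0 * S))  # essentially equal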
The class of elliptical distributions has been used in virtually all robust procedures as a working assumption. But this assumption is seldom checked in practice (Birch & Myers, 1982; Campbell, 1980, 1982; Devlin et al., 1981; Lange et al., 1989; Yuan & Bentler, 1998a, 1998b). In order to check this assumption properly, we use multivariate skewness and kurtosis to evaluate the elliptical symmetry of a data set.

Let x_1, . . . , x_n be a p-variate sample. Mardia (1970, 1974) developed two statistics for measuring the skewness and kurtosis of the sample, which are respectively

b_{1,p} = (1/n²) ∑_{i,j=1}^n {(x_i − x̄)′S^{-1}(x_j − x̄)}³

and

b_{2,p} = (1/n) ∑_{i=1}^n {(x_i − x̄)′S^{-1}(x_i − x̄)}²,

where x̄ is the sample mean vector and S is the sample covariance matrix. Notice that b_{2,p} is just the average of the fourth power of the Mahalanobis distances d(x_i, x̄, S), and b_{1,p} is the average of the third power of the cross-products z_i′z_j, where z_i = S^{-1/2}(x_i − x̄) is the vector of standardized scores. When the sample is from a multivariate normal distribution, the asymptotic distributions of the normalized versions of these two statistics were given by Mardia (1970) as

M_1 = n b_{1,p}/6 ∼ χ²_k,  k = p(p + 1)(p + 2)/6,    (7a)

and

M_2 = {b_{2,p} − p(p + 2)}/{8p(p + 2)/n}^{1/2} ∼ N(0, 1).    (7b)
A significant M_1 may indicate a departure from symmetry, while a significant M_2 may indicate heavier tails of the sampling distribution than those of a normal distribution. It is necessary to assume an elliptical distribution because the matrix being estimated by a robust covariance matrix is proportional to the population covariance matrix. However, even though the population is elliptically symmetric, a few outliers can cause M_1 to be highly significant. In such an instance, the sample covariance matrix will be a biased estimate of the population covariance matrix. We have to remove these outliers or downweight their effect in order to achieve an accurate analysis. In this paper, we apply the downweighting method for handling outliers. Compared with the removal approach for outliers, the merit of downweighting was discussed in detail by Rousseeuw and van Zomeren (1990).
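Both statistics are straightforward to compute. A sketch (the function name is ours; following Mardia's original definitions we use the divisor-n covariance matrix, a choice the text leaves implicit):

import numpy as np
from scipy.stats import chi2, norm

def mardia(X):
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    G = diff @ S_inv @ diff.T            # G[i, j] = z_i' z_j cross-products
    b1 = (G ** 3).sum() / n ** 2         # multivariate skewness b_{1,p}
    b2 = (np.diag(G) ** 2).mean()        # multivariate kurtosis b_{2,p}
    k = p * (p + 1) * (p + 2) / 6
    M1 = n * b1 / 6                                          # ~ chi2_k, (7a)
    M2 = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)   # ~ N(0,1), (7b)
    return M1, chi2.sf(M1, k), M2, 2 * norm.sf(abs(M2))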
4. Examples
Four examples are presented in this section.¹ The first two are based on real data sets, the third is based on a simulated data set, and the final one is based on a real data set with some artificial outliers. Since only one sample is available for each of these data sets, we have to split the samples for CV, so it is reasonable to assume that S_c and S_v represent the same population covariance matrix. Our purpose is to contrast the difference between the CV index based on sample covariance matrices and that based on robust covariance matrices. There are various ways to divide the cases. Random splitting is commonly recommended. To allow others to verify and compare these results, we divided the cases according to their order in the original data sets.
¹ SAS IML programs for these examples can be obtained from the authors or at http://www.nd.edu/~kyuan/cross-validation/
One subsample contained the even-numbered cases and the other contained the odd-numbered cases. We applied the Huber-type weight function to each data set. Notice that the covariance matrix estimator corresponding to α = 0 is the sample covariance matrix. In addition to α = 0, we choose α = 0.05, 0.10, . . . , 0.50 for each application. Based on the assumption that the two subsamples are drawn from the same population, α is kept the same in applying the downweighting procedure to each of the subsamples. There are 11 CV indices to compare for each model. We use t_OE for the CV index in (2) when the odd-numbered cases are used as the calibration sample, t_EO for the CV index when the even-numbered cases are used as the calibration sample, and t = (t_OE + t_EO)/2.
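The whole comparison can be organized as below (a skeleton under our reading of the design; it reuses f_ml and robust_mu_sigma from the earlier sketches, and fit_model is a placeholder that must return the fitted Σ(θ̂) for a given covariance matrix):

import numpy as np

def cv_table(X, fit_model,
             alphas=(0.0, 0.05, 0.10, 0.15, 0.20, 0.25,
                     0.30, 0.35, 0.40, 0.45, 0.50)):
    X_odd, X_even = X[0::2], X[1::2]          # split by original case order
    rows = []
    for a in alphas:
        if a == 0.0:                          # alpha = 0: sample covariances
            S_o = np.cov(X_odd, rowvar=False)
            S_e = np.cov(X_even, rowvar=False)
        else:
            S_o = robust_mu_sigma(X_odd, alpha=a)[1]
            S_e = robust_mu_sigma(X_even, alpha=a)[1]
        t_OE = f_ml(S_e, fit_model(S_o))      # odd cases calibrate, even validate
        t_EO = f_ml(S_o, fit_model(S_e))      # and vice versa
        rows.append((a, t_OE, t_EO, (t_OE + t_EO) / 2))
    return rows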
Example 1
Our first data set is from Holzinger and Swineford (1939). This classic data set consists of 24 cognitive variables from 145 subjects. Jöreskog (1969) used 9 of the 24 variables in studying the correlation structure with the normal theory ML method. We used the same nine variables: (1) visual perception, (2) cubes, (3) lozenges, (4) paragraph comprehension, (5) sentence completion, (6) word meaning, (7) addition, (8) counting dots, (9) straight-curved capitals. In Holzinger and Swineford's (1939) study, variables 1, 2 and 3 were designed to measure spatial ability, variables 4, 5 and 6 were designed to measure verbal ability, and variables 7, 8 and 9 were speed tests. Let x represent the nine variables; thus the confirmatory factor model
x = μ + Λf + ε,   Cov(x) = ΛΦΛ′ + Ψ,    (8a)

with μ = E(x),

Λ = ( 1.0  λ_21  λ_31  0    0     0     0    0     0
      0    0     0     1.0  λ_52  λ_62  0    0     0
      0    0     0     0    0     0     1.0  λ_83  λ_93 )′,

Φ = ( φ_11  φ_12  φ_13
      φ_21  φ_22  φ_23
      φ_31  φ_32  φ_33 )    (8b)
represents the hypothesis of the original design. We assume the measurement errors are uncorrelated, with Ψ = Cov(ε) being a diagonal matrix. In addition to model (8b), Jöreskog (1969) proposed a variety of alternative models. One of these models is
Λ = ( 1.0  λ_21  λ_31  0    0     0     0    λ_81  λ_91
      0    0     0     1.0  λ_52  λ_62  0    0     0
      0    0     0     0    0     0     1.0  λ_83  λ_93 )′,

Φ = ( φ_11  φ_12  0
      φ_21  φ_22  φ_23
      0     φ_32  φ_33 )    (8c)
with the same error covariance matrix Ψ.
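For either specification, the model-implied covariance matrix of (8a) is built the same way. A sketch (our parameterization of the patterns in (8b) and (8c); function names are ours):

import numpy as np

def implied_cov(Lambda, Phi, psi_diag):
    # Cov(x) = Lambda Phi Lambda' + Psi with diagonal Psi, equation (8a)
    return Lambda @ Phi @ Lambda.T + np.diag(psi_diag)

def lambda_8b(l21, l31, l52, l62, l83, l93):
    # model (8b): three non-overlapping factors (spatial, verbal, speed)
    L = np.zeros((9, 3))
    L[0, 0], L[1, 0], L[2, 0] = 1.0, l21, l31
    L[3, 1], L[4, 1], L[5, 1] = 1.0, l52, l62
    L[6, 2], L[7, 2], L[8, 2] = 1.0, l83, l93
    return L

def lambda_8c(l21, l31, l52, l62, l83, l93, l81, l91):
    # model (8c): tests 8 and 9 also load on the spatial factor;
    # phi_13 = phi_31 = 0 is imposed on Phi separately
    L = lambda_8b(l21, l31, l52, l62, l83, l93)
    L[7, 0], L[8, 0] = l81, l91
    return L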
Mardia's statistics for these nine variables are M_1 = 0.77 and M_2 = 3.04, indicating that the data may come from a distribution with slightly heavier tails than those of a normal distribution when these statistics refer to χ²_165 and N(0, 1), respectively. Because of the heavier tails, a downweighting procedure may lead to a CV index that is less influenced by sampling errors. Of substantive interest is whether the downweighting procedure indicates that one model is preferable to the other. This expectation is verified by the CV indices in Table 1. For both models, the smallest t_OE is obtained at

α = 0.25, while the smallest t_EO is obtained at α = 0.15. This suggests that the two subsamples may contain different numbers of influential cases. Using the average, the smallest CV index is obtained at α = 0.15 for both models, with t = 1.062 for model (8b) and t = 0.940 for model (8c). Even though model (8b) is supported by the design theory, all CV indices (t) indicate that model (8c) is preferable. Examining the substantive content of the variables, model (8c) also makes sense because items in tests 8 and 9 may also need spatial ability to increase the speed. This example illustrates that when the distribution of the sample has heavy tails, the downweighting method leads to more comparable Σ(θ̂_c) and S_v, and consequently to a CV index that gives better support for models based on substantive theory.
Table 1. CV indices for nine-variable psychological data
           Model (8b)                  Model (8c)
α       t_OE    t_EO    t          t_OE    t_EO    t
0.00    1.427   1.116   1.271      1.189   1.045   1.117
0.05    1.285   1.029   1.157      1.061   0.991   1.026
0.10    1.182   0.995   1.089      0.964   0.963   0.963
0.15    1.136   0.988   1.062      0.919   0.960   0.940
0.20    1.128   0.998   1.063      0.914   0.978   0.946
0.25    1.127   1.011   1.069      0.912   1.000   0.956
0.30    1.142   1.035   1.089      0.927   1.031   0.979
0.35    1.162   1.075   1.118      0.946   1.076   1.011
0.40    1.191   1.127   1.159      0.974   1.129   1.051
0.45    1.227   1.190   1.208      1.009   1.191   1.100
0.50    1.259   1.241   1.250      1.043   1.240   1.141
Example 2
Our second example was based on the industrialization and political democracy panel data introduced by Bollen (1989). This data set consists of eight political democracy variables y = (y_1, . . . , y_8)′ and three industrialization variables x = (x_1, x_2, x_3)′ in 75 developing countries. The variables y_1 to y_4 are indicators of political democracy in 1960, and y_5 to y_8 are the same indicators measured in 1965. Assuming that political democracy in 1965 is predicted by political democracy in 1960, and both are further predicted by 1960 industrialization, Bollen (1989) proposed the model
x = μ_x + Λ_x ξ + δ,   y = μ_y + Λ_y η + ε,    (9a)

and

η = Bη + Γξ + ζ,    (9b)

where μ_x = E(x), μ_y = E(y),

Λ_x = ( 1  λ_1  λ_2 )′,   Λ_y = ( 1  λ_3  λ_4  λ_5  0  0    0    0
                                  0  0    0    0    1  λ_3  λ_4  λ_5 )′,    (9c)

B = ( 0     0
      β_21  0 ),   Γ = ( γ_11
                         γ_21 ),    (9d)