British Journal of Mathematical and Statistical Psychology (2002), 55, 41–62
© 2002 The British Psychological Society
www.bps.org.uk
Fitting structural equation models using estimating equations: A model segregation approach
Ke-Hai Yuan1 * and Wai Chan2
1University of Notre Dame, USA
2The Chinese University of Hong Kong, Hong Kong
Problems such as improper solutions, non-convergence, subsets of variables having different distributions, and latent variables with single indicators are common in the practice of structural equation modelling. In such cases, it may be feasible to fix some model parameters at prespecified values while concentrating on estimating some other parameters. This paper formulates such a model fitting process through a model segregation approach. The statistical properties of this procedure are studied using the theory of estimating equations and optimal estimating functions. The dependency of the new parameter estimates on those of the prespecified parameter estimates is characterized for several commonly used estimating equations. A rescaled model fit statistic is proposed. Examples illustrate various applications of this procedure.
1. Introduction
Structural equation modelling (SEM), one of the most popular methods in multivariate analysis, has been extensively applied in the social and behavioural sciences, education, econometrics, medicine and the biological sciences. In an SEM model, the relationship among observed variables is formulated through unobserved latent constructs. Because measurement errors are explicitly accounted for, results based on SEM are generally more accurate than those based on regression or analysis of variance. In particular, a substantive theory or causal relationship among the latent constructs formulated through path diagrams can be tested by SEM. Classical approaches to SEM are based on the multivariate normal distribution assumption (Jöreskog, 1969; Browne, 1974;
*Requests for reprints should be addressed to Ke-Hai Yuan, Dept of Psychology, University of Notre Dame, Notre Dame, IN 46556, USA (e-mail: kyuan@nd.edu).
Bollen, 1989). Because of the complexity of real-world data, various methods for SEM have been developed that do not require the normality assumption (e.g., Browne, 1984; Bentler & Dijkstra, 1985; Satorra, 1992; Satorra & Bentler, 1988; Kano, Berkane, & Bentler, 1993; Yuan & Bentler, 2000). These methods lead to better parameter estimation and model evaluation when a data set does not follow the normal distribution assumption. For example, one may use the asymptotically distribution-free (ADF) method (Browne, 1984; Yuan & Bentler, 1997a) for model estimation and evaluation when a data set does not follow a normal distribution. However, there still exist various issues that are difficult to deal with in the SEM literature.
One practical issue concerns improper solutions or Heywood cases (negative error variance estimates). In the classical example summarized in Jöreskog (1967), factor models for 9 out of 11 classical data sets possess improper solutions. There are various causes of improper solutions. Van Driel (1978) identified three major ones: sampling fluctuations combined with small true values of error variances; model misspecification; and indefiniteness of the model. A Monte Carlo study conducted by Anderson and Gerbing (1984) with correct models found that 24.9% of replications had improper solutions. One approach to improper solutions is to remove the variables corresponding to negative estimates of error variances (Lawley & Maxwell, 1971). This may not be ideal because of the loss of essential information provided by these marginals. In the context of unrestricted factor analysis, Martin and McDonald (1975) proposed a Bayesian approach to overcome Heywood cases. However, it is unclear how to obtain standard errors of parameter estimates and how to test the overall model adequacy with this approach.
Another common issue arises when a multivariate data set consists of marginals which are jointly normal and marginals which are jointly far away from normal. In such a case one may use the ADF method to fit the model in order to obtain more efficient parameter estimators. It is known that parameter estimators based on the ADF method are at least as efficient as those based on the normal theory maximum likelihood (ML) when sample sizes are large enough. For small to medium-sized samples, however, empirical studies indicate that the maximum likelihood estimators (MLEs) are more efficient, especially when data are approximately normal (Yuan & Bentler, 1997b). In order to obtain more efficient parameter estimates one would like to estimate parameters in the submodel corresponding to the nearly normal variables by ML and those corresponding to marginals that are far away from normal by the ADF method. However, it is not clear how to combine the separate estimation procedures in evaluating the overall model structure and in estimating the parameters that describe the relations among the different sets of variables.
The third problem occurs when only a single indicator is available for a latent construct. If the reliability coefficient of this indicator is known, then the measurement error variance may be replaced by its estimate and treated as a constant in subsequent analysis. However, as the estimated variance is still random, it is not clear how treating a random number as a constant affects the other parameter estimates and the evaluation of the overall model in the subsequent analysis.
The last issue arises when convergence cannot be reached, and this is probably the most challenging issue facing many SEM users. The main causes of non-convergence are small sample sizes combined with large models, and models that do not fit the data. Even for correct models in Monte Carlo studies, a certain proportion of non-convergent replications still exists (Anderson & Gerbing, 1984; Yuan & Bentler, 1997a). The problem is that, without convergence, one has no way of judging the causes that lead to the non-convergence. In such a situation it is important to manage a solution in order to obtain any information on whether the model fits the data or not.
|
|
As we shall see, all the above issues can be tackled by means of estimating equations with nuisance parameters. Suppose we have a covariance structure $\Sigma = \Sigma(\beta)$, where $\beta$ contains a subvector which either has already been estimated or needs to be estimated separately. Let this subvector be $\gamma$, and $\beta = (\theta', \gamma')'$. Our problem then becomes how to estimate $\theta$ in $\Sigma = \Sigma(\theta, \gamma)$. Because an estimate $\hat\gamma$ is already available, one faces the problem of minimizing a discrepancy function
$$F_{ML}[S, \Sigma(\theta, \hat\gamma)] = \mathrm{tr}[\Sigma^{-1}(\theta, \hat\gamma)S] - \log|\Sigma^{-1}(\theta, \hat\gamma)S| - p \tag{1}$$
in the case of maximum likelihood. Of course, the ADF discrepancy function (Browne, 1984) can also be chosen here. The vector $\hat\gamma$ in (1) may be a consistent estimate of a set of error variances whose estimates are improper when $\theta$ and $\gamma$ are solved for simultaneously in the conventional approach. It may be a set of model parameter estimates obtained in a separate model estimation because some marginal variables are far away from normal. With a single indicator, $\hat\gamma$ will be the error variance estimate when a reliability coefficient of the indicator is available. In our experience, non-convergence generally occurs with a large model with many free parameters or when the sample size is not large enough relative to the model size. When convergence cannot be reached, one may be able to achieve it by fitting the segregated smaller models based on different marginals. For example, with a nine-variable, three-factor model, one may fit three three-variable models for the nine factor loadings and nine error variances. In this case, $\hat\gamma$ will be these parameter estimates. The parameter vector $\theta$ will contain only the three factor correlations. Because four smaller models are fitted it is very likely that one will achieve convergence even when the simultaneous estimation for $\theta$ and $\gamma$ does not converge.
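As a minimal numerical sketch of the discrepancy function in (1), the Python snippet below evaluates it for a given sample covariance matrix and a given model-implied matrix. It assumes NumPy; the helper name `f_ml` and the small matrices are hypothetical illustrations, not part of the paper.

```python
import numpy as np

def f_ml(S, Sigma):
    """Normal-theory ML discrepancy of equation (1):
    tr(Sigma^{-1} S) - log|Sigma^{-1} S| - p."""
    p = S.shape[0]
    M = np.linalg.solve(Sigma, S)        # Sigma^{-1} S without an explicit inverse
    _, logdet = np.linalg.slogdet(M)     # log-determinant, numerically stable
    return np.trace(M) - logdet - p

# Hypothetical 2 x 2 sample covariance matrix
S = np.array([[2.0, 0.6],
              [0.6, 1.5]])

f_perfect = f_ml(S, S)             # model reproduces S exactly
f_misfit = f_ml(S, np.eye(2))      # identity matrix as a deliberately poor model
```

The function is zero when the model-implied matrix reproduces $S$ and positive otherwise, which is the property that makes it usable as a fit criterion once $\hat\gamma$ is plugged in.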
The above procedure of fitting $\Sigma(\theta, \gamma)$ with $\gamma$ estimated separately from $\theta$ is also related to the idea of fitting measurement and structural models separately as recommended by Anderson and Gerbing (1988). In the same paper, the authors also recommended unidimensionality of measurement. In such a case, $\gamma$ may be the vector of parameters in the measurement model (factor loadings and error variances) obtained in the first stage, and $\theta$ will be the vector of parameters in the structural model. In the LISREL notation for SEM (Jöreskog & Sörbom, 1993), $\gamma$ will be the vector containing the free parameters in $\Lambda_x$, $\Lambda_y$, $\Theta_\delta$ and $\Theta_\varepsilon$; $\theta$ will contain the free parameters in $B$, $\Gamma$, $\Phi$ and $\Psi$. There are many ways to choose $\gamma$ depending on model fitting convenience. As will be illustrated in the examples, the choice is up to the researcher in the context of fitting a specific model to a given data set.
Our aim is to study how the statistical properties of $\hat\theta$ are related to those of $\hat\gamma$ in this model segregation approach, though we are also interested in how to evaluate the overall model structure. Actually, $\hat\theta = \hat\theta(\hat\gamma)$ is a function of $\hat\gamma$, and this is characterized through estimating equations with nuisance parameters in Section 2. How the efficiency of $\hat\gamma$ influences that of $\hat\theta$ in three commonly used discrepancy functions will be studied in Section 3. Section 4 will study the effect on the commonly used test statistics of treating $\hat\gamma$ as known. Several applications will be presented in Section 5. A simulation example in Section 6 contrasts different procedures when the overall model structure is correct. Some concluding remarks will be given in Section 7.
2. Estimating equation with nuisance parameters
Let $F[S, \Sigma(\theta, \hat\gamma)]$ be the discrepancy function which is minimized to give an estimate $\hat\theta$. Taking the derivative of $F$ with respect to $\theta$ and setting it to zero, one obtains
$$G_n(S, \hat\theta, \hat\gamma) = 0, \tag{2}$$
where the subscript $n$ is used to indicate that the function $G_n$ depends on the sample size $n$. The function $G_n(S, \theta, \gamma)$ is called an estimating function by Godambe (1960). Equation (2) is generally referred to as an estimating equation with nuisance parameters (Liang & Zeger, 1995). As mentioned in Section 1, obtaining $\hat\gamma$ by a method other than solving (2) is preferable to solving for $(\theta, \gamma)$ simultaneously. Yuan and Jennrich (2000) studied the distribution of $\hat\theta$ and its relationship to that of $\hat\gamma$ under quite general conditions. Based on the findings of Yuan and Jennrich, this section gives a brief introduction to a result that is relevant to the study of the distribution of $\hat\theta$ in the next section. We also present some new results regarding the asymptotic covariance matrix of $\hat\theta(\hat\gamma)$ in the SEM context. We use a dot on top of a function to denote a derivative, and when the function contains more than one argument we use a subscript to denote the corresponding partial derivative; the argument of a function is omitted if evaluated at the population value $\beta_0 = (\theta_0', \gamma_0')'$. Let
$$A = \lim_{n\to\infty}\dot G_{n\theta}(S, \theta_0, \gamma_0), \qquad B = \lim_{n\to\infty}\dot G_{n\gamma}(S, \theta_0, \gamma_0).$$
As we shall see, we need to use part of the data to estimate $\gamma_0$. Actually, $\hat\gamma$ is an implicit function of $S$. When both $G_n$ and this function are continuously differentiable, $G_n(S, \theta_0, \gamma_0)$ and $\hat\gamma$ will be jointly asymptotically normal. This leads to
$$\sqrt{n}\begin{pmatrix} G_n(S, \theta_0, \gamma_0) \\ \hat\gamma - \gamma_0 \end{pmatrix} \xrightarrow{L} N(0, V), \tag{3}$$
where
$$V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}.$$
Assuming (3), Theorem 1 of Yuan and Jennrich (2000), together with their equations (2.3) and (2.4), implies that
$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{L} N(0, Q), \tag{4a}$$
where $Q = A^{-1}PA'^{-1}$ and
$$P = V_{11} + BV_{21} + V_{12}B' + BV_{22}B'. \tag{4b}$$
It follows immediately from (4) that if $B = 0$, then the asymptotic distribution of $\hat\gamma$ does not influence the asymptotic distribution of $\hat\theta$ as long as $\hat\gamma$ is $\sqrt{n}$-consistent for $\gamma_0$, so knowing $\hat\gamma$ is equivalent to knowing $\gamma_0$. Yuan and Jennrich (2000) discussed various contexts for $B = 0$. For the issues highlighted in this paper, the corresponding $B$ will not be zero in general. When $B \neq 0$, the asymptotic distribution of $\hat\gamma$ does influence the asymptotic distribution of $\hat\theta$. The effect can be seen from the asymptotic covariance matrix of $\hat\theta$. When $G_n(S, \theta, \hat\gamma) = \dot F_{ML\theta}[S, \Sigma(\theta, \hat\gamma)]$ and $S$ is the sample covariance matrix based on a normal sample $x_1, \ldots, x_n$, then $\hat\theta$ corresponds to the pseudo-MLE (Gong & Samaniego, 1981; Parke, 1986; Kano et al., 1993). Parke (1986) observed that for the pseudo-MLE, $V_{12} = 0$ generally. Another case where $V_{12} = 0$ is when $\hat\gamma$ is asymptotically efficient (Pierce, 1982). In the pseudo-MLE context, $A = -V_{11}$, and the $Q$ matrix can be
|
simplified to
$$Q = V_{11}^{-1} + A^{-1}BV_{22}B'A^{-1}. \tag{5}$$
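The reduction from (4) to (5) can be checked numerically. The sketch below assumes NumPy; the function name `sandwich_cov` and all matrix values are hypothetical, chosen only so that $V_{12} = 0$ and $A = -V_{11}$ as in the pseudo-MLE case.

```python
import numpy as np

def sandwich_cov(A, B, V11, V12, V22):
    """Q = A^{-1} P A'^{-1} with P = V11 + B V21 + V12 B' + B V22 B', as in (4)."""
    P = V11 + B @ V12.T + V12 @ B.T + B @ V22 @ B.T   # V21 is the transpose of V12
    A_inv = np.linalg.inv(A)
    return A_inv @ P @ A_inv.T

# Hypothetical blocks: theta has 2 elements, gamma has 3
V11 = np.array([[2.0, 0.5],
                [0.5, 1.0]])
V22 = np.array([[1.0, 0.2, 0.0],
                [0.2, 1.5, 0.3],
                [0.0, 0.3, 0.8]])
V12 = np.zeros((2, 3))                 # pseudo-MLE case: V12 = 0
B = np.array([[0.4, -0.1, 0.2],
              [0.0, 0.3, -0.5]])
A = -V11                               # pseudo-MLE case: A = -V11

Q_general = sandwich_cov(A, B, V11, V12, V22)

# Right-hand side of (5)
A_inv = np.linalg.inv(A)
Q_eq5 = np.linalg.inv(V11) + A_inv @ B @ V22 @ B.T @ A_inv

# Extra variance attributable to estimating gamma
extra = Q_general - np.linalg.inv(V11)
```

The matrix `extra` is the second term of (5); it is positive semidefinite, so estimating the nuisance parameter can only increase the asymptotic variance of the estimate of $\theta$ in this setting.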
Because data in the social and behavioural sciences are typically not normal, discrepancy functions other than (1) are also commonly used in the SEM literature. Even for the ML procedure $V_{12}$ may not be zero because of the non-normality of the data. It is also difficult for one to verify whether the $\hat\gamma$s that solve the issues highlighted in Section 1 are efficient or not. In the following we will study cases of $V_{12} = 0$ under less stringent conditions. We will need to borrow some concepts from the literature on estimating functions (see Godambe, 1960; Godambe & Kale, 1991). Let $G_n(S, \beta)$ be an estimating function associated with estimating $\beta$. For example, $G_n(S, \beta) = \dot F_{ML\beta}(S, \beta)$ in the context of using the maximum likelihood discrepancy function. Denote $C_G = E[\dot G_{n\beta}(S, \beta)]$. A function $G_n$ is called optimal if
$$C_{G^*}^{-1}\mathrm{Var}(G_n^*)C_{G^*}'^{-1} - C_G^{-1}\mathrm{Var}(G_n)C_G'^{-1} \geq 0 \tag{6}$$
for all the forms of $G_n^*$ that satisfy $E[G_n^*(S, \beta)] = 0$. Kale (1985) and Small and McLeish (1988) established a one-to-one relationship between the optimal estimating function $G_n$ and the optimal estimate $\tilde\beta$ obtained by solving
$$G_n(S, \beta) = 0, \tag{7}$$
in the sense that $\tilde\beta$ has the smallest asymptotic variance if $G_n$ satisfies (6). Estimating functions in the SEM context may not enjoy finite-sample optimal properties, so we will have to redefine $C_G$ by $C_G = \lim_{n\to\infty} E[\dot G_{n\beta}(S, \beta)]$, $\mathrm{Var}(G_n)$ by the asymptotic covariance matrix of $G_n$, and $E[G_n(S, \beta)] = 0$ by $\lim_{n\to\infty} E[G_n(S, \beta)] = 0$ for an asymptotically optimal estimating function.
Let $h(\beta)$ be a function of $\beta$ and $\bar h(S)$ be its consistent estimate. A key result here is that if $G_n(S, \beta)$ is an asymptotically optimal estimating function, then it is asymptotically uncorrelated with $\bar h(S)$. This can be easily proved since, otherwise, we can construct another estimating function
$$G_n^*(S, \beta) = G_n(S, \beta) + aK[\bar h(S) - h(\beta)],$$
which will violate (6) if $a$ is chosen properly, where $K$ is any given non-singular matrix of proper dimension. A similar technique is used in the proof of Theorem 7.3.3 of Casella and Berger (1990, p. 317).
With $\beta = (\theta', \gamma')'$ in the SEM context, the estimating function in (2) is generally a subvector of $G_n(S, \beta)$. When $\hat\gamma = \bar h(S)$ is asymptotically uncorrelated with $G_n(S, \beta)$, $\hat\gamma$ is also asymptotically uncorrelated with $G_n(S, \theta_0, \gamma_0)$, so $V_{12} = 0$ in such a case. When $V_{12} = 0$ in the context of estimating equations,
$$Q = A^{-1}V_{11}A'^{-1} + A^{-1}BV_{22}B'A'^{-1}. \tag{8}$$
The second term on the right-hand side of either (5) or (8) reflects the cost of estimating the extra parameter $\gamma_0$ in (2). The matrix $A$ may not equal the matrix $-V_{11}$ in general. For an optimal estimating function $G_n$ there exists $A = -V_{11}$, as will be the case for both the pseudo-MLE and the estimating equation derived from the ADF discrepancy function considered in the next section.
Notice that the function $\bar h(S)$ of $S$ being asymptotically uncorrelated with $G_n(S, \beta)$ does not exclude some other kind of nonlinear function of the raw sample being correlated with $G_n(S, \beta)$. For example, a robust estimate $\hat\gamma = \bar h(x_1, \ldots, x_n)$ of $\gamma$ may be correlated with $G_n(S, \theta_0, \gamma_0)$ when the sample has heavier tails than those of the multivariate normal distribution.
3. Distribution of $\hat\theta$ for three estimating functions
The theory of estimating equations can be applied to various contexts (Liang & Zeger, 1995; Yuan & Jennrich, 2000). In this section we will use it to solve various issues in SEM as highlighted in Section 1. The estimating equations to be studied include that derived from the normal theory ML discrepancy function (1) when data are normal and non-normal, that derived from the generalized least squares (GLS) discrepancy function based on normal theory (Browne, 1974), and that from the ADF discrepancy function. For a $p \times p$ sample covariance matrix $S = (s_{ij})$, let $s = \mathrm{vech}(S)$ be the vector formed by stacking the columns of $S$, omitting the elements above the diagonal. Let $\mathrm{vec}(S)$ be the vector formed by stacking the columns of $S$; then there exists a unique $p^2 \times p^*$ matrix $D_p$ such that $\mathrm{vec}(S) = D_p\,\mathrm{vech}(S)$ (Magnus & Neudecker, 1999), where $p^* = p(p + 1)/2$. We will use $\sigma(\beta) = \mathrm{vech}[\Sigma(\beta)]$. The estimating functions corresponding to the normal theory ML and GLS discrepancy functions are respectively given by
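$$G_{ML}(S, \theta, \hat\gamma) = 2^{-1}\dot\sigma_\theta'(\theta, \hat\gamma)D_p'[\Sigma^{-1}(\theta, \hat\gamma) \otimes \Sigma^{-1}(\theta, \hat\gamma)]D_p[s - \sigma(\theta, \hat\gamma)], \tag{9a}$$
$$G_{GLS}(S, \theta, \hat\gamma) = 2^{-1}\dot\sigma_\theta'(\theta, \hat\gamma)D_p'(S^{-1} \otimes S^{-1})D_p[s - \sigma(\theta, \hat\gamma)]. \tag{9b}$$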
Let $y_i = \mathrm{vech}[(x_i - \bar x)(x_i - \bar x)']$ and $S_y$ be the sample covariance matrix of $y_i$. The estimating function corresponding to the ADF discrepancy function is
$$G_{ADF}(S, \theta, \hat\gamma) = \dot\sigma_\theta'(\theta, \hat\gamma)S_y^{-1}[s - \sigma(\theta, \hat\gamma)]. \tag{9c}$$
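The vech operator, the duplication matrix $D_p$, and the normal theory weight $2^{-1}D_p'(\Sigma^{-1} \otimes \Sigma^{-1})D_p$ that appears in (9a) can be built directly. The following is a minimal sketch assuming NumPy and column-major stacking; the function names and the example matrices are hypothetical. As a check, it uses the standard identity that the quadratic form in the vech residuals equals $\frac{1}{2}\mathrm{tr}\{[\Sigma^{-1}(S - \Sigma)]^2\}$.

```python
import numpy as np

def vech(S):
    """Stack the columns of S, keeping only elements on or below the diagonal."""
    p = S.shape[0]
    return np.concatenate([S[j:, j] for j in range(p)])

def duplication_matrix(p):
    """p^2 x p* matrix D_p with vec(S) = D_p vech(S) for symmetric S."""
    p_star = p * (p + 1) // 2
    D = np.zeros((p * p, p_star))
    k = 0
    for j in range(p):               # column index of S
        for i in range(j, p):        # row index, on or below the diagonal
            D[j * p + i, k] = 1.0    # entry (i, j) in column-major vec(S)
            D[i * p + j, k] = 1.0    # mirror entry (j, i)
            k += 1
    return D

def ml_weight(Sigma):
    """Normal theory weight 2^{-1} D_p' (Sigma^{-1} kron Sigma^{-1}) D_p."""
    D = duplication_matrix(Sigma.shape[0])
    Sigma_inv = np.linalg.inv(Sigma)
    return 0.5 * D.T @ np.kron(Sigma_inv, Sigma_inv) @ D

# Hypothetical sample covariance matrix and model-implied matrix
S = np.array([[2.0, 0.6, 0.3],
              [0.6, 1.5, 0.4],
              [0.3, 0.4, 1.0]])
Sigma = np.diag([2.0, 1.5, 1.0])

D3 = duplication_matrix(3)
vec_matches = np.allclose(D3 @ vech(S), S.flatten(order="F"))

resid = vech(S) - vech(Sigma)
quad_form = resid @ ml_weight(Sigma) @ resid
trace_form = 0.5 * np.trace(
    np.linalg.matrix_power(np.linalg.solve(Sigma, S - Sigma), 2))
```

Forming the Kronecker product explicitly is only practical for small $p$; it is used here because it mirrors the notation of (9a) term by term.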
It has been proved by Godambe (1960) that score functions are optimal. Consequently $G_{ML}$ is asymptotically optimal in estimating $\theta$ when data are normal, and so is $G_{GLS}$ because it is asymptotically equivalent to $G_{ML}$. Similarly, because the ADF method leads to the estimate $\tilde\beta$ with the smallest asymptotic covariance matrix, $G_{ADF}$ is also asymptotically optimal among estimating functions of the form (9c) when $S_y^{-1}$ is replaced by any positive definite matrix.
Let $W_1 = 2^{-1}D_p'(\Sigma^{-1} \otimes \Sigma^{-1})D_p$ and $W_2 = \Gamma^{-1}$, with $\Gamma = \mathrm{Cov}\{\mathrm{vech}[(x_i - \mu)(x_i - \mu)']\}$ being the asymptotic covariance matrix of $\sqrt{n}s$. Then the matrices $A$ and $B$ for both the estimating functions corresponding to the normal theory ML and the normal theory GLS discrepancy functions are given by
$$A = -\dot\sigma_\theta'W_1\dot\sigma_\theta, \qquad B = -\dot\sigma_\theta'W_1\dot\sigma_\gamma; \tag{10a}$$
and the ones for the estimating equation corresponding to the ADF discrepancy function are given by
$$A = -\dot\sigma_\theta'W_2\dot\sigma_\theta, \qquad B = -\dot\sigma_\theta'W_2\dot\sigma_\gamma. \tag{10b}$$
Notice that the matrices $A$ and $B$ only involve the covariance structure $\Sigma(\beta)$ and do not depend on the distribution of the sample or on that of $\hat\gamma$ once a discrepancy function is chosen. On the other hand, the $V$ matrix in (3) depends on the distribution of the data and on that of $\hat\gamma$. When the data are normal,
$$V_{11} = \dot\sigma_\theta'W_1\dot\sigma_\theta \tag{11a}$$
for all the three estimating equations in (9). When the data are non-normal,
$$V_{11} = \dot\sigma_\theta'W_1\Gamma W_1\dot\sigma_\theta \tag{11b}$$
for (9a) and (9b); and
$$V_{11} = \dot\sigma_\theta'W_2\dot\sigma_\theta \tag{11c}$$
for (9c).
For any of the issues described in Section 1 one has to use part of the data to estimate $\gamma$. There are various ways of estimating $\gamma$, depending on the specific context of applications. When a Heywood case occurs in fitting the overall model simultaneously for $\hat\theta$ and $\hat\gamma$, one may first fit a submodel that contains the parameter with the improper solution. Usually, there is more than one possible choice for this submodel. In this case, the one-factor measurement model with its corresponding indicators, including the problematic item, may serve the purpose. Let $x = (x_1', x_2')'$ represent the observed variables with dimensions $p_1$ and $p_2$ for $x_1$ and $x_2$, respectively. When $\hat\gamma$ was obtained based on $x_1$, we denote the corresponding covariance matrix of $x$ and its sample counterpart respectively as
$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \quad\text{and}\quad S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}.$$
Suppose the submodel structure is $\Sigma_{11}(a)$ and $a$ contains all the elements of $\gamma$ but may only contain a subset of $\theta$. Then one can fit $\sigma_1(a) = \mathrm{vech}[\Sigma_{11}(a)]$ to $s_1 = \mathrm{vech}(S_{11})$ to obtain $\hat\gamma$. Of course, $\Sigma_{11}(a)$ must be identifiable in order for a unique solution to exist.
Let $W_1 = 2^{-1}D_{p_1}'(\Sigma_{11}^{-1} \otimes \Sigma_{11}^{-1})D_{p_1}$ when $\hat\gamma$ was obtained by the normal theory ML or the normal theory GLS approach, and $W_2 = [\mathrm{Acov}(\sqrt{n}s_1)]^{-1}$ when $\hat\gamma$ was obtained by the ADF method, where Acov stands for asymptotic covariance matrix. Then it follows from equation (7) of Yuan and Bentler (1998a) that
$$\sqrt{n}(\hat\gamma - \gamma_0) = P\sqrt{n}(s - \sigma_0) + o_p(1), \tag{12a}$$
where
$$P = H(\dot\sigma_1'W_1\dot\sigma_1)^{-1}\dot\sigma_1'W_1L$$
in the normal theory ML or GLS approach, and
$$P = H(\dot\sigma_1'W_2\dot\sigma_1)^{-1}\dot\sigma_1'W_2L$$
in the ADF approach, with $H$ and $L$ being selection matrices such that $\gamma = Ha$ and $s_1 = Ls$.
When the data contain marginals that are approximately normal and marginals that are quite non-normal, or when fitting the whole model simultaneously cannot lead to convergence, one may fit the model in parts. Write $\gamma = (\gamma_1', \gamma_2', \ldots, \gamma_J')'$; each $\hat\gamma_j$ can be obtained in a similar way to (12a) with $\sqrt{n}(\hat\gamma_j - \gamma_{j0}) = P_j\sqrt{n}(s - \sigma_0) + o_p(1)$, and we have
$$\sqrt{n}(\hat\gamma - \gamma_0) = P\sqrt{n}(s - \sigma_0) + o_p(1), \tag{12b}$$
where $P = (P_1', \ldots, P_J')'$.
With a single indicator $x = t + e$ for the latent construct $t$, suppose also that the measurement $x$ has reliability $r$. Then the estimate of error variance is given by $\hat\gamma = \hat{j}_{ee} = s_{xx}(1 - r)$. If the reliability is based on previous research and no information about its standard error is available, then one may treat $r$ as a constant and we have
$$\sqrt{n}(\hat{j}_{ee} - j_{ee}) = P\sqrt{n}(s - \sigma_0) + o_p(1), \tag{12c}$$
where $P = (1 - r)L$ and $L$ is the selection matrix such that $Ls = s_{xx}$. If the standard error of $r$ is available and $r - r_0 \sim N(0, d^2)$, we can also obtain the distribution of $\hat\gamma$ and the corresponding $V$ matrix in (3). For this we need to assume that $d^2$ is comparable with $1/n$. Denoting the population reliability coefficient as $r_0$, then $\gamma_0 = j_{xx}(1 - r_0)$. We have
$$\sqrt{n}(\hat\gamma - \gamma_0) = P\sqrt{n}\begin{pmatrix} s - \sigma_0 \\ r - r_0 \end{pmatrix} + o_p(1), \tag{12d}$$
where $P = ((1 - r_0)L, -j_{xx})$.
With normal data, $V_{12} = 0$ for any of the estimating functions in (9). Since the estimating function defined in (9c) is asymptotically optimal, in general $V_{12} = 0$ for the estimating equation based on the ADF discrepancy function for both normal and non-normal data. When using (12a) to (12c) for non-normal data, it follows from (9) that
$$V_{12} = \dot\sigma_\theta'W_1\Gamma P' \tag{13a}$$
for the estimating equations derived from the normal theory based discrepancy functions.
The covariance matrix $V_{22}$ depends only on the distribution of $\hat\gamma$ for (12a) to (12c), which is
$$V_{22} = P\Gamma P'. \tag{14a}$$
Because the estimate $r$ in (12d) depends on previous data, it needs special attention. Notice that, when the reliability estimate $r$ is based on previous research, $s$ and $r$ are independent. When using the normal theory based estimating equations for $\hat\theta$, $V_{12} = 0$ for normal data; and it follows from (9) and (12d) that
$$V_{12} = (1 - r_0)\dot\sigma_\theta'W_1\Gamma L' \tag{13b}$$
when the data are not normal. Because the estimating function in (9c) is asymptotically optimal, $V_{12} = 0$ for the estimating equation based on the ADF discrepancy function. The variance $v_{22}$ is given by
$$v_{22} = (1 - r_0)^2L\Gamma L' + nj_{xx}^2d^2. \tag{14b}$$
Notice that $L\Gamma L'$ equals the asymptotic variance of $\sqrt{n}s_{xx}$.
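The single-indicator quantities above involve only scalar arithmetic, so they are easy to illustrate. The sketch below is hypothetical throughout: the function names, the numerical values, and the plug-in of sample values for population ones are illustrative assumptions. It also assumes normal data, under which the asymptotic variance of $\sqrt{n}s_{xx}$ is twice the squared population variance of $x$.

```python
def error_variance_from_reliability(s_xx, r):
    """Error variance estimate for a single indicator: s_xx * (1 - r)."""
    return s_xx * (1.0 - r)

def v22_single_indicator(r0, acov_sxx, var_x, d2, n):
    """Equation (14b): (1 - r0)^2 * Acov(sqrt(n) s_xx) + n * var_x^2 * d^2.

    var_x plays the role of the population variance of x in (14b) and
    d2 is the variance of the reliability estimate.
    """
    return (1.0 - r0) ** 2 * acov_sxx + n * var_x ** 2 * d2

# Hypothetical inputs: sample variance 4.0, reported reliability 0.8, n = 100
s_xx, r, n = 4.0, 0.8, 100
gamma_hat = error_variance_from_reliability(s_xx, r)

# Assumption: normal data, so Acov(sqrt(n) s_xx) = 2 * var_x^2 (plug-in s_xx)
acov_sxx = 2.0 * s_xx ** 2

# Reliability treated as a known constant (d^2 = 0), as in (12c)
v22_fixed = v22_single_indicator(r, acov_sxx, s_xx, d2=0.0, n=n)

# Reliability with a hypothetical standard error of 0.02, as in (12d)
v22_random = v22_single_indicator(r, acov_sxx, s_xx, d2=0.02 ** 2, n=n)
```

With these inputs the second term of (14b) adds half as much again to the variance, which shows why the uncertainty in a reliability estimate should not be ignored when its standard error is of order $1/\sqrt{n}$.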
Now the asymptotic distribution of $\hat\theta$ can be obtained easily by using (10), (11), (13) and (14) in (4). In particular, the asymptotic covariance matrix of $\hat\theta(\hat\gamma)$ can be simplified to (5) or (8) in various cases. In the following, we wish to compare the efficiency of $\hat\theta$ in the model segregation approach with that of $\tilde\theta$ in the simultaneous estimation approach. Although, as discussed in Section 1, the simultaneous estimation approach may not be feasible or may lead to an inappropriate solution for $\gamma$, the comparison will give us more insight into the model estimation processes. This will be studied for the following two cases.
In the first case the data are normal and the normal theory based ML is used in the simultaneous estimation approach. Denote the information matrix in the simultaneous estimation by
$$I = \begin{pmatrix} I_{\theta\theta} & I_{\theta\gamma} \\ I_{\gamma\theta} & I_{\gamma\gamma} \end{pmatrix}.$$
Then the asymptotic covariance matrix of the MLE $\tilde\theta$ in
$$\sqrt{n}(\tilde\theta - \theta_0) \xrightarrow{L} N(0, Q_\theta)$$
is given by
$$Q_\theta = (I_{\theta\theta} - I_{\theta\gamma}I_{\gamma\gamma}^{-1}I_{\gamma\theta})^{-1} = I_{\theta\theta}^{-1} + I_{\theta\theta}^{-1}I_{\theta\gamma}(I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma})^{-1}I_{\gamma\theta}I_{\theta\theta}^{-1}. \tag{15}$$
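With any of the estimating functions in (9), $A = -I_{\theta\theta} = -V_{11}$ and $B = -I_{\theta\gamma}$. Comparing (15) with (5), the difference in efficiency between the two approaches is due to the efficiency of $\hat\gamma$: the two expressions coincide when $V_{22} = (I_{\gamma\gamma} - I_{\gamma\theta}I_{\theta\theta}^{-1}I_{\theta\gamma})^{-1}$, the asymptotic covariance matrix of the simultaneous MLE of $\gamma$. The comparison of the efficiency of $\hat\theta$ and $\tilde\theta$ for the normal theory based GLS approach is the same as that for the ML approach.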
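The comparison between (15) and (5) can be illustrated numerically. The following sketch assumes NumPy and a randomly generated positive definite matrix standing in for the information matrix; the partition sizes, the seed, and the names `q_segregated`, `V22_efficient` and `V22_inefficient` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 5))
info = X @ X.T + 5.0 * np.eye(5)     # hypothetical positive definite information matrix
q = 2                                 # the first q parameters play the role of theta
I_tt, I_tg = info[:q, :q], info[:q, q:]
I_gt, I_gg = info[q:, :q], info[q:, q:]
inv = np.linalg.inv

# Simultaneous estimation: first expression in (15)
Q_simultaneous = inv(I_tt - I_tg @ inv(I_gg) @ I_gt)

def q_segregated(V22):
    """Equation (5) with A = -I_tt, B = -I_tg and V11 = I_tt."""
    A, B = -I_tt, -I_tg
    A_inv = inv(A)
    return inv(I_tt) + A_inv @ B @ V22 @ B.T @ A_inv

# Efficient gamma estimate: V22 is the gamma block of the inverse information
V22_efficient = inv(I_gg - I_gt @ inv(I_tt) @ I_tg)
Q_efficient = q_segregated(V22_efficient)

# Less efficient gamma estimate: a positive semidefinite amount of extra variance
V22_inefficient = V22_efficient + 0.5 * np.eye(5 - q)
gap = q_segregated(V22_inefficient) - Q_simultaneous
```

When the separately obtained estimate of $\gamma$ is as efficient as the simultaneous one, the two covariance matrices agree; any extra variance in that estimate carries over as a positive semidefinite increase in the covariance of the estimate of $\theta$.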
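In the second case the data are non-normal and the ADF method is used both in the simultaneous estimation approach and in deriving the estimating function (9c) in the model segregation approach. Let
$$M = \dot\sigma'W\dot\sigma = \begin{pmatrix} M_{\theta\theta} & M_{\theta\gamma} \\ M_{\gamma\theta} & M_{\gamma\gamma} \end{pmatrix}$$
when using ADF to simultaneously estimate $\theta$ and $\gamma$; then
$$\sqrt{n}(\tilde\theta - \theta_0) \xrightarrow{L} N(0, Q_\theta),$$
where
$$Q_\theta = (M_{\theta\theta} - M_{\theta\gamma}M_{\gamma\gamma}^{-1}M_{\gamma\theta})^{-1} = M_{\theta\theta}^{-1} + M_{\theta\theta}^{-1}M_{\theta\gamma}(M_{\gamma\gamma} - M_{\gamma\theta}M_{\theta\theta}^{-1}M_{\theta\gamma})^{-1}M_{\gamma\theta}M_{\theta\theta}^{-1}. \tag{16}$$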
Notice that $A = -M_{\theta\theta}$ and $B = -M_{\theta\gamma}$. Comparing (16) with (5), again the difference in efficiency between the simultaneous estimation and the model segregation approach is due to the efficiency of $\hat\gamma$. Actually, $(M_{\gamma\gamma} - M_{\gamma\theta}M_{\theta\theta}^{-1}M_{\theta\gamma})^{-1}$ is the asymptotic covariance matrix of $\sqrt{n}(\tilde\gamma - \gamma_0)$; when $\hat\gamma$ is less efficient in the segregation approach the corresponding $\hat\theta(\hat\gamma)$ is also less efficient than $\tilde\theta$. Before ending this discussion on efficiency we need to emphasize that the comparison is based on the asymptotic covariance matrix, and this efficiency can only be realised when the sample size is large enough.
4. Test statistics
It is of interest to have a testing procedure for the overall model $\Sigma = \Sigma(\theta, \gamma)$ once $\hat\theta$ is obtained. Let $F[S, \Sigma(\hat\theta, \hat\gamma)]$ be any of the three discrepancy functions evaluated at the parameter estimates. We first study the asymptotic distribution of $T = nF[S, \Sigma(\hat\theta, \hat\gamma)]$ before constructing a test statistic.
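Fitting $S$ to $\Sigma(\theta, \hat\gamma)$ leads to
$$\begin{aligned}
\sqrt{n}(\hat\theta - \theta_0) &= [\dot\sigma_\theta'(\theta_0, \hat\gamma)W\dot\sigma_\theta(\theta_0, \hat\gamma)]^{-1}\dot\sigma_\theta'(\theta_0, \hat\gamma)W\sqrt{n}[s - \sigma(\theta_0, \hat\gamma)] + o_p(1) \\
&= (\dot\sigma_\theta'W\dot\sigma_\theta)^{-1}\dot\sigma_\theta'W\sqrt{n}[s - \sigma(\theta_0, \hat\gamma)] + o_p(1),
\end{aligned} \tag{17}$$
where $W = W_1$ for the normal theory based ML or GLS approach and $W = W_2$ for the ADF based approach. It follows from (12) that
$$\begin{aligned}
s - \sigma(\theta_0, \hat\gamma) &= (s - \sigma_0) - [\sigma(\theta_0, \hat\gamma) - \sigma_0] \\
&= (s - \sigma_0) - \dot\sigma_\gamma(\hat\gamma - \gamma_0) \\
&= D(s - \sigma_0) + o_p(1/\sqrt{n}),
\end{aligned} \tag{18}$$
where
$$D = (I - \dot\sigma_\gamma P).$$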
Equations (17) and (18) lead to
$$\sqrt{n}(\hat\theta - \theta_0) = (\dot\sigma_\theta'W\dot\sigma_\theta)^{-1}\dot\sigma_\theta'WD\sqrt{n}(s - \sigma_0) + o_p(1)$$
and
$$\sqrt{n}(\hat\beta - \beta_0) = U\sqrt{n}(s - \sigma_0) + o_p(1), \tag{19}$$
where
$$U = \begin{pmatrix} (\dot\sigma_\theta'W\dot\sigma_\theta)^{-1}\dot\sigma_\theta'WD \\ P \end{pmatrix}.$$
Now, regarding $F[S, \Sigma(\hat\beta)]$ as a function of $\hat\sigma = \mathrm{vech}[\Sigma(\hat\beta)]$ and using a Taylor expansion of $F[S, \Sigma(\hat\beta)]$ at $s$ gives
$$F[S, \Sigma(\hat\beta)] = F(S, S) + \dot F'(S, S)(\hat\sigma - s) + \frac{1}{2}(\hat\sigma - s)'\ddot F(S, \bar\Sigma)(\hat\sigma - s), \tag{20}$$
where $\bar\Sigma$ is a matrix lying between $S$ and $\hat\Sigma$. Since $F(S, S) = 0$, $\dot F(S, S) = 0$ and $\ddot F(S, \bar\Sigma) = 2W + o_p(1)$, it follows from (20) that
$$F[S, \Sigma(\hat\beta)] = (s - \hat\sigma)'W(s - \hat\sigma) + o_p(1/n).$$
Using (19),
$$\begin{aligned}
s - \sigma(\hat\beta) &= (s - \sigma_0) - (\sigma(\hat\beta) - \sigma_0) \\
&= (s - \sigma_0) - \dot\sigma(\hat\beta - \beta_0) + o_p(1/\sqrt{n}) \\
&= (I - \dot\sigma U)(s - \sigma_0) + o_p(1/\sqrt{n}),
\end{aligned}$$
we obtain
$$nF[S, \Sigma(\hat\beta)] = e_n'Qe_n + o_p(1),$$
where $e_n = \sqrt{n}(s - \sigma_0)$ and $Q = (I - U'\dot\sigma')W(I - \dot\sigma U)$.
Let $z_n = \Gamma^{-1/2}e_n$; then $z_n \xrightarrow{L} N_{p^*}(0, I)$. Let $t_1 \geq \ldots \geq t_{p^*-q} > 0$ be the non-zero eigenvalues of $Q\Gamma$; then
$$nF[S, \Sigma(\hat\beta)] \xrightarrow{L} \sum_{j=1}^{p^*-q} t_j\chi^2_{1j}, \tag{21}$$
where $\chi^2_{1j}$ are independent chi-square variates with one degree of freedom. Unless $t_1 = \ldots = t_{p^*-q}$, no commonly used distribution is available to describe the behaviour of the right-hand side of (21), which is a linear combination of chi-square distributions. Several approximations to a linear combination of chi-square distributions were developed by Box (1954) and Satterthwaite (1941). Another one was studied by Bentler (1994) for approximating (21) with $S_n = S$ based on a sample from a non-normal distribution. When $S$ is based on a sample from an elliptical distribution, Browne (1984) proposed a rescaling factor for $nF(S, \Sigma(\tilde\beta))$ using Mardia's (1970) coefficient of kurtosis. Satorra and Bentler (1988, 1994) proposed a more general rescaling factor for the likelihood ratio statistic. Tyler (1983) studied the likelihood ratio test for an explicit function of the covariance matrix with a rescaling factor. Replacing the sample
