
British Journal of Mathematical and Statistical Psychology (2002), 55, 159–168

© 2002 The British Psychological Society

www.bps.org.uk

Estimation of the Wing–Kristofferson model for discrete motor responses

Jarl K. Kampen1 * and Tom A. B. Snijders2

1Katholieke Universiteit Leuven, Belgium

2Rijksuniversiteit Groningen, The Netherlands

A number of estimation methods for the variance components in the Wing–Kristofferson model for inter-response times are examined and compared by means of a simulation study. The estimation methods studied are the method of moments, maximum likelihood, and an alternative approach in which the Wing–Kristofferson model is recognized as a moving average model.

1. Introduction

The Wing–Kristofferson (WK) model (Wing & Kristofferson, 1973) was developed to account for two sources of variation in simple repeated motor responses, such as rhythmic tapping with a pencil on a table: a timing component and a component associated with the motor implementation of the response. Such data concerning repeated motor responses have been used to investigate whether respondents have problems in timing their movements rather than in implementing them. For example, Kooistra, Snijders, Schellekens, Kalverboer, and Geuze (1997) investigated whether motor problems in children with early-treated congenital hypothyroidism, in which the thyroid gland is missing or defective, could be explained in terms of a timing deficit. For this purpose, they analysed the results of the so-called tapping task, in which children are asked to perform rhythmic tapping; the times between taps are recorded and the WK model then fitted. It was concluded that the variance of the motor implementation was significantly higher for children with congenital hypothyroidism than for controls. In this paper, the estimation methods for the WK model are re-examined and some new methods are proposed. The methods are compared by means of a simulation study. We also provide an applied example of the WK model.

Requests for reprints should be addressed to Jarl Kampen, Department Politieke Wetenschappen, Katholieke Universiteit Leuven, E. Van Evenstraat 2A, B-3000 Leuven, Belgium. (e-mail: io@soc.kuleuven.ac.be).


2. The Wing–Kristofferson model for discrete motor responses

Let us call the time elapsing between two taps of a respondent the response interval. Wing and Kristofferson (1973) assume that the taps are paced internally by the moment at which the implementation of each tap is started, and that a motor implementation delay intervenes between the implementation start and the observed tap. The time elapsing from the beginning of the implementation of the first tap until the beginning of the implementation of the second tap is the timing interval of the respondent. Associated with each response interval are the motor delays for the first and second tap. Hence, Wing and Kristofferson (1973) propose the model

$$r_t = c_t + d_t - d_{t-1}, \qquad\qquad (1)$$

where r_t is the observed tth response interval, c_t is the timing interval associated with the tth response interval, and d_{t-1} and d_t are the motor delays associated with the preceding and present tap of the tth response interval, respectively. Let T represent the total number of response intervals r_t, t ∈ {1, ..., T}, to be analysed. It is assumed that the c_t are independent and identically distributed (i.i.d.) with a normal distribution N(m, v). The d_t are i.i.d. with a normal distribution N(h, f); the parameter h is unidentifiable and for all purposes may be assumed to be 0 (or any other constant). It is further assumed that the timing intervals and the response delays are independent, which implies Var(r_t) = v + 2f and Cov(r_t, r_{t+1}) = −f.
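For completeness, these two implications follow in one line each from the independence and i.i.d. assumptions:

$$\mathrm{Var}(r_t) = \mathrm{Var}(c_t) + \mathrm{Var}(d_t) + \mathrm{Var}(d_{t-1}) = v + 2f,$$

$$\mathrm{Cov}(r_t, r_{t+1}) = \mathrm{Cov}\big(c_t + d_t - d_{t-1},\; c_{t+1} + d_{t+1} - d_t\big) = -\mathrm{Var}(d_t) = -f,$$

while Cov(r_t, r_{t+k}) = 0 for k ≥ 2, since response intervals that are two or more taps apart share no timing interval or motor delay.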

3. Estimation of the Wing–Kristofferson model

3.1. Moment estimators

Wing and Kristofferson (1973) suggested estimating f by minus the observed lag-one autocovariance, and estimating v by the observed variance of the response intervals minus twice the estimate of f. This follows from equating sample moments to population moments. Kooistra et al. (1997, p. 65) propose a modified estimator in which sample moments are equated to their expected values. The resulting estimators are then given by

$$\hat v = \frac{(T-1)^2\big(T\hat\jmath^2 + (2T+1)\hat r\big)}{T^3 - 4T^2 + 4T + 2}, \qquad\qquad (2)$$

$$\hat f = \frac{-T\big((T-2)\hat\jmath^2 + (T-1)^2\hat r\big)}{T^3 - 4T^2 + 4T + 2}, \qquad\qquad (3)$$

where

$$\hat m = \frac{1}{T}\sum_{t=1}^{T} r_t,$$

$$\hat\jmath^2 = \frac{1}{T-1}\sum_{t=1}^{T}(r_t - \hat m)^2, \qquad\qquad (4)$$

$$\hat r = \frac{1}{T-1}\sum_{t=1}^{T-1}(r_t - \hat m_1)(r_{t+1} - \hat m_2),$$

and finally

$$\hat m_1 = \frac{1}{T-1}\sum_{t=1}^{T-1} r_t, \qquad \hat m_2 = \frac{1}{T-1}\sum_{t=2}^{T} r_t.$$


For large T, the exact moment estimators converge to the WK estimates. For small T, both the procedure suggested by Wing and Kristofferson and the exact method of moments can lead to the estimation of negative variances. Evidence suggests that in practical cases, negative variances are computed for up to 30% of respondents (Ivry & Keele, 1989; Lundy-Ekman, Ivry, Keele, & Woolacott, 1991). This may be regarded as an undesirable feature of these estimators.
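As a concrete illustration of the original moment method, the following sketch computes the Wing–Kristofferson estimates from a series of response intervals. Python with numpy is an assumption on our part (the computations in this paper were done in GAUSS), as is the exact divisor convention used for the sample moments.

import numpy as np

def wk_moment_estimates(r):
    """Wing-Kristofferson moment estimates of (v, f) from response intervals r.

    f is estimated by minus the observed lag-one autocovariance, and v by the
    observed variance of the response intervals minus twice that estimate.
    For short series either estimate may come out negative.
    """
    r = np.asarray(r, dtype=float)
    T = len(r)
    m_hat = r.mean()
    var_hat = np.sum((r - m_hat) ** 2) / (T - 1)           # sample variance
    # lag-one autocovariance, using the means of the two overlapping subseries
    m1, m2 = r[:-1].mean(), r[1:].mean()
    acov1 = np.sum((r[:-1] - m1) * (r[1:] - m2)) / (T - 1)
    f_hat = -acov1
    v_hat = var_hat - 2.0 * f_hat
    return v_hat, f_hat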

3.2. Maximum likelihood estimators

An advantage of maximum likelihood estimators is that negative variances will not occur. Collect the response intervals in the T-vector r = (r_1, ..., r_T)′, the timing intervals in the T-vector c = (c_1, ..., c_T)′, and the motor implementation delays in the (T+1)-vector d = (d_0, d_1, ..., d_T)′. Define the elements of the T × (T+1) matrix A by a_ij = −1 if i = j; a_ij = 1 if i = j − 1; and a_ij = 0 for other i, j. Then we obtain the matrix representation of the WK model,

r = c + Ad.

The distributional assumptions for c and d imply that r has a T-variate normal distribution N(m1, S), where 1 is the unit T-vector and S = vI_T + fAA′, with I_T denoting the identity matrix of order T. The log-likelihood function of the parameters conditional on the response intervals is given by

$$\ell(v, f; r) \propto -\tfrac{1}{2}\ln\lvert vI_T + fAA'\rvert - \tfrac{1}{2}(r - m)'(vI_T + fAA')^{-1}(r - m).$$

Several algorithms can be used to maximize this log-likelihood as a function of v and f. One possibility is based on recognizing that S = vI_T + fAA′ is a tridiagonal matrix. Expressions for determinants and inverses of such matrices are given by Miller (1987, p. 65). Using these expressions, direct numerical maximization is possible using any standard numerical algorithm. Another possibility is to regard the vector d as missing data, and use the EM algorithm (Dempster, Laird, & Rubin, 1977) to produce maximum likelihood estimators. Using straightforward but tedious algebra, the iterative estimators of the variance components by EM can be shown (see the Appendix for a sketch of the proof) to be defined by the iteration steps

$$\hat v^{(i+1)} = \frac{1}{T}\Big( \big(r - f^{(i)}AA'z^{(i)} - m\big)'\big(r - f^{(i)}AA'z^{(i)} - m\big) + f^{(i)}\,\mathrm{tr}\big(A\big(I_{T+1} - f^{(i)}A'(S^{(i)})^{-1}A\big)A'\big) \Big), \qquad\qquad (5)$$

$$\hat f^{(i+1)} = \frac{1}{T+1}\, f^{(i)}\Big( \mathrm{tr}\big(I_{T+1} - f^{(i)}A'(S^{(i)})^{-1}A\big) + f^{(i)}\, z^{(i)\prime}AA'z^{(i)} \Big), \qquad\qquad (6)$$

where z^{(i)} = (S^{(i)})^{-1}(r − m), S^{(i)} = v^{(i)}I_T + f^{(i)}AA′, and starting values v^{(0)} and f^{(0)} can be chosen to be one of the moment estimates if they are positive, and arbitrary positive numbers otherwise, e.g. a proportion of the observed variance of r.
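A minimal sketch of iteration steps (5) and (6) in numpy (our own illustration, not the GAUSS routines used for the paper; the tolerance and iteration limit are arbitrary choices) could look as follows:

import numpy as np

def wk_em_estimates(r, v0, f0, max_iter=500, tol=1e-8):
    """EM iteration (5)-(6) for the variance components v and f of the WK model."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    # T x (T+1) matrix A with a[i, i] = -1 and a[i, i+1] = +1, so (A d)_t = d_t - d_{t-1}
    A = np.hstack([-np.eye(T), np.zeros((T, 1))]) + np.hstack([np.zeros((T, 1)), np.eye(T)])
    AAt = A @ A.T
    resid0 = r - r.mean()                          # m estimated by the sample mean
    v, f = float(v0), float(f0)
    for _ in range(max_iter):
        S = v * np.eye(T) + f * AAt
        S_inv = np.linalg.inv(S)
        z = S_inv @ resid0
        M = np.eye(T + 1) - f * (A.T @ S_inv @ A)  # I_{T+1} - f A' S^{-1} A
        resid = resid0 - f * (AAt @ z)
        v_new = (resid @ resid + f * np.trace(A @ M @ A.T)) / T       # step (5)
        f_new = f * (np.trace(M) + f * (z @ AAt @ z)) / (T + 1)       # step (6)
        converged = abs(v_new - v) < tol and abs(f_new - f) < tol
        v, f = v_new, f_new
        if converged:
            break
    return v, f

As suggested above, v0 and f0 can be taken from the moment estimates when these are positive.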

3.3. The moving average approach

Another possibility for estimating the variance components of the WK model is by recognizing it as a first-order moving average or MA(1) process. Write the MA(1) process

$$y_t = u_t + b\,u_{t-1},$$


where y_t = r_t − m and it is assumed that the u_t are i.i.d. N(0, n) for all t. Well-known properties of the MA(1) process are Var(y_t) = n(1 + b²) and Cov(y_t, y_{t+1}) = bn = −f, whence, from Var(y_t) = v + 2f, we have v = n(1 + b)². Because of the restrictions v ≥ 0 and n ≥ 0, the model constrains b ≤ 0.

 

 

 

In the extensive literature on the MA(1) process, several procedures for producing maximum likelihood estimators of the parameters n and b are proposed; see, for instance, Box and Jenkins (1971, p. 187). However, maximum likelihood estimators of the variance components of first-order moving average processes are known to be unreliable: the maximizing values do not always correspond to a plausible estimate due to a ‘pile-up effect’ (Anderson & Mentz, 1993). Also, maximum likelihood estimates are reported to be unstable (Godolphin & De Gooijer, 1982). These problems may also apply to the maximum likelihood estimators proposed in Section 3.2. In the literature, several alternative estimators have been proposed for the MA(1) process. For instance, Galbraith and Zinde-Walsh (1994) consider a non-iterative estimator by autoregressive approximation. That is, the MA(1) process is approximated by an autoregressive process AR(p), and the parameters of the model are estimated by ordinary least squares. Upon

defining the (T − p) × p matrix X_p with elements given by x_{ij} = r_{i+j−1}, i = 1, ..., T − p, j = 1, ..., p, and the (T − p)-vector y_p = (r_{p+1}, ..., r_T)′, the coefficients of the autoregression are given by the p-vector

$$\hat b_p = (X_p' X_p)^{-1} X_p' y_p, \qquad\qquad (7)$$

of which the pth element is used as the estimate of b (Galbraith & Zinde-Walsh, 1994, p. 145). Using ĵ² as defined in (4), n is estimated as n̂ = ĵ²/(1 + b̂²), and the estimators of the variance components of the WK model are

$$\hat v = \hat n(1 + \hat b)^2 \qquad\qquad (8)$$

and

$$\hat f = -\hat b\,\hat n. \qquad\qquad (9)$$

 

 

 

 

 

We refer to these estimators of the variance components of the WK model as the ARMA estimators. Regarding the choice of p in (7), Galbraith and Zinde-Walsh (1994, p. 149) suggest that for small T (say T ≤ 25) values of p ranging from 1 to 5 will provide good estimators.
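The autoregressive-approximation estimator is short to code. The sketch below assumes (our reading of the definition above) that the AR(p) regression explains r_{p+1}, ..., r_T from the p preceding response intervals, so that the pth (last) regression coefficient is the lag-one coefficient used as the estimate of b.

import numpy as np

def wk_arma_estimates(r, p=3):
    """AR(p)-approximation (ARMA) estimates of the WK variance components."""
    r = np.asarray(r, dtype=float)
    T = len(r)
    # design matrix: row i holds (r_i, ..., r_{i+p-1}); the response is r_{i+p}
    Xp = np.column_stack([r[j:T - p + j] for j in range(p)])
    yp = r[p:]
    b_p, *_ = np.linalg.lstsq(Xp, yp, rcond=None)
    b_hat = b_p[-1]                                 # coefficient of the lag-one term
    j2 = np.sum((r - r.mean()) ** 2) / (T - 1)      # j-hat squared as in (4)
    n_hat = j2 / (1.0 + b_hat ** 2)
    v_hat = n_hat * (1.0 + b_hat) ** 2              # equation (8)
    f_hat = -b_hat * n_hat                          # equation (9)
    return v_hat, f_hat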

4. A comparison of estimation methods of the Wing–Kristofferson model

4.1. A simulation study

In this section, we compare the efficiency of the moment estimators as given by Wing and Kristofferson, the moment estimators given by Kooistra et al. ((2) and (3) above), the maximum likelihood estimators as defined by EM ((5) and (6)), and the ordinary least squares-based estimators obtained in the ARMA approximation ((8) and (9)), in terms of their bias and mean squared error as obtained in a simulation study. The mean squared error of some estimator z of a parameter Z is defined by MSE(z) = E((z − Z)²). The MSEs of the estimators are a function of T, v and f. We set m = 1000. If we take v + 2f = c for some positive constant c, the behaviour of the estimators can be illustrated adequately by manipulating only the number of taps T and the ratio of the variance components q = f/v (because up to a multiplicative term, the MSEs of the


estimated variance components are unaltered if the variance components are multiplied by a constant). We took c = 100, T ∈ {20, 200}, and v ∈ {0, 10, 20, ..., 100}, corresponding to q ∈ {∞, 4.5, 2.0, 1.2, 0.75, 0.50, 0.33, 0.21, 0.13, 0.06, 0}. In line with the suggestion of Galbraith and Zinde-Walsh (1994, p. 149), we set p = 3 if T = 20 and p = 24 if T = 200. (A pilot study where T = 20 suggested that in fact p = 3 produces more efficient estimates than p = 1, 2 or 4.)

The routines (available on request) for the simulations were written in GAUSS 3.2. Samples of c and d were drawn from normal distributions N(m, v) and N(0, f) respectively, and these vectors were linearly combined to form the vector r = c + Ad. Then the four different estimates of the variance components were computed and negative estimates were set to zero. This procedure was repeated for N = 10 000 runs per parameter setting when T = 20, and N = 100 runs for T = 200; the MSEs were estimated as the average over the N runs of the observed squared errors. It was found that the likelihood function can have a local maximum which is not the global maximum. This implies that the choice of starting values is important. We tried various reasonable starting values and present here only the results for the starting values yielding the uniformly smallest root mean squared errors (RMSEs). These starting values are defined by var(r)/2 for v and var(r)/4 for f.
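For concreteness, the data-generating step of a single simulation run can be written as below (numpy rather than GAUSS; the seed handling is our own choice):

import numpy as np

def simulate_wk(T, m, v, f, rng=None):
    """Draw one series of T response intervals from the WK model, r = c + A d."""
    rng = np.random.default_rng() if rng is None else rng
    c = rng.normal(loc=m, scale=np.sqrt(v), size=T)          # timing intervals
    d = rng.normal(loc=0.0, scale=np.sqrt(f), size=T + 1)    # motor delays d_0, ..., d_T
    A = np.hstack([-np.eye(T), np.zeros((T, 1))]) + np.hstack([np.zeros((T, 1)), np.eye(T)])
    return c + A @ d

# one series with the settings used here: m = 1000 and v + 2f = c = 100
r = simulate_wk(T=20, m=1000.0, v=50.0, f=25.0, rng=np.random.default_rng(1))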

The results of the simulations for T = 20 are shown in Tables 1–4, where the RMSE and bias of each of the four estimators are displayed as a function of v and f, respectively. The results of the simulations for T = 200 (not printed here) suggest that all estimators are consistent. As for T = 20, and considering the timing variance v, Wing and Kristofferson's moment estimator (abbreviated WK Mom. in the tables) and the ARMA estimator perform better than the moment estimator of Kooistra et al. (K Mom. in the tables) in terms of RMSE values. Thus, the simulations do not support the suggestion of Kooistra et al. (1997) that their proposed estimator is better than the WK estimator. Comparatively speaking, ARMA tends to perform better with high v, and the WK moment estimator with low v. Nevertheless, the maximum likelihood estimator outperforms the three other approaches. All estimators except the moment estimator of Kooistra et al. have considerable bias outside the middle of the range of v-values, and for high v, the bias of the maximum likelihood estimator makes an important contribution to its RMSE. For the motor variance f, on the other hand, the maximum likelihood estimator has a larger RMSE than the two moment estimators for small values of f and a smaller RMSE for high values of f. The RMSE of the ARMA estimator is uniformly lower than that of the other approaches. Accordingly, we propose a hybrid procedure for estimating the variance components of the WK model: estimate v by the maximum likelihood estimator and f by the ARMA estimator.

4.2. Example

For illustrative purposes, we include an applied example of the WK model, reanalysing the data gathered by Kooistra et al. (1997) on timing variability in children with early-treated congenital hypothyroidism (see Introduction). The objective is to show whether or not the different estimation methods lead to similar conclusions. In the present study, a group of children suffering from thyroid agenesis (n = 21) was compared to a group of children suffering from thyroid dysgenesis (n = 25) and a group of controls (n = 34). The main conclusions that were drawn in the study were that the timing variance v does not differ across groups, and that the estimate of the motor delay variance f of children with early-treated congenital hypothyroidism is significantly


 

 

 

Table 1. RMSE(v̂) as a function of the estimators (T = 20, N = 10 000)

  v     WK Mom.    K Mom.     ML        ARMA
  0     21.64      25.54       7.103    22.00
 10     22.21      26.62      11.49     25.40
 20     25.07      29.82      17.77     29.84
 30     28.03      32.85      23.04     32.94
 40     31.91      36.76      28.08     35.67
 50     35.22      40.05      32.31     38.13
 60     39.24      44.16      36.43     40.48
 70     42.59      47.66      39.61     41.90
 80     45.86      50.95      43.20     43.34
 90     49.44      54.71      46.71     44.66
100     54.02      59.66      50.20     45.19

 

 

 

 

 

 

Table 2. Bias(v̂) as a function of the estimators (T = 20, N = 10 000)

  v     WK Mom.    K Mom.     ML        ARMA
  0      12.35      14.88      3.812    13.03
 10       7.301     10.72     -0.5486   10.51
 20       3.443      7.902    -3.947     8.833
 30      -0.0661     5.500    -6.635     7.127
 40      -3.151      3.464    -9.633     4.836
 50      -5.571      2.214   -13.15      2.956
 60      -7.932      0.9958  -16.32      0.1433
 70      -9.482      0.6916  -19.65     -2.603
 80     -11.72      -0.3894  -23.65     -4.896
 90     -13.33      -0.7963  -28.17     -7.640
100     -14.86      -1.141   -32.90     -9.722

 

 

 

 

 

 

Table 3. RMSE(f̂) as a function of the estimators (T = 20, N = 10 000)

  f     WK Mom.    K Mom.     ML        ARMA
  0     18.41      17.38      22.80     18.40
  5     18.98      18.27      22.49     17.69
 10     19.91      19.55      22.56     17.16
 15     21.27      21.31      22.70     16.74
 20     22.84      23.27      22.47     16.60
 25     23.41      24.16      22.07     16.07
 30     25.58      26.62      21.63     15.50
 35     26.01      27.20      20.84     14.68
 40     28.07      29.49      19.65     13.44
 45     27.97      29.44      17.95     11.58
 50     30.11      31.69      17.27      9.700

 

 

 

 

 

 


Table 4. Bias(f̂) as a function of the estimators (T = 20, N = 10 000)

  f     WK Mom.    K Mom.     ML        ARMA
  0     10.96       9.511     14.78     11.94
  5      8.903      7.313     12.76      9.550
 10      7.280      5.627     10.88      7.271
 15      5.878      4.255      9.284     5.311
 20      4.768      3.274      7.662     3.426
 25      3.234      1.878      5.935     1.428
 30      2.792      1.735      4.312    -0.0086
 35      1.653      0.8672     2.608    -1.347
 40      1.509      1.133      1.151    -2.106
 45      0.0690     0.0913    -1.887    -6.413
 50      0.1030     0.5905    -5.750    -4.213

 

 

 

 

 

 

higher than that of controls; differences were tested by means of a t test comparing the means of the estimated variance components (as obtained by the exact method of moments) of the three groups in a pairwise fashion.

Means as well as standard errors of the means of the variance components as estimated by the four methods discussed in the previous sections are displayed in Table 5. If a negative variance was estimated (this happens in the moment approaches and the ARMA approach), the negative estimates were set to zero. Concerning the timing variance v, the results of Kooistra et al. (1997, p. 69) are corroborated: no significant differences between the congenital hypothyroidism groups combined (n = 46) and the controls were found with any of the estimation methods. Concerning the motor delay variance f, significant differences between groups were found for the exact moment estimates and those obtained from the ARMA approach; but for the WK moment estimates and the maximum likelihood estimates of f, no significant difference between the group means was detected. However, if we take into consideration that in general, the motor variance f is estimated better by the moving average approach than by any of the other methods, the conclusion that children with congenital hypothyroidism have higher motor variability is justified.

Table 5. Means (and estimated standard errors of the mean) of estimated variance components by group and by method

                        Estimate of v                                Estimate of f
Group    WK Mom.    K Mom.     ML         ARMA         WK Mom.    K Mom.     ML         ARMA
1         817.1      971.0      765.2      839.0        499.5      485.6      422.1      495.3
         (223.9)    (257.1)    (213.6)    (190.4)      (144.7)    (152.7)     (85.3)    (108.1)
2        1039.2     1212.8      936.4     1238.5        436.4      438.8      366.5      371.5
         (265.7)    (309.0)    (205.8)    (280.3)      (151.7)    (153.1)    (126.7)    (125.4)
3         997.7     1188.0      769.5     1108.6        208.8      182.2      262.8      218.2
         (212.7)    (250.3)    (164.6)    (265.8)       (29.8)     (28.0)     (38.6)     (37.4)

 

 

 

 

 

 

 

 

 

 


5. Closing comments

In some applications of the WK model, researchers may be interested in the standard errors of the parameters at the individual rather than the aggregate level (as in the example in Section 4.2). The EM approach can be modified to yield standard errors (e.g., by the SEM algorithm; see Meng & Rubin, 1991). For the other estimation methods, the derivation of analytical expressions for the standard errors is cumbersome, and for the moment estimators such standard errors would rely on fourth moments of the data and would therefore be very unstable. In these cases, we recommend that standard errors of the parameters involved be produced by means of the parametric bootstrap (Efron & Tibshirani, 1993), which is a simple and reliable procedure.
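A parametric bootstrap for the standard errors of one respondent's estimates could be sketched as follows, reusing the illustrative simulate_wk and estimator functions from the earlier sketches (the number of replications and the truncation of negative estimates at zero are our own choices):

import numpy as np

def wk_bootstrap_se(r, estimator, n_boot=1000, rng=None):
    """Parametric-bootstrap standard errors of (v_hat, f_hat) for one respondent."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.asarray(r, dtype=float)
    T, m_hat = len(r), r.mean()
    v_hat, f_hat = estimator(r)
    reps = []
    for _ in range(n_boot):
        # simulate a new series from the fitted WK model and re-estimate
        r_star = simulate_wk(T, m_hat, max(v_hat, 0.0), max(f_hat, 0.0), rng=rng)
        reps.append(estimator(r_star))
    return np.std(np.asarray(reps), axis=0, ddof=1)

# for example: se_v, se_f = wk_bootstrap_se(r, wk_moment_estimates)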

Finally, we have assumed that the response intervals are free from a so-called drift in the timing intervals. The WK model adjusting for a linear drift is given by

$$r_t = c_t + d_t - d_{t-1} + qt$$

and adjusts the observed response intervals for a systematic acceleration or delay (conceptually, q corresponds to processes of the cerebrum rather than the cerebellum). A study of the estimation of the drift parameter is beyond the scope of this paper. Note, however, that moment estimators of the drift parameter q were given by Kooistra et al. (1997), and that they found no evidence of a drift in their data.

Acknowledgement

The authors are grateful to Libbe Kooistra for permission to use his data.

References

Anderson, T. W., & Mentz, R. P. (1993). A note on maximum likelihood estimation in the first-order Gaussian moving average model. Statistics and Probability Letters, 16, 205–211.

Box, G. E. P., & Jenkins, G. M. (1971). Time series analysis: forecasting and control. London: Holden-Day.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society B, 39, 1–38.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.

Galbraith, J. W., & Zinde-Walsh, V. (1994). A simple non-iterative estimator for moving average models. Biometrika, 81, 143–155.

Godolphin, E. J., & De Gooijer, J. G. (1982). On the maximum likelihood estimation of parameters of a Gaussian moving average process. Biometrika, 69, 443–451.

Ivry, R. B., & Keele, S. W. (1989). Time functions of the cerebellum. Journal of Cognitive Neurosciences, 6, 167–180.

Kooistra, L., Snijders, T. A. B., Schellekens, J. M. H., Kalverboer, A. F., & Geuze, R. H. (1997). Timing variability in children with early-treated congenital hypothyroidism. Acta Paediatrica, 96, 61–73.

Lundy-Ekman, L., Ivry, R. B., Keele, S. W., & Woolacott, M. H. (1991). Timing and force control deficits in clumsy children. Journal of Cognitive Neurosciences, 3, 367–376.

Meng, X. L., & Rubin, D. B. (1991). Using EM to obtain asymptotic variance–covariance matrices: the SEM algorithm. Journal of the American Statistical Association, 86, 899–909.


Miller, K. S. (1987). Some eclectic matrix theory. Malabar, FL: Krieger.

Wing, A. M., & Kristofferson, A. B. (1973). Response delays and the timing of discrete motor responses. Perception & Psychophysics, 14, 5–12.

Received 30 April 1999; final version received 4 April 2001

Appendix

We give a brief derivation of the iterative maximum likelihood estimators as found by EM. First the expectation step (E-step) and then the maximization step (M-step) are presented. To apply the EM algorithm, we define the complete data vector y = (r′, d′)′, and the missing data by d. Note that y is normally distributed with mean vector m_Y = (m1′, 0′)′ and covariance matrix

$$S_Y = \begin{pmatrix} S & fA \\ fA' & fI_{T+1} \end{pmatrix}.$$

The vector y is a linear combination of the vectors c and d. This implies that, up to a multiplicative constant, the likelihood based on y is identical to the joint likelihood of the vectors c and d. The distributional assumptions for the latter two vectors generate the complete data log-likelihood

$$\ell(v, f, m; y) \propto -\tfrac{1}{2}T\ln v - \tfrac{1}{2}v^{-1}(c - m)'(c - m) - \tfrac{1}{2}(T+1)\ln f - \tfrac{1}{2}f^{-1}d'd.$$

The E-step of the EM algorithm consists of finding an expression for

$$E\big(\ell(v, f; y)\mid r, v^{(i)}, f^{(i)}, m^{(i)}\big),$$

i.e. the expected log-likelihood of the complete data given the observed data and given the current values v^{(i)}, f^{(i)}, m^{(i)} of the parameters. The maximum likelihood estimate of m turns out to be very close to the sample mean of r, as Σ_t r_t = Σ_t c_t + d_T − d_0 and E(d_T − d_0) = 0. Therefore the estimation can be simplified slightly by using

$$\hat m = \frac{1}{T}\sum_{t=1}^{T} r_t$$

right from the start. From multivariate normal distribution theory, the conditional expectation of d given r is E(d | r) = fA′z, where z = S^{-1}(r − m), and its conditional covariance matrix is Var(d | r) = f(I_{T+1} − fA′S^{-1}A). Finally,

$$E\big(d'd;\, r, v^{(i)}, f^{(i)}\big) = f^{(i)}\Big(\mathrm{tr}\big(I_{T+1} - f^{(i)}A'(S^{(i)})^{-1}A\big) + f^{(i)}\, z^{(i)\prime}AA'z^{(i)}\Big),$$

where S^{(i)} = v^{(i)}I_T + f^{(i)}AA′ and z^{(i)} = (S^{(i)})^{-1}(r − m). Upon substituting c = r − Ad, we find

$$E\big((c - m)'(c - m);\, r, v^{(i)}, f^{(i)}\big) = f^{(i)}\,\mathrm{tr}\big(A(I_{T+1} - f^{(i)}A'(S^{(i)})^{-1}A)A'\big) + \big(r - f^{(i)}AA'z^{(i)} - m\big)'\big(r - f^{(i)}AA'z^{(i)} - m\big).$$


It follows that the expected value of the conditional log-likelihood is given by

$$E\big(\ell(v, f; y);\, r, v^{(i)}, f^{(i)}\big) = -\tfrac{1}{2}T\ln v - \tfrac{1}{2}(T+1)\ln f - \tfrac{1}{2}v^{-1}\Big(f^{(i)}\,\mathrm{tr}\big(A(I_{T+1} - f^{(i)}A'(S^{(i)})^{-1}A)A'\big) + \big(r - f^{(i)}AA'z^{(i)} - m\big)'\big(r - f^{(i)}AA'z^{(i)} - m\big)\Big) - \tfrac{1}{2}f^{-1}\Big(f^{(i)}\big(\mathrm{tr}\big(I_{T+1} - f^{(i)}A'(S^{(i)})^{-1}A\big) + f^{(i)}z^{(i)\prime}AA'z^{(i)}\big)\Big).$$

Because this log-likelihood is a sum of two mutually independent parts, one depending on v only and one depending on f only, the M-step can be divided in two separate parts: one part maximizing with respect to f and another maximizing with respect to v. This yields the two iteration steps reported in Section 3.2.