Examples—307

ting or Hessian - expected, Hessian - observed, and OPG - BHHH. If you are computing Huber/White covariances, only the two Hessian based selections will be displayed.

By default, EViews will match the estimator to the one used in estimation as specified in the Estimation Options section. Thus, equations estimated by Quadratic Hill Climbing and Newton-Raphson will use the observed information, while those using IRLS or BHHH will use the expected information matrix or outer-product of the gradients, respectively.

The one exception to the default matching of estimation and covariance information matrices occurs when you estimate the equation using BHHH and request Huber/White covariances. For this combination, there is no obvious choice for estimating the outer matrix in the sandwich, so the observed information is arbitrarily used as the default.

Lastly, you may use the d.f. Adjustment checkbox to choose whether to apply a degree-of-freedom correction to the coefficient covariance. By default, EViews will perform this adjustment.

Examples

In this section, we offer three examples illustrating GLM estimation in EViews.

Exponential Regression

Our first example uses the Kennan (1985) dataset (“Strike.WF1”) on the number of strikes (NUMB), industrial production (IP), and a dummy variable representing the month of February (FEB). To account for the non-negative response variable NUMB, we may estimate a nonlinear specification of the form:

$$\mathrm{NUMB}_i = \exp(\beta_1 + \beta_2 \mathrm{IP}_i + \beta_3 \mathrm{FEB}_i) + \epsilon_i \tag{27.3}$$

where $\epsilon_i \sim N(0, \sigma^2)$. This model falls into the GLM framework with a log link and normal family. To estimate this specification, bring up the GLM dialog and fill out the equation specification page as follows:

numb c ip feb

then change the Link function to Log. For the moment, we leave the remaining settings and those on the Options page at their default values. Click on OK to accept the specification and estimate the model. EViews displays the following results:

308—Chapter 27. Generalized Linear Models

Dependent Variable: NUMB

Method: Generalized Linear Model (Quadratic Hill Climbing)

Date: 06/15/09 Time: 09:31

Sample: 1 103

Included observations: 103

Family: Normal

Link: Log

Dispersion computed using Pearson Chi-Square

Coefficient covariance computed using observed Hessian

Convergence achieved after 5 iterations

Variable	Coefficient	Std. Error	z-Statistic	Prob.


C	1.727368	0.066206	26.09097	0.0000
IP	2.664874	1.237904	2.152732	0.0313
FEB	-0.391015	0.313445	-1.247476	0.2122


Mean dependent var	5.495146	S.D. dependent var		3.653829
Sum squared resid	1273.783	Log likelihood		-275.6964
Akaike info criterion	5.411580	Schwarz criterion		5.488319
Hannan-Quinn criter.	5.442662	Deviance		1273.783
Deviance statistic	12.73783	Restr. deviance		1361.748
LR statistic	6.905754	Prob(LR statistic)		0.031654
Pearson SSR	1273.783	Pearson statistic		12.73783
Dispersion	12.73783

The top portion of the output displays the estimation settings and basic results, in particular the choice of algorithm (Quadratic Hill Climbing), distribution family (Normal), and link function (Log), as well as the dispersion estimator, coefficient covariance estimator, and estimation status. We see that the dispersion estimator is based on the Pearson $\chi^2$ statistic and the coefficient covariance is computed using the inverse of the observed Hessian.

The coefficient estimates indicate that IP is positively related to the number of strikes, and that the relationship is statistically significant at conventional levels. The FEB dummy variable is negatively related to NUMB, but the relationship is not statistically significant.

The bottom portion of the output displays various descriptive statistics. Note that in place of some of the more familiar statistics, EViews reports the deviance, the deviance statistic (deviance divided by the degrees-of-freedom), the restricted deviance (deviance for the model with only a constant), and the corresponding LR test statistic and probability. The test indicates that the IP and FEB variables are jointly significant at roughly the 3% level. Also displayed are the sum-of-squared Pearson residuals and the estimate of the dispersion, which in this example is the Pearson statistic.
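The relationships among these summary statistics can be checked by hand. The sketch below is not EViews code; it simply copies numbers from the output above and assumes the usual definitions (deviance statistic as deviance over residual degrees of freedom, an LR statistic scaled by the dispersion, and information criteria divided by the number of observations), which appear consistent with the reported figures.

```python
import math

# Values copied from the estimation output above.
n_obs, n_coef = 103, 3
deviance = 1273.783
restricted_deviance = 1361.748
log_likelihood = -275.6964

df_resid = n_obs - n_coef                 # residual degrees of freedom
deviance_stat = deviance / df_resid       # deviance divided by d.f.
dispersion = deviance_stat                # normal family: Pearson SSR equals the deviance here

# Assumed form: difference in deviances scaled by the dispersion estimate.
lr_stat = (restricted_deviance - deviance) / dispersion

# Two restrictions (IP and FEB); the chi-square(2) upper tail is exp(-x/2).
p_value = math.exp(-lr_stat / 2.0)

# Information criteria expressed per observation.
aic = (-2.0 * log_likelihood + 2.0 * n_coef) / n_obs
sic = (-2.0 * log_likelihood + n_coef * math.log(n_obs)) / n_obs

print(deviance_stat, lr_stat, p_value, aic, sic)
```

The printed values should agree with the Deviance statistic, LR statistic, Prob(LR statistic), Akaike, and Schwarz entries in the table up to rounding of the inputs.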


It may be instructive to examine the representations view of this equation. Simply go to the equation toolbar or the main menu and click on View/Representations to display the view.

Notably, the representations view displays both the specification of the linear predictor (I_NUMB) as well as the mean specification (EXP(I_NUMB)) in terms of the EViews coefficient names, and in terms of the estimated values. These are the expressions used when forecasting the index or the dependent variable using the Forecast procedure (see “Forecasting” on page 316).

Binomial

We illustrate the estimation of GLM binomial logistic regression using a simple example from Agresti (2007, Table 3.1, p. 69) examining the relationship between snoring and heart disease. The data in the first page of the workfile “Snoring.WF1” consist of grouped binomial response data for 2,484 subjects divided into four risk factor groups for snoring level (SNORE), coded as 0, 2, 4, 5. Associated with each of the four groups is the number of individuals in the group exhibiting heart disease (DISEASE) as well as a total group size (TOTAL).

SNORE	DISEASE	TOTAL
0	24	1379
2	35	638
4	21	213
5	30	254

We may estimate a logistic regression model for these data in either raw frequency or proportions form.

To estimate the model in raw frequency form, bring up the GLM equation dialog, enter the linear predictor specification:

disease c snore


select Binomial Count in the Family combo, and enter “TOTAL” in the Number of trials edit field. Next switch over to the Options page and turn off the d.f. Adjustment for the coefficient covariance. Click on OK to estimate the equation.

Dependent Variable: DISEASE

Method: Generalized Linear Model (Quadratic Hill Climbing)

Date: 06/15/09 Time: 16:20

Sample: 1 4

Included observations: 4

Family: Binomial Count (n = TOTAL)

Link: Logit

Dispersion fixed at 1

Coefficient covariance computed using observed Hessian

Summary statistics are for the binomial proportions and implicit variance weights used in estimation

Convergence achieved after 4 iterations

No d.f. adjustment for standard errors & covariance

The output header shows relevant information for the estimation procedure. Note in particular the EViews message that summary statistics are computed for the binomial proportions data. This message is a hint at the fact that EViews estimates the binomial count model by scaling the dependent variable by the number of trials, and estimating the corresponding proportions specification.

Equivalently, you could have specified the model in proportions form. Simply enter the linear predictor specification:

disease/total c snore

with Binomial Proportions specified in the Family combo and “TOTAL” entered in the Number of trials edit field.


Dependent Variable: DISEASE/TOTAL

Method: Generalized Linear Model (Quadratic Hill Climbing)

Date: 06/15/09 Time: 16:31

Sample: 1 4

Included observations: 4

Family: Binomial Proportion (trials = TOTAL)

Link: Logit

Dispersion fixed at 1

Coefficient covariance computed using observed Hessian

Convergence achieved after 4 iterations

No d.f. adjustment for standard errors & covariance

Variable	Coefficient	Std. Error	z-Statistic	Prob.


C	-3.866248	0.166214	-23.26061	0.0000
SNORING	0.397337	0.050011	7.945039	0.0000


Mean dependent var	0.023490	S.D. dependent var		0.001736
Sum squared resid	0.000357	Log likelihood		-11.53073
Akaike info criterion	6.765367	Schwarz criterion		6.458514
Hannan-Quinn criter.	6.092001	Deviance		2.808912
Deviance statistic	1.404456	Restr. deviance		65.90448
LR statistic	63.09557	Prob(LR statistic)		0.000000
Pearson SSR	2.874323	Pearson statistic		1.437162
Dispersion	1.000000

The top portion of the output changes to show the different settings, but the remaining output is identical. In particular, there is strong evidence that SNORING is related to heart disease in these data, with the estimated probability of heart disease increasing with the level of snoring.
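As a cross-check outside EViews, the grouped logistic fit can be reproduced with a few lines of NumPy. This is a minimal sketch, not the algorithm EViews uses internally: it applies plain Newton-Raphson to the binomial log likelihood, uses no degree-of-freedom adjustment, and takes the four group counts whose totals sum to the 2,484 subjects mentioned above. The function name `fit_grouped_logit` is hypothetical.

```python
import numpy as np

# Grouped data: snoring score, heart disease cases, and group size.
snore = np.array([0.0, 2.0, 4.0, 5.0])
disease = np.array([24.0, 35.0, 21.0, 30.0])
total = np.array([1379.0, 638.0, 213.0, 254.0])

def fit_grouped_logit(x, successes, trials, n_iter=25):
    """Newton-Raphson for a binomial count model with a logit link."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (successes - trials * p)
        info = X.T @ (X * (trials * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, score)
    # Recompute the information matrix at the final estimates.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (trials * p * (1.0 - p))[:, None])
    std_err = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, std_err, p

beta, std_err, p_hat = fit_grouped_logit(snore, disease, total)

# Binomial deviance of the grouped model against the saturated model.
fitted = total * p_hat
deviance = 2.0 * np.sum(
    disease * np.log(disease / fitted)
    + (total - disease) * np.log((total - disease) / (total - fitted))
)
print(beta, std_err, deviance)
```

With the logit link the observed and expected Hessians coincide, so the standard errors from the inverse information matrix should be comparable to those in the table.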

It is worth mentioning that data of this form are sometimes represented in a frequency weighted form in which the data for each group are divided into two records, one for the binomial successes and one for the failures. Each record contains the number of repeats in the group and a binary indicator for success (the total number of records is 2G, where G is the number of groups). The FREQ page of the “Snoring.WF1” workfile contains the data represented in this fashion:

SNORE	DISEASE	N
0	1	24
2	1	35
4	1	21
5	1	30
0	0	1355
2	0	603
4	0	192
5	0	224

In this representation, DISEASE is an indicator for whether the record corresponds to individuals with heart disease or not, and N is the number of individuals in the category.

Estimation of the equivalent GLM model specified using the frequency weighted data is straightforward. Simply enter the linear predictor specification:

disease c snore

with either Binomial Proportions or Binomial Count specified in the Family combo. Since each observation corresponds to a binary indicator, you should enter “1” in the Number of trials edit field. The multiple individuals in the category are handled by entering “N” in the Frequency weights field in the Options page.

Dependent Variable: DISEASE

Method: Generalized Linear Model (Quadratic Hill Climbing)

Date: 06/16/09 Time: 14:45

Sample: 1 8

Included cases: 8

Total observations: 2484

Family: Binomial Count (n = 1)

Link: Logit

Frequency weight series: N

Dispersion fixed at 1

Coefficient covariance computed using observed Hessian

Convergence achieved after 6 iterations

No d.f. adjustment for standard errors & covariance

Variable	Coefficient	Std. Error	z-Statistic	Prob.

C	-3.866248	0.166214	-23.26061	0.0000
SNORING	0.397337	0.050011	7.945039	0.0000


Mean dependent var	0.044283	S.D. dependent var		0.205765
Sum squared resid	102.1917	Log likelihood		-418.8658
Akaike info criterion	0.338861	Schwarz criterion		0.343545
Hannan-Quinn criter.	0.340562	Deviance		837.7316
Deviance statistic	0.337523	Restr. deviance		900.8272
LR statistic	63.09557	Prob(LR statistic)		0.000000
Pearson SSR	2412.870	Pearson statistic		0.972147
Dispersion	1.000000

Note that while a number of the summary statistics differ due to the different representation of the data (notably the Deviance and Pearson SSRs), the coefficient estimates and LR test statistics in this case are identical to those outlined above. There will, however, be substantive differences between the two results in settings when the dispersion is estimated since the effective number of observations differs in the two settings.
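The point that the deviance changes with the data representation while the LR statistic does not can be illustrated numerically. The following sketch is an assumption-laden check rather than EViews output: it fits a frequency-weighted binary logit by Newton-Raphson, with failure counts taken as group size minus disease cases so that the weights sum to 2,484. The helper names `weighted_binary_logit` and `binary_deviance` are hypothetical.

```python
import numpy as np

# Frequency-weighted records: one success row and one failure row per group.
snore = np.array([0, 2, 4, 5, 0, 2, 4, 5], dtype=float)
y = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)
freq = np.array([24, 35, 21, 30, 1355, 603, 192, 224], dtype=float)

def weighted_binary_logit(x, y, w, n_iter=25):
    """Newton-Raphson for a binary logit with frequency weights w."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        info = X.T @ (X * (w * p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(info, score)
    return beta, 1.0 / (1.0 + np.exp(-X @ beta))

def binary_deviance(y, p, w):
    """Deviance of binary data: minus twice the weighted log likelihood."""
    return -2.0 * np.sum(w * (y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

beta, p_hat = weighted_binary_logit(snore, y, freq)
deviance = binary_deviance(y, p_hat, freq)

# Restricted model: constant only, so every record gets the overall rate.
p_bar = np.sum(freq * y) / np.sum(freq)
restricted = binary_deviance(y, np.full_like(y, p_bar), freq)

lr_stat = restricted - deviance  # dispersion is fixed at 1
print(beta, deviance, restricted, lr_stat)
```

The deviance here is in the hundreds rather than near 2.8 because the saturated model for binary records differs from the one for grouped counts, yet the difference between restricted and unrestricted deviances is the same in both representations.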


Lastly, the data may be represented in individual trial form, which expands observations for each trial in the group into a separate record. The total number of records in the data is $\sum_i n_i$, where $n_i$ is the number of trials in the $i$-th (of $G$) group. This representation is the traditional ungrouped binary response form for the data. Results for data in this representation should match those for the frequency weighted data.

Binomial Proportions

Papke and Wooldridge (1996) apply GLM techniques to the analysis of fractional response data for 401K tax advantaged savings plan participation rates (“401kjae.WF1”). Their analysis focuses on the relationship between plan participation rates (PRATE) and the employer matching contribution rates (MRATE), accounting for the log of total employment (LOG(TOTEMP), LOG(TOTEMP)^2), plan age (AGE, AGE^2), and a binary indicator for whether the plan is the only pension plan offered by the plan sponsor (SOLE).

We focus on two of the equations estimated in the paper. In both, the authors employ a GLM specification using a binomial proportion family and logit link. Information on the binomial group size $n_i$ is ignored, but variance misspecification is accounted for in two ways: first using a binomial QMLE with GLM standard errors, and second using the robust Huber-White covariance approach.

To estimate the GLM standard error specification, we first call up the GLM dialog and enter the linear predictor specification:

prate mrate log(totemp) log(totemp)^2 age age^2 sole c

Next, select the Binomial Proportion family, and enter the sample description

@all if mrate<=1

Lastly, we leave the Number of trials edit field at the default value of 1, but correct for heterogeneity by going to the Options page and specifying Pearson Chi-Sq. dispersion estimates. Click on OK to continue.

The resulting estimates correspond to the coefficient estimates and first set of standard errors in Papke and Wooldridge (Table II, column 2):


Dependent Variable: PRATE

Method: Generalized Linear Model (Quadratic Hill Climbing)

Date: 08/12/09 Time: 11:28

Sample: 1 4735 IF MRATE <=1

Included observations: 3784

Family: Binomial Proportion (trials = 1) (quasi-likelihood)

Link: Logit

Dispersion computed using Pearson Chi-Square

Coefficient covariance computed using observed Hessian

Convergence achieved after 8 iterations

Variable	Coefficient	Std. Error	z-Statistic	Prob.


MRATE	1.390080	0.100368	13.84981	0.0000
LOG(TOTEMP)	-1.001875	0.111222	-9.007920	0.0000
LOG(TOTEMP)^2	0.052187	0.007105	7.345551	0.0000
AGE	0.050113	0.008710	5.753136	0.0000
AGE^2	-0.000515	0.000211	-2.444532	0.0145
SOLE	0.007947	0.046785	0.169859	0.8651
C	5.058001	0.426942	11.84704	0.0000


Mean dependent var	0.847769	S.D. dependent var		0.169961
Sum squared resid	92.69516	Quasi-log likelihood		-8075.396
Deviance	765.0353	Deviance statistic		0.202551
Restr. deviance	895.5505	Quasi-LR statistic		680.4838
Prob(Quasi-LR stat)	0.000000	Pearson SSR		724.4200
Pearson statistic	0.191798	Dispersion		0.191798

Papke and Wooldridge offer a detailed analysis of the results (p. 628-629), which we will not duplicate here. We will point out that the estimate of the dispersion (0.191798) taken from the Pearson statistic is far from the restricted value of 1.0.
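The dispersion estimate and the quasi-LR statistic can be recovered from the other entries in the table. The sketch below copies values from the output and assumes the Pearson dispersion is the Pearson SSR divided by the residual degrees of freedom, and that the quasi-LR statistic is the deviance difference scaled by that dispersion; both assumptions reproduce the reported numbers up to rounding.

```python
# Values copied from the estimation output above.
n_obs, n_coef = 3784, 7
pearson_ssr = 724.4200
deviance = 765.0353
restricted_deviance = 895.5505

df_resid = n_obs - n_coef                     # 3777 residual d.f.
dispersion = pearson_ssr / df_resid           # Pearson chi-square dispersion
deviance_stat = deviance / df_resid           # deviance divided by d.f.

# Scaling by the estimated dispersion instead of the fixed value 1.
quasi_lr = (restricted_deviance - deviance) / dispersion

print(dispersion, deviance_stat, quasi_lr)
```

Had the dispersion been held at 1, the unscaled deviance difference would be about 130.5, which shows how much the dispersion estimate matters for inference in this example.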

The results using the QML with GLM standard errors rely on validity of the GLM assumption for the variance given in Equation (27.2), an assumption that may be too restrictive. We may instead estimate the equation without imposing a particular conditional variance specification by computing our estimates using a robust Huber-White sandwich method. Click on Estimate to bring up the equation dialog, select the Options tab, then change the Covariance method from Default to Huber/White. Click on OK to estimate the revised specification:
