Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Эконометрика Учебно-методическое пособие по дисциплине Эконометрика

..pdf
Скачиваний:
3
Добавлен:
15.11.2022
Размер:
1.08 Mб
Скачать

21

Practical use of econometric models has a great importance, especially their adequacy, i.e. according to the real process and the statistical data used in model construction. The quality analysis (verification of model) includes statistical and substantive part. Tests of statistical quality econometric models usually consists of the following steps:

Check the overall quality of the regression equation.

Check the statistical significance of the coefficients of the regression equation.

Checking the preciseness of the model.

Check the properties of the data, the execution of which was assumed in the evaluation of the equation (for example, the conditions of the Gauss - Markov).

Under the substantive component of the quality analysis refers to the examination of the economic meaning of the resulting model and its coefficients.

2.6 ASSESSMENT OF THE STATISTICAL SIGNIFICANCE OF THE MODEL PARAMETERS

The equation of the model defined by the selected source data has the form:

ˆ

a b X (simple linear regression). The model parameters

are random

Y

variables(a and b, calculated according to the sample). Their mathematical expectations is equal to α and β when performing assumptions about the variance i. The estimation of to α and β is the more, the less they spread around , i.e. less than their variance.

S

2

 

S 2 e

,

 

n

 

 

 

 

b

 

xi

 

 

2

 

 

 

X

 

i 1

When checking the quality of the model it is necessary to check existence of a linear relationship between X and Y, i.e. to check the statistical significance of the parameter . This analysis should be done according to the scheme of statistical testing of hypotheses, as already mentioned. Formulated two hypotheses:

H0: b=0 H1: b≠0

It is calculated t-statistics tb b .

Sb

You can prove that it is possible to calculat Sb2 by the formula:

 

 

 

 

 

S

2

 

S 2 e

, (2.12)

 

 

 

 

 

 

n

 

 

 

 

 

 

 

 

 

b

 

xi

 

 

2

 

 

 

 

 

 

 

 

X

 

 

 

 

 

 

 

 

 

i 1

 

 

 

 

 

 

 

n

 

 

 

 

 

 

 

 

 

 

2

 

ei2

 

 

 

 

 

 

 

S

i 1

 

is the estimated residual dispersion (estimation of the dispersion of

e

n

2

 

 

 

 

 

 

 

 

 

 

the errors),

Sb - the standard deviation of the random variable b.

The value of b is a measure of the slope of the regression line. Obviously, the

21

22

larger the variance of the Y values around the regression line (more Se2 ) lead larger (average) error in the determination of the tilt of the regression line. If there is no

variation at all (the εi=0 and hence Se

2 =0), then the linear function is unambiguous

and there is no error in the determination of the parameters.

The denominator value

Sb depends on the range of variation of the variable X.

 

n

 

 

 

The wider value leads bigger

X i

X

2 and smaller error in the estimation of the

 

i 1

 

 

 

tilt line. In addition, increasing the number of observations (ceteris paribus) enlarges

n X i X 2 and, consequently, reduces the magnitude of the error.

i 1

If tb ( the calculated value of t-statistics) will be more theoretical (t b>t), then the

null hypothesis is rejected and the coefficient b is recognized as statistically significant with the selected confidence level (α). In this case, it is possible to construct a confidence interval the β coefficient:

b t Sb b t Sb (2.13)

If the null hypothesis is accepted (b=0), it indicates the absence of a relationship between the explained and factor variables.

On a similar scheme tests the hypothesis of statistical significance of the coefficient a:

 

 

 

n

 

 

n

 

 

 

Se2

X i2

 

 

X i2

 

D(a) S2

 

 

i 1

= S2

 

i 1

(2.14)

 

 

a

 

n

 

 

 

b

 

n

 

 

2

 

 

 

 

n X i X

 

 

 

 

 

 

i 1

 

 

 

 

 

 

 

Sa and Sb is the standard dispersion of random variables a and b.

The dispersion of the free member of the equation is proportional S 2b , so for it

fair so as fair early made explanations about the influence of the scatter Yi around the regression line and the scatter Xi of the standard error.

The more changes the slope of a line drawn through a given point X ,Y , the greater the scatter free member, describing the point of intersection of this line with the y-axis.

In addition, dispersion and standard error of the free member is greater when

greater the average of value X 2 . A small change in the tilt of regression line can cause meaningful change in free member, when module of values X is large. In this case the distance become large from point observations to the y-axis.

For statistically significant parameters of the model it is possible to construct confidence intervals, the value of which forms an interval estimate of the corresponding true model parameters.

22

23

2.7 THE COEFFICIENT OF DETERMINATION. THE STATISTICAL SIGNIFICANCE OF THE COEFFICIENT OF DETERMINATION.

It is desirable to have the indicator measuring how the regression function is determined by the factor (explaining) variables X and a stochastic perturbation ε. The characteristic would evaluate the adequacy of the model or the degree of coherence between the estimated and actual values of Y.

At first glance it seems that the qualitative criteria of assessment could serve a sum of squares of deviations of the actual values from the dependent variable

ˆ

calculated by the estimated equation values Yi . However, this value depends on the

units of the dependent variable Y and the number of observations in the sample, so it isn’t good for evaluation.

The variance of the random variable Y in the sample can be measured using the variation (dispersion):

For analysis of the regression models will carry out the decomposition of this value into components (rule decomposition of the variance).

n

D(Y ) 1n (Yi Y )2

i 1

It is obvious that:

ˆ

ˆ

,

Yi Y (Y Yi ) (Yi Y )

ˆ

 

 

Yi Yi ei (graphical illustration is shown in figure 2.2).

Y

 

 

 

Yi

 

ei

 

ˆ

 

 

Yi

 

 

 

Y

 

 

X

Fig.2.1. Decomposition of deviations from the sample mean

ˆ

ei ,

 

ˆ

 

ˆ

ˆ

So as Yi Yi

D(Y ) D(Y

e) D(Y ) D(e) 2cov(Y ,e)

It is easy to check that

 

ˆ

. Then

we have the following equality,

cov(Y ,e) 0

called the decomposition rule variations:

ˆ

D(Y ) D(Y ) D(e) , (2.15) Here you can write a ratio:

n

 

 

 

 

n

ˆ

(Yi

Y )

2

 

 

(Yi

i 1

 

 

 

 

i 1

 

 

 

 

 

n

ˆ

 

 

Y )

2

 

2

. (2.16)

 

(Yi Yi )

 

 

 

 

 

i 1

 

 

 

The dispersion of actual values Yi around the average measured full sum of

23

24

n

squares TSS = (Yi Y )2 = nD(Y ) is the total deviation (variation).

i 1

 

 

 

 

 

 

n

 

 

 

2

 

ˆ

The amount of the ESS .=

ˆ

=

(Yi Y )

 

nD(Y ) defines the dispersion of

i 1

 

 

 

 

 

 

theoretical values around the middle and is called factorial deviation (variation). It is a measure of dispersion calculated (theoretical) values determined are included in the

equation independent variables Yˆi , so this deviation is called explained.

n

 

 

 

n

2

 

RSS =

(Y Y )2

= e

= nD(e) residual variance. This deviation cannot be

 

i

i

 

 

i 1

 

 

 

i 1

 

 

explained by the correlation dependence between Y and X, hence its name: "the unexplained", or residual variance. It measures the part of the scattering that occurs due to various random factors. Therefore, the closer RSS to zero, the less the actual Y

values deviate from calculated according to equation model values The relation (2.15) we write as

TSS=ESS+RSS.

Divide this value by TSS

TSSESS TSSRSS 1

ˆ

Y .

(2.17)

Value R 2 TSSESS is called the coefficient of determination (a measure of certainty). Otherwise you can also write: R 2 1 TSSRSS

R2 shows the proportion of common variation in the analyzed dependent variable caused by a change of the factor variables. For the case of simple linear regression R2 is equal to the square of the correlation coefficient of variables Y and X( RYX2 ).

Equation (2.17) shows that the lower RSS, the closer R2 to 1 and the better the model. In the General case, the numerical value of the determination coefficient is between zero and one: 0 R2 1.

If R2 = 1, the empirical values of Y are located on the regression line. If the coefficient of determination is equal to zero, then between X and Y there is no correlation and the regression line parallel to the X-axis. Thus, if there is a statistically significant linear relationship between the values X and Y, the coefficient of determination should be close to 1.

The aim of regression analysis is to explain the behavior of the dependent variable Y. In any given sample, Y is relatively low in some observations and relatively high in others. We want to know why. The variations in Y in any sample can be summarized by the sample variance. We should like to be able to account for the size of this variance.[1]

However, you should not place too much emphasis on high value R2, as the coefficient of determination can be close to 1 due to the fact that both values studed { and Y have a pronounced time trend, not associated with their causal dependence. In

24

25

the economy generally, such trend has aggregated indicators (GNP, GDP, income, etc). Therefore, the construction and evaluation of models for time series of volume indices value may be very close to 1, which does not necessarily indicate the presence of significant linear coddelation between the studied indicators.

If the regression equation is based on cross-data, the coefficient of determination can be low even when a quality of model is satisfactory due to the high variations between the individual elements, usually R2 does not exceed 0,7 in such situation. The same holds generally for regression time series, if they haven’t a significant trend. In macroeconomics examples of such dependencies are due to the relative, specific, growth indicators, for example: the dependence of the inflation rate from unemployment; accumulation rates on the value of the interest rate and other.

It is impossible to specify the exact boundary of acceptability for all case immediately. Researchers can be guided by the assessment of the link shown in the following table.

Table 2.1 The assessment of the link based on R2

Values

[0,1-0,3)

[0,3-0,5)

[0,5-0,7)

[0,7-0,9)

[0.9-0,99]

R2

 

 

 

 

 

Degree

weak

moderate

noticeable

high

very high

relationships

 

 

 

 

 

When researcher gets the value of R2 < 0,3, he must re-specification of the model.

In other cases, it is necessary to consider whether included in the model variables absolute or relative, do they have a time trend, sample size, etc.

Next step is assessment of compliance of the received data about the coefficient to determination coefficient in the General totality. It is advisable to check R2 for statistical significance. This process can be implemented by evaluating the statistical significance of the linear correlation coefficient between X and Y according to the scheme of t-testing of hypotheses especially for simple linear models. In this case, the standard error is calculated using the following formula:

SR

 

1 R2

 

n 2

 

 

 

 

2.8 EVALUATION OF THE ACCURACY OF THE MODEL

The actual values of the explained variable differ from the theoretical

(calculated according to model equation) in the value of ˆ . It is possible to

ei Yi Yi

calculate the value in each observation and name the approximation error. Deviations

ˆ

ei Yi Yi are absolute approximation error, but they are not comparable among

themselves (in different models because of mesurments). So, if one observation turned out to be error 5, and in another 10, this does not mean that in this case the first or second model gives the worst result. Therefore, in order scores were

25

26

comparable, considering the relationship of variance to actual values (in percent).

ˆ

 

can be both positive and negative value, when determining the

There ei Yi Yi

 

approximation errors for each observation variances are taken module.

 

 

 

ˆ

 

 

 

 

 

 

 

 

 

Value i =

 

 

Yi Yi

 

 

100 can be considered as a relative approximation error in

 

 

 

 

 

 

 

 

 

 

Yi

the i-th observation. It is useful to determine the average relative approximation error in order to have a General opinion about the accuracy of the model:

 

 

n

 

 

 

ˆ

 

 

 

n

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1

 

Y Y

 

 

1

 

ei

 

 

 

 

 

 

 

i

 

i

 

100

 

 

 

 

 

100. (2.18)

 

 

 

 

 

 

n

 

 

 

Y

 

 

n

 

 

Y

 

 

 

 

i 1

 

 

i

 

 

 

 

i 1

 

i

 

 

 

It is possible to define approximation errors using the standard deviation:

 

n

 

1

ei2

 

i 1

 

Y

100 . (2.20)

n

The value of the approximation errors up to 15-20% allows to conclude about a sufficient the accuracy of the model.

Model improvements should not be measured in terms of accuracy gains. It may be going too far to say that accuracy is irrelevant, but caution is advised when using accuracy in the evaluation of predictive models.

2.9 POINT AND INTERVAL ESTIMATES OF THE DEPENDENT VARIABLE

One of the main tasks of econometric modeling is to predict values of the dependent variable for specific values of the explanatory variables.

Let the equation of the model defined by the sampling data has the form:

ˆ

a b X . . (2.21)

 

Y

 

Parameters a and b contain random errors. In the dependent variable

ˆ

Y ( X 0 )

(was found using equation model at some point) also contains random errors and, therefore, defines a conditional mean value of Y at the point X0 (the point estimate). Let denote it YX .

It can be shown that the variance of this quantity is calculated by the formula:

 

 

 

 

 

 

 

 

 

 

 

 

 

2

2

 

1

 

( X

0

 

X

)2

 

SYX

= DY ( X ) Se

 

 

 

 

 

 

 

 

 

(2.22)

 

n

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

n

 

( Xi X )2

 

 

 

 

 

 

i 1

 

 

 

 

 

 

 

Provided

the

requirements of normal distribution of the residuals ε random

variable YX

has

a normal distribution, and statistics t

YX

M X

(Y )

has a t-

 

SYX

 

 

 

 

 

 

 

distribution with number degrees of freedom (n-2). Then for the conditional mathematical expectation M X (Y ) can be found confidence interval:

26

27

YX t SYX M X (Y ) YX t SYX , (2.23)

SYX SY2X is the standard error of the conditional average of the dependent

variable.

The formula (2.22) and (2.23) shows that the width of the confidence interval depends on the value X0: if X 0 X it is minimal. If the distance average and

analyzed X is growing, the average width of the confidence interval increases (figure 2.2).

Y

Confidence interval for M(YX(X0)

ˆ

a bX

Y

 

 

X0

x

X

Fig. 2.2 The confidence interval for explained variable

Figure 2.2 shows confidence region, which determines the location of the model the regression line, i.e. the conditional mathematical expectation, but doesn’t give separate values of the dependent variable, which may vary around the average.

Sometimes we can be more interested in finding confidence intervals for for some individual values of Y, that we associate with X0, than the average values of Y.

In a slightly different form, this problem can be formulated as: when receiving a new pair of observations (X0, Y0) to find out whether it is still based on equation

ˆ

formulated, i.e. equals Y0 to Y0 ( value obtained by substitution in the equation of the model X0).

ˆ

. We formulate two hypotheses:

Let's consider the value z Y0 Y0

H0 : z = 0; H1 : z ≠ 0

It is possible to calculate that the estimated variance of the magnitude of z is

calculated by the formula:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2

 

1

 

( X

0

 

X

)2

 

 

 

 

 

 

S 2z = Se 1

 

 

 

 

 

 

 

 

 

 

 

(2.24)

 

 

n

n

 

 

 

 

 

 

 

 

 

 

 

 

( Xi X )

2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

i 1

 

 

 

 

 

 

 

 

 

 

 

 

Because the variable z Y0

 

 

 

ˆ

 

Y0 is a linear combination of normally distributed

variables, it

also

has

a

normal

 

distribution. Therefore, the value t

z

has a t-

 

 

Sz

distribution with (n-2) degrees of freedom.

27

28

If the calculated value of t-statistic is more than the tabblated, the null hypothesis is rejected, i.e. with the selected level of confidence we can assert that the

ˆ

value Y0 is statistically significantly (significantly) different from the values Y0 found

using equation model and the considered pair (X0, Y0) does not correspond to the relationship

For individual values of the variable Y can be constructed confidence interval:

ˆ

t

sz Y0

ˆ

t

sz . (2.25)

Y0

Y0

Obviously, this interval is wider than for conventional medium (X0) with the same level of trust and includes the confidence interval for the conditional mean.

2.10 THE USE OF ECONOMETRIC MODELS FOR FORECASTING The forecasting process, based on an econometric model, breaks down into the

following stages:

 

• selection and building of model;

 

• evaluation of the constructed model;

 

• forecast (point and interval).

 

 

 

 

 

To obtain point forecasts put the analyzed value X0

in the equation model and

find Y ( X 0 ) . This is the point forecast.

 

However, the probability of Y is the found point

ˆ

Y0 is almost zero, therefore

there is a need for projections of a "plug" through confidence intervals - interval forecast. Interval forecast can be constructed for average and individual values.

The analyzed value X 0 may lie both within sample and out of it. At the same time, if X 0 is outside the sample and very different from the average level, width of

the confidence interval increases significantly. This demonstrates the vagueness of the forecast.

Data (based on the forecast) should be critically conceptualized with a meaningful point of view.

2.11 THE EXAMPLE OF CONSTRUCTION AND QUALITY ASSURANCE SIMPLE REGRESSION MODEL

It is known a statistical data (table. 2.3) about the change of the output Y in leading to changes in costs of labor X.

Assignments:

1.To build a model of the relationship between these indicators (X and Y).

2.To check the quality of the constructed model

A.To check the overall quality built model

B.To assess the statistical significance of the model parameters. 3. To estimate the parameters of the true model.

4. To assess the accuracy of the model.

5. To construct point and interval forecast for the cost of labor in the amount of

28

29

39 thousand rubles

Table 2.3 Output and labor coast.

 

Output

20

22

25

28

30

34

35

44

50

56

 

Costs of

8

10

18

22

34

46

50

54

58

60

 

labor

 

 

 

 

 

 

 

 

 

 

1. Model building

 

 

 

 

 

 

 

 

 

 

Figure 2.3 Flowchart the change of the output Y in leading to changes in costs of labor X.

Based on the analysis of the the studied indicators there is a

flowchart (Fig. 2.3) we can suppose that between linear relationship: Y X . Estimate the

parameters of this model based on the method of least squares. The estimated equation models: Y X

Make the table to calculate the parameters and characteristics of the model. Table 2.4 Supporting calculations

 

 

 

2

 

 

 

 

 

 

 

 

2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

N

X

Y

X

XY

Y

 

 

)2

(Y Y )2

e

( X

X

)

 

е

 

 

ei

 

 

(Y Y

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

у

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

i

 

1

8

20

64

160

19

219

256

 

1

762

 

1

 

0,1

 

2

10

22

100

220

20

164

219

 

4

655

 

2

 

0,1

 

3

18

25

324

450

25

96

 

100

 

0

310

 

0

 

0,0

 

4

22

28

484

616

27

46

 

58

 

1

185

 

1

 

0,0

 

5

34

30

1156

1020

34

23

 

0

 

 

19

3

 

 

 

-4

 

0,1

 

6

46

34

2116

1564

42

1

 

46

 

58

108

 

-8

 

0,2

 

7

50

35

2500

1750

44

0

 

85

 

81

207

 

-9

 

0,3

 

8

54

44

2916

2376

46

85

 

135

 

6

339

 

-2

 

0,1

 

9

58

50

3364

2900

49

231

196

 

1

502

 

1

 

0,0

 

 

 

 

 

 

 

29

 

 

 

 

 

 

 

 

 

 

 

 

 

30

10

56

60

3136

3360

48

635

164

154

416

12

0,2

 

356

348

16160

14416

354

TSS=

ESS=

RSS=

3486

 

1,1

 

 

 

 

 

 

1500

1258

325

 

 

 

 

 

 

 

 

 

 

ср.

36

35

1616

1442

35

 

 

 

 

 

 

We write the system of normal equations and find the solution.

 

 

 

 

 

 

 

a 36 b 35

 

 

 

 

 

 

a 1616 b 1442

 

 

 

 

 

36

 

 

 

 

 

 

 

 

 

 

 

 

 

 

b

 

XY

 

X

Y

 

 

1442 35* 36

0,6 ;

 

 

 

 

 

 

 

 

 

 

1616 362

X 2 (

 

 

 

)2

 

 

 

X

 

 

a 35 0,6 * 36 14

ˆ

14+0,6*X.

Received the following equation model: Y

2. Quality equation regression models

A. To evaluate the overall quality of the model (adequacy).

Calculate the coefficient of determination.

R

2

ESS

=

1258

0,8

 

indicates a high degree of correlation between Y and X.

 

 

 

 

 

 

 

 

TSS

1500

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Will

 

 

check

 

on

the

 

statistical

significance

 

of

the correlation

coefficient

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

R= R

 

R2 . .

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Find the theoretical value of t-test at 95% confidence level and the number of

degrees of freedom

n 2 10 2 8 from table of t - distribution (see Appendix) :

=2,306.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Find the estimated value of a statistical t-test:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1 R 2

 

 

 

 

 

 

 

 

 

 

 

 

t

 

 

 

R

 

 

,

S

 

 

 

 

 

1 0,08

0,34 then t

 

 

 

0,8

2,63, more

than the

R

 

 

R

 

 

 

 

 

R

 

 

 

 

 

 

 

n 2

 

 

 

 

 

 

 

 

S R

 

 

 

 

 

 

8

 

 

0,34

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

tabulated value of t-statistic.

Therefore, the linear correlation coefficient and determination coefficient of the linear model are statistically significant. Since the coefficient of determination is characterized by high connectivity factor and dependent variables, the model can be considered as adequate.

B. Checking the statistical significance of the model parameters.

Since the model building is based on the sample data, it is necessary to test the statistical significance of the model parameters a and b.

For parameter b: S 2

 

Se2

 

40,6

0,01;

n

 

 

b

 

 

 

3486

 

 

(xi

x)2

 

i 1

30

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]