Assignment #5 (2 points)
Open the database SLEEP.RAW.
Estimate the following regression. Interpret R2, the statistical significance of the coefficients, and the coefficients themselves.
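The regression equation itself is not reproduced in this answer. Judging by the coefficients interpreted below, it is presumably the one sketched here; the dependent-variable name sleep is an assumption, and the data are assumed to be already loaded into Stata.

```stata
* Assumption: the dependent variable is called sleep (minutes of sleep per week)
* and the data set is already in memory with the variable names used below.
regress sleep totwrk educ age agesq yngkid male
```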
R2 = 0.1228, which means that about 12% of the variation in sleep duration is explained by the regressors (working time, years of schooling, age, age squared, presence of a young child, and gender); the remaining 88% is due to other factors that we did not include in the model.
The coefficients on totwrk and male are statistically significant at both the 5% and 1% significance levels; the coefficient on educ is significant only at the 5% level; the coefficients on age, agesq and yngkid are insignificant.
Interpretation:
B1: each additional minute worked per week decreases sleep by 0.16 minutes per week, other things equal.
B2: each additional year of schooling decreases sleep by 11.71 minutes per week, other things equal.
B3: each additional year of age decreases sleep by 8.7 minutes per week; because agesq is also in the model, this is only the linear part of the age effect.
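B4: the coefficient on agesq makes the effect of age nonlinear. Since B3 is negative, a turning point exists only if B4 is positive; in that case sleep decreases with age up to age = -B3/(2*B4) and increases afterwards.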
B5: yngkid is a dummy variable, so a person with a young child (yngkid = 1) sleeps 0.023 minutes per week less than a person without one, other things equal; the effect is practically zero.
B6: men sleep 87.75 minutes per week more than women, other things equal.
B0: for a woman with zero working time, zero years of schooling, age zero and no young child, predicted sleep is 3840 minutes per week; the intercept has little practical meaning on its own.
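The turning point of the age effect mentioned under B4 can be computed from the stored coefficients. A minimal sketch, assuming it is run right after the main model has been estimated with regress:

```stata
* _b[age] and _b[agesq] are the estimates of B3 and B4 from the last regression.
* The quadratic b3*age + b4*age^2 has its turning point at -b3/(2*b4).
display "turning point (years of age): " -_b[age]/(2*_b[agesq])
```

Because the coefficients on age and agesq are statistically insignificant, the resulting turning point should be read with caution.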
Do a visual analysis of heteroscedasticity. What can you say just from the graph?
Just from the graph we can suspect homoscedasticity, since the spread of the residuals is approximately the same across all fitted values.
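The graph itself is not included in the answer. One common way to produce a residuals-versus-fitted plot in Stata, assuming the main model is the most recent estimation, is:

```stata
* Residuals against fitted values, with a horizontal reference line at zero.
* Must be run right after the regress command for the main model.
rvfplot, yline(0)
```

A funnel or fan shape in this plot would point to heteroscedasticity; a band of roughly constant width would not.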
Apply the full White test for heteroscedasticity (see (6) in the seminar handouts). What would you conclude?
predict e, residuals
gen totwrk2 = totwrk^2
gen educ2 = educ^2
gen agesq2 = agesq^2
* yngkid^2 and male^2 are omitted because squaring a dummy reproduces the dummy;
* age^2 is omitted because it is identical to agesq (perfect collinearity).
gen totwrk_educ = totwrk*educ
gen totwrk_age = totwrk*age
gen totwrk_agesq = totwrk*agesq
gen totwrk_yngkid = totwrk*yngkid
gen totwrk_male = totwrk*male
gen educ_age = educ*age
gen educ_agesq = educ*agesq
gen educ_yngkid = educ*yngkid
gen educ_male = educ*male
gen age_agesq = age*agesq
gen age_yngkid = age*yngkid
gen age_male = age*male
gen agesq_yngkid = agesq*yngkid
gen agesq_male = agesq*male
gen yngkid_male = yngkid*male
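The auxiliary regression that produces the R2 used in the test is not shown in the answer. A minimal sketch, assuming the variables generated above exist and using e2 as a hypothetical name for the squared residuals:

```stata
* e2 is a hypothetical name for the squared residuals.
gen e2 = e^2
* Auxiliary regression on levels, squares and cross products
* (do-file syntax: /// continues a command on the next line).
regress e2 totwrk educ age agesq yngkid male ///
    totwrk2 educ2 agesq2 ///
    totwrk_educ totwrk_age totwrk_agesq totwrk_yngkid totwrk_male ///
    educ_age educ_agesq educ_yngkid educ_male ///
    age_agesq age_yngkid age_male ///
    agesq_yngkid agesq_male yngkid_male
* White statistic n*R2, taken from the stored results of this regression.
display e(N)*e(r2)
```

The regressor list contains 6 levels, 3 squares and 15 cross products, 24 terms in total, which is the number of degrees of freedom of the test as long as Stata drops none of them for collinearity.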
H0: the errors are homoscedastic.
To test this hypothesis we compare nR2 from the auxiliary regression of the squared residuals with the chi-squared critical value.
nR2=
. display 706*0.0226
15.9556
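The critical value can be calculated in Excel from the chi-squared distribution with 24 degrees of freedom (the number of regressors in the auxiliary regression) at the 5% level.
36.415 > 22.592, so we fail to reject H0 of homoscedasticity.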
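The critical value can also be obtained without leaving Stata. A short sketch, assuming 24 degrees of freedom (6 levels, 3 squares and 15 cross products in the auxiliary regression) and a 5% significance level:

```stata
* 5% critical value of the chi-squared distribution with 24 degrees of freedom
* (about 36.415).
display invchi2tail(24, 0.05)
* p-value for a given test statistic, here the nR2 value displayed above.
display chi2tail(24, 15.9556)
```

A p-value above 0.05 leads to the same decision as a statistic below the critical value.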
Try the Stata version of the White test using the commands.
p-value = 0.1477 > 0.05, so we fail to reject H0 of homoscedasticity; there is no evidence of heteroscedasticity.
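The commands themselves are not listed in the answer. One built-in possibility, which may differ from the command used in the seminar, is:

```stata
* Run right after the regress command for the main model.
* Stata builds the squares and cross products itself and drops collinear terms.
estat imtest, white
```

Because Stata constructs the auxiliary regressors automatically, its statistic and degrees of freedom can differ slightly from the manual version.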
Apply Breusch-Pagan test for heteroscedasticity. How strong is the evidence of heteroscedasticity with this test?
p-value = 0.1515 > 0.05, so we fail to reject H0; the evidence of heteroscedasticity is weak.
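The command behind this p-value is not shown. A sketch of the built-in Breusch-Pagan test, assuming the main model is the most recent estimation; which variant produced the reported p-value is not known:

```stata
* Default version: tests against the fitted values and assumes normal errors.
estat hettest
* Variant using all right-hand-side variables instead of the fitted values.
estat hettest, rhs
* Same, without the normality assumption.
estat hettest, rhs iid
```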
What are the ways to remedy the problem of heteroscedasticity?
1. Redefining the variables in order to reduce the variance of observations with extreme values, e.g. by taking logarithms or by scaling some variables.
2. Weighted Least Squares (WLS).
3. Heteroscedasticity-corrected robust standard errors.
Apply one of them. What would you conclude, did the model improve or not?
The easiest remedies to apply are weighted least squares or a regression with robust standard errors.
The model did not improve, because there was no heteroscedasticity problem to begin with.
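The robust-standard-error version can be sketched as follows; the dependent-variable name sleep is an assumption, as the regression equation is not reproduced in this answer:

```stata
* The coefficients are the same as under OLS;
* only the standard errors, t statistics and p-values change.
regress sleep totwrk educ age agesq yngkid male, vce(robust)
```

If the robust standard errors are close to the conventional ones, this agrees with the conclusion of the tests above.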
