- brief contents
- contents
- preface
- acknowledgments
- about this book
- What’s new in the second edition
- Who should read this book
- Roadmap
- Advice for data miners
- Code examples
- Code conventions
- Author Online
- About the author
- about the cover illustration
- 1 Introduction to R
- 1.2 Obtaining and installing R
- 1.3 Working with R
- 1.3.1 Getting started
- 1.3.2 Getting help
- 1.3.3 The workspace
- 1.3.4 Input and output
- 1.4 Packages
- 1.4.1 What are packages?
- 1.4.2 Installing a package
- 1.4.3 Loading a package
- 1.4.4 Learning about a package
- 1.5 Batch processing
- 1.6 Using output as input: reusing results
- 1.7 Working with large datasets
- 1.8 Working through an example
- 1.9 Summary
- 2 Creating a dataset
- 2.1 Understanding datasets
- 2.2 Data structures
- 2.2.1 Vectors
- 2.2.2 Matrices
- 2.2.3 Arrays
- 2.2.4 Data frames
- 2.2.5 Factors
- 2.2.6 Lists
- 2.3 Data input
- 2.3.1 Entering data from the keyboard
- 2.3.2 Importing data from a delimited text file
- 2.3.3 Importing data from Excel
- 2.3.4 Importing data from XML
- 2.3.5 Importing data from the web
- 2.3.6 Importing data from SPSS
- 2.3.7 Importing data from SAS
- 2.3.8 Importing data from Stata
- 2.3.9 Importing data from NetCDF
- 2.3.10 Importing data from HDF5
- 2.3.11 Accessing database management systems (DBMSs)
- 2.3.12 Importing data via Stat/Transfer
- 2.4 Annotating datasets
- 2.4.1 Variable labels
- 2.4.2 Value labels
- 2.5 Useful functions for working with data objects
- 2.6 Summary
- 3 Getting started with graphs
- 3.1 Working with graphs
- 3.2 A simple example
- 3.3 Graphical parameters
- 3.3.1 Symbols and lines
- 3.3.2 Colors
- 3.3.3 Text characteristics
- 3.3.4 Graph and margin dimensions
- 3.4 Adding text, customized axes, and legends
- 3.4.1 Titles
- 3.4.2 Axes
- 3.4.3 Reference lines
- 3.4.4 Legend
- 3.4.5 Text annotations
- 3.4.6 Math annotations
- 3.5 Combining graphs
- 3.5.1 Creating a figure arrangement with fine control
- 3.6 Summary
- 4 Basic data management
- 4.1 A working example
- 4.2 Creating new variables
- 4.3 Recoding variables
- 4.4 Renaming variables
- 4.5 Missing values
- 4.5.1 Recoding values to missing
- 4.5.2 Excluding missing values from analyses
- 4.6 Date values
- 4.6.1 Converting dates to character variables
- 4.6.2 Going further
- 4.7 Type conversions
- 4.8 Sorting data
- 4.9 Merging datasets
- 4.9.1 Adding columns to a data frame
- 4.9.2 Adding rows to a data frame
- 4.10 Subsetting datasets
- 4.10.1 Selecting (keeping) variables
- 4.10.2 Excluding (dropping) variables
- 4.10.3 Selecting observations
- 4.10.4 The subset() function
- 4.10.5 Random samples
- 4.11 Using SQL statements to manipulate data frames
- 4.12 Summary
- 5 Advanced data management
- 5.2 Numerical and character functions
- 5.2.1 Mathematical functions
- 5.2.2 Statistical functions
- 5.2.3 Probability functions
- 5.2.4 Character functions
- 5.2.5 Other useful functions
- 5.2.6 Applying functions to matrices and data frames
- 5.3 A solution for the data-management challenge
- 5.4 Control flow
- 5.4.1 Repetition and looping
- 5.4.2 Conditional execution
- 5.5 User-written functions
- 5.6 Aggregation and reshaping
- 5.6.1 Transpose
- 5.6.2 Aggregating data
- 5.6.3 The reshape2 package
- 5.7 Summary
- 6 Basic graphs
- 6.1 Bar plots
- 6.1.1 Simple bar plots
- 6.1.2 Stacked and grouped bar plots
- 6.1.3 Mean bar plots
- 6.1.4 Tweaking bar plots
- 6.1.5 Spinograms
- 6.2 Pie charts
- 6.3 Histograms
- 6.4 Kernel density plots
- 6.5 Box plots
- 6.5.1 Using parallel box plots to compare groups
- 6.5.2 Violin plots
- 6.6 Dot plots
- 6.7 Summary
- 7 Basic statistics
- 7.1 Descriptive statistics
- 7.1.1 A menagerie of methods
- 7.1.2 Even more methods
- 7.1.3 Descriptive statistics by group
- 7.1.4 Additional methods by group
- 7.1.5 Visualizing results
- 7.2 Frequency and contingency tables
- 7.2.1 Generating frequency tables
- 7.2.2 Tests of independence
- 7.2.3 Measures of association
- 7.2.4 Visualizing results
- 7.3 Correlations
- 7.3.1 Types of correlations
- 7.3.2 Testing correlations for significance
- 7.3.3 Visualizing correlations
- 7.4 T-tests
- 7.4.3 When there are more than two groups
- 7.5 Nonparametric tests of group differences
- 7.5.1 Comparing two groups
- 7.5.2 Comparing more than two groups
- 7.6 Visualizing group differences
- 7.7 Summary
- 8 Regression
- 8.1 The many faces of regression
- 8.1.1 Scenarios for using OLS regression
- 8.1.2 What you need to know
- 8.2 OLS regression
- 8.2.1 Fitting regression models with lm()
- 8.2.2 Simple linear regression
- 8.2.3 Polynomial regression
- 8.2.4 Multiple linear regression
- 8.2.5 Multiple linear regression with interactions
- 8.3 Regression diagnostics
- 8.3.1 A typical approach
- 8.3.2 An enhanced approach
- 8.3.3 Global validation of linear model assumption
- 8.3.4 Multicollinearity
- 8.4 Unusual observations
- 8.4.1 Outliers
- 8.4.3 Influential observations
- 8.5 Corrective measures
- 8.5.1 Deleting observations
- 8.5.2 Transforming variables
- 8.5.3 Adding or deleting variables
- 8.5.4 Trying a different approach
- 8.6 Selecting the “best” regression model
- 8.6.1 Comparing models
- 8.6.2 Variable selection
- 8.7 Taking the analysis further
- 8.7.1 Cross-validation
- 8.7.2 Relative importance
- 8.8 Summary
- 9 Analysis of variance
- 9.1 A crash course on terminology
- 9.2 Fitting ANOVA models
- 9.2.1 The aov() function
- 9.2.2 The order of formula terms
- 9.3.1 Multiple comparisons
- 9.3.2 Assessing test assumptions
- 9.4 One-way ANCOVA
- 9.4.1 Assessing test assumptions
- 9.4.2 Visualizing the results
- 9.6 Repeated measures ANOVA
- 9.7 Multivariate analysis of variance (MANOVA)
- 9.7.1 Assessing test assumptions
- 9.7.2 Robust MANOVA
- 9.8 ANOVA as regression
- 9.9 Summary
- 10 Power analysis
- 10.1 A quick review of hypothesis testing
- 10.2 Implementing power analysis with the pwr package
- 10.2.1 t-tests
- 10.2.2 ANOVA
- 10.2.3 Correlations
- 10.2.4 Linear models
- 10.2.5 Tests of proportions
- 10.2.7 Choosing an appropriate effect size in novel situations
- 10.3 Creating power analysis plots
- 10.4 Other packages
- 10.5 Summary
- 11 Intermediate graphs
- 11.1 Scatter plots
- 11.1.3 3D scatter plots
- 11.1.4 Spinning 3D scatter plots
- 11.1.5 Bubble plots
- 11.2 Line charts
- 11.3 Corrgrams
- 11.4 Mosaic plots
- 11.5 Summary
- 12 Resampling statistics and bootstrapping
- 12.1 Permutation tests
- 12.2 Permutation tests with the coin package
- 12.2.2 Independence in contingency tables
- 12.2.3 Independence between numeric variables
- 12.2.5 Going further
- 12.3 Permutation tests with the lmPerm package
- 12.3.1 Simple and polynomial regression
- 12.3.2 Multiple regression
- 12.4 Additional comments on permutation tests
- 12.5 Bootstrapping
- 12.6 Bootstrapping with the boot package
- 12.6.1 Bootstrapping a single statistic
- 12.6.2 Bootstrapping several statistics
- 12.7 Summary
- 13 Generalized linear models
- 13.1 Generalized linear models and the glm() function
- 13.1.1 The glm() function
- 13.1.2 Supporting functions
- 13.1.3 Model fit and regression diagnostics
- 13.2 Logistic regression
- 13.2.1 Interpreting the model parameters
- 13.2.2 Assessing the impact of predictors on the probability of an outcome
- 13.2.3 Overdispersion
- 13.2.4 Extensions
- 13.3 Poisson regression
- 13.3.1 Interpreting the model parameters
- 13.3.2 Overdispersion
- 13.3.3 Extensions
- 13.4 Summary
- 14 Principal components and factor analysis
- 14.1 Principal components and factor analysis in R
- 14.2 Principal components
- 14.2.1 Selecting the number of components to extract
- 14.2.2 Extracting principal components
- 14.2.3 Rotating principal components
- 14.2.4 Obtaining principal components scores
- 14.3 Exploratory factor analysis
- 14.3.1 Deciding how many common factors to extract
- 14.3.2 Extracting common factors
- 14.3.3 Rotating factors
- 14.3.4 Factor scores
- 14.4 Other latent variable models
- 14.5 Summary
- 15 Time series
- 15.1 Creating a time-series object in R
- 15.2 Smoothing and seasonal decomposition
- 15.2.1 Smoothing with simple moving averages
- 15.2.2 Seasonal decomposition
- 15.3 Exponential forecasting models
- 15.3.1 Simple exponential smoothing
- 15.3.3 The ets() function and automated forecasting
- 15.4 ARIMA forecasting models
- 15.4.1 Prerequisite concepts
- 15.4.2 ARMA and ARIMA models
- 15.4.3 Automated ARIMA forecasting
- 15.5 Going further
- 15.6 Summary
- 16 Cluster analysis
- 16.1 Common steps in cluster analysis
- 16.2 Calculating distances
- 16.3 Hierarchical cluster analysis
- 16.4 Partitioning cluster analysis
- 16.4.2 Partitioning around medoids
- 16.5 Avoiding nonexistent clusters
- 16.6 Summary
- 17 Classification
- 17.1 Preparing the data
- 17.2 Logistic regression
- 17.3 Decision trees
- 17.3.1 Classical decision trees
- 17.3.2 Conditional inference trees
- 17.4 Random forests
- 17.5 Support vector machines
- 17.5.1 Tuning an SVM
- 17.6 Choosing a best predictive solution
- 17.7 Using the rattle package for data mining
- 17.8 Summary
- 18 Advanced methods for missing data
- 18.1 Steps in dealing with missing data
- 18.2 Identifying missing values
- 18.3 Exploring missing-values patterns
- 18.3.1 Tabulating missing values
- 18.3.2 Exploring missing data visually
- 18.3.3 Using correlations to explore missing values
- 18.4 Understanding the sources and impact of missing data
- 18.5 Rational approaches for dealing with incomplete data
- 18.6 Complete-case analysis (listwise deletion)
- 18.7 Multiple imputation
- 18.8 Other approaches to missing data
- 18.8.1 Pairwise deletion
- 18.8.2 Simple (nonstochastic) imputation
- 18.9 Summary
- 19 Advanced graphics with ggplot2
- 19.1 The four graphics systems in R
- 19.2 An introduction to the ggplot2 package
- 19.3 Specifying the plot type with geoms
- 19.4 Grouping
- 19.5 Faceting
- 19.6 Adding smoothed lines
- 19.7 Modifying the appearance of ggplot2 graphs
- 19.7.1 Axes
- 19.7.2 Legends
- 19.7.3 Scales
- 19.7.4 Themes
- 19.7.5 Multiple graphs per page
- 19.8 Saving graphs
- 19.9 Summary
- 20 Advanced programming
- 20.1 A review of the language
- 20.1.1 Data types
- 20.1.2 Control structures
- 20.1.3 Creating functions
- 20.2 Working with environments
- 20.3 Object-oriented programming
- 20.3.1 Generic functions
- 20.3.2 Limitations of the S3 model
- 20.4 Writing efficient code
- 20.5 Debugging
- 20.5.1 Common sources of errors
- 20.5.2 Debugging tools
- 20.5.3 Session options that support debugging
- 20.6 Going further
- 20.7 Summary
- 21 Creating a package
- 21.1 Nonparametric analysis and the npar package
- 21.1.1 Comparing groups with the npar package
- 21.2 Developing the package
- 21.2.1 Computing the statistics
- 21.2.2 Printing the results
- 21.2.3 Summarizing the results
- 21.2.4 Plotting the results
- 21.2.5 Adding sample data to the package
- 21.3 Creating the package documentation
- 21.4 Building the package
- 21.5 Going further
- 21.6 Summary
- 22 Creating dynamic reports
- 22.1 A template approach to reports
- 22.2 Creating dynamic reports with R and Markdown
- 22.3 Creating dynamic reports with R and LaTeX
- 22.4 Creating dynamic reports with R and Open Document
- 22.5 Creating dynamic reports with R and Microsoft Word
- 22.6 Summary
- afterword Into the rabbit hole
- appendix A Graphical user interfaces
- appendix B Customizing the startup environment
- appendix C Exporting data from R
- Delimited text file
- Excel spreadsheet
- Statistical applications
- appendix D Matrix algebra in R
- appendix E Packages used in this book
- appendix F Working with large datasets
- F.1 Efficient programming
- F.2 Storing data outside of RAM
- F.3 Analytic packages for out-of-memory data
- F.4 Comprehensive solutions for working with enormous datasets
- appendix G Updating an R installation
- G.1 Automated installation (Windows only)
- G.2 Manual installation (Windows and Mac OS X)
- G.3 Updating an R installation (Linux)
- references
- index
- Symbols
- Numerics
- 23.1 The lattice package
- 23.2 Conditioning variables
- 23.3 Panel functions
- 23.4 Grouping variables
- 23.5 Graphic parameters
- 23.6 Customizing plot strips
- 23.7 Page arrangement
- 23.8 Going further
> women$weight
 [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

> fitted(fit)
     1      2      3      4      5      6      7      8      9     10     11
112.58 116.03 119.48 122.93 126.38 129.83 133.28 136.73 140.18 143.63 147.08
    12     13     14     15
150.53 153.98 157.43 160.88

> residuals(fit)
     1      2      3      4      5      6      7      8      9     10     11
  2.42   0.97   0.52   0.07  -0.38  -0.83  -1.28  -1.73  -1.18  -1.63  -1.08
    12     13     14     15
 -0.53   0.02   1.57   3.12

> plot(women$height, women$weight,
       xlab="Height (in inches)",
       ylab="Weight (in pounds)")
> abline(fit)
From the output, you see that the prediction equation is

$$\widehat{\text{Weight}} = -87.52 + 3.45 \times \text{Height}$$
Because a height of 0 is impossible, you wouldn’t try to give a physical interpretation to the intercept; it merely becomes an adjustment constant. From the Pr(>|t|) column, you see that the regression coefficient (3.45) is significantly different from zero (p < 0.001), indicating an expected increase of 3.45 pounds of weight for every 1-inch increase in height. The multiple R-squared (0.991) indicates that the model accounts for 99.1% of the variance in weights. The multiple R-squared is also the squared correlation between the actual and predicted values (that is, $R^2 = r^2_{\hat{Y}Y}$). The residual standard error (1.53 pounds) can be thought of as the average error in predicting weight from height using this model. The F statistic tests whether the predictor variables, taken together, predict the response variable above chance levels. Because there’s only one predictor variable in simple regression, in this example the F test is equivalent to the t-test for the regression coefficient for height.
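If you’d rather extract these quantities programmatically than read them off the printout, they’re available from the fitted-model object. The following is a minimal sketch; it assumes fit was created with lm(weight ~ height, data=women), as in the listing above:

fit <- lm(weight ~ height, data=women)
coef(fit)                          # intercept (-87.52) and slope (3.45)
summary(fit)$r.squared             # multiple R-squared: 0.991
summary(fit)$sigma                 # residual standard error: 1.53
cor(women$weight, fitted(fit))^2   # squared correlation = R-squared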
For demonstration purposes, we’ve printed out the actual, predicted, and residual values. Evidently, the largest residuals occur for low and high heights, which can also be seen in the plot (figure 8.1).
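To locate the largest residuals programmatically rather than by eye, a one-line sketch (again assuming fit from the listing above):

sort(abs(residuals(fit)), decreasing=TRUE)[1:3]   # three largest absolute residuals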
The plot suggests that you might be able to improve on the prediction by using a line with one bend. For example, a model of the form $Y_i = \beta_0 + \beta_1 X + \beta_2 X^2$ may provide a better fit to the data. Polynomial regression allows you to predict a response variable from an explanatory variable, where the form of the relationship is an nth-degree polynomial.

8.2.3 Polynomial regression

The plot in figure 8.1 suggests that you might be able to improve your prediction using a regression with a quadratic term (that is, $X^2$). You can fit a quadratic equation using the statement
fit2 <- lm(weight ~ height + I(height^2), data=women)
The new term I(height^2) requires explanation. height^2 adds a height-squared term to the prediction equation. The I() function treats the contents within the parentheses as a regular R expression, so the exponentiation is carried out arithmetically. You need this because the ^ operator has a special meaning in formulas that you don’t want to invoke here (see table 8.2).
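A quick illustration of the difference (a hedged sketch, not one of the book’s listings): in formula syntax, ^ denotes crossing of terms, so height^2 without I() silently collapses to height.

lm(weight ~ height^2, data=women)     # ^ means crossing here, so this is
                                      # just weight ~ height
lm(weight ~ I(height^2), data=women)  # I() protects ^, giving a genuine
                                      # height-squared predictor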
The following listing shows the results of fitting the quadratic equation.
Listing 8.2 Polynomial regression
> fit2 <- lm(weight ~ height + I(height^2), data=women)
> summary(fit2)

Call:
lm(formula = weight ~ height + I(height^2), data = women)

Residuals:
     Min       1Q   Median       3Q      Max
 -0.5094  -0.2961  -0.0094   0.2862   0.5971

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 261.87818   25.19677   10.39  2.4e-07 ***
height       -7.34832    0.77769   -9.45  6.6e-07 ***
I(height^2)   0.08306    0.00598   13.89  9.3e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.384 on 12 degrees of freedom
Multiple R-squared: 0.999,  Adjusted R-squared: 0.999
F-statistic: 1.14e+04 on 2 and 12 DF,  p-value: <2e-16

> plot(women$height, women$weight,
       xlab="Height (in inches)",
       ylab="Weight (in lbs)")
> lines(women$height, fitted(fit2))
From this new analysis, the prediction equation is

$$\widehat{\text{Weight}} = 261.88 - 7.35 \times \text{Height} + 0.083 \times \text{Height}^2$$
and both regression coefficients are significant at the p < 0.0001 level. The amount of variance accounted for has increased to 99.9%. The significance of the squared term (t = 13.89, p < 0.001) suggests that including the quadratic term improves the model fit. If you look at the plot of fit2 (figure 8.2), you can see that the curve does indeed provide a better fit.
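A more formal check, sketched here on the assumption that fit still holds the simple linear model from the earlier listing, is to compare the two nested models with anova(); a small p-value for the comparison indicates that the quadratic term significantly reduces the residual error.

fit  <- lm(weight ~ height, data=women)                # simple linear model
fit2 <- lm(weight ~ height + I(height^2), data=women)  # quadratic model
anova(fit, fit2)   # F test comparing the nested models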
Figure 8.2 Quadratic regression for weight predicted by height
Linear vs. nonlinear models
Note that this polynomial equation still fits under the rubric of linear regression. It’s linear because the equation involves a weighted sum of predictor variables (height and height-squared in this case). Even a model such as
$$Y_i = \beta_0 + \beta_1 \times \log(X_1) + \beta_2 \times \sin(X_2)$$
would be considered a linear model (linear in terms of the parameters) and fit with the formula
Y ~ log(X1) + sin(X2)
In contrast, here’s an example of a truly nonlinear model:
$$Y_i = \beta_0 + \beta_1 e^{X/\beta_2}$$
Nonlinear models of this form can be fit with the nls() function.
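For illustration only (the data frame df, the starting values, and the parameter names below are invented for this sketch, not taken from the text), such a model could be fit along these lines:

set.seed(1234)
df <- data.frame(X = 1:20)
df$Y <- 2 + 3 * exp(df$X / 10) + rnorm(20, sd = 0.5)   # simulated data

# Fit Y = b0 + b1 * exp(X / b2); nls() requires starting values
nfit <- nls(Y ~ b0 + b1 * exp(X / b2), data = df,
            start = list(b0 = 1, b1 = 2, b2 = 8))
summary(nfit)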
In general, an nth-degree polynomial produces a curve with n-1 bends. To fit a cubic polynomial, you’d use
fit3 <- lm(weight ~ height + I(height^2) + I(height^3), data=women)
Although higher polynomials are possible, I’ve rarely found that terms higher than cubic are necessary.
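An alternative worth knowing about, though not used in this chapter’s listings, is the poly() helper, which generates the polynomial terms for you:

# Cubic fit via poly(); raw=TRUE reproduces the I(height^2) + I(height^3)
# parameterization, while the default (orthogonal polynomials) yields the
# same fitted values with less-correlated predictors.
fit3b <- lm(weight ~ poly(height, 3, raw=TRUE), data=women)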
Before we move on, I should mention that the scatterplot() function in the car package provides a simple and convenient method of plotting a bivariate relationship. The code
library(car)
scatterplot(weight ~ height, data=women,
            spread=FALSE, smoother.args=list(lty=2), pch=19,
            main="Women Age 30-39",
            xlab="Height (inches)",
            ylab="Weight (lbs.)")
produces the graph in figure 8.3.
Figure 8.3 Scatter plot of height by weight, with linear and smoothed fits, and marginal box plots
This enhanced plot provides the scatter plot of weight with height, box plots for each variable in their respective margins, the linear line of best fit, and a smoothed (loess) fit line. The spread=FALSE option suppresses spread and asymmetry information. The smoother.args=list(lty=2) option specifies that the loess fit be rendered as a dashed line. The pch=19 option displays points as filled circles (the default is open circles). You can tell at a glance that the two variables are roughly symmetric and that a curved line will fit the data points better than a straight line.
8.2.4 Multiple linear regression

When there’s more than one predictor variable, simple linear regression becomes multiple linear regression, and the analysis grows more involved. Technically, polynomial regression is a special case of multiple regression. Quadratic regression has two predictors ($X$ and $X^2$), and cubic regression has three predictors ($X$, $X^2$, and $X^3$). Let’s look at a more general example.
We’ll use the state.x77 dataset included in the base R installation for this example. Suppose you want to explore the relationship between a state’s murder rate and other characteristics of the state, including population, illiteracy rate, average income, and frost levels (mean number of days below freezing).
Because the lm() function requires a data frame (and the state.x77 dataset is contained in a matrix), you can simplify your life with the following code:
states <- as.data.frame(state.x77[,c("Murder", "Population", "Illiteracy", "Income", "Frost")])
This code creates a data frame called states, containing the variables you’re interested in. You’ll use this new data frame for the remainder of the chapter.
A good first step in multiple regression is to examine the relationships among the variables two at a time. The bivariate correlations are provided by the cor() function, and scatter plots are generated from the scatterplotMatrix() function in the car package (see the following listing and figure 8.4).
Listing 8.3 Examining bivariate relationships
> states <- as.data.frame(state.x77[,c("Murder", "Population", "Illiteracy", "Income", "Frost")])
> cor(states)
            Murder Population Illiteracy Income Frost
Murder        1.00       0.34       0.70  -0.23 -0.54
Population    0.34       1.00       0.11   0.21 -0.33
Illiteracy    0.70       0.11       1.00  -0.44 -0.67
Income       -0.23       0.21      -0.44   1.00  0.23
Frost        -0.54      -0.33      -0.67   0.23  1.00

> library(car)
> scatterplotMatrix(states, spread=FALSE, smoother.args=list(lty=2),
                    main="Scatter Plot Matrix")
Figure 8.4 Scatter plot matrix of dependent and independent variables for the states data, including linear and smoothed fits, and marginal distributions (kernel-density plots and rug plots)
By default, the scatterplotMatrix() function provides scatter plots of the variables with each other in the off-diagonals and superimposes smoothed (loess) and linear fit lines on these plots. The principal diagonal contains density and rug plots for each variable.
You can see that murder rate may be bimodal and that each of the predictor variables is skewed to some extent. Murder rates rise with population and illiteracy, and they fall with higher income levels and frost. At the same time, colder states have lower illiteracy rates and population and higher incomes.
Now let’s fit the multiple regression model with the lm() function (see the following listing).
Listing 8.4 Multiple linear regression
> states <- as.data.frame(state.x77[,c("Murder", "Population", "Illiteracy", "Income", "Frost")])
> fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data=states)
> summary(fit)