- brief contents
- contents
- preface
- acknowledgments
- about this book
- What’s new in the second edition
- Who should read this book
- Roadmap
- Advice for data miners
- Code examples
- Code conventions
- Author Online
- About the author
- about the cover illustration
- 1 Introduction to R
- 1.2 Obtaining and installing R
- 1.3 Working with R
- 1.3.1 Getting started
- 1.3.2 Getting help
- 1.3.3 The workspace
- 1.3.4 Input and output
- 1.4 Packages
- 1.4.1 What are packages?
- 1.4.2 Installing a package
- 1.4.3 Loading a package
- 1.4.4 Learning about a package
- 1.5 Batch processing
- 1.6 Using output as input: reusing results
- 1.7 Working with large datasets
- 1.8 Working through an example
- 1.9 Summary
- 2 Creating a dataset
- 2.1 Understanding datasets
- 2.2 Data structures
- 2.2.1 Vectors
- 2.2.2 Matrices
- 2.2.3 Arrays
- 2.2.4 Data frames
- 2.2.5 Factors
- 2.2.6 Lists
- 2.3 Data input
- 2.3.1 Entering data from the keyboard
- 2.3.2 Importing data from a delimited text file
- 2.3.3 Importing data from Excel
- 2.3.4 Importing data from XML
- 2.3.5 Importing data from the web
- 2.3.6 Importing data from SPSS
- 2.3.7 Importing data from SAS
- 2.3.8 Importing data from Stata
- 2.3.9 Importing data from NetCDF
- 2.3.10 Importing data from HDF5
- 2.3.11 Accessing database management systems (DBMSs)
- 2.3.12 Importing data via Stat/Transfer
- 2.4 Annotating datasets
- 2.4.1 Variable labels
- 2.4.2 Value labels
- 2.5 Useful functions for working with data objects
- 2.6 Summary
- 3 Getting started with graphs
- 3.1 Working with graphs
- 3.2 A simple example
- 3.3 Graphical parameters
- 3.3.1 Symbols and lines
- 3.3.2 Colors
- 3.3.3 Text characteristics
- 3.3.4 Graph and margin dimensions
- 3.4 Adding text, customized axes, and legends
- 3.4.1 Titles
- 3.4.2 Axes
- 3.4.3 Reference lines
- 3.4.4 Legend
- 3.4.5 Text annotations
- 3.4.6 Math annotations
- 3.5 Combining graphs
- 3.5.1 Creating a figure arrangement with fine control
- 3.6 Summary
- 4 Basic data management
- 4.1 A working example
- 4.2 Creating new variables
- 4.3 Recoding variables
- 4.4 Renaming variables
- 4.5 Missing values
- 4.5.1 Recoding values to missing
- 4.5.2 Excluding missing values from analyses
- 4.6 Date values
- 4.6.1 Converting dates to character variables
- 4.6.2 Going further
- 4.7 Type conversions
- 4.8 Sorting data
- 4.9 Merging datasets
- 4.9.1 Adding columns to a data frame
- 4.9.2 Adding rows to a data frame
- 4.10 Subsetting datasets
- 4.10.1 Selecting (keeping) variables
- 4.10.2 Excluding (dropping) variables
- 4.10.3 Selecting observations
- 4.10.4 The subset() function
- 4.10.5 Random samples
- 4.11 Using SQL statements to manipulate data frames
- 4.12 Summary
- 5 Advanced data management
- 5.2 Numerical and character functions
- 5.2.1 Mathematical functions
- 5.2.2 Statistical functions
- 5.2.3 Probability functions
- 5.2.4 Character functions
- 5.2.5 Other useful functions
- 5.2.6 Applying functions to matrices and data frames
- 5.3 A solution for the data-management challenge
- 5.4 Control flow
- 5.4.1 Repetition and looping
- 5.4.2 Conditional execution
- 5.5 User-written functions
- 5.6 Aggregation and reshaping
- 5.6.1 Transpose
- 5.6.2 Aggregating data
- 5.6.3 The reshape2 package
- 5.7 Summary
- 6 Basic graphs
- 6.1 Bar plots
- 6.1.1 Simple bar plots
- 6.1.2 Stacked and grouped bar plots
- 6.1.3 Mean bar plots
- 6.1.4 Tweaking bar plots
- 6.1.5 Spinograms
- 6.2 Pie charts
- 6.3 Histograms
- 6.4 Kernel density plots
- 6.5 Box plots
- 6.5.1 Using parallel box plots to compare groups
- 6.5.2 Violin plots
- 6.6 Dot plots
- 6.7 Summary
- 7 Basic statistics
- 7.1 Descriptive statistics
- 7.1.1 A menagerie of methods
- 7.1.2 Even more methods
- 7.1.3 Descriptive statistics by group
- 7.1.4 Additional methods by group
- 7.1.5 Visualizing results
- 7.2 Frequency and contingency tables
- 7.2.1 Generating frequency tables
- 7.2.2 Tests of independence
- 7.2.3 Measures of association
- 7.2.4 Visualizing results
- 7.3 Correlations
- 7.3.1 Types of correlations
- 7.3.2 Testing correlations for significance
- 7.3.3 Visualizing correlations
- 7.4 T-tests
- 7.4.3 When there are more than two groups
- 7.5 Nonparametric tests of group differences
- 7.5.1 Comparing two groups
- 7.5.2 Comparing more than two groups
- 7.6 Visualizing group differences
- 7.7 Summary
- 8 Regression
- 8.1 The many faces of regression
- 8.1.1 Scenarios for using OLS regression
- 8.1.2 What you need to know
- 8.2 OLS regression
- 8.2.1 Fitting regression models with lm()
- 8.2.2 Simple linear regression
- 8.2.3 Polynomial regression
- 8.2.4 Multiple linear regression
- 8.2.5 Multiple linear regression with interactions
- 8.3 Regression diagnostics
- 8.3.1 A typical approach
- 8.3.2 An enhanced approach
- 8.3.3 Global validation of linear model assumption
- 8.3.4 Multicollinearity
- 8.4 Unusual observations
- 8.4.1 Outliers
- 8.4.3 Influential observations
- 8.5 Corrective measures
- 8.5.1 Deleting observations
- 8.5.2 Transforming variables
- 8.5.3 Adding or deleting variables
- 8.5.4 Trying a different approach
- 8.6 Selecting the “best” regression model
- 8.6.1 Comparing models
- 8.6.2 Variable selection
- 8.7 Taking the analysis further
- 8.7.1 Cross-validation
- 8.7.2 Relative importance
- 8.8 Summary
- 9 Analysis of variance
- 9.1 A crash course on terminology
- 9.2 Fitting ANOVA models
- 9.2.1 The aov() function
- 9.2.2 The order of formula terms
- 9.3.1 Multiple comparisons
- 9.3.2 Assessing test assumptions
- 9.4 One-way ANCOVA
- 9.4.1 Assessing test assumptions
- 9.4.2 Visualizing the results
- 9.6 Repeated measures ANOVA
- 9.7 Multivariate analysis of variance (MANOVA)
- 9.7.1 Assessing test assumptions
- 9.7.2 Robust MANOVA
- 9.8 ANOVA as regression
- 9.9 Summary
- 10 Power analysis
- 10.1 A quick review of hypothesis testing
- 10.2 Implementing power analysis with the pwr package
- 10.2.1 t-tests
- 10.2.2 ANOVA
- 10.2.3 Correlations
- 10.2.4 Linear models
- 10.2.5 Tests of proportions
- 10.2.7 Choosing an appropriate effect size in novel situations
- 10.3 Creating power analysis plots
- 10.4 Other packages
- 10.5 Summary
- 11 Intermediate graphs
- 11.1 Scatter plots
- 11.1.3 3D scatter plots
- 11.1.4 Spinning 3D scatter plots
- 11.1.5 Bubble plots
- 11.2 Line charts
- 11.3 Corrgrams
- 11.4 Mosaic plots
- 11.5 Summary
- 12 Resampling statistics and bootstrapping
- 12.1 Permutation tests
- 12.2 Permutation tests with the coin package
- 12.2.2 Independence in contingency tables
- 12.2.3 Independence between numeric variables
- 12.2.5 Going further
- 12.3 Permutation tests with the lmPerm package
- 12.3.1 Simple and polynomial regression
- 12.3.2 Multiple regression
- 12.4 Additional comments on permutation tests
- 12.5 Bootstrapping
- 12.6 Bootstrapping with the boot package
- 12.6.1 Bootstrapping a single statistic
- 12.6.2 Bootstrapping several statistics
- 12.7 Summary
- 13 Generalized linear models
- 13.1 Generalized linear models and the glm() function
- 13.1.1 The glm() function
- 13.1.2 Supporting functions
- 13.1.3 Model fit and regression diagnostics
- 13.2 Logistic regression
- 13.2.1 Interpreting the model parameters
- 13.2.2 Assessing the impact of predictors on the probability of an outcome
- 13.2.3 Overdispersion
- 13.2.4 Extensions
- 13.3 Poisson regression
- 13.3.1 Interpreting the model parameters
- 13.3.2 Overdispersion
- 13.3.3 Extensions
- 13.4 Summary
- 14 Principal components and factor analysis
- 14.1 Principal components and factor analysis in R
- 14.2 Principal components
- 14.2.1 Selecting the number of components to extract
- 14.2.2 Extracting principal components
- 14.2.3 Rotating principal components
- 14.2.4 Obtaining principal components scores
- 14.3 Exploratory factor analysis
- 14.3.1 Deciding how many common factors to extract
- 14.3.2 Extracting common factors
- 14.3.3 Rotating factors
- 14.3.4 Factor scores
- 14.4 Other latent variable models
- 14.5 Summary
- 15 Time series
- 15.1 Creating a time-series object in R
- 15.2 Smoothing and seasonal decomposition
- 15.2.1 Smoothing with simple moving averages
- 15.2.2 Seasonal decomposition
- 15.3 Exponential forecasting models
- 15.3.1 Simple exponential smoothing
- 15.3.3 The ets() function and automated forecasting
- 15.4 ARIMA forecasting models
- 15.4.1 Prerequisite concepts
- 15.4.2 ARMA and ARIMA models
- 15.4.3 Automated ARIMA forecasting
- 15.5 Going further
- 15.6 Summary
- 16 Cluster analysis
- 16.1 Common steps in cluster analysis
- 16.2 Calculating distances
- 16.3 Hierarchical cluster analysis
- 16.4 Partitioning cluster analysis
- 16.4.2 Partitioning around medoids
- 16.5 Avoiding nonexistent clusters
- 16.6 Summary
- 17 Classification
- 17.1 Preparing the data
- 17.2 Logistic regression
- 17.3 Decision trees
- 17.3.1 Classical decision trees
- 17.3.2 Conditional inference trees
- 17.4 Random forests
- 17.5 Support vector machines
- 17.5.1 Tuning an SVM
- 17.6 Choosing a best predictive solution
- 17.7 Using the rattle package for data mining
- 17.8 Summary
- 18 Advanced methods for missing data
- 18.1 Steps in dealing with missing data
- 18.2 Identifying missing values
- 18.3 Exploring missing-values patterns
- 18.3.1 Tabulating missing values
- 18.3.2 Exploring missing data visually
- 18.3.3 Using correlations to explore missing values
- 18.4 Understanding the sources and impact of missing data
- 18.5 Rational approaches for dealing with incomplete data
- 18.6 Complete-case analysis (listwise deletion)
- 18.7 Multiple imputation
- 18.8 Other approaches to missing data
- 18.8.1 Pairwise deletion
- 18.8.2 Simple (nonstochastic) imputation
- 18.9 Summary
- 19 Advanced graphics with ggplot2
- 19.1 The four graphics systems in R
- 19.2 An introduction to the ggplot2 package
- 19.3 Specifying the plot type with geoms
- 19.4 Grouping
- 19.5 Faceting
- 19.6 Adding smoothed lines
- 19.7 Modifying the appearance of ggplot2 graphs
- 19.7.1 Axes
- 19.7.2 Legends
- 19.7.3 Scales
- 19.7.4 Themes
- 19.7.5 Multiple graphs per page
- 19.8 Saving graphs
- 19.9 Summary
- 20 Advanced programming
- 20.1 A review of the language
- 20.1.1 Data types
- 20.1.2 Control structures
- 20.1.3 Creating functions
- 20.2 Working with environments
- 20.3 Object-oriented programming
- 20.3.1 Generic functions
- 20.3.2 Limitations of the S3 model
- 20.4 Writing efficient code
- 20.5 Debugging
- 20.5.1 Common sources of errors
- 20.5.2 Debugging tools
- 20.5.3 Session options that support debugging
- 20.6 Going further
- 20.7 Summary
- 21 Creating a package
- 21.1 Nonparametric analysis and the npar package
- 21.1.1 Comparing groups with the npar package
- 21.2 Developing the package
- 21.2.1 Computing the statistics
- 21.2.2 Printing the results
- 21.2.3 Summarizing the results
- 21.2.4 Plotting the results
- 21.2.5 Adding sample data to the package
- 21.3 Creating the package documentation
- 21.4 Building the package
- 21.5 Going further
- 21.6 Summary
- 22 Creating dynamic reports
- 22.1 A template approach to reports
- 22.2 Creating dynamic reports with R and Markdown
- 22.3 Creating dynamic reports with R and LaTeX
- 22.4 Creating dynamic reports with R and Open Document
- 22.5 Creating dynamic reports with R and Microsoft Word
- 22.6 Summary
- 23 Advanced graphics with the lattice package
- 23.1 The lattice package
- 23.2 Conditioning variables
- 23.3 Panel functions
- 23.4 Grouping variables
- 23.5 Graphic parameters
- 23.6 Customizing plot strips
- 23.7 Page arrangement
- 23.8 Going further
- afterword Into the rabbit hole
- appendix A Graphical user interfaces
- appendix B Customizing the startup environment
- appendix C Exporting data from R
- Delimited text file
- Excel spreadsheet
- Statistical applications
- appendix D Matrix algebra in R
- appendix E Packages used in this book
- appendix F Working with large datasets
- F.1 Efficient programming
- F.2 Storing data outside of RAM
- F.3 Analytic packages for out-of-memory data
- F.4 Comprehensive solutions for working with enormous datasets
- appendix G Updating an R installation
- G.1 Automated installation (Windows only)
- G.2 Manual installation (Windows and Mac OS X)
- G.3 Updating an R installation (Linux)
- references
- index
From a theoretical point of view, the analysis will help answer such questions as these:
- What’s the relationship between exercise duration and calories burned? Is it linear or curvilinear? For example, does exercise have less impact on the number of calories burned after a certain point?
- How does effort (the percentage of time at the target heart rate, the average walking speed) factor in?
- Are these relationships the same for young and old, male and female, heavy and slim?
From a practical point of view, the analysis will help answer such questions as the following:
- How many calories can a 30-year-old man with a BMI of 28.7 expect to burn if he walks for 45 minutes at an average speed of 4 miles per hour and stays within his target heart rate 80% of the time?
- What’s the minimum number of variables you need to collect in order to accurately predict the number of calories a person will burn when walking?
- How accurate will your prediction tend to be?
Because regression analysis plays such a central role in modern statistics, we’ll cover it in some depth in this chapter. First, we’ll look at how to fit and interpret regression models. Next, we’ll review a set of techniques for identifying potential problems with these models and how to deal with them. Third, we’ll explore the issue of variable selection. Of all the potential predictor variables available, how do you decide which ones to include in your final model? Fourth, we’ll address the question of generalizability. How well will your model work when you apply it in the real world? Finally, we’ll consider relative importance. Of all the predictors in your model, which one is the most important, the second most important, and the least important?
As you can see, we’re covering a lot of ground. Effective regression analysis is an interactive, holistic process with many steps, and it involves more than a little skill. Rather than break it up into multiple chapters, I’ve opted to present this topic in a single chapter in order to capture this flavor. As a result, this will be the longest and most involved chapter in the book. Stick with it to the end, and you’ll have all the tools you need to tackle a wide variety of research questions. Promise!
8.1 The many faces of regression
The term regression can be confusing because there are so many specialized varieties (see table 8.1). In addition, R has powerful and comprehensive features for fitting regression models, and the abundance of options can be confusing as well. For example, in 2005, Vito Ricci created a list of more than 205 functions in R that are used to generate regression analyses (http://mng.bz/NJhu).
Table 8.1 Varieties of regression analysis

| Type of regression | Typical use |
| --- | --- |
| Simple linear | Predicting a quantitative response variable from a quantitative explanatory variable. |
| Polynomial | Predicting a quantitative response variable from a quantitative explanatory variable, where the relationship is modeled as an nth order polynomial. |
| Multiple linear | Predicting a quantitative response variable from two or more explanatory variables. |
| Multilevel | Predicting a response variable from data that have a hierarchical structure (for example, students within classrooms within schools). Also called hierarchical, nested, or mixed models. |
| Multivariate | Predicting more than one response variable from one or more explanatory variables. |
| Logistic | Predicting a categorical response variable from one or more explanatory variables. |
| Poisson | Predicting a response variable representing counts from one or more explanatory variables. |
| Cox proportional hazards | Predicting time to an event (death, failure, relapse) from one or more explanatory variables. |
| Time-series | Modeling time-series data with correlated errors. |
| Nonlinear | Predicting a quantitative response variable from one or more explanatory variables, where the form of the model is nonlinear. |
| Nonparametric | Predicting a quantitative response variable from one or more explanatory variables, where the form of the model is derived from the data and not specified a priori. |
| Robust | Predicting a quantitative response variable from one or more explanatory variables using an approach that’s resistant to the effect of influential observations. |
In this chapter, we’ll focus on regression methods that fall under the rubric of ordinary least squares (OLS) regression, including simple linear regression, polynomial regression, and multiple linear regression. OLS regression is the most common variety of statistical analysis today. Other types of regression models (including logistic regression and Poisson regression) will be covered in chapter 13.
8.1.1 Scenarios for using OLS regression
In OLS regression, a quantitative dependent variable is predicted from a weighted sum of predictor variables, where the weights are parameters estimated from the data. Let’s take a look at a concrete example (no pun intended), loosely adapted from Fwa (2006).
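Before turning to that example, it may help to see the model in equation form. What follows is a standard statement of the OLS prediction equation and the least-squares criterion; the notation is generic and mine, not quoted from this chapter:

$$
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki}, \qquad i = 1, \ldots, n
$$

where $\hat{Y}_i$ is the predicted value of the response for observation $i$, the $X_{ji}$ are the predictor values, and the $\hat{\beta}_j$ are the estimated weights (regression coefficients). “Least squares” refers to how those weights are chosen: they minimize the sum of squared residuals,

$$
\sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2 .
$$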
An engineer wants to identify the most important factors related to bridge deterioration (such as age, traffic volume, bridge design, construction materials and methods, construction quality, and weather conditions) and determine the mathematical form of these relationships. She collects data on each of these variables from a representative sample of bridges and models the data using OLS regression.
The approach is highly interactive. She fits a series of models, checks their compliance with underlying statistical assumptions, explores any unexpected or aberrant findings, and finally chooses the “best” model from among many possible models (a workflow sketched in code after the list below). If successful, the results will help her to
- Focus on important variables, by determining which of the many collected variables are useful in predicting bridge deterioration, along with their relative importance.
- Look for bridges that are likely to be in trouble, by providing an equation that can be used to predict bridge deterioration for new cases (where the values of the predictor variables are known, but the degree of bridge deterioration isn’t).
- Take advantage of serendipity, by identifying unusual bridges. If she finds that some bridges deteriorate much faster or slower than predicted by the model, a study of these outliers may yield important findings that could help her to understand the mechanisms involved in bridge deterioration.
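Here is a minimal sketch of that fit–check–refit cycle in R. The data frame `bridges`, its variables, and the simulated values are all hypothetical inventions for illustration, not taken from Fwa (2006):

```r
# Simulated stand-in for the engineer's bridge data (purely illustrative)
set.seed(1234)
bridges <- data.frame(
  age     = runif(100, 1, 80),        # bridge age in years (hypothetical)
  traffic = runif(100, 100, 50000)    # average daily traffic (hypothetical)
)
bridges$deterioration <- 2 + 0.5 * bridges$age +
  0.0001 * bridges$traffic + rnorm(100, sd = 5)

# Step 1: fit a candidate model
fit <- lm(deterioration ~ age + traffic, data = bridges)
summary(fit)                # coefficients, R-squared, overall F test

# Step 2: check compliance with OLS assumptions
par(mfrow = c(2, 2))
plot(fit)                   # residual, Q-Q, scale-location, and leverage plots

# Step 3: explore aberrant findings and refit, for example with a squared term
fit2 <- update(fit, . ~ . + I(age^2))

# Step 4: choose among the candidate models
anova(fit, fit2)            # nested-model comparison
AIC(fit, fit2)              # information criteria
```

The loop from step 2 back to step 1 is the “highly interactive” part: each diagnostic plot suggests a revision, and each revision gets the same scrutiny.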
Bridges may hold no interest for you. I’m a clinical psychologist and statistician, and I know next to nothing about civil engineering. But the general principles apply to an amazingly wide selection of problems in the physical, biological, and social sciences. Each of the following questions could also be addressed using an OLS approach:
- What’s the relationship between surface stream salinity and paved road surface area (Montgomery, 2007)?
- What aspects of a user’s experience contribute to the overuse of massively multiplayer online role-playing games (MMORPGs) (Hsu, Wen, & Wu, 2009)?
- Which qualities of an educational environment are most strongly related to higher student achievement scores?
- What’s the form of the relationship between blood pressure, salt intake, and age? Is it the same for men and women?
- What’s the impact of stadiums and professional sports on metropolitan area development (Baade & Dye, 1990)?
- What factors account for interstate differences in the price of beer (Culbertson & Bradford, 1991)? (That one got your attention!)
Our primary limitation is our ability to formulate an interesting question, devise a useful response variable to measure, and gather appropriate data.
8.1.2 What you need to know
For the remainder of this chapter, I’ll describe how to use R functions to fit OLS regression models, evaluate the fit, test assumptions, and select among competing models.
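As a preview of that work, here is a minimal, self-contained sketch of fitting a simple OLS model with lm(). It uses the women dataset that ships with base R (height and weight of 15 American women); the specific code and comments are my illustration of the pattern, not a verbatim excerpt from section 8.2:

```r
# Simple linear regression: predict weight from height
# using the built-in women dataset
fit <- lm(weight ~ height, data = women)
summary(fit)          # coefficients, standard errors, R-squared

fitted(fit)           # predicted weights for the observed heights
residuals(fit)        # observed minus predicted weights

plot(women$height, women$weight,
     xlab = "Height (in inches)",
     ylab = "Weight (in pounds)")
abline(fit)           # add the fitted regression line
```

The same pattern — fit with lm(), inspect with summary(), probe with plot() — carries through the multiple-regression and diagnostic material in sections 8.2 and 8.3.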