
Use R!

Advisors:

Robert Gentleman · Kurt Hornik · Giovanni Parmigiani


Albert: Bayesian Computation with R

Bivand/Pebesma/Gómez-Rubio: Applied Spatial Data Analysis with R

Claude: Morphometrics with R

Cook/Swayne: Interactive and Dynamic Graphics for Data Analysis: With R and GGobi

Hahne/Huber/Gentleman/Falcon: Bioconductor Case Studies

Kleiber/Zeileis: Applied Econometrics with R

Nason: Wavelet Methods in Statistics with R

Paradis: Analysis of Phylogenetics and Evolution with R

Peng/Dominici: Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health

Pfaff: Analysis of Integrated and Cointegrated Time Series with R, 2nd edition

Sarkar: Lattice: Multivariate Data Visualization with R

Spector: Data Manipulation with R

Christian Kleiber · Achim Zeileis

Applied Econometrics with R


Christian Kleiber
Universität Basel
WWZ, Department of Statistics and Econometrics
Petersgraben 51
CH-4051 Basel
Switzerland
Christian.Kleiber@unibas.ch

Achim Zeileis
Wirtschaftsuniversität Wien
Department of Statistics and Mathematics
Augasse 2–6
A-1090 Wien
Austria
Achim.Zeileis@wu-wien.ac.at

Series Editors

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., M2-B876
PO Box 19024, Seattle, Washington 98102-1024
USA

Kurt Hornik
Department of Statistics and Mathematics
Wirtschaftsuniversität Wien
Augasse 2–6
A-1090 Wien
Austria

Giovanni Parmigiani
The Sidney Kimmel Comprehensive Cancer Center
at Johns Hopkins University
550 North Broadway
Baltimore, MD 21205-2011
USA

ISBN: 978-0-387-77316-2
e-ISBN: 978-0-387-77318-6
DOI: 10.1007/978-0-387-77318-6

Library of Congress Control Number: 2008934356

© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com

Preface

R is a language and environment for data analysis and graphics. It may be considered an implementation of S, an award-winning language originally developed at Bell Laboratories starting in the late 1970s. The R project was initiated by Robert Gentleman and Ross Ihaka at the University of Auckland, New Zealand, in the early 1990s, and has been developed by an international team since mid-1997.

Historically, econometricians have favored other computing environments, some of which have fallen by the wayside, and also a variety of packages with canned routines. We believe that R has great potential in econometrics, both for research and for teaching. There are at least three reasons for this: (1) R is mostly platform independent and runs on Microsoft Windows, the Mac family of operating systems, and various flavors of Unix/Linux, and also on some more exotic platforms. (2) R is free software that can be downloaded and installed at no cost from a family of mirror sites around the globe, the Comprehensive R Archive Network (CRAN); hence students can easily install it on their own machines. (3) R is open-source software, so that the full source code is available and can be inspected to understand what it really does, learn from it, and modify and extend it. We also like to think that platform independence and the open-source philosophy make R an ideal environment for reproducible econometric research.

This book provides an introduction to econometric computing with R; it is not an econometrics textbook. Preferably, readers have taken an introductory econometrics course before, though not necessarily one that makes heavy use of matrices. However, we do assume that readers are somewhat familiar with matrix notation, specifically matrix representations of regression models. Thus, we hope the book might be suitable as a “second book” for a course with sufficient emphasis on applications and practical issues at the intermediate or beginning graduate level. We also hope that it will be useful to professional economists and econometricians who wish to learn R. We cover linear regression models for cross-section and time series data, the common nonlinear models of microeconometrics such as logit, probit, and tobit models, and regression models for count data. In addition, we provide a chapter on programming, including simulations, optimization, and an introduction to Sweave(), an environment that allows integration of text and code in a single document, thereby greatly facilitating reproducible research. (In fact, the entire book was written using Sweave() technology.)

We feel that students should be introduced to challenging data sets as early as possible. We therefore use a number of data sets from the data archives of leading applied econometrics journals such as the Journal of Applied Econometrics and the Journal of Business & Economic Statistics. Some of these have been used in recent textbooks, among them Baltagi (2002), Davidson and MacKinnon (2004), Greene (2003), Stock and Watson (2007), and Verbeek (2004). In addition, we provide all further data sets from Baltagi (2002), Franses (1998), Greene (2003), and Stock and Watson (2007), as well as selected data sets from additional sources, in an R package called AER that accompanies this book. It is available from the CRAN servers at http://CRAN.R-project.org/ and also contains all the code used in the following chapters. These data sets are suitable for illustrating a wide variety of topics, among them wage equations, growth regressions, dynamic regressions and time series models, hedonic regressions, the demand for health care, or labor force participation, to mention a few.

In our view, applied econometrics suffers from an underuse of graphics, even though graphics are one of the strengths of the R system for statistical computing and graphics. Therefore, we decided to make liberal use of graphical displays throughout, some of which are perhaps not well known.

The publisher asked for a compact treatment; however, because R has mainly been developed by statisticians, we need to briefly discuss a number of statistical concepts that, for historical reasons, are not widely used among econometricians. These include factors and generalized linear models, the latter in connection with microeconometrics. We also provide a chapter on R basics (notably data structures, graphics, and basic aspects of programming) to keep the book self-contained.

The production of the book

The entire book was typeset by the authors using LATEX and R’s Sweave() tools. Specifically, the final manuscript was compiled using R version 2.7.0, AER version 0.9-0, and the most current version (as of 2008-05-28) of all other CRAN packages that AER depends on (or suggests). The first author started under Microsoft Windows XP Pro, but thanks to a case of theft he switched to Mac OS X along the way. The second author used Debian GNU/Linux throughout. Thus, we can confidently assert that the book is fully reproducible, for the version given above, on the most important (single-user) platforms.


Settings and appearance

R is mainly run at its default settings; however, we found it convenient to employ a few minor modifications invoked by

R> options(prompt="R> ", digits=4, show.signif.stars=FALSE)

This replaces the standard R prompt > by the more evocative R>. For compactness, digits = 4 reduces the number of digits shown when printing numbers from the default of 7. Note that this does not reduce the precision with which these numbers are internally processed and stored. In addition, R by default displays one to three stars to indicate the significance of p values in model summaries at conventional levels. This is disabled by setting show.signif.stars = FALSE.
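To illustrate what these settings do, here is a minimal sketch in R. It assumes a reasonably recent R version. The built-in cars data set and the variable names old, x, and fit are illustrative choices, not taken from the book. The sketch saves the previous options so they can be restored, prints pi with four significant digits while showing that the stored value is unchanged, and fits a small regression whose summary then omits the significance stars.

```r
# Apply the settings used throughout the book; options() invisibly returns
# the previous values, which we keep so they can be restored at the end.
old <- options(prompt = "R> ", digits = 4, show.signif.stars = FALSE)

x <- pi
print(x)                    # [1] 3.142  (only 4 significant digits shown)
print(sprintf("%.10f", x))  # [1] "3.1415926536"  (full precision still stored)
print(identical(x, pi))     # TRUE: printing did not round the object itself

# Model summaries no longer show significance stars.
# (cars ships with base R and is used here purely as an illustration.)
fit <- lm(dist ~ speed, data = cars)
print(summary(fit))

# Restore the previous settings.
options(old)
```

Restoring with options(old) keeps the change local to the snippet. In an interactive session that follows the book, you would simply leave the settings in place.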

Typographical conventions

We use a typewriter font for all code; additionally, function names are followed by parentheses, as in plot(), and class names (a concept that is explained in Chapters 1 and 2) are displayed as in “lm”. Furthermore, boldface is used for package names, as in AER.

Acknowledgments

This book would not exist without R itself, and thus we thank the R Development Core Team for their continuing efforts to provide an outstanding piece of open-source software, as well as all the R users and developers supporting these efforts. In particular, we are indebted to all those R package authors whose packages we employ in the course of this book.

Several anonymous reviewers provided valuable feedback on earlier drafts. In addition, we are grateful to Rob J. Hyndman, Roger Koenker, and Jeffrey S. Racine for particularly detailed comments and helpful discussions. On the technical side, we are indebted to Torsten Hothorn and Uwe Ligges for advice on and infrastructure for automated production of the book. Regarding the accompanying package AER, we are grateful to Badi H. Baltagi, Philip Hans Franses, William H. Greene, James H. Stock, and Mark W. Watson for permitting us to include all the data sets from their textbooks (namely Baltagi 2002; Franses 1998; Greene 2003; Stock and Watson 2007). We also thank Inga Diedenhofen and Markus Hertrich for preparing some of these data in R format. Finally, we thank John Kimmel, our editor at Springer, for his patience and encouragement in guiding the preparation and production of this book. Needless to say, we are responsible for the remaining shortcomings.

May, 2008

Christian Kleiber, Basel


Achim Zeileis, Wien

Contents

Preface

1 Introduction
1.1 An Introductory R Session
1.2 Getting Started
1.3 Working with R
1.4 Getting Help
1.5 The Development Model
1.6 A Brief History of R

2 Basics
2.1 R as a Calculator
2.2 Matrix Operations
2.3 R as a Programming Language
2.4 Formulas
2.5 Data Management in R
2.6 Object Orientation
2.7 R Graphics
2.8 Exploratory Data Analysis with R
2.9 Exercises

3 Linear Regression
3.1 Simple Linear Regression
3.2 Multiple Linear Regression
3.3 Partially Linear Models
3.4 Factors, Interactions, and Weights
3.5 Linear Regression with Time Series Data
3.6 Linear Regression with Panel Data
3.7 Systems of Linear Equations
3.8 Exercises

4 Diagnostics and Alternative Methods of Regression
4.1 Regression Diagnostics
4.2 Diagnostic Tests
4.3 Robust Standard Errors and Tests
4.4 Resistant Regression
4.5 Quantile Regression
4.6 Exercises

5 Models of Microeconometrics
5.1 Generalized Linear Models
5.2 Binary Dependent Variables
5.3 Regression Models for Count Data
5.4 Censored Dependent Variables
5.5 Extensions
5.6 Exercises

6 Time Series
6.1 Infrastructure and “Naive” Methods
6.2 Classical Model-Based Analysis
6.3 Stationarity, Unit Roots, and Cointegration
6.4 Time Series Regression and Structural Change
6.5 Extensions
6.6 Exercises

7 Programming Your Own Analysis
7.1 Simulations
7.2 Bootstrapping a Linear Regression
7.3 Maximizing a Likelihood
7.4 Reproducible Econometrics Using Sweave()
7.5 Exercises

References

Index

1 Introduction

This brief chapter, apart from providing two introductory examples on fitting regression models, outlines some basic features of R, including its help facilities and the development model. For the interested reader, the final section briefly outlines the history of R.

1.1 An Introductory R Session

For a first impression of R’s “look and feel”, we provide an introductory R session in which we briefly analyze two data sets. This should serve as an illustration of how basic tasks can be performed and how the operations employed are generalized and modified for more advanced applications. We realize that not every detail will be fully transparent at this stage, but these examples should help to give a first impression of R’s functionality and syntax. Explanations regarding all technical details are deferred to subsequent chapters, where more complete analyses are provided.

Example 1: The demand for economics journals

We begin with a small data set taken from Stock and Watson (2007) that provides information on the number of library subscriptions to economics journals in the United States of America in the year 2000. The data set, originally collected by Bergstrom (2001), is available in package AER under the name Journals. It can be loaded via

R> data("Journals", package = "AER")

The commands

R> dim(Journals)

[1] 180 10

R> names(Journals)

C. Kleiber, A. Zeileis, Applied Econometrics with R,

DOI: 10.1007/978-0-387-77318-6 1, © Springer Science+Business Media, LLC 2008
