
Use R!

Advisors:

Robert Gentleman · Kurt Hornik · Giovanni Parmigiani

Use R!

Albert: Bayesian Computation with R
Bivand/Pebesma/Gómez-Rubio: Applied Spatial Data Analysis with R
Claude: Morphometrics with R
Cook/Swayne: Interactive and Dynamic Graphics for Data Analysis: With R and GGobi
Hahne/Huber/Gentleman/Falcon: Bioconductor Case Studies
Kleiber/Zeileis: Applied Econometrics with R
Nason: Wavelet Methods in Statistics with R
Paradis: Analysis of Phylogenetics and Evolution with R
Peng/Dominici: Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health
Pfaff: Analysis of Integrated and Cointegrated Time Series with R, 2nd edition
Sarkar: Lattice: Multivariate Data Visualization with R
Spector: Data Manipulation with R

Christian Kleiber · Achim Zeileis

Applied Econometrics with R


Christian Kleiber
Universität Basel
WWZ, Department of Statistics and Econometrics
Petersgraben 51
CH-4051 Basel
Switzerland
Christian.Kleiber@unibas.ch

Achim Zeileis
Wirtschaftsuniversität Wien
Department of Statistics and Mathematics
Augasse 2–6
A-1090 Wien
Austria
Achim.Zeileis@wu-wien.ac.at

Series Editors

Robert Gentleman
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., M2-B876
PO Box 19024, Seattle, Washington 98102-1024
USA

Kurt Hornik
Department of Statistics and Mathematics
Wirtschaftsuniversität Wien
Augasse 2–6
A-1090 Wien
Austria

Giovanni Parmigiani
The Sidney Kimmel Comprehensive Cancer Center
at Johns Hopkins University
550 North Broadway
Baltimore, MD 21205-2011
USA

ISBN: 978-0-387-77316-2	e-ISBN: 978-0-387-77318-6
DOI: 10.1007/978-0-387-77318-6

Library of Congress Control Number: 2008934356

© 2008 Springer Science+Business Media, LLC

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

springer.com

Preface

R is a language and environment for data analysis and graphics. It may be considered an implementation of S, an award-winning language initially developed at Bell Laboratories since the late 1970s. The R project was initiated by Robert Gentleman and Ross Ihaka at the University of Auckland, New Zealand, in the early 1990s, and has been developed by an international team since mid-1997.

Historically, econometricians have favored other computing environments, some of which have fallen by the wayside, and also a variety of packages with canned routines. We believe that R has great potential in econometrics, both for research and for teaching. There are at least three reasons for this: (1) R is mostly platform independent and runs on Microsoft Windows, the Mac family of operating systems, and various flavors of Unix/Linux, and also on some more exotic platforms. (2) R is free software that can be downloaded and installed at no cost from a family of mirror sites around the globe, the Comprehensive R Archive Network (CRAN); hence students can easily install it on their own machines. (3) R is open-source software, so that the full source code is available and can be inspected to understand what it really does, learn from it, and modify and extend it. We also like to think that platform independence and the open-source philosophy make R an ideal environment for reproducible econometric research.

This book provides an introduction to econometric computing with R; it is not an econometrics textbook. Preferably readers have taken an introductory econometrics course before but not necessarily one that makes heavy use of matrices. However, we do assume that readers are somewhat familiar with matrix notation, specifically matrix representations of regression models. Thus, we hope the book might be suitable as a “second book” for a course with sufficient emphasis on applications and practical issues at the intermediate or beginning graduate level. It is hoped that it will also be useful to professional economists and econometricians who wish to learn R. We cover linear regression models for cross-section and time series data as well as the common nonlinear models of microeconometrics, such as logit, probit, and tobit models, as well as regression models for count data. In addition, we provide a chapter on programming, including simulations, optimization, and an introduction to Sweave(), an environment that allows integration of text and code in a single document, thereby greatly facilitating reproducible research. (In fact, the entire book was written using Sweave() technology.)

We feel that students should be introduced to challenging data sets as early as possible. We therefore use a number of data sets from the data archives of leading applied econometrics journals such as the Journal of Applied Econometrics and the Journal of Business & Economic Statistics. Some of these have been used in recent textbooks, among them Baltagi (2002), Davidson and MacKinnon (2004), Greene (2003), Stock and Watson (2007), and Verbeek (2004). In addition, we provide all further data sets from Baltagi (2002), Franses (1998), Greene (2003), and Stock and Watson (2007), as well as selected data sets from additional sources, in an R package called AER that accompanies this book. It is available from the CRAN servers at http://CRAN.R-project.org/ and also contains all the code used in the following chapters. These data sets are suitable for illustrating a wide variety of topics, among them wage equations, growth regressions, dynamic regressions and time series models, hedonic regressions, the demand for health care, or labor force participation, to mention a few.

In our view, applied econometrics suffers from an underuse of graphics, one of the strengths of the R system for statistical computing and graphics. Therefore, we decided to make liberal use of graphical displays throughout, some of which are perhaps not well known.

The publisher asked for a compact treatment; however, the fact that R has been mainly developed by statisticians forces us to briefly discuss a number of statistical concepts that are not widely used among econometricians, for historical reasons, including factors and generalized linear models, the latter in connection with microeconometrics. We also provide a chapter on R basics (notably data structures, graphics, and basic aspects of programming) to keep the book self-contained.

The production of the book

The entire book was typeset by the authors using LaTeX and R’s Sweave() tools. Specifically, the final manuscript was compiled using R version 2.7.0, AER version 0.9-0, and the most current version (as of 2008-05-28) of all other CRAN packages that AER depends on (or suggests). The first author started under Microsoft Windows XP Pro, but thanks to a case of theft he switched to Mac OS X along the way. The second author used Debian GNU/Linux throughout. Thus, we can confidently assert that the book is fully reproducible, for the version given above, on the most important (single-user) platforms.
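A reader who wants to compare a local setup against the versions named above could record them along the following lines. This is only a sketch using base R functions; the AER lookup is guarded because the package may not be installed, and the variable names are illustrative.

```r
# Record the R version string of the running session.
r_version <- R.version.string
print(r_version)

# Report the installed AER version, if the package is available.
if (requireNamespace("AER", quietly = TRUE)) {
  aer_version <- as.character(utils::packageVersion("AER"))
  print(aer_version)
} else {
  message("Package AER is not installed.")
}

# A fuller record of platform and loaded packages.
print(utils::sessionInfo())
```

Results obtained with newer versions of R or AER may differ in minor details from those printed in the book.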


Settings and appearance

R is mainly run at its default settings; however, we found it convenient to employ a few minor modifications invoked by

R> options(prompt="R> ", digits=4, show.signif.stars=FALSE)

This replaces the standard R prompt > by the more evocative R>. For compactness, digits = 4 reduces the number of digits shown when printing numbers from the default of 7. Note that this does not reduce the precision with which these numbers are internally processed and stored. In addition, R by default displays one to three stars to indicate the significance of p values in model summaries at conventional levels. This is disabled by setting show.signif.stars = FALSE.
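The point that digits affects only printing, not the stored value, can be checked with a short sketch. The variable names here are illustrative, and the previous settings are saved and restored so the session is left unchanged.

```r
# Apply the settings from the text, keeping the old values for later.
old <- options(prompt = "R> ", digits = 4, show.signif.stars = FALSE)

x <- pi * 1000

# format() follows the 'digits' option, so only 4 significant digits appear.
shown <- format(x)
print(shown)

# The stored value is untouched: asking for more decimals still works.
stored <- sprintf("%.10f", x)
print(stored)

# Restore the previous settings.
options(old)
```

Saving the return value of options() and passing it back is a convenient way to undo temporary changes.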

Typographical conventions

We use a typewriter font for all code; additionally, function names are followed by parentheses, as in plot(), and class names (a concept that is explained in Chapters 1 and 2) are displayed as in “lm”. Furthermore, boldface is used for package names, as in AER.

Acknowledgments

This book would not exist without R itself, and thus we thank the R Development Core Team for their continuing efforts to provide an outstanding piece of open-source software, as well as all the R users and developers supporting these efforts. In particular, we are indebted to all those R package authors whose packages we employ in the course of this book.

Several anonymous reviewers provided valuable feedback on earlier drafts. In addition, we are grateful to Rob J. Hyndman, Roger Koenker, and Jeffrey S. Racine for particularly detailed comments and helpful discussions. On the technical side, we are indebted to Torsten Hothorn and Uwe Ligges for advice on and infrastructure for automated production of the book. Regarding the accompanying package AER, we are grateful to Badi H. Baltagi, Philip Hans Franses, William H. Greene, James H. Stock, and Mark W. Watson for permitting us to include all the data sets from their textbooks (namely Baltagi 2002; Franses 1998; Greene 2003; Stock and Watson 2007). We also thank Inga Diedenhofen and Markus Hertrich for preparing some of these data in R format. Finally, we thank John Kimmel, our editor at Springer, for his patience and encouragement in guiding the preparation and production of this book. Needless to say, we are responsible for the remaining shortcomings.

May, 2008	Christian Kleiber, Basel
	Achim Zeileis, Wien

Contents

Preface

1 Introduction
1.1 An Introductory R Session
1.2 Getting Started
1.3 Working with R
1.4 Getting Help
1.5 The Development Model
1.6 A Brief History of R

2 Basics
2.1 R as a Calculator
2.2 Matrix Operations
2.3 R as a Programming Language
2.4 Formulas
2.5 Data Management in R
2.6 Object Orientation
2.7 R Graphics
2.8 Exploratory Data Analysis with R
2.9 Exercises

3 Linear Regression
3.1 Simple Linear Regression
3.2 Multiple Linear Regression
3.3 Partially Linear Models
3.4 Factors, Interactions, and Weights
3.5 Linear Regression with Time Series Data
3.6 Linear Regression with Panel Data
3.7 Systems of Linear Equations
3.8 Exercises

4 Diagnostics and Alternative Methods of Regression
4.1 Regression Diagnostics
4.2 Diagnostic Tests
4.3 Robust Standard Errors and Tests
4.4 Resistant Regression
4.5 Quantile Regression
4.6 Exercises

5 Models of Microeconometrics
5.1 Generalized Linear Models
5.2 Binary Dependent Variables
5.3 Regression Models for Count Data
5.4 Censored Dependent Variables
5.5 Extensions
5.6 Exercises

6 Time Series
6.1 Infrastructure and “Naive” Methods
6.2 Classical Model-Based Analysis
6.3 Stationarity, Unit Roots, and Cointegration
6.4 Time Series Regression and Structural Change
6.5 Extensions
6.6 Exercises

7 Programming Your Own Analysis
7.1 Simulations
7.2 Bootstrapping a Linear Regression
7.3 Maximizing a Likelihood
7.4 Reproducible Econometrics Using Sweave()
7.5 Exercises

References

Index

1 Introduction

This brief chapter, apart from providing two introductory examples on fitting regression models, outlines some basic features of R, including its help facilities and the development model. For the interested reader, the final section briefly outlines the history of R.

1.1 An Introductory R Session

For a first impression of R’s “look and feel”, we provide an introductory R session in which we briefly analyze two data sets. This should serve as an illustration of how basic tasks can be performed and how the operations employed are generalized and modified for more advanced applications. We realize that not every detail will be fully transparent at this stage, but these examples should help to give a first impression of R’s functionality and syntax. Explanations regarding all technical details are deferred to subsequent chapters, where more complete analyses are provided.

Example 1: The demand for economics journals

We begin with a small data set taken from Stock and Watson (2007) that provides information on the number of library subscriptions to economic journals in the United States of America in the year 2000. The data set, originally collected by Bergstrom (2001), is available in package AER under the name Journals. It can be loaded via

R> data("Journals", package = "AER")

The commands

R> dim(Journals)

[1] 180 10

R> names(Journals)
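For readers following along in a script rather than at the prompt, the loading and inspection steps above might be wrapped as in the sketch below. It assumes that package AER may or may not be installed and therefore guards the call; the names journals_dim and journals_vars are illustrative only.

```r
# Load the Journals data only if package AER is available.
if (requireNamespace("AER", quietly = TRUE)) {
  data("Journals", package = "AER")

  # Number of rows (journals) and columns (variables).
  journals_dim <- dim(Journals)
  print(journals_dim)

  # Names of the variables in the data frame.
  journals_vars <- names(Journals)
  print(journals_vars)
} else {
  message("Package AER is not installed; try install.packages(\"AER\").")
}
```

Later versions of AER may ship a revised data set, so the dimensions reported on a current installation need not match the session shown above exactly.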

