Robert I. Kabacoff - R in Action


appendix B Customizing the startup environment

One of the first things that programmers like to do is customize their startup environment to conform to their preferred way of working. Customizing the startup environment allows you to set R options, specify a working directory, load commonly used packages, load user-written functions, set a default CRAN download site, and perform any number of housekeeping tasks.

You can customize the R environment through either a site initialization file (Rprofile.site) or a directory initialization file (.Rprofile). These are text files containing R code to be executed at startup.

At startup, R will source the file Rprofile.site from the R_HOME/etc directory, where R_HOME is an environment variable. It will then look for a .Rprofile file to source in the current working directory. If R doesn’t find this file, it will look for it in the user’s home directory. You can use Sys.getenv("R_HOME"), Sys.getenv("HOME"), and getwd() to identify the location of the R_HOME, HOME, and current working directory, respectively.
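The following is a minimal sketch for checking which of these startup files exist on your machine. It uses only the three functions named above plus base R's file.path() and file.exists(). It deliberately ignores the R_PROFILE and R_PROFILE_USER environment variables, which can point R at other files (see help(Startup)).

```r
# Directories that R consults at startup
r_home   <- Sys.getenv("R_HOME")   # R installation directory
home_dir <- Sys.getenv("HOME")     # user's home directory
work_dir <- getwd()                # current working directory

# Site-wide file, then the two per-user candidates in search order
site_profile  <- file.path(r_home, "etc", "Rprofile.site")
user_profiles <- c(file.path(work_dir, ".Rprofile"),
                   file.path(home_dir, ".Rprofile"))

# TRUE marks files that exist; only the first existing per-user file is sourced
print(setNames(file.exists(site_profile), site_profile))
print(setNames(file.exists(user_profiles), user_profiles))
```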

You can place two special functions in these files. The .First() function is executed at the start of each R session, and the .Last() function is executed at the end of each session. An example of an Rprofile.site file is shown in listing B.1.


Listing B.1 Sample Rprofile.site file

options(papersize="a4")
options(editor="notepad")              # Set common options
options(pager="internal")
options(tab.width = 2)
options(width = 130)
options(graphics.record=TRUE)
options(show.signif.stars=FALSE)

options(prompt="> ")                   # Set R interactive prompt
options(continue="+ ")

.libPaths("C:/my_R_library")           # Set path for local library

local({r <- getOption("repos")         # Set CRAN mirror default
       r["CRAN"] <- "http://cran.case.edu/"
       options(repos=r)})

.First <- function(){                  # Startup function
 library(lattice)
 library(Hmisc)
 source("C:/mydir/myfunctions.R")
 cat("\nWelcome at", date(), "\n")
}

.Last <- function(){                   # Session end function
 cat("\nGoodbye at ", date(), "\n")
}

There are several things you should note about this file:

Setting a .libPaths value allows you to create a local library for packages outside of the R directory tree. This can be useful for retaining packages during an upgrade.

Setting a default CRAN mirror site frees you from having to choose one each time you issue an install.packages() command.

The .First() function is an excellent place to load libraries that you use often, as well as source text files containing user-written functions that you apply frequently.

The .Last() function is an excellent place for any cleanup activities, including archiving command histories, program output, and data files.
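You can try these settings without touching your real startup files. This sketch writes a hypothetical project-level .Rprofile into a temporary directory and sources it by hand, which mimics what R does automatically at startup. The library directory is a throwaway path, and the mirror https://cloud.r-project.org stands in for the one in listing B.1. Both are assumptions for illustration.

```r
# Hypothetical project directory with its own package library
proj_dir <- tempfile("proj")
my_lib   <- file.path(proj_dir, "my_R_library")
dir.create(my_lib, recursive = TRUE)   # .libPaths() ignores directories that don't exist

profile <- file.path(proj_dir, ".Rprofile")
writeLines(c(
  # Put the local library first in the search path
  sprintf(".libPaths(c(%s, .libPaths()))", deparse(my_lib)),
  # Set the default CRAN mirror, keeping any other repositories
  'local({r <- getOption("repos")',
  '       r["CRAN"] <- "https://cloud.r-project.org"',
  '       options(repos = r)})',
  # Session start and end hooks
  '.First <- function() cat("\\nWelcome at", date(), "\\n")',
  '.Last  <- function() cat("\\nGoodbye at", date(), "\\n")'
), profile)

source(profile)   # R sources the profile itself at startup; here we do it by hand
.First()          # likewise, R would call .First() for us
print(.libPaths()[1])
print(getOption("repos")["CRAN"])
```

In a real setup, these lines would live in Rprofile.site or a .Rprofile file, and .Last() would run when you quit with q().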

There are other ways to customize the startup environment, including the use of command-line options and environment variables. See help(Startup) and appendix B in the Introduction to R manual (http://cran.r-project.org/doc/manuals/R-intro.pdf) for more details.

appendix C Exporting data from R

In chapter 2, we reviewed a wide range of methods for importing data into R. But there are times that you’ll want to go the other way—exporting data from R—so that data can be archived or imported into external applications. In this appendix, you’ll learn how to output an R object to a delimited text file, an Excel spreadsheet, or a statistical application (such as SPSS, SAS, or Stata).

C.1 Delimited text file

You can use the write.table() function to output an R object to a delimited text file. The format is

write.table(x, outfile, sep=delimiter, quote=TRUE, na="NA")

where x is the object and outfile is the target file. For example, the statement

write.table(mydata, "mydata.txt", sep=",")

would save the dataset mydata to a comma-delimited file named mydata.txt in the current working directory. Include a path (for example, “c:/myprojects/mydata.txt”) to save the output file elsewhere. Replacing sep="," with sep="\t" would save the data in a tab-delimited file. By default, strings are enclosed in quotes ("") and missing values are written as NA. Row names are also written by default; add row.names=FALSE to omit them.
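A small round trip makes these defaults concrete. The data frame below is made up for illustration, and the file goes to tempdir() rather than the working directory. The sketch adds row.names=FALSE so that the header line and the data lines have the same number of fields.

```r
# Made-up data with a missing string and a missing number
mydata <- data.frame(id    = 1:3,
                     name  = c("Ann", "Bob", NA),
                     score = c(9.5, NA, 7.25),
                     stringsAsFactors = FALSE)

outfile <- file.path(tempdir(), "mydata.txt")

# Comma-delimited, strings quoted, missing values written as NA
write.table(mydata, outfile, sep = ",", quote = TRUE, na = "NA",
            row.names = FALSE)

lines <- readLines(outfile)
print(lines)

# Read the file back to confirm that nothing was lost
back <- read.table(outfile, header = TRUE, sep = ",",
                   stringsAsFactors = FALSE)
print(back)
```

The NA values are written without quotes, which is why read.table() recognizes them as missing again.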


C.2 Excel spreadsheet

The write.xlsx() function in the xlsx package can be used to save an R data frame to an Excel 2007 workbook. The format is

library(xlsx)

write.xlsx(x, outfile, col.names=TRUE, row.names=TRUE, sheetName="Sheet1", append=FALSE)

For example, the statements

library(xlsx)

write.xlsx(mydata, "mydata.xlsx")

export the data frame mydata to a worksheet (Sheet1 by default) in an Excel workbook named mydata.xlsx in the current working directory. By default, the variable names in the dataset are used to create column headings in the spreadsheet and row names are placed in the first column of the spreadsheet. If mydata.xlsx already exists, it is overwritten.

The xlsx package is a powerful tool for manipulating Excel 2007 workbooks. See the package documentation for more details.

C.3 Statistical applications

The write.foreign() function in the foreign package can be used to export a data frame to an external statistical application. Two files are created: a free-format text file containing the data, and a code file containing instructions for reading the data into the external statistical application. The format is

write.foreign(dataframe, datafile, codefile, package=package)

For example, the code

library(foreign)

write.foreign(mydata, "mydata.txt", "mycode.sps", package="SPSS")

would export the data frame mydata into a free-format text file named mydata.txt in the current working directory and an SPSS program named mycode.sps that can be used to read the text file. Other values of package include "SAS" and "Stata".
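This sketch runs the same call on a small, made-up data frame and writes both files to tempdir() so that you can inspect them. It assumes the foreign package is installed; it ships with most R distributions as a recommended package.

```r
library(foreign)

# Made-up data: one factor and two numeric columns
mydata <- data.frame(id    = 1:4,
                     group = factor(c("ctrl", "trt", "trt", "ctrl")),
                     score = c(12.5, 15, 9, 11.5))

datafile <- file.path(tempdir(), "mydata.txt")
codefile <- file.path(tempdir(), "mycode.sps")

# Data go to datafile; SPSS syntax for reading them goes to codefile
write.foreign(mydata, datafile, codefile, package = "SPSS")

data_lines <- readLines(datafile)
code_lines <- readLines(codefile)
print(data_lines)   # one line per observation
print(code_lines)   # SPSS commands that read datafile
```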

To learn more about exporting data from R, see the R Data Import/Export documentation, available from http://cran.r-project.org/doc/manuals/R-data.pdf.

appendix D Creating publication-quality output

Research doesn’t end when the last statistical analysis or graph is completed. We need to include the results in a report that effectively communicates these findings to a teacher, supervisor, client, government agency, or journal editor. Although R creates state-of-the-art graphics, its text output is woefully retro—tables of monospaced text with columns lined up using spaces.

There are two common approaches to creating publication-quality reports in R: Sweave and odfWeave. Sweave, which ships with R in the utils package, allows you to embed R code and output in LaTeX documents in order to produce high-end typeset reports in PDF, PostScript, and DVI formats. Sweave is an elegant, precise, and highly flexible system, but it requires the author to be conversant with LaTeX coding.

In a similar fashion, the odfWeave package provides a mechanism for embedding R code and output in documents that follow the OpenDocument Format (ODF). These reports can be further edited in an ODF word processor, such as OpenOffice Writer, and saved in either ODF or Microsoft Word format. The process is not as flexible as the Sweave approach, but it eliminates the need to learn LaTeX. We’ll look at each approach in turn.

D.1 High-quality typesetting with Sweave (R + LaTeX)

LaTeX is a document preparation system for high-quality typesetting (http://www.latex-project.org) that’s freely available for Windows, Mac, and Linux platforms. An author creates a text document that includes markup code for formatting the content. The document is then processed through a LaTeX compiler, producing a finished document in PDF, PostScript, or DVI format.

Sweave allows you to embed R code and output (including graphs) within the LaTeX document. This is a multistep process:

1. A special document called a noweb file (typically with the extension .Rnw) is created using any text editor. The file contains the written content, LaTeX markup code, and R code chunks. Each R code chunk starts with the delimiter <<>>= and ends with the delimiter @.

2. The Sweave() function processes the noweb file and generates a LaTeX file. During this step, the R code chunks are processed and, depending on options, replaced with LaTeX-formatted R code and output. This step can be accomplished from within R or from the command line.

Within R, the format is

Sweave("infile.Rnw")

By default, Sweave("example.Rnw") would input the file example.Rnw from the current working directory and output the file example.tex to the same directory. Alternatively, you can use

Sweave("infile.Rnw", syntax="SweaveSyntaxNoweb")

Specifying this syntax option can help avoid some common parsing errors, as well as conflicts with the R2HTML package.

Execution from the command line will depend on the operating system. For example, on a Linux system, this might look like

$ R CMD Sweave infile.Rnw

3. The LaTeX file is then run through a LaTeX compiler, creating a PDF, PostScript, or DVI file. Popular LaTeX distributions include TeX Live for Linux, MacTeX for Mac, and proTeXt for Windows.
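Here is a minimal sketch of steps 1 and 2, run entirely from R. The file name example.Rnw and its contents are hypothetical, and the work happens in a temporary directory. Step 3 is left out because it needs a LaTeX installation; if you have one, tools::texi2pdf() is one way to run it from R.

```r
# Work in a scratch directory; Sweave() writes example.tex to the working directory
work_dir <- tempfile("sweave_demo")
dir.create(work_dir)
old_wd <- setwd(work_dir)

# Step 1: a tiny noweb file with LaTeX markup, an inline \Sexpr{}, and one code chunk
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of x is \\Sexpr{mean(c(2, 4, 6))}.",
  "<<echo=TRUE>>=",
  "x <- c(2, 4, 6)",
  "summary(x)",
  "@",
  "\\end{document}"
), "example.Rnw")

# Step 2: noweb file -> LaTeX file (chunks are run and replaced by formatted code and output)
utils::Sweave("example.Rnw")
tex <- readLines("example.tex")
setwd(old_wd)

# Inspect the generated LaTeX; step 3 would compile example.tex
cat(tex, sep = "\n")
```

In the generated file, the \Sexpr{} expression has been replaced by its value. The chunk has become Schunk, Sinput, and Soutput environments, which are defined in Sweave.sty.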

The complete process is outlined in figure D.1.

[Flowchart: example.Rnw (text file with LaTeX markup and R code chunks) → run through Sweave() function in R → example.tex (LaTeX file) → run through LaTeX compiler → example.pdf (PDF file), example.ps (PostScript file), or example.dvi (DVI file)]

Figure D.1 Process for generating a publication-quality report using Sweave


As indicated earlier, each chunk of R code is surrounded by <<>>= and @. You can add options to each <<>>= delimiter in order to control the processing of the corresponding R code chunk. For example,

<<echo=TRUE, results=hide>>=
summary(lm(Y~X, data=mydata))
@

would output the code, but not the results, whereas

<<echo=FALSE, fig=TRUE>>=
plot(A)
@

wouldn’t print the code but would include the graph in the output. Common delimiter options are described in table D.1.

Table D.1 Common options for R code chunks

Option    Description
echo      Include the code in the output (echo=TRUE) or not (echo=FALSE). The default is TRUE.
eval      Use eval=FALSE to keep the code from being evaluated/executed. The default is TRUE.
fig       Use fig=TRUE when the output is a graph. The default is FALSE.
results   Include R code output (results=verbatim), suppress the output (results=hide), or include the output and assume that it contains LaTeX markup (results=tex). The default is verbatim. Use results=tex when the output is generated by the xtable() function in the xtable package or the latex() function in the Hmisc package.
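You can check the echo, eval, and results options in table D.1 by weaving a small, hypothetical noweb file and searching the generated LaTeX for each piece. This sketch leaves out fig=TRUE to avoid writing graphics files.

```r
work_dir <- tempfile("sweave_opts")
dir.create(work_dir)
old_wd <- setwd(work_dir)

writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  # Chunk A: show the code, hide what it prints
  "<<echo=TRUE, results=hide>>=",
  "y <- 1:10",
  "print(sum(y))",
  "@",
  # Chunk B: hide the code, show what it prints
  "<<echo=FALSE>>=",
  "cat(\"hello from chunk B\\n\")",
  "@",
  # Chunk C: show the code but never run it
  "<<eval=FALSE>>=",
  "stop(\"this line is never executed\")",
  "@",
  "\\end{document}"
), "chunks.Rnw")

utils::Sweave("chunks.Rnw")   # would fail if chunk C were evaluated
tex <- readLines("chunks.tex")
setwd(old_wd)

# Which pieces made it into the LaTeX output?
found <- c(code_A   = any(grepl("print(sum(y))", tex, fixed = TRUE)),
           output_A = any(grepl("[1] 55", tex, fixed = TRUE)),
           output_B = any(grepl("hello from chunk B", tex, fixed = TRUE)),
           code_B   = any(grepl("cat(", tex, fixed = TRUE)),
           code_C   = any(grepl("never executed", tex, fixed = TRUE)))
print(found)
```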

 

 

By default, Sweave will add LaTeX markup code to attractively format data frames, matrices, and vectors. Additionally, R objects can be embedded inline using a \Sexpr{} statement. Note that lattice graphs must be embedded in a print() statement to be processed properly.

The xtable() function in the xtable package can be used to format data frames and matrices more precisely. In addition, it can be used to format other R objects, including those produced by lm(), glm(), aov(), table(), ts(), and coxph(). Use methods(xtable) to view a comprehensive list. When formatting R output using xtable(), be sure to include the results=tex option in the code chunk delimiter.

It’s easier to see how this all works with an example. Consider the noweb file in listing D.1. This is a reworking of the one-way ANOVA example in section 8.3. LaTeX markup code begins with a backslash (\). The exception is \Sexpr{}, which is a Sweave addition. R-related code is presented in bold italics.

Listing D.1 A sample noweb file (example.Rnw)

\documentclass[12pt]{article}
\title{Sample Report}
\author{Robert I. Kabacoff, Ph.D.}
\date{}

\begin{document}

\maketitle

<<echo=false, results=hide>>=
library(multcomp)
library(xtable)
attach(cholesterol)
@

\section{Results}

Cholesterol reduction was assessed in a study
that randomized \Sexpr{nrow(cholesterol)} patients
to one of \Sexpr{length(unique(trt))} treatments.
Summary statistics are provided in
Table \ref{table:descriptives}.

<<echo = false, results = tex>>=
descTable <- data.frame("Treatment" = sort(unique(trt)),
                        "N" = as.vector(table(trt)),
                        "Mean" = tapply(response, list(trt), mean, na.rm=TRUE),
                        "SD" = tapply(response, list(trt), sd, na.rm=TRUE)
                        )
print(xtable(descTable, caption = "Descriptive statistics for each treatment group",
             label = "table:descriptives"),
      caption.placement = "top", include.rownames = FALSE)
@

The analysis of variance is provided in Table \ref{table:anova}.

<<echo=false, results=tex>>=
fit <- aov(response ~ trt)
print(xtable(fit, caption = "Analysis of variance", label = "table:anova"),
      caption.placement = "top")
@

\noindent and group differences are plotted in Figure \ref{figure:tukey}.

\begin{figure}
\begin{center}
<<fig=TRUE,echo=FALSE>>=
par(mar=c(5,4,6,2))
tuk <- glht(fit, linfct=mcp(trt="Tukey"))
plot(cld(tuk, level=.05),col="lightgrey",xlab="Treatment", ylab="Response")
box("figure")
@
\caption{Distribution of response times and pairwise comparisons.}
\label{figure:tukey}
\end{center}
\end{figure}

\end{document}


Sample Report
Robert I. Kabacoff, Ph.D.

1 Results

Cholesterol reduction was assessed in a study that randomized 50 patients to one of 5 treatments. Summary statistics are provided in Table 1.

Table 1: Descriptive statistics for each treatment group

Treatment    N    Mean    SD
1time       10    5.78   2.88
2times      10    9.22   3.48
4times      10   12.37   2.92
drugD       10   15.36   3.45
drugE       10   20.95   3.35

The analysis of variance is provided in Table 2.

Table 2: Analysis of variance

             Df    Sum Sq   Mean Sq   F value   Pr(>F)
trt           4   1351.37    337.84     32.43   0.0000
Residuals    45    468.75     10.42

and group differences are plotted in Figure 1.

Figure D.2 Page 1 of the report created from the sample noweb file in listing D.1. The noweb file was processed through the Sweave() function in R and the resulting TeX file was processed through a LaTeX compiler to produce a PDF document.

After processing the noweb file through the Sweave() function in R and processing the resulting TeX file through a LaTeX compiler, the PDF document in figures D.2 and D.3 is generated.

 

[Plot: boxplots of Response (y-axis) by Treatment (x-axis: 1time, 2times, 4times, drugD, drugE), annotated with Tukey group letters a through d]

Figure 1: Distribution of response times and pairwise comparisons.

Figure D.3 Page 2 of the report created from the sample noweb file in listing D.1.


To learn more about Sweave, visit the Sweave home page (www.stat.uni-muenchen.de/~leisch/Sweave/). An excellent presentation is also provided by Theresa Scott (http://biostat.mc.vanderbilt.edu/TheresaScott). To learn more about LaTeX, check out the article “The Not So Short Introduction to LaTeX 2e,” available on the LaTeX home page (www.latex-project.org).

D.2 Joining forces with OpenOffice using odfWeave

Sweave provides a means of embedding R code and output in a LaTeX document that’s compiled into a PDF, PostScript, or DVI file. Although beautiful, the final document isn’t editable. Additionally, many recipients require reports in a format such as Word. odfWeave provides a mechanism for embedding R code and output in OpenOffice documents. Instead of placing R code chunks in a LaTeX document, the user places R code chunks in an OpenOffice ODT file (see figure D.4). An advantage is that the ODT file can be created with a WYSIWYG editor such as OpenOffice Writer (www.OpenOffice.org); there’s no need to learn a markup language.

Once the noweb document is created as an ODT file, you process it through the odfWeave() function in the odfWeave package. Unlike Sweave, which ships with R, odfWeave has to be downloaded and installed before first use (install.packages("odfWeave")) and then loaded in each session in which it will be used. For example,

library(odfWeave)
infile <- "example.odt"
outfile <- "example-out.odt"
odfWeave(infile, outfile)

will take the example.odt file displayed in figure D.4 and produce the example-out.odt file displayed in figure D.5. Adding options(SweaveSyntax="SweaveSyntaxNoweb") before the odfWeave() statement may help reduce parsing errors on some platforms.

There are several differences between Sweave and odfWeave:

The xtable() function doesn’t work with odfWeave. By default, odfWeave will render data frames, matrices, and vectors in an attractive format. Optionally, the odfTable() function can be used to format these objects with a high degree of control.

ODF documents use XML markup rather than LaTeX. Therefore, the code chunk option results=tex should never be used. Use results=xml for code chunks that use odfTable().

The infile and outfile names should be different. Unlike Sweave, odfWeave("example.odt") would overwrite the noweb document with the final report.
