One of the first things that programmers like to do is customize their startup environment to conform to their preferred way of working. Customizing the startup environment allows you to set R options, specify a working directory, load commonly used packages, load user-written functions, set a default CRAN download site, and perform any number of housekeeping tasks.
You can customize the R environment through either a site initialization file (Rprofile.site) or a directory initialization file (.Rprofile). These are text files containing R code to be executed at startup.
At startup, R will source the file Rprofile.site from the R_HOME/etc directory, where R_HOME is an environment variable. It will then look for an .Rprofile file to source in the current working directory. If R doesn’t find this file, it will look for it in the user’s home directory. You can use Sys.getenv("R_HOME"), Sys.getenv("HOME"), and getwd() to identify the location of the R_HOME, HOME, and current working directory, respectively.
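As a rough illustration of the lookup just described, the following sketch lists the three candidate locations and reports which of them exist on the current machine. The variable names (r_home, user_home, candidates, found) are made up for this example, and the sketch assumes the default locations rather than any overridden by other environment variables.

```r
# Locate the directories R consults at startup
r_home    <- Sys.getenv("R_HOME")   # R installation directory
user_home <- Sys.getenv("HOME")     # user's home directory

# Candidate initialization files, in the order described above
candidates <- c(
  site      = file.path(r_home, "etc", "Rprofile.site"),
  directory = file.path(getwd(), ".Rprofile"),
  home      = file.path(user_home, ".Rprofile")
)

# Report which of the files are actually present
found <- file.exists(candidates)
print(data.frame(file = candidates, exists = found))
```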
You can place two special functions in these files. The .First() function is executed at the start of each R session, and the .Last() function is executed at the end of each session. An example of an Rprofile.site file is shown in listing B.1.
APPENDIX B Customizing the startup environment
Listing B.1 Sample Rprofile.site file
options(papersize="a4")
options(editor="notepad")     # Set common options
options(pager="internal")
There are several things you should note about this file:
■ Setting a .libPaths value allows you to create a local library for packages outside of the R directory tree. This can be useful for retaining packages during an upgrade.
■ Setting a default CRAN mirror site frees you from having to choose one each time you issue an install.packages() command.
■ The .First() function is an excellent place to load libraries that you use often, as well as source text files containing user-written functions that you apply frequently.
■ The .Last() function is an excellent place for any cleanup activities, including archiving command histories, program output, and data files.
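The four points above might translate into initialization code along the following lines. This is only a sketch, not the original listing: the library directory, the mirror URL, and the messages printed by .First() and .Last() are placeholders chosen for illustration, and a temporary directory stands in for the permanent path you would use in practice.

```r
# Local package library outside the R directory tree.
# Placeholder path: substitute a permanent directory in a real profile.
local_lib <- file.path(tempdir(), "mylibrary")
dir.create(local_lib, showWarnings = FALSE, recursive = TRUE)
.libPaths(c(local_lib, .libPaths()))

# Default CRAN mirror, so install.packages() does not prompt for one
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r)
})

# Executed at the start of each session
.First <- function() {
  library(stats)   # placeholder for packages you use often
  cat("\nSession started at", date(), "\n")
}

# Executed at the end of each session
.Last <- function() {
  cat("\nSession ended at", date(), "\n")
}
```

In a real profile, .First() could also call source() on a file of user-written functions, and .Last() could save the command history or archive output files.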
There are other ways to customize the startup environment, including the use of command-line options and environment variables. See help(Startup) and appendix B in the Introduction to R manual (http://cran.r-project.org/doc/manuals/R-intro.pdf) for more details.
appendix C Exporting data from R
In chapter 2, we reviewed a wide range of methods for importing data into R. But there are times that you’ll want to go the other way—exporting data from R—so that data can be archived or imported into external applications. In this appendix, you’ll learn how to output an R object to a delimited text file, an Excel spreadsheet, or a statistical application (such as SPSS, SAS, or Stata).
C.1 Delimited text file
You can use the write.table() function to output an R object to a delimited text file. The format is
write.table(x, outfile, sep=delimiter, quote=TRUE, na="NA")
where x is the object and outfile is the target file. For example, the statement
write.table(mydata, "mydata.txt", sep=",")
would save the dataset mydata to a comma-delimited file named mydata.txt in the current working directory. Include a path (for example, “c:/myprojects/mydata.txt”) to save the output file elsewhere. Replacing sep="," with sep="\t" would save the data in a tab-delimited file. By default, strings are enclosed in quotes ("") and missing values are written as NA.
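The following sketch shows the tab-delimited variant on a small made-up data frame and reads the file back to confirm that quoted strings and NA values survive the round trip. The data values and the use of tempdir() are assumptions for illustration, and row.names=FALSE is an extra argument (not discussed above) that suppresses the row-name column.

```r
# Small illustrative data frame with a missing value in each column
mydata <- data.frame(name  = c("Ann", "Bob", NA),
                     score = c(90.5, NA, 77),
                     stringsAsFactors = FALSE)

# Write a tab-delimited file to a temporary directory
outfile <- file.path(tempdir(), "mydata.txt")
write.table(mydata, outfile, sep = "\t", row.names = FALSE)

# Show the raw file: strings are quoted, missing values appear as NA
writeLines(readLines(outfile))

# Read the file back in and compare it with the original
roundtrip <- read.table(outfile, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)
print(all.equal(mydata, roundtrip))
```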
C.2 Excel spreadsheet
The write.xlsx() function in the xlsx package can be used to save an R data frame to an Excel 2007 workbook. The format is
library(xlsx)
write.xlsx(x, outfile, col.names=TRUE, row.names=TRUE, sheetName="Sheet1", append=FALSE)
For example, the statements
library(xlsx)
write.xlsx(mydata, "mydata.xlsx")
export the data frame mydata to a worksheet (Sheet1 by default) in an Excel workbook named mydata.xlsx in the current working directory. By default, the variable names in the dataset are used to create column headings in the spreadsheet and row names are placed in the first column of the spreadsheet. If mydata.xlsx already exists, it is overwritten.
The xlsx package is a powerful tool for manipulating Excel 2007 workbooks. See the package documentation for more details.
C.3 Statistical applications
The write.foreign() function in the foreign package can be used to export a data frame to an external statistical application. Two files are created: a free-format text file containing the data, and a code file containing instructions for reading the data into the external statistical application. The format is
write.foreign(dataframe, datafile, codefile, package=package)
For example, the code
library(foreign)
write.foreign(mydata, "mydata.txt", "mycode.sps", package="SPSS")
would export the data frame mydata into a free-format text file named mydata.txt in the current working directory and an SPSS program named mycode.sps that can be used to read the text file. Other values of package include "SAS" and "Stata".
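To make the two-file output concrete, here is a minimal sketch that exports a small made-up data frame for SPSS and then lists the generated code file. The data values and the temporary file locations are assumptions for illustration; the foreign package ships with standard R installations.

```r
library(foreign)

# Small illustrative data frame
mydata <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

# Target files: one for the data, one for the SPSS syntax
datafile <- file.path(tempdir(), "mydata.txt")
codefile <- file.path(tempdir(), "mycode.sps")

# Write both files
write.foreign(mydata, datafile, codefile, package = "SPSS")

# Inspect the SPSS program that reads the data file
writeLines(readLines(codefile))
```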
To learn more about exporting data from R, see the R Data Import/Export documentation, available from http://cran.r-project.org/doc/manuals/R-data.pdf.
appendix D Creating publication-quality output
Research doesn’t end when the last statistical analysis or graph is completed. We need to include the results in a report that effectively communicates these findings to a teacher, supervisor, client, government agency, or journal editor. Although R creates state-of-the-art graphics, its text output is woefully retro—tables of monospaced text with columns lined up using spaces.
There are two common approaches to creating publication-quality reports in R: Sweave and odfWeave. Sweave allows you to embed R code and output in LaTeX documents, in order to produce high-end typeset reports in PDF, PostScript, and DVI formats. Sweave is an elegant, precise, and highly flexible system, but it requires the author to be conversant with LaTeX coding.
In a similar fashion, the odfWeave package provides a mechanism for embedding R code and output in documents that follow the OpenDocument Format (ODF). These reports can be further edited via an ODF word processor, such as OpenOffice Writer, and saved in either ODF or Microsoft Word format. The process is not as flexible as the Sweave approach, but it eliminates the need to learn LaTeX. We’ll look at each approach in turn.
D.1 High-quality typesetting with Sweave (R + LaTeX)
LaTeX is a document preparation system for high-quality typesetting (http://www.latex-project.org) that’s freely available for Windows, Mac, and Linux platforms. An author creates a text document that includes markup code for formatting the content. The document is then processed through a LaTeX compiler, producing a finished document in PDF, PostScript, or DVI format.
Sweave allows you to embed R code and output (including graphs) within the LaTeX document. This is a multistep process:
1. A special document called a noweb file (typically with the extension .Rnw) is created using any text editor. The file contains the written content, LaTeX markup code, and R code chunks. Each R code chunk starts with the delimiter <<>>= and ends with the delimiter @.
2. The Sweave() function processes the noweb file and generates a LaTeX file. During this step, the R code chunks are processed and, depending on options, replaced with LaTeX-formatted R code and output. This step can be accomplished from within R or from the command line.
Within R, the format is
Sweave("infile.Rnw")
By default, Sweave("example.Rnw") would input the file example.Rnw from the current working directory and output the file example.tex to the same directory. Alternatively, you can use
Sweave("infile.Rnw", syntax="SweaveSyntaxNoweb")
Specifying this syntax option can help avoid some common parsing errors, as well as conflicts with the R2HTML package.
Execution from the command line will depend on the operating system. For example, on a Linux system, this might look like
$ R CMD Sweave infile.Rnw
3. The LaTeX file is then run through a LaTeX compiler, creating a PDF, PostScript, or DVI file. Popular LaTeX distributions include TeX Live for Linux, MacTeX for Mac, and proTeXt for Windows.
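Steps 1 and 2 can be sketched from within R as follows. The noweb content here is a made-up minimal document, not the book's example, and it is written to a temporary directory so that nothing in your working directory is touched. Step 3 is left out because it requires a LaTeX installation.

```r
# Step 1: create a minimal noweb file (illustrative content only)
rnw_file <- file.path(tempdir(), "example.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "x <- c(1, 2, 3)",
  "mean(x)",
  "@",
  "The mean of x is \\Sexpr{mean(x)}.",
  "\\end{document}"
), rnw_file)

# Step 2: process the noweb file; example.tex is written to the
# current working directory, so switch to the temporary directory first
old_dir <- setwd(tempdir())
utils::Sweave(rnw_file)
setwd(old_dir)

# The generated LaTeX file
tex_file <- file.path(tempdir(), "example.tex")
writeLines(readLines(tex_file))
```

If a LaTeX distribution is installed, one way to carry out step 3 without leaving R is tools::texi2pdf() applied to the generated .tex file.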
The complete process is outlined in figure D.1.
[Figure D.1 flow: example.Rnw (text file with LaTeX markup and R code chunks) → run through Sweave() function in R → example.tex (LaTeX file) → run through LaTeX compiler → example.pdf (PDF file), example.ps (PostScript file), or example.dvi (DVI file)]
Figure D.1 Process for generating a publication-quality report using Sweave
As indicated earlier, each chunk of R code is surrounded by <<>>= and @. You can add options to each <<>>= delimiter in order to control the processing of the corresponding R code chunk. For example, a chunk that begins with the delimiter <<echo=TRUE, results=hide>>= would output the code, but not the results, whereas
<<echo=FALSE, fig=TRUE>>=
plot(A)
@
wouldn’t print the code but would include the graph in the output. Common delimiter options are described in table D.1.
Table D.1 Common options for R code chunks
Option     Description
echo       Include the code in the output (echo=TRUE) or not (echo=FALSE). The default is TRUE.
eval       Use eval=FALSE to keep the code from being evaluated/executed. The default is TRUE.
fig        Use fig=TRUE when the output is a graph. The default is FALSE.
results    Include R code output (results=verbatim), suppress the output (results=hide), or include the output and assume that it contains LaTeX markup (results=tex). The default is verbatim. Use results=tex when the output is generated by the xtable() function in the xtable package or the latex() function in the Hmisc package.
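The effect of the echo and results options can be checked with a small experiment. The sketch below builds a made-up noweb file with two chunks, one that shows its code but hides the result and one that hides its code but shows the result, and then prints the LaTeX that Sweave() generates. The file name and chunk contents are hypothetical.

```r
# Noweb file with two chunks that use different options
rnw_file <- file.path(tempdir(), "options-demo.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<echo=TRUE, results=hide>>=",
  "sqrt(81)",
  "@",
  "<<echo=FALSE>>=",
  "6 * 7",
  "@",
  "\\end{document}"
), rnw_file)

# Process the file in the temporary directory
old_dir <- setwd(tempdir())
utils::Sweave(rnw_file)
setwd(old_dir)

# Inspect the result: the first chunk contributes only its code,
# the second chunk contributes only its output
tex_lines <- readLines(file.path(tempdir(), "options-demo.tex"))
writeLines(tex_lines)
```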
By default, Sweave will add LaTeX markup code to attractively format data frames, matrices, and vectors. Additionally, R objects can be embedded inline using a \Sexpr{} statement. Note that lattice graphs must be embedded in a print() statement to be processed properly.
The xtable() function in the xtable package can be used to format data frames and matrices more precisely. In addition, it can be used to format other R objects, including those produced by lm(), glm(), aov(), table(), ts(), and coxph(). Use methods(xtable) to view a comprehensive list. When formatting R output using xtable(), be sure to include the results=tex option in the code chunk delimiter.
It’s easier to see how this all works with an example. Consider the noweb file in listing D.1. This is a reworking of the one-way ANOVA example in section 8.3. LaTeX markup code begins with a backslash (\). The exception is \Sexpr{}, which is a Sweave addition. R-related code is presented in bold italics.
\caption{Distribution of response times and pairwise comparisons.}
\end{center}
\end{figure}
\end{document}
Sample Report
Robert I. Kabacoff, Ph.D.
1 Results
Cholesterol reduction was assessed in a study that randomized 50 patients to one of 5 treatments. Summary statistics are provided in Table 1.
Table 1: Descriptive statistics for each treatment group
Treatment    N     Mean     SD
1time        10     5.78    2.88
2times       10     9.22    3.48
4times       10    12.37    2.92
drugD        10    15.36    3.45
drugE        10    20.95    3.35
The analysis of variance is provided in Table 2.
Table 2: Analysis of variance
             Df     Sum Sq    Mean Sq    F value    Pr(>F)
trt           4    1351.37     337.84      32.43    0.0000
Residuals    45     468.75      10.42
and group differences are plotted in Figure 1.
Figure D.2 Page 1 of the report created from the sample noweb file in listing D.1. The noweb file was processed through the Sweave() function in R and the resulting TeX file was processed through a LaTeX compiler to produce a PDF document.
After processing the noweb file through the Sweave() function in R and processing the resulting TeX file through a LaTeX compiler, the PDF document in figures D.2 and D.3 is generated.
[Plot: Response (y-axis, 5 to 25) by Treatment (x-axis: 1time, 2times, 4times, drugD, drugE), with letter labels a through d marking the pairwise comparisons]
Figure 1: Distribution of response times and pairwise comparisons.
Figure D.3 Page 2 of the report created from the sample noweb file in listing D.1.
To learn more about Sweave, visit the Sweave home page (www.stat.uni-muenchen.de/~leisch/Sweave/). An excellent presentation is also provided by Theresa Scott (http://biostat.mc.vanderbilt.edu/TheresaScott). To learn more about LaTeX, check out the article “The Not So Short Introduction to LaTeX 2e,” available on the LaTeX home page (www.latex-project.org).
D.2 Joining forces with OpenOffice using odfWeave
Sweave provides a means of embedding R code and output in a LaTeX document that’s compiled into a PDF, PostScript, or DVI file. Although beautiful, the final document isn’t editable. Additionally, many recipients require reports in a format such as Word. odfWeave provides a mechanism for embedding R code and output in OpenOffice documents. Instead of placing R code chunks in a LaTeX document, the user places R code chunks in an OpenOffice ODT file (see figure D.4). An advantage is that the ODT file can be created with a WYSIWYG editor such as OpenOffice Writer (www.OpenOffice.org); there’s no need to learn a markup language.
Once the noweb document is created as an ODT file, you process it through the odfWeave() function in the odfWeave package. Unlike Sweave, odfWeave has to be downloaded, installed before first use (install.packages("odfWeave")), and loaded in each session in which it will be used. For example,
library(odfWeave)
infile <- "example.odt"
outfile <- "example-out.odt"
odfWeave(infile, outfile)
will take the example.odt file displayed in figure D.4 and produce the example-out.odt file displayed in figure D.5. Adding options(SweaveSyntax="SweaveSyntaxNoweb") before the odfWeave() statement may help reduce parsing errors on some platforms.
There are several differences between Sweave and odfWeave:
■ The xtable() function doesn’t work with odfWeave. By default, odfWeave will render data frames, matrices, and vectors in an attractive format. Optionally, the odfTable() function can be used to format these objects with a high degree of control.
■ ODF documents use XML markup rather than LaTeX. Therefore, the code chunk option results=tex should never be used. Use results=xml for code chunks that use odfTable().
■ The infile and outfile names should be different. Unlike Sweave, odfWeave("example.odt") would overwrite the noweb document with the final report.