R in Action, Second Edition.pdf

Specifying the plot type with geoms


you’ll be able to create a wide variety of interesting and useful plots with just a few lines of code.

Let’s start with a description of geom functions and the type of graphs they can create. Then we’ll look at the aes() function in more detail and how you can use it to group data. Next, we’ll consider faceting and the creation of trellis graphs. Finally, we’ll look at ways to tweak the appearance of ggplot2 graphs, including modifying axes and legends, changing color schemes, and adding annotations. The chapter will end with pointers to resources that can help you master the ggplot2 approach more fully.

19.3 Specifying the plot type with geoms

Whereas the ggplot() function specifies the data source and variables to be plotted, the geom functions specify how these variables are to be visually represented (using points, bars, lines, and shaded regions). At the time of writing, 37 geoms were available. Table 19.2 lists the more common ones, along with frequently used options. The options are described more fully in table 19.3.
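The division of labor between ggplot() and the geoms can be sketched by reusing one base object with several geoms. This is a minimal illustration, assuming the singer dataset from the lattice package (the same data used later in this section); the object names base, p_points, p_box, and p_violin are made up for the example.

```r
library(ggplot2)
data(singer, package="lattice")

# ggplot() fixes the data source and the variable mappings ...
base <- ggplot(singer, aes(x=voice.part, y=height))

# ... and each geom decides how those same variables are drawn
p_points <- base + geom_point()     # one point per singer
p_box    <- base + geom_boxplot()   # one box per voice part
p_violin <- base + geom_violin()    # one density outline per voice part
```

Typing p_box at the console (or calling print(p_box) in a script) draws the corresponding plot.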

Table 19.2 Geom functions

Function            Adds                Options
geom_bar()          Bar chart           color, fill, alpha
geom_boxplot()      Box plot            color, fill, alpha, notch, width
geom_density()      Density plot        color, fill, alpha, linetype
geom_histogram()    Histogram           color, fill, alpha, linetype, binwidth
geom_hline()        Horizontal lines    color, alpha, linetype, size
geom_jitter()       Jittered points     color, size, alpha, shape
geom_line()         Line graph          color, alpha, linetype, size
geom_point()        Scatterplot         color, alpha, shape, size
geom_rug()          Rug plot            color, sides
geom_smooth()       Fitted line         method, formula, color, fill, linetype, size
geom_text()         Text annotations    Many; see the help for this function
geom_violin()       Violin plot         color, fill, alpha, linetype
geom_vline()        Vertical lines      color, alpha, linetype, size

 

 

 

Most of the graphs described in this book can be created using the geoms in table 19.2. For example, the code

library(ggplot2)
data(singer, package="lattice")
ggplot(singer, aes(x=height)) + geom_histogram()


CHAPTER 19 Advanced graphics with ggplot2

Figure 19.4 Histogram of singer heights (x-axis: height; y-axis: count)

produces the histogram in figure 19.4, and

ggplot(singer, aes(x=voice.part, y=height)) + geom_boxplot()

produces the box plot in figure 19.5.

From figure 19.5, it appears that basses tend to be taller and sopranos tend to be shorter. Although gender wasn’t measured, it probably accounts for much of the variation you see.

Figure 19.5 Box plot of singer heights by voice part (x-axis: voice.part, from Bass 2 to Soprano 1; y-axis: height)


Note that only the x variable was specified when creating a histogram, but both an x and a y variable were specified for the box plot. The geom_histogram() function defaults to counts on the y-axis when no y variable is specified. See the documentation for each function for details and additional examples.
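You can check the default count behavior directly by inspecting the values ggplot2 computes for the histogram layer. The sketch below assumes the singer data again and an illustrative binwidth of 1; p_hist and bins are hypothetical names, and layer_data() is the ggplot2 helper that returns a layer's computed data.

```r
library(ggplot2)
data(singer, package="lattice")

# Only x is mapped; the histogram stat computes the y values itself
p_hist <- ggplot(singer, aes(x=height)) + geom_histogram(binwidth=1)

# layer_data() returns the values ggplot2 computed for the first layer
bins <- layer_data(p_hist)
head(bins[, c("xmin", "xmax", "count")])

# Every singer falls into exactly one bin, so the counts sum to the sample size
sum(bins$count) == nrow(singer)
```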

Each geom function has a set of options that can be used to modify its representation. Common options are listed in table 19.3.

Table 19.3 Common options for geom functions

Option      Specifies
color       Color of points, lines, and borders around filled regions.
fill        Color of filled areas such as bars and density regions.
alpha       Transparency of colors, ranging from 0 (fully transparent) to 1 (opaque).
linetype    Pattern for lines (1 = solid, 2 = dashed, 3 = dotted, 4 = dotdash, 5 = longdash, 6 = twodash).
size        Point size and line width (ggplot2 3.4.0 and later use linewidth for lines).
shape       Point shapes (same as pch, with 0 = open square, 1 = open circle, 2 = open triangle, and so on). See figure 3.4 for examples.
position    Position of plotted objects such as bars and points. For bars, "dodge" places grouped bar charts side by side, "stack" vertically stacks grouped bar charts, and "fill" vertically stacks grouped bar charts and standardizes their heights to be equal. For points, "jitter" reduces point overlap.
binwidth    Bin width for histograms.
notch       Indicates whether box plots should be notched (TRUE/FALSE).
sides       Placement of rug plots on the graph ("b" = bottom, "l" = left, "t" = top, "r" = right, "bl" = both bottom and left, and so on).
width       Width of box plots.
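The three bar positions in table 19.3 are easiest to tell apart side by side. The following sketch is an illustration only: it assumes the built-in mtcars data frame (not used elsewhere in this section) so that it runs without extra packages, and the object names are made up.

```r
library(ggplot2)

# Bars count cars by number of cylinders, grouped by number of gears
bars <- ggplot(mtcars, aes(x=factor(cyl), fill=factor(gear)))

p_dodge <- bars + geom_bar(position="dodge")  # groups side by side
p_stack <- bars + geom_bar(position="stack")  # groups stacked; bar height = total count
p_fill  <- bars + geom_bar(position="fill")   # stacked and rescaled so every bar has height 1

# The top of the tallest bar under each position
max(layer_data(p_dodge)$ymax)  # largest single cyl-by-gear cell
max(layer_data(p_stack)$ymax)  # largest cyl total
max(layer_data(p_fill)$ymax)   # proportions, so the top is 1
```

With "fill", the y-axis shows proportions rather than counts, which makes group composition comparable across bars of very different sizes.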

 

 

You can examine the use of many of these options using the Salaries dataset. The code

library(ggplot2)
data(Salaries, package="car")  # with car 3.0 or later, use package="carData"
ggplot(Salaries, aes(x=rank, y=salary)) +
  geom_boxplot(fill="cornflowerblue", color="black", notch=TRUE) +
  geom_point(position="jitter", color="blue", alpha=.5) +
  geom_rug(sides="l", color="black")

produces the plot in figure 19.6. The figure displays notched box plots of salary by academic rank. The actual observations (teachers) are overlaid and given some transparency so they don’t obscure the box plots. They’re also jittered to reduce their overlap. Finally, a rug plot is provided on the left to indicate the general spread of salaries.


Figure 19.6 Notched box plots with superimposed points describing the salaries of college professors by rank. A rug plot is provided on the vertical axis. (x-axis: rank; y-axis: salary)

From figure 19.6, you can see that the salaries of assistant, associate, and full professors differ significantly from each other (there is no overlap in the box plot notches). Additionally, the variance in salaries increases with greater rank, with a large range of salaries for full professors. In fact, at least one full professor earns less than assistant professors. There are also three full professors whose salaries are so large as to make them outliers (as indicated by the black dots in the Prof box plot). Having been a full professor earlier in my career, I read the data as suggesting that I was clearly underpaid.
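The notch comparison can also be made numerically. A notch spans roughly the median plus or minus 1.58 times the interquartile range divided by the square root of the group size, and base R's boxplot.stats() returns a comparable interval in its conf element. The sketch below is an illustration on the singer data rather than Salaries (an assumption made so that it runs with packages that ship with R); notch_limits and no_overlap are hypothetical names.

```r
data(singer, package="lattice")

# One notch interval per voice part: rows are voice parts, columns the limits
notch_limits <- t(sapply(split(singer$height, singer$voice.part),
                         function(h) boxplot.stats(h)$conf))
colnames(notch_limits) <- c("lower", "upper")
notch_limits

# Two groups differ by this rule of thumb when their intervals do not overlap
no_overlap <- notch_limits["Bass 2", "lower"] > notch_limits["Soprano 1", "upper"]
no_overlap
```

The same comparison could be run on salary split by rank once the Salaries data is loaded.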

The real power of the ggplot2 package is realized when geoms are combined to form new types of plots. Returning to the singer dataset, the code

library(ggplot2)
data(singer, package="lattice")
ggplot(singer, aes(x=voice.part, y=height)) +
  geom_violin(fill="lightblue") +
  geom_boxplot(fill="lightgreen", width=.2)

combines box plots with violin plots to create a new type of graph (displayed in figure 19.7). The box plots show the 25th, 50th, and 75th percentile scores for each voice part in the singer data frame, along with any outliers. The violin plots provide more visual cues as to the distribution of scores over the range of heights for each voice part.
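The same layering idea works for other pairs of geoms from table 19.2. As a further sketch, the code below overlays a fitted line on a scatterplot. It assumes the built-in mtcars data frame and a linear fit, both chosen only for illustration; p_fit is a made-up name.

```r
library(ggplot2)

# Layer 1: the raw observations; layer 2: a linear fit with its confidence band
p_fit <- ggplot(mtcars, aes(x=wt, y=mpg)) +
  geom_point(color="steelblue", size=2, alpha=.7) +
  geom_smooth(method="lm", formula=y ~ x, color="black", linetype=2)
```

Layers are drawn in the order they are added, so the fitted line appears on top of the points.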
