
Figure 19.7 A combined violin and box plot graph of singer heights by voice part (axes: voice.part, height)

In the remainder of this chapter, you’ll use geoms to create a wide range of graph types. Let’s start with grouping—the representation of more than one group of observations in a single graph.

19.4 Grouping

In order to understand data, it’s often helpful to plot two or more groups of observations on the same graph. In R, the groups are usually defined as the levels of a categorical variable (factor). Grouping is accomplished in ggplot2 graphs by associating one or more grouping variables with visual characteristics such as shape, color, fill, size, and line type. The aes() function in the ggplot() statement assigns variables to roles (visual characteristics of the plot), so this is a natural place to assign grouping variables.

Let’s use grouping to explore the Salaries dataset. The dataframe contains information on the salaries of university professors collected during the 2008–2009 academic year. Variables include rank (AsstProf, AssocProf, Prof), sex (Female, Male), yrs.since.phd (years since Ph.D.), yrs.service (years of service), and salary (nine-month salary in dollars).

First, you can ask how salaries vary by academic rank. The code

# Salaries now ships with carData; older releases of car bundled it directly
data(Salaries, package="carData")
library(ggplot2)

ggplot(data=Salaries, aes(x=salary, fill=rank)) + geom_density(alpha=.3)

Figure 19.8 Density plots of university salaries, grouped by academic rank (x-axis: salary; y-axis: density; legend: rank, with levels AsstProf, AssocProf, Prof)

plots three density curves in the same graph (one for each level of academic rank) and distinguishes them by fill color. The fills are made partially transparent (alpha=.3) so that the overlapping curves don’t obscure each other; where curves overlap, the colors blend, which makes the shared regions easier to see. The plot is given in figure 19.8. Note that a legend is produced automatically. In section 19.7.2, you’ll learn how to customize the legend generated for grouped data.

Salary increases by rank, but there is significant overlap, with some associate and full professors earning the same as assistant professors. As rank increases, so does the range of salaries. This is especially true for full professors, who have wide variation in their incomes. Placing all three distributions in the same graph facilitates comparisons among the groups.
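The visual impression can be checked numerically. The following is a minimal sketch, not part of the original text, that tabulates the median and the width of the salary range for each rank using base R. It assumes the Salaries data frame is available from the carData package (where current releases of car keep their datasets); salary_summary is a name chosen here for illustration.

```r
# Assumption: Salaries is provided by carData (older car releases bundled it).
data(Salaries, package = "carData")

# Split salaries by rank, then compute the median and the range width per group.
salary_summary <- sapply(
  split(Salaries$salary, Salaries$rank),
  function(s) c(median = median(s), range_width = diff(range(s)))
)

print(salary_summary)
```

Each column of the result corresponds to one rank, so the rows can be read left to right in the same order as the legend in the density plot.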

Next, let’s plot the relationship between years since Ph.D. and salary, grouping by sex and rank:

ggplot(Salaries, aes(x=yrs.since.phd, y=salary, color=rank, shape=sex)) + geom_point()

In the resulting graph (figure 19.9), academic rank is represented by point color (assistant professors in red, associate professors in green, and full professors in blue). Additionally, sex is indicated by point shape (circles for women and triangles for men). If you’re looking at a greyscale image, the color differences can be difficult to see; try running the code yourself. Note that reasonable legends are again produced

Figure 19.9 Scatterplot of years since graduation and salary. Academic rank is represented by color, and sex is represented by shape. (x-axis: yrs.since.phd; y-axis: salary; legends: rank, sex)

automatically. Here you can see that income increases with years since graduation, but the relationship is by no means linear.
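One way to make the nonlinearity visible is to overlay a smoothed trend on the scatterplot. This is a sketch rather than the book's own code: it assumes Salaries is loaded from carData, and it uses geom_smooth() with a loess fit as one reasonable choice of smoother. The name p_trend is chosen here for illustration.

```r
library(ggplot2)

# Assumption: Salaries is provided by carData (older car releases bundled it).
data(Salaries, package = "carData")

# Map color and shape only in the point layer so that the smoother
# is fit once to all observations rather than once per group.
p_trend <- ggplot(Salaries, aes(x = yrs.since.phd, y = salary)) +
  geom_point(aes(color = rank, shape = sex)) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "black")

print(p_trend)
```

Keeping the grouping aesthetics out of the top-level aes() is a deliberate choice here; moving color=rank into ggplot() would instead produce one smoothed curve per rank.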

Finally, you can visualize the number of professors by rank and sex using a grouped bar chart. The following code provides three bar-chart variations, displayed in figure 19.10:

ggplot(Salaries, aes(x=rank, fill=sex)) + geom_bar(position="stack") + labs(title='position="stack"')

ggplot(Salaries, aes(x=rank, fill=sex)) + geom_bar(position="dodge") + labs(title='position="dodge"')

ggplot(Salaries, aes(x=rank, fill=sex)) + geom_bar(position="fill") + labs(title='position="fill"')

Each of the plots in figure 19.10 emphasizes different aspects of the data. It’s clear from the first two graphs that there are many more full professors than members of other ranks. Additionally, there are more female full professors than female assistant or associate professors. The third graph indicates that the relative percentage of women to men in the full-professor group is less than in the other two groups, even though the total number of women is greater.
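The counts and proportions that the bar charts encode can also be tabulated directly. The sketch below is an illustration added to the text, assuming Salaries is loaded from carData; the raw counts should correspond to the bar heights in the stacked and dodged charts, and the row proportions to the segments in the filled chart. The names counts and props are chosen here for illustration.

```r
# Assumption: Salaries is provided by carData (older car releases bundled it).
data(Salaries, package = "carData")

# Counts of professors by rank (rows) and sex (columns).
counts <- table(Salaries$rank, Salaries$sex)

# Row proportions: the share of women and men within each rank.
props <- prop.table(counts, margin = 1)

print(counts)
print(round(props, 3))
```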

Figure 19.10 Three versions of a grouped bar chart (panels: position="stack", position="dodge", position="fill"; x-axis: rank; y-axis: count; legend: sex). Each displays the number of professors by academic rank and sex.

Note that the label on the y-axis for the third graph isn’t correct. It should say Proportion rather than count. You can correct this by adding y="Proportion" to the labs() function.
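A minimal sketch of that correction follows, assuming Salaries is loaded from carData; p_fill is a name chosen here for illustration.

```r
library(ggplot2)

# Assumption: Salaries is provided by carData (older car releases bundled it).
data(Salaries, package = "carData")

# position="fill" rescales each bar to a total height of 1,
# so the y-axis is relabeled to say what it actually shows.
p_fill <- ggplot(Salaries, aes(x = rank, fill = sex)) +
  geom_bar(position = "fill") +
  labs(title = 'position="fill"', y = "Proportion")

print(p_fill)
```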

Options can be used in different ways, depending on whether they occur inside or outside the aes() function. Look at the following examples and try to guess what they do:

ggplot(Salaries, aes(x=rank, fill=sex)) + geom_bar()
ggplot(Salaries, aes(x=rank)) + geom_bar(fill="red")
ggplot(Salaries, aes(x=rank, fill="red")) + geom_bar()

In the first example, sex is a variable represented by fill color in the bar graph. In the second example, each bar is filled with the color red. In the third example, ggplot2 treats "red" as a data value rather than a color: it behaves like a one-level variable that is mapped through the default fill scale, so the bars aren’t drawn in red and a legend labeled "red" appears. In general, variables should go inside aes(), and assigned constants should go outside aes().
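The difference between the second and third examples can be inspected without drawing anything. The sketch below is an added illustration, assuming Salaries is loaded from carData; it uses ggplot_build() to look at the fill colors ggplot2 computes for each plot. The object names are chosen here for illustration.

```r
library(ggplot2)

# Assumption: Salaries is provided by carData (older car releases bundled it).
data(Salaries, package = "carData")

# Constant outside aes(): every bar is drawn in the literal color "red".
p_constant <- ggplot(Salaries, aes(x = rank)) + geom_bar(fill = "red")

# Constant inside aes(): "red" becomes a one-level data value that the
# fill scale maps to its first default color, with a legend entry "red".
p_mapped <- ggplot(Salaries, aes(x = rank, fill = "red")) + geom_bar()

# Inspect the fill colors ggplot2 actually computed for each plot.
fill_constant <- unique(ggplot_build(p_constant)$data[[1]]$fill)
fill_mapped <- unique(ggplot_build(p_mapped)$data[[1]]$fill)

print(fill_constant)
print(fill_mapped)
```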

19.5 Faceting

Sometimes relationships are clearer if groups appear in side-by-side graphs rather than overlapping in a single graph. You can create trellis graphs (called faceted graphs in ggplot2) using the facet_wrap() and facet_grid() functions. The syntax is given in table 19.4, where var, rowvar, and colvar are factors.

Table 19.4 ggplot2 facet functions

Syntax	Results

facet_wrap(~var, ncol=n)	Separate plots for each level of var arranged into n columns
facet_wrap(~var, nrow=n)	Separate plots for each level of var arranged into n rows
facet_grid(rowvar~colvar)	Separate plots for each combination of rowvar and colvar, where rowvar represents rows and colvar represents columns
facet_grid(rowvar~.)	Separate plots for each level of rowvar, arranged as a single column
facet_grid(.~colvar)	Separate plots for each level of colvar, arranged as a single row
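The layouts in table 19.4 can be compared side by side with a small sketch. This is an added illustration, not the book's code: it uses the mpg dataset that ships with ggplot2 as a stand-in, and panel_dims() is a hypothetical helper defined here that reads the panel layout ggplot_build() computes (its ROW and COL columns).

```r
library(ggplot2)

# mpg ships with ggplot2; cyl has four distinct values (4, 5, 6, 8).
base <- ggplot(mpg, aes(x = hwy)) + geom_histogram(bins = 10)

wrapped <- base + facet_wrap(~cyl, ncol = 2)  # 4 panels in 2 columns
as_rows <- base + facet_grid(cyl ~ .)         # one column, one row per level
as_cols <- base + facet_grid(. ~ cyl)         # one row, one column per level

# Hypothetical helper: report how many rows and columns of panels a plot has.
panel_dims <- function(p) {
  lay <- ggplot_build(p)$layout$layout
  c(rows = max(lay$ROW), cols = max(lay$COL))
}

print(panel_dims(wrapped))
print(panel_dims(as_rows))
print(panel_dims(as_cols))
```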

Going back to the choral example, you can create a faceted graph using the following code:

data(singer, package="lattice")
library(ggplot2)

ggplot(data=singer, aes(x=height)) + geom_histogram() + facet_wrap(~voice.part, nrow=4)

The resulting plot (figure 19.11) displays the distribution of singer heights by voice part. Separating the eight distributions into their own small, side-by-side plots makes them easier to compare.

As a second example, let’s create a graph that has faceting and grouping:

library(ggplot2)

ggplot(Salaries, aes(x=yrs.since.phd, y=salary, color=rank, shape=rank)) + geom_point() + facet_grid(.~sex)

Figure 19.11 Faceted graph showing the distribution (histogram) of singer heights by voice part (one panel per voice part; x-axis: height; y-axis: count)

Figure 19.12 Scatterplot of years since graduation and salary. Academic rank is represented by color and shape, and sex is faceted. (panels: Female, Male; x-axis: yrs.since.phd; y-axis: salary; legend: rank)

The resulting graph is presented in figure 19.12. It contains the same information as figure 19.9, but separating the plot into facets makes it somewhat easier to read.

Finally, try displaying the height distribution of choral members in the singer dataset separately for each voice part, using kernel-density plots stacked in a single column so that they share a horizontal axis. Give each a different color. One solution is as follows:

data(singer, package="lattice")
library(ggplot2)

ggplot(data=singer, aes(x=height, fill=voice.part)) + geom_density() + facet_grid(voice.part~.)

The result is displayed in figure 19.13.

Note that stacking the plots in a single column, so that they share one horizontal height axis, facilitates comparisons among the groups. The colors aren’t strictly necessary, but they can aid in distinguishing the plots. (If you’re viewing this in greyscale, be sure to try the example yourself.)

Figure 19.13 Faceted density plots for singer heights by voice part (one row per voice part; x-axis: height; y-axis: density; legend: voice.part)
