
Robert I. Kabacoff - R in action

66	CHAPTER 3 Getting started with graphs

boxplot(wt, main="Boxplot of wt")
par(opar)
detach(mtcars)

The results are presented in figure 3.14.

As a second example, let’s arrange 3 plots in 3 rows and 1 column. Here’s the code:

attach(mtcars)
opar <- par(no.readonly=TRUE)
par(mfrow=c(3,1))
hist(wt)
hist(mpg)
hist(disp)
par(opar)
detach(mtcars)

The graph is displayed in figure 3.15. Note that the high-level function hist() includes a default title (use main="" to suppress it, or ann=FALSE to suppress all titles and labels).
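As a small sketch of the title options just mentioned, the following variation of the example suppresses the default titles. Which histogram gets which option is chosen only for illustration, and layout_used is a hypothetical name.

```r
# Sketch: same three-row arrangement, with the default titles suppressed
opar <- par(no.readonly=TRUE)     # save the current graphical settings
par(mfrow=c(3,1))                 # 3 rows, 1 column, filled by row
hist(mtcars$wt, main="")          # no main title; axis labels are kept
hist(mtcars$mpg, main="")
hist(mtcars$disp, ann=FALSE)      # no title and no axis labels
layout_used <- par("mfrow")       # query the arrangement currently in effect
par(opar)                         # restore the saved settings
```

Using mtcars$wt instead of attach(mtcars) keeps the sketch free of side effects on the search path.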

Figure 3.14 Graph combining four figures through par(mfrow=c(2,2)). Panels: Scatterplot of wt vs. mpg, Scatterplot of wt vs disp, Histogram of wt, Boxplot of wt.

Figure 3.15 Graph combining three figures through par(mfrow=c(3,1)). Panels: Histogram of wt, Histogram of mpg, Histogram of disp.

The layout() function has the form layout(mat) where mat is a matrix object specifying the location of the multiple plots to combine. In the following code, one figure is placed in row 1 and two figures are placed in row 2:

attach(mtcars)
layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE))
hist(wt)
hist(mpg)
hist(disp)
detach(mtcars)

The resulting graph is presented in figure 3.16.

Optionally, you can include widths= and heights= options in the layout() function to control the size of each figure more precisely. These options have the form

widths = a vector of values for the widths of columns
heights = a vector of values for the heights of rows

Relative widths are specified with numeric values. Absolute widths (in centimeters) are specified with the lcm() function.
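A minimal sketch of mixing an absolute and a relative width follows. The 6 cm value and the choice of plots are assumptions made for illustration, and n_figs is a hypothetical name.

```r
# Sketch: first column fixed at 6 cm wide, second column takes the remaining width
opar <- par(no.readonly=TRUE)
n_figs <- layout(matrix(c(1, 2), nrow=1, ncol=2),
                 widths=c(lcm(6), 1),   # absolute width (cm), then relative width
                 heights=1)             # a single row with relative height 1
boxplot(mtcars$mpg, main="mpg")
hist(mtcars$mpg, main="Histogram of mpg")
par(opar)
```

layout() invisibly returns the number of figures in the arrangement, which is stored here in n_figs.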

Figure 3.16 Graph combining three figures using the layout() function with default widths. Panels: Histogram of wt (top row), Histogram of mpg and Histogram of disp (bottom row).

In the following code, one figure is again placed in row 1 and two figures are placed in row 2. But with heights=c(1, 2), the figure in row 1 is half the height of the figures in row 2 (one-third of the total height). Additionally, with widths=c(3, 1), the figure in the bottom-right cell is one-third the width of the figure in the bottom-left cell (one-fourth of the total width):

attach(mtcars)

layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE), widths=c(3, 1), heights=c(1, 2))

hist(wt)

hist(mpg)

hist(disp)

detach(mtcars)

The graph is presented in figure 3.17.

As you can see, the layout() function gives you easy control over both the number and placement of graphs in a final image and the relative sizes of these graphs. See help(layout) for more details.
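Before drawing anything, it can help to preview where each figure will go. The sketch below uses layout.show() from the graphics package for that purpose; mat and n_figs are hypothetical names.

```r
# Sketch: preview an arrangement before plotting into it
mat <- matrix(c(1, 1, 2, 3), 2, 2, byrow=TRUE)
print(mat)                    # row 1 holds figure 1 in both cells; row 2 holds figures 2 and 3
opar <- par(no.readonly=TRUE)
n_figs <- layout(mat, widths=c(3, 1), heights=c(1, 2))
layout.show(n_figs)           # draws the outline of each figure region with its number
par(opar)
```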

Figure 3.17 Graph combining three figures using the layout() function with specified widths. Panels: Histogram of wt (top row), Histogram of mpg and Histogram of disp (bottom row).

3.5.1 Creating a figure arrangement with fine control

There are times when you want to arrange or superimpose several figures to create a single meaningful plot. Doing so requires fine control over the placement of the figures. You can accomplish this with the fig= graphical parameter. In the following listing, two box plots are added to a scatter plot to create a single enhanced graph. The resulting graph is shown in figure 3.18.

Listing 3.4 Fine placement of figures in a graph

opar <- par(no.readonly=TRUE)
par(fig=c(0, 0.8, 0, 0.8))                       # Set up scatter plot
plot(mtcars$wt, mtcars$mpg,
     xlab="Car Weight",
     ylab="Miles Per Gallon")
par(fig=c(0, 0.8, 0.55, 1), new=TRUE)            # Add box plot above
boxplot(mtcars$wt, horizontal=TRUE, axes=FALSE)
par(fig=c(0.65, 1, 0, 0.8), new=TRUE)            # Add box plot to right
boxplot(mtcars$mpg, axes=FALSE)
mtext("Enhanced Scatterplot", side=3, outer=TRUE, line=-3)
par(opar)

To understand how this graph was created, think of the full graph area as going from (0,0) in the lower-left corner to (1,1) in the upper-right corner. Figure 3.19 will help you visualize this. The format of the fig= parameter is a numerical vector of the form c(x1, x2, y1, y2).

The first fig= sets up the scatter plot going from 0 to 0.8 on the x-axis and 0 to 0.8 on the y-axis. The top box plot goes from 0 to 0.8 on the x-axis and 0.55 to 1 on the y-axis. The right-hand box plot goes from 0.65 to 1 on the x-axis and 0 to 0.8 on the y-axis. fig= starts a new plot, so when adding a figure to an existing graph, include the new=TRUE option.

I chose 0.55 rather than 0.8 so that the top figure would be pulled closer to the scatter plot. Similarly, I chose 0.65 to pull the right-hand box plot closer to the scatter plot. You have to experiment to get the placement right.
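The same mechanism can place one plot inside another. The following is only a sketch: the inset coordinates and the reduced margins are illustrative guesses that may need adjusting for your device, and inset is a hypothetical name.

```r
# Sketch: a small histogram inset in the upper-right corner of a scatter plot
opar <- par(no.readonly=TRUE)
par(fig=c(0, 1, 0, 1))                         # main plot uses the full graph area
plot(mtcars$wt, mtcars$mpg, xlab="wt", ylab="mpg")
inset <- c(0.5, 0.95, 0.5, 0.95)               # c(x1, x2, y1, y2), illustrative values
par(fig=inset, new=TRUE, mar=c(2, 2, 1, 1))    # smaller margins for the small region
hist(mtcars$mpg, main="", xlab="", ylab="")
par(opar)
```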

Figure 3.18 A scatter plot with two box plots added to the margins (plot title: Enhanced Scatterplot)

Figure 3.19 Specifying locations using the fig= graphical parameter. The graph area runs from (0,0) in the lower-left corner to (1,1) in the upper-right corner.

NOTE The amount of space needed for individual subplots can be device dependent. If you get “Error in plot.new(): figure margins too large,” try varying the area given for each portion of the overall graph.

You can use the fig= graphical parameter to combine several plots into any arrangement within a single graph. With a little practice, this approach gives you a great deal of flexibility when creating complex visual presentations.

3.6 Summary

In this chapter, we reviewed methods for creating graphs and saving them in a variety of formats. The majority of the chapter was concerned with modifying the default graphs produced by R, in order to arrive at more useful or attractive plots. You learned how to modify a graph’s axes, fonts, symbols, lines, and colors, as well as how to add titles, subtitles, labels, plotted text, legends, and reference lines. You saw how to specify the size of the graph and margins, and how to combine multiple graphs into a single useful image.

Our focus in this chapter was on general techniques that you can apply to all graphs (with the exception of lattice graphs in chapter 16). Later chapters look at specific types of graphs. For example, chapter 7 covers methods for graphing a single variable. Graphing relationships between variables will be described in chapter 11. In chapter 16, we discuss advanced graphic methods, including lattice graphs (graphs that display the relationship between variables, for each level of other variables) and interactive graphs. Interactive graphs let you use the mouse to dynamically explore the plotted relationships.

In other chapters, we’ll discuss methods of visualizing data that are particularly useful for the statistical approaches under consideration. Graphs are a central part of modern data analysis, and I’ll endeavor to incorporate them into each of the statistical approaches we discuss.

In the previous chapter we discussed a range of methods for inputting or importing data into R. Unfortunately, in the real world your data is rarely usable in the format in which you first get it. In the next chapter we look at ways to transform and massage our data into a state that’s more useful and conducive to analysis.

4 Basic data management

This chapter covers

■ Manipulating dates and missing values

■ Understanding data type conversions

■ Creating and recoding variables

■ Sorting, merging, and subsetting datasets

■ Selecting and dropping variables

In chapter 2, we covered a variety of methods for importing data into R. Unfortunately, getting our data into the rectangular arrangement of a matrix or data frame is only the first step in preparing it for analysis. To paraphrase Captain Kirk in the Star Trek episode “A Taste of Armageddon” (and proving my geekiness once and for all): “Data is a messy business, a very, very messy business.” In my own work, as much as 60 percent of the time I spend on data analysis is focused on preparing the data for analysis. I’ll go out on a limb and say that the same is probably true in one form or another for most real-world data analysts. Let’s take a look at an example.

4.1 A working example

One of the topics that I study in my current job is how men and women differ in the ways they lead their organizations. Typical questions might be

■ Do men and women in management positions differ in the degree to which they defer to superiors?

■ Does this vary from country to country, or are these gender differences universal?

One way to address these questions is to have bosses in multiple countries rate their managers on deferential behavior, using questions like the following:

This manager asks my opinion before making personnel decisions.
1                   2          3                            4       5
strongly disagree   disagree   neither agree nor disagree   agree   strongly agree

The resulting data might resemble those in table 4.1. Each row represents the ratings given to a manager by his or her boss.

Table 4.1 Gender differences in leadership behavior

Manager	Date	Country	Gender	Age	q1	q2	q3	q4	q5

1	10/24/08	US	M	32	5	4	5	5	5
2	10/28/08	US	F	45	3	5	2	5	5
3	10/01/08	UK	F	25	3	5	5	5	2
4	10/12/08	UK	M	39	3	3	4
5	05/01/09	UK	F	99	2	2	1	2	1

Here, each manager is rated by their boss on five statements (q1 to q5) related to deference to authority. For example, manager 1 is a 32-year-old male working in the US and is rated deferential by his boss, while manager 5 is a female of unknown age (99 probably indicates missing) working in the UK and is rated low on deferential behavior. The date column captures when the ratings were made.

Although a dataset might have dozens of variables and thousands of observations, we’ve only included 10 columns and 5 rows to simplify the examples. Additionally, we’ve limited the number of items pertaining to the managers’ deferential behavior to 5. In a real-world study, you’d probably use 10–20 such items to improve the reliability and validity of the results. You can create a data frame containing the data in table 4.1 using the following code.

Listing 4.1 Creating the leadership data frame

manager <- c(1, 2, 3, 4, 5)
date <- c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09")
country <- c("US", "US", "UK", "UK", "UK")
gender <- c("M", "F", "F", "M", "F")
age <- c(32, 45, 25, 39, 99)
q1 <- c(5, 3, 3, 3, 2)
q2 <- c(4, 5, 5, 3, 2)
q3 <- c(5, 2, 5, 4, 1)
q4 <- c(5, 5, 5, NA, 2)
q5 <- c(5, 5, 2, NA, 1)
leadership <- data.frame(manager, date, country, gender, age,
                         q1, q2, q3, q4, q5, stringsAsFactors=FALSE)
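Once the data frame exists, a quick structural check can confirm that it matches table 4.1. The sketch below rebuilds the same data in a single call so that it runs on its own; dims and n_missing are hypothetical names.

```r
# Sketch: rebuild the leadership data frame and check its structure
leadership <- data.frame(
  manager = c(1, 2, 3, 4, 5),
  date    = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  country = c("US", "US", "UK", "UK", "UK"),
  gender  = c("M", "F", "F", "M", "F"),
  age     = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2), q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2), q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE)

str(leadership)                           # column types; date stays character, not factor
dims <- dim(leadership)                   # number of rows and columns
n_missing <- colSums(is.na(leadership))   # count of NA values per column
print(n_missing)
```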


In order to address the questions of interest, we must first address several data management issues. Here’s a partial list:

■ The five ratings (q1 to q5) will need to be combined, yielding a single mean deferential score from each manager.

■ In surveys, respondents often skip questions. For example, the boss rating manager 4 skipped questions 4 and 5. We’ll need a method of handling incomplete data. We’ll also need to recode values like 99 for age to missing.

■ There may be hundreds of variables in a dataset, but we may only be interested in a few. To simplify matters, we’ll want to create a new dataset with only the variables of interest.

■ Past research suggests that leadership behavior may change as a function of the manager’s age. To examine this, we may want to recode the current values of age into a new categorical age grouping (for example, young, middle-aged, elder).

■ Leadership behavior may change over time. We might want to focus on deferential behavior during the recent global financial crisis. To do so, we may want to limit the study to data gathered during a specific period of time (say, January 1, 2009 to December 31, 2009).

We’ll work through each of these issues in the current chapter, as well as other basic data management tasks such as combining and sorting datasets. Then in chapter 5 we’ll look at some advanced topics.
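As a rough preview, the tasks in the list above might be sketched in base R as follows. This is one possible approach, not necessarily the one developed later in the chapter. The names meandef, agecat, items, in2009, and recent are hypothetical, and the age cut points of 30 and 55 are assumptions made only for illustration.

```r
# Sketch: one possible base R approach to the data management tasks listed above
leadership <- data.frame(
  manager = 1:5,
  date = c("10/24/08", "10/28/08", "10/1/08", "10/12/08", "5/1/09"),
  age = c(32, 45, 25, 39, 99),
  q1 = c(5, 3, 3, 3, 2), q2 = c(4, 5, 5, 3, 2), q3 = c(5, 2, 5, 4, 1),
  q4 = c(5, 5, 5, NA, 2), q5 = c(5, 5, 2, NA, 1),
  stringsAsFactors = FALSE)

# Mean deference score per manager, ignoring skipped items
items <- c("q1", "q2", "q3", "q4", "q5")
leadership$meandef <- rowMeans(leadership[, items], na.rm = TRUE)

# Recode the placeholder value 99 to missing
leadership$age[leadership$age == 99] <- NA

# Age groups; the cut points 30 and 55 are assumed for illustration
leadership$agecat <- cut(leadership$age, breaks = c(-Inf, 30, 55, Inf),
                         labels = c("young", "middle-aged", "elder"))

# Convert the date strings, then keep only ratings made during 2009
leadership$date <- as.Date(leadership$date, format = "%m/%d/%y")
in2009 <- leadership$date >= as.Date("2009-01-01") &
          leadership$date <= as.Date("2009-12-31")

# New dataset with only the variables of interest
recent <- leadership[in2009, c("manager", "date", "agecat", "meandef")]
print(recent)
```

With the five rows of table 4.1, only manager 5 was rated in 2009, so recent contains a single row whose age group is missing.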

4.2 Creating new variables

In a typical research project, you’ll need to create new variables and transform existing ones. This is accomplished with statements of the form

variable <- expression

A wide array of operators and functions can be included in the expression portion of the statement. Table 4.2 lists R’s arithmetic operators. Arithmetic operators are used when developing formulas.

Table 4.2 Arithmetic operators

Operator	Description

+	Addition
-	Subtraction
*	Multiplication
/	Division
^ or **	Exponentiation
x%%y	Modulus (x mod y) 5%%2 is 1
x%/%y	Integer division 5%/%2 is 2
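The operators in table 4.2 can be tried directly at the console. In the short sketch below, the data frame mydata and its columns x1, x2, sumx, and meanx are hypothetical names used only for illustration.

```r
# Sketch: arithmetic operators applied to scalars
x <- 5
y <- 2
sum_xy  <- x + y      # addition
pow_xy  <- x ^ y      # exponentiation; x ** y is equivalent
mod_xy  <- x %% y     # modulus: remainder of 5 divided by 2
idiv_xy <- x %/% y    # integer division: whole part of 5 divided by 2

# Creating new variables from existing ones with variable <- expression
mydata <- data.frame(x1 = c(2, 2, 6, 4), x2 = c(3, 4, 2, 8))
mydata$sumx  <- mydata$x1 + mydata$x2
mydata$meanx <- (mydata$x1 + mydata$x2) / 2
print(mydata)
```

Because the operators are vectorized, each new column is computed row by row without an explicit loop.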
