Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный исследовательский университет «Высшая школа экономики»

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

R in Action, Second Edition.pdf

Скачиваний:

540

Добавлен:

26.03.2016

Размер:

20.33 Mб

Скачать

☆

<<< < Предыдущая 24 25 26 27 28 29 30 31 32 33 34 3536 / 17336 37 38 39 40 41 42 43 44 45 46 47 48 > Следующая >>>

		Numerical and character functions			97
> mydata <- mvrnorm(500, mean, sigma)				d Generates data
> mydata <- mvrnorm(500, mean, sigma)
> mydata <- as.data.frame(mydata)
> mydata <- as.data.frame(mydata)
> names(mydata) <- c("y","x1","x2")
> dim(mydata)			e Views the results
> dim(mydata)
[1] 500 3
[1] 500 3
> head(mydata, n=10)
y	x1	x2

198.8 41.3 4.35

2244.5 205.2 3.57

3375.7 186.7 3.69

4-59.2 11.2 4.23

5313.0 111.0 2.91

6288.8 185.1 4.18

7134.8 165.0 3.68

8171.7 97.4 3.81

9167.3 101.0 4.01

10121.1 94.5 3.76

In listing 5.3, you set a random number seed so that you can reproduce the results at a later time b. You specify the desired mean vector and variance-covariance matrix c and generate 500 pseudo-random observations d. For convenience, the results are converted from a matrix to a data frame, and the variables are given names. Finally, you confirm that you have 500 observations and 3 variables, and you print out the first 10 observations e. Note that because a correlation matrix is also a covariance matrix, you could have specified the correlation structure directly.

The probability functions in R allow you to generate simulated data, sampled from distributions with known characteristics. Statistical methods that rely on simulated data have grown exponentially in recent years, and you’ll see several examples of these in later chapters.

5.2.4Character functions

Whereas mathematical and statistical functions operate on numerical data, character functions extract information from textual data or reformat textual data for printing and reporting. For example, you may want to concatenate a person’s first name and last name, ensuring that the first letter of each is capitalized. Or you may want to count the instances of obscenities in open-ended feedback. Some of the most useful character functions are listed in table 5.6.

Table 5.6 Character functions

Function	Description

nchar(x)	Counts the number of characters of x.
	x <- c("ab", "cde", "fghij")
	length(x) returns 3 (see table 5.7).
	nchar(x[3]) returns 5.
substr(x, start, stop)	Extracts or replaces substrings in a character vector.
	x <- "abcdef"
	substr(x, 2, 4) returns bcd.
	substr(x, 2, 4) <- "22222" (x is now "a222ef").

98	CHAPTER 5		Advanced data management
	Table 5.6 Character functions (continued)

	Function		Description

	grep(pattern, x,		Searches for pattern in x. If fixed=FALSE, then pattern is a
	ignore.case=FALSE,		regular expression. If fixed=TRUE, then pattern is a text string.
	fixed=FALSE)		Returns the matching indices.
			grep("A", c("b","A","c"), fixed=TRUE) returns 2.
	sub(pattern, replacement,	Finds pattern in x and substitutes the replacement text. If
	x, ignore.case=FALSE,		fixed=FALSE, then pattern is a regular expression. If
	fixed=FALSE)		fixed=TRUE, then pattern is a text string.
			sub("\\s",".","Hello There") returns Hello.There. Note
			that "\s" is a regular expression for finding whitespace; use
			"\\s" instead, because "\" is R’s escape character (see section
			1.3.3).
	strsplit(x, split,		Splits the elements of character vector x at split. If
	fixed=FALSE)		fixed=FALSE, then pattern is a regular expression. If
			fixed=TRUE, then pattern is a text string.
			y <- strsplit("abc", "") returns a one-component,
			three-element list containing
			"a" "b" "c"
			unlist(y)[2] and sapply(y, "[", 2) both return “b”.
	paste(..., sep="")		Concatenates strings after using the sep string to separate them.
			paste("x", 1:3, sep="") returns c("x1", "x2", "x3").
			paste("x",1:3,sep="M") returns c("xM1","xM2" "xM3").
			paste("Today is", date()) returns
			Today is Mon Dec 28 14:17:32 2015
			(I changed the date to appear more current.)
	toupper(x)		Uppercase.
			toupper("abc") returns “ABC”.
	tolower(x)		Lowercase.
			tolower("ABC") returns “abc”.

Note that the functions grep(), sub(), and strsplit() can search for a text string (fixed=TRUE) or a regular expression (fixed=FALSE); FALSE is the default. Regular expressions provide a clear and concise syntax for matching a pattern of text. For example, the regular expression

^[hc]?at

matches any string that starts with 0 or one occurrences of h or c, followed by at. The expression therefore matches hat, cat, and at, but not bat. To learn more, see the regular expression entry in Wikipedia.

5.2.5Other useful functions

The functions in table 5.7 are also quite useful for data-management and manipulation, but they don’t fit cleanly into the other categories.

Numerical and character functions		99
Table 5.7 Other useful functions

Function	Description

length(x)	Returns the length of object x.
	x <- c(2, 5, 6, 9)
	length(x) returns 4.
seq(from, to, by)	Generates a sequence.
	indices <- seq(1,10,2)
	indices is c(1, 3, 5, 7, 9).
rep(x, n)	Repeats x n times.
	y <- rep(1:3, 2)
	y is c(1, 2, 3, 1, 2, 3).
cut(x, n)	Divides the continuous variable x into a factor with n levels. To cre-
	ate an ordered factor, include the option ordered_result =
	TRUE.
pretty(x, n)	Creates pretty breakpoints. Divides a continuous variable x into n
	intervals by selecting n + 1 equally spaced rounded values. Often
	used in plotting.
cat(... , file = "myfile",	Concatenates the objects in … and outputs them to the screen or
append = FALSE)	to a file (if one is declared).
	name <- c("Jane")
	cat("Hello" , name, "\n")

The last example in the table demonstrates the use of escape characters in printing. Use \n for new lines, \t for tabs, \' for a single quote, \b for backspace, and so forth (type ?Quotes for more information). For example, the code

name	<- "Bob"
cat(	"Hello", name, "\b.\n", "Isn\'t R", "\t", "GREAT?\n")
produces
Hello Bob.
Isn't R		GREAT?

Note that the second line is indented one space. When cat concatenates objects for output, it separates each by a space. That’s why you include the backspace (\b) escape character before the period. Otherwise it would produce “Hello Bob .”

How you apply the functions covered so far to numbers, strings, and vectors is intuitive and straightforward, but how do you apply them to matrices and data frames? That’s the subject of the next section.

5.2.6Applying functions to matrices and data frames

One of the interesting features of R functions is that they can be applied to a variety of data objects (scalars, vectors, matrices, arrays, and data frames). The following listing provides an example.

100	CHAPTER 5 Advanced data management

Listing 5.4 Applying functions to data objects

>a <- 5

>sqrt(a) [1] 2.236068

>b <- c(1.243, 5.654, 2.99)

>round(b)

[1] 1 6 3

>c <- matrix(runif(12), nrow=3)

[,1]	[,2]	[,3]		[,4]
[1,] 0.4205	0.355	0.699	0.323
[2,] 0.0270	0.601	0.181	0.926
[3,] 0.6682	0.319	0.599	0.215
> log(c)
[,1]	[,2]	[,3]		[,4]
[1,] -0.866	-1.036	-0.358 -1.130
[2,] -3.614	-0.508	-1.711		-0.077
[3,] -0.403	-1.144	-0.513		-1.538
> mean(c)
[1] 0.444

Notice that the mean of matrix c in listing 5.4 results in a scalar (0.444). The mean() function takes the average of all 12 elements in the matrix. But what if you want the three row means or the four column means?

R provides a function, apply(), that allows you to apply an arbitrary function to any dimension of a matrix, array, or data frame. The format for the apply() function is

apply(x, MARGIN, FUN, ...)

where x is the data object, MARGIN is the dimension index, FUN is a function you specify, and ... are any parameters you want to pass to FUN. In a matrix or data frame, MARGIN=1 indicates rows and MARGIN=2 indicates columns. Look at the following examples.

Listing 5.5 Applying a function to the rows (columns) of a matrix

>mydata <- matrix(rnorm(30), nrow=6)

>mydata


	[,1]			[,2]		[,3]	[,4]	[,5]
	[1,] 0.71298			1.368	-0.8320 -1.234 -0.790
	[2,] -0.15096 -1.149 -1.0001 -0.725							0.506
	[3,] -1.77770			0.519	-0.6675		0.721 -1.350
	[4,] -0.00132 -0.308				0.9117		-1.391	1.558
	[5,] -0.00543 0.378 -0.0906 -1.485 -0.350
	[6,] -0.52178 -0.539 -1.7347						2.050	1.569
Calculates			> apply(mydata, 1, mean)
	e [1] -0.155 -0.504 -0.511					0.154 -0.310 0.165
the trimmed			> apply(mydata, 2, mean)
column		[1] -0.2907		0.0449 -0.5688 -0.3442				0.1906
means			> apply(mydata, 2, mean, trim=0.2)

	[1] -0.1699			0.0127 -0.6475 -0.6575				0.2312

b Generates data

c Calculates the row means

d Calculates the column means

You start by generating a 6 × 5 matrix containing random normal variates b. Then you calculate the six row means c and five column means d. Finally, you calculate

<<< < Предыдущая 24 25 26 27 28 29 30 31 32 33 34 3536 / 17336 37 38 39 40 41 42 43 44 45 46 47 48 > Следующая >>>

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]

#
05.08.2019741.83 Кб0psihologia.rtf
#
02.06.2015162.69 Кб76Psyh_final_ver.docx
#
02.06.2015141.74 Кб44Psyh_final_ver.docx
#
26.03.2016226.3 Кб23public_corporation.doc
#
26.03.2016451.53 Кб7pud_finansovyy-menedjment_318476.pdf
#
26.03.201620.33 Mб540R in Action, Second Edition.pdf
#
26.03.2016296.21 Кб17Radaev_Kak_napisat_akademicheskiy_text.pdf
#
26.03.20163.76 Mб4Raeff_Modernity.pdf
#
26.03.20162.12 Mб19raigorodskii_d_ya_hrestomatiya_psihologiya_lich.pdf
#
02.06.2015494.59 Кб6raschet_SRK_smorodin.doc
#
02.06.201563.98 Кб4referat_IOGP_3.docx