R in Action, Second Edition.pdf

CHAPTER 20 Advanced programming

 

 

library(ggplot2)
# (d) Plots a line graph
ggplot(data=dfm,
       aes(x=Measurement, y=Centimeters, group=Cluster)) +
    geom_point(size=3, aes(shape=Cluster, color=Cluster)) +
    geom_line(size=1, aes(color=Cluster)) +
    ggtitle("Profiles for Iris Clusters")

 

 

 

 

 

 

First, the matrix of cluster centroids is extracted (rows are clusters, and columns are variable means) (callout b). The matrix is then reshaped into long format using the reshape package (see section 5.6.2) (callout c). Finally, the data is plotted using the ggplot2 package (see section 18.3) (callout d). The resulting graph is displayed in figure 20.1.

This type of graph is possible because all the variables plotted use the same units of measurement (centimeters). If the cluster analysis involved variables on different scales, you would need to standardize the data before plotting and label the y-axis something like Standardized Scores. See section 16.1 for details.
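A minimal sketch of that standardization step, assuming the four numeric columns of iris are the clustering variables. scale() is a base R function; the name iris_std is introduced here only for illustration.

```r
# Standardize each numeric column to mean 0 and standard deviation 1.
# iris_std is a hypothetical name used only for this sketch.
iris_std <- scale(iris[, 1:4])

# Check the result: column means are 0 and standard deviations are 1
# (up to floating-point rounding).
round(colMeans(iris_std), 10)
apply(iris_std, 2, sd)
```

Cluster centroids computed from iris_std would then be expressed in standard deviation units rather than centimeters.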

Now that you can represent data in structures and unpack the results, let’s look at flow control.

20.1.2 Control structures

When the R interpreter processes code, it reads sequentially, line by line. If a line isn’t a complete statement, it reads additional lines until a fully formed statement can be constructed. For example, if you wanted to add 3 + 2 + 5,

> 3 + 2 + 5
[1] 10

will work. So will

> 3 + 2 +
    5
[1] 10

The + sign at the end of the first line indicates that the statement isn’t complete. But

> 3 + 2
[1] 5
> + 5
[1] 5

obviously doesn’t work, because 3 + 2 is interpreted as a complete statement.

Figure 20.1 A plot of the centroids (means) for three clusters extracted from the Iris dataset using k-means clustering (x-axis: Measurement; y-axis: Centimeters; legend: Cluster)

Sometimes you need to process code nonsequentially. You may want to execute code conditionally or repeat one or more statements multiple times. This section describes three control-flow functions that are particularly useful in writing functions: for(), if(), and ifelse().

FOR LOOPS

The for() function allows you to execute a statement repeatedly. The syntax is

for(var in seq){
    statements
}

where var is a variable name and seq is an expression that evaluates to a vector. If there is only one statement, the curly braces are optional:

> for(i in 1:5) print(1:i)
[1] 1
[1] 1 2
[1] 1 2 3
[1] 1 2 3 4
[1] 1 2 3 4 5

> for(i in 5:1)print(1:i)
[1] 1 2 3 4 5
[1] 1 2 3 4
[1] 1 2 3
[1] 1 2
[1] 1

Note that var continues to exist after the function exits. Here, i equals 1.
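The short sketch below illustrates two points from this section: the loop variable survives the loop, and seq can be any vector, not only a run of integers. All variable names are illustrative. It also uses seq_along(), a base R function that is a safer alternative to 1:length(x) when x might be empty.

```r
# The loop variable keeps its last value after the loop finishes.
for (i in 5:1) {
    squares <- i^2    # overwritten on every pass
}
i          # 1
squares    # 1

# seq can be any vector; here it is a character vector.
total_chars <- 0
for (name in c("Sepal", "Petal")) {
    total_chars <- total_chars + nchar(name)
}
total_chars    # 10

# seq_along(x) yields an empty sequence for an empty vector, so the body
# never runs; 1:length(x) would instead iterate over c(1, 0).
empty <- character(0)
passes <- 0
for (k in seq_along(empty)) passes <- passes + 1
passes    # 0
```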

IF() AND ELSE

The if() function allows you to execute statements conditionally. The syntax for the if() construct is

if(condition){
    statements
} else {
    statements
}


The condition should be a one-element logical vector (TRUE or FALSE) and can’t be missing (NA). The else portion is optional. If there is only one statement, the curly braces are also optional.

As an example, consider the following code fragment:

if(interactive()){
    plot(x, y)
} else {
    png("myplot.png")
    plot(x, y)
    dev.off()
}

If the code is being run interactively, the interactive() function returns TRUE and a plot is sent to the screen. Otherwise, the plot is saved to disk. You’ll use the if() function extensively in chapter 21.
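As a hedged illustration of the rule that the condition can't be NA, the sketch below guards against a missing value before testing it. describe_sign() is a hypothetical helper written for this example; it is not part of base R.

```r
# describe_sign() is a hypothetical helper written for this sketch.
describe_sign <- function(x) {
    # Check for NA first, because if(NA) stops with an error.
    if (is.na(x)) {
        "missing"
    } else if (x < 0) {
        "negative"
    } else {
        "non-negative"
    }
}

describe_sign(-3)    # "negative"
describe_sign(NA)    # "missing"

# Without the guard, a missing condition is an error.
outcome <- tryCatch(
    if (NA) "yes" else "no",
    error = function(e) "error"
)
outcome    # "error"
```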

IFELSE()

The ifelse() function is a vectorized version of if(). Vectorization allows a function to process objects without explicit looping. The format of ifelse() is

ifelse(test, yes, no)

where test is an object that has been coerced to logical mode, yes returns values for true elements of test, and no returns values for false elements of test.

Let’s say that you have a vector of p-values that you have extracted from a statistical analysis that involved six statistical tests, and you want to flag the tests that are significant at the p < .05 level. This can be accomplished with the following code:

> pvalues <- c(.0867, .0018, .0054, .1572, .0183, .5386)
> results <- ifelse(pvalues <.05, "Significant", "Not Significant")
> results
[1] "Not Significant" "Significant"     "Significant"
[4] "Not Significant" "Significant"     "Not Significant"

The ifelse() function loops through the vector pvalues and returns a character vector containing the value "Significant" or "Not Significant" depending on whether the corresponding element of pvalues is less than .05.

The same result can be accomplished with explicit loops using

pvalues <- c(.0867, .0018, .0054, .1572, .0183, .5386)
results <- vector(mode="character", length=length(pvalues))
for(i in 1:length(pvalues)){
    if (pvalues[i] < .05) results[i] <- "Significant"
    else results[i] <- "Not Significant"
}

The vectorized version is faster and more efficient.
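One way to confirm that the two approaches agree is to compute the result both ways and compare them. The sketch below reuses pvalues from the text; the names vectorized and looped are introduced only for this comparison.

```r
pvalues <- c(.0867, .0018, .0054, .1572, .0183, .5386)

# Vectorized version
vectorized <- ifelse(pvalues < .05, "Significant", "Not Significant")

# Explicit loop version
looped <- vector(mode="character", length=length(pvalues))
for (i in seq_along(pvalues)) {
    if (pvalues[i] < .05) {
        looped[i] <- "Significant"
    } else {
        looped[i] <- "Not Significant"
    }
}

identical(vectorized, looped)       # TRUE
sum(vectorized == "Significant")    # 3
```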

There are other control structures, including while(), repeat, and switch(), but the ones presented here are the most commonly used. Now that you have data structures and control structures, we can talk about creating functions.
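For completeness, here is a brief sketch of the three constructs just mentioned. The variable names and values are illustrative only.

```r
# while: loop as long as the condition is TRUE
n <- 0
while (n < 3) {
    n <- n + 1
}
n    # 3

# repeat: loop until break is called explicitly
m <- 10
repeat {
    m <- m - 4
    if (m < 0) break
}
m    # -2

# switch: choose one of several alternatives by name
feeling <- "happy"
message_text <- switch(feeling,
                       happy = "I am glad you are happy",
                       sad   = "Cheer up",
                       "No match")    # unnamed last argument is the default
message_text    # "I am glad you are happy"
```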


20.1.3 Creating functions

Almost everything in R is a function. Even arithmetic operators like +, -, /, and * are actually functions. For example, 2 + 2 is equivalent to "+"(2, 2). This section describes function syntax. Scope is considered in section 20.2.
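A quick sketch of what this means in practice. Because operators are ordinary functions, they can be called with function syntax or handed to other functions such as sapply() and Reduce(), both of which are in base R.

```r
# An operator called with ordinary function syntax
"+"(2, 2)              # 4
"*"(3, 4)              # 12

# An operator passed as an argument to another function
sapply(1:3, "-", 1)    # 0 1 2
Reduce("+", 1:5)       # 15
```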

FUNCTION SYNTAX

The syntax of a function is

functionname <- function(parameters){
    statements
    return(value)
}

If there is more than one parameter, the parameters are separated by commas. Parameters can be passed by keyword, by position, or both. Additionally, parameters can have default values. Consider the following function:

f <- function(x, y, z=1){
    result <- x + (2*y) + (3*z)
    return(result)
}

> f(2,3,4)
[1] 20
> f(2,3)
[1] 11
> f(x=2, y=3)
[1] 11
> f(z=4, y=2, 3)
[1] 19

In the first case, the parameters are passed by position (x = 2, y = 3, z = 4). In the second case, the parameters are passed by position, and z defaults to 1. In the third case, the parameters are passed by keyword, and z again defaults to 1. In the final case, y and z are passed by keyword, and x is assumed to be the first parameter not explicitly specified (x = 3). This also demonstrates that parameters passed by keyword can appear in any order.

Parameters are optional, but you must include the parentheses even if no values are being passed. The return() function returns the object produced by the function. It’s also optional, and if it’s missing, the results of the last statement in the function are returned.
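The following sketch illustrates these points. The function names g1, g2, and three are hypothetical and exist only for this example.

```r
# Explicit return()
g1 <- function(x, y) {
    result <- x + 2*y
    return(result)
}

# Implicit return: the value of the last statement is returned
g2 <- function(x, y) {
    x + 2*y
}

# A function with no parameters still needs parentheses when called
three <- function() 3

g1(2, 3)    # 8
g2(2, 3)    # 8
three()     # 3
```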

You can use the args() function to view the parameter names and default values:

> args(f)
function (x, y, z = 1)
NULL

The args() function is designed for interactive viewing. If you need to obtain the parameter names and default values programmatically, use the formals() function. It returns a list with the necessary information.
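A minimal sketch of the programmatic route, using the function f defined earlier. The name params is introduced only for illustration. Note that the list returned by formals() is technically a pairlist, but it can be indexed by name like an ordinary list.

```r
f <- function(x, y, z=1){
    result <- x + (2*y) + (3*z)
    return(result)
}

params <- formals(f)
names(params)    # "x" "y" "z"
params$z         # 1, the default value of z
```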


Parameters are passed by value, not by reference. Consider this function statement:

result <- lm(height ~ weight, data=women)

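The dataset women isn’t accessed directly. The function works as if it had received its own copy, so nothing it does can change the original. R postpones the physical copy until the object is modified, but if the women dataset were very large and a function did modify it, RAM could be used up quickly. This can become an issue when you’re dealing with big data problems, and you may need to use special techniques (see appendix G).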
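The practical consequence of pass-by-value can be seen in a small sketch. add_ratio() is a hypothetical function written for this example; it adds a column to the data frame it receives, yet the original women dataset is left untouched.

```r
# add_ratio() is a hypothetical function written for this sketch.
add_ratio <- function(df) {
    # This changes only the function's own copy of the data frame.
    df$ratio <- df$weight / df$height
    df
}

women2 <- add_ratio(women)

names(women)     # "height" "weight"            (original unchanged)
names(women2)    # "height" "weight" "ratio"
```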

OBJECT SCOPE

The scope of the objects in R (how names are resolved to produce contents) is a complex topic. In the typical case,

Objects created outside of any function are global (can be resolved within any function).

Objects created within a function are local (available only within the function).

Local objects are discarded at the end of function execution. Only objects passed back via the return() function (or assigned using an operator like <<-) are accessible after the function finishes executing.

Global objects can be accessed (read) from within a function but not altered (again, unless the <<- operator is used).

Objects passed to a function through parameters aren’t altered by the function. Copies of the objects are passed, not the objects themselves.

Here is a simple example:

> x <- 2
> y <- 3
> z <- 4
> f <- function(w){
      z <- 2
      x <- w*y*z
      return(x)
  }
> f(x)
[1] 12
> x
[1] 2
> y
[1] 3
> z
[1] 4

In this example, a copy of x is passed to the function f(), but the original isn’t altered. The value of y is obtained from the environment. Even though z exists in the environment, the value set in the function is used and doesn’t alter the value in the environment.
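The <<- operator mentioned in the rules above behaves differently from ordinary assignment. The sketch below contrasts the two, assuming it is run at the top level of an R session; counter, increment_local(), and increment_global() are hypothetical names used only here.

```r
counter <- 0

# increment_local() changes only a local variable named counter.
increment_local <- function() {
    counter <- counter + 1
    counter
}

# increment_global() uses <<- to modify the global counter.
increment_global <- function() {
    counter <<- counter + 1
    counter
}

increment_local()     # 1
counter               # 0, the global value is untouched
increment_global()    # 1
counter               # 1, the global value was changed
```

Because <<- changes objects outside the function, it makes code harder to reason about and is usually best reserved for cases where returning a value isn't practical.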

To understand scoping rules better, we need to discuss environments.
