Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
R_inferno.pdf
Скачиваний:
6
Добавлен:
21.05.2015
Размер:
947.8 Кб
Скачать

8.2. CHIMERAS

CIRCLE 8. BELIEVING IT DOES AS INTENDED

>llet <- levels(flet)

>names(llet) <- llet

>llet

a

b

c

d

e

"a"

"b"

"c"

"d"

"e"

>llet[c('a', 'b')] <- 'ab'

>llet

a

b

c

d

e

"ab"

"ab"

"c"

"d"

"e"

>fcom <- factor(llet[as.character(flet)])

>fcom

[1] ab ab c d e ab ab Levels: ab c d e

8.2.6do not subscript with factors

>x6 <- c(s=4, j=55, f=888)

>x6[c('s', 'f')] s f

4 888

>x6[factor(c('s', 'f'))] j s

55 4

8.2.7no go for factors in ifelse

>ifelse(c(TRUE, FALSE, TRUE), factor(letters), + factor(LETTERS))

[1] 1 2 3

>ifelse(c(TRUE, FALSE, TRUE), factor(letters), LETTERS) [1] "1" "B" "3"

(Recall that the length of the output of ifelse is always the length of therst argument. If you were expecting the rst argument to be replicated, you shouldn't have.)

8.2.8no c for factors

c(myfac1, myfac2)

just gives you the combined vector of integer codes. Certainly a method for c could be written for factors, but note it is going to be complicated|the levels of the factors need not match. It would be horribly messy for very little gain. This is a case in which R is not being overly helpful. Better is for you to do the combination that makes sense for the speci c case at hand.

Another reason why there is not a c function for factors is that c is used in other contexts to simplify objects:

84

8.2. CHIMERAS

CIRCLE 8. BELIEVING IT DOES AS INTENDED

> c(matrix(1:4, 2)) [1] 1 2 3 4

The operation that c does on factors is consistent with this.

A generally good solution is:

c(as.character(myfac1), as.character(myfac2))

Or maybe more likely factor of the above expression. Another possibility is:

unlist(list(myfac1, myfac2))

For example:

> unlist(list(factor(letters[1:3]), factor(LETTERS[7:8]))) [1] a b c G H

Levels: a b c G H

This last solution does not work for ordered factors.

8.2.9ordering in ordered

You need a bit of care when creating ordered factors:

>ordered(c(100, 90, 110, 90, 100, 110)) [1] 100 90 110 90 100 110

Levels: 90 < 100 < 110

>ordered(as.character(c(100, 90, 110, 90, 100, 110))) [1] 100 90 110 90 100 110

Levels: 100 < 110 < 90

The automatic ordering is done lexically for characters. This makes sense in general, but not in this case. (Note that the ordering may depend on your locale.) You can always specify levels to have direct control.

You can have essentially this same problem if you try to sort a factor.

8.2.10labels and excluded levels

The number of labels must equal the number of levels. Seems like a good rule. These can be the same going into the function, but need not be in the end. The issue is values that are excluded.

>factor(c(1:4,1:3), levels=c(1:4,NA), labels=1:5) Error in factor(c(1:4, 1:3), levels = c(1:4, NA), ... : invalid labels; length 5 should be 1 or 4

>factor(c(1:4,1:3), levels=c(1:4,NA), labels=1:4)

85

8.2. CHIMERAS

CIRCLE 8. BELIEVING IT DOES AS INTENDED

[1] 1 2 3 4 1 2 3 Levels: 1 2 3 4

> factor(c(1:4,1:3), levels=c(1:4,NA), labels=1:5, + exclude=NULL)

[1] 1 2 3 4 1 2 3 Levels: 1 2 3 4 5

And of course I lied to you. The number of labels can be 1 as well as the number of levels:

> factor(c(1:4,1:3), levels=c(1:4,NA), labels='Blah') [1] Blah1 Blah2 Blah3 Blah4 Blah1 Blah2 Blah3 Levels: Blah1 Blah2 Blah3 Blah4

8.2.11is missing missing or missing?

Missing values of course make sense in factors. It is entirely possible that we don't know the category into which a particular item falls.

>f1 <- factor(c('AA', 'BA', NA, 'NA'))

>f1

[1] AA BA <NA> NA Levels: AA BA NA > unclass(f1) [1] 1 2 NA 3 attr(,"levels")

[1] "AA" "BA" "NA"

As we saw in Circle 8.1.67, there is a di erence between a missing value and the string 'NA'. In f1 there is a category that corresponds to the string 'NA'. Values that are missing are indicated not by the usual NA, but by <NA> (to distinguish them from 'NA' the string when quotes are not used).

It is also possible to have a category that is missing values. This is achieved by changing the exclude argument from its default value:

>f2 <- factor(c('AA', 'BA', NA, 'NA'), exclude=NULL)

>f2

[1] AA BA <NA> NA Levels: AA BA NA NA > unclass(f2)

[1] 1 2 4 3 attr(,"levels")

[1] "AA" "BA" "NA" NA

Unlike in f1 the core data of f2 has no missing values.

Let's now really descend into the belly of the beast.

86

8.2. CHIMERAS

CIRCLE 8. BELIEVING IT DOES AS INTENDED

>f3 <- f2

>is.na(f3)[1] <- TRUE

>f3

[1] <NA> BA <NA> NA Levels: AA BA NA NA > unclass(f3)

[1] NA 2 4 3 attr(,"levels")

[1] "AA" "BA" "NA" NA

Here we have a level that is missing values, we also have a missing value in the core data.1

To summarize, there are two ways that missing values can enter a factor:

Missing means we don't know what category the item falls into.

Missing is the category of items that (originally) had missing values.

8.2.12data frame to character

>xdf3 <- data.frame(a=3:2, b=c('x', 'y'))

>as.character(xdf3[1,])

[1] "3" "1"

This is a hidden version of coercing a factor to character. One approach to get the correct behavior is to use as.matrix:

> as.character(as.matrix(xdf3[1,])) [1] "3" "x"

I'm not sure if it is less upsetting or more upsetting if you try coercing more than one row of a data frame to character:

> as.character(xdf3) [1] "c(3, 2)" "c(1, 2)"

If the columns of the data frame include factors or characters, then converting to a matrix will automatically get you a characters:

> as.matrix(xdf3) a b

[1,] "3" "x" [2,] "2" "y"

1The author would be intrigued to hear of an application where this makes sense|an item for which it is unknown if it is missing or not.

87

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]