R in Action, Second Edition.pdf

CHAPTER 7 Basic statistics

type in the describe() function and R searches for it, R comes to the psych package first and executes it. If you want the Hmisc version instead, you can type Hmisc::describe(mt). The function is still there. You have to give R more information to find it.
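The same lookup rule can be seen without installing anything. The sketch below is only an analogy: instead of two packages, it uses a hypothetical user-defined sd() that masks stats::sd(), and then reaches the original with the package::function form.

```r
x <- c(1, 2, 3)

# Hypothetical user-defined function; it sits earlier on the search path
# than the stats package, so it masks stats::sd()
sd <- function(x) "user-defined sd()"

masked    <- sd(x)         # R finds the global-environment version first
qualified <- stats::sd(x)  # the pkg::fun form names the package explicitly

rm(sd)                     # remove the mask; sd() resolves to stats::sd() again
```

Here masked holds the string returned by the user-defined function, while qualified holds the standard deviation computed by the stats package.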

Now that you know how to generate descriptive statistics for the data as a whole, let’s review how to obtain statistics for subgroups of the data.

7.1.3 Descriptive statistics by group

When comparing groups of individuals or observations, the focus is usually on the descriptive statistics of each group, rather than the total sample. Again, there are several ways to accomplish this in R. We’ll start by getting descriptive statistics for each level of transmission type. In chapter 5, we discussed methods of aggregating data. You can use the aggregate() function (section 5.6.2) to obtain descriptive statistics by group, as shown in the following listing.

Listing 7.6 Descriptive statistics by group using aggregate()

> myvars <- c("mpg", "hp", "wt")
> aggregate(mtcars[myvars], by=list(am=mtcars$am), mean)
  am  mpg  hp   wt
1  0 17.1 160 3.77
2  1 24.4 127 2.41
> aggregate(mtcars[myvars], by=list(am=mtcars$am), sd)
  am  mpg   hp    wt
1  0 3.83 53.9 0.777
2  1 6.17 84.1 0.617

Note the use of list(am=mtcars$am). If you used list(mtcars$am), the am column would be labeled Group.1 rather than am. You use the assignment to provide a more useful column label. If you have more than one grouping variable, you can use code like by=list(name1=groupvar1, name2=groupvar2, ... , nameN=groupvarN).
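As a brief sketch of the labeling behavior and of the multiple-grouping form, the code below compares an unnamed and a named grouping list and then groups by two variables. Using cyl as the second grouping variable is an assumption made for illustration; it isn't part of the listing above.

```r
myvars <- c("mpg", "hp", "wt")

# Unnamed list element: the grouping column is labeled Group.1
unnamed <- aggregate(mtcars[myvars], by = list(mtcars$am), mean)
names(unnamed)[1]

# Named list element: the grouping column is labeled am
named <- aggregate(mtcars[myvars], by = list(am = mtcars$am), mean)
names(named)[1]

# Two grouping variables (cyl chosen here only as an example):
# one row per observed am/cyl combination
by_am_cyl <- aggregate(mtcars[myvars],
                       by = list(am = mtcars$am, cyl = mtcars$cyl),
                       mean)
by_am_cyl
```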

Unfortunately, aggregate() only allows you to use single-value functions such as mean, standard deviation, and the like in each call. It won’t return several statistics at once. For that task, you can use the by() function. The format is

by(data, INDICES, FUN)

where data is a data frame or matrix, INDICES is a factor or list of factors that defines the groups, and FUN is an arbitrary function that operates on all the columns of a data frame. The next listing provides an example.

Listing 7.7 Descriptive statistics by group using by()

> dstats <- function(x) sapply(x, mystats)
> myvars <- c("mpg", "hp", "wt")
> by(mtcars[myvars], mtcars$am, dstats)
mtcars$am: 0
            mpg       hp     wt
n        19.000  19.0000 19.000
mean     17.147 160.2632  3.769
stdev     3.834  53.9082  0.777
skew      0.014  -0.0142  0.976
kurtosis -0.803  -1.2097  0.142
----------------------------------------
mtcars$am: 1
             mpg      hp     wt
n        13.0000  13.000 13.000
mean     24.3923 126.846  2.411
stdev     6.1665  84.062  0.617
skew      0.0526   1.360  0.210
kurtosis -1.4554   0.563 -1.174

In this case, dstats() applies the mystats() function from listing 7.2 to each column of the data frame. Placing it in the by() function gives you summary statistics for each level of am.
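Because mystats() is defined in listing 7.2 and isn't reproduced in this section, the following is a self-contained sketch with an assumed stand-in for it. The stand-in returns the same five named statistics, but its skew and kurtosis formulas are an assumption and may not match listing 7.2 exactly.

```r
# Assumed stand-in for mystats() from listing 7.2 (not shown in this section)
mystats <- function(x, na.omit = FALSE) {
  if (na.omit) x <- x[!is.na(x)]
  m <- mean(x)
  n <- length(x)
  s <- sd(x)
  skew <- sum((x - m)^3 / s^3) / n       # assumed skewness formula
  kurt <- sum((x - m)^4 / s^4) / n - 3   # assumed excess-kurtosis formula
  c(n = n, mean = m, stdev = s, skew = skew, kurtosis = kurt)
}

# Apply mystats() to every column of the data frame passed in by by()
dstats <- function(x) sapply(x, mystats)

res <- by(mtcars[c("mpg", "hp", "wt")], mtcars$am, dstats)

# Each element of the result is a statistics-by-variable matrix for one group
am0 <- res[[1]]     # first group: am == 0
am0["mean", "mpg"]  # mean mpg for that group
```

Storing the result rather than printing it lets you pull out individual group summaries by position, as with am0 above.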

7.1.4 Additional methods by group

The doBy package and the psych package also provide functions for descriptive statistics by group. Again, they aren’t distributed in the base installation and must be installed before first use. The summaryBy() function in the doBy package has the format

summaryBy(formula, data=dataframe, FUN=function)

where the formula takes the form

var1 + var2 + var3 + ... + varN ~ groupvar1 + groupvar2 + ... + groupvarN

Variables on the left of the ~ are the numeric variables to be analyzed, and variables on the right are categorical grouping variables. The function can be any built-in or user-created R function. An example using the mystats() function created in listing 7.2 is shown in the following listing.

Listing 7.8 Summary statistics by group using summaryBy() in the doBy package

> library(doBy)
> summaryBy(mpg+hp+wt~am, data=mtcars, FUN=mystats)
  am mpg.n mpg.mean mpg.stdev mpg.skew mpg.kurtosis hp.n hp.mean hp.stdev
1  0    19     17.1      3.83   0.0140       -0.803   19     160     53.9
2  1    13     24.4      6.17   0.0526       -1.455   13     127     84.1
  hp.skew hp.kurtosis wt.n wt.mean wt.stdev wt.skew wt.kurtosis
1 -0.0142      -1.210   19    3.77    0.777   0.976       0.142
2  1.3599       0.563   13    2.41    0.617   0.210      -1.174
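To illustrate the right-hand side of the formula with more than one grouping variable, here is a minimal sketch that summarizes by am and cyl using the built-in mean(). The choice of cyl and the requireNamespace() guard are assumptions added for illustration; the guard only keeps the snippet from failing when doBy hasn't been installed.

```r
# Run only if doBy is installed; it isn't part of the base installation
if (requireNamespace("doBy", quietly = TRUE)) {
  # Numeric variables on the left of ~, grouping variables on the right
  by_am_cyl <- doBy::summaryBy(mpg + hp ~ am + cyl, data = mtcars, FUN = mean)
  print(by_am_cyl)
} else {
  message("doBy is not installed; run install.packages(\"doBy\") first")
}
```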

The describeBy() function contained in the psych package provides the same descriptive statistics as describe(), stratified by one or more grouping variables, as you can see in the following listing.
