CHAPTER 7 Basic statistics

type in the describe() function and R searches for it, R comes to the psych package first and executes it. If you want the Hmisc version instead, you can type Hmisc::describe(mt). The function is still there. You have to give R more information to find it.

Now that you know how to generate descriptive statistics for the data as a whole, let’s review how to obtain statistics for subgroups of the data.

7.1.3 Descriptive statistics by group

When comparing groups of individuals or observations, the focus is usually on the descriptive statistics of each group, rather than the total sample. Again, there are several ways to accomplish this in R. We’ll start by getting descriptive statistics for each level of transmission type. In chapter 5, we discussed methods of aggregating data. You can use the aggregate() function (section 5.6.2) to obtain descriptive statistics by group, as shown in the following listing.

Listing 7.6 Descriptive statistics by group using aggregate()

> myvars <- c("mpg", "hp", "wt")
> aggregate(mtcars[myvars], by=list(am=mtcars$am), mean)
  am  mpg  hp   wt
1  0 17.1 160 3.77
2  1 24.4 127 2.41

> aggregate(mtcars[myvars], by=list(am=mtcars$am), sd)
  am  mpg   hp    wt
1  0 3.83 53.9 0.777
2  1 6.17 84.1 0.617

Note the use of list(am=mtcars$am). If you used list(mtcars$am), the am column would be labeled Group.1 rather than am. You use the assignment to provide a more useful column label. If you have more than one grouping variable, you can use code like by=list(name1=groupvar1, name2=groupvar2, ... , nameN=groupvarN).
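As a minimal sketch of the multiple-grouping form just described, the following code groups mtcars by both transmission type (am) and number of cylinders (cyl). The choice of cyl as a second grouping variable and the object name means_by_am_cyl are illustrative assumptions, not part of the original listing.

```r
# Variables to summarize (same as in listing 7.6)
myvars <- c("mpg", "hp", "wt")

# Two named grouping variables; the names become the column labels.
# cyl is chosen here only for illustration.
means_by_am_cyl <- aggregate(mtcars[myvars],
                             by = list(am = mtcars$am, cyl = mtcars$cyl),
                             FUN = mean)

# One row per observed combination of am and cyl
print(means_by_am_cyl)
```

The grouping columns appear first in the result, labeled with the names you supplied in the list, followed by one column per summarized variable.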

Unfortunately, aggregate() only allows you to use single-value functions such as mean and sd in each call. It won’t return several statistics at once. For that task, you can use the by() function. The format is

by(data, INDICES, FUN)

where data is a data frame or matrix, INDICES is a factor or list of factors that defines the groups, and FUN is an arbitrary function that operates on all the columns of a data frame. The next listing provides an example.

Listing 7.7 Descriptive statistics by group using by()

> dstats <- function(x) sapply(x, mystats)
> myvars <- c("mpg", "hp", "wt")
> by(mtcars[myvars], mtcars$am, dstats)

mtcars$am: 0
            mpg        hp     wt
n        19.000   19.0000 19.000
mean     17.147  160.2632  3.769
stdev     3.834   53.9082  0.777
skew      0.014   -0.0142  0.976
kurtosis -0.803   -1.2097  0.142
----------------------------------------
mtcars$am: 1
             mpg      hp     wt
n        13.0000  13.000 13.000
mean     24.3923 126.846  2.411
stdev     6.1665  84.062  0.617
skew      0.0526   1.360  0.210
kurtosis -1.4554   0.563 -1.174

In this case, dstats() applies the mystats() function from listing 7.2 to each column of the data frame. Placing it in the by() function gives you summary statistics for each level of am.
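Because listing 7.2 isn’t reproduced on this page, the following is a self-contained sketch of the same call. The definition of mystats() here is an assumption: it’s written to return the five statistics labeled in the output above (n, mean, stdev, skew, kurtosis) and may differ in detail from the original listing. The object name res is also illustrative.

```r
# Assumed stand-in for mystats() from listing 7.2:
# returns sample size, mean, standard deviation, skew, and excess kurtosis
mystats <- function(x, na.omit = FALSE) {
  if (na.omit) x <- x[!is.na(x)]
  m <- mean(x)
  n <- length(x)
  s <- sd(x)
  skew <- sum((x - m)^3 / s^3) / n
  kurt <- sum((x - m)^4 / s^4) / n - 3
  c(n = n, mean = m, stdev = s, skew = skew, kurtosis = kurt)
}

# Apply mystats() to every column of a data frame
dstats <- function(x) sapply(x, mystats)

myvars <- c("mpg", "hp", "wt")

# by() splits the data frame by am and calls dstats() on each piece
res <- by(mtcars[myvars], mtcars$am, dstats)

# Each element is a matrix: statistics in rows, variables in columns
print(res[[1]])   # first group (am = 0)
```

Storing the result lets you pull out the statistics for a single group, which is handy when you only need one level of the grouping variable.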

7.1.4 Additional methods by group

The doBy package and the psych package also provide functions for descriptive statistics by group. Again, they aren’t distributed in the base installation and must be installed before first use. The summaryBy() function in the doBy package has the format

summaryBy(formula, data=dataframe, FUN=function)

where the formula takes the form

var1 + var2 + var3 + ... + varN ~ groupvar1 + groupvar2 + ... + groupvarN

Variables on the left of the ~ are the numeric variables to be analyzed, and variables on the right are categorical grouping variables. The function can be any built-in or user-created R function. An example using the mystats() function created in listing 7.2 is shown in the following listing.

Listing 7.8 Summary statistics by group using summaryBy() in the doBy package

> library(doBy)
> summaryBy(mpg+hp+wt~am, data=mtcars, FUN=mystats)
  am mpg.n mpg.mean mpg.stdev mpg.skew mpg.kurtosis hp.n hp.mean hp.stdev
1  0    19     17.1      3.83   0.0140       -0.803   19     160     53.9
2  1    13     24.4      6.17   0.0526       -1.455   13     127     84.1
  hp.skew hp.kurtosis wt.n wt.mean wt.stdev wt.skew wt.kurtosis
1 -0.0142      -1.210   19    3.77    0.777   0.976       0.142
2  1.3599       0.563   13    2.41    0.617   0.210      -1.174
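The formula format also allows more than one grouping variable on the right of the ~. The following sketch assumes the doBy package is installed and uses cyl as a second grouping variable purely for illustration. It passes two built-in functions rather than mystats() so that it runs on its own; the object name res_by is hypothetical.

```r
# Run only if doBy is available; it is not part of the base installation
if (requireNamespace("doBy", quietly = TRUE)) {
  # Two numeric variables on the left, two grouping variables on the right.
  # FUN is given several built-in functions at once.
  res_by <- doBy::summaryBy(mpg + wt ~ am + cyl,
                            data = mtcars,
                            FUN = c(mean, sd))

  # One row per observed combination of am and cyl
  print(res_by)
}
```

Calling the function as doBy::summaryBy() avoids attaching the package, which sidesteps the kind of name masking described earlier for describe().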

The describeBy() function contained in the psych package provides the same descriptive statistics as describe(), stratified by one or more grouping variables, as you can see in the following listing.
