- brief contents
- contents
- preface
- acknowledgments
- about this book
- What’s new in the second edition
- Who should read this book
- Roadmap
- Advice for data miners
- Code examples
- Code conventions
- Author Online
- About the author
- about the cover illustration
- 1 Introduction to R
- 1.2 Obtaining and installing R
- 1.3 Working with R
- 1.3.1 Getting started
- 1.3.2 Getting help
- 1.3.3 The workspace
- 1.3.4 Input and output
- 1.4 Packages
- 1.4.1 What are packages?
- 1.4.2 Installing a package
- 1.4.3 Loading a package
- 1.4.4 Learning about a package
- 1.5 Batch processing
- 1.6 Using output as input: reusing results
- 1.7 Working with large datasets
- 1.8 Working through an example
- 1.9 Summary
- 2 Creating a dataset
- 2.1 Understanding datasets
- 2.2 Data structures
- 2.2.1 Vectors
- 2.2.2 Matrices
- 2.2.3 Arrays
- 2.2.4 Data frames
- 2.2.5 Factors
- 2.2.6 Lists
- 2.3 Data input
- 2.3.1 Entering data from the keyboard
- 2.3.2 Importing data from a delimited text file
- 2.3.3 Importing data from Excel
- 2.3.4 Importing data from XML
- 2.3.5 Importing data from the web
- 2.3.6 Importing data from SPSS
- 2.3.7 Importing data from SAS
- 2.3.8 Importing data from Stata
- 2.3.9 Importing data from NetCDF
- 2.3.10 Importing data from HDF5
- 2.3.11 Accessing database management systems (DBMSs)
- 2.3.12 Importing data via Stat/Transfer
- 2.4 Annotating datasets
- 2.4.1 Variable labels
- 2.4.2 Value labels
- 2.5 Useful functions for working with data objects
- 2.6 Summary
- 3 Getting started with graphs
- 3.1 Working with graphs
- 3.2 A simple example
- 3.3 Graphical parameters
- 3.3.1 Symbols and lines
- 3.3.2 Colors
- 3.3.3 Text characteristics
- 3.3.4 Graph and margin dimensions
- 3.4 Adding text, customized axes, and legends
- 3.4.1 Titles
- 3.4.2 Axes
- 3.4.3 Reference lines
- 3.4.4 Legend
- 3.4.5 Text annotations
- 3.4.6 Math annotations
- 3.5 Combining graphs
- 3.5.1 Creating a figure arrangement with fine control
- 3.6 Summary
- 4 Basic data management
- 4.1 A working example
- 4.2 Creating new variables
- 4.3 Recoding variables
- 4.4 Renaming variables
- 4.5 Missing values
- 4.5.1 Recoding values to missing
- 4.5.2 Excluding missing values from analyses
- 4.6 Date values
- 4.6.1 Converting dates to character variables
- 4.6.2 Going further
- 4.7 Type conversions
- 4.8 Sorting data
- 4.9 Merging datasets
- 4.9.1 Adding columns to a data frame
- 4.9.2 Adding rows to a data frame
- 4.10 Subsetting datasets
- 4.10.1 Selecting (keeping) variables
- 4.10.2 Excluding (dropping) variables
- 4.10.3 Selecting observations
- 4.10.4 The subset() function
- 4.10.5 Random samples
- 4.11 Using SQL statements to manipulate data frames
- 4.12 Summary
- 5 Advanced data management
- 5.2 Numerical and character functions
- 5.2.1 Mathematical functions
- 5.2.2 Statistical functions
- 5.2.3 Probability functions
- 5.2.4 Character functions
- 5.2.5 Other useful functions
- 5.2.6 Applying functions to matrices and data frames
- 5.3 A solution for the data-management challenge
- 5.4 Control flow
- 5.4.1 Repetition and looping
- 5.4.2 Conditional execution
- 5.5 User-written functions
- 5.6 Aggregation and reshaping
- 5.6.1 Transpose
- 5.6.2 Aggregating data
- 5.6.3 The reshape2 package
- 5.7 Summary
- 6 Basic graphs
- 6.1 Bar plots
- 6.1.1 Simple bar plots
- 6.1.2 Stacked and grouped bar plots
- 6.1.3 Mean bar plots
- 6.1.4 Tweaking bar plots
- 6.1.5 Spinograms
- 6.2 Pie charts
- 6.3 Histograms
- 6.4 Kernel density plots
- 6.5 Box plots
- 6.5.1 Using parallel box plots to compare groups
- 6.5.2 Violin plots
- 6.6 Dot plots
- 6.7 Summary
- 7 Basic statistics
- 7.1 Descriptive statistics
- 7.1.1 A menagerie of methods
- 7.1.2 Even more methods
- 7.1.3 Descriptive statistics by group
- 7.1.4 Additional methods by group
- 7.1.5 Visualizing results
- 7.2 Frequency and contingency tables
- 7.2.1 Generating frequency tables
- 7.2.2 Tests of independence
- 7.2.3 Measures of association
- 7.2.4 Visualizing results
- 7.3 Correlations
- 7.3.1 Types of correlations
- 7.3.2 Testing correlations for significance
- 7.3.3 Visualizing correlations
- 7.4 T-tests
- 7.4.3 When there are more than two groups
- 7.5 Nonparametric tests of group differences
- 7.5.1 Comparing two groups
- 7.5.2 Comparing more than two groups
- 7.6 Visualizing group differences
- 7.7 Summary
- 8 Regression
- 8.1 The many faces of regression
- 8.1.1 Scenarios for using OLS regression
- 8.1.2 What you need to know
- 8.2 OLS regression
- 8.2.1 Fitting regression models with lm()
- 8.2.2 Simple linear regression
- 8.2.3 Polynomial regression
- 8.2.4 Multiple linear regression
- 8.2.5 Multiple linear regression with interactions
- 8.3 Regression diagnostics
- 8.3.1 A typical approach
- 8.3.2 An enhanced approach
- 8.3.3 Global validation of linear model assumption
- 8.3.4 Multicollinearity
- 8.4 Unusual observations
- 8.4.1 Outliers
- 8.4.3 Influential observations
- 8.5 Corrective measures
- 8.5.1 Deleting observations
- 8.5.2 Transforming variables
- 8.5.3 Adding or deleting variables
- 8.5.4 Trying a different approach
- 8.6 Selecting the “best” regression model
- 8.6.1 Comparing models
- 8.6.2 Variable selection
- 8.7 Taking the analysis further
- 8.7.1 Cross-validation
- 8.7.2 Relative importance
- 8.8 Summary
- 9 Analysis of variance
- 9.1 A crash course on terminology
- 9.2 Fitting ANOVA models
- 9.2.1 The aov() function
- 9.2.2 The order of formula terms
- 9.3.1 Multiple comparisons
- 9.3.2 Assessing test assumptions
- 9.4 One-way ANCOVA
- 9.4.1 Assessing test assumptions
- 9.4.2 Visualizing the results
- 9.6 Repeated measures ANOVA
- 9.7 Multivariate analysis of variance (MANOVA)
- 9.7.1 Assessing test assumptions
- 9.7.2 Robust MANOVA
- 9.8 ANOVA as regression
- 9.9 Summary
- 10 Power analysis
- 10.1 A quick review of hypothesis testing
- 10.2 Implementing power analysis with the pwr package
- 10.2.1 t-tests
- 10.2.2 ANOVA
- 10.2.3 Correlations
- 10.2.4 Linear models
- 10.2.5 Tests of proportions
- 10.2.7 Choosing an appropriate effect size in novel situations
- 10.3 Creating power analysis plots
- 10.4 Other packages
- 10.5 Summary
- 11 Intermediate graphs
- 11.1 Scatter plots
- 11.1.3 3D scatter plots
- 11.1.4 Spinning 3D scatter plots
- 11.1.5 Bubble plots
- 11.2 Line charts
- 11.3 Corrgrams
- 11.4 Mosaic plots
- 11.5 Summary
- 12 Resampling statistics and bootstrapping
- 12.1 Permutation tests
- 12.2 Permutation tests with the coin package
- 12.2.2 Independence in contingency tables
- 12.2.3 Independence between numeric variables
- 12.2.5 Going further
- 12.3 Permutation tests with the lmPerm package
- 12.3.1 Simple and polynomial regression
- 12.3.2 Multiple regression
- 12.4 Additional comments on permutation tests
- 12.5 Bootstrapping
- 12.6 Bootstrapping with the boot package
- 12.6.1 Bootstrapping a single statistic
- 12.6.2 Bootstrapping several statistics
- 12.7 Summary
- 13 Generalized linear models
- 13.1 Generalized linear models and the glm() function
- 13.1.1 The glm() function
- 13.1.2 Supporting functions
- 13.1.3 Model fit and regression diagnostics
- 13.2 Logistic regression
- 13.2.1 Interpreting the model parameters
- 13.2.2 Assessing the impact of predictors on the probability of an outcome
- 13.2.3 Overdispersion
- 13.2.4 Extensions
- 13.3 Poisson regression
- 13.3.1 Interpreting the model parameters
- 13.3.2 Overdispersion
- 13.3.3 Extensions
- 13.4 Summary
- 14 Principal components and factor analysis
- 14.1 Principal components and factor analysis in R
- 14.2 Principal components
- 14.2.1 Selecting the number of components to extract
- 14.2.2 Extracting principal components
- 14.2.3 Rotating principal components
- 14.2.4 Obtaining principal components scores
- 14.3 Exploratory factor analysis
- 14.3.1 Deciding how many common factors to extract
- 14.3.2 Extracting common factors
- 14.3.3 Rotating factors
- 14.3.4 Factor scores
- 14.4 Other latent variable models
- 14.5 Summary
- 15 Time series
- 15.1 Creating a time-series object in R
- 15.2 Smoothing and seasonal decomposition
- 15.2.1 Smoothing with simple moving averages
- 15.2.2 Seasonal decomposition
- 15.3 Exponential forecasting models
- 15.3.1 Simple exponential smoothing
- 15.3.3 The ets() function and automated forecasting
- 15.4 ARIMA forecasting models
- 15.4.1 Prerequisite concepts
- 15.4.2 ARMA and ARIMA models
- 15.4.3 Automated ARIMA forecasting
- 15.5 Going further
- 15.6 Summary
- 16 Cluster analysis
- 16.1 Common steps in cluster analysis
- 16.2 Calculating distances
- 16.3 Hierarchical cluster analysis
- 16.4 Partitioning cluster analysis
- 16.4.2 Partitioning around medoids
- 16.5 Avoiding nonexistent clusters
- 16.6 Summary
- 17 Classification
- 17.1 Preparing the data
- 17.2 Logistic regression
- 17.3 Decision trees
- 17.3.1 Classical decision trees
- 17.3.2 Conditional inference trees
- 17.4 Random forests
- 17.5 Support vector machines
- 17.5.1 Tuning an SVM
- 17.6 Choosing a best predictive solution
- 17.7 Using the rattle package for data mining
- 17.8 Summary
- 18 Advanced methods for missing data
- 18.1 Steps in dealing with missing data
- 18.2 Identifying missing values
- 18.3 Exploring missing-values patterns
- 18.3.1 Tabulating missing values
- 18.3.2 Exploring missing data visually
- 18.3.3 Using correlations to explore missing values
- 18.4 Understanding the sources and impact of missing data
- 18.5 Rational approaches for dealing with incomplete data
- 18.6 Complete-case analysis (listwise deletion)
- 18.7 Multiple imputation
- 18.8 Other approaches to missing data
- 18.8.1 Pairwise deletion
- 18.8.2 Simple (nonstochastic) imputation
- 18.9 Summary
- 19 Advanced graphics with ggplot2
- 19.1 The four graphics systems in R
- 19.2 An introduction to the ggplot2 package
- 19.3 Specifying the plot type with geoms
- 19.4 Grouping
- 19.5 Faceting
- 19.6 Adding smoothed lines
- 19.7 Modifying the appearance of ggplot2 graphs
- 19.7.1 Axes
- 19.7.2 Legends
- 19.7.3 Scales
- 19.7.4 Themes
- 19.7.5 Multiple graphs per page
- 19.8 Saving graphs
- 19.9 Summary
- 20 Advanced programming
- 20.1 A review of the language
- 20.1.1 Data types
- 20.1.2 Control structures
- 20.1.3 Creating functions
- 20.2 Working with environments
- 20.3 Object-oriented programming
- 20.3.1 Generic functions
- 20.3.2 Limitations of the S3 model
- 20.4 Writing efficient code
- 20.5 Debugging
- 20.5.1 Common sources of errors
- 20.5.2 Debugging tools
- 20.5.3 Session options that support debugging
- 20.6 Going further
- 20.7 Summary
- 21 Creating a package
- 21.1 Nonparametric analysis and the npar package
- 21.1.1 Comparing groups with the npar package
- 21.2 Developing the package
- 21.2.1 Computing the statistics
- 21.2.2 Printing the results
- 21.2.3 Summarizing the results
- 21.2.4 Plotting the results
- 21.2.5 Adding sample data to the package
- 21.3 Creating the package documentation
- 21.4 Building the package
- 21.5 Going further
- 21.6 Summary
- 22 Creating dynamic reports
- 22.1 A template approach to reports
- 22.2 Creating dynamic reports with R and Markdown
- 22.3 Creating dynamic reports with R and LaTeX
- 22.4 Creating dynamic reports with R and Open Document
- 22.5 Creating dynamic reports with R and Microsoft Word
- 22.6 Summary
- afterword Into the rabbit hole
- appendix A Graphical user interfaces
- appendix B Customizing the startup environment
- appendix C Exporting data from R
- Delimited text file
- Excel spreadsheet
- Statistical applications
- appendix D Matrix algebra in R
- appendix E Packages used in this book
- appendix F Working with large datasets
- F.1 Efficient programming
- F.2 Storing data outside of RAM
- F.3 Analytic packages for out-of-memory data
- F.4 Comprehensive solutions for working with enormous datasets
- appendix G Updating an R installation
- G.1 Automated installation (Windows only)
- G.2 Manual installation (Windows and Mac OS X)
- G.3 Updating an R installation (Linux)
- references
- index
- Symbols
- Numerics
- 23.1 The lattice package
- 23.2 Conditioning variables
- 23.3 Panel functions
- 23.4 Grouping variables
- 23.5 Graphic parameters
- 23.6 Customizing plot strips
- 23.7 Page arrangement
- 23.8 Going further
CHAPTER 7 Basic statistics
Listing 7.9 Summary statistics by group using describeBy() in the psych package
> library(psych)
> myvars <- c("mpg", "hp", "wt")
> describeBy(mtcars[myvars], list(am=mtcars$am))
 am: 0
    var  n   mean    sd median trimmed   mad   min    max  range  skew kurtosis    se
mpg   1 19  17.15  3.83  17.30   17.12  3.11 10.40  24.40  14.00  0.01    -0.80  0.88
hp    2 19 160.26 53.91 175.00  161.06 77.10 62.00 245.00 183.00 -0.01    -1.21 12.37
wt    3 19   3.77  0.78   3.52    3.75  0.45  2.46   5.42   2.96   0.98     0.14  0.18
----------------------------------------------------------------------------
 am: 1
    var  n   mean    sd median trimmed   mad   min    max  range skew kurtosis    se
mpg   1 13  24.39  6.17  22.80   24.38  6.67 15.00  33.90  18.90 0.05    -1.46  1.71
hp    2 13 126.85 84.06 109.00  114.73 63.75 52.00 335.00 283.00 1.36     0.56 23.31
wt    3 13   2.41  0.62   2.32    2.39  0.68  1.51   3.57   2.06  0.21    -1.17  0.17
Unlike the previous example, the describeBy() function doesn’t allow you to specify an arbitrary function, so it’s less generally applicable. If there’s more than one grouping variable, you can write them as list(name1=groupvar1, name2=groupvar2, ... , nameN=groupvarN). But this will work only if there are no empty cells when the grouping variables are crossed.
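With two grouping variables, the call has this shape (a sketch using mtcars, with transmission type am and engine shape vs as the grouping factors; every am × vs combination in mtcars contains at least one car, so the no-empty-cells requirement is met):

```r
library(psych)

myvars <- c("mpg", "hp", "wt")
# Descriptive statistics for each am x vs combination;
# describeBy() fails if any crossed cell is empty
describeBy(mtcars[myvars], list(am=mtcars$am, vs=mtcars$vs))
```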
Data analysts have their own preferences for which descriptive statistics to display and how they like to see them formatted. This is probably why there are many variations available. Choose the one that works best for you, or create your own!
7.1.5 Visualizing results
Numerical summaries of a distribution’s characteristics are important, but they’re no substitute for a visual representation. For quantitative variables, you have histograms (section 6.3), density plots (section 6.4), box plots (section 6.5), and dot plots (section 6.6). They can provide insights that are easily missed by reliance on a small set of descriptive statistics.
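For example, here's a minimal sketch of one such display, comparing the mpg distributions summarized earlier by transmission group:

```r
# Parallel box plots: a visual complement to the describeBy() numbers
boxplot(mpg ~ am, data=mtcars,
        names=c("Automatic", "Manual"),
        ylab="Miles per gallon",
        main="Gas mileage by transmission type")
```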
The functions considered so far provide summaries of quantitative variables. The functions in the next section allow you to examine the distributions of categorical variables.
7.2 Frequency and contingency tables
In this section, we’ll look at frequency and contingency tables from categorical variables, along with tests of independence, measures of association, and methods for
graphically displaying results. We’ll be using functions in the basic installation, along with functions from the vcd and gmodels packages. In the following examples, assume that A, B, and C represent categorical variables.
The data for this section come from the Arthritis dataset included with the vcd package. The data are from Koch & Edwards (1988) and represent a double-blind clinical trial of new treatments for rheumatoid arthritis. Here are the first few observations:
> library(vcd)
> head(Arthritis)
  ID Treatment  Sex Age Improved
1 57   Treated Male  27     Some
2 46   Treated Male  29     None
3 77   Treated Male  30     None
4 17   Treated Male  32   Marked
5 36   Treated Male  46   Marked
6 23   Treated Male  58   Marked
Treatment (Placebo, Treated), Sex (Male, Female), and Improved (None, Some, Marked) are all categorical factors. In the next section, you’ll create frequency and contingency tables (cross-classifications) from the data.
7.2.1 Generating frequency tables
R provides several methods for creating frequency and contingency tables. The most important functions are listed in table 7.1.
Table 7.1 Functions for creating and manipulating contingency tables

| Function | Description |
| --- | --- |
| table(var1, var2, ..., varN) | Creates an N-way contingency table from N categorical variables (factors) |
| xtabs(formula, data) | Creates an N-way contingency table based on a formula and a matrix or data frame |
| prop.table(table, margins) | Expresses table entries as fractions of the marginal table defined by the margins |
| margin.table(table, margins) | Computes the sum of table entries for a marginal table defined by the margins |
| addmargins(table, margins) | Puts summary margins (sums by default) on a table |
| ftable(table) | Creates a compact, “flat” contingency table |
In the following sections, we’ll use each of these functions to explore categorical variables. We’ll begin with simple frequencies, followed by two-way contingency tables, and end with multiway contingency tables. The first step is to create a table using either the table() or xtabs() function and then manipulate it using the other functions.
ONE-WAY TABLES
You can generate simple frequency counts using the table() function. Here’s an example:
> mytable <- with(Arthritis, table(Improved))
> mytable
Improved
  None   Some Marked 
    42     14     28 
You can turn these frequencies into proportions with prop.table()

> prop.table(mytable)
Improved
  None   Some Marked 
 0.500  0.167  0.333 

or into percentages using prop.table()*100:

> prop.table(mytable)*100
Improved
  None   Some Marked 
  50.0   16.7   33.3 
Here you can see that 50% of study participants had some or marked improvement (16.7 + 33.3).
TWO-WAY TABLES
For two-way tables, the format for the table() function is
mytable <- table(A, B)
where A is the row variable and B is the column variable. Alternatively, the xtabs() function allows you to create a contingency table using formula-style input. The format is
mytable <- xtabs(~ A + B, data=mydata)
where mydata is a matrix or data frame. In general, the variables to be cross-classified appear on the right of the formula (that is, to the right of the ~) separated by + signs. If a variable is included on the left side of the formula, it’s assumed to be a vector of frequencies (useful if the data have already been tabulated).
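As a sketch of that left-hand-side form, you can re-tabulate counts that were already aggregated (here the pre-tabulated data are produced by converting a table back to a data frame, which stores the counts in a Freq column):

```r
library(vcd)                                              # provides the Arthritis dataset
# Pre-tabulated data: one row per Treatment x Improved cell, counts in Freq
counts <- as.data.frame(xtabs(~ Treatment + Improved, data=Arthritis))
# Freq on the left of the ~ tells xtabs() the rows are frequencies, not cases
xtabs(Freq ~ Treatment + Improved, data=counts)
```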
For the Arthritis data, you have
> mytable <- xtabs(~ Treatment+Improved, data=Arthritis)
> mytable
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21
You can generate marginal frequencies and proportions using the margin.table() and prop.table() functions, respectively. For row sums and row proportions, you have
> margin.table(mytable, 1)
Treatment
Placebo Treated 
     43      41 
> prop.table(mytable, 1)
Improved
Treatment None Some Marked
Placebo 0.674 0.163 0.163
Treated 0.317 0.171 0.512
The index (1) refers to the first variable in the table() statement. Looking at the table, you can see that 51% of treated individuals had marked improvement, compared to 16% of those receiving a placebo.
For column sums and column proportions, you have
> margin.table(mytable, 2)
Improved
  None   Some Marked 
    42     14     28 
> prop.table(mytable, 2)
         Improved
Treatment  None  Some Marked
  Placebo 0.690 0.500  0.250
  Treated 0.310 0.500  0.750
Here, the index (2) refers to the second variable in the table() statement. Cell proportions are obtained with this statement:
> prop.table(mytable)
Improved
Treatment None Some Marked
Placebo 0.3452 0.0833 0.0833
Treated 0.1548 0.0833 0.2500
You can use the addmargins() function to add marginal sums to these tables. For example, the following code adds a Sum row and column:
> addmargins(mytable)
         Improved
Treatment None Some Marked Sum
  Placebo   29    7      7  43
  Treated   13    7     21  41
  Sum       42   14     28  84
> addmargins(prop.table(mytable))
         Improved
Treatment   None   Some Marked    Sum
  Placebo 0.3452 0.0833 0.0833 0.5119
  Treated 0.1548 0.0833 0.2500 0.4881
  Sum     0.5000 0.1667 0.3333 1.0000
When using addmargins(), the default is to create sum margins for all variables in a table. In contrast, the following code adds a Sum column alone:
> addmargins(prop.table(mytable, 1), 2)
Improved
Treatment None Some Marked Sum
Placebo 0.674 0.163 0.163 1.000
Treated 0.317 0.171 0.512 1.000
Similarly, this code adds a Sum row:
> addmargins(prop.table(mytable, 2), 1)
         Improved
Treatment  None  Some Marked
  Placebo 0.690 0.500  0.250
  Treated 0.310 0.500  0.750
  Sum     1.000 1.000  1.000
In the table, you see that 25% of those patients with marked improvement received a placebo.
NOTE The table() function ignores missing values (NAs) by default. To include NA as a valid category in the frequency counts, include the table option useNA="ifany".
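As a quick sketch (the Arthritis data contain no missing values, so an NA is introduced into a copy of the Improved variable purely for illustration):

```r
library(vcd)
improved <- Arthritis$Improved
improved[1] <- NA                 # artificially create a missing value
table(improved)                   # NA is dropped by default
table(improved, useNA="ifany")    # NA counted as its own category
```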
A third method for creating two-way tables is the CrossTable() function in the gmodels package. The CrossTable() function produces two-way tables modeled after PROC FREQ in SAS or CROSSTABS in SPSS. The following listing shows an example.
Listing 7.10 Two-way table using CrossTable
> library(gmodels)
> CrossTable(Arthritis$Treatment, Arthritis$Improved)

   Cell Contents
|-------------------------|
|                       N |
| Chi-square contribution |
|           N / Row Total |
|           N / Col Total |
|         N / Table Total |
|-------------------------|

Total Observations in Table:  84

                    | Arthritis$Improved
Arthritis$Treatment |      None |      Some |    Marked | Row Total |
--------------------|-----------|-----------|-----------|-----------|
            Placebo |        29 |         7 |         7 |        43 |
                    |     2.616 |     0.004 |     3.752 |           |
                    |     0.674 |     0.163 |     0.163 |     0.512 |
                    |     0.690 |     0.500 |     0.250 |           |
                    |     0.345 |     0.083 |     0.083 |           |
--------------------|-----------|-----------|-----------|-----------|
            Treated |        13 |         7 |        21 |        41 |
                    |     2.744 |     0.004 |     3.935 |           |
                    |     0.317 |     0.171 |     0.512 |     0.488 |
                    |     0.310 |     0.500 |     0.750 |           |
                    |     0.155 |     0.083 |     0.250 |           |
--------------------|-----------|-----------|-----------|-----------|
       Column Total |        42 |        14 |        28 |        84 |
                    |     0.500 |     0.167 |     0.333 |           |
--------------------|-----------|-----------|-----------|-----------|
The CrossTable() function has options to report percentages (row, column, and cell); specify decimal places; produce chi-square, Fisher, and McNemar tests of independence; report expected and residual values (Pearson, standardized, and adjusted standardized); include missing values as valid; annotate with row and column titles; and format as SAS or SPSS style output. See help(CrossTable) for details.
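For instance, this sketch turns on a few of those options (argument names taken from help(CrossTable)):

```r
library(gmodels)
library(vcd)                          # provides the Arthritis dataset
CrossTable(Arthritis$Treatment, Arthritis$Improved,
           expected=TRUE,             # report expected cell counts
           prop.chisq=FALSE,          # suppress per-cell chi-square contributions
           chisq=TRUE,                # append a chi-square test of independence
           format="SPSS")             # mimic SPSS CROSSTABS layout
```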
If you have more than two categorical variables, you’re dealing with multidimensional tables. We’ll consider these next.
MULTIDIMENSIONAL TABLES
Both table() and xtabs() can be used to generate multidimensional tables based on three or more categorical variables. The margin.table(), prop.table(), and addmargins() functions extend naturally to more than two dimensions. Additionally, the ftable() function can be used to print multidimensional tables in a compact and attractive manner. An example is given in the next listing.
Listing 7.11 Three-way contingency table
> mytable <- xtabs(~ Treatment+Sex+Improved, data=Arthritis)
> mytable                                    # b: cell frequencies
, , Improved = None

         Sex
Treatment Female Male
  Placebo     19   10
  Treated      6    7

, , Improved = Some

         Sex
Treatment Female Male
  Placebo      7    0
  Treated      5    2

, , Improved = Marked

         Sex
Treatment Female Male
  Placebo      6    1
  Treated     16    5

> ftable(mytable)
                   Sex Female Male
Treatment Improved                
Placebo   None             19   10
          Some              7    0
          Marked            6    1
Treated   None              6    7
          Some              5    2
          Marked           16    5

> margin.table(mytable, 1)                   # c: marginal frequencies
Treatment
Placebo Treated 
     43      41 
> margin.table(mytable, 2)
Sex
Female   Male 
    59     25 
> margin.table(mytable, 3)
Improved
  None   Some Marked 
    42     14     28 

> margin.table(mytable, c(1, 3))             # d: Treatment x Improved marginal frequencies
         Improved
Treatment None Some Marked
  Placebo   29    7      7
  Treated   13    7     21

> ftable(prop.table(mytable, c(1, 2)))       # e: Improved proportions for Treatment x Sex
                 Improved  None  Some Marked
Treatment Sex                               
Placebo   Female          0.594 0.219  0.188
          Male            0.909 0.000  0.091
Treated   Female          0.222 0.185  0.593
          Male            0.500 0.143  0.357

> ftable(addmargins(prop.table(mytable, c(1, 2)), 3))
                 Improved  None  Some Marked   Sum
Treatment Sex                                     
Placebo   Female          0.594 0.219  0.188 1.000
          Male            0.909 0.000  0.091 1.000
Treated   Female          0.222 0.185  0.593 1.000
          Male            0.500 0.143  0.357 1.000
The code at b produces cell frequencies for the three-way classification. The code also demonstrates how the ftable() function can be used to print a more compact and attractive version of the table.
The code at c produces the marginal frequencies for Treatment, Sex, and Improved. Because you created the table with the formula ~ Treatment+Sex+Improved, Treatment is referred to by index 1, Sex is referred to by index 2, and Improved is referred to by index 3.
The code at d produces the marginal frequencies for the Treatment x Improved classification, summed over Sex. The proportion of patients with None, Some, and Marked improvement for each Treatment × Sex combination is provided in e. Here you see that 36% of treated males had marked improvement, compared to 59% of treated females. In general, the proportions will add to 1 over the indices not included in the prop.table() call (the third index, or Improved in this case). You can see this in the last example, where you add a sum margin over the third index.
If you want percentages instead of proportions, you can multiply the resulting table by 100. For example, this statement
ftable(addmargins(prop.table(mytable, c(1, 2)), 3)) * 100