7.6 Visualizing group differences

Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный исследовательский университет «Высшая школа экономики»

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

R in Action, Second Edition.pdf

Скачиваний:

540

Добавлен:

26.03.2016

Размер:

20.33 Mб

Скачать

☆

<<< < Предыдущая 45 46 47 48 49 50 51 52 53 54 55 5657 / 17357 58 59 60 61 62 63 64 65 66 67 68 69 > Следующая >>>

			Visualizing group differences						163
median		0.60	0.70		1.1	1.75
mad		0.15	0.15		0.3	0.59
Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm
Probability Adjustment = holm								d Pairwise comparisons
		Group.1	Group.2	W		p
1		West	North Central	88	8.7e-01
2		West	Northeast 46		8.7e-01
3		West	South 39		1.8e-02		*
4	North	Central	Northeast 20		5.4e-02		.
5	North	Central	South	2	8.1e-05 ***
6	Northeast		South 18		1.2e-02		*
---
Signif. codes:			0 '***' 0.001	'*' 0.01 '' 0.05 '.' 0.1 ' ' 1

The source() function downloads and executes the R script defining the wmc() function b. The function’s format is wmc(y ~ A, data, method), where y is a numeric outcome variable, A is a grouping variable, data is the data frame containing these variables, and method is the approach used to limit Type I errors. Listing 7.17 uses an adjustment method developed by Holm (1979). It provides strong control of the fam- ily-wise error rate (the probability of making one or more Type I errors in a set of comparisons). See help(p.adjust) for a description of the other methods available.

The wmc() function first provides the sample sizes, medians, and median absolute deviations for each group c. The West has the lowest illiteracy rate, and the South has the highest. The function then generates six statistical comparisons (West versus North Central, West versus Northeast, West versus South, North Central versus Northeast, North Central versus South, and Northeast versus South) d. You can see from the two-sided p-values (p) that the South differs significantly from the other three regions and that the other three regions don’t differ from each other at a p < .05 level.

Nonparametric multiple comparisons are a useful set of techniques that aren’t easily accessible in R. In chapter 21, you’ll have an opportunity to expand the wmc() function into a fully developed package that includes error checking and informative graphics.

7.6Visualizing group differences

In sections 7.4 and 7.5, we looked at statistical methods for comparing groups. Examining group differences visually is also a crucial part of a comprehensive data-analysis strategy. It allows you to assess the magnitude of the differences, identify any distributional characteristics that influence the results (such as skew, bimodality, or outliers), and evaluate the appropriateness of the test assumptions. R provides a wide range of graphical methods for comparing groups, including box plots (simple, notched, and violin), covered in section 6.5; overlapping kernel density plots, covered in section 6.4.1; and graphical methods for visualizing outcomes in an ANOVA framework, discussed in chapter 9. Advanced methods for visualizing group differences, including grouping and faceting, are discussed in chapter 19.

164	CHAPTER 7 Basic statistics

7.7Summary

In this chapter, we reviewed the functions in R that provide basic statistical summaries and tests. We looked at sample statistics and frequency tables, tests of independence and measures of association for categorical variables, correlations between quantitative variables (and their associated significance tests), and comparisons of two or more groups on a quantitative outcome variable.

In the next chapter, we’ll explore simple and multiple regression, where the focus is on understanding relationships between one (simple) or more than one (multiple) predictor variables and a predicted or criterion variable. Graphical methods will help you diagnose potential problems, evaluate and improve the fit of your models, and uncover unexpected gems of information in your data.

Part 3

Intermediate methods

Whereas part 2 of this book covered basic graphical and statistical methods, part 3 discusses intermediate methods. We move from describing the relationship between two variables to, in chapter 8, using regression models to model the relationship between a numerical outcome variable and a set of numeric and/or categorical predictor variables. Modeling data is typically a complex, multistep, interactive process. Chapter 8 provides step-by-step coverage of the methods available for fitting linear models, evaluating their appropriateness, and interpreting their meaning.

Chapter 9 considers the analysis of basic experimental and quasi-experimen- tal designs through the analysis of variance and its variants. Here we’re interested in how treatment combinations or conditions affect a numerical outcome variable. The chapter introduces the functions in R that are used to perform an analysis of variance, analysis of covariance, repeated measures analysis of variance, multifactor analysis of variance, and multivariate analysis of variance. Methods for assessing the appropriateness of these analyses and visualizing the results are also discussed.

In designing experimental and quasi-experimental studies, it’s important to determine whether the sample size is adequate for detecting the effects of interest (power analysis). Otherwise, why conduct the study? A detailed treatment of power analysis is provided in chapter 10. Starting with a discussion of hypothesis testing, the presentation focuses on how to use R functions to determine the sample size necessary to detect a treatment effect of a given size with a given degree of confidence. This can help you to plan studies that are likely to yield useful results.

166	Intermediate methods
	CHAPTER

Chapter 11 expands on the material in chapter 5 by covering the creation of graphs that help you to visualize relationships among two or more variables. This includes the various types of 2D and 3D scatter plots, scatter-plot matrices, line plots, and bubble plots. It also introduces the very useful, but less well-known, corrgrams, and mosaic plots.

The linear models described in chapters 8 and 9 assume that the outcome or response variable is not only numeric, but also randomly sampled from a normal distribution. There are situations where this distributional assumption is untenable. Chapter 12 presents analytic methods that work well in cases where data is sampled from unknown or mixed distributions, where sample sizes are small, where outliers are a problem, or where devising an appropriate test based on a theoretical distribution is mathematically intractable. They include both resampling and bootstrapping approaches—computer-intensive methods that are powerfully implemented in R. The methods described in this chapter will allow you to devise hypothesis tests for data that don’t fit traditional parametric assumptions.

After completing part 3, you’ll have the tools to analyze most common dataanalytic problems encountered in practice. And you’ll be able to create some gorgeous graphs!

<<< < Предыдущая 45 46 47 48 49 50 51 52 53 54 55 5657 / 17357 58 59 60 61 62 63 64 65 66 67 68 69 > Следующая >>>

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]

#
05.08.2019741.83 Кб0psihologia.rtf
#
02.06.2015162.69 Кб76Psyh_final_ver.docx
#
02.06.2015141.74 Кб44Psyh_final_ver.docx
#
26.03.2016226.3 Кб23public_corporation.doc
#
26.03.2016451.53 Кб7pud_finansovyy-menedjment_318476.pdf
#
26.03.201620.33 Mб540R in Action, Second Edition.pdf
#
26.03.2016296.21 Кб17Radaev_Kak_napisat_akademicheskiy_text.pdf
#
26.03.20163.76 Mб4Raeff_Modernity.pdf
#
26.03.20162.12 Mб19raigorodskii_d_ya_hrestomatiya_psihologiya_lich.pdf
#
02.06.2015494.59 Кб6raschet_SRK_smorodin.doc
#
02.06.201563.98 Кб4referat_IOGP_3.docx