Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный исследовательский университет «Высшая школа экономики»

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

R in Action, Second Edition.pdf

Скачиваний:

540

Добавлен:

26.03.2016

Размер:

20.33 Mб

Скачать

☆

<<< < Предыдущая 29 30 31 32 33 34 35 36 37 38 39 4041 / 17341 42 43 44 45 46 47 48 49 50 51 52 53 > Следующая >>>

Summary

113

With aggregation

dcast(md, ID~variable, mean)

ID	X1	X2

1	4	5.5

2	4	2.5

(a)

dcast(md, Time~variable, mean)

Time	X1	X2

1	5.5	3.5

2	2.5	4.5

(b)

dcast(md, ID~Time, mean)

ID	Time1	Time2
1	5.5	4

2	3.5	3

(c)

Reshaping a dataset

mydata

ID	Time	X1	X2

1	1	5	6

1	2	3	5

2	1	6	1

2	2	2	4

md <- melt(mydata, ID=c("ID", "Time"))

ID	Time	variable	value

1	1	X1	5

1	2	X1	3

2	1	X1	6

2	2	X1	2

1	1	X2	6

1	2	X2	5

2	1	X2	1

2	2	X2	4

Without aggregation

dcast(md, ID+Time~variable)

ID	Time	X1	X2

1	1	5	6

1	2	3	5

2	1	6	1

2	2	2	4

(d)

dcast(md, ID+variable~Time)

ID	variable	Time1	Time 2

1	X1	5	3

1	X2	6	5

2	X1	6	2

2	X2	1	4

(e)

dcast(md, ID~variable+Time)

ID	X1	X1	X2	X2
	Time1	Time2	Time1	Time2
1	5	3	6	5

2	6	2	1	4

(f)

Figure 5.1 Reshaping data with the melt() and dcast() functions

mean as an aggregating function. Thus the data is not only reshaped but aggregated as well. For example, example a gives the means on X1 and X2 averaged over time for each observation. Example b gives the mean scores of X1 and X2 at Time 1 and Time 2, averaged over observations. In example c, you have the mean score for each observation at Time 1 and Time 2, averaged over X1 and X2.

As you can see, the flexibility provided by the melt() and dcast() functions is amazing. There are many times when you’ll have to reshape or aggregate data prior to analysis. For example, you’ll typically need to place your data in what’s called long format, resembling table 5.9, when analyzing repeated-measures data (data where multiple measures are recorded for each observation). See section 9.6 for an example.

5.7Summary

This chapter reviewed dozens of mathematical, statistical, and probability functions that are useful for manipulating data. You saw how to apply these functions to a wide range of data objects including vectors, matrices, and data frames. You learned to use control-flow constructs for looping and branching to execute some statements repetitively and execute other statements only when certain conditions are met. You then

114	CHAPTER 5 Advanced data management

had a chance to write your own functions and apply them to data. Finally, we explored ways of collapsing, aggregating, and restructuring data.

Now that you’ve gathered the tools you need to get your data into shape (no pun intended), you’re ready to bid part 1 goodbye and enter the exciting world of data analysis! In upcoming chapters, we’ll begin to explore the many statistical and graphical methods available for turning data into information.

Part 2

Basic methods

In part 1, we explored the R environment and discussed how to input data from a wide variety of sources, combine and transform it, and prepare it for further analyses. Once your data has been input and cleaned up, the next step is typically to explore the variables one at a time. This provides you with information about the distribution of each variable, which is useful in understanding the characteristics of the sample, identifying unexpected or problematic values, and selecting appropriate statistical methods. Next, variables are typically studied two at a time. This can help you to uncover basic relationships among variables and is a useful first step in developing more complex models.

Part 2 focuses on graphical and statistical techniques for obtaining basic information about data. Chapter 6 describes methods for visualizing the distribution of individual variables. For categorical variables, this includes bar plots, pie charts, and the newer fan plot. For numeric variables, this includes histograms, density plots, box plots, dot plots, and the less well-known violin plot. Each type of graph is useful for understanding the distribution of a single variable.

Chapter 7 describes statistical methods for summarizing individual variables and bivariate relationships. The chapter starts with coverage of descriptive statistics for numerical data based on the dataset as a whole and on subgroups of interest. Next, the use of frequency tables and cross-tabulations for summarizing categorical data is described. The chapter ends by discussing basic inferential methods for understanding relationships between two variables at a time, including bivariate correlations, chi-square tests, t-tests, and nonparametric methods.

When you have finished this part of the book, you’ll be able to use basic graphical and statistical methods available in R to describe your data, explore group differences, and identify significant relationships among variables.

116

CHAPTER

<<< < Предыдущая 29 30 31 32 33 34 35 36 37 38 39 4041 / 17341 42 43 44 45 46 47 48 49 50 51 52 53 > Следующая >>>

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]

#
05.08.2019741.83 Кб0psihologia.rtf
#
02.06.2015162.69 Кб76Psyh_final_ver.docx
#
02.06.2015141.74 Кб44Psyh_final_ver.docx
#
26.03.2016226.3 Кб23public_corporation.doc
#
26.03.2016451.53 Кб7pud_finansovyy-menedjment_318476.pdf
#
26.03.201620.33 Mб540R in Action, Second Edition.pdf
#
26.03.2016296.21 Кб17Radaev_Kak_napisat_akademicheskiy_text.pdf
#
26.03.20163.76 Mб4Raeff_Modernity.pdf
#
26.03.20162.12 Mб19raigorodskii_d_ya_hrestomatiya_psihologiya_lich.pdf
#
02.06.2015494.59 Кб6raschet_SRK_smorodin.doc
#
02.06.201563.98 Кб4referat_IOGP_3.docx