Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
R in Action, Second Edition.pdf
Скачиваний:
540
Добавлен:
26.03.2016
Размер:
20.33 Mб
Скачать

Summary

113

With aggregation

dcast(md, ID~variable, mean)

ID

X1

X2

 

 

 

1

4

5.5

 

 

 

2

4

2.5

 

 

 

(a)

dcast(md, Time~variable, mean)

Time

X1

X2

 

 

 

1

5.5

3.5

 

 

 

2

2.5

4.5

 

 

 

(b)

dcast(md, ID~Time, mean)

ID

Time1

Time2

1

5.5

4

 

 

 

2

3.5

3

 

 

 

(c)

Reshaping a dataset

mydata

ID

Time

X1

X2

 

 

 

 

1

1

5

6

 

 

 

 

1

2

3

5

 

 

 

 

2

1

6

1

 

 

 

 

2

2

2

4

 

 

 

 

md <- melt(mydata, ID=c("ID", "Time"))

ID

Time

variable

value

 

 

 

 

1

1

X1

5

 

 

 

 

1

2

X1

3

 

 

 

 

2

1

X1

6

 

 

 

 

2

2

X1

2

 

 

 

 

1

1

X2

6

 

 

 

 

1

2

X2

5

 

 

 

 

2

1

X2

1

 

 

 

 

2

2

X2

4

 

 

 

 

Without aggregation

dcast(md, ID+Time~variable)

ID

Time

X1

X2

 

 

 

 

1

1

5

6

 

 

 

 

1

2

3

5

 

 

 

 

2

1

6

1

 

 

 

 

2

2

2

4

 

 

 

 

(d)

dcast(md, ID+variable~Time)

ID

variable

Time1

Time 2

 

 

 

 

1

X1

5

3

 

 

 

 

1

X2

6

5

 

 

 

 

2

X1

6

2

 

 

 

 

2

X2

1

4

 

 

 

 

(e)

dcast(md, ID~variable+Time)

ID

X1

X1

X2

X2

 

Time1

Time2

Time1

Time2

1

5

3

6

5

 

 

 

 

 

2

6

2

1

4

 

 

 

 

 

(f)

Figure 5.1 Reshaping data with the melt() and dcast() functions

mean as an aggregating function. Thus the data is not only reshaped but aggregated as well. For example, example a gives the means on X1 and X2 averaged over time for each observation. Example b gives the mean scores of X1 and X2 at Time 1 and Time 2, averaged over observations. In example c, you have the mean score for each observation at Time 1 and Time 2, averaged over X1 and X2.

As you can see, the flexibility provided by the melt() and dcast() functions is amazing. There are many times when you’ll have to reshape or aggregate data prior to analysis. For example, you’ll typically need to place your data in what’s called long format, resembling table 5.9, when analyzing repeated-measures data (data where multiple measures are recorded for each observation). See section 9.6 for an example.

5.7Summary

This chapter reviewed dozens of mathematical, statistical, and probability functions that are useful for manipulating data. You saw how to apply these functions to a wide range of data objects including vectors, matrices, and data frames. You learned to use control-flow constructs for looping and branching to execute some statements repetitively and execute other statements only when certain conditions are met. You then

114

CHAPTER 5 Advanced data management

had a chance to write your own functions and apply them to data. Finally, we explored ways of collapsing, aggregating, and restructuring data.

Now that you’ve gathered the tools you need to get your data into shape (no pun intended), you’re ready to bid part 1 goodbye and enter the exciting world of data analysis! In upcoming chapters, we’ll begin to explore the many statistical and graphical methods available for turning data into information.

Part 2

Basic methods

In part 1, we explored the R environment and discussed how to input data from a wide variety of sources, combine and transform it, and prepare it for further analyses. Once your data has been input and cleaned up, the next step is typically to explore the variables one at a time. This provides you with information about the distribution of each variable, which is useful in understanding the characteristics of the sample, identifying unexpected or problematic values, and selecting appropriate statistical methods. Next, variables are typically studied two at a time. This can help you to uncover basic relationships among variables and is a useful first step in developing more complex models.

Part 2 focuses on graphical and statistical techniques for obtaining basic information about data. Chapter 6 describes methods for visualizing the distribution of individual variables. For categorical variables, this includes bar plots, pie charts, and the newer fan plot. For numeric variables, this includes histograms, density plots, box plots, dot plots, and the less well-known violin plot. Each type of graph is useful for understanding the distribution of a single variable.

Chapter 7 describes statistical methods for summarizing individual variables and bivariate relationships. The chapter starts with coverage of descriptive statistics for numerical data based on the dataset as a whole and on subgroups of interest. Next, the use of frequency tables and cross-tabulations for summarizing categorical data is described. The chapter ends by discussing basic inferential methods for understanding relationships between two variables at a time, including bivariate correlations, chi-square tests, t-tests, and nonparametric methods.

When you have finished this part of the book, you’ll be able to use basic graphical and statistical methods available in R to describe your data, explore group differences, and identify significant relationships among variables.

116

CHAPTER

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]