1greenacre_m_primicerio_r_multivariate_analysis_of_ecological
.pdfMichael Greenacre
Raul Primicerio
Multivariate Analysis
of Ecological Data
Multivariate Analysis of Ecological Data
Multivariate Analysis of Ecological Data
MICHAEL GREENACRE
RAUL PRIMICERIO
First published December 2013
©The authors, 2013
©Fundación BBVA, 2013
Plaza de San Nicolás, 4. 48005 Bilbao www.fbbva.es publicaciones@fbbva.es
Supporting website www.multivariatestatistics.org provide the data sets used in the book, R scripts for the computations, glossary of terms, and additional material.
A digital copy of this book can be downloaded free of charge at www.fbbva.es.
The BBVA Foundation’s decision to publish this book does not imply any responsibility for its content, or for the inclusion therein of any supplementary documents or information facilitated by the authors.
Edition and production: Rubes Editorial
ISBN: 978-84-92937-50-9
Legal deposit no.: BI-1622-2013
Printed in Spain
Printed by Comgrafic on 100% recycled paper, manufactured to the most stringent European environmental standards.
To Zerrin and Maria
5
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA
6
CONTENTS
Preface |
|
|
Michael Greenacre and Raul Primicerio .......................................................... |
9 |
|
ECOLOGICAL DATA AND MULTIVARIATE METHODS |
|
|
1. |
Multivariate Data in Environmental Science ....................................... |
15 |
2. |
The Four Corners of Multivariate Analysis .......................................... |
25 |
3. |
Measurement Scales, Transformation and Standardization ............... |
33 |
MEASURING DISTANCE AND CORRELATION |
|
|
4. |
Measures of Distance between Samples: Euclidean ........................... |
47 |
5. |
Measures of Distance between Samples: Non-Euclidean .................... |
61 |
6. |
Measures of Distance and Correlation between Variables.................. |
75 |
VISUALIZING DISTANCES AND CORRELATIONS |
|
|
7. |
Hierarchical Cluster Analysis ................................................................ |
89 |
8. |
Ward Clustering and k-means Clustering............................................. |
99 |
9. |
Multidimensional Scaling ..................................................................... |
109 |
REGRESSION AND PRINCIPAL COMPONENT ANALYSIS |
|
|
10. |
Regression Biplots.................................................................................. |
127 |
11. |
Multidimensional Scaling Biplots ........................................................ |
139 |
12. |
Principal Component Analysis ............................................................. |
151 |
CORRESPONDENCE ANALYSIS |
|
|
13. |
Correspondence Analysis ..................................................................... |
165 |
7
|
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA |
|
14. |
Compositional Data and Log-ratio Analysis ........................................ |
177 |
15. |
Canonical Correspondence Analysis ................................................... |
189 |
INTERPRETATION, INFERENCE AND MODELLING |
|
|
16. |
Variance Partitioning in PCA, LRA, CA and CCA............................... |
203 |
17. |
Inference in Multivariate Analysis ........................................................ |
213 |
18. |
Statistical Modelling............................................................................... |
229 |
CASE STUDIES
19.Case Study 1: Temporal Trends and Spatial Patterns across a Large
Ecological Data Set ............................................................................... |
249 |
20. Case Study 2: Functional Diversity of Fish in the Barents Sea............ |
261 |
APPENDICES |
|
Appendix A: Aspects of Theory ..................................................................... |
279 |
Appendix B: Bibliography and Web Resources............................................ |
293 |
Appendix C: Computational Note................................................................. |
303 |
List of Exhibits ................................................................................................ |
307 |
Index ............................................................................................................... |
325 |
About the Authors .......................................................................................... |
331 |
8
PREFACE
The world around us – and especially the biological world – is inherently multidimensional. Biological diversity is the product of the interaction between many species, be they marine, plant or animal life, and of the many limiting factors that characterize the environment in which the species live. The environment itself is a complex mix of natural and man-induced parameters: for example, meteorological parameters such as temperature or rainfall, physical parameters such as soil composition or sea depth, and chemical parameters such as level of carbon dioxide or heavy metal pollution.
The properties and patterns we focus on in ecology and environmental biology consist of these many covarying components. Evolutionary ecology has shown that phenotypic traits, with their functional implications, tend to covary due to correlational selection and trade-offs. Community ecology has uncovered gradients in community composition. Much of this biological variation is organized along axes of environmental heterogeneity, consisting of several correlated physical and chemical characteristics. Spectra of functional traits, ecological and environmental gradients, all imply correlated properties that will respond collectively to natural and human perturbations. Small wonder that scientific inference in these fields must rely on statistical tools that help discern structure in datasets with many variables (i.e., multivariate data). These methods are comprehensively referred to as multivariate analysis, or multivariate statistics, the topic of this book. Multivariate analysis uses relationships between variables to order the objects of study according to their collective properties, that is to highlight spectra and gradients, and to classify the objects of study, that is to group species or ecosystems in distinct classes each containing entities with similar properties.
Although multivariate analysis is widely applied in ecology and environmental biology, also thanks to statistical software that makes the variety of methods more accessible, its concepts, potentials and limitations are not always transparent to practitioners. A scattered methodological literature, heterogeneous terminology, and paucity of introductory texts sufficiently comprehensive to provide a methodological overview and synthesis, are partly responsible for this. Another reason is the fact that biologists receive a formal quantitative training often limited to univariate, parametric statistics (regression and analysis of variance), with some
9