Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

1greenacre_m_primicerio_r_multivariate_analysis_of_ecological

.pdf
Скачиваний:
3
Добавлен:
19.11.2019
Размер:
7.36 Mб
Скачать

MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

the logits of the probabilities are modelled as linear functions of

 

the predictor ......................................................................................

234

Exhibit 18.5: A scatterplot of a continuous response y and a predictor x, showing

 

a scatterplot smoother (brown line) which suggests a nonlinear

 

relationship. The estimated quadratic relationship is shown by a

 

dashed curve .......................................................................................

236

Exhibit 18.6: Same data as in Exhibit 18.5, with the estimated quadratic relation-

 

ship in gray, and the relationship according to (18.5) shown by

 

black dashed lines ..............................................................................

237

Exhibit 18.7: Linear regression relationships (18.6) between diversity and tem-

 

perature that depend on the depth (illustrated for three values of

 

depth), showing an interaction effect (regression lines with differ-

 

ent slopes). If the interaction effect were absent, the three lines

 

would be parallel and the effect of temperature would be the same

 

(i.e., not contingent on depth) .........................................................

239

Exhibit 18.8: Generalized additive modelling of diversity as smooth functions of

 

depth and temperature: depth is diagnosed as having a significant

 

quadratic relationship, while the slightly increasing linear relation-

 

ship with temperature is non-significant. Both plots are centred verti-

 

cally at mean diversity, so show estimated deviations from the mean.

 

Confidence regions for the estimated relationships are also shown

240

Exhibit 18.9: Contour plot (upper) and perspective plot (down) of the diag-

 

nosed interaction regression surface of depth and temperature,

 

predicting the deviations from mean diversity. The concave rela-

 

tionship with depth is clearly seen as well as the slight relationship

 

with depth. The difference between the model with or without

 

interactions is, however, not significant ............................................

242

Exhibit 18.10: Classification tree model for predicting the presence of polar cod.

 

The one branch which predicts their presence gives the rule: tem-

 

perature < 1.6 ºC and depth 306 m. This rule correctly predicts

 

the presence of polar cod in 16 samples but misclassifies 5 samples

 

as having polar cod when they do not ..............................................

243

Exhibit 18.11: Comparison of misclassification rates for the classification tree of

 

Exhibit 18.10, compared to that for logistic regression, using the

 

same predictors. The classification tree correctly predicts presence

 

and absence in 79 of the 89 samples, while logistic regression cor-

 

rectly predicts 74 ................................................................................

244

Exhibit 18.12: Regression tree predicting fish diversity from latitude and longi-

 

tude of sample positions. The terminal nodes give the average di-

 

320

LIST OF EXHIBITS

 

versity of the samples that fall into them. This tree yields the spatial

classification of the sampling region given in Exhibit 18.13

........... 244

Exhibit 18.13: Map of Barents Sea showing the locations of the 89 sampling sites

(see Exhibit 11.1) and the slicing up of the region according to the

regression tree of Exhibit 18.12, and the average fish diversities in

each block. Most of the slices divide the latitudes north to south,

with just two east-west divisions of the longitudes. Dark locations

show the 21 sites where polar cod were found ...............................

245

Exhibit 19.1: (a) Actual number of sample taken in three regions over a three-

year period, with overall proportions of samples in each region over

the whole period. (b) Expected number of samples if in each year

sampling had taken place in accordance with the overall propor-

tions. (c) The weights computed by dividing the values in table (b)

by those in table (a) ...........................................................................

250

Exhibit 19.2: Sums of fuzzy-coded regional categories for each year and for

all years. Columns are the eight compass points and a central

region (C) ..........................................................................................

252

Exhibit 19.3: Weights for data according to year and fuzzy region .......................

252

Exhibit 19.4: Correspondence analysis contribution biplot of the “Barents fish

trend” data set. The upper plot shows the active data, the samples

and species (high-contributing species are shown with bigger

labels).The lower plot shows the centroids of all the categories,

linking together categories of ordinal variables. 32.6% of the total

inertia of 4.017 is explained by these two first dimensions ..............

254

Exhibit 19.5: Canonical correspondence analysis of the “Barents fish trends”

data. The format is the same as Exhibit 19.4, with the samples and

species plotted in the upper biplot and an enlarged version of the

category centroids in the lower plot. 58.5% of the restricted inertia

is explained by these two dimensions ...............................................

256

Exhibit 19.6: Temporal trajectories in regional categories north, east, south and

west. Time and regional centroids are at (weighted) averages of the

corresponding category points: for example, S is at the average of

the six points making up the trajectory for south, while 2004 is at

the average of all the 2004 points (for all nine regions, only four

shown here) ...................................................................................

257

Exhibit 19.7: In descending order, the proportion of inertia explained, R2, and

adjusted R2, of the five categorical environmental variables; k is the

number of categories .........................................................................

257

321

MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

Exhibit 19.8: Scree plot of the inertias of successive dimensions in the con-

 

strained space of the CCA of the “Barents fish trends” data. The first

 

three dimensions clearly stand out from the rest .............................

258

Exhibit 19.9: Partitioning the total inertia in the abundance data into parts due

 

to the spatial variables and other variables separately, and their part

 

in common .........................................................................................

259

Exhibit 20.1: Part of the trait matrix coding the various functional character-istics

 

of Barents Sea fish species .................................................................

262

Exhibit 20.2: CA of the trait matrix, part of which is shown in Exhibit 20.1. Traits

 

are shown in principal coordinates in (a) and the fish species in

 

principal coordinates in (b). 27.4% of the inertia is displayed .........

264

Exhibit 20.3: Hierarchical clustering of fish based on distances between fish,

 

showing boxes indicating eight clusters ............................................

265

Exhibit 20.4: CA of the trait matrix aggregated according to the fish groups (G1

 

to G8) that were defined in Exhibit 20.3. The solution optimizes

 

the group differences, although the basic configuration is similar

 

to that of Exhibit 20.2 which optimized the fish differences. The

 

functional traits are displayed in contribution coordinates in (a).

 

52.4% of the inertia between fish groups is displayed .....................

266

Exhibit 20.5: (a) Histogram of the group-based FDs defined as Shannon-Weaver

 

diversities on the aggregated abundances in 600 samples for eight

 

functional groups; (b) Histogram of the tree-based FDs using pres-

 

ences only and summing the branches in the dendrogram for the

 

subset of observed species, normalized with respect to the FD of the

 

species pool; (c) Scatterplot of the two functional diversity indices

 

(Spearman rho correlation = 0.300) .................................................

268

Exhibit 20.6: Permutation distribution of the species pool FD, under the null hy-

 

pothesis of no relationship between the traits. The observed value

 

of 20.31 is the smallest and the associated p -value, based on 1,000

 

permutations, is thus p = 0.001 ..........................................................

269

Exhibit 20.7: Scatterplots of the two FD measures versus species richness (SR,

 

the number of species in sample), showing the modelled quadratic

 

relationships. The horizontal axis is marked with the value of SR,

 

and below the number of sites with the corresponding value) .......

270

Exhibit 20.8: Scatterplots of the variables depth, slope, temperature, longitude

 

and latitude with one another as well as with the two measures of

 

functional diversity, based on the functional groups (FDgroup)

 

and on the dendrogram (FDtree). Spearman rank correlations are

 

322

LIST OF EXHIBITS

 

shown in the upper triangle, with font size proportional to their

 

absolute values ....................................................................................

271

Exhibit 20.9: Contour plots of the spatial component of functional diversity ac-

 

cording to the two definitions (first row is the tree-based FD, second

 

row is group-based FD) using two modelling methods (in columns,

 

first column is using fuzzy spatial categories, second is using GAM

 

modelling). The northern border of Norway with Russia and the

 

southern tip of Svalbard situate the region of interest ....................

272

Exhibit 20.10: Plot of regression coefficients for each year showing average esti-

 

mated year effects for the residuals of (tree-based) functional diver-

 

sities from the spatial model, with p -values for testing differences

 

compared to the zero mean of the residuals (dashed line) ............

273

Exhibit 20.11: GAM of tree-based FD as a smooth function of depth (p = 0.0001)

 

and temperature (p 0.0001). To model these effects parametrically

 

depth would be modelled as a quadratic and temperature linear ....

274

323

MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

324

INDEX

Akaike information criterion, 235, 239, 241, 272E, 273

AIC, see Akaike information criterion analysis of covariance, 30

of variance, 21, 23E, 28, 30, 99-101, 221, 283, 305

ANCOVA, see analysis of covariance ANOVA, see analysis of variance association, test between two groups of vari-

ables, 221, 223, 227

test between two variables, 82, 216, 217 average profile, 55, 56E

barycentre, 169

barycentric relationship, 169

biplot, 10, 146, 155E, 156, 159, 161, 165, 166, 175, 176, 181-183E, 184, 186

axis, 129, 130, 132, 143, 176, 181, 184, 188, 192

column-principal, 156, 157E, 161, 174 contribution, 170-174, 177, 186, 187, 190,

208, 209E, 211, 254E log-ratio, 181, 182, 188 multidimensional scaling, 139-149

regression, 127, 132-134E, 135, 137E-139, 143, 190, 193

row-principal, 155E, 156, 161, 168E, 170, 171E, 174, 182E, 184E, 186E

scaling, 171E

support of, 135, 138, 190 Bonferroni correction, 20 bootstrap distribution, 215E-217E

bootstrapping, 215, 217, 226, 280, 288, 289 Box-and-whisker plot, 15, 19-20E, 23E, 24

100, 304

Box-Cox transformation, 37E, 38, 42, 183, 280

boxplot, 21, 304

Bray-Curtis dissimilarity, 61-64E, 67, 73, 80, 118, 120E, 141E, 166, 281, 284, 305 between count variables, 81E

compared to chi-square distance, 6466, 141

BSS, see sum of squares, between-group

CA, see correspondence analysis

CCA, see canonical correspondence analysis canonical correlation analysis, 135-138, 190 correspondence analysis, 31, 32, 189-199, 203, 209E-211, 221, 225, 256E, 258E,

259, 288

as analysis of weighted averages, 194-195

as discriminant analysis, 197-199 of reweighted data, 253-255 partial, 194, 199, 258, 288

categorical variable, 16-19, 24, 28-31, 38, 39, 42, 57, 68, 72, 80-82, 85, 133, 282, 304 chi-square distance, 55-58, 63-68, 73, 81E,

112E, 165, 166, 170, 174, 176, 263, 281, 306

between count variables, 80 compared to Bray-Curtis dissimilarity,

64-66, 141

Note: E after the page number indicates a reference to an Exhibit.

325

MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

on raw counts, 55, 63-66 city-block distance, 61, 67, 68E, 71, 73 classification, 28, 29, 34E, 244E

tree, 241, 243, 244 closure, 35, 188

clustering, 29, 89, 94, 95, 101, 104, 105, 107, 114-116E, 225, 227, 275, 283, 289 average linkage, 93, 94, 97, 104 comparing two solutions, 104 complete linkage, 94, 96E, 97, 104, 263,

282

hierarchical, 89-93E, 95-99, 102, 105, 107, 225, 263-265E, 275, 282, 283, 305

k-means, 99, 104-107, 283 nonhierarchical, 99, 104-106, 282, 283,

305

of correlations, 95 single linkage, 94, 97

Ward, 99, 102-107, 263, 283, 305 weighting the objects, 106

complete linkage, 94, 96E, 97, 104, 262, 282 composition, 177-180E, 280 compositional data, 26, 34E, 35, 139, 168,

181, 185, 188, 194, 280, 281 zeros in, 185-187

conditional distribution, 230-233E, 245, 290 mean, 230E-234, 269, 290

confirmatory data analysis, 33-36 continuous variable, 17-19E, 24, 27-31, 34,

39E-43, 51, 52E, 55, 70-72, 80E, 99, 143149, 204, 262, 282, 304-306

contour, 129-132, 138, 144, 145E, 241, 242E, 272E

nonlinear, 144, 145, 149 contribution, 206-208

biplot, 170-174, 186-188, 190, 208, 209211, 254E

coordinates, 171, 172E, 174, 176, 191E, 193E, 208, 211, 266E, 286, 305

in constrained analysis, 209-211 coordinate, contribution, 171, 172E, 174,

176, 191E, 193E, 208, 211, 266E, 286, 305

principal, 156, 159E, 161, 174-176, 191E, 264E, 283-286

standard, 156, 161, 171, 174-176, 208, 286, 287

correlation, 18-19

as a scalar product, 77-79 between points and axes, 208-209 circle of, 158, 159E

point biserial, 19

Spearman rank, 79-80, 142E, 217, 271E, 284, 304

correspondence analysis, 31, 165-176, 183, 184E, 185E, 187E, 190, 191E, 211, 286

of reweighted data, 253-255

count variable, 15, 18, 24, 36, 54, 55, 80, 85, 120, 280

Cramer’s V coefficient, 82, 104

data set “Barents fish”, 139, 140, 175E “Barents fish trends”, 251-254E, 256E,

258E

“bioenv”, 15-17, 190-193E, 207E, 209E, 213, 216, 221

“climate”, 151-152, 154-157E, 159, 160E

“fatty acid”, 180-182E, 184-187E, 205 defuzzification, 40

dendrogram, 89-97, 99, 102, 104, 109, 225, 226E, 265-275, 282, 283, 289

deviance, 234, 235, 241 null, 235

residual, 235, 241

direct gradient analysis, 192

dissimilarity, 61, 67-73, 80, 82, 83, 89-98, 112-116, 120E, 281-283

Bray-Curtis, 61-64E, 67, 73, 80, 120E, 141E, 166, 281, 284, 305

Sørensen, 62, 63, 281 distance, 61, 281

based on correlation, 79, 80 between categories, 80-82, 85 between count variables, 80

chi-square, 55-58, 63-68, 73, 81E, 112E, 165, 166, 170, 174, 176, 263, 281, 306

city-block, 61, 67, 68E, 71, 73

326

INDEX

definition, 61

Euclidean, 47-59, 61, 68E, 73, 75, 80, 97E, 103E, 109, 142, 153-158, 161, 206, 281, 306

for categorical data, 57-59 for mixed-scale data, 70-72 Manhattan, 67

standardized Euclidean, 51, 53, 54, 59, 97E, 103E, 153E, 154E, 281

taxicab, 67

weighted Euclidean, 53-55, 58, 59, 161, 281

distribution, bivariate, 15, 17, 24, 216 normal, 35, 36, 230E, 235, 288, 304 Poisson, 42, 232, 233, 290 univariate, 17, 24

dummy variable, 38-39, 42, 58, 71, 191-195, 210, 280, 282

eigenvalue, 111-113, 120E, 123, 158-161, 169, 170E, 206-208, 224-227, 284

Euclidean distance, 47-59, 61, 68E, 73, 75, 80, 97E, 103E, 109, 142, 153-158, 161, 206, 281, 306

embeddable, 284 exploratory data analysis, 35, 36

FD, see functional diversity

F-statistic, 21, 23E, 101

F-test, 101

functional dispersion, 269

diversity, 261, 263, 265, 267-272, 275 group-based, 268E, 270E-275 tree-based, 269-276

trait, 261-267, 275

fuzzy coding, 39, 40, 135, 136E, 144-149, 195-197, 236, 237, 251-253, 271, 282

of interaction, 145-148

of spatial coordinates, 147E, 148E

GAM, see generalized additive model general linear model, 28, 30

generalized additive model, 229, 237-240E, 246, 271, 274E, 276, 290-291

linear model, 30-32, 133, 229, 232, 233, 245, 290

GLM, see generalized linear model Gower’s dissimilarity, 282

gradient, 30, 129, 130E, 132-138, 143E-146, 171, 176

analysis, direct, 192 indirect, 190-192, 195

heteroscedastic, 231

hierarchical clustering, 89-99, 102, 263, 265E, 275, 305

hinge points, 39E, 40

histogram, 15, 17E, 18E, 24, 53, 83, 215E, 267, 268E, 304

homoscedastic, 231

indirect gradient analysis, 190, 192, 195 inertia, 82, 169, 174, 175, 183, 190, 192-

195, 199, 203-211, 221, 223, 225, 227, 234, 254E-260, 264E, 266E, 286, 288-290 partialling out, see partialling out variance partitioning of, 258-260

inference, 36, 94, 213, 216, 226 interval variable, 34-35, 128, 281 isoline, 129

Jaccard index, 61, 69, 70, 73, 89, 90E-96E, 225

k-means, 99, 104-106, 283 Kruskal-Wallis rank test, 217, 305

link function, 235, 245, 290, 305 loading, 208

logarithmic transformation, 36-38, 42, 195 logistic regression, 133-134, 138, 233-235,

243-245, 290, 305

327

MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

logit, 133, 234E, 245, 290 log-odds, see logit

log-ratio analysis, 139, 177, 180-185, 188, 203, 205, 211, 284

interpretation of, 181, 182 relationship to correspondence analy-

sis, 182, 183 transformation, 36, 179, 180, 188

LRA, see log-ratio analysis

Manhattan distance, 67

map, 127, 139E-144, 148, 153-156, 168E, 169, 175E, 176, 184E, 245E, 306

matching coefficient, 58, 68, 70, 73, 89, 281, 282

MDS, see multidimensional scaling mean-square contingency coefficient, 82 mixed-scale variables, 26

distance, 70, 72 monotonic regression, 116, 118

multidimensional scaling, 109-122, 148, 161, 283, 305

classical, 110-113, 117E-121E, 123, 153, 158, 165, 283, 284

nonmetric, 114-123, 141E, 151, 166, 284

weighted, 166, 167, 305

multiple correspondence analysis, 58, 305 multivariate test of group difference, 219-

221, 288

nominal variable, 18, 24, 34, 279 nonhierarchical clustering, 99, 104-106,

282, 283, 305

normal distribution, 35, 36, 230E, 235, 288, 304

null distribution, 226, 227, 288, 289

odds ratio, 205, 211

ordinal variable, 18, 24, 34E, 254, 279, 281 ordination, 29-31, 146E, 189, 192

constrained, see ordination restricted

restricted, 192, 194 overdispersion, 55

partial least squares, 31, 32 partialling out variance, 259

PCA, see principal component analysis permutation distribution, 83, 84E, 214-216,

219-227, 269E

test, 18, 19, 83-85, 101, 104, 213, 214, 216-219, 223-226, 255, 258, 260, 263, 288-290

for clustering, 225, 263 permutation testing, 18, 19, 35, 83, 217,

226, 260, 288-290, 304 point biserial, 19

Poisson distribution, 42, 232, 233, 290 regression, 31, 133, 134E, 138, 143, 232,

233, 245, 290, 305

power transformation, 37E, 38, 42, 55, 183, 280

principal axes, 159, 161, 206, 211, 284, 305 component analysis, 151, 154-156, 159E161, 165-170, 176, 180, 181, 194, 203-

211, 224E, 225, 284-289

coordinate, 156, 159E, 161, 174-176, 191E, 193, 264E, 283, 284

analysis, 283

inertia, 169, 174, 175, 206-209 variance, 206

Procrustes analysis, 142 correlation, 142, 173E, 174, 183

profile, 55, 56E, 59, 85, 166, 168-171, 174, 176, 177, 191, 287

average, 55, 170 unit, 168

Pythagoras’ theorem, 47-50, 59

redundancy analysis, 31, 194, 306 regression, 20, 22E, 27-31, 38, 113, 127-

131E, 133-139, 143E, 154E-156, 161, 166, 192, 194-196E, 230-246, 253, 257, 287, 305

algebra of, 127, 128

328

INDEX

biplot, 127, 132, 138, 139, 143, 148, 149, 156, 170, 190, 193

scaling of, 156, 158

coefficient, 128, 129, 132, 133, 138, 143, 151, 153, 156, 157, 174, 190-193, 273, 287

geometry of, 128-131

logistic, 31, 133, 134E, 138, 233-235, 243, 244E, 290, 305

monotonic, 116, 118

multiple linear, 30, 31, 127, 128, 132, 138, 230, 245, 276, 286

Poisson, 31, 133, 143E, 166, 232, 233,

245, 290, 305

tree, 28, 229, 243-246, 290, 291, 306

sampling bias, 249-253

scalar product, 75, 77, 78, 127, 130, 283 scaling,29,30,109,113,123,156,170,171,176 scatterplot, 18-22E, 24, 65, 80, 81E, 127,

138, 142E, 182, 183E, 187E, 224E, 236E, 268E, 270E, 271E, 304

scree plot, 159-161, 170E, 258E, 289 Shannon-Weaver diversity, 238, 239E, 268E,

275 index, 267

Shapiro-Wilks test, 214, 304 shifted log transformation, 55 singular value, 285, 286

vector, 285

singular value decomposition, 155, 284-286 Spearman rank correlation, 79, 142E, 217,

271E, 284, 304 species pool, 267-269E

richness, 269, 270E

standard coordinate, 156, 161, 169, 171, 174-176, 208, 287

standardization, 33, 36, 39-43, 51-54, 59, 73, 219, 279-282

standardized regression coefficient, 128, 133, 156-161, 174

stress, 118, 120, 123, 284

structural equation modelling, 31, 32 subcomposition, 177-180, 188

subcompositional coherence, 177, 179, 180, 183, 186, 188

sum of squares, between-group, 100, 105, 106, 288

total, 101, 102E within-group, 100, 106

supervised learning, 29

supplementary point, 133, 134E, 136E, 191, 287

variable, 286-288 support, 135-139, 190

SVD, see singular value decomposition

taxicab distance, 67 terminal nodes, 243, 244

transformation, Box-Cox, 37E, 38, 42, 183, 280

logarithmic, 36-38, 42, 195, 263, 280 log-ratio, 36, 179, 185

power, 37E, 38, 55, 280 shifted log, 55

triangle inequality, 62E-64, 67, 73, 118, 123, 281

violation of, 63 triplot, 138, 192-199, 213-216 TSS, see sum of squares, total t -test, 18, 84, 213, 214

unit variable, 77-79, 85 unsupervised learning, 29

variable, categorical, 16-19, 24, 28-31, 38, 39, 42, 57, 68, 72, 80-82, 85, 133, 282, 304

compositional, 34E, 35, 177-182, 188, 280

continuous, 17-19E, 24, 27-31, 34, 39E43, 51, 52E, 55, 70-72, 80E, 99, 143149, 204, 262, 282, 304-306

count, 18, 24, 36, 54, 55, 80, 85, 120, 280 dummy, 38-39, 42, 58, 71, 191-195, 210,

280, 282

329