1greenacre_m_primicerio_r_multivariate_analysis_of_ecological
.pdfMULTIVARIATE ANALYSIS OF ECOLOGICAL DATA |
|
the logits of the probabilities are modelled as linear functions of |
|
the predictor ...................................................................................... |
234 |
Exhibit 18.5: A scatterplot of a continuous response y and a predictor x, showing |
|
a scatterplot smoother (brown line) which suggests a nonlinear |
|
relationship. The estimated quadratic relationship is shown by a |
|
dashed curve ....................................................................................... |
236 |
Exhibit 18.6: Same data as in Exhibit 18.5, with the estimated quadratic relation- |
|
ship in gray, and the relationship according to (18.5) shown by |
|
black dashed lines .............................................................................. |
237 |
Exhibit 18.7: Linear regression relationships (18.6) between diversity and tem- |
|
perature that depend on the depth (illustrated for three values of |
|
depth), showing an interaction effect (regression lines with differ- |
|
ent slopes). If the interaction effect were absent, the three lines |
|
would be parallel and the effect of temperature would be the same |
|
(i.e., not contingent on depth) ......................................................... |
239 |
Exhibit 18.8: Generalized additive modelling of diversity as smooth functions of |
|
depth and temperature: depth is diagnosed as having a significant |
|
quadratic relationship, while the slightly increasing linear relation- |
|
ship with temperature is non-significant. Both plots are centred verti- |
|
cally at mean diversity, so show estimated deviations from the mean. |
|
Confidence regions for the estimated relationships are also shown |
240 |
Exhibit 18.9: Contour plot (upper) and perspective plot (down) of the diag- |
|
nosed interaction regression surface of depth and temperature, |
|
predicting the deviations from mean diversity. The concave rela- |
|
tionship with depth is clearly seen as well as the slight relationship |
|
with depth. The difference between the model with or without |
|
interactions is, however, not significant ............................................ |
242 |
Exhibit 18.10: Classification tree model for predicting the presence of polar cod. |
|
The one branch which predicts their presence gives the rule: tem- |
|
perature < 1.6 ºC and depth 306 m. This rule correctly predicts |
|
the presence of polar cod in 16 samples but misclassifies 5 samples |
|
as having polar cod when they do not .............................................. |
243 |
Exhibit 18.11: Comparison of misclassification rates for the classification tree of |
|
Exhibit 18.10, compared to that for logistic regression, using the |
|
same predictors. The classification tree correctly predicts presence |
|
and absence in 79 of the 89 samples, while logistic regression cor- |
|
rectly predicts 74 ................................................................................ |
244 |
Exhibit 18.12: Regression tree predicting fish diversity from latitude and longi- |
|
tude of sample positions. The terminal nodes give the average di- |
|
320
LIST OF EXHIBITS |
|
versity of the samples that fall into them. This tree yields the spatial |
|
classification of the sampling region given in Exhibit 18.13 |
........... 244 |
Exhibit 18.13: Map of Barents Sea showing the locations of the 89 sampling sites |
|
(see Exhibit 11.1) and the slicing up of the region according to the |
|
regression tree of Exhibit 18.12, and the average fish diversities in |
|
each block. Most of the slices divide the latitudes north to south, |
|
with just two east-west divisions of the longitudes. Dark locations |
|
show the 21 sites where polar cod were found ............................... |
245 |
Exhibit 19.1: (a) Actual number of sample taken in three regions over a three- |
|
year period, with overall proportions of samples in each region over |
|
the whole period. (b) Expected number of samples if in each year |
|
sampling had taken place in accordance with the overall propor- |
|
tions. (c) The weights computed by dividing the values in table (b) |
|
by those in table (a) ........................................................................... |
250 |
Exhibit 19.2: Sums of fuzzy-coded regional categories for each year and for |
|
all years. Columns are the eight compass points and a central |
|
region (C) .......................................................................................... |
252 |
Exhibit 19.3: Weights for data according to year and fuzzy region ....................... |
252 |
Exhibit 19.4: Correspondence analysis contribution biplot of the “Barents fish |
|
trend” data set. The upper plot shows the active data, the samples |
|
and species (high-contributing species are shown with bigger |
|
labels).The lower plot shows the centroids of all the categories, |
|
linking together categories of ordinal variables. 32.6% of the total |
|
inertia of 4.017 is explained by these two first dimensions .............. |
254 |
Exhibit 19.5: Canonical correspondence analysis of the “Barents fish trends” |
|
data. The format is the same as Exhibit 19.4, with the samples and |
|
species plotted in the upper biplot and an enlarged version of the |
|
category centroids in the lower plot. 58.5% of the restricted inertia |
|
is explained by these two dimensions ............................................... |
256 |
Exhibit 19.6: Temporal trajectories in regional categories north, east, south and |
|
west. Time and regional centroids are at (weighted) averages of the |
|
corresponding category points: for example, S is at the average of |
|
the six points making up the trajectory for south, while 2004 is at |
|
the average of all the 2004 points (for all nine regions, only four |
|
shown here) ................................................................................... |
257 |
Exhibit 19.7: In descending order, the proportion of inertia explained, R2, and |
|
adjusted R2, of the five categorical environmental variables; k is the |
|
number of categories ......................................................................... |
257 |
321
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA |
|
Exhibit 19.8: Scree plot of the inertias of successive dimensions in the con- |
|
strained space of the CCA of the “Barents fish trends” data. The first |
|
three dimensions clearly stand out from the rest ............................. |
258 |
Exhibit 19.9: Partitioning the total inertia in the abundance data into parts due |
|
to the spatial variables and other variables separately, and their part |
|
in common ......................................................................................... |
259 |
Exhibit 20.1: Part of the trait matrix coding the various functional character-istics |
|
of Barents Sea fish species ................................................................. |
262 |
Exhibit 20.2: CA of the trait matrix, part of which is shown in Exhibit 20.1. Traits |
|
are shown in principal coordinates in (a) and the fish species in |
|
principal coordinates in (b). 27.4% of the inertia is displayed ......... |
264 |
Exhibit 20.3: Hierarchical clustering of fish based on distances between fish, |
|
showing boxes indicating eight clusters ............................................ |
265 |
Exhibit 20.4: CA of the trait matrix aggregated according to the fish groups (G1 |
|
to G8) that were defined in Exhibit 20.3. The solution optimizes |
|
the group differences, although the basic configuration is similar |
|
to that of Exhibit 20.2 which optimized the fish differences. The |
|
functional traits are displayed in contribution coordinates in (a). |
|
52.4% of the inertia between fish groups is displayed ..................... |
266 |
Exhibit 20.5: (a) Histogram of the group-based FDs defined as Shannon-Weaver |
|
diversities on the aggregated abundances in 600 samples for eight |
|
functional groups; (b) Histogram of the tree-based FDs using pres- |
|
ences only and summing the branches in the dendrogram for the |
|
subset of observed species, normalized with respect to the FD of the |
|
species pool; (c) Scatterplot of the two functional diversity indices |
|
(Spearman rho correlation = 0.300) ................................................. |
268 |
Exhibit 20.6: Permutation distribution of the species pool FD, under the null hy- |
|
pothesis of no relationship between the traits. The observed value |
|
of 20.31 is the smallest and the associated p -value, based on 1,000 |
|
permutations, is thus p = 0.001 .......................................................... |
269 |
Exhibit 20.7: Scatterplots of the two FD measures versus species richness (SR, |
|
the number of species in sample), showing the modelled quadratic |
|
relationships. The horizontal axis is marked with the value of SR, |
|
and below the number of sites with the corresponding value) ....... |
270 |
Exhibit 20.8: Scatterplots of the variables depth, slope, temperature, longitude |
|
and latitude with one another as well as with the two measures of |
|
functional diversity, based on the functional groups (FDgroup) |
|
and on the dendrogram (FDtree). Spearman rank correlations are |
|
322
LIST OF EXHIBITS |
|
shown in the upper triangle, with font size proportional to their |
|
absolute values .................................................................................... |
271 |
Exhibit 20.9: Contour plots of the spatial component of functional diversity ac- |
|
cording to the two definitions (first row is the tree-based FD, second |
|
row is group-based FD) using two modelling methods (in columns, |
|
first column is using fuzzy spatial categories, second is using GAM |
|
modelling). The northern border of Norway with Russia and the |
|
southern tip of Svalbard situate the region of interest .................... |
272 |
Exhibit 20.10: Plot of regression coefficients for each year showing average esti- |
|
mated year effects for the residuals of (tree-based) functional diver- |
|
sities from the spatial model, with p -values for testing differences |
|
compared to the zero mean of the residuals (dashed line) ............ |
273 |
Exhibit 20.11: GAM of tree-based FD as a smooth function of depth (p = 0.0001) |
|
and temperature (p 0.0001). To model these effects parametrically |
|
depth would be modelled as a quadratic and temperature linear .... |
274 |
323
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA
324
INDEX
Akaike information criterion, 235, 239, 241, 272E, 273
AIC, see Akaike information criterion analysis of covariance, 30
of variance, 21, 23E, 28, 30, 99-101, 221, 283, 305
ANCOVA, see analysis of covariance ANOVA, see analysis of variance association, test between two groups of vari-
ables, 221, 223, 227
test between two variables, 82, 216, 217 average profile, 55, 56E
barycentre, 169
barycentric relationship, 169
biplot, 10, 146, 155E, 156, 159, 161, 165, 166, 175, 176, 181-183E, 184, 186
axis, 129, 130, 132, 143, 176, 181, 184, 188, 192
column-principal, 156, 157E, 161, 174 contribution, 170-174, 177, 186, 187, 190,
208, 209E, 211, 254E log-ratio, 181, 182, 188 multidimensional scaling, 139-149
regression, 127, 132-134E, 135, 137E-139, 143, 190, 193
row-principal, 155E, 156, 161, 168E, 170, 171E, 174, 182E, 184E, 186E
scaling, 171E
support of, 135, 138, 190 Bonferroni correction, 20 bootstrap distribution, 215E-217E
bootstrapping, 215, 217, 226, 280, 288, 289 Box-and-whisker plot, 15, 19-20E, 23E, 24
100, 304
Box-Cox transformation, 37E, 38, 42, 183, 280
boxplot, 21, 304
Bray-Curtis dissimilarity, 61-64E, 67, 73, 80, 118, 120E, 141E, 166, 281, 284, 305 between count variables, 81E
compared to chi-square distance, 6466, 141
BSS, see sum of squares, between-group
CA, see correspondence analysis
CCA, see canonical correspondence analysis canonical correlation analysis, 135-138, 190 correspondence analysis, 31, 32, 189-199, 203, 209E-211, 221, 225, 256E, 258E,
259, 288
as analysis of weighted averages, 194-195
as discriminant analysis, 197-199 of reweighted data, 253-255 partial, 194, 199, 258, 288
categorical variable, 16-19, 24, 28-31, 38, 39, 42, 57, 68, 72, 80-82, 85, 133, 282, 304 chi-square distance, 55-58, 63-68, 73, 81E,
112E, 165, 166, 170, 174, 176, 263, 281, 306
between count variables, 80 compared to Bray-Curtis dissimilarity,
64-66, 141
Note: E after the page number indicates a reference to an Exhibit.
325
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA
on raw counts, 55, 63-66 city-block distance, 61, 67, 68E, 71, 73 classification, 28, 29, 34E, 244E
tree, 241, 243, 244 closure, 35, 188
clustering, 29, 89, 94, 95, 101, 104, 105, 107, 114-116E, 225, 227, 275, 283, 289 average linkage, 93, 94, 97, 104 comparing two solutions, 104 complete linkage, 94, 96E, 97, 104, 263,
282
hierarchical, 89-93E, 95-99, 102, 105, 107, 225, 263-265E, 275, 282, 283, 305
k-means, 99, 104-107, 283 nonhierarchical, 99, 104-106, 282, 283,
305
of correlations, 95 single linkage, 94, 97
Ward, 99, 102-107, 263, 283, 305 weighting the objects, 106
complete linkage, 94, 96E, 97, 104, 262, 282 composition, 177-180E, 280 compositional data, 26, 34E, 35, 139, 168,
181, 185, 188, 194, 280, 281 zeros in, 185-187
conditional distribution, 230-233E, 245, 290 mean, 230E-234, 269, 290
confirmatory data analysis, 33-36 continuous variable, 17-19E, 24, 27-31, 34,
39E-43, 51, 52E, 55, 70-72, 80E, 99, 143149, 204, 262, 282, 304-306
contour, 129-132, 138, 144, 145E, 241, 242E, 272E
nonlinear, 144, 145, 149 contribution, 206-208
biplot, 170-174, 186-188, 190, 208, 209211, 254E
coordinates, 171, 172E, 174, 176, 191E, 193E, 208, 211, 266E, 286, 305
in constrained analysis, 209-211 coordinate, contribution, 171, 172E, 174,
176, 191E, 193E, 208, 211, 266E, 286, 305
principal, 156, 159E, 161, 174-176, 191E, 264E, 283-286
standard, 156, 161, 171, 174-176, 208, 286, 287
correlation, 18-19
as a scalar product, 77-79 between points and axes, 208-209 circle of, 158, 159E
point biserial, 19
Spearman rank, 79-80, 142E, 217, 271E, 284, 304
correspondence analysis, 31, 165-176, 183, 184E, 185E, 187E, 190, 191E, 211, 286
of reweighted data, 253-255
count variable, 15, 18, 24, 36, 54, 55, 80, 85, 120, 280
Cramer’s V coefficient, 82, 104
data set “Barents fish”, 139, 140, 175E “Barents fish trends”, 251-254E, 256E,
258E
“bioenv”, 15-17, 190-193E, 207E, 209E, 213, 216, 221
“climate”, 151-152, 154-157E, 159, 160E
“fatty acid”, 180-182E, 184-187E, 205 defuzzification, 40
dendrogram, 89-97, 99, 102, 104, 109, 225, 226E, 265-275, 282, 283, 289
deviance, 234, 235, 241 null, 235
residual, 235, 241
direct gradient analysis, 192
dissimilarity, 61, 67-73, 80, 82, 83, 89-98, 112-116, 120E, 281-283
Bray-Curtis, 61-64E, 67, 73, 80, 120E, 141E, 166, 281, 284, 305
Sørensen, 62, 63, 281 distance, 61, 281
based on correlation, 79, 80 between categories, 80-82, 85 between count variables, 80
chi-square, 55-58, 63-68, 73, 81E, 112E, 165, 166, 170, 174, 176, 263, 281, 306
city-block, 61, 67, 68E, 71, 73
326
INDEX
definition, 61
Euclidean, 47-59, 61, 68E, 73, 75, 80, 97E, 103E, 109, 142, 153-158, 161, 206, 281, 306
for categorical data, 57-59 for mixed-scale data, 70-72 Manhattan, 67
standardized Euclidean, 51, 53, 54, 59, 97E, 103E, 153E, 154E, 281
taxicab, 67
weighted Euclidean, 53-55, 58, 59, 161, 281
distribution, bivariate, 15, 17, 24, 216 normal, 35, 36, 230E, 235, 288, 304 Poisson, 42, 232, 233, 290 univariate, 17, 24
dummy variable, 38-39, 42, 58, 71, 191-195, 210, 280, 282
eigenvalue, 111-113, 120E, 123, 158-161, 169, 170E, 206-208, 224-227, 284
Euclidean distance, 47-59, 61, 68E, 73, 75, 80, 97E, 103E, 109, 142, 153-158, 161, 206, 281, 306
embeddable, 284 exploratory data analysis, 35, 36
FD, see functional diversity
F-statistic, 21, 23E, 101
F-test, 101
functional dispersion, 269
diversity, 261, 263, 265, 267-272, 275 group-based, 268E, 270E-275 tree-based, 269-276
trait, 261-267, 275
fuzzy coding, 39, 40, 135, 136E, 144-149, 195-197, 236, 237, 251-253, 271, 282
of interaction, 145-148
of spatial coordinates, 147E, 148E
GAM, see generalized additive model general linear model, 28, 30
generalized additive model, 229, 237-240E, 246, 271, 274E, 276, 290-291
linear model, 30-32, 133, 229, 232, 233, 245, 290
GLM, see generalized linear model Gower’s dissimilarity, 282
gradient, 30, 129, 130E, 132-138, 143E-146, 171, 176
analysis, direct, 192 indirect, 190-192, 195
heteroscedastic, 231
hierarchical clustering, 89-99, 102, 263, 265E, 275, 305
hinge points, 39E, 40
histogram, 15, 17E, 18E, 24, 53, 83, 215E, 267, 268E, 304
homoscedastic, 231
indirect gradient analysis, 190, 192, 195 inertia, 82, 169, 174, 175, 183, 190, 192-
195, 199, 203-211, 221, 223, 225, 227, 234, 254E-260, 264E, 266E, 286, 288-290 partialling out, see partialling out variance partitioning of, 258-260
inference, 36, 94, 213, 216, 226 interval variable, 34-35, 128, 281 isoline, 129
Jaccard index, 61, 69, 70, 73, 89, 90E-96E, 225
k-means, 99, 104-106, 283 Kruskal-Wallis rank test, 217, 305
link function, 235, 245, 290, 305 loading, 208
logarithmic transformation, 36-38, 42, 195 logistic regression, 133-134, 138, 233-235,
243-245, 290, 305
327
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA
logit, 133, 234E, 245, 290 log-odds, see logit
log-ratio analysis, 139, 177, 180-185, 188, 203, 205, 211, 284
interpretation of, 181, 182 relationship to correspondence analy-
sis, 182, 183 transformation, 36, 179, 180, 188
LRA, see log-ratio analysis
Manhattan distance, 67
map, 127, 139E-144, 148, 153-156, 168E, 169, 175E, 176, 184E, 245E, 306
matching coefficient, 58, 68, 70, 73, 89, 281, 282
MDS, see multidimensional scaling mean-square contingency coefficient, 82 mixed-scale variables, 26
distance, 70, 72 monotonic regression, 116, 118
multidimensional scaling, 109-122, 148, 161, 283, 305
classical, 110-113, 117E-121E, 123, 153, 158, 165, 283, 284
nonmetric, 114-123, 141E, 151, 166, 284
weighted, 166, 167, 305
multiple correspondence analysis, 58, 305 multivariate test of group difference, 219-
221, 288
nominal variable, 18, 24, 34, 279 nonhierarchical clustering, 99, 104-106,
282, 283, 305
normal distribution, 35, 36, 230E, 235, 288, 304
null distribution, 226, 227, 288, 289
odds ratio, 205, 211
ordinal variable, 18, 24, 34E, 254, 279, 281 ordination, 29-31, 146E, 189, 192
constrained, see ordination restricted
restricted, 192, 194 overdispersion, 55
partial least squares, 31, 32 partialling out variance, 259
PCA, see principal component analysis permutation distribution, 83, 84E, 214-216,
219-227, 269E
test, 18, 19, 83-85, 101, 104, 213, 214, 216-219, 223-226, 255, 258, 260, 263, 288-290
for clustering, 225, 263 permutation testing, 18, 19, 35, 83, 217,
226, 260, 288-290, 304 point biserial, 19
Poisson distribution, 42, 232, 233, 290 regression, 31, 133, 134E, 138, 143, 232,
233, 245, 290, 305
power transformation, 37E, 38, 42, 55, 183, 280
principal axes, 159, 161, 206, 211, 284, 305 component analysis, 151, 154-156, 159E161, 165-170, 176, 180, 181, 194, 203-
211, 224E, 225, 284-289
coordinate, 156, 159E, 161, 174-176, 191E, 193, 264E, 283, 284
analysis, 283
inertia, 169, 174, 175, 206-209 variance, 206
Procrustes analysis, 142 correlation, 142, 173E, 174, 183
profile, 55, 56E, 59, 85, 166, 168-171, 174, 176, 177, 191, 287
average, 55, 170 unit, 168
Pythagoras’ theorem, 47-50, 59
redundancy analysis, 31, 194, 306 regression, 20, 22E, 27-31, 38, 113, 127-
131E, 133-139, 143E, 154E-156, 161, 166, 192, 194-196E, 230-246, 253, 257, 287, 305
algebra of, 127, 128
328
INDEX
biplot, 127, 132, 138, 139, 143, 148, 149, 156, 170, 190, 193
scaling of, 156, 158
coefficient, 128, 129, 132, 133, 138, 143, 151, 153, 156, 157, 174, 190-193, 273, 287
geometry of, 128-131
logistic, 31, 133, 134E, 138, 233-235, 243, 244E, 290, 305
monotonic, 116, 118
multiple linear, 30, 31, 127, 128, 132, 138, 230, 245, 276, 286
Poisson, 31, 133, 143E, 166, 232, 233,
245, 290, 305
tree, 28, 229, 243-246, 290, 291, 306
sampling bias, 249-253
scalar product, 75, 77, 78, 127, 130, 283 scaling,29,30,109,113,123,156,170,171,176 scatterplot, 18-22E, 24, 65, 80, 81E, 127,
138, 142E, 182, 183E, 187E, 224E, 236E, 268E, 270E, 271E, 304
scree plot, 159-161, 170E, 258E, 289 Shannon-Weaver diversity, 238, 239E, 268E,
275 index, 267
Shapiro-Wilks test, 214, 304 shifted log transformation, 55 singular value, 285, 286
vector, 285
singular value decomposition, 155, 284-286 Spearman rank correlation, 79, 142E, 217,
271E, 284, 304 species pool, 267-269E
richness, 269, 270E
standard coordinate, 156, 161, 169, 171, 174-176, 208, 287
standardization, 33, 36, 39-43, 51-54, 59, 73, 219, 279-282
standardized regression coefficient, 128, 133, 156-161, 174
stress, 118, 120, 123, 284
structural equation modelling, 31, 32 subcomposition, 177-180, 188
subcompositional coherence, 177, 179, 180, 183, 186, 188
sum of squares, between-group, 100, 105, 106, 288
total, 101, 102E within-group, 100, 106
supervised learning, 29
supplementary point, 133, 134E, 136E, 191, 287
variable, 286-288 support, 135-139, 190
SVD, see singular value decomposition
taxicab distance, 67 terminal nodes, 243, 244
transformation, Box-Cox, 37E, 38, 42, 183, 280
logarithmic, 36-38, 42, 195, 263, 280 log-ratio, 36, 179, 185
power, 37E, 38, 55, 280 shifted log, 55
triangle inequality, 62E-64, 67, 73, 118, 123, 281
violation of, 63 triplot, 138, 192-199, 213-216 TSS, see sum of squares, total t -test, 18, 84, 213, 214
unit variable, 77-79, 85 unsupervised learning, 29
variable, categorical, 16-19, 24, 28-31, 38, 39, 42, 57, 68, 72, 80-82, 85, 133, 282, 304
compositional, 34E, 35, 177-182, 188, 280
continuous, 17-19E, 24, 27-31, 34, 39E43, 51, 52E, 55, 70-72, 80E, 99, 143149, 204, 262, 282, 304-306
count, 18, 24, 36, 54, 55, 80, 85, 120, 280 dummy, 38-39, 42, 58, 71, 191-195, 210,
280, 282
329