
MULTIVARIATE ANALYSIS OF ECOLOGICAL DATA

Exhibit 9.8: The eigenvalues in the classical MDS of the Bray-Curtis dissimilarity indices of Exhibit 5.2, showing positive eigenvalues in green and negative ones in brown. [Figure: bar chart of the eigenvalues, vertical axis marked at 5,000, 10,000 and 15,000]

Adding count variables to MDS maps

Performing nonmetric MDS on the same data gives a stress value of 13.5%, which is not a big improvement on the 16.3%, suggesting that the two resulting maps will not be as different as we found for the smaller data set of Jaccard indices. This is indeed the case, as shown by the quite similar maps in Exhibit 9.9.

In our experience, when there is a large number of samples (and by “large” we mean, as most statisticians do, 30 or more, as in this example), the metric and nonmetric approaches generally agree in their solutions. Where they disagree is in how they quantify the success of their results: the stress measure always gives a more optimistic value, because it measures not the recovery of the proximities themselves but only the recovery of their ordering in the map.
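To make this distinction concrete, here is a minimal NumPy sketch of a stress calculation. It assumes Kruskal's stress-1 as the form of stress; other variants exist, and this is not necessarily what the software behind the exhibits uses. The mapped distances are compared not with the dissimilarities themselves but with "disparities" from a monotone (isotonic) regression of the distances on the dissimilarity ordering, computed by the pool-adjacent-violators algorithm. Ties in the dissimilarities are simply broken by position, which is a simplification. The function names `pava` and `kruskal_stress1` are our own.

```python
import numpy as np


def pava(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to a sequence."""
    blocks = []  # each block is [sum of its values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge backwards while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    return np.concatenate([np.full(count, total / count) for total, count in blocks])


def kruskal_stress1(dissimilarities, distances):
    """Stress-1 of mapped distances against the rank order of the dissimilarities.

    Both arguments are 1-D sequences listing the same pairs of samples in the
    same order (e.g. the upper triangles of the two square matrices).
    """
    dissimilarities = np.asarray(dissimilarities, dtype=float)
    distances = np.asarray(distances, dtype=float)
    order = np.argsort(dissimilarities, kind="stable")  # ties broken by position (simplification)
    disparities = np.empty_like(distances)
    disparities[order] = pava(distances[order])  # best monotone fit to the ordering
    return np.sqrt(np.sum((distances - disparities) ** 2) / np.sum(distances ** 2))


# Mapped distances that preserve the ordering perfectly but not the values
delta = np.array([10.0, 20.0, 30.0, 40.0])
d_map = np.array([1.0, 2.0, 8.0, 9.0])
print(kruskal_stress1(delta, d_map))  # 0.0: the ordering is fully recovered
print(np.sum((delta - d_map) ** 2))   # 1850.0: the values themselves are far from matched
```

The zero stress next to a large least-squares discrepancy is exactly why stress reads as more optimistic than a metric measure of fit.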

The maps in Exhibit 9.9 emanate originally from abundance data on five species, so the question now is how to include these species on the map. We shall consider alternative ways of doing this in future chapters, but for the moment let us use the same approach as in Exhibit 9.7, where the species were positioned at the averages of the samples that contained them. The difference here is that we have abundance counts for the species across the samples, so we can position each species at its weighted average across the samples. For example, species a has abundances of 0, 26, 0, 0, 13, etc., and a total abundance of 0 + 26 + 0 + 0 + 13 + ... = 404, so a is positioned at the weighted average of the 30 sample points, with weights 26/404 = 0.064 on sample s2, 13/404 = 0.032 on sample s5, and so on.

Exhibit 9.10 shows the species positions on the nonmetric MDS solution: for example, species a and b are relatively more abundant in the samples at lower left, while c is more associated with samples on the right. Similarly, even though the ordinal sediment types C (clay), S (sand) and G (gravel) have not been used in the mapping, they can be depicted at the averages of the subsets of samples corresponding to them. The samples thus appear to follow a trend from top right (more clay) to bottom left (more gravel).

Exhibit 9.9: Classical MDS map (upper) and nonmetric MDS map (lower) of the Bray-Curtis dissimilarities of Exhibit 5.2. [Figure: samples s1 to s30 plotted on Dimension 1 (horizontal) and Dimension 2 (vertical) in both maps]

Exhibit 9.10: Nonmetric MDS solution (lower map in Exhibit 9.9) with species a to e added by weighted averaging of sample points, and sediment types C, S and G by averaging. [Figure: samples s1 to s30 plotted on Dimension 1 and Dimension 2]

SUMMARY: Multidimensional scaling

1. Multidimensional scaling (MDS) is a method that attempts to make a spatial map of a matrix of proximities, either distances or dissimilarities defined between sample units, so that the interpoint distances in the map come as close as possible to the given proximities according to the chosen fit criterion.

2. The fit criterion in metric MDS involves approximating the actual proximity values by the mapped distances, for example by least squares.

3. Classical MDS is a particular form of metric MDS that relies on the eigenvalue-eigenvector decomposition of a square matrix. The eigenvalues give convenient measures of variance explained on each axis, and the dimensions of the solution are uncorrelated.

4. Nonmetric MDS has a more relaxed fit criterion: it strives to match only the ordering of the proximities to the ordering of the mapped distances.

5. The error in classical MDS is quantified by the percentage of unexplained variance, while in nonmetric MDS it is quantified by the stress.

6. The stress measure always gives a more optimistic result, because it relaxes the requirement of approximating the proximity values in the map and approximates only their rank ordering.

7. In most cases, however, when the proximity matrix is quite large, say for at least 30 sample units, the results of the two approaches will be essentially the same.

8. When the proximities are of a Euclidean type, the metric scaling approach is more useful because of its connection with methods such as principal component analysis (Chapter 12) and correspondence analysis (Chapter 13). There would be little advantage, for example, in applying nonmetric scaling to a matrix of chi-square distances.

9. When the proximities are non-Euclidean, the nonmetric approach sidesteps the problem of possible violations of the triangle inequality by concentrating on the ordering of the proximities rather than their actual values.
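Points 3 and 5 above, together with the weighted-averaging rule used to place species a to e in Exhibit 9.10, can be sketched in a few lines of NumPy. This is a minimal sketch under our own assumptions, not the software used for the exhibits: `bray_curtis`, `classical_mds` and `weighted_average_positions` are hypothetical helper names; the Bray-Curtis index is scaled 0 to 100 (an assumption); and percentages are taken relative to the sum of the positive eigenvalues, which is only one of the conventions in use when negative eigenvalues occur, as in Exhibit 9.8.

```python
import numpy as np


def bray_curtis(abund):
    """Bray-Curtis dissimilarities (scaled 0 to 100) between rows (samples) of an abundance table.

    Assumes no two samples are both entirely zero (that pair would divide by zero).
    """
    diff = np.abs(abund[:, None, :] - abund[None, :, :]).sum(axis=2)
    totals = abund.sum(axis=1)
    return 100.0 * diff / (totals[:, None] + totals[None, :])


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS of a symmetric n x n dissimilarity matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centred squared dissimilarities
    evals, evecs = np.linalg.eigh(B)              # symmetric eigendecomposition (ascending)
    idx = np.argsort(evals)[::-1]                 # reorder to descending eigenvalues
    evals, evecs = evals[idx], evecs[:, idx]
    # Only positive eigenvalues give real coordinates; negative ones signal a non-Euclidean D
    coords = evecs[:, :k] * np.sqrt(np.clip(evals[:k], 0.0, None))
    pct = 100.0 * evals[:k] / evals[evals > 0].sum()  # assumed convention: % of positive eigenvalues
    return coords, evals, pct


def weighted_average_positions(abund, coords):
    """Place each species (column) at the abundance-weighted average of the sample points.

    Assumes every species has a positive total abundance.
    """
    weights = abund / abund.sum(axis=0, keepdims=True)  # e.g. 26/404 for species a in sample s2
    return weights.T @ coords


# Hypothetical 4 x 3 abundance table (4 samples, 3 species), not the book's data
N = np.array([[10, 0, 3], [8, 2, 0], [0, 9, 4], [1, 7, 6]], dtype=float)
D = bray_curtis(N)
coords, evals, pct = classical_mds(D)
species_xy = weighted_average_positions(N, coords)
print(np.round(evals, 2))  # any negative values show that D is not Euclidean
```

The same `weighted_average_positions` call works unchanged on the coordinates of a nonmetric solution, which is how the species in Exhibit 9.10 are described as being added.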


REGRESSION AND PRINCIPAL COMPONENT ANALYSIS


CHAPTER 10

Regression Biplots

In the previous chapter, displays of samples were obtained in a scatterplot with spatial properties (hence often called a map), approximating given distance or dissimilarity matrices. Then some types of variables were added to the display, specifically zero/one categorical variables (e.g., presences of species, sediment categories) and count variables (e.g., species abundances). In this chapter we continue with this theme of adding variables to a plot of samples, including continuous variables in their original form or in fuzzy-coded form. When samples and variables are displayed jointly in such a scatterplot, it is often called a biplot. This designation implies that a certain property holds between the two sets of points in the display in terms of the scalar products between the samples and variables. In this chapter we consider the simplest form of biplot, the regression biplot, which will serve two purposes: first, to give a different geometric interpretation of multiple regression; and second, to give a basic understanding of all the joint displays of samples and variables that will appear in the rest of this book.

Contents
Algebra of multiple linear regression
Geometry of multiple linear regression
Regression biplot
Generalized linear model biplots with categorical variables
Fuzzy-coded species abundances
More than two predictors
SUMMARY: Regression biplots

Algebra of multiple linear regression

The multiple linear regression model postulates that the expected value of a response variable Y (i.e., the mean of Y) is a linear combination of several explanatory variables $x_1, x_2, \ldots, x_p$:

$$E(Y) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \qquad (10.1)$$
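Estimating α and β₁, …, β_p in (10.1) is an ordinary least-squares problem. A minimal NumPy sketch, assuming the explanatory variables are the columns of a matrix X with one row per sample; `fit_linear_regression` is a hypothetical helper name, and the small data set is made up, not taken from Exhibit 1.1.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Least-squares estimates of alpha and beta_1..beta_p in E(Y) = alpha + sum_j beta_j x_j."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # leading column of ones carries alpha
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Made-up data generated exactly from alpha = 2, beta = (3, -1)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]], dtype=float)
y = 2 + 3 * X[:, 0] - X[:, 1]
alpha, beta = fit_linear_regression(X, y)
print(round(alpha, 6), np.round(beta, 6))
```

With depth, pollution and temperature as the three columns of X and the counts of species d as y, the same call returns a constant and three slopes of the kind reported for the example that follows.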

For example, using the data of Exhibit 1.1, consider the regression of the species labelled d on depth, pollution and temperature. The model is estimated as:

$$E(d) = 6.271 + 0.148\,\mathrm{depth} - 1.388\,\mathrm{pollution} + 0.043\,\mathrm{temperature} \qquad (10.2)$$

Notice that, for the moment, we do not comment on whether this type of linear model of a count variable on three environmental variables would be sensible or not, because d is not an interval variable; we will return to this point later.

Since the coefficients in (10.2) depend on the units of the variables, we prefer to consider the regression using all variables in comparable units. Usually this is done by standardizing the variables, so that they are all in units of standard deviation. Denoting these standardized (i.e., centred and normalized) variables with an asterisk, the regression model becomes:

$$E(d^*) = 0.347\,\mathrm{depth}^* - 0.446\,\mathrm{pollution}^* + 0.002\,\mathrm{temperature}^* \qquad (10.3)$$

The constant term now vanishes and the coefficients, called standardized regression coefficients, can be compared with one another. Thus it seems that pollution has the strongest influence on the average level of species d, reducing it by 0.446 of a standard deviation for every increase of one standard deviation of pollution. The effect of temperature is minimal and, in fact, statistically nonsignificant (p = 0.99), while depth and pollution are both significant (p = 0.039 and p = 0.010, respectively). We therefore drop temperature and consider just the regression on the other two variables, which keeps the same coefficient values but gives slightly smaller p-values, p = 0.035 and p = 0.008, respectively:

$$E(d^*) = 0.347\,\mathrm{depth}^* - 0.446\,\mathrm{pollution}^* \qquad (10.4)$$

Geometry of multiple linear regression

Exhibit 10.1: Regression plane defined by Equation (10.4) for standardized response d* and standardized explanatory variables pollution* and depth*. The view is from above the plane. [Figure: three-dimensional plot with axes Depth*, Pollution* and d*, each running from −4 to 4]

When referring to the multiple regression model, it is often said that a hyperplane is being fitted to the data. For a single explanatory variable this reduces to a straight line, the familiar case of simple linear regression. When there are two explanatory variables, as in (10.4), the model is a two-dimensional plane in three dimensions, the third dimension being the response variable d*. A view of this plane in three dimensions is given in Exhibit 10.1, with standardized depth* and pollution* forming the two horizontal dimensions and d* the vertical one. Notice how the plane slopes downwards in the direction of pollution and upwards in the direction of depth, according to the regression coefficients (see the web site of the book, which shows a video of this three-dimensional image). Notice too the lack of fit of the points to the plane: the value of R² for the regression is 0.442, which means that 44.2% of the variance of d is being explained, and 55.8% of the variance is unexplained and considered residual, or error, variance.
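The standardized coefficients of (10.3) and (10.4) and the R² value can be obtained by regressing the standardized response on the standardized predictors without a constant. A minimal NumPy sketch; `standardize` and `standardized_regression` are hypothetical helper names, and the synthetic data below are random, only loosely playing the roles of depth and pollution, so the printed numbers will not match 0.347, −0.446 or 0.442.

```python
import numpy as np


def standardize(a):
    """Centre each column and divide by its standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)


def standardized_regression(X, y):
    """Standardized regression coefficients and R^2 of y on the columns of X."""
    Xs, ys = standardize(X), standardize(y)
    beta_std, *_ = np.linalg.lstsq(Xs, ys, rcond=None)  # no constant: everything is centred
    fitted = Xs @ beta_std
    r2 = 1.0 - np.sum((ys - fitted) ** 2) / np.sum(ys ** 2)  # share of variance explained
    return beta_std, r2


# Hypothetical data: 30 samples, two columns standing in for depth and pollution
rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(75, 15, 30), rng.normal(4, 2, 30)])
y = 6 + 0.15 * X[:, 0] - 1.4 * X[:, 1] + rng.normal(0, 4, 30)
beta_std, r2 = standardized_regression(X, y)
print(np.round(beta_std, 3), round(r2, 3))
```

Equivalently, each standardized coefficient equals the unstandardized slope multiplied by the ratio of the standard deviation of that explanatory variable to the standard deviation of the response, and R² is unchanged by standardization.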

The linearity of the plane means that points with the same predicted mean value lie on parallel straight lines in the plane. From a mountaineer’s point of view, if you are standing on the plane and want to stay at the same height, you need to walk in a straight line. Projecting these parallel straight lines onto the depth-pollution plane gives the contours, also called isolines, as shown in Exhibit 10.2. Finally, the vector in the depth-pollution plane with coordinates equal to the regression coefficients, [0.347 −0.446], called the gradient, indicates the direction of steepest ascent in the regression plane and is perpendicular to the contours. Given the geometry of the regression plane in Exhibit 10.2, it follows that we can do away with the d* dimension, just as cartographers do, and consider just the depth-pollution plane and the contours of the regression plane, which are perpendicular to the gradient vector. Exhibit 10.3 shows this “ground view” of the model.
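The contour geometry described above can be checked numerically. A minimal NumPy sketch, taking the standardized coefficients of (10.4) as the gradient g; the helper names `predict` and `contour_point` are our own. Moving perpendicular to g leaves the prediction unchanged, the slope in a unit direction u is g·u and is largest along g, and the contour at level c meets the line through the gradient at the point c·g/‖g‖².

```python
import numpy as np

# Gradient of the regression plane (10.4) in the (depth*, pollution*) plane
g = np.array([0.347, -0.446])


def predict(point):
    """Predicted E(d*) at a point (depth*, pollution*); no constant after standardization."""
    return g @ point


def contour_point(level):
    """Point where the contour predict(x) == level meets the line through the gradient."""
    return level * g / (g @ g)


contour_dir = np.array([-g[1], g[0]])  # gradient rotated by 90 degrees: (0.446, 0.347)
start = np.array([1.0, -0.5])
for t in (-2.0, 0.0, 3.0):
    print(round(predict(start + t * contour_dir), 6))  # 0.57 each time: we stay on one contour

# The slope in a unit direction u is g @ u; it is largest when u points along g
angles = np.linspace(0, 2 * np.pi, 3601)
units = np.column_stack([np.cos(angles), np.sin(angles)])
best = units[np.argmax(units @ g)]
print(np.round(best, 3), np.round(g / np.linalg.norm(g), 3))
```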

The short arrow labelled d is the gradient vector. The dashed line through this vector is called the biplot axis for the variable d. Contour lines are perpendicular to the biplot axis. Exhibit 10.3(a) corresponds to the darker “shadow” in Exhibit 10.2 in the depth-pollution plane, where the contours are in units of standard

29.10.2019587.33 Кб361istoriya_i_filosofiya_nauki-2.pdf