
126 Statistics for Environmental Science and Management, Second Edition

Although monitoring schemes sometimes require fairly complicated designs, as a general rule it is a good idea to keep designs as simple as possible so that they are easily understood by administrators and the public. Simple designs also make it easier to use the data for purposes that were not foreseen in the first place, which is something that will often occur. As noted by Overton and Stehman (1995, 1996), complex sample structures create potentially serious difficulties that do not exist with simple random sampling.

5.2  Purposely Chosen Monitoring Sites

For practical reasons, the sites for long-term monitoring programs are often not randomly chosen. For example, Cormack (1994) notes that the original nine sites for the United Kingdom ECN were chosen on the basis of having:

1. a good geographical distribution covering a wide range of environmental conditions and the principal natural and managed ecosystems;

2. some guarantee of long-term physical and financial security;

3. a known history of consistent management;

4. reliable and accessible records of past data, preferably for 10 or more years; and

5. sufficient size to allow the opportunity for further experiments and observations.

In this scheme it is assumed that the initial status of sites can be allowed for by only considering time changes. These changes can then be related to differences between the sites in terms of measured meteorological variables and known geographical differences.

5.3  Two Special Monitoring Designs

Skalski (1990) suggested a rotating panel design with augmentation for long-term monitoring. This takes the form shown in Table 5.1 if there are eight sites that are visited every year and four sets of 10 sites that are rotated. Site set 7, for example, consists of 10 sites that are visited in years 4 to 7 of the study. The number of sites in different sets is arbitrary. Preferably, the sites will be randomly chosen from an appropriate population of sites. This design has some appealing properties: The sites that are always measured can be used to detect long-term trends, but the rotation of blocks of 10 sites

Environmental Monitoring

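Table 5.1

Rotating Panel Design with Augmentation

| Site Set | Number of Sites | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | Year 6 | Year 7 | Year 8 | Year 9 | Year 10 | Year 11 | Year 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | x | x | x | x | x | x | x | x | x | x | x | x |
| 1 | 10 | x |  |  |  |  |  |  |  |  |  |  |  |
| 2 | 10 | x | x |  |  |  |  |  |  |  |  |  |  |
| 3 | 10 | x | x | x |  |  |  |  |  |  |  |  |  |
| 4 | 10 | x | x | x | x |  |  |  |  |  |  |  |  |
| 5 | 10 |  | x | x | x | x |  |  |  |  |  |  |  |
| 6 | 10 |  |  | x | x | x | x |  |  |  |  |  |  |
| 7 | 10 |  |  |  | x | x | x | x |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 14 | 10 |  |  |  |  |  |  |  |  |  |  | x | x |
| 15 | 10 |  |  |  |  |  |  |  |  |  |  |  | x |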

Note: Every year 48 sites are visited. Of these, 8 are always the same and the other 40 sites are in four blocks of size 10, such that each block of 10 remains in the sample for four years after the initial startup period.

Source: Skalski (1990).

ensures that the study is not too dependent on an initial choice of sites that may be unusual in some respects.
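The visiting pattern of Table 5.1 can be generated from a simple rule, which may help when adapting the design to other numbers of years or sites. The following is a minimal sketch, assuming (as the table suggests) that rotating set k is visited from year k - 3 to year k, truncated to the 12 years shown; the function and constant names are hypothetical and are not taken from Skalski (1990).

```python
N_YEARS = 12       # years shown in Table 5.1
PANEL_LIFE = 4     # years a rotating set stays in the sample
ALWAYS_SIZE = 8    # sites in set 0, visited every year
PANEL_SIZE = 10    # sites in each rotating set


def rotating_panel_schedule(n_years=N_YEARS, life=PANEL_LIFE):
    """Map each site set number to the list of years in which it is visited."""
    schedule = {0: list(range(1, n_years + 1))}
    # Assumed rule: set k is visited in years k - life + 1 to k,
    # truncated to the study period, so startup and final sets are shorter.
    for k in range(1, n_years + life):
        first = max(1, k - life + 1)
        last = min(n_years, k)
        schedule[k] = list(range(first, last + 1))
    return schedule


def sites_per_year(schedule, n_years=N_YEARS):
    """Count the number of sites visited in each year of the study."""
    totals = {}
    for year in range(1, n_years + 1):
        total = 0
        for site_set, years in schedule.items():
            if year in years:
                total += ALWAYS_SIZE if site_set == 0 else PANEL_SIZE
        totals[year] = total
    return totals


schedule = rotating_panel_schedule()
print(schedule[7])               # years in which site set 7 is visited
print(sites_per_year(schedule))  # number of sites visited in each year
```

Under this rule, four rotating sets are active in every year, which is what gives the constant workload described in the note to Table 5.1.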

The serially alternating design with augmentation that is used for EMAP is of the form shown in Table 5.2. It differs from the previous monitoring design in that sites are not rotated out of the study. Rather, there are 8 sites that are measured every year and another 160 sites in blocks of 40, where each block of 40 is measured every four years. The number of sites in each set can be chosen freely in a design of this form. Sites should be randomly selected from an appropriate population.

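Table 5.2

Serially Alternating Design with Augmentation

| Site Set | Number of Sites | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | Year 6 | Year 7 | Year 8 | Year 9 | Year 10 | Year 11 | Year 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8 | x | x | x | x | x | x | x | x | x | x | x | x |
| 1 | 40 | x |  |  |  | x |  |  |  | x |  |  |  |
| 2 | 40 |  | x |  |  |  | x |  |  |  | x |  |  |
| 3 | 40 |  |  | x |  |  |  | x |  |  |  | x |  |
| 4 | 40 |  |  |  | x |  |  |  | x |  |  |  | x |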

Note: Every year, 48 sites are measured. Of these, eight sites are always the same, and the other 40 sites are measured every four years.
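The serially alternating pattern of Table 5.2 is also easy to generate, which makes it simple to check the yearly workload and the total number of distinct sites. This is a minimal sketch with hypothetical names, assuming that block b is measured in years b, b + 4, b + 8, and so on.

```python
def serially_alternating_schedule(n_years=12, n_blocks=4):
    """Map each site set number to the years in which it is measured."""
    schedule = {0: list(range(1, n_years + 1))}   # set 0 is measured every year
    for block in range(1, n_blocks + 1):
        # Block b returns every n_blocks years, starting in year b.
        schedule[block] = list(range(block, n_years + 1, n_blocks))
    return schedule


SET_SIZES = {0: 8, 1: 40, 2: 40, 3: 40, 4: 40}   # sizes from Table 5.2

alternating = serially_alternating_schedule()
measured_per_year = {
    year: sum(SET_SIZES[s] for s, years in alternating.items() if year in years)
    for year in range(1, 13)
}
total_sites = sum(SET_SIZES.values())

print(alternating[2])       # years in which site set 2 is measured
print(measured_per_year)    # sites measured in each year
print(total_sites)          # distinct sites in the whole design
```

The yearly workload is the same as for the rotating panel design, but here no site ever leaves the study, so the total number of distinct sites is fixed from the start.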


Urquhart et al. (1993) compared the efficiency of the designs in Tables 5.1 and 5.2 when there are a total of 48 sites, of which the number visited every year (i.e., in set 0) ranged from 0 to 48. To do this, they assumed the model

$$Y_{ijk} = S_{i(j)k} + T_j + e_{ijk}$$

where $Y_{ijk}$ is a measure of the condition at site i, in year j, within site set k; $S_{i(j)k}$ is an effect specific to site i, in site set k, in year j; $T_j$ is a year effect common to all sites; and $e_{ijk}$ is a random disturbance. They also allowed for autocorrelation between the overall year effects and between the repeated measurements at one site. They found the design of Table 5.2 to always be better for estimating the current mean and the slope in a trend because more sites are measured in the first few years of the study. However, in a later study that compared the two designs in terms of variance and cost, Lesser and Kalsbeek (1997) concluded that the first design tends to be better for detecting short-term change, while the second design tends to be better for detecting long-term change. See also Urquhart and Kincaid (1999).

The EMAP sample design is based on approximately 12,600 points on a grid, each of which is the center of a hexagon with area 40 km². The grid is itself within a large hexagonal region covering much of North America, as shown in Figure 5.1. The area covered by the 40-km² hexagons centered on the grid points is 1/16 of the total area of the conterminous United States, with the area used being chosen after a random shift in the grid. Another aspect of the design is that the four sets of sites that are measured on different years are spatially interpenetrating, as indicated in Figure 5.2. This allows the estimation of parameters for the whole area every year.

Figure 5.1

The EMAP baseline grid for North America. The shaded area shown is covered by about 12,600 small hexagons, with a spacing of 27 km between their centers.
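The quoted sampling fraction of 1/16 can be checked from the grid spacing and the hexagon area. The sketch below assumes that the grid points form a regular triangular lattice, so that each point is associated with a cell of area (√3/2) × spacing²; that geometric assumption is ours and is not stated explicitly in the text.

```python
import math

spacing_km = 27.0          # distance between neighboring grid points (Figure 5.1)
hexagon_area_km2 = 40.0    # area of the hexagon centered on each grid point

# Assumption: on a regular triangular lattice, each grid point is associated
# with a cell of area (sqrt(3) / 2) * spacing^2.
cell_area_km2 = math.sqrt(3) / 2 * spacing_km ** 2

# Fraction of the total area that falls inside the sampled hexagons.
fraction = hexagon_area_km2 / cell_area_km2

print(round(cell_area_km2, 1))   # area associated with one grid point, in km^2
print(round(1 / fraction, 1))    # the sampled hexagons cover about 1 part in this many
```

With these figures the sampled hexagons cover close to one-sixteenth of the area, which agrees with the fraction given in the text.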

Figure 5.2

The use of spatially interpenetrating samples for visits at four-year intervals. (Four panels: Year 1, Year 2, Year 3, and Year 4.)

5.4  Designs Based on Optimization

One approach to the design of monitoring schemes is to choose the sites so that the amount of information is in some sense maximized. The main question then is how to measure the information that is to be maximized, particularly if the monitoring scheme has a number of different objectives, some of which will only become known in the future.

One possibility involves choosing a network design, or adding or subtracting stations to minimize entropy, where low entropy corresponds to high information (Caselton et al. 1992). The theory is complex and needs more prior information than will usually be available, particularly if there is no existing network to provide this.

Another possibility considers the choice of a network design to be a problem of the estimation of a regression function for which a classical theory of optimal design exists (Fedorov and Mueller 1989).

5.5  Monitoring Designs Typically Used

In practice, sample designs for monitoring often consist of selecting a certain number of sites, preferably (but not necessarily) at random from the potential sites in a region, and then measuring the variable of interest at those sites at a number of points in time. A complication is that, for one reason or another, some of the sites may not be measured at some of the times. A typical set of data will then look like the data in Table 5.3 for pH values measured on lakes in Norway. With this set of data, which is part of the more extensive data that are shown in Table 1.1 and discussed in Example 1.2, the main question of interest is whether there is any evidence for changes from year to year in the general level of pH and, in particular, whether the pH level was tending to increase or decrease.

Table 5.3

pH Values for Lakes in Southern Norway with the Latitudes and Longitudes for the Lakes

| Lake | Latitude | Longitude | pH 1976 | pH 1977 | pH 1978 | pH 1981 |
|---|---|---|---|---|---|---|
| 1 | 58.0 | 7.2 | 4.59 |  | 4.48 | 4.63 |
| 2 | 58.1 | 6.3 | 4.97 |  | 4.60 | 4.96 |
| 4 | 58.5 | 7.9 | 4.32 | 4.23 | 4.40 | 4.49 |
| 5 | 58.6 | 8.9 | 4.97 | 4.74 | 4.98 | 5.21 |
| 6 | 58.7 | 7.6 | 4.58 | 4.55 | 4.57 | 4.69 |
| 7 | 59.1 | 6.5 | 4.80 |  | 4.74 | 4.94 |
| 8 | 58.9 | 7.3 | 4.72 | 4.81 | 4.83 | 4.90 |
| 9 | 59.1 | 8.5 | 4.53 | 4.70 | 4.64 | 4.54 |
| 10 | 58.9 | 9.3 | 4.96 | 5.35 | 5.54 | 5.75 |
| 11 | 59.4 | 6.4 | 5.31 | 5.14 | 4.91 | 5.43 |
| 12 | 58.8 | 7.5 | 5.42 | 5.15 | 5.23 | 5.19 |
| 13 | 59.3 | 7.6 | 5.72 |  | 5.73 | 5.70 |
| 15 | 59.3 | 9.8 | 5.47 |  | 5.38 | 5.38 |
| 17 | 59.1 | 11.8 | 4.87 | 4.76 | 4.87 | 4.90 |
| 18 | 59.7 | 6.2 | 5.87 | 5.95 | 5.59 | 6.02 |
| 19 | 59.7 | 7.3 | 6.27 | 6.28 | 6.17 | 6.25 |
| 20 | 59.9 | 8.3 | 6.67 | 6.44 | 6.28 | 6.67 |
| 21 | 59.8 | 8.9 | 6.06 |  | 5.80 | 6.09 |
| 24 | 60.1 | 12.0 | 5.38 | 5.32 | 5.33 | 5.21 |
| 26 | 59.6 | 5.9 | 5.41 | 5.94 |  |  |
| 30 | 60.4 | 10.2 | 5.60 | 6.10 | 5.57 | 5.98 |
| 32 | 60.4 | 12.2 | 4.93 | 4.94 | 4.91 | 4.93 |
| 34 | 60.5 | 5.5 |  |  | 4.90 | 4.87 |
| 36 | 60.9 | 7.3 | 5.60 | 5.69 | 5.41 | 5.66 |
| 38 | 60.9 | 10.0 | 6.72 | 6.59 | 6.39 |  |
| 40 | 60.7 | 12.2 | 5.97 | 6.02 | 5.71 | 5.67 |
| 41 | 61.0 | 5.0 | 4.68 | 4.72 | 5.02 |  |
| 42 | 61.3 | 5.6 | 5.07 |  |  | 5.18 |
| 43 | 61.0 | 6.9 | 6.23 | 6.34 | 6.20 | 6.29 |
| 46 | 61.0 | 9.7 | 6.64 |  | 6.24 | 6.37 |
| 47 | 61.3 | 10.8 | 6.15 | 6.23 | 6.07 | 5.68 |
| 49 | 61.5 | 4.9 | 4.82 | 4.77 | 5.09 | 5.45 |
| 50 | 61.5 | 5.5 | 5.42 | 4.82 | 5.34 | 5.54 |
| 57 | 61.7 | 4.9 | 4.99 |  | 5.16 | 5.25 |
| 58 | 61.7 | 5.8 | 5.31 | 5.77 | 5.60 | 5.55 |
| 59 | 61.9 | 7.1 | 6.26 | 5.03 | 5.85 |  |
| 65 | 62.2 | 6.4 | 5.99 | 6.10 | 5.99 | 6.13 |
| 80 | 58.1 | 6.7 | 4.63 |  | 4.59 | 4.92 |
| 81 | 58.3 | 8.0 | 4.47 |  | 4.36 | 4.50 |
| 82 | 58.7 | 7.1 | 4.60 |  | 4.54 | 4.66 |
| 83 | 58.9 | 6.1 | 4.88 | 4.99 | 4.86 | 4.92 |
| 85 | 59.4 | 11.3 | 4.60 | 4.88 | 4.91 | 4.84 |
| 86 | 59.3 | 9.4 | 4.85 | 4.65 | 4.77 | 4.84 |
| 87 | 59.2 | 7.6 | 5.06 |  | 5.15 | 5.11 |
| 88 | 59.4 | 7.3 | 5.97 | 5.82 | 5.90 | 6.17 |
| 89 | 59.3 | 6.3 | 5.47 |  | 6.05 | 5.82 |
| 94 | 61.0 | 11.5 | 6.05 | 5.97 | 5.78 | 5.75 |
| 95 | 61.2 | 4.6 |  |  | 5.70 | 5.50 |
| Mean |  |  | 5.34 | 5.40 | 5.31 | 5.38 |
| SD |  |  | 0.65 | 0.66 | 0.57 | 0.56 |

5.6  Detection of Changes by Analysis of Variance

A relatively simple analysis for data like the Norwegian lake pH values shown in Table 5.3 involves carrying out a two-factor analysis of variance, as discussed in Section 3.5. The two factors are then the site and the time. The model for the observation at site i at time j is

$$y_{ij} = \mu + S_i + T_j + e_{ij} \tag{5.1}$$

where $\mu$ represents an overall general level for the variable being measured, $S_i$ represents the deviation of site i from the general level, $T_j$ represents a time effect, and $e_{ij}$ represents measurement errors and other random variation that is associated with the observation at the site at the particular time.

The model given in equation (5.1) does not include a term for the interaction between sites and times, as is included in the general two-factor analysis-of-variance model as defined in equation (3.31). This is because there is only, at most, one observation for a site in a particular year, which means that it is not possible to separate interactions from measurement errors. Consequently, it must be assumed that any interactions are negligible.
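To make the additive model concrete, the following sketch fits site and time effects by least squares when some site-by-time cells are missing. It is a minimal illustration and not the procedure used by any particular statistical package: it alternately updates the site and time terms (a backfitting scheme), which converges to the least-squares fit for a connected design. The function name and the small data set are hypothetical.

```python
def fit_additive(y, n_iter=200):
    """Least-squares fit of y[i][j] = mu + S[i] + T[j] + error.

    y is a list of rows (sites) by columns (times); None marks a missing cell.
    Every site and every time is assumed to have at least one observation.
    """
    n_sites, n_times = len(y), len(y[0])
    row = [0.0] * n_sites   # current estimate of mu + S[i], up to a constant
    col = [0.0] * n_times   # current estimate of T[j], up to a constant
    for _ in range(n_iter):
        for i in range(n_sites):
            vals = [y[i][j] - col[j] for j in range(n_times) if y[i][j] is not None]
            row[i] = sum(vals) / len(vals)
        for j in range(n_times):
            vals = [y[i][j] - row[i] for i in range(n_sites) if y[i][j] is not None]
            col[j] = sum(vals) / len(vals)
    # Impose the usual constraints: site effects and time effects each sum to zero.
    col_mean = sum(col) / n_times
    T = [c - col_mean for c in col]
    mu = sum(row) / n_sites + col_mean
    S = [r + col_mean - mu for r in row]
    resid = {(i, j): y[i][j] - (mu + S[i] + T[j])
             for i in range(n_sites) for j in range(n_times) if y[i][j] is not None}
    return mu, S, T, resid


# Hypothetical, exactly additive data for 3 sites and 4 times, with 2 cells missing.
true_mu, true_S, true_T = 5.0, [-0.4, 0.1, 0.3], [0.2, -0.1, 0.0, -0.1]
y = [[true_mu + s + t for t in true_T] for s in true_S]
y[0][1] = None
y[2][3] = None

mu, S, T, resid = fit_additive(y)
# Error degrees of freedom: observations minus fitted parameters.
error_df = len(resid) - (len(S) + len(T) - 1)
print(round(mu, 3), [round(s, 3) for s in S], [round(t, 3) for t in T], error_df)
```

Because there is at most one observation per cell, whatever is left in the residuals is the only available estimate of error, which is why any site-by-time interaction cannot be separated from it.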


Example 5.1:  Analysis of Variance on the pH Values

The results of an analysis of variance on the pH values for Norwegian lakes are summarized in Table 5.4. The results in this table were obtained from the MINITAB package (Minitab 2008) with an option that takes into account the missing values, although many other standard statistical packages could have been used just as well. The effects in the model were assumed to be fixed rather than random (as discussed in Section 3.5), although since interactions are assumed to be negligible, the same results would be obtained using random effects. It is found that there is a very significant difference between the lakes (p < 0.001) and a nearly significant difference between the years (p = 0.061). Therefore there is no very strong evidence from this analysis of differences between years.

To check the assumptions of the analysis, standardized residuals (the differences between the actual observations and those predicted by the model, divided by their standard deviations) can be plotted against the lake, the year, and against their position in space for each of the four years. These plots are shown in Figures 5.3 and 5.4. These residuals show no obvious patterns, so that the model seems satisfactory, except that there are one or two residuals that are rather large.

Table 5.4

Analysis-of-Variance Table for Data on pH Levels in Norwegian Lakes

| Source of Variation | Sum of Squares (a) | Degrees of Freedom | Mean Square | F | p-Value |
|---|---|---|---|---|---|
| Lake | 58.70 | 47 | 1.249 | 37.95 | 0.000 |
| Year | 0.25 | 3 | 0.083 | 2.53 | 0.061 |
| Error | 3.85 | 117 | 0.033 |  |  |
| Total | 62.80 | 167 |  |  |  |

(a) The sums of squares shown here depend on the order in which effects are added into the model, which is the lake and then the year.
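The entries of Table 5.4 are linked by simple arithmetic, and a short check shows how the mean squares, F ratios, and degrees of freedom fit together. The sketch below uses only the sums of squares and degrees of freedom from the table; the numbers of lakes and observations are inferred from those degrees of freedom (47 + 1 and 167 + 1), and the variable names are hypothetical.

```python
# (sum of squares, degrees of freedom) for each row of Table 5.4
rows = {"Lake": (58.70, 47), "Year": (0.25, 3), "Error": (3.85, 117)}

# Mean square = sum of squares / degrees of freedom.
mean_square = {name: ss / df for name, (ss, df) in rows.items()}

# Each F ratio compares an effect mean square with the error mean square.
f_ratio = {name: mean_square[name] / mean_square["Error"] for name in ("Lake", "Year")}

# Degrees of freedom: 48 lakes, 4 years, and 168 non-missing observations.
n_lakes, n_years, n_obs = 48, 4, 168
error_df = (n_obs - 1) - (n_lakes - 1) - (n_years - 1)

for name in ("Lake", "Year"):
    print(name, round(mean_square[name], 3), round(f_ratio[name], 2))
print("Error df:", error_df)
```

The p-values in the table would additionally require the F distribution with (47, 117) and (3, 117) degrees of freedom, which is not computed here.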

Figure 5.3

Standardized residuals from the analysis-of-variance model for pH in Norwegian lakes plotted against the lake number and the year number. (Two panels: standardized residuals, on a scale from –4 to 4, against lake and against year.)

Figure 5.4

Standardized residuals from the analysis-of-variance model for pH in Norwegian lakes plotted against the locations of the lakes. The standardized residuals are rounded to the nearest integer for clarity. (Four panels: 1976, 1977, 1978, and 1981, each showing latitude from 58 to 62 against longitude from 6 to 12.)

5.7  Detection of Changes Using Control Charts

Control charts are used to monitor industrial processes (Montgomery 2005), and they can be used equally well with environmental data. The simplest approach involves using an $\bar{x}$ chart to detect changes in a process mean, together with a range chart to detect changes in the amount of variation. These types of charts are often called Shewhart control charts after their originator (Shewhart 1931).

Typically, the starting point is a moderately large set of data consisting of M random samples of size n, where these are taken at equally spaced intervals of time from the output of the process. This set of data is then used to estimate the process mean and standard deviation, and hence to construct the two charts. The data are then plotted on the charts. It is usually assumed that the observations are normally distributed.

If the process seems to have a constant mean and standard deviation, then the sampling of the process is continued, with new points being plotted to monitor whatever is being measured. If the mean or standard deviation does not seem to have been constant for the time when the initial samples were taken, then, in the industrial process situation, action is taken to bring the process under control. With environmental monitoring, this may not be possible. However, the knowledge that the process being measured is not stable will be of interest anyway.


The method for constructing the $\bar{x}$ chart involves the following stages:

1. The sample mean and the sample range (the maximum value in a sample minus the minimum value in a sample) are calculated for each of the M samples. For the ith sample, let these values be denoted by $\bar{x}_i$ and $R_i$.

2. The mean of the variable being measured is assumed to be constant and is estimated by the overall mean of all the available observations, which is also just the mean of the sample means $\bar{x}_1$ to $\bar{x}_M$. Let the estimated mean be denoted by $\hat{\mu}$.

3. Similarly, the standard deviation is assumed to have remained constant, and this is estimated on the basis of a known relationship between the mean range for samples of size n and the standard deviation for samples from a normal distribution. This relationship is of the form $\sigma = k(n)\mu_R$, where $\mu_R$ is the mean range for samples of size n, and the constant k(n) is given in Table 5.5. Thus the estimated standard deviation is

$$\hat{\sigma} = k(n)\bar{R} \tag{5.2}$$

where $\bar{R}$ is the mean of the sample ranges.

Table 5.5

Control Chart Limits for Sample Ranges, Assuming Samples from Normal Distributions

| Sample Size | Lower Action Limit | Lower Warning Limit | Upper Warning Limit | Upper Action Limit | SD Factor (k) |
|---|---|---|---|---|---|
| 2 | 0.00 | 0.04 | 2.81 | 4.12 | 0.887 |
| 3 | 0.04 | 0.18 | 2.17 | 2.99 | 0.591 |
| 4 | 0.10 | 0.29 | 1.93 | 2.58 | 0.486 |
| 5 | 0.16 | 0.37 | 1.81 | 2.36 | 0.430 |
| 6 | 0.21 | 0.42 | 1.72 | 2.22 | 0.395 |
| 7 | 0.26 | 0.46 | 1.66 | 2.12 | 0.370 |
| 8 | 0.29 | 0.50 | 1.62 | 2.04 | 0.351 |
| 9 | 0.32 | 0.52 | 1.58 | 1.99 | 0.337 |
| 10 | 0.35 | 0.54 | 1.56 | 1.94 | 0.325 |

Source: Tables G1 and G2 of Davies and Goldsmith (1972).

Note: To find the limits on the range chart, multiply the mean range by the tabulated value. For example, for samples of size n = 5, the lower action limit is 0.16 μR, where μR is the mean range. With a stable distribution, a warning limit is crossed with probability 0.05 (5%) and an action limit with probability 0.002 (0.2%). The last column is the factor that the mean range must be multiplied by to obtain the standard deviation. For example, for samples of size 3 the standard deviation is 0.591 μR.


4. The standard error of the mean for samples of size n is estimated to be $\widehat{SE}(\bar{x}) = \hat{\sigma}/\sqrt{n}$.

5. Warning limits are set at the mean ±1.96 standard errors, i.e., at $\hat{\mu} \pm 1.96\,\widehat{SE}(\bar{x})$. If the mean and standard deviation are constant, then only about 1 in 20 (5%) of sample means should be outside these limits. Action limits are set at the mean ±3.09 standard errors, i.e., at $\hat{\mu} \pm 3.09\,\widehat{SE}(\bar{x})$. Only about 1 in 500 (0.2%) sample means should plot outside these limits.
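The five stages can be collected into a short function. The sketch below is one possible implementation under the stated assumptions (equal sample sizes between 2 and 10 and approximately normal data); the function name and the small data set are hypothetical, and the k(n) values are copied from the last column of Table 5.5.

```python
import math

# SD factors k(n) from the last column of Table 5.5
K_FACTOR = {2: 0.887, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395,
            7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}


def xbar_chart_limits(samples):
    """Return the center line and the warning and action limits for sample means."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same size")
    means = [sum(s) / n for s in samples]                 # stage 1: sample means
    ranges = [max(s) - min(s) for s in samples]           # stage 1: sample ranges
    mu_hat = sum(means) / len(means)                      # stage 2: overall mean
    sigma_hat = K_FACTOR[n] * sum(ranges) / len(ranges)   # stage 3: equation (5.2)
    se = sigma_hat / math.sqrt(n)                         # stage 4: standard error
    return {                                              # stage 5: chart limits
        "center": mu_hat,
        "warning": (mu_hat - 1.96 * se, mu_hat + 1.96 * se),
        "action": (mu_hat - 3.09 * se, mu_hat + 3.09 * se),
    }


# Hypothetical data: M = 4 samples of size n = 3 (not taken from the text).
samples = [[7.1, 7.4, 7.3], [7.6, 7.2, 7.5], [7.3, 7.3, 7.8], [7.0, 7.4, 7.2]]
limits = xbar_chart_limits(samples)
print(limits)
```

In practice M would be much larger than in this toy example, since the limits are only as reliable as the estimates of the mean and the mean range on which they are based.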

The rationale behind constructing the $\bar{x}$ chart in this way is that it shows the changes in the sample means with time, and the warning and action limits indicate whether these changes are too large to be due to normal random variation if the mean is in fact constant.

With control charts, it is conventional to measure process variability using sample ranges on the grounds of simplicity, although standard deviations or variances could be used instead. Like $\bar{x}$ charts, range charts can have warning limits placed so that the probability of crossing one of these is 0.05 (5%), assuming that the level of variation is stable. Similarly, action limits can be placed so that the probability of crossing one of them is 0.002 (0.2%) when the level of variation is stable. The setting of these limits requires the use of tabulated values that are provided and explained in Table 5.5.
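The range chart limits follow directly from the multipliers in Table 5.5: each limit is the mean range times the tabulated value. A minimal sketch is given below, with a hypothetical function name; the illustrative mean range of 0.694 for samples of size 5 is the value that arises in Example 5.2.

```python
# Multipliers from Table 5.5, in the order:
# (lower action, lower warning, upper warning, upper action)
RANGE_FACTORS = {
    2: (0.00, 0.04, 2.81, 4.12),
    3: (0.04, 0.18, 2.17, 2.99),
    4: (0.10, 0.29, 1.93, 2.58),
    5: (0.16, 0.37, 1.81, 2.36),
    6: (0.21, 0.42, 1.72, 2.22),
    7: (0.26, 0.46, 1.66, 2.12),
    8: (0.29, 0.50, 1.62, 2.04),
    9: (0.32, 0.52, 1.58, 1.99),
    10: (0.35, 0.54, 1.56, 1.94),
}


def range_chart_limits(mean_range, n):
    """Scale the tabulated multipliers for sample size n by the mean range."""
    lower_action, lower_warning, upper_warning, upper_action = RANGE_FACTORS[n]
    return {
        "lower_action": lower_action * mean_range,
        "lower_warning": lower_warning * mean_range,
        "upper_warning": upper_warning * mean_range,
        "upper_action": upper_action * mean_range,
    }


range_limits = range_chart_limits(mean_range=0.694, n=5)
for name, value in range_limits.items():
    print(name, round(value, 3))
```

Unlike the limits on the chart for means, these limits are not symmetric about the mean range, because the distribution of the sample range is skewed.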

Control charts can be produced quite easily in a spreadsheet program. Alternatively, some statistical packages have options to produce the charts.

Example 5.2:  Monitoring pH in New Zealand

Table 5.6 shows data that were obtained from regular monitoring of rivers in the South Island of New Zealand. Values are provided for pH for five randomly chosen rivers, with a different selection for each of the monthly sample times from January 1989 to December 1997. The data are used to construct control charts for monitoring pH over the sampled time. As shown in Figure 5.5, the distribution is reasonably close to normal.

The overall mean of the pH values for all the samples is $\hat{\mu} = 7.640$. This is used as the best estimate of the process mean. The mean of the sample ranges is $\bar{R} = 0.694$. From Table 5.5, the factor to convert this to an estimate of the process standard deviation is k(5) = 0.43. The estimated standard deviation is therefore

$$\hat{\sigma} = 0.43 \times 0.694 = 0.298$$

Hence, the estimated standard error for the sample means is

$$\widehat{SE}(\bar{x}) = 0.298/\sqrt{5} = 0.133$$

The mean control chart is shown in Figure 5.6(a), with the action limits set at 7.640 ± 3.09 × 0.133 (i.e., 7.23 and 8.05), and the warning limits at 7.640 ± 1.96 × 0.133 (i.e., 7.38 and 7.90).
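The arithmetic of this example can be reproduced in a few lines from the summary values quoted above. This is a check of the calculation only, with hypothetical variable names; it carries full precision through the intermediate steps rather than rounding at each stage, and the rounded results agree with the values in the text.

```python
import math

mu_hat = 7.640       # overall mean pH
mean_range = 0.694   # mean of the sample ranges
n = 5                # rivers in each monthly sample
k = 0.43             # k(5) from Table 5.5

sigma_hat = k * mean_range        # estimated standard deviation, equation (5.2)
se = sigma_hat / math.sqrt(n)     # estimated standard error of a sample mean

warning = (mu_hat - 1.96 * se, mu_hat + 1.96 * se)
action = (mu_hat - 3.09 * se, mu_hat + 3.09 * se)

print(round(sigma_hat, 3), round(se, 3))
print([round(v, 2) for v in warning])
print([round(v, 2) for v in action])
```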