Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

1manly_b_f_j_statistics_for_environmental_science_and_managem

.pdf
Скачиваний:
8
Добавлен:
19.11.2019
Размер:
4.8 Mб
Скачать

16 Statistics for Environmental Science and Management, Second Edition

Example 1.7:  Evaluating the Attainment of Cleanup Standards

Many environmental studies are concerned with the specific problem of evaluating the effectiveness of the reclamation of a site that has suffered from some environmental damage. For example, a government agency might require a mining company to work on restoring a site until the biomass of vegetation per unit area is equivalent to what is found on undamaged reference areas. This requires a targeted study as defined previously.

There are two complications with using standard statistical methods in this situation. The first is that the damaged and reference sites are not generally selected randomly from populations of potential sites, and it is unreasonable to suppose that they would have had exactly the same mean for the study variable even in the absence of any impact on the damaged site. Therefore, if large samples are taken from each site, there will be a high probability of detecting a difference, irrespective of the extent to which the damaged site has been reclaimed. The second complication is that, when a test for a difference between the two sites does not give a significant result, this does not necessarily mean that a difference does not exist. An alternative explanation is that the sample sizes were not large enough to detect a difference that does exist.

These complications with statistical tests led to a recommendation by the U.S. Environmental Protection Agency (US EPA 1989a) that the null hypothesis for statistical tests should depend on the status of a site, in the following way:

1.If a site has not been declared to be contaminated, then the null hypothesis should be that it is clean, i.e., there is no difference from the control site. The alternative hypothesis is that the site is contaminated. A nonsignificant test result leads to the conclusion that there is no real evidence that the site is contaminated.

2.If a site has been declared to be contaminated, then the null hypothesis is that this is true, i.e., there is a difference (in an unacceptable direction) from the control site. The alternative hypothesis is that the site is clean. A nonsignificant test result leads to the conclusion that there is no real evidence that the site has been cleaned up.

The point here is that, once a site has been declared to have a certain status, pertinent evidence should be required to justify changing this status.

If the point of view expressed by items 1 and 2 is not adopted, so that the null hypothesis is always that the damaged site is not different from the control, then the agency charged with ensuring that the site is cleaned up is faced with setting up a maze of regulations to ensure that study designs have large enough sample sizes to detect differences of practical importance between the damaged and control sites. If this is not done, then it is apparent that any organization wanting to have the status of a site changed from contaminated to clean should carry out the smallest study possible, with low power to detect even a large difference from the control site. The probability of a nonsignificant test result (the site is clean) will then be as high as possible.

As an example of the type of data that may be involved in the comparison of a control site and a possibly contaminated one, consider some measurements of 1,2,3,4-tetrachlorobenzene (TcCB) in parts per

The Role of Statistics in Environmental Science

17

thousand million given by Gilbert and Simpson (1992, p. 6.22). There are 47 measurements made in different parts of the control site and 77 measurements made in different parts of the possibly contaminated site, as shown in Table 1.3 and Figure 1.6. Clearly the TcCB levels are much more

Table 1.3

Measurements of 1,2,3,4-Tetrachlorobenzene from Samples Taken at a Reference Site and a Possibly Contaminated Site

Reference Site (n = 47)

  0.60

  0.50

0.39

0.84

0.46

0.39

0.62

    0.67

0.69

0.81

0.38

0.79

  0.43

  0.57

0.74

0.27

0.51

0.35

0.28

    0.45

0.42

1.14

0.23

0.72

  0.63

  0.50

0.29

0.82

0.54

1.13

0.56

    1.33

0.56

1.11

0.57

0.89

  0.28

  1.20

0.76

0.26

0.34

0.52

0.42

    0.22

0.33

1.14

0.48

 

 

 

 

 

 

 

 

 

 

 

 

 

Mean

  0.60

 

 

 

 

 

 

 

 

 

 

SD

  0.28

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Possibly Contaminated Site (n = 75)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

  1.33

  0.09

0.12

0.28

0.14

0.16

0.17

    0.47

0.17

0.18

0.19

0.09

18.40

  0.20

0.21

0.12

0.22

0.22

0.22

168.6

0.24

0.25

0.25

0.20

  0.48

  0.26

5.56

0.21

0.29

0.31

0.33

    3.29

0.33

0.34

0.37

0.25

  2.59

  0.39

0.40

0.28

0.43

6.61

0.48

    0.17

0.49

0.51

0.51

0.38

  0.92

  0.60

0.61

0.43

0.75

0.82

0.85

    0.23

0.94

1.05

1.10

0.54

  1.53

  1.19

1.22

0.62

1.39

1.39

1.52

    0.33

1.73

2.35

2.46

1.10

51.97

  2.61

3.06

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Mean

  4.02

 

 

 

 

 

 

 

 

 

 

SD

20.27

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

TcCB

1000

100

10

1

0.1

0.01

1

2

 

Site

Figure 1.6

Comparison of TcCB measurements in parts per thousand million at a contaminated site (2) and a reference site (1).

18 Statistics for Environmental Science and Management, Second Edition

variable at the possibly contaminated site. Presumably, this might have occurred from the TcCB levels being lowered in parts of the site by cleaning, while very high levels remained in other parts of the site.

Methods for comparing samples such as these in terms of means and variation are discussed further in Chapter 7. For the data in this example, the extremely skewed distribution at the contaminated site, with several very extreme values, should lead to some caution in making comparisons based on the assumption that distributions are normal within sites.

Example 1.8:  Delta Smelt in the Sacramento–San Joaquin Delta

The final example considered in this chapter concerns the decline in the abundance of delta smelt (Hypomesus transpacificus) in the Sacramento– San Joaquin Delta in California, that apparently started in about 2000. The delta smelt is classified as a threatened species in the delta, so that there has been particular concern in the decline of this fish, although there is also evidence of the decline of other pelagic fish numbers at the same time.

A number of surveys of the abundance of delta smelt take place each year. One of these surveys is called the fall midwater trawl, which samples adults over the four months September to December. From the survey results, an index of abundance is constructed by weighting the densities of fish observed in different parts of the delta by values that reflect the different amounts of habitats expected in those different regions. It is then assumed that the actual abundance of adult delta smelt in any year is proportional to the index value, at least approximately. Figure 1.7 shows the values of the fall midwater trawl index for the years since sampling began. The decline in numbers since 1999 is very clear.

There is much speculation about the cause of the decline in delta smelt numbers. Some claim that it is the result of the water being pumped out of the delta for use in towns and for agriculture, while others suggest

FMWT Abundance Index

2000

1500

1000

500

0

1970

1980

1990

2000

Figure 1.7

The fall midwater trawl abundance index for delta smelt for the years since sampling began in 1967, with no sampling in 1974 and 1979.

The Role of Statistics in Environmental Science

19

that, although pumping may cause some reduction in delta smelt numbers, the effect is not enough to account for the very large decline that has been observed. Unfortunately, determining the true cause of the decline is not a simple matter. Regression methods, as discussed in Chapter 3, can be used to find relationships between different variables, but this is not the same as proving causal relationships. To date, a number of explanations for the decline have been proposed, but the true cause is still unknown (Bennett 2005).

1.3  The Importance of Statistics in the Examples

The examples that are presented above demonstrate clearly the importance of statistical methods in environmental studies. With the Exxon Valdez oil spill, problems with the application of the study designs meant that rather complicated analyses were required to make inferences. With the Norwegian study on acid rain, there is a need to consider the impact of correlation in time and space on the water-quality variables that were measured. The estimation of the yearly survival rates of salmon in the Snake River requires the use of special models for analyzing mark-recapture experiments combined with the use of the theory of sampling for finite populations. Monitoring studies such as the one involving the measurement of tree-ring width in Argentina call for the use of methods for the detection of trends and abrupt changes in distributions. Monitoring of whole ecosystems as carried out by the Commission for the Conservation of Antarctic Marine Living Resources requires the collection and analysis of vast amounts of data, with many very complicated statistical problems. The comparison of samples from contaminated and reference sites may require the use of tests that are valid with extremely nonnormal distributions. Attempts to determine the cause of the recent decline in delta smelt and other fish numbers in the Sacramento–San Joaquin Delta has involved many standard and not-so-standard analyses to try to determine the relationship between delta smelt numbers and other variables. All of these matters are considered in more detail in the pages that follow.

1.4  Chapter Summary

Statistics is important in environmental science because much of what is known about the environment comes from numerical data.

Three broad types of study of interest to resource managers are baseline studies (to document the present state of the environment), targeted studies (to assess the impact of particular events), and regular monitoring (to detect trends and other changes in important variables).

20Statistics for Environmental Science and Management, Second Edition

All types of study involve sampling over time and space, and it is important that sampling designs be cost effective and, if necessary, that they can be justified in a court of law.

Eight examples are discussed to demonstrate the importance of statistical methods to environmental science. These examples involve the shoreline impact of the Exxon Valdez oil spill in Prince William Sound, Alaska, in March 1989; a Norwegian study of the possible impact of acid precipitation on small lakes; estimation of the survival rate of salmon in the Snake and Columbia Rivers in the Pacific Northwest of the United States; a large-scale perturbation experiment carried out in Wisconsin in the United States, involving changing the piscivore and planktivore composition of a lake and comparing changes in the chlorophyll composition with changes in a control lake; monitoring of the annual ring widths of Andean alders near Tucuman in Argentina; monitoring marine life in the Antarctic, comparing a possibly contaminated site with a control site in terms of measurements of the amounts of a pollutant in samples taken from the two sites; and the determination of the cause or causes of a reduction in the numbers of a species of fish (Hypomesus transpacificus, delta smelt) in the Sacramento–San Joaquin Delta.

Exercises

Exercise 1.1

Despite the large amounts of money spent on them, the studies on the effects of the Exxon Valdez oil spill on the coastal habitat of Alaska all failed to produce simple, easily understood estimates of these effects. What happened was that the oil spill took everyone by surprise, and none of the groups involved (state and federal agencies and Exxon) apparently was able to quickly produce a good sampling design and start collecting data. Instead, it seems that there were many committee meetings, but very little actually done while the short Alaskan summer was disappearing. With the benefit of hindsight, what do you think would have been a good approach to use for estimating the effects of the oil spill? This question is not asking for technical details—just a broad suggestion for how a study might have been designed.

Exercise 1.2

The study designs for the estimation of the survival rate of fish through the dams on the Snake River in Washington State have been criticized because the releases of batches of fish were made over short periods of time during only part of the migration season. The mark–recapture estimates may therefore not reflect the average survival rate for all fish over

The Role of Statistics in Environmental Science

21

the whole season if this rate varies during the season. (Issues like this seem to come up with all studies in controversial areas.) Assuming that you know the likely length of the migration season, and roughly how many fish are likely to pass through the dam each week, how would you suggest that the average survival rate be estimated?

2

Environmental Sampling

2.1  Introduction

All of the examples considered in the previous chapter involved sampling of some sort, showing that the design of sampling schemes is an important topic in environmental statistics. This chapter is therefore devoted to considering this topic in some detail. The estimation of mean values, totals, and proportions from the data collected by sampling is covered at the same time, and this means that the chapter includes all methods that are needed for many environmental studies.

The first task in designing a sampling scheme is to define the population of interest as well as the sample units that make up this population. Here the population is defined as a collection of items that are of interest, and the sample units are these items. In this chapter it is assumed that each of the items is characterized by the measurements that it has for certain variables (e.g., weight or length), or which of several categories it falls into (e.g., the color that it possesses, or the type of habitat where it is found). When this is the case, statistical theory can assist in the process of drawing conclusions about the population using information from a sample of some of the items.

Sometimes defining the population of interest and the sample units is straightforward because the extent of the population is obvious, and a natural sample unit exists. At other times, some more-or-less arbitrary definitions will be required. An example of a straightforward situation is where the population is all the farms in a region of a country and the variable of interest is the amount of water used for irrigation on a farm. This contrasts with the situation where there is interest in the impact of an oil spill on the flora and fauna on beaches. In that case, the extent of the area that might be affected may not be clear, and it may not be obvious which length of beach to use as a sample unit. The investigator must then subjectively choose the potentially affected area and impose a structure in terms of sample units. Furthermore, there will not be one correct size for the sample unit. A range of lengths of beach may serve equally well, taking into account the method that is used to take measurements. The choice of what to measure will also, of course, introduce some further subjective decisions.

23

24 Statistics for Environmental Science and Management, Second Edition

2.2  Simple Random Sampling

A simple random sample is one that is obtained by a process that gives each sample unit the same probability of being chosen. Usually it will be desirable to choose such a sample without replacement so that sample units are not used more than once. This gives slightly more accurate results than sampling with replacement, whereby individual units can appear two or more times in the sample. However, for samples that are small in comparison with the population size, the difference in the accuracy obtained is not great.

Obtaining a simple random sample is easiest when a sampling frame is available, where this is just a list of all the units in the population from which the sample is to be drawn. If the sampling frame contains units numbered from 1 to N, then a simple random sample of size n is obtained without replacement by drawing n numbers one by one in such a way that each choice is equally likely to be any of the numbers that have not already been used. For sampling with replacement, each of the numbers 1 to N is given the same chance of appearing at each draw.

The process of selecting the units to use in a sample is sometimes facilitated by using a table of random numbers such as the one shown in Table 2.1. As an example of how such a table can be used, suppose that a study area is divided into 116 quadrats as shown in Figure 2.1, and it is desirable to select a simple random sample of ten of these quadrats without replacement. To do this, first start at an arbitrary place in the table such as the beginning of row five. The first three digits in each block of four digits can then be considered, to give the series 698, 419, 008, 127, 106, 605, 843, 378, 462, 953, 745, and so on. The first ten different numbers between 1 and 116 then give a simple random sample of quadrats: 8, 106, and so on. For selecting large samples, essentially the same process can be carried out on a computer using pseudo-random numbers in a spreadsheet, for example.

2.3  Estimation of Population Means

Assume that a simple random sample of size n is selected without replacement from a population of N units, and that the variable of interest has values y1, y2, …, yn, for the sampled units. Then the sample mean is

n

 

y = (yi/n)

(2.1)

i=1

the sample variance is

Environmental Sampling

25

Table 2.1

Random Number Table

1252

9045

1286

2235

6289

5542

2965

1219

7088

1533

9135

3824

8483

1617

0990

4547

9454

9266

9223

9662

8377

5968

0088

9813

4019

1597

2294

8177

5720

8526

3789

9509

1107

7492

7178

7485

6866

0353

8133

7247

6988

4191

0083

1273

1061

6058

8433

3782

4627

9535

7458

7394

0804

6410

7771

9514

1689

2248

7654

1608

2136

8184

0033

1742

9116

6480

4081

6121

9399

2601

5693

3627

8980

2877

6078

0993

6817

7790

4589

8833

1813

0018

9270

2802

2245

8313

7113

2074

1510

1802

9787

7735

0752

3671

2519

1063

5471

7114

3477

7203

7379

6355

4738

8695

6987

9312

5261

3915

4060

5020

8763

8141

4588

0345

6854

4575

5940

1427

8757

5221

6605

3563

6829

2171

8121

5723

3901

0456

8691

9649

8154

6617

3825

2320

0476

4355

7690

9987

2757

3871

5855

0345

0029

6323

0493

8556

6810

7981

8007

3433

7172

6273

6400

7392

4880

2917

9748

6690

0147

6744

7780

3051

6052

6389

0957

7744

5265

7623

5189

0917

7289

8817

9973

7058

2621

7637

1791

1904

8467

0318

9133

5493

2280

9064

6427

2426

9685

3109

8222

0136

1035

4738

9748

6313

1589

0097

7292

6264

7563

2146

5482

8213

2366

1834

9971

2467

5843

1570

5818

4827

7947

2968

3840

9873

0330

1909

4348

4157

6470

5028

6426

2413

9559

2008

7485

0321

5106

0967

6471

5151

8382

7446

9142

2006

4643

8984

6677

8596

7477

3682

1948

6713

2204

9931

8202

9055

0820

6296

6570

0438

3250

5110

7397

3638

1794

2059

2771

4461

2018

4981

8445

1259

5679

4109

4010

2484

1495

3704

8936

1270

1933

6213

9774

1158

1659

6400

8525

6531

4712

6738

7368

9021

1251

3162

0646

2380

1446

2573

5018

1051

9772

1664

6687

4493

1932

6164

5882

0672

8492

1277

0868

9041

0735

1319

9096

6458

1659

1224

2968

9657

3658

6429

1186

0768

0484

1996

0338

4044

8415

1906

3117

6575

1925

6232

3495

4706

3533

7630

5570

9400

7572

1054

6902

2256

0003

2189

1569

1272

2592

0912

3526

1092

4235

0755

3173

1446

6311

3243

7053

7094

2597

8181

8560

6492

1451

1325

7247

1535

8773

0009

4666

0581

2433

9756

6818

1746

1273

1105

1919

0986

5905

5680

2503

0569

1642

3789

8234

4337

2705

6416

3890

0286

9414

9485

6629

4167

2517

9717

2582

8480

3891

5768

9601

3765

9627

6064

7097

2654

2456

3028

Note: Each digit was chosen such that 0, 1, …, 9 were equally likely to occur. The grouping of four digits is arbitrary so that, for example, to select numbers from 0 to 99999, the digits can be considered five at a time.