
Brereton Chemometrics


EXPERIMENTAL DESIGN

x1    x2    y
1     1     11.1540
1     0     12.4607
1     0     6.3716
0     1     6.1280
0     1     2.1698

1. By constructing the design matrix and then using the pseudo-inverse, calculate the coefficients for the best fit model given by the equation

$y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2$

2. From these coefficients, calculate the 12 predicted responses, and so the residual (modelling) error as the sum of squares of the residuals.

3. Calculate the contribution to this error of the replicates simply by calculating the average response over the four replicates, and then subtracting each replicate response, and summing the squares of these residuals.

4. Calculate the sum of square lack-of-fit error by subtracting the value in question 3 from that in question 2.

5. Divide the lack-of-fit and replicate errors by their respective degrees of freedom and comment.
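The five steps above can be sketched in Python with NumPy. This is only an illustration: the data are hypothetical (a 3 × 3 factorial with three extra centre points, giving 12 experiments and four replicates, generated from made-up coefficients), and the function names are not from the text.

```python
import numpy as np

def quadratic_design_matrix(x1, x2):
    """Columns 1, x1, x2, x1^2, x2^2, x1*x2, matching b0, b1, b2, b11, b22, b12."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

def error_sums_of_squares(D, y, replicate_rows):
    """Return coefficients plus total, replicate and lack-of-fit sums of squares."""
    b = np.linalg.pinv(D) @ y                    # question 1: pseudo-inverse fit
    residuals = y - D @ b                        # question 2: residuals
    ss_total = float(residuals @ residuals)
    y_rep = y[replicate_rows]                    # question 3: replicate error
    ss_rep = float(np.sum((y_rep - y_rep.mean()) ** 2))
    return b, ss_total, ss_rep, ss_total - ss_rep   # question 4: lack-of-fit

# Hypothetical data, NOT the table from the problem.
levels = [-1.0, 0.0, 1.0]
x1 = np.array([a for a in levels for _ in levels] + [0.0] * 3)
x2 = np.array([c for _ in levels for c in levels] + [0.0] * 3)
true_b = np.array([10.0, 2.0, -1.0, -3.0, 0.5, 1.5])    # made-up coefficients
rng = np.random.default_rng(0)

D = quadratic_design_matrix(x1, x2)
y = D @ true_b + rng.normal(scale=0.1, size=len(x1))
replicate_rows = np.flatnonzero((x1 == 0) & (x2 == 0))  # the four centre points

b, ss_total, ss_rep, ss_lof = error_sums_of_squares(D, y, replicate_rows)

# Question 5: mean squares. N = 12 experiments, P = 6 parameters, R = 3 replicate d.f.
n_rep_dof = len(replicate_rows) - 1
n_lof_dof = len(y) - D.shape[1] - n_rep_dof
ms_rep = ss_rep / n_rep_dof
ms_lof = ss_lof / n_lof_dof
```

Comparing `ms_lof` with `ms_rep` then gives a rough indication of whether the lack-of-fit is large relative to the replicate error.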

Problem 2.10 The Application of a Plackett–Burman Design to the Screening of Factors Influencing a Chemical Reaction

Section 2.3.3

The yield of a reaction of the form

A + B → C

is to be studied as influenced by 10 possible experimental conditions, as follows:

Factor                             Units     Low          High
x1      % NaOH                     %         40           50
x2      Temperature                °C        80           110
x3      Nature of catalyst         -         A            B
x4      Stirring                   -         Without      With
x5      Reaction time              min       90           210
x6      Volume of solvent          ml        100          200
x7      Volume of NaOH             ml        30           60
x8      Substrate/NaOH ratio       mol/ml    0.5 × 10⁻³   1 × 10⁻³
x9      Catalyst/substrate ratio   mol/ml    4 × 10⁻³     6 × 10⁻³
x10     Reagent/substrate ratio    mol/mol   1            1.25


CHEMOMETRICS

 

 

The design, including an eleventh dummy factor, is as follows, with the observed yields:

Expt No.  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10  x11  Yield (%)
1         −   −   −   −   −   −   −   −   −   −    −    15
2         +   +   −   +   +   +   −   −   −   +    −    42
3         −   +   +   −   +   +   +   −   −   −    +    3
4         +   −   +   +   −   +   +   +   −   −    −    57
5         −   +   −   +   +   −   +   +   +   −    −    38
6         −   −   +   −   +   +   −   +   +   +    −    37
7         −   −   −   +   −   +   +   −   +   +    +    74
8         +   −   −   −   +   −   +   +   −   +    +    54
9         +   +   −   −   −   +   −   +   +   −    +    56
10        +   +   +   −   −   −   +   −   +   +    −    64
11        −   +   +   +   −   −   −   +   −   +    +    65
12        +   −   +   +   +   −   −   −   +   −    +    59

1. Why is a dummy factor employed? Why is a Plackett–Burman design more desirable than a two level fractional factorial in this case?

2. Verify that all the columns are orthogonal to each other.

3. Set up a design matrix, D, and determine the coefficients b0 to b11.

4. An alternative method for calculating the coefficients for factorial designs such as the Plackett–Burman design is to multiply the yields of each experiment by the levels of the corresponding factor, summing these and dividing by 12. Verify that this provides the same answer for factor 1 as using the inverse matrix.

5. A simple method for reducing the number of experimental conditions for further study is to look at the size of the factors and eliminate those that are less than the dummy factor. How many factors remain and what are they?
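Questions 2 to 4 can be checked numerically with a short NumPy sketch. It relies on the observation that experiments 3 to 12 in the table above are successive cyclic shifts of experiment 2; the variable names are illustrative only.

```python
import numpy as np

# Experiment 2 of the table; experiments 3-12 are cyclic shifts of this row.
generator = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])
rows = [-np.ones(11, dtype=int)]                              # experiment 1: all low
rows += [np.roll(generator, shift) for shift in range(11)]   # experiments 2-12
levels = np.vstack(rows)                                      # 12 experiments x 11 factors

yields = np.array([15, 42, 3, 57, 38, 37, 74, 54, 56, 64, 65, 59], dtype=float)

# Design matrix D: a column of ones for b0 followed by the 11 factor columns.
D = np.column_stack([np.ones(12), levels])

# Question 2: if all columns are mutually orthogonal, D'D is 12 times the identity.
gram = D.T @ D

# Question 3: coefficients b0..b11 via the pseudo-inverse.
b_pinv = np.linalg.pinv(D) @ yields

# Question 4: multiply yields by the factor levels, sum, and divide by 12.
b_simple = D.T @ yields / 12

# Question 5: rank the real factors by absolute size against the dummy (x11).
dummy_size = abs(b_simple[11])
larger_than_dummy = [f"x{i}" for i in range(1, 11) if abs(b_simple[i]) > dummy_size]
```

Because the columns are orthogonal, the pseudo-inverse reduces to D'/12, which is why the two routes to the coefficients agree.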

Problem 2.11 Use of a Constrained Mixture Design to Investigate the Conductivity of a Molten Salt System

Section 2.5.4 Section 2.5.2.2

A molten salt system consisting of three components is prepared, and the aim is to investigate the conductivity according to the relative proportion of each component. The three components are as follows:

Component         Lower limit   Upper limit
x1     NdCl3      0.2           0.9
x2     LiCl       0.1           0.8
x3     KCl        0.0           0.7

The experiment is coded to give pseudo-components so that a value of 1 corresponds to the upper limit, and a value of 0 to the lower limit of each component. The experimental results are as follows:

z1       z2       z3       Conductivity (Ω⁻¹ cm⁻¹)
1        0        0        3.98
0        1        0        2.63
0        0        1        2.21
0.5      0.5      0        5.54
0.5      0        0.5      4.00
0        0.5      0.5      2.33
0.3333   0.3333   0.3333   3.23

1. Represent the constrained mixture space, diagrammatically, in the original mixture space. Explain why the constraints are possible and why the new reduced mixture space remains a triangle.

2. Produce a design matrix consisting of seven columns in the true mixture space as follows. The true composition of component 1 is given by z1(U1 − L1) + L1, where U and L are the upper and lower bounds for the component. Convert all three columns of the matrix above using this equation and then set up a design matrix, containing three single factor terms, and all possible two and three factor interaction terms (using a Scheffé model).

3. Calculate the model linking the conductivity to the proportions of the three salts.

4. Predict the conductivity when the proportion of the salts is 0.209, 0.146 and 0.645.
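A minimal NumPy sketch of questions 2 to 4 follows. The helper name `scheffe_terms` and the variable names are illustrative, and the column order of the design matrix (three linear, three two factor, one three factor term) is an assumption.

```python
import numpy as np

lower = np.array([0.2, 0.1, 0.0])    # NdCl3, LiCl, KCl lower limits
upper = np.array([0.9, 0.8, 0.7])    # upper limits

# Pseudo-components z1, z2, z3 and the measured conductivities from the table.
Z = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
    [0.3333, 0.3333, 0.3333],
])
conductivity = np.array([3.98, 2.63, 2.21, 5.54, 4.00, 2.33, 3.23])

def scheffe_terms(X):
    """Seven columns: x1, x2, x3, x1x2, x1x3, x2x3, x1x2x3 (no intercept)."""
    x1, x2, x3 = np.asarray(X, dtype=float).T
    return np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

# Question 2: true proportions x = z(U - L) + L, then the seven column design matrix.
X = Z * (upper - lower) + lower
D = scheffe_terms(X)

# Question 3: seven experiments and seven terms, so the model fits exactly.
b = np.linalg.pinv(D) @ conductivity

# Question 4: prediction at the requested composition.
x_new = np.array([[0.209, 0.146, 0.645]])
predicted = scheffe_terms(x_new) @ b
```

Each row of `X` sums to approximately 1, which is a quick check that the conversion from pseudo-components has been applied correctly.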

Problem 2.12 Use of Experimental Design and Principal Components Analysis for Reduction of Number of Chromatographic Tests

Section 2.4.5 Section 4.3.6.4 Section 4.3 Section 4.4.1

The following table represents the result of a number of tests performed on eight chromatographic columns, involving performing chromatography on eight compounds at pH 3 in methanol mobile phase, and measuring four peakshape parameters. Note that you may have to transpose the matrix in Excel for further work. The aim is to reduce the number of experimental tests necessary using experimental design. Each test is denoted by a mnemonic. The first letter (e.g. P) stands for a compound, the second part of the name, k, N, N(df), or As standing for four peakshape/retention time measurements.

 

Test     Inertsil  Inertsil  Inertsil  Kromasil  Kromasil  Symmetry  Supelco  Purospher
         ODS       ODS-2     ODS-3     C18       C8        C18       ABZ+
Pk       0.25      0.19      0.26      0.3       0.28      0.54      0.03     0.04
PN       10200     6930      7420      2980      2890      4160      6890     6960
PN(df)   2650      2820      2320      293       229       944       3660     2780
PAs      2.27      2.11      2.53      5.35      6.46      3.13      1.96     2.08
Nk       0.25      0.12      0.24      0.22      0.21      0.45      0        0
NN       12000     8370      9460      13900     16800     4170      13800    8260
NN(df)   6160      4600      4880      5330      6500      490       6020     3450
NAs      1.73      1.82      1.91      2.12      1.78      5.61      2.03     2.05
Ak       2.6       1.69      2.82      2.76      2.57      2.38      0.67     0.29
AN       10700     14400     11200     10200     13800     11300     11700    7160
AN(df)   7790      9770      7150      4380      5910      6380      7000     2880
AAs      1.21      1.48      1.64      2.03      2.08      1.59      1.65     2.08
Ck       0.89      0.47      0.95      0.82      0.71      0.87      0.19     0.07
CN       10200     10100     8500      9540      12600     9690      10700    5300
CN(df)   7830      7280      6990      6840      8340      6790      7250     3070
CAs      1.18      1.42      1.28      1.37      1.58      1.38      1.49     1.66
Qk       12.3      5.22      10.57     8.08      8.43      6.6       1.83     2.17
QN       8800      13300     10400     10300     11900     9000      7610     2540
QN(df)   7820      11200     7810      7410      8630      5250      5560     941
QAs      1.07      1.27      1.51      1.44      1.48      1.77      1.36     2.27
Bk       0.79      0.46      0.8       0.77      0.74      0.87      0.18     0
BN       15900     12000     10200     11200     14300     10300     11300    4570
BN(df)   7370      6550      5930      4560      6000      3690      5320     2060
BAs      1.54      1.79      1.74      2.06      2.03      2.13      1.97     1.67
Dk       2.64      1.72      2.73      2.75      2.27      2.54      0.55     0.35
DN       9280      12100     9810      7070      13100     10000     10500    6630
DN(df)   5030      8960      6660      2270      7800      7060      7130     3990
DAs      1.71      1.39      1.6       2.64      1.79      1.39      1.49     1.57
Rk       8.62      5.02      9.1       9.25      6.67      7.9       1.8      1.45
RN       9660      13900     11600     7710      13500     11000     9680     5140
RN(df)   8410      10900     7770      3460      9640      8530      6980     3270
RAs      1.16      1.39      1.65      2.17      1.5       1.28      1.41     1.56

1. Transpose the data so that the 32 tests correspond to columns of a matrix (variables) and the eight chromatographic columns to the rows of a matrix (objects). Standardise each column by subtracting the mean and dividing by the population standard deviation (Chapter 4, Section 4.3.6.4). Why is it important to standardise these data?

2. Perform PCA (principal components analysis) on these data and retain the first three loadings (methods for performing PCA are discussed in Chapter 4, Section 4.3; see also Appendix A.2.1 and relevant sections of Appendices A.4 and A.5 if you are using Excel or Matlab).

3. Take the three loadings vectors and transform to a common scale as follows. For each loadings vector select the most positive and most negative values, and code these to +1 and −1, respectively. Scale all the intermediate values in a similar fashion, leading to a new scaled loadings matrix of 32 columns and 3 rows. Produce the new scaled loadings vectors.

4. Select a factorial design as follows, with one extra point in the centre, to obtain a range of tests which is a representative subset of the original tests:

Design point   PC1   PC2   PC3
1              −     −     −
2              −     −     +
3              −     +     −
4              −     +     +
5              +     −     −
6              +     −     +
7              +     +     −
8              +     +     +
9              0     0     0

Calculate the Euclidean distance of each of the 32 scaled loadings from each of the nine design points; for example, for the first design point, calculate the Euclidean distance of the loadings scaled as in question 3 from the point (−1, −1, −1), by the equation

$d_1 = \sqrt{(p_{11} + 1)^2 + (p_{12} + 1)^2 + (p_{13} + 1)^2}$

(Chapter 4, Section 4.4.1).

5. Indicate the chromatographic parameters closest to the nine design points. Hence recommend a reduced number of chromatographic tests and comment on the strategy.
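The workflow of questions 1 to 5 can be sketched in NumPy as below. The sketch uses random placeholder data instead of the 8 × 32 table, obtains loadings from a singular value decomposition (one of several equivalent ways of performing PCA), and interprets "scale in a similar fashion" as a linear rescaling between the extremes; all function names are illustrative.

```python
import itertools
import numpy as np

def standardise(X):
    """Subtract column means and divide by the population standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)

def first_loadings(Z, n_components=3):
    """Loadings from the SVD of the standardised matrix; one row per component."""
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Vt[:n_components]

def scale_loadings(P):
    """Linearly map each loadings vector so its minimum is -1 and its maximum +1."""
    lo = P.min(axis=1, keepdims=True)
    hi = P.max(axis=1, keepdims=True)
    return 2.0 * (P - lo) / (hi - lo) - 1.0

# Two level full factorial in three factors plus a centre point: nine design points.
design = np.array(list(itertools.product([-1.0, 1.0], repeat=3)) + [(0.0, 0.0, 0.0)])

# Placeholder data standing in for the transposed table
# (8 chromatographic columns as rows, 32 tests as variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 32))

Z = standardise(X)                                   # question 1
P_scaled = scale_loadings(first_loadings(Z))         # questions 2 and 3, shape (3, 32)

# Question 4: distances[i, j] is the Euclidean distance of test i from design point j.
distances = np.linalg.norm(P_scaled.T[:, None, :] - design[None, :, :], axis=2)

# Question 5: index of the test closest to each of the nine design points.
closest_test = distances.argmin(axis=0)
```

With the real data, `closest_test` would index into the list of 32 test mnemonics to give the candidate reduced set of tests.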

Problem 2.13 A Mixture Design with Constraints

Section 2.5.4

It is desired to perform a three factor mixture design with constraints on each factor as follows:

 

         x1    x2    x3
Lower    0.0   0.2   0.3
Upper    0.4   0.6   0.7

1. The mixture design is normally represented as an irregular polygon, with, in this case, six vertices. Calculate the percentage of each factor at the six coordinates.

2. It is desired to perform 13 experiments, namely on the six corners, in the middle of the six edges and in the centre. Produce a table of the 13 mixtures.

3. Represent the experiment diagrammatically.
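One way to find the vertices in question 1 is to set two components at a bound, let the third take up the remainder, and keep the combination only if the third also lies within its bounds. The sketch below is one possible enumeration, not necessarily the method intended in Section 2.5.4; the function name and tolerance are illustrative.

```python
import itertools
import numpy as np

lower = np.array([0.0, 0.2, 0.3])
upper = np.array([0.4, 0.6, 0.7])
TOL = 1e-9    # guards against floating point round-off at the bounds

def constrained_vertices(lower, upper):
    """Enumerate vertices of the region lower <= x <= upper with sum(x) = 1."""
    n = len(lower)
    found = set()
    for free in range(n):                       # component fixed by the sum-to-one rule
        others = [i for i in range(n) if i != free]
        candidates = itertools.product(*[(lower[i], upper[i]) for i in others])
        for bounds in candidates:
            x = np.empty(n)
            x[others] = bounds
            x[free] = 1.0 - sum(bounds)
            if lower[free] - TOL <= x[free] <= upper[free] + TOL:
                found.add(tuple(float(v) for v in np.round(x, 6)))
    return sorted(found)

vertices = constrained_vertices(lower, upper)
for v in vertices:
    print(v)
```

Multiplying each coordinate by 100 gives the percentages asked for in question 1.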

Problem 2.14 Construction of Five Level Calibration Designs

Section 2.3.4

The aim is to construct a five level partial factorial (or calibration) design involving 25 experiments and up to 14 factors, each at levels −2, −1, 0, 1 and 2. Note that this design is only one of many possible such designs.


1. Construct the experimental conditions for the first factor using the following rules.

The first experiment is at level −2.

This level is repeated for experiments 2, 8, 14 and 20.

The levels for experiments 3–7 are given as follows (0, 2, 0, 0, 1).

A cyclic permuter of the form 0 → −1 → 1 → 2 → 0 is then used. Each block of experiments 9–13, 15–19 and 21–25 is related by this permuter, each block being one permutation away from the previous block, so experiments 9 and 10 are at levels −1 and 0, for example.

2. Construct the experimental conditions for the other 13 factors as follows.

Experiment 1 is always at level −2 for all factors.

The conditions for experiments 2–24 for the other factors are simply the cyclic permutation of the previous factor as explained in Section 2.3.4.

So produce the matrix of experimental conditions.

3. What is the difference vector used in this design?

4. Calculate the correlation coefficients between all pairs of factors 1–14. Plot the two graphs of the levels of factor 1 versus factors 2 and 7. Comment.
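The construction rules can be sketched in NumPy as follows. Two points are assumptions: the repeated level is taken as −2 (the one level that does not appear in the permuter), and each new factor is obtained by moving the previous factor's experiments 3–25 up one place, with its experiment 2 wrapping round to experiment 25.

```python
import numpy as np

REPEATER = -2                               # level of experiments 1, 2, 8, 14 and 20
FIRST_BLOCK = [0, 2, 0, 0, 1]               # levels for experiments 3-7
PERMUTER = {0: -1, -1: 1, 1: 2, 2: 0}       # cyclic permuter 0 -> -1 -> 1 -> 2 -> 0

def first_factor():
    """Levels of factor 1 for experiments 1-25."""
    column = [REPEATER]                     # experiment 1
    block = list(FIRST_BLOCK)
    for _ in range(4):                      # four blocks: repeater plus five levels
        column.append(REPEATER)
        column.extend(block)
        block = [PERMUTER[level] for level in block]
    return np.array(column)

def calibration_design(n_factors=14):
    """25 x n_factors matrix of levels built by cyclic shifting of factor 1."""
    columns = [first_factor()]
    for _ in range(n_factors - 1):
        previous = columns[-1]
        shifted = np.roll(previous[1:], -1)  # experiments 2-25, shifted up by one
        columns.append(np.concatenate([[REPEATER], shifted]))
    return np.column_stack(columns)

design = calibration_design()
correlations = np.corrcoef(design, rowvar=False)   # question 4: 14 x 14 matrix
```

Inspecting `correlations` and plotting `design[:, 0]` against `design[:, 1]` and `design[:, 6]` then addresses question 4.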

Problem 2.15 A Four Component Mixture Design Used for Blending of Olive Oils

Section 2.5.2.2

Fourteen blends of olive oils from four cultivars A–D are mixed together in the design below presented together with a taste panel score for each blend. The higher the score the better the taste of the olive oil.

A         B         C         D         Score
1         0         0         0         6.86
0         1         0         0         6.50
0         0         1         0         7.29
0         0         0         1         5.88
0.5       0.5       0         0         7.31
0.5       0         0.5       0         6.94
0.5       0         0         0.5       7.38
0         0.5       0.5       0         7.00
0         0.5       0         0.5       7.13
0         0         0.5       0.5       7.31
0.33333   0.33333   0.33333   0         7.56
0.33333   0.33333   0         0.33333   7.25
0.33333   0         0.33333   0.33333   7.31
0         0.33333   0.33333   0.33333   7.38

1. It is desired to produce a model containing 14 terms, namely four linear, six two component and four three component terms. What is the equation for this model?

2. Set up the design matrix and calculate the coefficients.

3. A good way to visualise the data is via contours in a mixture triangle, allowing three components to vary and constraining the fourth to be constant. Using a step size of 0.05, calculate the estimated responses from the model in question 2 when D is absent and A + B + C = 1. A table of 231 numbers should be produced. Using a contour plot, visualise these data. If you use Excel, the upper right-hand half of the plot may contain meaningless data; to remove these, simply cover up this part of the contour plot with a white triangle. In modern versions of Matlab and some other software packages, triangular contour plots can be obtained straightforwardly. Comment on the optimal blend using the contour plot when D is absent.

4. Repeat the contour plot in question 3 for the following: (i) A + B + D = 1, (ii) B + C + D = 1 and (iii) A + C + D = 1, and comment.

5. Why, in this example, is a strategy of visualisation of the mixture contours probably more informative than calculating a single optimum?
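Questions 2 and 3 can be sketched in NumPy as below. The term order (linear, then two component, then three component) and the helper name `model_terms` are assumptions; the grid is built from integer steps so that exactly 231 compositions with A + B + C = 1 are produced.

```python
from itertools import combinations
import numpy as np

t = 0.33333
blends = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [0.5, 0.5, 0, 0], [0.5, 0, 0.5, 0], [0.5, 0, 0, 0.5],
    [0, 0.5, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5],
    [t, t, t, 0], [t, t, 0, t], [t, 0, t, t], [0, t, t, t],
])
score = np.array([6.86, 6.50, 7.29, 5.88, 7.31, 6.94, 7.38,
                  7.00, 7.13, 7.31, 7.56, 7.25, 7.31, 7.38])

def model_terms(X):
    """Four linear, six two component and four three component columns."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    cols = [X[:, i] for i in range(4)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(4), 2)]
    cols += [X[:, i] * X[:, j] * X[:, k] for i, j, k in combinations(range(4), 3)]
    return np.column_stack(cols)

# Question 2: 14 experiments and 14 terms, so the fit is exact.
D = model_terms(blends)
b = np.linalg.pinv(D) @ score

# Question 3: D absent, A + B + C = 1, step size 0.05 (20 integer steps).
steps = 20
grid = np.array([[i / steps, j / steps, (steps - i - j) / steps, 0.0]
                 for i in range(steps + 1) for j in range(steps + 1 - i)])
predicted = model_terms(grid) @ b
best = grid[predicted.argmax()]        # grid point with the highest estimated score
```

The `predicted` values can be passed to any contour plotting routine; the cases in question 4 follow by changing which column of `grid` is held at zero.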

Problem 2.16 Central Composite Design Used to Study the Extraction of Olive Seeds in a Soxhlet

Section 2.4 Section 2.2.2

Three factors, namely (1) irradiation power as a percentage, (2) irradiation time in seconds and (3) number of cycles, are used to study the focused microwave assisted Soxhlet extraction of olive oil seeds, the response measuring the percentage recovery, which is to be optimised. A central composite design is set up to perform the experiments.

The results are as follows, using coded values of the variables:

Factor 1   Factor 2   Factor 3   Response
1          1          1          46.64
1          1          1          47.23
1          1          1          45.51
1          1          1          48.58
1          1          1          42.55
1          1          1          44.68
1          1          1          42.01
1          1          1          43.03
1          0          0          49.18
1          0          0          44.59
0          1          0          49.22
0          1          0          47.89
0          0          1          48.93
0          0          1          49.93
0          0          0          50.51
0          0          0          49.33
0          0          0          49.01
0          0          0          49.93
0          0          0          49.63
0          0          0          50.54

1. A 10 parameter model is to be fitted to the data, consisting of the intercept, all single factor linear and quadratic terms and all two factor interaction terms. Set up the design matrix, and by using the pseudo-inverse, calculate the coefficients of the model using coded values.

2. The true values of the factors are as follows:

Variable     −1    +1
Power (%)    30    60
Time (s)     20    30
Cycles       5     7

Re-express the model in question 1 in terms of the true values of each variable, rather than the coded values.

3. Using the model in question 1 and the coded design matrix, calculate the 20 predicted responses and the total error sum of squares for the 20 experiments.

4. Determine the sum of squares replicate error as follows: (i) calculate the mean response for the six replicates; (ii) calculate the difference between the true and average response, square these and sum the six numbers.

5. Determine the sum of squares lack-of-fit error as follows: (i) replace the six replicate responses by the average response for the replicates; (ii) using the 20 responses (with the replicates averaged) and the corresponding predicted responses, calculate the differences, square them and sum them.

6. Verify that the sums of squares in questions 4 and 5 add up to the total error obtained in question 3.

7. How many degrees of freedom are available for assessment of the replicate and lack-of-fit errors? Using this information, comment on whether the lack-of-fit is significant, and hence whether the model is adequate.

8. The significance of each term can be determined by omitting the term from the overall model. Assess the significance of the linear term due to the first factor and the interaction term between the first and third factors in this way. Calculate a new design matrix with nine rather than ten columns, removing the relevant column, and also remove the corresponding coefficients from the equation. Determine the new predicted responses using nine terms, and calculate the increase in sum of square error over that obtained in question 3. Comment on the significance of these two terms.

9. Using coded values, determine the optimum conditions as follows. Discard the two interaction terms that are least significant, resulting in eight remaining terms in the equation. Obtain the partial derivatives with respect to each of the three variables, and set up three equations equal to zero. Show that the optimum value of the third factor is given by $-b_3/(2b_{33})$, where the coefficients correspond to the linear and quadratic terms in the equations. Hence calculate the optimum coded values for each of the three factors.

10. Determine the optimum true values corresponding to the conditions obtained in question 9. What is the percentage recovery at this optimum? Comment.
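Questions 9 and 10 amount to solving three linear equations and then undoing the coding. The sketch below uses hypothetical coefficients (not those fitted from the data) and assumes the one retained interaction is between factors 1 and 2; the function names are illustrative.

```python
import numpy as np

def coded_to_true(coded, low, high):
    """Map a coded level (-1 = low, +1 = high) back to the true scale."""
    centre = (high + low) / 2.0
    half_range = (high - low) / 2.0
    return centre + coded * half_range

def stationary_point(b_lin, b_quad, b_int):
    """Set the partial derivatives of the quadratic model to zero and solve.

    dy/dx_i = b_i + 2 b_ii x_i + sum_j b_ij x_j = 0 for each factor i.
    """
    H = np.diag(2.0 * np.asarray(b_quad, dtype=float))
    for (i, j), value in b_int.items():
        H[i, j] = H[j, i] = value
    return np.linalg.solve(H, -np.asarray(b_lin, dtype=float))

# Hypothetical coefficients, for illustration only.
b_lin = [-1.0, 0.5, 0.8]          # b1, b2, b3
b_quad = [-2.0, -1.0, -1.5]       # b11, b22, b33
b_int = {(0, 1): 0.4}             # only b12 retained; b13 and b23 discarded

x_opt = stationary_point(b_lin, b_quad, b_int)

# With no interaction involving factor 3, its optimum is -b3 / (2 b33).
x3_direct = -b_lin[2] / (2.0 * b_quad[2])

true_opt = [
    coded_to_true(x_opt[0], 30, 60),    # power (%)
    coded_to_true(x_opt[1], 20, 30),    # time (s)
    coded_to_true(x_opt[2], 5, 7),      # cycles
]
```

The stationary point is a maximum only if all the curvature is negative (the matrix `H` is negative definite), which is worth checking before interpreting it as an optimum.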

Problem 2.17 A Three Component Mixture Design

Section 2.5.2

A three factor simplex centroid mixture design is performed, with the following results:

x1       x2       x3       Response
1        0        0        9
0        1        0        12
0        0        1        17
0.5      0.5      0        3
0.5      0        0.5      18
0        0.5      0.5      14
0.3333   0.3333   0.3333   11

1. A seven term model consisting of three linear terms, three two factor interaction terms and one three factor interaction term is fitted to the data. Give the equation for this model, compute the design matrix and calculate the coefficients.

2. Instead of seven terms, it is decided to fit the model only to the three linear terms. Calculate these coefficients using only three terms in the model employing the pseudo-inverse. Determine the root mean square error for the predicted responses, and comment on the difference in the linear terms in question 1 and the significance of the interaction terms.

3. It is possible to convert the model of question 1 to a seven term model in two independent factors, consisting of two linear terms, two quadratic terms, two linear interaction terms and a quadratic term of the form x1x2(x1 + x2). Show how the models relate algebraically.

4. For the model in question 3, set up the design matrix, calculate the new coefficients and show how these relate to the coefficients calculated in question 1 using the relationship obtained in question 3.

5. The matrices in questions 1, 2 and 4 all have inverses. However, a model that consisted of an intercept term and three linear terms would not, and it is impossible to use regression analysis to fit the data under such circumstances. Explain these observations.
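Questions 1, 2 and 5 can be explored with the NumPy sketch below. It takes the centroid entries 0.3333 as exactly 1/3 and computes the root mean square error by dividing by the number of experiments; both are assumptions made for the illustration.

```python
import numpy as np

third = 1.0 / 3.0      # the table's 0.3333, taken here as exactly one third
X = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
    [third, third, third],
])
y = np.array([9, 12, 17, 3, 18, 14, 11], dtype=float)
x1, x2, x3 = X.T

# Question 1: seven term model (three linear, three two factor, one three factor).
D_full = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
b_full = np.linalg.pinv(D_full) @ y

# Question 2: linear terms only, fitted with the pseudo-inverse.
D_lin = X
b_lin = np.linalg.pinv(D_lin) @ y
rmse_lin = np.sqrt(np.mean((y - D_lin @ b_lin) ** 2))

# Question 5: adding an intercept makes the columns linearly dependent,
# because x1 + x2 + x3 = 1 reproduces the column of ones.
D_intercept = np.column_stack([np.ones(len(y)), X])
rank = np.linalg.matrix_rank(D_intercept)      # 3, although there are 4 columns
```

The rank deficiency in the last step is the numerical counterpart of the statement in question 5 that the corresponding matrix has no inverse.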

Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

Richard G. Brereton

Copyright 2003 John Wiley & Sons, Ltd.

ISBNs: 0-471-48977-8 (HB); 0-471-48978-6 (PB)

3 Signal Processing

3.1 Sequential Signals in Chemistry

Sequential signals are surprisingly widespread in chemistry, and require a large number of methods for analysis. Most data are obtained via computerised instruments such as those for NIR, HPLC or NMR, and raw information such as peak integrals, peak shifts and positions is often dependent on how the information from the computer is first processed. An appreciation of this step is essential prior to applying further multivariate methods such as pattern recognition or classification. Spectra and chromatograms are examples of series that are sequential in time or frequency. However, time series also occur very widely in other areas of chemistry, for example in the area of industrial process control and natural processes.

3.1.1 Environmental and Geological Processes

An important source of data involves recording samples regularly with time. Classically such time series occur in environmental chemistry and geochemistry. A river might be sampled for the presence of pollutants such as polyaromatic hydrocarbons or heavy metals at different times of the year. Is there a trend, and can this be related to seasonal factors? Different and fascinating processes occur in rocks, where depth in the sediment relates to burial time. For example, isotope ratios are a function of climate, as relative evaporation rates of different isotopes are temperature dependent: certain specific cyclical changes in the Earth’s rotation have resulted in the Ice Ages, and the resulting climate changes leave a systematic chemical record. A whole series of methods for time series analysis based primarily on the idea of correlograms (Section 3.4) can be applied to explore such types of cyclicity, which are often hard to elucidate. Many of these approaches were first used by economists and geologists, who also encounter related problems.

One of the difficulties is that long-term and interesting trends are often buried within short-term random fluctuations. Statisticians distinguish between various types of noise which interfere with the signal as discussed in Section 3.2.3. Interestingly, the statistician Herman Wold, who is known among many chemometricians for the early development of the partial least squares algorithm, is probably more famous for his work on time series, studying this precise problem.

In addition to obtaining correlograms, a large battery of methods is available to smooth time series, many based on so-called ‘windows’, whereby data are smoothed over a number of points in time. A simple method is to take the average reading over five points in time, but sometimes this could miss out important information about cyclicity, especially for a process that is sampled slowly compared with the rate of oscillation. A number of linear filters have been developed which are applicable to this type of data (Section 3.3), this procedure often being described as convolution.
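The five point average mentioned above can be written as a convolution with a flat window. The following NumPy sketch (function name and test signals are illustrative, not from the text) also shows how such a window can remove a cyclic component entirely when the oscillation period matches the window width.

```python
import numpy as np

def moving_average(x, width=5):
    """Smooth a series by convolving it with a flat window of the given width."""
    window = np.ones(width) / width
    return np.convolve(x, window, mode="valid")   # only fully overlapping positions

# A steadily rising series is preserved: each output is the centre of its window.
ramp = np.arange(10.0)
smoothed_ramp = moving_average(ramp)              # 2, 3, 4, 5, 6, 7

# A signal that completes one oscillation every five samples is averaged away.
t = np.arange(50)
cyclic = np.sin(2 * np.pi * t / 5)
smoothed_cyclic = moving_average(cyclic)          # essentially zero everywhere
```

The second case illustrates the warning in the text: a window that is wide compared with the rate of oscillation hides the cyclicity rather than revealing it.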
