
EXPERIMENTAL DESIGN	109

x1	x2	y
1	1	11.1540
1	0	12.4607
−1	0	6.3716
0	−1	6.1280
0	1	2.1698

1. By constructing the design matrix and then using the pseudo-inverse, calculate the coefficients for the best fit model given by the equation

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + b_{12} x_1 x_2$$

2. From these coefficients, calculate the 12 predicted responses, and hence the residual (modelling) error as the sum of squares of the residuals.

3. Calculate the contribution of the replicates to this error simply by calculating the average response over the four replicates, subtracting each replicate response from this average and summing the squares of these residuals.

4. Calculate the sum of squares lack-of-fit error by subtracting the value in question 3 from that in question 2.

5. Divide the lack-of-fit and replicate errors by their respective degrees of freedom and comment.
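A minimal NumPy sketch of questions 1–5 follows. Only the last five of the twelve experiments are reproduced above, so it is written as a function: the arrays `x1`, `x2`, `y` and the list `rep_idx` (row indices of the four replicates) are assumed inputs, and both function names are hypothetical. The degrees of freedom assume a single group of replicates, as in this problem.

```python
import numpy as np

def quadratic_design_matrix(x1, x2):
    """Columns 1, x1, x2, x1^2, x2^2, x1*x2, matching b0, b1, b2, b11, b22, b12."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def lack_of_fit_analysis(x1, x2, y, rep_idx):
    """Return (b, ss_resid, ss_rep, ss_lof, df_rep, df_lof).
    rep_idx lists the rows that are replicates of one design point."""
    y = np.asarray(y, dtype=float)
    D = quadratic_design_matrix(x1, x2)
    b = np.linalg.pinv(D) @ y                     # question 1: pseudo-inverse fit
    y_hat = D @ b                                 # question 2: predicted responses
    ss_resid = np.sum((y - y_hat) ** 2)
    reps = y[rep_idx]
    ss_rep = np.sum((reps - reps.mean()) ** 2)    # question 3: replicate error
    ss_lof = ss_resid - ss_rep                    # question 4: lack-of-fit error
    df_rep = len(rep_idx) - 1
    df_lof = len(y) - D.shape[1] - df_rep         # residual dof minus replicate dof
    return b, ss_resid, ss_rep, ss_lof, df_rep, df_lof
```

The mean squares needed in question 5 are then `ss_lof / df_lof` and `ss_rep / df_rep`; with 12 experiments, 6 coefficients and 4 replicates, each has 3 degrees of freedom.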

Problem 2.10 The Application of a Plackett–Burman Design to the Screening of Factors Influencing a Chemical Reaction

Section 2.3.3

The yield of a reaction of the form

A + B → C

is to be studied as influenced by 10 possible experimental conditions, as follows:

Factor	Description	Units	Low	High

x1	% NaOH	%	40	50
x2	Temperature	°C	80	110
x3	Nature of catalyst		A	B
x4	Stirring		Without	With
x5	Reaction time	min	90	210
x6	Volume of solvent	ml	100	200
x7	Volume of NaOH	ml	30	60
x8	Substrate/NaOH ratio	mol/ml	0.5 × 10⁻³	1 × 10⁻³
x9	Catalyst/substrate ratio	mol/ml	4 × 10⁻³	6 × 10⁻³
x10	Reagent/substrate ratio	mol/mol	1	1.25

The design, including an eleventh dummy factor, is as follows, with the observed yields:

Expt No.	x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11	Yield (%)
1	− − − − − − − − − − −	15
2	+ + − + + + − − − + −	42
3	− + + − + + + − − − +	3
4	+ − + + − + + + − − −	57
5	− + − + + − + + + − −	38
6	− − + − + + − + + + −	37
7	− − − + − + + − + + +	74
8	+ − − − + − + + − + +	54
9	+ + − − − + − + + − +	56
10	+ + + − − − + − + + −	64
11	− + + + − − − + − + +	65
12	+ − + + + − − − + − +	59

1. Why is a dummy factor employed? Why is a Plackett–Burman design more desirable than a two level fractional factorial in this case?

2. Verify that all the columns are orthogonal to each other.

3. Set up a design matrix, D, and determine the coefficients b0 to b11.

4. An alternative method for calculating the coefficients for factorial designs such as the Plackett–Burman design is to multiply the yield of each experiment by the level of the corresponding factor, sum these products and divide by 12. Verify that this gives the same answer for factor 1 as using the inverse matrix.

5. A simple method for reducing the number of experimental conditions for further study is to look at the size of the coefficients and eliminate those factors whose coefficients are smaller in magnitude than that of the dummy factor. How many factors remain and what are they?
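The following NumPy sketch works through questions 2–5 using the design and yields from the table above (the `+`/`-` strings are a transcription of that table). Because the 12 × 12 matrix D is orthogonal, with DᵀD = 12I, its inverse is Dᵀ/12, which is why the shortcut in question 4 gives the same coefficients as the inverse.

```python
import numpy as np

# Levels of x1..x11 (x11 is the dummy) for experiments 1-12, and yields (%)
signs = [
    "-----------", "++-+++---+-", "-++-+++---+", "+-++-+++---",
    "-+-++-+++--", "--+-++-+++-", "---+-++-+++", "+---+-++-++",
    "++---+-++-+", "+++---+-++-", "-+++---+-++", "+-+++---+-+",
]
yields = np.array([15, 42, 3, 57, 38, 37, 74, 54, 56, 64, 65, 59], dtype=float)

levels = np.array([[1.0 if s == "+" else -1.0 for s in row] for row in signs])
D = np.column_stack([np.ones(len(levels)), levels])   # intercept + 11 factor columns

# Question 2: off-diagonal elements of D^T D are zero when all columns are orthogonal
gram = D.T @ D

# Question 3: coefficients b0..b11 (D is square and invertible)
b = np.linalg.inv(D) @ yields

# Question 4: sum of (yield x level) for each column, divided by 12
b_shortcut = D.T @ yields / len(yields)

# Question 5: keep real factors whose coefficient exceeds the dummy's in magnitude
dummy = abs(b[11])
kept = [f"x{i}" for i in range(1, 11) if abs(b[i]) > dummy]
print(np.round(b, 3))
print("Factors larger than the dummy:", kept)
```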

Problem 2.11 Use of a Constrained Mixture Design to Investigate the Conductivity of a Molten Salt System

Section 2.5.4 Section 2.5.2.2

A molten salt system consisting of three components is prepared, and the aim is to investigate the conductivity according to the relative proportion of each component. The three components are as follows:

Component	Salt	Lower limit	Upper limit

x1	NdCl₃	0.2	0.9
x2	LiCl	0.1	0.8
x3	KCl	0.0	0.7

The experiment is coded to give pseudo-components so that a value of 1 corresponds to the upper limit, and a value of 0 to the lower limit, of each component. The experimental results are as follows:

z1	z2	z3	Conductivity (Ω⁻¹ cm⁻¹)
1	0	0	3.98
0	1	0	2.63
0	0	1	2.21
0.5	0.5	0	5.54
0.5	0	0.5	4.00
0	0.5	0.5	2.33
0.3333	0.3333	0.3333	3.23

1. Represent the constrained mixture space diagrammatically in the original mixture space. Explain why the constraints are possible and why the new reduced mixture space remains a triangle.

2. Produce a design matrix consisting of seven columns in the true mixture space as follows. The true composition of component 1 is given by $z_1(U_1 - L_1) + L_1$, where $U$ and $L$ are the upper and lower bounds for the component. Convert all three columns of the matrix above using this equation and then set up a design matrix containing three single factor terms and all possible two and three factor interaction terms (using a Scheffé model).

3. Calculate the model linking the conductivity to the proportions of the three salts.

4. Predict the conductivity when the proportions of the salts are 0.209, 0.146 and 0.645.
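A sketch of questions 2–4 in NumPy, assuming the pseudo-component values in the table (including the rounded 0.3333) are used as printed. With seven mixtures and seven Scheffé terms the design matrix is square, so the coefficients are obtained by solving the linear system directly; `scheffe_special_cubic` is a hypothetical helper name.

```python
import numpy as np
from itertools import combinations

lower = np.array([0.2, 0.1, 0.0])   # NdCl3, LiCl, KCl
upper = np.array([0.9, 0.8, 0.7])

# Pseudo-component design (z1, z2, z3) and conductivities from the table above
z = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
    [0.3333, 0.3333, 0.3333],
])
conductivity = np.array([3.98, 2.63, 2.21, 5.54, 4.00, 2.33, 3.23])

# Question 2: true proportions x = z (U - L) + L, applied column by column
x = z * (upper - lower) + lower

def scheffe_special_cubic(x):
    """Columns x1, x2, x3, x1x2, x1x3, x2x3, x1x2x3 (Scheffé form, no intercept)."""
    pairs = [x[:, i] * x[:, j] for i, j in combinations(range(3), 2)]
    return np.column_stack([x, *pairs, x[:, 0] * x[:, 1] * x[:, 2]])

D = scheffe_special_cubic(x)
b = np.linalg.solve(D, conductivity)         # question 3: 7 equations, 7 unknowns

# Question 4: prediction at the stated true proportions
new = np.array([[0.209, 0.146, 0.645]])
print(np.round(b, 3), scheffe_special_cubic(new) @ b)
```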

Problem 2.12 Use of Experimental Design and Principal Components Analysis for Reduction of Number of Chromatographic Tests

Section 2.4.5 Section 4.3.6.4 Section 4.3 Section 4.4.1

The following table gives the results of a number of tests performed on eight chromatographic columns, involving chromatography of eight compounds at pH 3 in a methanol mobile phase and measurement of four peak-shape parameters. Note that you may have to transpose the matrix in Excel for further work. The aim is to reduce the number of experimental tests necessary using experimental design. Each test is denoted by a mnemonic: the first letter (e.g. P) stands for a compound, and the remainder of the name (k, N, N(df) or As) stands for one of the four peak-shape/retention time measurements.

Test	Inertsil ODS	Inertsil ODS-2	Inertsil ODS-3	Kromasil C18	Kromasil C8	Symmetry C18	Supelco ABZ+	Purospher
Pk	0.25	0.19	0.26	0.3	0.28	0.54	0.03	0.04
PN	10 200	6930	7420	2980	2890	4160	6890	6960
PN(df)	2650	2820	2320	293	229	944	3660	2780
PAs	2.27	2.11	2.53	5.35	6.46	3.13	1.96	2.08
Nk	0.25	0.12	0.24	0.22	0.21	0.45	0	0
NN	12 000	8370	9460	13 900	16 800	4170	13 800	8260
NN(df)	6160	4600	4880	5330	6500	490	6020	3450
NAs	1.73	1.82	1.91	2.12	1.78	5.61	2.03	2.05
Ak	2.6	1.69	2.82	2.76	2.57	2.38	0.67	0.29
AN	10 700	14 400	11 200	10 200	13 800	11 300	11 700	7160
AN(df)	7790	9770	7150	4380	5910	6380	7000	2880
AAs	1.21	1.48	1.64	2.03	2.08	1.59	1.65	2.08
Ck	0.89	0.47	0.95	0.82	0.71	0.87	0.19	0.07
CN	10 200	10 100	8500	9540	12 600	9690	10 700	5300
CN(df)	7830	7280	6990	6840	8340	6790	7250	3070
CAs	1.18	1.42	1.28	1.37	1.58	1.38	1.49	1.66
Qk	12.3	5.22	10.57	8.08	8.43	6.6	1.83	2.17
QN	8800	13 300	10 400	10 300	11 900	9000	7610	2540
QN(df)	7820	11 200	7810	7410	8630	5250	5560	941
QAs	1.07	1.27	1.51	1.44	1.48	1.77	1.36	2.27
Bk	0.79	0.46	0.8	0.77	0.74	0.87	0.18	0
BN	15 900	12 000	10 200	11 200	14 300	10 300	11 300	4570
BN(df)	7370	6550	5930	4560	6000	3690	5320	2060
BAs	1.54	1.79	1.74	2.06	2.03	2.13	1.97	1.67
Dk	2.64	1.72	2.73	2.75	2.27	2.54	0.55	0.35
DN	9280	12 100	9810	7070	13 100	10 000	10 500	6630
DN(df)	5030	8960	6660	2270	7800	7060	7130	3990
DAs	1.71	1.39	1.6	2.64	1.79	1.39	1.49	1.57
Rk	8.62	5.02	9.1	9.25	6.67	7.9	1.8	1.45
RN	9660	13 900	11 600	7710	13 500	11 000	9680	5140
RN(df)	8410	10 900	7770	3460	9640	8530	6980	3270
RAs	1.16	1.39	1.65	2.17	1.5	1.28	1.41	1.56

1. Transpose the data so that the 32 tests correspond to columns of a matrix (variables) and the eight chromatographic columns to the rows of a matrix (objects). Standardise each column by subtracting the mean and dividing by the population standard deviation (Chapter 4, Section 4.3.6.4). Why is it important to standardise these data?

2. Perform PCA (principal components analysis) on these data and retain the first three loadings (methods for performing PCA are discussed in Chapter 4, Section 4.3; see also Appendix A.2.1 and relevant sections of Appendices A.4 and A.5 if you are using Excel or Matlab).

3. Take the three loadings vectors and transform them to a common scale as follows. For each loadings vector, select the most positive and most negative values, and code these to +1 and −1, respectively. Scale all the intermediate values in a similar fashion, leading to a new scaled loadings matrix of 32 columns and 3 rows. Produce the new scaled loadings vectors.

4. Select a factorial design as follows, with one extra point in the centre, to obtain a range of tests which is a representative subset of the original tests:

Design point	PC1	PC2	PC3

1	−	−	−
2	+	−	−
3	−	+	−
4	+	+	−
5	−	−	+
6	+	−	+
7	−	+	+
8	+	+	+
9	0	0	0

Calculate the Euclidean distance of each of the 32 scaled loadings from each of the nine design points; for example, for the first design point, calculate the Euclidean distance of the loadings scaled as in question 3 from the point (−1, −1, −1) using the equation

$$d_1 = \sqrt{(p_{11} + 1)^2 + (p_{12} + 1)^2 + (p_{13} + 1)^2}$$

(Chapter 4, Section 4.4.1).

5. Indicate the chromatographic parameters closest to the nine design points. Hence recommend a reduced number of chromatographic tests and comment on the strategy.
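The steps in questions 1–5 can be sketched in NumPy as below. Assumptions: PCA is carried out through the singular value decomposition of the standardised matrix (one of several equivalent routes), the "similar fashion" scaling of question 3 is taken to be a linear mapping of each loadings vector onto [−1, +1], and all function names are hypothetical. The 8 × 32 matrix `X` (rows = chromatographic columns, columns = tests) and the list of 32 mnemonics are left for you to enter from the table. The sign of each loadings vector from an SVD is arbitrary, so + and − design points may swap relative to another package.

```python
import numpy as np

def standardise(X):
    """Subtract column means and divide by the population standard deviation (ddof=0)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def first_loadings(Z, n_components=3):
    """Loadings (n_components x n_variables): the leading rows of V^T in Z = U S V^T."""
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return vt[:n_components]

def range_scale(P):
    """Linearly map each row so its most negative value -> -1 and most positive -> +1."""
    pmin = P.min(axis=1, keepdims=True)
    pmax = P.max(axis=1, keepdims=True)
    return 2 * (P - pmin) / (pmax - pmin) - 1

# Question 4: 2^3 factorial in (PC1, PC2, PC3), in the table's order, plus the centre
design = np.array([[p1, p2, p3] for p3 in (-1, 1) for p2 in (-1, 1) for p1 in (-1, 1)]
                  + [[0, 0, 0]])

def nearest_tests(scaled, names):
    """For each design point, the name of the test whose scaled loadings are closest."""
    points = scaled.T                                    # one (PC1, PC2, PC3) row per test
    d = np.linalg.norm(points[None, :, :] - design[:, None, :], axis=2)   # 9 x n_tests
    return [names[k] for k in d.argmin(axis=1)]

# Usage, once X (8 x 32) and names (32 mnemonics) have been typed in from the table:
# scaled = range_scale(first_loadings(standardise(X)))
# print(nearest_tests(scaled, names))
```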

Problem 2.13 A Mixture Design with Constraints

Section 2.5.4

It is desired to perform a three factor mixture design with constraints on each factor as follows:

	x1	x2	x3
Lower	0.0	0.2	0.3
Upper	0.4	0.6	0.7

1. The mixture design is normally represented as an irregular polygon with, in this case, six vertices. Calculate the percentage of each factor at the six vertices.

2. It is desired to perform 13 experiments, namely at the six corners, at the midpoints of the six edges and in the centre. Produce a table of the 13 mixtures.

3. Represent the experiment diagrammatically.
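For three components, each vertex of the constrained region lies where two components sit at one of their bounds and the third makes up the remainder, so the vertices can be found by enumeration. The sketch below (hypothetical function names) orders the vertices by angle about their mean so that consecutive vertices share an edge, then adds the edge midpoints and a centre point; taking the centre as the mean of the six vertices is an assumption.

```python
import itertools
import numpy as np

lower = np.array([0.0, 0.2, 0.3])   # x1, x2, x3 lower limits
upper = np.array([0.4, 0.6, 0.7])   # x1, x2, x3 upper limits

def vertices(lower, upper, tol=1e-9):
    """Vertices of {x : x1 + x2 + x3 = 1, lower <= x <= upper}, ordered round the polygon."""
    found = []
    for i, j in itertools.combinations(range(3), 2):
        k = 3 - i - j                               # index of the remaining component
        for xi in (lower[i], upper[i]):
            for xj in (lower[j], upper[j]):
                x = np.empty(3)
                x[i], x[j], x[k] = xi, xj, 1.0 - xi - xj
                if lower[k] - tol <= x[k] <= upper[k] + tol:
                    found.append(np.round(x, 10))
    pts = np.unique(np.array(found), axis=0)        # drop duplicate vertices
    centre = pts.mean(axis=0)
    # angle about the mean in the (x1, x2) projection gives the order round the polygon
    angles = np.arctan2(pts[:, 1] - centre[1], pts[:, 0] - centre[0])
    return pts[np.argsort(angles)]

def thirteen_point_design(lower, upper):
    """Six vertices, six edge midpoints and the centre (mean of the vertices)."""
    v = vertices(lower, upper)
    mids = (v + np.roll(v, -1, axis=0)) / 2         # each vertex with its neighbour
    return np.vstack([v, mids, v.mean(axis=0)])

print(np.round(100 * thirteen_point_design(lower, upper), 1))   # percentages
```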

Problem 2.14 Construction of Five Level Calibration Designs

Section 2.3.4

The aim is to construct a five level partial factorial (or calibration) design involving 25 experiments and up to 14 factors, each at levels −2, −1, 0, 1 and 2. Note that this design is only one of many possible such designs.

1. Construct the experimental conditions for the first factor using the following rules.

• The first experiment is at level −2.

• This level is repeated for experiments 2, 8, 14 and 20.

• The levels for experiments 3–7 are as follows: (0, 2, 0, 0, 1).

• A cyclic permuter of the form 0 → −1 → 1 → 2 → 0 is then used. The blocks of experiments 9–13, 15–19 and 21–25 are each obtained from the previous block by applying this permuter once, so experiments 9 and 10 are at levels −1 and 0, for example.

2. Construct the experimental conditions for the other 13 factors as follows.

• Experiment 1 is always at level −2 for all factors.

• The conditions for experiments 2–24 for the other factors are simply the cyclic permutation of the previous factor as explained in Section 2.3.4.

So produce the matrix of experimental conditions.

3. What is the difference vector used in this design?

4. Calculate the correlation coefficients between all pairs of factors 1–14. Plot the two graphs of the levels of factor 1 versus factors 2 and 7. Comment.
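A sketch of questions 1, 2 and 4 in NumPy. Two details are assumptions rather than statements from the text: the cyclic shift for each new factor runs over experiments 2–25 (so that every experiment after the first takes part), and the shift moves each level down by one experiment. Changing the direction or the range changes the matrix but not the recipe. Function names are hypothetical.

```python
import numpy as np

def first_factor():
    """Levels of factor 1 for experiments 1-25, built from the rules in question 1."""
    permute = {0: -1, -1: 1, 1: 2, 2: 0}        # cyclic permuter 0 -> -1 -> 1 -> 2 -> 0
    block = [0, 2, 0, 0, 1]                      # experiments 3-7
    column = [-2, -2] + block                    # experiments 1 and 2 are at -2
    for _ in range(3):                           # blocks 9-13, 15-19 and 21-25
        block = [permute[v] for v in block]
        column += [-2] + block                   # experiments 8, 14 and 20 are at -2
    return np.array(column)

def calibration_design(n_factors=14):
    """Each further factor repeats the previous one with experiments 2-25 shifted
    cyclically down by one place (assumed range and direction); experiment 1 stays at -2."""
    columns = [first_factor()]
    for _ in range(n_factors - 1):
        new = columns[-1].copy()
        new[1:] = np.roll(columns[-1][1:], 1)
        columns.append(new)
    return np.column_stack(columns)

design = calibration_design()                    # 25 experiments x 14 factors
r = np.corrcoef(design, rowvar=False)           # 14 x 14 correlations between factors
print(design)
print(np.round(r, 2))
```

The two plots in question 4 can be drawn from `design[:, 0]` against `design[:, 1]` and `design[:, 6]`.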

Problem 2.15 A Four Component Mixture Design Used for Blending of Olive Oils

Section 2.5.2.2

Fourteen blends of olive oils from four cultivars, A–D, are prepared according to the design below, which also gives a taste panel score for each blend. The higher the score, the better the taste of the olive oil.

A	B	C	D	Score

1	0	0	0	6.86
0	1	0	0	6.50
0	0	1	0	7.29
0	0	0	1	5.88
0.5	0.5	0	0	7.31
0.5	0	0.5	0	6.94
0.5	0	0	0.5	7.38
0	0.5	0.5	0	7.00
0	0.5	0	0.5	7.13
0	0	0.5	0.5	7.31
0.333 33	0.333 33	0.333 33	0	7.56
0.333 33	0.333 33	0	0.333 33	7.25
0.333 33	0	0.333 33	0.333 33	7.31
0	0.333 33	0.333 33	0.333 33	7.38

1. It is desired to produce a model containing 14 terms, namely four linear, six two component and four three component terms. What is the equation for this model?

2. Set up the design matrix and calculate the coefficients.

3. A good way to visualise the data is via contours in a mixture triangle, allowing three components to vary and constraining the fourth to be constant. Using a step size of 0.05, calculate the estimated responses from the model in question 2 when D is absent and A + B + C = 1. A table of 231 numbers should be produced. Using a contour plot, visualise these data. If you use Excel, the upper right-hand half of the plot may contain meaningless data; to remove these, simply cover up this part of the contour plot with a white triangle. In modern versions of Matlab and some other software packages, triangular contour plots can be obtained straightforwardly. Comment on the optimal blend using the contour plot when D is absent.

4. Repeat the contour plot in question 3 for the following: (i) A + B + D = 1, (ii) B + C + D = 1 and (iii) A + C + D = 1, and comment.

5. Why, in this example, is a strategy of visualisation of the mixture contours probably more informative than calculating a single optimum?
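The model of questions 1–3 can be sketched in NumPy as follows, using the 14 blends and scores from the table (with 0.333 33 as printed). Because there are exactly 14 blends for 14 coefficients, the system is solved directly. The grid uses integer steps of 1/20 so that A + B + C is exactly 1; `mixture_terms` is a hypothetical helper.

```python
import numpy as np
from itertools import combinations

third = 0.33333
blends = np.array([
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
    [0.5, 0.5, 0, 0], [0.5, 0, 0.5, 0], [0.5, 0, 0, 0.5],
    [0, 0.5, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 0.5, 0.5],
    [third, third, third, 0], [third, third, 0, third],
    [third, 0, third, third], [0, third, third, third],
])
score = np.array([6.86, 6.50, 7.29, 5.88, 7.31, 6.94, 7.38,
                  7.00, 7.13, 7.31, 7.56, 7.25, 7.31, 7.38])

def mixture_terms(X):
    """14 Scheffé terms: A, B, C, D; AB, AC, AD, BC, BD, CD; ABC, ABD, ACD, BCD."""
    cols = [X[:, i] for i in range(4)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(4), 2)]
    cols += [X[:, i] * X[:, j] * X[:, k] for i, j, k in combinations(range(4), 3)]
    return np.column_stack(cols)

b = np.linalg.solve(mixture_terms(blends), score)   # question 2: 14 x 14 system

# Question 3: step 0.05 on A + B + C = 1 with D absent (231 points)
steps = 20
grid = np.array([[a, bb, steps - a - bb, 0] for a in range(steps + 1)
                 for bb in range(steps + 1 - a)]) / steps
pred = mixture_terms(grid) @ b
print(np.round(b, 3))
```

For question 4, move the zero to the position of the absent cultivar (for example `[a, bb, 0, steps - a - bb]` for A + B + D = 1). The 231 values in `pred` can then be passed to any triangular contour routine.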

Problem 2.16 Central Composite Design Used to Study the Extraction of Olive Seeds in a Soxhlet

Section 2.4 Section 2.2.2

Three factors, namely (1) irradiation power as a percentage, (2) irradiation time in seconds and (3) number of cycles, are used to study the focused microwave assisted Soxhlet extraction of olive oil seeds, the response measuring the percentage recovery, which is to be optimised. A central composite design is set up to perform the experiments.

The results are as follows, using coded values of the variables:

Factor 1	Factor 2	Factor 3	Response

−1	−1	−1	46.64
−1	−1	1	47.23
−1	1	−1	45.51
−1	1	1	48.58
1	−1	−1	42.55
1	−1	1	44.68
1	1	−1	42.01
1	1	1	43.03
−1	0	0	49.18
1	0	0	44.59
0	−1	0	49.22
0	1	0	47.89
0	0	−1	48.93
0	0	1	49.93
0	0	0	50.51
0	0	0	49.33
0	0	0	49.01
0	0	0	49.93
0	0	0	49.63
0	0	0	50.54

1. A 10 parameter model is to be fitted to the data, consisting of the intercept, all single factor linear and quadratic terms and all two factor interaction terms. Set up the design matrix and, using the pseudo-inverse, calculate the coefficients of the model using coded values.

2. The true values of the factors are as follows:

Variable	−1	+1
Power (%)	30	60
Time (s)	20	30
Cycles	5	7

Re-express the model in question 1 in terms of the true values of each variable, rather than the coded values.

3. Using the model in question 1 and the coded design matrix, calculate the 20 predicted responses and the total error sum of squares for the 20 experiments.

4. Determine the sum of squares replicate error as follows: (i) calculate the mean response for the six replicates; (ii) calculate the difference between each replicate response and this average, square these differences and sum the six numbers.

5. Determine the sum of squares lack-of-fit error as follows: (i) replace the six replicate responses by the average response for the replicates; (ii) using the 20 responses (with the replicates averaged) and the corresponding predicted responses, calculate the differences, square them and sum them.

6. Verify that the sums of squares in questions 4 and 5 add up to the total error obtained in question 3.

7. How many degrees of freedom are available for assessment of the replicate and lack-of-fit errors? Using this information, comment on whether the lack-of-fit is significant, and hence whether the model is adequate.

8. The significance of each term can be determined by omitting the term from the overall model. Assess the significance of the linear term due to the first factor and the interaction term between the first and third factors in this way. Calculate a new design matrix with nine rather than ten columns, removing the relevant column, and also remove the corresponding coefficient from the equation. Determine the new predicted responses using nine terms, and calculate the increase in sum of squares error over that obtained in question 3. Comment on the significance of these two terms.

9. Using coded values, determine the optimum conditions as follows. Discard the two interaction terms that are least significant, resulting in eight remaining terms in the equation. Obtain the partial derivatives with respect to each of the three variables, and set up three equations equal to zero. Show that the optimum value of the third factor is given by $-b_3/(2b_{33})$, where the coefficients correspond to the linear and quadratic terms in the equation. Hence calculate the optimum coded values for each of the three factors.

10. Determine the optimum true values corresponding to the conditions obtained in question 9. What is the percentage recovery at this optimum? Comment.
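A NumPy sketch of questions 1 and 3–8 using the coded design and responses from the table. The column order of the design matrix (intercept, linear, quadratic, then x1x2, x1x3, x2x3) is a choice made here, so column 8 is the x1x3 interaction. Following question 8 literally, the reduced model keeps the other nine coefficients unchanged instead of refitting; `ccd_terms` and `ss_increase_without` are hypothetical names.

```python
import numpy as np
from itertools import combinations

# Coded levels of factors 1-3 and percentage recovery, copied from the table above
X = np.array([
    [-1, -1, -1], [-1, -1, 1], [-1, 1, -1], [-1, 1, 1],
    [1, -1, -1], [1, -1, 1], [1, 1, -1], [1, 1, 1],
    [-1, 0, 0], [1, 0, 0], [0, -1, 0], [0, 1, 0], [0, 0, -1], [0, 0, 1],
    [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],
], dtype=float)
y = np.array([46.64, 47.23, 45.51, 48.58, 42.55, 44.68, 42.01, 43.03,
              49.18, 44.59, 49.22, 47.89, 48.93, 49.93,
              50.51, 49.33, 49.01, 49.93, 49.63, 50.54])

def ccd_terms(X):
    """Columns: 1, x1, x2, x3, x1^2, x2^2, x3^2, x1x2, x1x3, x2x3."""
    inter = [X[:, i] * X[:, j] for i, j in combinations(range(3), 2)]
    return np.column_stack([np.ones(len(X)), X, X ** 2, *inter])

D = ccd_terms(X)
b = np.linalg.pinv(D) @ y                          # question 1
y_hat = D @ b                                      # question 3
ss_total = np.sum((y - y_hat) ** 2)

centre = np.all(X == 0, axis=1)                    # the six replicates
y_centre_mean = y[centre].mean()
ss_rep = np.sum((y[centre] - y_centre_mean) ** 2)  # question 4
y_avg = np.where(centre, y_centre_mean, y)         # question 5
ss_lof = np.sum((y_avg - y_hat) ** 2)

df_rep = int(centre.sum()) - 1                     # 6 replicates -> 5
df_lof = len(y) - D.shape[1] - df_rep              # 20 - 10 - 5 = 5

def ss_increase_without(column):
    """Question 8: drop one column and its coefficient (no refit) and return
    the increase in error sum of squares relative to ss_total."""
    y_red = np.delete(D, column, axis=1) @ np.delete(b, column)
    return np.sum((y - y_red) ** 2) - ss_total

increase_x1 = ss_increase_without(1)               # linear term of factor 1
increase_x1x3 = ss_increase_without(8)             # x1x3 interaction
print(np.round(b, 3))
print("MS lack-of-fit:", ss_lof / df_lof, " MS replicate:", ss_rep / df_rep)
```

Because the least-squares residuals are orthogonal to every column of D, the increase for a dropped term equals its coefficient squared times the sum of squares of its column, e.g. 10·b1² for the linear term of factor 1.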

Problem 2.17 A Three Component Mixture Design

Section 2.5.2

A three factor simplex centroid mixture design is performed, with the following results:

x1	x2	x3	Response

1	0	0	9
0	1	0	12
0	0	1	17
0.5	0.5	0	3
0.5	0	0.5	18
0	0.5	0.5	14
0.3333	0.3333	0.3333	11

1. A seven term model consisting of three linear terms, three two factor interaction terms and one three factor interaction term is fitted to the data. Give the equation for this model, compute the design matrix and calculate the coefficients.

2. Instead of seven terms, it is decided to fit the model only to the three linear terms. Calculate these coefficients using only three terms in the model, employing the pseudo-inverse. Determine the root mean square error for the predicted responses, and comment on the difference in the linear terms in question 1 and the significance of the interaction terms.

3. It is possible to convert the model of question 1 to a seven term model in two independent factors, consisting of two linear terms, two quadratic terms, two linear interaction terms and a quadratic term of the form $x_1 x_2 (x_1 + x_2)$. Show how the models relate algebraically.

4. For the model in question 3, set up the design matrix, calculate the new coefficients and show how these relate to the coefficients calculated in question 1 using the relationship obtained in question 3.

5. The matrices in questions 1, 2 and 4 all have inverses. However, a model that consisted of an intercept term and three linear terms would not, and it is impossible to use regression analysis to fit the data under such circumstances. Explain these observations.
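A sketch of questions 1, 2 and 5 in NumPy. The centroid is entered as exactly 1/3 (an assumption; the table prints 0.3333), which keeps every row summing to one so that the rank deficiency in question 5 shows up cleanly.

```python
import numpy as np

third = 1 / 3   # the table gives 0.3333; exact 1/3 is assumed here
X = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],
    [third, third, third],
])
y = np.array([9, 12, 17, 3, 18, 14, 11], dtype=float)
x1, x2, x3 = X.T

# Question 1: seven-term Scheffé model with coefficients b1, b2, b3, b12, b13, b23, b123
D7 = np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])
b7 = np.linalg.solve(D7, y)

# Question 2: linear terms only, fitted by least squares via the pseudo-inverse
b3 = np.linalg.pinv(X) @ y
rms = np.sqrt(np.mean((y - X @ b3) ** 2))

# Question 5: because x1 + x2 + x3 = 1 in every row, an intercept column equals the
# sum of the three linear columns, so this design matrix is rank deficient
D_int = np.column_stack([np.ones(len(y)), X])
rank = np.linalg.matrix_rank(D_int)
print(np.round(b7, 3), np.round(b3, 3), round(rms, 3), rank)
```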

Chemometrics: Data Analysis for the Laboratory and Chemical Plant.

Richard G. Brereton

ISBNs: 0-471-48977-8 (HB); 0-471-48978-6 (PB)

3 Signal Processing

3.1 Sequential Signals in Chemistry

Sequential signals are surprisingly widespread in chemistry, and require a large number of methods for analysis. Most data are obtained via computerised instruments such as those for NIR, HPLC or NMR, and raw information such as peak integrals, peak shifts and positions is often dependent on how the information from the computer is first processed. An appreciation of this step is essential prior to applying further multivariate methods such as pattern recognition or classification. Spectra and chromatograms are examples of series that are sequential in time or frequency. However, time series also occur very widely in other areas of chemistry, for example in the area of industrial process control and natural processes.

3.1.1 Environmental and Geological Processes

An important source of data involves recording samples regularly with time. Classically such time series occur in environmental chemistry and geochemistry. A river might be sampled for the presence of pollutants such as polyaromatic hydrocarbons or heavy metals at different times of the year. Is there a trend, and can this be related to seasonal factors? Different and fascinating processes occur in rocks, where depth in the sediment relates to burial time. For example, isotope ratios are a function of climate, as relative evaporation rates of different isotopes are temperature dependent: certain specific cyclical changes in the Earth's rotation have resulted in the Ice Ages, and the resulting climate changes leave a systematic chemical record. A whole series of methods for time series analysis based primarily on the idea of correlograms (Section 3.4) can be applied to explore such types of cyclicity, which are often hard to elucidate. Many of these approaches were first used by economists and geologists, who also encounter related problems.

One of the difficulties is that long-term and interesting trends are often buried within short-term random fluctuations. Statisticians distinguish between various types of noise which interfere with the signal, as discussed in Section 3.2.3. Interestingly, the statistician Herman Wold, who is known among many chemometricians for the early development of the partial least squares algorithm, is probably more famous for his work on time series, studying this precise problem.

In addition to obtaining correlograms, a large battery of methods is available to smooth time series, many based on so-called 'windows', whereby data are smoothed over a number of points in time. A simple method is to take the average reading over five points in time, but sometimes this could miss out important information about cyclicity, especially for a process that is sampled slowly compared with the rate of oscillation. A number of linear filters have been developed which are applicable to this type of data (Section 3.3), this procedure often being described as convolution.
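The five-point average mentioned above is a convolution of the series with a flat window of five equal weights. The minimal NumPy sketch below (hypothetical function name) also shows the risk described in the text: an oscillation whose period is exactly five samples averages to zero and disappears from the smoothed series.

```python
import numpy as np

def moving_average(series, window=5):
    """Average over `window` consecutive points (a flat convolution window).
    Returns len(series) - window + 1 values; for an odd window no smoothed
    value is produced for the first and last window // 2 points."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(series, dtype=float), kernel, mode="valid")

t = np.arange(50)
cycle = np.sin(2 * np.pi * t / 5)                # oscillation with a period of 5 samples
print(np.max(np.abs(moving_average(cycle))))     # essentially zero: the cycle is lost
```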
