
Compute the Component and Calculate Residuals
7. Subtract the effect of the new PC from the data matrix to obtain a residual data matrix:
   resid X = X − t.p
Further PCs
8. If it is desired to compute further PCs, substitute the residual data matrix for X and go to step 2.
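By way of illustration, a minimal numpy sketch of this loop, assuming the standard NIPALS updates for t and p from the earlier steps; the function and variable names are illustrative, not from the text:

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """NIPALS PCA: extract one PC at a time, deflating X after each (steps 7-8)."""
    X = np.asarray(X, dtype=float).copy()
    scores, loadings = [], []
    for _ in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()  # start from the most variable column
        for _ in range(max_iter):
            p = t @ X / (t @ t)          # loadings: p = t'.X / sum(t^2)
            p /= np.sqrt(p @ p)          # scale p to unit length
            t_new = X @ p                # scores: t = X.p (p has unit length)
            if np.sum((t_new - t) ** 2) < tol:
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)           # step 7: resid X = X - t.p
        scores.append(t)                 # step 8: continue with the residuals
        loadings.append(p)
    return np.array(scores).T, np.array(loadings)
```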
A.2.2 PLS1
There are several implementations; the one below is noniterative.
Initialisation
1. Take a matrix Z and, if required, preprocess (e.g. mean centre or standardise) to give the matrix X which is used for PLS.
2. Take the concentration vector k and preprocess it to give the vector c which is used for PLS. Note that if the data matrix Z is centred down the columns, the concentration vector must also be centred. Generally, centring is the only form of preprocessing useful for PLS1. Start with an estimate of ĉ that is a vector of 0s (equal to the mean concentration if the vector is already centred).
New PLS Component
3. Calculate the vector
   h = X'.c
4. Calculate the scores, which are simply given by
   t = X.h / Σh²
5. Calculate the x loadings by
   p = t'.X / Σt²
6. Calculate the c loading (a scalar) by
   q = c'.t / Σt²
Compute the Component and Calculate Residuals
7. Subtract the effect of the new PLS component from the data matrix to get a residual
data matrix:
resid X = X − t.p
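For reference, these steps translate directly into numpy; a sketch for a single component, assuming X and c have already been preprocessed as in steps 1 and 2 (names are illustrative):

```python
import numpy as np

def pls1_component(X, c):
    """One PLS1 component from centred X (I x J) and centred c (I,). Steps 3-7."""
    h = X.T @ c                      # step 3: h = X'.c
    t = X @ h / (h @ h)              # step 4: t = X.h / sum(h^2)
    p = t @ X / (t @ t)              # step 5: p = t'.X / sum(t^2)
    q = (c @ t) / (t @ t)            # step 6: q = c'.t / sum(t^2), a scalar
    X_resid = X - np.outer(t, p)     # step 7: resid X = X - t.p
    return t, p, q, X_resid
```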

A.2.3 PLS2
5. Calculate the guessed scores by
   t̂ = X.h / Σh²
6. Calculate the guessed x loadings by
   p̂ = t̂'.X / Σt̂²
7. Calculate the c loadings (a vector rather than a scalar in PLS2) by
   q̂ = C'.t̂ / Σt̂²
8. If this is the first iteration, remember the scores, and call them initial t̂, then produce a new vector u by
   u = C.q̂ / Σq̂²
   and return to step 4.
Check for Convergence
9. If this is the second time round, compare the new and old scores vectors, for example by looking at the size of the sum of square differences between the old and new scores, i.e. Σ(initial t̂ − new t̂)². If this is small, the PLS component has been adequately modelled; set the PLS scores (t) and both types of loadings (p and q) for the current component to t̂, p̂ and q̂. Otherwise, calculate a new value of u as in step 8 and return to step 4.
Compute the Component and Calculate Residuals
10. Subtract the effect of the new PLS component from the data matrix to obtain a residual data matrix:
   resid X = X − t.p
11. Determine the new concentration estimates by
   new Ĉ = initial Ĉ + t.q
   and sum the contributions of all components calculated to give an estimated Ĉ. Calculate
   resid C = true C − Ĉ
Further PLS Components
12. If further components are required, replace both X and C by the residuals and return to step 3.
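Gathering these steps into code, a minimal numpy sketch of one component; it assumes, by analogy with PLS1 and standard PLS2 implementations, that step 4 (not reproduced above) forms h = X'.u and that u is initially set to a column of C. All names are illustrative:

```python
import numpy as np

def pls2_component(X, C, tol=1e-12, max_iter=500):
    """One PLS2 component from centred X (I x J) and C (I x N). Steps 4-11."""
    u = C[:, 0].copy()                   # assumed starting guess: a column of C
    t_old = None
    for _ in range(max_iter):
        h = X.T @ u                      # assumed step 4: h = X'.u
        t = X @ h / (h @ h)              # step 5: t = X.h / sum(h^2)
        p = t @ X / (t @ t)              # step 6: p = t'.X / sum(t^2)
        q = C.T @ t / (t @ t)            # step 7: q = C'.t / sum(t^2), a vector
        u = C @ q / (q @ q)              # step 8: u = C.q / sum(q^2)
        if t_old is not None and np.sum((t - t_old) ** 2) < tol:
            break                        # step 9: the scores have converged
        t_old = t
    X_resid = X - np.outer(t, p)         # step 10: resid X = X - t.p
    C_resid = C - np.outer(t, q)         # step 11: resid C = true C - t.q
    return t, p, q, X_resid, C_resid
```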

A.2.4 Tri-linear PLS1
The algorithm below is based closely on PLS1 and is suitable when the concentration data consist of a single vector c, i.e. one analyte.
Initialisation
1. Take a three-way tensor Z and, if required, preprocess (e.g. mean centre or standardise) to give the tensor X which is used for PLS. Perform all preprocessing on this tensor. The tensor has dimensions I × J × K.
2. Preprocess the concentrations if appropriate to give a vector c.
New PLS Component
3. From the original tensor, create a new matrix H with dimensions J × K which is the sum of each of the I matrices for each of the samples multiplied by the concentration of the analyte for the relevant sample, i.e.
   H = X1c1 + X2c2 + ··· + XIcI
or, as a summation,
   hjk = Σi ci xijk   (i = 1 to I)
4. Perform PCA on H to obtain the scores and loadings, ht and hp, for the first PC of H. Note that only the first PC is retained, and for each new PLS component a fresh H matrix is obtained.
5. Calculate the two x loadings for the current PLS component of the overall dataset by normalising the scores and loadings of H, i.e.
   jp = ht / √(Σht²)
   kp = hp / √(Σhp²)
(the second step is generally not necessary for most PCA algorithms, as hp is usually normalised).
6. Calculate the overall scores by
   ti = Σj Σk xijk jpj kpk   (j = 1 to J, k = 1 to K)
7. Calculate the c loadings vector
   q = (T'.T)⁻¹.T'.c

where T is the scores matrix, each column consisting of one component (a vector for the first PLS component).
Compute the Component and Calculate Residuals
8. Subtract the effect of the new PLS component from the original data matrix to obtain a residual data matrix (for each sample i):
   resid Xi = Xi − ti.jp.kp'
9. Determine the new concentration estimates by
   ĉ = T.q
Calculate
   resid c = true c − ĉ
Further PLS Components
10. If further components are required, replace both X and c by the residuals and return to step 3.
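A compact numpy sketch of one tri-linear PLS1 component (steps 3–9), using the SVD for the PCA of H in step 4; the T_prev bookkeeping and all names are illustrative assumptions:

```python
import numpy as np

def trilinear_pls1_component(X, c, T_prev):
    """One tri-linear PLS1 component from tensor X (I x J x K) and vector c (I,).

    T_prev is a list of the score vectors from earlier components ([] at first)."""
    H = np.einsum('i,ijk->jk', c, X)            # step 3: H = X1.c1 + ... + XI.cI
    U, s, Vt = np.linalg.svd(H)                 # step 4: first PC of H via the SVD
    ht, hp = s[0] * U[:, 0], Vt[0]              # scores and loadings of H
    jp = ht / np.sqrt(ht @ ht)                  # step 5: normalise to unit length
    kp = hp / np.sqrt(hp @ hp)                  # (hp from the SVD is already unit length)
    t = np.einsum('ijk,j,k->i', X, jp, kp)      # step 6: t_i = sum_jk x_ijk jp_j kp_k
    T = np.column_stack(T_prev + [t])
    q = np.linalg.solve(T.T @ T, T.T @ c)       # step 7: q = (T'.T)^-1 . T'.c
    X_resid = X - t[:, None, None] * np.outer(jp, kp)  # step 8: resid Xi = Xi - ti.jp.kp'
    c_resid = c - T @ q                         # step 9: resid c = true c - c_hat
    return t, jp, kp, q, X_resid, c_resid
```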
A.3 Basic Statistical Concepts
There are numerous texts on basic statistics, some of them oriented towards chemists. It is not the aim of this section to provide a comprehensive background, but simply to provide the main definitions and tables that are helpful for using this text.
A.3.1 Descriptive Statistics
A.3.1.1 Mean
The mean of a series of measurements is defined by
   x̄ = Σi xi / I   (i = 1 to I)
Conventionally a bar is placed above the letter. Sometimes the letter m is used, but in this text we will avoid this, as m is often used to denote an index. Hence the mean of the measurements
4, 8, 5, −6, 2, −5, 6, 0
is x̄ = (4 + 8 + 5 − 6 + 2 − 5 + 6 + 0)/8 = 1.75.
Statistically, this sample mean is often considered an estimate of the true population mean, sometimes denoted by µ. The population involves all possible samples, whereas only a selection is observed. In some cases in chemometrics this distinction is not so
clear; for example, the mean intensity at a given wavelength over a chromatogram is a purely experimental variable.
A.3.1.2 Variance and Standard Deviation
The estimated or sample variance of a series of measurements is defined by
   ν = Σi (xi − x̄)² / (I − 1)
i=1
which can also be calculated using the equation
   ν = Σi xi² / (I − 1) − x̄² × I / (I − 1)
So the variance of the data in Section A.3.1.1 is
   ν = (4² + 8² + 5² + (−6)² + 2² + (−5)² + 6² + 0²)/7 − 1.75² × 8/7 = 25.928
This equation is useful when it is required to estimate the variance from a series of samples. However, the true population variance is defined by
   ν = Σi (xi − x̄)² / I = Σi xi² / I − x̄²
The reason why there is a factor of I − 1 when using measurements on a number of samples to estimate statistics is that one degree of freedom is lost when determining the variance experimentally. For example, if we record only one sample, the sum of squares Σi (xi − x̄)² must equal 0, but this does not imply that the variance of the parent population is 0. As the number of samples increases, this small correction becomes less important and is sometimes ignored.
The standard deviation, s, is simply the square root of the variance. The population standard deviation is sometimes denoted by σ.
In chemometrics it is usual to use the population and not the sample standard deviation for standardising a data matrix. The reason is that we are not trying to estimate parameters in this case, but just to put different variables on a similar scale.
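These formulas correspond directly to numpy's ddof argument; a quick illustrative check of the worked example above (not from the text):

```python
import numpy as np

x = np.array([4, 8, 5, -6, 2, -5, 6, 0])

print(x.mean())        # 1.75, the mean from Section A.3.1.1
print(x.var(ddof=1))   # sample variance, divisor I - 1: about 25.93
print(x.var(ddof=0))   # population variance, divisor I
print(x.std(ddof=0))   # population standard deviation, as used for standardising
```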
A.3.1.3 Covariance and Correlation Coefficient
The covariance between two variables is a measure of how closely they follow similar trends. Its magnitude will never exceed the geometric mean of the variances of the two variables; the lower its magnitude, the less closely the trends match. Both variables must be measured for an identical number of samples, I in this case. The sample or estimated covariance between variables x and y is defined by
   covxy = Σi (xi − x̄)(yi − ȳ) / (I − 1)

whereas the population statistic is given by
   covxy = Σi (xi − x̄)(yi − ȳ) / I
Unlike the variance, it is perfectly possible for a covariance to take on negative values. Many chemometricians prefer to use the correlation coefficient, given by
   rxy = covxy / (sx.sy) = Σi (xi − x̄)(yi − ȳ) / √[ Σi (xi − x̄)² × Σi (yi − ȳ)² ]
Note that the definition of the correlation coefficient is identical both for samples and populations.
The correlation coefficient has a value between −1 and +1. If close to +1, the two variables are perfectly correlated. In many applications, correlation coefficients of −1 also indicate a perfect relationship. Under such circumstances, the value of y can be exactly predicted if we know x. The closer the correlation coefficients are to zero, the harder it is to use one variable to predict another. Some people prefer to use the square of the correlation coefficient which varies between 0 and 1.
If two columns of a matrix have a correlation coefficient of ±1, the matrix is said to be rank deficient and has a determinant of 0, and so no inverse; this has consequences both in experimental design and in regression. There are various ways around this, such as by removing selected variables.
In some areas of chemometrics we use a variance–covariance matrix. This is a square matrix whose dimensions usually equal the number of variables in a dataset; for example, if there are 20 variables the matrix has dimensions 20 × 20. The diagonal elements equal the variances of the individual variables and the off-diagonal elements the covariances. This matrix is symmetric about the diagonal. It is usual to employ population rather than sample statistics for this calculation.
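The same conventions appear in numpy, where bias selects the population or sample divisor; a short sketch with a made-up second variable (illustrative only):

```python
import numpy as np

x = np.array([4.0, 8, 5, -6, 2, -5, 6, 0])
y = np.array([7.9, 16.2, 10.1, -11.8, 4.2, -10.0, 12.3, 0.1])  # made-up data

print(np.cov(x, y)[0, 1])             # sample covariance, divisor I - 1
print(np.cov(x, y, bias=True)[0, 1])  # population covariance, divisor I
print(np.corrcoef(x, y)[0, 1])        # correlation coefficient, same for both

# Variance-covariance matrix (population statistics, variables as columns):
X = np.column_stack([x, y])
print(np.cov(X, rowvar=False, bias=True))
```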
A.3.2 Normal Distribution
The normal distribution is an important statistical concept. There are many ways of introducing such distributions. Many texts use a probability density function
   f(x) = [1 / (σ√(2π))] exp[ −(1/2) ((x − µ)/σ)² ]
This rather complicated equation can be interpreted as follows. The function f(x) is proportional to the probability that a measurement has a value x for a normally distributed population of mean µ and standard deviation σ. The function is scaled so that the area under the normal distribution curve is 1.

Table A.1 Cumulative standardised normal distribution: values of the cumulative probability for a given number of standard deviations from the mean. The number of standard deviations equals the row label plus the column label.

       0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0  0.50000  0.50399  0.50798  0.51197  0.51595  0.51994  0.52392  0.52790  0.53188  0.53586
0.1  0.53983  0.54380  0.54776  0.55172  0.55567  0.55962  0.56356  0.56749  0.57142  0.57535
0.2  0.57926  0.58317  0.58706  0.59095  0.59483  0.59871  0.60257  0.60642  0.61026  0.61409
0.3  0.61791  0.62172  0.62552  0.62930  0.63307  0.63683  0.64058  0.64431  0.64803  0.65173
0.4  0.65542  0.65910  0.66276  0.66640  0.67003  0.67364  0.67724  0.68082  0.68439  0.68793
0.5  0.69146  0.69497  0.69847  0.70194  0.70540  0.70884  0.71226  0.71566  0.71904  0.72240
0.6  0.72575  0.72907  0.73237  0.73565  0.73891  0.74215  0.74537  0.74857  0.75175  0.75490
0.7  0.75804  0.76115  0.76424  0.76730  0.77035  0.77337  0.77637  0.77935  0.78230  0.78524
0.8  0.78814  0.79103  0.79389  0.79673  0.79955  0.80234  0.80511  0.80785  0.81057  0.81327
0.9  0.81594  0.81859  0.82121  0.82381  0.82639  0.82894  0.83147  0.83398  0.83646  0.83891
1.0  0.84134  0.84375  0.84614  0.84849  0.85083  0.85314  0.85543  0.85769  0.85993  0.86214
1.1  0.86433  0.86650  0.86864  0.87076  0.87286  0.87493  0.87698  0.87900  0.88100  0.88298
1.2  0.88493  0.88686  0.88877  0.89065  0.89251  0.89435  0.89617  0.89796  0.89973  0.90147
1.3  0.90320  0.90490  0.90658  0.90824  0.90988  0.91149  0.91308  0.91466  0.91621  0.91774
1.4  0.91924  0.92073  0.92220  0.92364  0.92507  0.92647  0.92785  0.92922  0.93056  0.93189
1.5  0.93319  0.93448  0.93574  0.93699  0.93822  0.93943  0.94062  0.94179  0.94295  0.94408
1.6  0.94520  0.94630  0.94738  0.94845  0.94950  0.95053  0.95154  0.95254  0.95352  0.95449
1.7  0.95543  0.95637  0.95728  0.95818  0.95907  0.95994  0.96080  0.96164  0.96246  0.96327
1.8  0.96407  0.96485  0.96562  0.96638  0.96712  0.96784  0.96856  0.96926  0.96995  0.97062
1.9  0.97128  0.97193  0.97257  0.97320  0.97381  0.97441  0.97500  0.97558  0.97615  0.97670
2.0  0.97725  0.97778  0.97831  0.97882  0.97932  0.97982  0.98030  0.98077  0.98124  0.98169
2.1  0.98214  0.98257  0.98300  0.98341  0.98382  0.98422  0.98461  0.98500  0.98537  0.98574
2.2  0.98610  0.98645  0.98679  0.98713  0.98745  0.98778  0.98809  0.98840  0.98870  0.98899
2.3  0.98928  0.98956  0.98983  0.99010  0.99036  0.99061  0.99086  0.99111  0.99134  0.99158
2.4  0.99180  0.99202  0.99224  0.99245  0.99266  0.99286  0.99305  0.99324  0.99343  0.99361
2.5  0.99379  0.99396  0.99413  0.99430  0.99446  0.99461  0.99477  0.99492  0.99506  0.99520
2.6  0.99534  0.99547  0.99560  0.99573  0.99585  0.99598  0.99609  0.99621  0.99632  0.99643
2.7  0.99653  0.99664  0.99674  0.99683  0.99693  0.99702  0.99711  0.99720  0.99728  0.99736
2.8  0.99744  0.99752  0.99760  0.99767  0.99774  0.99781  0.99788  0.99795  0.99801  0.99807
2.9  0.99813  0.99819  0.99825  0.99831  0.99836  0.99841  0.99846  0.99851  0.99856  0.99861

       0.0      0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9
3.0  0.99865  0.99903  0.99931  0.99952  0.99966  0.99977  0.99984  0.99989  0.99993  0.99995
4.0  0.999968 0.999979 0.999987 0.999991 0.999995 0.999997 0.999998 0.999999 0.999999 1.000000
Most tables deal with the standardised normal distribution. This involves first standardising the raw data, to give a new value z, and the equation simplifies to
   f(z) = [1 / √(2π)] exp(−z²/2)
Instead of calculating f(z), most people look at the area under the normal distribution curve. This is proportional to the probability that a measurement is between certain limits. For example, the probability that a measurement is between one and two standard deviations can be calculated by taking the proportion of the overall area for which 1 ≤ z ≤ 2.
These numbers can be obtained using simple functions, e.g. in a spreadsheet, but are often conventionally presented in tabular form. There are a surprisingly large number of types of tables, but Table A.1 allows the reader to calculate relevant information. This table is of the cumulative normal distribution, and represents the area to the left of the curve for a specified number of standard deviations from the mean. The number of standard deviations equals the sum of the left-hand column and the top row, so, for example, the area for 1.17 standard deviations equals 0.87900.
Using this table it is then possible to determine the probability of a measurement between any specific limits.
• The probability that a measurement is above 1 standard deviation from the mean is equal to 1 − 0.84134 = 0.15866.
• The probability that a measurement is more than 1 standard deviation from the mean will be twice this, because both positive and negative deviations are possible and the curve is symmetrical, and is equal to 0.31732. Put another way, around a third of all measurements will fall outside 1 standard deviation from the mean.
• The probability that a measurement falls between −2 and +1 standard deviations from the mean can be calculated as follows:
  — the probability that a measurement falls between 0 and −2 standard deviations is the same as the probability that it falls between 0 and +2 standard deviations, and is equal to 0.97725 − 0.5 = 0.47725;
  — the probability that a measurement falls between 0 and +1 standard deviations is equal to 0.84134 − 0.5 = 0.34134;
  — therefore the total probability is 0.47725 + 0.34134 = 0.81859.
The normal distribution curve is not only a probability distribution but is also used to describe peakshapes in spectroscopy and chromatography.
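The entries of Table A.1, and the probabilities worked out above, can be reproduced from the error function; a short illustrative check in Python:

```python
from math import erf, sqrt

def phi(z):
    """Cumulative standardised normal distribution (the entries of Table A.1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(phi(1.0))                # 0.84134: area to the left of +1 standard deviation
print(1.0 - phi(1.0))          # 0.15866: above +1 standard deviation
print(2.0 * (1.0 - phi(1.0)))  # 0.31732: outside +/-1 standard deviation, by symmetry
print(phi(1.0) - phi(-2.0))    # 0.81859: between -2 and +1 standard deviations
```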
A.3.3 F Distribution
The F-test is normally used to compare two variances or errors and ask either whether one variance is significantly greater than the other (one-tailed) or whether it differs significantly (two-tailed). In this book we use only the one-tailed F-test, mainly to see whether one error (e.g. lack-of-fit) is significantly greater than a second one (e.g. experimental or analytical).
The F statistic is the ratio between these two variances, normally presented as a number greater than 1, i.e. the largest over the smallest. The F distribution depends on the number of degrees of freedom of each variable, so, if the highest variance is obtained from 10 samples, and the lowest from seven samples, the two variables have nine and six degrees of freedom, respectively. The F distribution differs according to the number of degrees of freedom, and it would be theoretically possible to produce an F distribution table for every possible combination of degrees of freedom, similar to the normal distribution table. However, this would mean an enormous number of tables (in theory an infinite number), and it is more usual simply to calculate the F statistic at certain well defined probability levels.
A one-tailed F statistic at the 1 % probability level is the value of the F ratio above which only 1 % of measurements would fall if the two variances were not significantly different.
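Critical values of this kind can be computed rather than tabulated; a minimal sketch, assuming scipy is available (the observed ratio below is an illustrative number, not from the text):

```python
from scipy.stats import f

dfn, dfd = 9, 6                   # degrees of freedom from the example above

# One-tailed critical F value at the 1 % probability level:
print(f.ppf(0.99, dfn, dfd))

# Probability of exceeding an observed variance ratio by chance alone:
F_obs = 5.0                       # illustrative value
print(f.sf(F_obs, dfn, dfd))
```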
