
Chau Chemometrics From Basics to Wavelet Transform


digital smoothing and filtering methods


%The parameter win_num is the window size, which can be chosen
%to be 7, 9, 11, 13, 15, or 17.
%The parameter poly_order is the polynomial order, which can be
%2 or 3 (quadratic/cubic weights) or 4 or 5 (quartic/quintic weights).
%Each column of x is smoothed separately; the first and last
%(win_num-1)/2 points of each column are left at zero in y.
[m1,n1]=size(x);
y=zeros(size(x));
if win_num==7
    if poly_order==2 || poly_order==3
        coef1=[-2 3 6 7 6 3 -2]/21;
        for j=1:n1
            for i=4:m1-3
                y(i,j)=coef1(1)*x(i-3,j)+coef1(2)*x(i-2,j)+coef1(3)*x(i-1,j)+ ...
                    coef1(4)*x(i,j)+coef1(5)*x(i+1,j)+coef1(6)*x(i+2,j)+ ...
                    coef1(7)*x(i+3,j);
            end
        end
    else
        coef1=[5 -30 75 131 75 -30 5]/231;
        for j=1:n1
            for i=4:m1-3
                y(i,j)=coef1(1)*x(i-3,j)+coef1(2)*x(i-2,j)+coef1(3)*x(i-1,j)+ ...
                    coef1(4)*x(i,j)+coef1(5)*x(i+1,j)+coef1(6)*x(i+2,j)+ ...
                    coef1(7)*x(i+3,j);
            end
        end
    end
elseif win_num==9
    if poly_order==2 || poly_order==3
        coef1=[-21 14 39 54 59 54 39 14 -21]/231;
        for j=1:n1
            for i=5:m1-4
                y(i,j)=coef1(1)*x(i-4,j)+coef1(2)*x(i-3,j)+coef1(3)*x(i-2,j)+ ...
                    coef1(4)*x(i-1,j)+coef1(5)*x(i,j)+coef1(6)*x(i+1,j)+ ...
                    coef1(7)*x(i+2,j)+coef1(8)*x(i+3,j)+coef1(9)*x(i+4,j);
            end
        end
    else
        coef1=[15 -55 30 135 179 135 30 -55 15]/429;
        for j=1:n1
            for i=5:m1-4
                y(i,j)=coef1(1)*x(i-4,j)+coef1(2)*x(i-3,j)+coef1(3)*x(i-2,j)+ ...
                    coef1(4)*x(i-1,j)+coef1(5)*x(i,j)+coef1(6)*x(i+1,j)+ ...
                    coef1(7)*x(i+2,j)+coef1(8)*x(i+3,j)+coef1(9)*x(i+4,j);
            end
        end
    end
elseif win_num==11
    if poly_order==2 || poly_order==3
        coef1=[-36 9 44 69 84 89 84 69 44 9 -36]/429;
        for j=1:n1
            for i=6:m1-5
                y(i,j)=coef1(1)*x(i-5,j)+coef1(2)*x(i-4,j)+coef1(3)*x(i-3,j)+ ...
                    coef1(4)*x(i-2,j)+coef1(5)*x(i-1,j)+coef1(6)*x(i,j)+ ...
                    coef1(7)*x(i+1,j)+coef1(8)*x(i+2,j)+coef1(9)*x(i+3,j)+ ...
                    coef1(10)*x(i+4,j)+coef1(11)*x(i+5,j);
            end
        end
    else
        coef1=[18 -45 -10 60 120 143 120 60 -10 -45 18]/429;
        for j=1:n1
            for i=6:m1-5
                y(i,j)=coef1(1)*x(i-5,j)+coef1(2)*x(i-4,j)+coef1(3)*x(i-3,j)+ ...
                    coef1(4)*x(i-2,j)+coef1(5)*x(i-1,j)+coef1(6)*x(i,j)+ ...
                    coef1(7)*x(i+1,j)+coef1(8)*x(i+2,j)+coef1(9)*x(i+3,j)+ ...
                    coef1(10)*x(i+4,j)+coef1(11)*x(i+5,j);
            end
        end
    end
elseif win_num==13
    if poly_order==2 || poly_order==3
        coef1=[-11 0 9 16 21 24 25 24 21 16 9 0 -11]/143;
        for j=1:n1
            for i=7:m1-6
                y(i,j)=coef1(1)*x(i-6,j)+coef1(2)*x(i-5,j)+coef1(3)*x(i-4,j)+ ...
                    coef1(4)*x(i-3,j)+coef1(5)*x(i-2,j)+coef1(6)*x(i-1,j)+ ...
                    coef1(7)*x(i,j)+coef1(8)*x(i+1,j)+coef1(9)*x(i+2,j)+ ...
                    coef1(10)*x(i+3,j)+coef1(11)*x(i+4,j)+coef1(12)*x(i+5,j)+ ...
                    coef1(13)*x(i+6,j);
            end
        end
    else
        coef1=[110 -198 -135 110 390 600 677 600 390 110 -135 -198 110]/2431;
        for j=1:n1
            for i=7:m1-6
                y(i,j)=coef1(1)*x(i-6,j)+coef1(2)*x(i-5,j)+coef1(3)*x(i-4,j)+ ...
                    coef1(4)*x(i-3,j)+coef1(5)*x(i-2,j)+coef1(6)*x(i-1,j)+ ...
                    coef1(7)*x(i,j)+coef1(8)*x(i+1,j)+coef1(9)*x(i+2,j)+ ...
                    coef1(10)*x(i+3,j)+coef1(11)*x(i+4,j)+coef1(12)*x(i+5,j)+ ...
                    coef1(13)*x(i+6,j);
            end
        end
    end
elseif win_num==15
    if poly_order==2 || poly_order==3
        coef1=[-78 -13 42 87 122 147 162 167 162 147 122 87 42 -13 -78]/1105;
        for j=1:n1
            for i=8:m1-7
                y(i,j)=coef1(1)*x(i-7,j)+coef1(2)*x(i-6,j)+coef1(3)*x(i-5,j)+ ...
                    coef1(4)*x(i-4,j)+coef1(5)*x(i-3,j)+coef1(6)*x(i-2,j)+ ...
                    coef1(7)*x(i-1,j)+coef1(8)*x(i,j)+coef1(9)*x(i+1,j)+ ...
                    coef1(10)*x(i+2,j)+coef1(11)*x(i+3,j)+coef1(12)*x(i+4,j)+ ...
                    coef1(13)*x(i+5,j)+coef1(14)*x(i+6,j)+coef1(15)*x(i+7,j);
            end
        end
    else
        coef1=[2145 -2860 -2937 -165 3755 7500 10125 11063 10125 7500 3755 -165 -2937 -2860 2145]/46189;
        for j=1:n1
            for i=8:m1-7
                y(i,j)=coef1(1)*x(i-7,j)+coef1(2)*x(i-6,j)+coef1(3)*x(i-5,j)+ ...
                    coef1(4)*x(i-4,j)+coef1(5)*x(i-3,j)+coef1(6)*x(i-2,j)+ ...
                    coef1(7)*x(i-1,j)+coef1(8)*x(i,j)+coef1(9)*x(i+1,j)+ ...
                    coef1(10)*x(i+2,j)+coef1(11)*x(i+3,j)+coef1(12)*x(i+4,j)+ ...
                    coef1(13)*x(i+5,j)+coef1(14)*x(i+6,j)+coef1(15)*x(i+7,j);
            end
        end
    end
elseif win_num==17
    if poly_order==2 || poly_order==3
        coef1=[-21 -6 7 18 27 34 39 42 43 42 39 34 27 18 7 -6 -21]/323;
        for j=1:n1
            for i=9:m1-8
                y(i,j)=coef1(1)*x(i-8,j)+coef1(2)*x(i-7,j)+coef1(3)*x(i-6,j)+ ...
                    coef1(4)*x(i-5,j)+coef1(5)*x(i-4,j)+coef1(6)*x(i-3,j)+ ...
                    coef1(7)*x(i-2,j)+coef1(8)*x(i-1,j)+coef1(9)*x(i,j)+ ...
                    coef1(10)*x(i+1,j)+coef1(11)*x(i+2,j)+coef1(12)*x(i+3,j)+ ...
                    coef1(13)*x(i+4,j)+coef1(14)*x(i+5,j)+coef1(15)*x(i+6,j)+ ...
                    coef1(16)*x(i+7,j)+coef1(17)*x(i+8,j);
            end
        end
    else
        coef1=[195 -195 -260 -117 135 415 660 825 883 825 660 415 135 -117 -260 -195 195]/4199;
        for j=1:n1
            for i=9:m1-8
                y(i,j)=coef1(1)*x(i-8,j)+coef1(2)*x(i-7,j)+coef1(3)*x(i-6,j)+ ...
                    coef1(4)*x(i-5,j)+coef1(5)*x(i-4,j)+coef1(6)*x(i-3,j)+ ...
                    coef1(7)*x(i-2,j)+coef1(8)*x(i-1,j)+coef1(9)*x(i,j)+ ...
                    coef1(10)*x(i+1,j)+coef1(11)*x(i+2,j)+coef1(12)*x(i+3,j)+ ...
                    coef1(13)*x(i+4,j)+coef1(14)*x(i+5,j)+coef1(15)*x(i+6,j)+ ...
                    coef1(16)*x(i+7,j)+coef1(17)*x(i+8,j);
            end
        end
    end
end
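All twelve branches above perform the same operation: a weighted moving sum over a centred window, with the first and last (win_num − 1)/2 points left at zero. A minimal Python sketch of that idea is shown below. Only the 7- and 9-point weight sets are copied from the MATLAB code, and the names SG_COEFS and sg_smooth are hypothetical, not part of the original program.

```python
# Savitzky-Golay weights copied from the MATLAB code above, keyed by
# (window size, weight family): "quad" for orders 2/3, "quart" for 4/5.
# Only the 7- and 9-point sets are included in this sketch.
SG_COEFS = {
    (7, "quad"): ([-2, 3, 6, 7, 6, 3, -2], 21),
    (7, "quart"): ([5, -30, 75, 131, 75, -30, 5], 231),
    (9, "quad"): ([-21, 14, 39, 54, 59, 54, 39, 14, -21], 231),
    (9, "quart"): ([15, -55, 30, 135, 179, 135, 30, -55, 15], 429),
}


def sg_smooth(x, win_num, poly_order):
    """Smooth one signal (a list) as the MATLAB code does for one column:
    points within (win_num - 1) // 2 of either end are left at zero."""
    family = "quad" if poly_order in (2, 3) else "quart"
    weights, norm = SG_COEFS[(win_num, family)]
    half = (win_num - 1) // 2
    y = [0.0] * len(x)
    for i in range(half, len(x) - half):
        # Weighted sum over the centred window x[i-half] ... x[i+half].
        y[i] = sum(w * x[i - half + k] for k, w in enumerate(weights)) / norm
    return y


signal = [0.5 * t * t - 2 * t + 1 for t in range(12)]
print(sg_smooth(signal, 7, 2))   # interior points match the quadratic up to rounding
```

Storing each weight set as integer numerators plus one normalization factor mirrors the MATLAB code and Table 2.2; the remaining window sizes can be added to the dictionary in the same form. Because the weights are least-squares polynomial weights, a quadratic input passes through the quadratic/cubic filter unchanged, which makes a convenient sanity check.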


Table 2.2. Weights of the Savitzky-Golay Filter for Smoothing Based on a Quartic/Quintic Polynomial

Column headings give the number of points in the window; i is the position relative to the centre point. Each weight is divided by the normalization factor given in the last row (Norm).

   i      25      23      21      19      17      15      13      11       9       7
 -12   1,265
 -11    -345      95
 -10  -1,122     -38  11,628
  -9  -1,255     -95  -6,460     340
  -8    -915     -95 -13,005    -255     195
  -7    -255     -55 -11,220    -420    -195   2,145
  -6     590      10  -3,940    -290    -260  -2,860     110
  -5   1,503      87   6,378      18    -117  -2,937    -198      18
  -4   2,385     165  17,655     405     135    -165    -135     -45      15
  -3   3,155     235  28,190     790     415   3,755     110     -10     -55       5
  -2   3,750     290  36,660   1,110     660   7,500     390      60      30     -30
  -1   4,125     325  42,120   1,320     825  10,125     600     120     135      75
   0   4,253     337  44,003   1,393     883  11,063     677     143     179     131
   1   4,125     325  42,120   1,320     825  10,125     600     120     135      75
   2   3,750     290  36,660   1,110     660   7,500     390      60      30     -30
   3   3,155     235  28,190     790     415   3,755     110     -10     -55       5
   4   2,385     165  17,655     405     135    -165    -135     -45      15
   5   1,503      87   6,378      18    -117  -2,937    -198      18
   6     590      10  -3,940    -290    -260  -2,860     110
   7    -255     -55 -11,220    -420    -195   2,145
   8    -915     -95 -13,005    -255     195
   9  -1,255     -95  -6,460     340
  10  -1,122     -38  11,628
  11    -345      95
  12   1,265
Norm  30,015   2,185 260,015   7,429   4,199  46,189   2,431     429     429     231
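Each column of Table 2.2 is the least-squares solution for fitting a polynomial to a centred window and evaluating the fit at the centre point. The Python sketch below recomputes such columns with exact rational arithmetic. The function names sg_weights and as_table_column are hypothetical; the sketch assumes a quartic fit, which for a symmetric window gives the same centre-point weights as a quintic fit (just as quadratic and cubic fits coincide).

```python
from fractions import Fraction
from math import lcm


def sg_weights(half, degree):
    """Exact least-squares smoothing weights for a centred window of
    2*half + 1 points and a fitted polynomial of the given degree."""
    ks = list(range(-half, half + 1))
    n = degree + 1
    # Normal-equation matrix of the polynomial fit: A[r][c] = sum_k k**(r + c).
    a = [[Fraction(sum(k ** (r + c) for k in ks)) for c in range(n)] for r in range(n)]
    # Solve A z = e0 by Gauss-Jordan elimination, so z = A^-1 e0.
    b = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
                b[r] -= f * b[col]
    z = [b[i] / a[i][i] for i in range(n)]
    # The fitted centre value (the intercept) is sum_k (sum_j z_j k**j) * y_k.
    return [sum(z[j] * k ** j for j in range(n)) for k in ks]


def as_table_column(weights):
    """Rewrite exact weights as integer numerators over one normalization."""
    norm = lcm(*(w.denominator for w in weights))
    return [int(w * norm) for w in weights], norm


print(as_table_column(sg_weights(3, 4)))   # ([5, -30, 75, 131, 75, -30, 5], 231)
```

In the same way, as_table_column(sg_weights(11, 4)) gives the 23-point column, and sg_weights(3, 2) reproduces the quadratic/cubic 7-point weights used in the MATLAB code.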

[Figure 2.3 plots: four panels (a) to (d) of signal intensity (×10⁻³) versus signal point, titled "Smoothing with window size=7", "=11", "=13" and "=17"; solid line: original signal; red dashed line: smoothed; cross line: noisy signal.]

Figure 2.3. Smoothing results obtained by the Savitzky-Golay filter with different window sizes. The four plots show the original curve (solid line), the raw noisy signal (crosses), and the smoothed curve (dashed line) for window sizes of 7 (a), 11 (b), 13 (c), and 17 (d).

2.1.3. Kalman Filtering

Kalman filtering is an optimal linear recursive estimation method. It is fast and requires relatively little memory, and it has been used extensively in engineering, especially in space technology. Recursive operation is the key feature of the method, so we first introduce recursive operation before discussing Kalman filtering in detail.

The basic idea of recursive operation is to make efficient use of both the previously obtained results and the newly acquired information, so that unnecessary repeated calculation is avoided. A simple example shows the basic feature of recursive operation. The mean value is usually evaluated using the following formula:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \tag{2.11}$$

where $\sum x_i$ denotes the sum of the n observations $x_i$ (i = 1, ..., n). When a new observation $x_{n+1}$ is measured, the mean has to be calculated again using Equation (2.11), so all n previous observations must be stored in the computer for later use. With a recursive operation, however, the new mean can be evaluated from the previous mean and the new observation alone, without using all the observations:

$$\bar{x}_{n+1} = \bar{x}_n + \frac{x_{n+1} - \bar{x}_n}{n+1} \tag{2.12}$$

Comparing this formula with Equation (2.11) shows that the recursive operation is faster and more efficient: each update needs only $\bar{x}_n$, n, and $x_{n+1}$, however many observations have been made. This is the attractive feature of Kalman filtering.
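Equation (2.12) can be checked with a few lines of Python. The sketch below (the function name running_means is hypothetical) keeps only the current mean and the count, yet after the last observation it gives the same value as the batch formula of Equation (2.11):

```python
def running_means(observations):
    """Yield the mean after each observation, updated with Eq. (2.12)."""
    mean = 0.0
    for n, x_new in enumerate(observations):
        # n observations are already averaged; x_new is observation n + 1.
        mean = mean + (x_new - mean) / (n + 1)
        yield mean


data = [2.0, 4.0, 9.0, 1.0]
print(list(running_means(data)))   # [2.0, 3.0, 5.0, 4.0]
print(sum(data) / len(data))       # 4.0, the batch mean of Eq. (2.11)
```

The factor 1/(n + 1) shrinks as observations accumulate, so each new point moves the estimate less; this is the role the Kalman gain takes over in the filter.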


The Kalman filter is based on a dynamic system model

$$x(k) = F(k, k-1)\,x(k-1) + w(k) \tag{2.13}$$

and a measurement model

$$y(k) = h(k)^t\,x(k) + e(k) \tag{2.14}$$

where x(k), y(k), and h(k) denote the state vector, the measurement, and the measurement function vector, respectively. The index k represents a measurement point, which can be time, wavelength, or another variable. F(k, k − 1) is the system transition matrix, which describes how the system transits from state k − 1 to state k; for smoothing purposes it is very often the identity matrix. w(k) denotes the dynamic system noise and can be approximated by a zero vector, because the smoothing filter can be regarded as a static model. e(k) is the measurement noise, usually taken to be a Gaussian random variable with zero mean and constant variance.

The core recursive state-estimate update in Kalman filtering is given by

$$x(k) = x(k-1) + g(k)\,[y(k) - h(k)^t x(k-1)] \tag{2.15}$$

where the vector g(k) is called the Kalman gain. Comparing this equation with Equation (2.12) shows the similarity between the two: the Kalman gain g(k) plays the role of 1/(n + 1) in Equation (2.12), scaling the measurement difference $y(k) - h(k)^t x(k-1)$ that corrects the previous state estimate x(k − 1). Equation (2.15) also shows that the update uses only the newly measured y(k) and the previously obtained state vector x(k − 1), which is what makes efficient recursive operation possible.

The Kalman gain is determined by

$$g(k) = P(k-1)\,h(k)\,[h(k)^t P(k-1)\,h(k) + r(k)]^{-1} \tag{2.16}$$

where r(k) is the variance of the measurement noise e(k), and P(k − 1) is the covariance matrix of the state estimate obtained from the previous k − 1 observations. The covariance matrix is updated through

$$P(k) = [I - g(k)h(k)^t]\,P(k-1)\,[I - g(k)h(k)^t]^t + g(k)\,r(k)\,g(k)^t \tag{2.17}$$

where I is the identity matrix.


From the discussion above, the Kalman gain vector can be obtained from Equation (2.16) once the initial values x(0) and P(0) are known. The subsequent x(k) and P(k) are then computed through Equations (2.15) and (2.17), and the process continues until convergence is attained.

In summary, Kalman filtering can be carried out in the following steps:

1. Setting the initial values:

$$x(0) = 0, \qquad P(0) = \gamma^2 I \tag{2.18}$$

where γ² is an initial estimate of the variance of the measurement noise, which may be given by the empirical formula

$$\gamma^2 = a\,\frac{r(1)}{[h(1)^t h(1)]^{1/2}} \tag{2.19}$$

The factor a affects the accuracy of the calculation and usually takes a value between 10 and 100. The initial value P(0) is crucial for the estimation: if it is too small, the estimates can be biased; if it is too large, the computation converges to the desired value only with difficulty.

2. Recursive calculation loop:

$$g(k) = P(k-1)\,h(k)\,[h(k)^t P(k-1)\,h(k) + r(k)]^{-1}$$
$$x(k) = x(k-1) + g(k)\,[y(k) - h(k)^t x(k-1)]$$
$$P(k) = [I - g(k)h(k)^t]\,P(k-1)\,[I - g(k)h(k)^t]^t + g(k)\,r(k)\,g(k)^t$$

where r(k) is the variance of the measurement noise, which can be determined from the variance of the real noise. This loop is repeated until the estimates become stable.
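The two steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: F = I and w(k) = 0 as suggested for smoothing, a constant noise variance r(k) = r, plain Python lists instead of a matrix library, and hypothetical function names kalman_filter and matmul. The usage example is an invented, noise-free two-component mixture of the kind met in multicomponent analysis: h(k) holds the responses of the two pure components at point k, and x holds their concentrations. Each measurement is processed once.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kalman_filter(ys, hs, r, a=100.0):
    """Estimate a constant state vector x from measurements
    y(k) = h(k)^t x + e(k), with F = I, w(k) = 0 and r(k) = r."""
    n = len(hs[0])
    # Step 1, Eqs. (2.18)-(2.19): x(0) = 0, P(0) = gamma^2 I (h(1) must be nonzero).
    gamma2 = a * r / math.sqrt(sum(v * v for v in hs[0]))
    x = [0.0] * n
    P = [[gamma2 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for y, h in zip(ys, hs):
        Ph = [sum(P[i][j] * h[j] for j in range(n)) for i in range(n)]
        # Eq. (2.16): g(k) = P(k-1)h(k) / [h(k)^t P(k-1) h(k) + r(k)]
        s = sum(h[i] * Ph[i] for i in range(n)) + r
        g = [v / s for v in Ph]
        # Eq. (2.15): correct x(k-1) by the gain times the measurement difference
        diff = y - sum(h[i] * x[i] for i in range(n))
        x = [x[i] + g[i] * diff for i in range(n)]
        # Eq. (2.17): P(k) = [I - g h^t] P(k-1) [I - g h^t]^t + g r g^t
        M = [[(1.0 if i == j else 0.0) - g[i] * h[j] for j in range(n)] for i in range(n)]
        MP = matmul(M, P)
        P = [[sum(MP[i][l] * M[j][l] for l in range(n)) + g[i] * r * g[j]
              for j in range(n)] for i in range(n)]
    return x, P


# Hypothetical two-component mixture: Gaussian pure-component bands and
# true concentrations 0.3 and 0.7.
hs = [[math.exp(-((k - 6) / 3) ** 2), math.exp(-((k - 13) / 3) ** 2)] for k in range(20)]
ys = [0.3 * h[0] + 0.7 * h[1] for h in hs]
estimate, _ = kalman_filter(ys, hs, r=1e-4)
print(estimate)   # close to [0.3, 0.7]
```

Because P(0) is large compared with r here, the estimate is essentially the least-squares solution. With a much smaller a, and hence a smaller P(0), the estimates would be pulled toward x(0) = 0, which is the bias the text warns about.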

In the Kalman filtering algorithm, the innovation series is very important, because it indicates whether the results obtained are reliable. It is obtained from

$$v(k) = y(k) - h(k)^t x(k-1) \tag{2.20}$$

The series is the difference between the measurement and its prediction from the previous estimate, and can be regarded as the residual at point k. If the filtering model is correct, the innovation series should be white noise with zero mean; otherwise, the results obtained are not reliable.
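A rough check of this criterion can be sketched in Python. The sketch assumes the simplest constant-level model (F = 1, w(k) = 0, h(k) = 1) and uses the mean and lag-1 autocorrelation as a crude stand-in for a formal whiteness test. The function names, the simulated data, and skipping the first 10 innovations (which are dominated by the start-up from x(0) = 0) are illustrative assumptions. The innovations stay near zero mean and uncorrelated when the data really are constant, but become large and strongly correlated when the same model is applied to a drifting signal.

```python
import random


def innovations_constant_model(ys, r=1.0, p0=100.0):
    """Innovation series v(k) of a Kalman filter that assumes a constant
    level: F = 1, w(k) = 0, h(k) = 1, x(0) = 0, P(0) = p0."""
    x, p, series = 0.0, p0, []
    for y in ys:
        g = p / (p + r)                          # Eq. (2.16) with h(k) = 1
        v = y - x                                # Eq. (2.20)
        series.append(v)
        x = x + g * v                            # Eq. (2.15)
        p = (1 - g) ** 2 * p + g * g * r         # Eq. (2.17) with h(k) = 1
    return series


def mean_and_lag1(v):
    """Mean and lag-1 autocorrelation, a crude whiteness check."""
    m = sum(v) / len(v)
    d = [a - m for a in v]
    lag1 = sum(a * b for a, b in zip(d, d[1:])) / sum(a * a for a in d)
    return m, lag1


rng = random.Random(1)
level = [5.0 + rng.gauss(0.0, 1.0) for _ in range(400)]          # model is correct
drift = [0.05 * k + rng.gauss(0.0, 1.0) for k in range(400)]      # model is wrong
print(mean_and_lag1(innovations_constant_model(level)[10:]))
print(mean_and_lag1(innovations_constant_model(drift)[10:]))
```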


Kalman filtering can be applied to filtering, smoothing, and prediction. Its best-known application is multicomponent analysis.

2.1.4. Spline Smoothing

In addition to the smoothing methods based on the digital filters discussed above, spline functions are widely used for smoothing in signal processing. Their main advantage is that they are differentiable over the entire measurement domain.

Among the various spline functions, the cubic spline is the most common; it is defined as

$$y = S(x) = A_k (x - x_k)^3 + B_k (x - x_k)^2 + C_k (x - x_k) + D_k \tag{2.21}$$

where $A_k$, $B_k$, $C_k$, and $D_k$ are the spline coefficients at data point k. For observations at abscissa values $x_1 < x_2 < \cdots < x_n$, the cubic spline function S(x) (or y) satisfies the following conditions:

1. The points $x_k$ are called knots. The knots may coincide with the index points on the x axis (abscissa).

2. At each knot $x_k$, S(x) and its first and second derivatives are continuous.

3. S(x) is a cubic function in each subrange $[x_k, x_{k+1}]$, k = 1, ..., n − 1.

4. Outside the range from $x_1$ to $x_n$, S(x) is a straight line.

For a fixed interval between the data points $x_k$ and $x_{k+1}$, the following relationships hold for the signal values and their derivatives:

$$
\begin{aligned}
y_k &= D_k \\
y_{k+1} &= A_k (x_{k+1} - x_k)^3 + B_k (x_{k+1} - x_k)^2 + C_k (x_{k+1} - x_k) + y_k \\
y'_k &= S'(x_k) = C_k \\
y'_{k+1} &= 3A_k (x_{k+1} - x_k)^2 + 2B_k (x_{k+1} - x_k) + C_k \\
y''_k &= S''(x_k) = 2B_k \\
y''_{k+1} &= 6A_k (x_{k+1} - x_k) + 2B_k
\end{aligned}
$$
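These relationships determine each cubic piece once the second derivatives $y''_k$ at the knots are known. As an illustration, the Python sketch below builds the interpolating natural cubic spline (the p = 1 case of the smoothing spline discussed later, with zero second derivative at both end knots). It solves the usual tridiagonal continuity equations for the $y''_k$ and then reads off $A_k$, $B_k$, $C_k$ and $D_k$ from the relations. It does not perform the smoothing step of Equations (2.22) and (2.23); the function name and the sample points are hypothetical.

```python
def natural_spline_coefficients(xs, ys):
    """(A_k, B_k, C_k, D_k) of Eq. (2.21) for each interval [x_k, x_{k+1}]
    of the natural cubic spline that interpolates the points (xs, ys)."""
    n = len(xs)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    # Continuity of S' at the interior knots gives a tridiagonal system for
    # the second derivatives M_1 .. M_{n-2}; natural ends: M_0 = M_{n-1} = 0.
    m = n - 2
    sub = [h[k] for k in range(m)]
    diag = [2 * (h[k] + h[k + 1]) for k in range(m)]
    sup = [h[k + 1] for k in range(m)]
    rhs = [6 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
           for k in range(m)]
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, m):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    inner = [0.0] * m
    for i in range(m - 1, -1, -1):
        following = inner[i + 1] if i + 1 < m else 0.0
        inner[i] = (rhs[i] - sup[i] * following) / diag[i]
    M = [0.0] + inner + [0.0]
    coeffs = []
    for k in range(n - 1):
        A = (M[k + 1] - M[k]) / (6 * h[k])      # from y''_{k+1} = 6A_k h + 2B_k
        B = M[k] / 2                            # y''_k = 2B_k
        # from y_{k+1} = A_k h^3 + B_k h^2 + C_k h + y_k
        C = (ys[k + 1] - ys[k]) / h[k] - h[k] * (2 * M[k] + M[k + 1]) / 6
        D = ys[k]                               # y_k = D_k
        coeffs.append((A, B, C, D))
    return coeffs


xs = [0.0, 0.5, 1.1, 1.5, 2.0]
ys = [1.0, 0.2, 0.9, 1.4, 0.3]
for k, (A, B, C, D) in enumerate(natural_spline_coefficients(xs, ys)):
    print(k, round(A, 4), round(B, 4), round(C, 4), round(D, 4))
```

Each interval carries its own four coefficients, which illustrates why splines need many more stored coefficients than a single polynomial filter.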

The spline coefficients can be determined by a method that smooths the data at the same time. The smoothed ordinate values $\hat{y}_k$ are calculated such that the differences between the observed and smoothed values are proportional, with positive factors, to the jumps $r_k$ in the third derivative at the points $x_k$:

$$r_k = S'''(x_k) - S'''(x_{k+1}) \tag{2.22}$$

$$r_k = p_k\,(y_k - \hat{y}_k) \tag{2.23}$$

The proportionality factors $p_k$ are determined by cross-validation. In contrast with polynomials, spline functions can approximate and smooth curves of any shape. However, many more coefficients must be estimated and stored than for polynomial filters, because a different set of coefficients applies in each interval. A further disadvantage of smoothing splines is that their parameter estimates are biased, so their statistical properties are more difficult to describe than those of linear regression.

MATLAB provides a cubic smoothing spline function, csaps. The call csaps(X, Y, p, X) returns a smoothed version of the input data (X, Y) obtained with a cubic smoothing spline; the result depends on the smoothing parameter p, which ranges from 0 to 1. For p = 0, the smoothing spline is the least-squares straight-line fit to the data, while at the other extreme, p = 1, it is the "natural" or variational cubic spline interpolant. The transition region between these two extremes usually covers only a rather small range of p values, and the optimal value depends strongly on the nature of the data. Figure 2.4 shows an example of smoothing by a cubic spline smoother with different values of p. As the plots show, choosing the right value of p is crucial; the smoothing result is satisfactory when p is chosen well, as in Figure 2.4c. To help readers follow the smoothing procedure with the cubic spline smoother, a MATLAB source code is given below:

xi=0:0.05:1.5;                      % abscissa values
yi=cos(xi);                         % noise-free original curve
ybad=yi+0.2*(rand(size(xi))-0.5);   % add uniform noise in [-0.1, 0.1)
figure(2)
subplot(221), plot(xi,yi,'k:',xi,ybad,'kx'), grid on
title('Original curve: dashed line; Noisy data: cross')
axis([0 1.5 0 1.2])
xlabel('Variable (x)'), ylabel('Signal, (y)')
yy1=csaps(xi,ybad,0.9981,xi);       % smoothing spline with p=0.9981
subplot(222), plot(xi,yi,'k:',xi,ybad,'kx',xi,yy1,'k'), grid on
title('Smoothed curve: solid line with p=0.9981')
axis([0 1.5 0 1.2])
xlabel('Variable (x)'), ylabel('Signal, (y)')
yy2=csaps(xi,ybad,0.9756,xi);       % smoothing spline with p=0.9756
subplot(223), plot(xi,yi,'k:',xi,ybad,'kx',xi,yy2,'k'), grid on
title('Smoothed curve: solid line with p=0.9756')
axis([0 1.5 0 1.2])
xlabel('Variable (x)'), ylabel('Signal, (y)')
yy3=csaps(xi,ybad,0.7856,xi);       % smoothing spline with p=0.7856
subplot(224), plot(xi,yi,'k:',xi,ybad,'kx',xi,yy3,'k'), grid on
title('Smoothed curve: solid line with p=0.7856')
axis([0 1.5 0 1.2])
xlabel('Variable (x)'), ylabel('Signal, (y)')

Usually, it is difficult to choose the best value of the parameter p without experimentation. If this proves difficult but one has an idea of the noise level in Y, the MATLAB command spaps(X, Y, tol) may help. Select

[Figure 2.4 plots: four panels of Signal (y) versus Variable (x), x from 0 to 1.5: (a) "Original curve: dashed line; Noisy data: cross"; (b) to (d) "Smoothed curve: solid line with p=0.9981", "p=0.9756" and "p=0.7856".]

Figure 2.4. Smoothing results obtained by a cubic spline smoother with different values of the parameter p: (a) the original curve and the raw noisy signals; (b) the smoothed curve with p = 0.9981; (c) the smoothed curve with p = 0.9756; (d) the smoothed curve with p = 0.7856.
