
Chau Chemometrics From Basics to Wavelet Transform
baseline/background removal |
189 |
5. Perform forward WT to obtain the wavelet coefficients.
6. Display Figure 5.25.
7. Set the approximation coefficients to zero, and reconstruct the signal by applying the inverse WT.
8. Display Figure 5.24.
Method B:
1. Load the experimental spectrum (Fig. 5.24a).
2. Extend the spectral data to avoid the edge effect.
3. Make a wavelet filter (Daubechies4).
4. Set the resolution level J = 5.
5. Perform forward WT to obtain the c and d components with the improved algorithm.
6. Display Figure 5.26.
7. Subtract c4 from the experimental spectrum.
8. Convert the subtracted result to k space.
9. Display Figure 5.27, curve (a).
As mentioned above, the aim of analyzing the EXAFS spectrum is to obtain structural parameters such as N and r in Equation (5.47). To obtain these parameters from the EXAFS oscillation, Fourier filtering and least-squares fitting can be performed. Figure 5.27, curve (b) shows the filtered results for the first coordination shell from the EXAFS signals in Figure 5.27, curve (a), and Table 5.3 compares the structural parameters obtained by least-squares fitting of three Cu samples. The last column in the table shows a comparison of the fitted errors, which is
Figure 5.26. Plots of the approximations obtained by applying WT to the experimental EXAFS spectrum with the improved algorithm.

190 |
application of wavelet transform in chemistry |
Figure 5.27. Background-removed results by conventional (solid line) and WT (dotted line) methods (a), and their filtered results (b).
calculated by

$$ E = \frac{1}{N} \sum_{i=1}^{N} \left( X_{\mathrm{cal},i} - X_{\mathrm{exp},i} \right)^2 \tag{5.48} $$
where N is the number of points in the spectra and $X_{\mathrm{cal},i}$ and $X_{\mathrm{exp},i}$ are, respectively, the fitted and the experimental values. For the sake of comparison between the two methods, both the experimental and the fitted spectra were normalized when calculating the fitted error.
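The fitted error of Equation (5.48) can be sketched in a few lines of Python. The text does not state how the spectra were normalized, so the division by the largest absolute value below is only an assumption, and the function name `fitted_error` and the test signal are hypothetical.

```python
import numpy as np

def fitted_error(x_cal, x_exp):
    """Mean squared deviation of Eq. (5.48), computed on normalized spectra."""
    x_cal = np.asarray(x_cal, dtype=float)
    x_exp = np.asarray(x_exp, dtype=float)
    # Assumed normalization: scale each spectrum by its largest absolute value.
    x_cal = x_cal / np.max(np.abs(x_cal))
    x_exp = x_exp / np.max(np.abs(x_exp))
    return np.mean((x_cal - x_exp) ** 2)

# Hypothetical damped oscillation standing in for a filtered EXAFS signal.
k = np.linspace(3.0, 12.0, 200)
x_exp = np.sin(4.0 * k) * np.exp(-0.1 * k)
x_cal = 1.02 * x_exp          # a "fit" that differs only by an overall scale

print(fitted_error(x_cal, x_exp))   # essentially zero once both are normalized
```

With this choice of normalization, a pure scale mismatch between fitted and experimental spectra does not contribute to E, which is one plausible reason for normalizing before comparing two background-removal methods.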
From Figure 5.27, curve (b), it is clear that the result of the WT method is superior to that of the cubic spline method. Table 5.3 shows that the coordination distance r is very close for the two methods, whereas the other three parameters obtained by the wavelet transform method are larger than those of the cubic spline method, although they remain reasonable. The fitted errors are also improved by the WT method. Table 5.3 also shows that the reproducibility of the three results
Table 5.3. Comparison of Structural Parameters and Fitted Errors Obtained by Least-Squares Fitting from Background-Removed Spectra with the WT and Spline Methods

| Spectrum | Method | N | r | σ | λ | Fitted Error |
|---|---|---|---|---|---|---|
| I | WT | 12 | 2.50 | 0.112 | 6.5 | 0.0026 |
| I | Spline | 8 | 2.52 | 0.077 | 4.4 | 0.0074 |
| II | WT | 12 | 2.49 | 0.109 | 6.3 | 0.0029 |
| II | Spline | 9 | 2.53 | 0.082 | 4.5 | 0.0027 |
| III | WT | 12 | 2.51 | 0.113 | 6.3 | 0.0011 |
| III | Spline | 9 | 2.52 | 0.084 | 4.5 | 0.0056 |

obtained by the wavelet transform method is clearly superior to that of the cubic spline method. The reason is that the cubic spline method performs the background removal according to points selected by the operator, whereas the WT method involves no operator interference.
5.3.3. Baseline Correction
Baseline drift is caused mainly by continuous variations of experimental conditions, such as temperature, solvent programming in liquid chromatography, or temperature programming in gas chromatography. Therefore, baseline drift is a very common problem in chromatographic studies.
Figure 5.28 shows an example of the separation of the drifting baseline from a chromatogram with gradient elution by method B. Curve (a) is the experimental chromatogram. From the figure, it can be seen that there is a strong baseline drift caused by the gradient elution in the chromatogram. Curve (b) is the 8th-scale discrete approximation c8 decomposed by WT with Symmlet (S5) wavelet. Apparently, it resembles the baseline. Figure 5.29 shows the result obtained by subtracting curve (b) from curve (a) of Figure 5.28 with a factor f of 0.93. From the result, it is clear that the removal of baseline by this method is complete and satisfactory.
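The subtraction step described above can be sketched as follows. This is only an illustration under stated assumptions: a Haar approximation (block averages) is swapped in for the Symmlet (S5) approximation used in the text so that the example needs nothing beyond NumPy, and the simulated chromatogram, the helper `haar_approximation`, and all numerical values other than f = 0.93 and the scale 8 are hypothetical.

```python
import numpy as np

def haar_approximation(x, level):
    """Haar approximation of x at the given level, returned at the original length.

    For the Haar wavelet, reconstructing from the level-J approximation
    coefficients alone yields the mean of each block of 2**J points.
    """
    x = np.asarray(x, dtype=float)
    block = 2 ** level
    if x.size % block:
        raise ValueError("signal length must be a multiple of 2**level")
    return np.repeat(x.reshape(-1, block).mean(axis=1), block)

# Hypothetical chromatogram: two Gaussian peaks on a linearly drifting baseline.
n = 2048
t = np.arange(n)
baseline = 0.5 + t / n
peaks = (1.0 * np.exp(-0.5 * ((t - 600) / 8.0) ** 2)
         + 0.6 * np.exp(-0.5 * ((t - 1300) / 8.0) ** 2))
x = baseline + peaks

f = 0.93                         # subtraction factor quoted in the text
c8 = haar_approximation(x, 8)    # coarse approximation standing in for c8
corrected = x - f * c8

# Compare the spread of the signal away from the peaks before and after correction.
peak_free = (np.abs(t - 600) > 100) & (np.abs(t - 1300) > 100)
print(np.ptp(x[peak_free]), np.ptp(corrected[peak_free]))
```

Because the Haar approximation is piecewise constant, the corrected baseline keeps a small sawtooth residual; a smoother wavelet such as the Symmlet named in the text would be expected to follow the drift more closely.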
5.3.4. Background Removal Using Continuous Wavelet Transform
As stated in Chapter 4, the CWT of a signal s(t ) with an analyzing wavelet ψ(t ) is the convolution of s(t ) with a scaled and conjugated wavelet
Figure 5.28. An experimental chromatogram (a) and its eighth discrete approximation obtained by WT decomposition (b).

Figure 5.29. Baseline-corrected chromatogram obtained by subtracting the eighth discrete approximation from the experimental chromatogram.
$\tilde{\psi}_a(t) = \bar{\psi}_a(-t)$:

$$ W_f(a, b) = (\tilde{\psi}_a * s)(b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty} \bar{\psi}\!\left(\frac{t - b}{a}\right) s(t)\, dt \tag{5.49} $$

In the Fourier domain, the equation takes the form

$$ W_f(a, b) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat{\psi}(a\omega)\, \hat{s}(\omega)\, e^{i\omega b}\, d\omega \tag{5.50} $$

where $\hat{\psi}$ and $\hat{s}$ are the Fourier transforms of the wavelet $\psi$ and the signal $s$, respectively. Equations (5.49) and (5.50) show clearly that the wavelet analysis is a time-frequency analysis, or, more properly, a time-scale analysis, because the scale parameter a behaves as the inverse of a frequency. In particular, Equation (5.50) shows that the CWT of a signal is a filter with a constant relative bandwidth $\Delta\omega/\omega$. Therefore, the CWT can be used for separating a smooth background from sharp peaks. In the following paragraphs, a method for removing a large spectral line from an NMR spectrum is introduced.
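The filter interpretation of Equation (5.50) can be made concrete with a discrete sketch at a single scale. The code below assumes a unit sampling interval, evaluates the integral with the FFT (so the signal is treated as periodic), and uses the Morlet transform of Equation (5.65) with ω0 = 5 and σ0 = 1; the function names and the test tone are hypothetical.

```python
import numpy as np

def morlet_hat(omega, omega0=5.0, sigma0=1.0):
    """Fourier transform of the Morlet wavelet, Eq. (5.65)."""
    return np.sqrt(2.0 * np.pi) * sigma0 * np.exp(
        -((omega - omega0) ** 2) * sigma0 ** 2 / 2.0)

def cwt_single_scale(s, a, omega0=5.0, sigma0=1.0):
    """Discrete analogue of Eq. (5.50) at one scale a (unit sampling interval)."""
    omega = 2.0 * np.pi * np.fft.fftfreq(len(s))      # rad per sample
    return np.fft.ifft(morlet_hat(a * omega, omega0, sigma0) * np.fft.fft(s))

n = 1024
t = np.arange(n)
omega1 = 2.0 * np.pi * 64 / n        # hypothetical tone placed exactly on an FFT bin
tone = np.exp(1j * omega1 * t)
half_tone = np.exp(1j * 0.5 * omega1 * t)

a = 5.0 / omega1                     # scale that centers the filter on omega1
w_tone = cwt_single_scale(tone, a) / morlet_hat(5.0)
w_half = cwt_single_scale(half_tone, a) / morlet_hat(5.0)

print(np.allclose(w_tone, tone))     # the tone at omega1 passes unchanged
print(np.abs(w_half).max())          # the tone at omega1 / 2 is strongly attenuated
```

At the scale a = ω0/ω1 the Gaussian factor equals its maximum at ω1, so a constant-amplitude line at that frequency is returned unchanged after division by ψ̂(ω0), while lines at other frequencies are damped.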
Let s(t ) be a signal of the form
$$ s(t) = \sum_{l=1}^{N} s_l(t) \tag{5.51} $$
where $s_l(t) = A_l(t) \exp(i\omega_l t)$ is the $l$th spectral line, which has a constant frequency $f_l = \omega_l/2\pi$, and $N$ is the number of spectral lines. Its CWT is given by

$$ W_f(a, b) = \sum_{l=1}^{N} W_{f_l}(a, b) \tag{5.52} $$

and $W_{f_l}$ is

$$ \begin{aligned} W_{f_l}(a, b) &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \hat{\psi}(a\omega)\, \hat{A}_l(\omega - \omega_l)\, e^{i\omega b}\, d\omega \\ &= \frac{1}{2\pi}\, e^{i\omega_l b} \int_{-\infty}^{+\infty} \hat{\psi}(a(\omega + \omega_l))\, \hat{A}_l(\omega)\, e^{i\omega b}\, d\omega \end{aligned} \tag{5.53} $$
Using the Taylor expansion of the Fourier transform of the analyzing wavelet $\hat{\psi}$ around the pulsation $\omega_l$,

$$ \hat{\psi}(a(\omega + \omega_l)) = \hat{\psi}(a\omega_l) + \sum_{k \ge 1} \frac{(a\omega)^k}{k!}\, \frac{d^k \hat{\psi}}{d\omega^k}(a\omega_l) \tag{5.54} $$

we obtain the following expansion for $W_{f_l}$:

$$ W_{f_l}(a, b) = \hat{\psi}(a\omega_l)\, s_l(b) + e^{i\omega_l b} \sum_{k \ge 1} \frac{(-ia)^k}{k!}\, \frac{d^k \hat{\psi}}{d\omega^k}(a\omega_l)\, \frac{d^k A_l}{db^k}(b) \tag{5.55} $$

Therefore, we have

$$ W_{f_l}(a, b) \approx \hat{\psi}(a\omega_l)\, s_l(b) \tag{5.56} $$

and

$$ W_f(a, b) \approx \sum_{l=1}^{N} \hat{\psi}(a\omega_l)\, s_l(b) \tag{5.57} $$
If the values of frequency $\omega_l$ are sufficiently far away from each other, the factor $\hat{\psi}(a\omega_l)$ will allow us to treat each spectral line independently. In this case, the contribution of the $l$th spectral line to $W_f(a, b)$ is localized on the scale $a_l = \omega_0/\omega_l$, where $\omega_0$ is the frequency of the analyzing wavelet. Therefore, we have

$$ \frac{W_f(\omega_0/\omega_l, b)}{\hat{\psi}(\omega_0)} \approx s_l(b) \tag{5.58} $$
Using this equation, we can easily separate the large spectral line and small peaks. However, in many cases, especially when the frequencies of the components are close to each other, we cannot obtain satisfactory results using this equation because

$$ \frac{W_f(a_l, b)}{\hat{\psi}(\omega_0)} \approx s_l(b) + \sum_{m \ne l} \frac{\hat{\psi}(a_l\omega_m)}{\hat{\psi}(\omega_0)}\, s_m(b) \tag{5.59} $$

where the second term, which follows from Equation (5.57), is a sum over the other spectral lines, with the amplitudes attenuated by the exponential factor.
Therefore, in practical applications, we can define

$$ s_l^{(k)}(b) = \frac{W_f(a_l, b)}{\hat{\psi}(\omega_0)} \tag{5.60} $$

and iterate the procedure with $s_l^{(k-1)}(b)$ as the new input signal. After a certain number of iterations, the second term in Equation (5.59) will become negligible.
Example 5.9: Large Spectral Line Removal of a Simulated NMR Spectrum. The NMR signal in Fourier domain can be simulated by
$$ s_l(t) = A_l(t) \exp(i\omega_l t) = A_l \exp(-d_l t) \exp(i\omega_l t) \tag{5.61} $$
Figure 5.30 shows three simulated signals s1(t ), s2(t ), and s(t ) = s1(t ) + s2(t ) and their Fourier transforms, where s1(t ) and s2(t ) were simulated by
$$ s_1(t) = 1.0 \exp\!\left(\frac{-t}{200}\right) \exp(i\,0.2\pi t) \tag{5.62} $$

$$ s_2(t) = 10.0 \exp\!\left(\frac{-t}{100}\right) \exp(i\,0.19\pi t) \tag{5.63} $$
where t is sampled by 2048 data points. It can be seen that the large spectral line s1(t ) can be viewed as the baseline or the background of the small peak s2(t ).
In order to separate the signals s1(t) and s2(t) from the mixed signal, we can use the iterative procedure described by Equation (5.60). Figure 5.31 shows the results at iteration numbers k = 100, 300, and 500. In the calculation, the Morlet wavelet, which is defined by

$$ \psi(t) = e^{i\omega_0 t}\, e^{-t^2/2\sigma_0^2} \tag{5.64} $$

$$ \hat{\psi}(\omega) = \sqrt{2\pi}\,\sigma_0\, e^{-(\omega - \omega_0)^2 \sigma_0^2/2} \tag{5.65} $$

was adopted with σ0 = 1 and ω0 = 5. Because the aim is to remove the s1(t) and extract s2(t), al = ω0/ω1 = 5/0.2π was used. It can be seen that, after 500 iterations, the large spectral line is completely removed.

Figure 5.30. Simulated NMR signals in Fourier domain (left) and the plots of their Fourier transform (right).
Figure 5.31. Plots of the extracted NMR signals (left) and their Fourier transform at k = 100 (a), 300 (b), and 500 (c) (right).

Computational Details of Example 5.9
1. Generate the mixed signal in Figure 5.30 with ω1 = 0.2π and ω2 = 0.19π.
2. Set the scale parameter for the CWT: a = ω0/ω1, where ω0 is 5, which is determined by the wavelet function used in the CWT.
3. Perform CWT to remove the large spectral line iteratively.
4. Display Figure 5.31.
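The steps above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' program: it assumes a unit sampling interval over 2048 points, evaluates each CWT in the Fourier domain with the FFT, and uses the Morlet transform of Equation (5.65), for which the ratio ψ̂(aω)/ψ̂(ω0) reduces to a Gaussian gain. The function names are hypothetical.

```python
import numpy as np

def line_filter_gain(omega, a, omega0=5.0, sigma0=1.0):
    """psi_hat(a * omega) / psi_hat(omega0) for the Morlet wavelet of Eq. (5.65)."""
    return np.exp(-((a * omega - omega0) ** 2) * sigma0 ** 2 / 2.0)

def extract_line(s, omega_l, n_iter, omega0=5.0, sigma0=1.0):
    """Apply Eq. (5.60) n_iter times at the scale a_l = omega0 / omega_l."""
    omega = 2.0 * np.pi * np.fft.fftfreq(len(s))      # rad per sample
    gain = line_filter_gain(omega, omega0 / omega_l, omega0, sigma0)
    spectrum = np.fft.fft(s)
    for _ in range(n_iter):
        spectrum = gain * spectrum    # one CWT at a_l, divided by psi_hat(omega0)
    return np.fft.ifft(spectrum)

t = np.arange(2048)
s1 = 1.0 * np.exp(-t / 200.0) * np.exp(1j * 0.2 * np.pi * t)     # Eq. (5.62)
s2 = 10.0 * np.exp(-t / 100.0) * np.exp(1j * 0.19 * np.pi * t)   # Eq. (5.63)
s = s1 + s2

# FFT bins closest to omega1 = 0.2*pi (bin 205) and omega2 = 0.19*pi (bin 195).
for k in (100, 300, 500):
    line = extract_line(s, 0.2 * np.pi, k)
    mag = np.abs(np.fft.fft(line))
    print(k, mag[205], mag[195])

residual = s - extract_line(s, 0.2 * np.pi, 500)   # what remains after subtracting the isolated line
```

With these parameters the per-iteration gain at ω2 is only slightly below 1, which is why several hundred iterations are needed before the component at ω2 becomes negligible in the isolated line. The repeated filtering also narrows the retained line somewhat, so the isolated component is an approximation rather than an exact copy of the original line.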
Using the same procedure, D. Barache et al. successfully subtracted a large component from an experimental NMR spectrum of polyethylene as shown in Figure 5.32a. The huge line is the peak corresponding to CH2 groups, which completely obliterates the fine details of the other peaks. After subtraction of the large peak as shown in Figure 5.32b, the small peaks become clearly identifiable.
5.3.5. Background Removal of Two-Dimensional Signals
With the development of modern instruments, more and more analytical instruments provide two-way data matrices as their measurement results. A method for background removal of 2D analytical signals was also proposed on the basis of the WT technique.
Assume a data matrix X of order n × m, in which each row represents a spectral measurement with m wavelength sampling points and each column represents a chromatographic measurement with n retention-time
Figure 5.32. An experimental NMR spectrum with large spectral line (a) and the spectrum obtained by CWT extraction (b) [copied from D. Barache et al. J. Magn. Reson. 128:1--11, (1997)].
sampling points. The data matrix can be divided into two parts:
$$ \mathbf{X} = \mathbf{X}_c + \mathbf{X}_b \tag{5.66} $$
Here $\mathbf{X}_c$ originates from the chemical components and $\mathbf{X}_b$ from noise and background. For the sake of convenience, we assume that the data matrix is free of noise, so that $\mathbf{X}_b$ is composed of the spectral and chromatographic background only. The background of 2D analytical data matrices such as "hyphenated" chromatographic-spectroscopic data generally has the following properties: (1) there is no direct correlation between the chromatographic baseline drift and the spectral background, (2) there is a very similar spectral background at the two ends of a chromatographic peak, and (3) there is a similar drift of the baseline at each retention time, since the scanning time for each spectrum is very short. Thus, the background matrix can be written as
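$$ \mathbf{X}_b = \mathbf{t}\mathbf{1}^T + \mathbf{1}\mathbf{s}^T \tag{5.67} $$

where $\mathbf{t}$ denotes the baseline drift in the chromatographic direction, $\mathbf{s}^T$ denotes the spectral background, and $\mathbf{1}$ denotes a column vector of the appropriate length containing only 1s, with $\mathbf{1}^T$ its transpose. The superscript T denotes transposition.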
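The structure of Equation (5.67) is easy to check numerically. In the sketch below the sizes and the values of the drift vector and spectral background are hypothetical, chosen only to show how the two outer products combine.

```python
import numpy as np

n, m = 5, 4                                   # retention times x wavelengths (hypothetical)
t = np.linspace(0.0, 0.4, n)                  # assumed chromatographic baseline drift
s = np.array([0.10, 0.08, 0.05, 0.03])        # assumed spectral background

# X_b = t 1^T + 1 s^T, Eq. (5.67)
X_b = np.outer(t, np.ones(m)) + np.outer(np.ones(n), s)

# Each row is the spectral background shifted by the drift at that retention time.
print(np.allclose(X_b[2], t[2] + s))
```

Every row of the resulting matrix is the same spectral background offset by a constant, and every column is the same drift curve offset by a constant, which is the property the following derivation relies on.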
Therefore, the spectrum at retention time i can be expressed as
$$ \mathbf{x}_i^T = \mathbf{x}_{c,i}^T + t_i \mathbf{1}^T + \mathbf{s}^T \tag{5.68} $$

where $\mathbf{x}_{c,i}^T$ denotes a "pure" spectrum at retention time i and $t_i$ corresponds to the baseline drift at that retention time. In zero-component regions, $\mathbf{x}_{c,i}^T$ should be a zero vector, and Equation (5.68) turns into

$$ \mathbf{x}_i^T = t_i \mathbf{1}^T + \mathbf{s}^T \tag{5.69} $$
Equation (5.69) shows that the spectra in zero-component regions before and after elution should be similar if there is spectral background. If not, they should be flat lines. Therefore, the zero component regions may be used to detect the presence of spectral background.
According to the algorithm of WT, we can obtain approximation and detail coefficients by using Equations (5.3) and (5.4). The detail coefficients of a spectrum at retention time i on scale k can be expressed as
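$$ \begin{aligned} \mathbf{d}_k &= \mathbf{x}_i^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} \\ &= \mathbf{x}_{c,i}^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} + t_i \mathbf{1}^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} + \mathbf{s}^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} \end{aligned} \tag{5.70} $$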
According to the properties of the filters H and G, it is easy to deduce that the detail coefficients of a constant vector $\mathbf{c} = \{c, c, \ldots, c\}$ should be a zero vector:

$$ \mathbf{d}_1 = \mathbf{c}\mathbf{G} = \mathbf{0} \tag{5.71} $$
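Using this property, Equation (5.70) can be reduced to

$$ \mathbf{d}_k = \mathbf{x}_{c,i}^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} + \mathbf{s}^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} \tag{5.72} $$

In zero-component regions, this last equation can be further reduced into

$$ \mathbf{d}_k = \mathbf{s}^T \mathbf{H}_0 \mathbf{H}_1 \mathbf{H}_2 \cdots \mathbf{G}_{k-1} \tag{5.73} $$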
Therefore, using Equations (5.72) and (5.73), we can remove both the chromatographic baseline drift and the spectral background.
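The cancellation of the drift term in Equations (5.70) to (5.73) can be illustrated with the Haar filters, for which H is pairwise averaging and G is pairwise differencing. The text does not say which wavelet was used for the 2D method, so the Haar choice, the helper `haar_detail`, and the simulated spectrum and background below are all assumptions made for the sake of a short, self-contained check.

```python
import numpy as np

def haar_detail(x, k):
    """Haar detail coefficients of x on scale k: (k - 1) low-pass steps, then one high-pass."""
    c = np.asarray(x, dtype=float)
    for _ in range(k - 1):
        c = (c[0::2] + c[1::2]) / np.sqrt(2.0)    # low-pass filter H with downsampling
    return (c[0::2] - c[1::2]) / np.sqrt(2.0)     # high-pass filter G with downsampling

m = 64
wavelength = np.arange(m)
x_c = np.exp(-0.5 * ((wavelength - 30) / 6.0) ** 2)   # hypothetical pure spectrum
s = 0.05 + 0.002 * wavelength                         # hypothetical spectral background
t_i = 0.37                                            # hypothetical drift at retention time i

# Eq. (5.71): a constant vector has zero detail coefficients.
print(np.allclose(haar_detail(np.full(m, t_i), 2), 0.0))

# Eq. (5.72): the drift term t_i 1^T drops out of the detail coefficients.
print(np.allclose(haar_detail(x_c + t_i + s, 2), haar_detail(x_c + s, 2)))

# Eq. (5.73): in a zero-component region only the spectral background remains.
print(np.allclose(haar_detail(t_i + s, 2), haar_detail(s, 2)))
```

The same cancellation holds for any wavelet whose high-pass filter coefficients sum to zero, since each low-pass step maps a constant vector to another constant vector before the high-pass step removes it.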
For example, Figure 5.33a,b shows two simulated chromatographic peaks and two simulated spectra, which are used to simulate a theoretical HPLC-DAD data matrix (denoted as X). Figure 5.33c,d shows a simulated chromatographic baseline and a simulated spectral background, which are used to simulate an HPLC-DAD data matrix containing baseline drift and background. We can simulate two data matrices X1 and X2,
Figure 5.33. Chromatograms (a), spectra (b), chromatographic baseline (c), and spectral background (d) used in the simulation of the HPLC-DAD data matrices.