Chau Chemometrics From Basics to Wavelet Transform


A.2.6.2. Eigenvalues and Eigenvectors (eig)

A.2.7. Graphic Functions

INDEX

PREFACE

Talk of chemistry often leads people to think of wet experiments in a laboratory. That was the situation decades ago. Thanks to the development of quantum theory and advances in electronic and optical devices, chemistry is now evolving into a discipline that incorporates both experimentation and modeling. For instance, before synthesizing a new organic compound, one can search databases for information on related reactions to help design viable synthetic pathways. In addition, computational chemistry can help determine whether these pathways are thermodynamically favored, and QSAR (quantitative structure-activity relationship) studies can help predict the properties of the compound of interest. Similarly, analytical measurements are no longer used only to acquire data from chemical experiments. Signal processing techniques can be used to estimate the precision of these data and to extract more information from the chemical measurements. According to M. Valcarcel, analytical chemistry is a metrological science that develops, optimizes, and applies measuring processes intended to derive both global and partial quality chemical information in order to solve the measuring problems posed.

Chemometrics, which applies statistics and related mathematical techniques, forms a new area in chemistry. According to D. L. Massart, its aims are to design or select optimal measurement procedures and experiments and to extract a maximum of information from chemical data. Given these features and applications, some believe that chemometrics provides an important theoretical background for analytical chemistry.

In recent years, the wavelet transform (WT), a new mathematical technique, has been widely used in the engineering sciences owing to its localization properties in both the frequency and time domains. It was introduced to chemistry in the 1990s and has since attracted the attention of many chemists. Prior to January 2003, over 370 chemistry papers and references related to WT had been published.

Many chemists facing sophisticated practical problems are unfamiliar with the chemometric methods available for solving them, especially new approaches such as WT. They would be happy if they could find the appropriate methods they needed, but where are these to be found? This seems to be one of the major obstacles to the wide application of chemometric methods in chemistry. The famous Chinese philosopher Guo-wei Wang (1877-1927) cited lyrics of the Song Dynasty to describe the different levels of a scholar’s learning. According to Wang, the highest level of knowledge is that described by Qi-ji Xin (1140-1207) in “Lantern Festival,” to the tune “Green Jade Cup”:

But in the crowd once and again

I look for her in vain.

When all at once I turn my head,

I find her there where lantern light is dimly shed.

It is not easy for the average chemist to reach such a level of learning in the mathematical background of chemometric theory. Fortunately, this book on chemometrics, from the basics to the wavelet transform, by Professor F. T. Chau’s team is written in a tutorial manner, with many examples provided to clarify the theory and methods described. The basic theory of WT and its applications to analytical chemistry are described. In addition, the fundamentals of chemometrics and various common signal processing techniques are provided to help readers learn more about the applications of this new mathematical technique. The fundamentals of vector and matrix operations and of the mathematical programming language MATLAB are also provided in the Appendix to enable newcomers to the field to get more from the contents of this book. Computer codes are provided for some topics to help readers see how the proposed algorithms work in practice. Relevant literature references are listed at the end of each chapter.

It is really a great honor for me to be invited to write these lines for the book. The authors have undertaken the large task of surveying the subject to provide a valuable reference book for chemists, biochemists, and postgraduate students. In fact, even the most modern innovations of WT have found a place in this concise volume. With its own distinctiveness, this book is indeed a very welcome addition to the existing literature on chemometrics.

Ru-Qin Yu
Professor of Chemistry
Member of Chinese Academy of Sciences
Hunan University
Changsha, People’s Republic of China

CHAPTER 1

INTRODUCTION

1.1. MODERN ANALYTICAL CHEMISTRY

1.1.1. Developments in Modern Chemistry

The field of chemistry is currently facing major changes. Optical, mechanical, and microelectronic technologies have advanced rapidly in recent years, and computer power has increased dramatically as well. All these developments, together with other factors, present chemists in research and development with new opportunities but also new challenges.

A recent (as of 2003) development in the pharmaceutical industry is the use of combinatorial synthesis to generate a library of many structurally diverse compounds. These compounds are then subjected to high-throughput screening in bioassays. Such a process generates tremendous amounts of data on structure-activity relationships. For analytical measurements, a new technology called hyphenated instrumentation, which uses two or more devices simultaneously for quantitative measurement, has been introduced [1]. Examples are the high-performance liquid chromatography-diode array detector system (HPLC-DAD), gas chromatography with mass spectrometry (GC-MS), and liquid chromatography coupled with mass spectrometry, such as LC-MS and LC-MS-MS. Huge amounts of data are generated by these instruments. For example, the Hewlett-Packard (HP) HPLC 1100 instrument with a diode array detector (DAD) system (Agilent Technology Inc., CA) produces 1.26 million spectrochromatographic data points in a 30-min experimental run with a sampling rate of 5 Hz and a spectral range of 190-400 nm at a resolution of 1 data point per 2 nm. To mine valuable information from these data, different mathematical techniques have been developed. Up to now, research and development of this kind, applying statistical and mathematical techniques in chemistry, has been confined mainly to analytical studies. Thus, our discussion will focus on analytical chemistry, but other disciplines of chemistry will also be included where appropriate. The main content of this book covers basic chemometric techniques for the processing and interpretation of chemical data as well as chemical applications of advanced techniques, including wavelet transformation (WT) and mathematical techniques for manipulating higher-dimensional data.

Chemometrics: From Basics To Wavelet Transform. Foo-Tim Chau, Yi-Zeng Liang, Junbin Gao, and Xue-Guang Shao. Chemical Analysis Series, Volume 164. ISBN 0-471-20242-8. Copyright © 2004 John Wiley & Sons, Inc.

1.1.2. Modern Analytical Chemistry

Modern analytical chemistry has long been recognized mainly as a measurement science. In its development, there are two fundamental aspects:

1. From the instrumental and experimental point of view, analytical chemistry makes use of basic physical properties, such as optical, electrical, magnetic, and acoustic properties, to acquire the data needed.

2. In addition, new methodologies developed in the mathematical, computer, and biological sciences, as well as other fields, are employed to provide in-depth and broad-range analyses.

Previously, the main problem confronting analytical scientists was how to obtain data. Measurements were labor-intensive, tedious, time-consuming, and expensive, with low sensitivity and manual recording. There were also problems in preparing adequate materials, a lack of proper techniques, and inefficient equipment and technical support. Workers had to handle many unpleasant routine tasks to get only a few numbers. They also had to try to extract as much information as possible about the structure, composition, and other properties of the system under investigation, which was an insurmountable task in many cases. Now, many modern chemical instruments are equipped with advanced optical, mechanical, and electronic components that produce high-sensitivity, high-quality signals, and many of these components are also found in the computers used for controlling different devices, managing system operation, data acquisition, signal processing, data interpretation, and reporting analytical results. Thus the workload of analytical measurement (item 1 in the list above) has been reduced to a minimum compared with the workload typical decades ago.

After an analytical measurement, the data collected are often treated by different signal processing techniques, as mentioned earlier. The aim is to obtain higher-quality or “true” data and to extract the maximum amount of meaningful information, although this is not easy to accomplish. For instance, suppose that in an HPLC study two experimental runs are carried out on the same sample mixture. The two chromatograms acquired usually differ from each other to some extent because of variations in instrumentation, experimental conditions, and other factors. To obtain quality results that are free from these disturbances, it is common practice to carry out data preprocessing first. The techniques involved include denoising, data smoothing, and/or adjustment of baseline, drift, offset, and other properties. Methods such as differentiation may then be applied to determine the retention times of peaks more accurately, especially for overlapping peaks that arise from different components of the mixture. In this way, some of these components may be identified with a higher level of confidence by comparing their retention times with those of standards or known compounds. If the peak heights or peak areas and the relevant calibration curves are available, the concentrations of these components can also be determined. Statistical methods can help in evaluating the results deduced and in calculating the level of confidence or the concentrations of the components identified. All these data are very important in preparing a reliable report for an analytical test. Data treatment and data interpretation of, for instance, the HPLC chromatograms mentioned above form part of an interdisciplinary area known as chemometrics.
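The preprocessing chain described above can be sketched in a few lines of Python. This is a minimal illustration on a synthetic chromatogram, not the authors' procedure: a straight line fitted to the ends of the run (assumed to be peak-free) stands in for baseline and drift adjustment, a moving average stands in for smoothing, and peaks are located where the first derivative changes sign from positive to negative. All function names, the window length, and the thresholds are assumptions chosen for the example.

```python
import numpy as np

def subtract_linear_baseline(t, y, edge_fraction=0.1):
    """Remove offset and linear drift with a line fitted to both ends of the run.

    Assumes the first and last `edge_fraction` of the run contain no peaks.
    """
    n_edge = max(2, int(edge_fraction * len(t)))
    mask = np.zeros(len(t), dtype=bool)
    mask[:n_edge] = True
    mask[-n_edge:] = True
    slope, intercept = np.polyfit(t[mask], y[mask], 1)
    return y - (slope * t + intercept)

def moving_average(y, window=31):
    """Centered moving average; `window` must be odd. Edges are padded by repetition."""
    half = window // 2
    padded = np.pad(y, half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def find_peaks_by_derivative(t, y, min_height, min_sep):
    """Return retention times where dy/dt goes from positive to non-positive.

    Candidates below `min_height` are ignored; candidates closer than
    `min_sep` (in time units) are merged, keeping the taller one.
    """
    dy = np.gradient(y, t)
    crossings = np.where((dy[:-1] > 0) & (dy[1:] <= 0))[0]
    kept = []
    for i in crossings:
        i = i if y[i] >= y[i + 1] else i + 1   # take the higher of the two samples
        if y[i] < min_height:
            continue
        if kept and t[i] - t[kept[-1]] < min_sep:
            if y[i] > y[kept[-1]]:
                kept[-1] = i
        else:
            kept.append(i)
    return t[np.asarray(kept, dtype=int)]

# Synthetic run (hypothetical data): two neighbouring Gaussian peaks at 4.0 and
# 5.0 min on a drifting baseline, with random noise added.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 3001)                      # retention time / min
signal = (100.0 * np.exp(-0.5 * ((t - 4.0) / 0.2) ** 2)
          + 60.0 * np.exp(-0.5 * ((t - 5.0) / 0.2) ** 2))
raw = signal + 5.0 + 0.8 * t + rng.normal(0.0, 0.5, t.size)

smoothed = moving_average(subtract_linear_baseline(t, raw))
print(find_peaks_by_derivative(t, smoothed, min_height=20.0, min_sep=0.2))
```

The retention times found this way could then be compared with those of standards, and peak heights converted to concentrations through a calibration curve, as the text describes. In practice a dedicated smoothing filter and a more careful baseline model would usually replace the simple choices made here.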

1.1.3. Multidimensional Dataset

Many analytical instruments generate one-dimensional (1D) data. Very often, even if they can produce multidimensional signals, 1D datasets are still selected for data treatment and interpretation because they are easier and less time-consuming to manipulate. Also, most investigators are used to handling 1D data. Yet valuable information may be lost in this approach.

Figure 1.1 shows the spectrochromatogram obtained in a study of the herb Danggui (Radix Angelicae Sinensis) [2] using the Hewlett-Packard (HP) HPLC-DAD model 1100 instrument. Methanol was used for sample extraction. A Sep-Pak C18 column was used, and the runtime was 90 min. The two-dimensional (2D) spectrochromatogram shown in Figure 1.1 contains 2.862 million data points [3]. It looks very complicated and cannot be interpreted easily by visual inspection alone. As mentioned earlier, many workers simplify the job by selecting one or more good or acceptable 1D chromatograms from Figure 1.1 for analysis. Figure 1.2 shows the 1D chromatograms selected at measured wavelengths of 225, 280, and 320 nm. However, which one should be chosen as the fingerprint of Danggui is not an easy question to answer, since the profiles look very different from one another. The variation in these chromatographic profiles is due mainly to the different extents of ultraviolet absorption of the components within the herb at different wavelengths. From an information analysis [3], Figure 1.2b is found to be the best chromatogram. Yet the two other chromatograms may be useful in certain respects.

Figure 1.1. The 2D HPLC chromatogram of Danggui.
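Selecting 1D chromatograms from a 2D spectrochromatogram amounts to slicing a data matrix. The sketch below is illustrative only. It assumes that the matrix is stored with time points as rows and wavelength channels as columns, and it borrows the 5-Hz sampling rate and the 190-400-nm range at 2-nm steps from the example in Section 1.1.1, since the settings of the Danggui run are not stated here. Under those assumptions the matrix has 27,000 × 106 = 2,862,000 entries, which agrees with the 2.862 million data points quoted above. The intensities are placeholders, and `chromatogram_at` is a hypothetical helper.

```python
import numpy as np

run_min = 90                                   # runtime stated in the text
rate_hz = 5                                    # assumed; taken from the Section 1.1.1 example
wavelengths = np.arange(190, 401, 2)           # assumed grid: 190, 192, ..., 400 nm
n_time = run_min * 60 * rate_hz
times = np.arange(n_time) / (rate_hz * 60.0)   # retention time / min

# Placeholder intensities; a real run would be loaded from the instrument export.
X = np.zeros((n_time, wavelengths.size), dtype=np.float32)
print(X.shape, X.size)

def chromatogram_at(X, wavelengths, target_nm):
    """Return (channel_nm, 1D chromatogram) for the channel nearest target_nm."""
    j = int(np.argmin(np.abs(wavelengths - target_nm)))
    return int(wavelengths[j]), X[:, j]

for target in (225, 280, 320):
    channel, trace = chromatogram_at(X, wavelengths, target)
    print(target, "nm -> channel", channel, "nm,", trace.size, "time points")
```

Because 225 nm does not fall on the assumed 2-nm grid, the helper returns the nearest available channel; a real instrument export would define its own wavelength axis. A row slice such as `X[i, :]` would give the UV spectrum at a single retention time.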

Methods for processing 1D data have been developed and applied by chemists for a long time. As previously mentioned, noise removal, background correction, differentiation, data smoothing and filtering, and calibration are examples of this type of data processing. Chemometrics is considered to be the discipline that does this kind of job. With the growing popularity of hyphenated instruments, chemometric methods for manipulating 2D data have been developing. The increasing computing power and memory capacity of current computers further expedite the process. The major aim is to extract more useful information from the mountains of 2D data. In the following section, the fundamentals of chemometrics are briefly introduced. More details are provided in later chapters.

Figure 1.2. The HPLC chromatogram of Danggui measured at (a) 225 nm, (b) 280 nm, and (c) 320 nm. Axes in each panel: retention time (min) versus intensity (mAU).

1.2. CHEMOMETRICS

1.2.1. Introduction to Chemometrics

The term chemometrics was introduced by Svante Wold and Bruce R. Kowalski in the early 1970s [4]. Terms like biometrics and econometrics were also introduced into the fields of biological science and economics. Afterward, the International Chemometrics Society was established. Since then, chemometrics has been developing and is now widely applied to different fields of chemistry, especially analytical chemistry, judging by the numbers of papers published, conferences and workshops organized, and related activities. “A reasonable definition of chemometrics remains as how do we get chemical relevant information out of measured chemical data, how do we represent and display this information, and how do we get such information into data?” as mentioned by Wold [4]. Chemometrics is considered by some chemists to be a subdiscipline that provides the basic theory and methodology for modern analytical chemistry. Yet chemometricians themselves consider chemometrics to be a new discipline of chemistry [4]. Both the academic and industrial sectors have benefited greatly from employing this new tool in different areas.

Howery and Hirsch [5] in the early 1980s classified the development of the chemometrics discipline into different stages. The first stage is the period before 1970. A number of mathematical methodologies were developed and standardized in different fields of mathematics, behavioral science, and engineering sciences. In this period, chemists limited themselves mainly to data analysis, including computation of statistical parameters such as the mean, standard deviation, and level of confidence. Howery and Hirsch, in particular, appreciated the research on correlating vast amounts of chemical data with relevant molecular properties. These pioneering works form the basis of the important area of quantitative structure-activity relationship (QSAR) studies developed more recently.

The second stage of chemometrics falls in the 1970s, when the term chemometrics was coined. This new discipline of chemistry (or, to some, subdiscipline of analytical chemistry) caught the attention of chemists, especially analytical chemists, who not only applied the methods available for data analysis but also developed new methodologies to meet their needs. There are two main reasons why chemometrics developed so rapidly at that time: (1) large volumes of data not available before could be acquired from advanced chemical instruments (for the first time, chemists faced bottlenecks similar to those encountered years before by social scientists and economists in obtaining useful information from large amounts of data) and (2) microelectronics technology advanced within that period. The abilities of chemists in signal processing and data interpretation were enhanced by increasing computer power.

The future evolution of chemometrics was also predicted by Howery and Hirsch in their article [5] and later by Brown [4]. Starting in the early 1980s, chemometrics was incorporated into chemistry courses for graduates and postgraduates in American and European universities, and it became a common tool for chemists. The development of the discipline since the early 1980s has borne out the original predictions. Chemometrics has become a mainstay of chemistry in many universities in America and Europe and in some in China and other countries. Workshops and courses related to chemometrics are held regularly at conferences such as the National Meetings of the American Chemical Society (ACS) and the Gordon Conferences, as well as at symposia and meetings of the Royal Society of Chemistry and the International Chemometrics Society. For instance, four courses were offered under the title “Statistics/Experimental Design/Chemometrics” at the 226th ACS National Meeting held in New York in September 2003 [http://www.acs.org]. The course titles are “Chemometric Techniques for Qualitative Analysis,” “Experimental Design for Combinatorial and High-Throughput Materials Development,” “Experimental Design for Productivity and Quality in R&D,” and “Statistical Analysis of Laboratory Data.” Furthermore, chemometrics training courses are held regularly by software companies such as CAMO [6] and PRS [7]. In a review article [8] on the 25 most frequently cited books in analytical chemistry (1980-1999), four are related to chemometrics: Factor Analysis in Chemistry by Malinowski [9], Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson [10], Applied Regression Analysis by Draper and Smith [11], and Multivariate Calibration by Martens and Naes [12], with rankings of 4, 5, 7, and 16, respectively. The textbook Chemometrics: Statistics and Computer Applications in Analytical Chemistry [13] by Otto was the second most popular “bestseller” on analytical chemistry according to the Internet source www.amazon.com on February 16, 2001. The Internet source www.chemistry.co.nz listed “Statistics for Analytical Chemistry” by J. Miller and J. Miller as one of the eight analytical chemistry bestsellers on January 21, 2002 and February 10, 2003.

Chemometricians have applied the well-known approaches of multivariate calibration, chemical resolution, and pattern recognition to analytical studies. Tools such as partial least squares (PLS) [14], soft independent modeling of class analogy (SIMCA) [15], and methods based on factor analysis, including principal-component regression (PCR) [16], target factor analysis (TFA) [17], evolving factor analysis (EFA) [18,19], rank annihilation factor analysis (RAFA) [20,21], window factor analysis (WFA) [22,23], and heuristic evolving latent projection (HELP) [24,25], have been introduced. In providing the basic theory and methodology for analytical study, the evolution of chemometrics falls into two main categories: (1) development of new theories and algorithms for manipulating chemical data and (2) new applications of chemometric techniques to different disciplines of chemistry such as environmental chemistry, food chemistry, agricultural chemistry, medicinal chemistry, and chemical engineering. Advances in computer and information science, statistics, and applied mathematics have introduced new elements into chemometrics. Neural networking [26,27], a mathematical technique that simulates the transmission of signals within
