- Lecture 1 (part 1)
- Short preamble to my course
- Physical quantity
- 1.1. Classical approaches. Traditional formulation of the one-dimensional regression problem
- Regression model. Usually, it is supposed the measured values of the response are
- Usually we suppose that:
- These components should minimize the values of the error dispersion. Mathematically, this requirement
- This procedure is very useful and can be considered as obligatory because it
- Table 1. Simple functions that admit presentation in the form of the straight
- Definitely, the list of the functions presented in Table 1 can be continued.
- a. The elimination of the outliers;
- ksmooth(x, y, w) in MathCad-15
- Figure 2b. Here we demonstrate the effect of creation a trend by means
- This procedure automatically decreases the value of the initial fluctuations by means of
- Figure 3a. Here we show the results of application of the POLS to
- The minimal values of the functions RelErr(w) for our model example are shown
- If in the same time we integrate the optimal trend (6) then one
- 1.3. The description of the Eigen-Coordinates (ECs) method
- If we compare the structure of Eqn.(15) with (5) one can see the
- Here and below the symbol (A B) defines the scalar product in the
- The unknown constants A1,2 are found from (22) by the LLSM, because other
- It is easy to notice from (31) that new set of the functions
- Questions for self-testing:
- Questions, Comments or Remarks?
a. The elimination of outliers;
b. The smoothing of the initial data in order to decrease the outliers and the relative error as a whole (the solution of this problem by means of the procedure of optimal linear smoothing is considered below in Section 1.2).
Although the linear least squares method (LLSM) and its possible generalizations are covered in many books (see, for example, [1], [2], [3], [4], [5]), some interesting problems remain unsolved even in the one-dimensional regression task. In this lecture, as if writing a new chapter of modern mathematical statistics, we consider the eigen-coordinates (ECs) method, which reduces the fitting of a wide class of nonlinear functions (those in which the vector of fitting parameters enters nonlinearly) to the well-known LLSM.
1.2. Procedure of the optimal linear smoothing (POLS) of some noisy data
Any real data contain errors, which can be expressed as a random error function of the argument x. If this random function takes large values, it is necessary to decrease them. An approach that makes data smoothing possible improves the quality of the regression procedure and (more importantly) the selection of the proper hypothesis. In this section we suggest the procedure of optimal linear smoothing (POLS), which can decrease the values of the initial random function. To understand the basic elements of the smoothing procedure, it is useful to consider some model (mimic) data. Let us choose two functions that are defined in the interval [xmn, xmx] on a given number of discrete points, xj = xmn + (j/N)(xmx - xmn), j = 0, 1, …, N:
(8)
where the parameters in (8) take, for the two functions respectively, the following values: A0 = -1, -2; A1 = 4, 6; A2 = 6, 4; 1=0.9, 1.0; 2=1.0, 0.9; = 3.0, 2.0. We then randomize these functions by means of the relationship
(9)
Here the amplitude parameter in (9) determines the size of the deviation relative to the maximal value max(y). We set this parameter to 50% in order to create relatively large random deviations from the chosen model function (8). The expressions Pr1,2(xj) in (9) denote two different random functions that generate random numbers in the interval [0, 1]. They are chosen as linear combinations of uniform, exponential, and normal random functions (in equal proportions), so that the final random function does not depend on the conventionally assumed normal distribution. The plots of these functions expressed by expressions (8) are given in Figs. 2(a).
Then, as the first and basic element of the new treatment approach, we apply the procedure of optimal linear smoothing (POLS). The essence of the POLS was described in papers [6, 7], but it is instructive to repeat some basic elements here. To smooth the initial data (6), we apply a linear procedure based on the Gaussian kernel. The expression used is written as
(10)
In MathCad-15 the corresponding function is ksmooth(x, y, w). Gsm denotes the curve smoothed with the Gaussian (G) kernel.
Analysis of the limiting cases: what happens when w tends to infinity and when w tends to zero?
Here the function K(t) defines the Gaussian kernel, and the value w defines the fixed width of the smoothing window. The set ninj (j = 1,2,…, N) defines the initial 'noisy' random sequence. Although many smoothing functions are built into mathematical programs, the chosen function has two important features: (a) the transformed smoothed function (9) is obtained as the result of a linear transformation and, therefore, does not introduce an uncontrollable error; (b) the width of the smoothing window w is an adjustable (fitting) parameter and can take any value.
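The smoothing step described above can be illustrated with a minimal Python sketch. It assumes the common kernel-weighted average form with K(t) = exp(-t^2); the exact normalization of the kernel in expression (10) and in ksmooth may differ, and the name gaussian_smooth and the test signal are hypothetical.

```python
import math

def gaussian_smooth(x, y, w):
    """Kernel-weighted average of y at every point of x (window width w)."""
    smoothed = []
    for xi in x:
        # Gaussian weights centred on xi; K(t) = exp(-t^2) is an assumed form
        weights = [math.exp(-((xi - xj) / w) ** 2) for xj in x]
        total = sum(weights)
        smoothed.append(sum(wt * yj for wt, yj in zip(weights, y)) / total)
    return smoothed

# Model data: a slow sine wave plus a fast alternating disturbance
x = [j / 50 for j in range(51)]
noisy = [math.sin(2 * math.pi * xj) + 0.3 * (-1) ** j for j, xj in enumerate(x)]
trend = gaussian_smooth(x, noisy, 0.1)
```

Each smoothed value is a linear combination of the input values, which is the property (a) mentioned above. In this sketch a very large w returns the mean of y at every point, and a very small w returns the input sequence unchanged.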
Figure 2b. Here we demonstrate how a trend is created by means of the integration procedure. The initial random sequences shown on the right-hand side do not have a trend; a possible trend coincides with the OX axis. After integration by means of expression (12), we obtain curves with clearly expressed trends, shown in the central figure. The solid lines show the smoothed curves obtained by means of the POLS. The value of the optimal window is approximately wopt ≈ 0.5.
In a certain sense, this function can be considered as a pseudo-fitting function, which is not associated directly with a specific model describing the desired process. The value of the optimal window wopt is chosen from the conditions
(11)
This procedure automatically decreases the initial fluctuations by iteration and helps to find the optimal value of the parameter w, which minimizes the relative error function RelErr(w) in the vicinity of its first local minimum. In many model calculations with random sequences having clearly expressed or hidden trends, this optimal value wopt does exist, and it helps to find the optimal smoothed curve (trend) describing the large-scale fluctuations.
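The search for the first local minimum of RelErr(w) can be sketched as follows. The definition of RelErr used here (standard deviation of the residuals divided by the mean absolute value of the smoothed curve, in percent) is an assumption, since conditions (11) are not reproduced in this text; the function names are hypothetical.

```python
import math

def rel_err(y, smoothed):
    """Assumed RelErr: spread of residuals relative to the mean |smoothed|, in percent."""
    resid = [a - b for a, b in zip(y, smoothed)]
    mean_r = sum(resid) / len(resid)
    stdev = math.sqrt(sum((r - mean_r) ** 2 for r in resid) / len(resid))
    scale = sum(abs(s) for s in smoothed) / len(smoothed)
    return 100.0 * stdev / scale

def first_local_minimum(ws, errs):
    """Return (w, err) at the first interior point lower than both neighbours, else None."""
    for i in range(1, len(errs) - 1):
        if errs[i] < errs[i - 1] and errs[i] < errs[i + 1]:
            return ws[i], errs[i]
    return None  # no local minimum on the scanned grid
```

In use, one would evaluate rel_err for the smoothed curve at each window width on a grid of w values and pass the two lists to first_local_minimum. A return value of None corresponds to the case where the first local minimum is absent.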
Here we also note the following fact, observed in the analysis of model data. If the initial random sequence rndj does not have a trend, then the first local minimum is absent. In such cases it is preferable to create a trend artificially by numerical integration of the initial sequence by the trapezoid method.
After integration, which is realized by means of the recurrence relationship
(12)
an initial random sequence rndj (oscillating initially near the OX axis, j = 1, 2, …, N) acquires a trend and (which is also interesting for applications) the integrated sequence Jninj has smaller deviations than the initial sequence. Figure 2b demonstrates this phenomenon.
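The integration step can be sketched with the standard cumulative trapezoid recurrence. This is a minimal sketch; the starting value J_0 = 0 and the function name are assumptions, and the exact form of expression (12) may differ.

```python
def cumulative_trapezoid(x, y):
    """Running integral J_j = J_{j-1} + 0.5*(x_j - x_{j-1})*(y_j + y_{j-1}), with J_0 = 0."""
    integral = [0.0]
    for j in range(1, len(x)):
        # Area of the trapezoid between two neighbouring points
        step = 0.5 * (x[j] - x[j - 1]) * (y[j] + y[j - 1])
        integral.append(integral[-1] + step)
    return integral
```

Each new value adds the average of two neighbouring points times the step, so point-to-point fluctuations are averaged at every step of the recurrence.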
Figure 3a. Here we show the results of applying the POLS to the curves that are defined by expressions (11). The optimal values of the smoothing windows (w1 = 0.4723, w2 = 0.6131), which minimize the relative errors and are located near the first local minimum, are shown by arrows.
Figure 3b. Here we demonstrate the result of applying the POLS to the curves "one" (upper figure) and "two" presented in Fig. 2a.
The minimal values of the functions RelErr(w) for our model example are shown in Fig. 3(a). The optimal trend is found as
(13)
The calculated optimal trends found by means of the POLS for both functions (9) are shown in Figs. 3(b). But how can one prove that these optimal trends are the most adequate in comparison with other smoothed curves? One can suggest a criterion showing that the trend found by means of expressions (10)-(12) is optimal, or at least close to optimal. If we integrate the initial random sequences nin1,2(x) with respect to the argument x, we obtain the less noisy functions Jnin1,2(x). There is ample evidence that integrating a random sequence substantially reduces the range of its high-frequency fluctuations in comparison with the initial sequence.
Figure 3c. Here we show the additional verification of the POLS based on the integrated curves. Integration of the strongly fluctuating curves and of the smoothed curves shows that the initially smoothed curves can be considered as pseudo-fitting functions.
If at the same time we integrate the optimal trend (6), then one expects that the calculated integral trends will serve as pseudo-fitting functions for the functions Jnin1,2(x) and will be close to the integrated ideal functions (8) that determine the perfect trend. This observation is confirmed on many real and model data sets. Figure 3(c) illustrates it: as one can see from this figure, the integrated optimal trends can serve as pseudo-fitting functions for Jnin1,2(x) and are very close to the integrated ideal functions determined by expression (8).
Possible generalizations.
Further tests of the general expression (10) show that smoothing with the help of the Gaussian kernel is close to optimal. If we take the expression
(14)
and let its additional parameter vary in the interval [0, 6], then we do not obtain any essential advantage in comparison with expression (10). So, in many cases the simplified expression (10) remains optimal. It is interesting to note, however, that the "game" with other kernels remains open. We do not know the expression for an optimal kernel that would be suitable in the general case. It is necessary to formulate some general criterion that would help to find an optimal kernel. Such a formulation is not known and remains an open problem.
1.3. The description of the Eigen-Coordinates (ECs) method
Now everything is ready to describe the basic idea of the ECs method, which reduces the fitting of a chosen hypothesis that initially contains a finite set of nonlinear fitting parameters to the basic linear relationship (BLR) of the type (5). Once this reduction is made, the problem becomes an application of the linear least squares method (LLSM). It is natural to give the basic principles first and then to apply the method to the problems described by the set of functions (8).
1. In what cases can one obtain the basic linear relationship for a function that initially contains a set of nonlinear fitting parameters?
To answer this question positively, one needs to find the differential equation that is satisfied by the chosen hypothesis. If the unknown parameters {Ck (k=1,2,…, s)} enter this differential equation as a linear combination (they can be related to the initial set of parameters {Al (l=1,2,…, q)} in a nonlinear way), then the answer is positive; otherwise it can be negative. To make this statement concrete, it is useful to consider an example. Let us consider the function R(x; A) = B x exp(-a1 x - a2 x^2/2). Taking the natural logarithm of both sides, we obtain
(15)
Comparing the structure of Eqn. (15) with (5), one can see the desired BLR written for the function Y(x) = ln[R(x)]. However, this presentation is not acceptable because it leads to strong distortions when the function R(x) takes small values close to zero. In that case ln[R(x)] has large negative values and the initial error is amplified. To avoid these large deviations, it is necessary to differentiate expression (15) and present the BLR in the equivalent form
(16)
Expression (16) is preferable to expression (15) for further analysis. Nevertheless, it remains unacceptable, because any numerical differentiation of the function R(x; A), which is known only at the measured points, creates new uncontrollable errors.
To overcome this drawback, it is necessary to transform the differential equation (16) into a Volterra integral equation, in which the variable x appears as the upper limit of integration. This simple procedure, realized numerically with the help of the trapezoid method, decreases the error or at least keeps it within its initial limits (this should always be kept in mind when transforming any hypothesis containing unknown errors!). Therefore, after integration of expression (16), we finally obtain the desired BLR of the type
(17)
(18)
You can verify this as a simple exercise.
Here the value x0 corresponds to the initial point of the discrete data considered. As before, we should eliminate all possible constants in expression (17) by subjecting it to the condition . From expressions (18) one can find the unknown fitting parameters , a1,2. After they are calculated with the help of the LLSM, the last unknown constant B is found from the simple relationship
(19)
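The whole chain from the hypothesis to the fitted parameters can be sketched numerically. This is a minimal sketch, assuming the hypothesis exactly in the form R(x) = B x exp(-a1 x - a2 x^2/2). Differentiating ln R and multiplying by x R gives x R' = R - a1 x R - a2 x^2 R. Integrating from x0 to x (by parts on the left-hand side) gives x R(x) - 2 J0(x) = const - a1 X1(x) - a2 X2(x), where J0, X1 and X2 are the running integrals of R, u R and u^2 R. This grouping of terms, the function names, the zero-mean centring used to remove the constant, and the formula for B are assumptions of the sketch and may differ from expressions (17)-(19).

```python
import math

def cumtrapz(x, y):
    """Running trapezoid integral of y over x, starting from zero at x[0]."""
    out = [0.0]
    for j in range(1, len(x)):
        out.append(out[-1] + 0.5 * (x[j] - x[j - 1]) * (y[j] + y[j - 1]))
    return out

def centered(v):
    """Subtract the mean value, which removes any additive constant."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def fit_ecs(x, r):
    """Estimate (B, a1, a2) for R = B*x*exp(-a1*x - a2*x^2/2) by linear least squares."""
    j0 = cumtrapz(x, r)
    x1 = centered(cumtrapz(x, [xi * ri for xi, ri in zip(x, r)]))
    x2 = centered(cumtrapz(x, [xi * xi * ri for xi, ri in zip(x, r)]))
    y = centered([xi * ri - 2.0 * ji for xi, ri, ji in zip(x, r, j0)])
    # Normal equations for y = c1*x1 + c2*x2, where c1 = -a1 and c2 = -a2
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    a1 = -(t1 * s22 - t2 * s12) / det
    a2 = -(s11 * t2 - s12 * t1) / det
    # Assumed relationship for B: ln B is the mean of ln(R/x) + a1*x + a2*x^2/2
    ln_b = sum(math.log(ri / xi) + a1 * xi + a2 * xi * xi / 2
               for xi, ri in zip(x, r)) / len(x)
    return math.exp(ln_b), a1, a2

# Noise-free model data with B = 2, a1 = 1, a2 = 0.5 (illustrative values)
x = [0.1 + 4.9 * j / 2000 for j in range(2001)]
r = [2.0 * xi * math.exp(-1.0 * xi - 0.5 * xi * xi / 2) for xi in x]
b_est, a1_est, a2_est = fit_ecs(x, r)
```

The sketch uses only integrals of the measured function, never its derivative or its logarithm in the regression itself, which is the point of passing from the differential form to the integral form.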
