11.5.1 Simple Loss Function
Substituting the simple loss function given by (11.23) into (11.31) and using the filtering property of the delta function

$$\int_{-\infty}^{\infty} \phi(z)\,\delta(z - z_0)\,dz = \phi(z_0), \quad (11.33)$$

we obtain

$$\mathcal{R}_{\mathrm{post}}(\gamma) = 1 - p_{\mathrm{post}}(\gamma). \quad (11.34)$$
The a posteriori risk $\mathcal{R}_{\mathrm{post}}(\gamma)$ and, consequently, the average risk $\mathcal{R}(\gamma)$ are minimal if the a posteriori pdf $p_{\mathrm{post}}(\gamma)$ is maximal at the given estimate; that is, the estimate corresponds to the maximum of the a posteriori pdf if there is a single maximum, or to the maximum maximorum if there are several local maxima. This means that the probable value $\gamma_m$ for which the condition

$$p_{\mathrm{post}}(\gamma_m) \ge p_{\mathrm{post}}(l) \quad \text{for all } l \quad (11.35)$$

is satisfied can be considered the Bayes estimate. If the a posteriori pdf $p_{\mathrm{post}}(l)$ is differentiable with respect to the parameter $l$, then the estimate $\gamma_m$ can be defined from the following equation:
$$\left.\frac{dp_{\mathrm{post}}(l)}{dl}\right|_{l=\gamma_m} = 0 \quad \text{at} \quad \left.\frac{d^2 p_{\mathrm{post}}(l)}{dl^2}\right|_{l=\gamma_m} < 0, \quad (11.36)$$
and we take into consideration only the solutions of this equation that satisfy (11.35). Substituting the simple loss function into (11.29), we obtain the Bayes risk

$$\mathcal{R}_m = 1 - \max_l p_{\mathrm{post}}(l) = 1 - p_{\mathrm{post}}(\gamma_m). \quad (11.37)$$
The second term on the right-hand side of (11.37) is, to within a constant factor, the average probability of correct decision-making. Consequently, in the case of the simple loss function, the probability of correct decision-making is maximal at the Bayes estimate. Equivalently, the probability of incorrect decision-making is minimal when the Bayes estimate of the random process parameter is applied. At the same time, all errors carry the same weight, which means that all errors are equally undesirable regardless of their magnitude. In the case of the simple loss function, the Bayes estimate is known in the literature as the estimate by the maximum maximorum of the a posteriori pdf.
If the a priori pdf $p_{\mathrm{pr}}(l)$ is constant within the interval of possible values of the estimated random process parameter, then, according to (11.9), the a posteriori pdf coincides, to within a constant factor, with the likelihood ratio $\Lambda(l)$. The estimate $\gamma_m$ by the maximum of the a posteriori pdf then becomes the maximum likelihood estimate $l_m$, defined as the position of the maximum maximorum of the likelihood ratio. As a rule, maximum likelihood estimates are applied in the following practical cases:
• The a priori pdf of the estimated random process parameter is unknown.
• Obtaining the a posteriori pdf is more complicated than obtaining the likelihood ratio or likelihood function.
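As a numerical illustration of the estimates just described, the maximum of the a posteriori pdf (and, under a flat prior, the maximum likelihood estimate) can be located on a grid of candidate parameter values. The following is a minimal sketch, assuming a Gaussian likelihood for a sample X with unknown mean l and a Gaussian a priori pdf; all names, parameter values, and the data model are illustrative, not from the source.

```python
import numpy as np

# Grid of candidate parameter values l (illustrative range).
l_grid = np.linspace(-5.0, 5.0, 2001)

rng = np.random.default_rng(0)
X = rng.normal(loc=1.2, scale=2.0, size=25)   # observed sample, true l = 1.2

# Log-likelihood ln p(X | l) for a Gaussian sample with known sigma = 2.
sigma = 2.0
log_lik = -0.5 * np.sum((X[:, None] - l_grid[None, :])**2, axis=0) / sigma**2

# Gaussian a priori pdf p_pr(l); with a flat prior the estimate by the
# maximum of the a posteriori pdf reduces to the maximum likelihood estimate.
log_prior = -0.5 * l_grid**2 / 3.0**2

log_post = log_lik + log_prior              # a posteriori pdf up to a constant
gamma_m = l_grid[np.argmax(log_post)]       # estimate by maximum of p_post
l_m = l_grid[np.argmax(log_lik)]            # maximum likelihood estimate

print(f"posterior-maximum estimate gamma_m = {gamma_m:.3f}, ML estimate l_m = {l_m:.3f}")
```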
11.5.3 Quadratic Loss Function

According to (11.31), in the case of the quadratic loss function, we can write

$$\mathcal{R}_{\mathrm{post}}(\gamma) = \int_{-\infty}^{\infty} (\gamma - l)^2\, p_{\mathrm{post}}(l)\,dl. \quad (11.40)$$

From the extremum condition on the function $\mathcal{R}_{\mathrm{post}}(\gamma)$, we obtain the following estimate:

$$\gamma_m = \int_{-\infty}^{\infty} l\, p_{\mathrm{post}}(l)\,dl = l_{\mathrm{post}}. \quad (11.41)$$
Thus, under the quadratic loss function, the mean of the a posteriori pdf, $l_{\mathrm{post}}$, that is, its first moment, is taken as the estimate of the random process parameter $l$.
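Numerically, the estimate (11.41) is simply the first moment of the a posteriori pdf. A minimal sketch, reusing the grid-based Gaussian posterior from the earlier example (the likelihood, prior, and all parameter values are again illustrative assumptions):

```python
import numpy as np

l_grid = np.linspace(-5.0, 5.0, 2001)

rng = np.random.default_rng(0)
X = rng.normal(loc=1.2, scale=2.0, size=25)

sigma = 2.0
log_post = (-0.5 * np.sum((X[:, None] - l_grid[None, :])**2, axis=0) / sigma**2
            - 0.5 * l_grid**2 / 3.0**2)

# Normalize the a posteriori pdf on the grid.
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, l_grid)

# Estimate (11.41): the mean of the a posteriori pdf.
l_post = np.trapz(l_grid * post, l_grid)

# A posteriori risk (11.40) at gamma = l_post: the a posteriori variance.
risk = np.trapz((l_post - l_grid)**2 * post, l_grid)
print(f"l_post = {l_post:.3f}, a posteriori variance = {risk:.3f}")
```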
The value of $\mathcal{R}_{\mathrm{post}}(\gamma)$ characterizes the minimal conditional dispersion of the random process parameter estimate. Since $\mathcal{R}_{\mathrm{post}}(\gamma)$ depends on the specific form of the received realization x(t), the conditional dispersion is a random variable. In the case of the quadratic loss function, the Bayes risk coincides with the unconditional dispersion of the estimate given by (11.15):
$$\mathcal{R}_m = \mathrm{Var}(\gamma_m) = \mathrm{Var}(l_{\mathrm{post}}) = \int\limits_{-\infty}^{\infty}\int\limits_X (l - l_{\mathrm{post}})^2\, p_{\mathrm{post}}(l)\, p(X)\,dl\,dX. \quad (11.42)$$
The estimate (11.41) is obtained from the condition of minimum average risk. For this reason, we can state that the Bayes estimate for the quadratic loss function minimizes the unconditional dispersion of the estimate. In other words, in the case of the quadratic loss function, the Bayes estimate ensures the minimal variance with respect to the true value of the estimated random process parameter among all possible estimates. We should note one more property of the Bayes estimate under the quadratic loss function. Substituting the a posteriori pdf (11.2) into (11.41) and averaging over the sample of observed data X, we can write
$$\overline{l_{\mathrm{post}}} = \int\limits_{-\infty}^{\infty}\int\limits_X l\, p_{\mathrm{pr}}(l)\, p(X\,|\,l)\,dl\,dX. \quad (11.43)$$
Changing the order of integration and taking into account the normalization condition, we obtain
$$\overline{l_{\mathrm{post}}} = \int\limits_{-\infty}^{\infty} l\, p_{\mathrm{pr}}(l)\,dl = l_{\mathrm{pr}}, \quad (11.44)$$
that is, in the case of the quadratic loss function, the Bayes estimate is always unconditionally unbiased.
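The identity (11.44) can be checked by Monte Carlo: averaging the posterior mean over many realizations of X reproduces the a priori mean. A sketch under conjugate Gaussian assumptions (the closed-form posterior mean below holds for this illustrative model only; all values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
m0, s0 = 0.5, 3.0        # a priori mean and std of l (illustrative)
sigma, n = 2.0, 25       # observation noise std and sample size
trials = 20000

post_means = np.empty(trials)
for k in range(trials):
    l_true = rng.normal(m0, s0)              # draw l from the a priori pdf
    X = rng.normal(l_true, sigma, size=n)    # observe X given l
    # Conjugate-Gaussian posterior mean (closed form for this model).
    post_means[k] = ((m0 / s0**2 + X.sum() / sigma**2)
                     / (1.0 / s0**2 + n / sigma**2))

# (11.44): the average of l_post over X equals the a priori mean l_pr.
print(f"average posterior mean = {post_means.mean():.3f}, prior mean = {m0}")
```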
The quadratic loss function is proportional to the squared deviation of the estimate from the true value of the estimated random process parameter; that is, a weight is assigned to each deviation, and this weight grows as the square of the deviation value. Such losses occur very often in various applications of mathematical statistics and estimation theory. However, although the
and characteristic property of the problem of estimating the random process parameter, which is very important, since it gives us the possibility of defining the problem of valid selection of the estimation criterion more fairly. Owing to the finite observation time and the presence of noise and interference accompanying the observation, specific errors arise in the course of defining the estimate. These errors are determined both by the quality performance and by the conditions under which the estimation process is carried out. In general, the requirement to minimize the magnitude of the estimation error does not, by itself, single out one decision rule. However, if the estimation criterion is given, the quality performance is measured with respect to this criterion, and the problem of obtaining the optimal estimate reduces to defining the decision procedure that minimizes or maximizes this quality performance. In doing so, the parameter estimate must be close, in a certain sense, to the true value of the estimated parameter, and the optimal estimate must minimize this measure of closeness in accordance with the chosen criterion.
In the theory of statistical parameter estimation, two types of estimates are used: interval estimates, based on the definition of a confidence interval, and point estimates, that is, estimates defined at a point. Employing interval estimation, we must indicate the interval within whose limits the true value of the unknown random process parameter lies with a probability not less than a predetermined value. This predetermined probability is called the confidence factor, and the indicated interval of possible values of the estimated random process parameter is called the confidence interval. The upper and lower bounds of the confidence interval, called the confidence limits, and the confidence interval itself are functions of the received realization x(t), considered either in discretized form (digital signal processing) or as a continuous function (analog signal processing). In the case of point estimation, we assign to the unknown parameter one value from the interval of possible parameter values; that is, some value is obtained from the analysis of the received realization x(t), and we use this value in place of the true value of the estimated parameter.
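A minimal numeric illustration of an interval estimate: a normal-approximation confidence interval for the mean of discretized samples of x(t). The 95% level, the Gaussian approximation, and the data model are illustrative assumptions, not prescriptions from the source.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 2.0, size=200)   # discretized realization x(t), illustrative

mean = x.mean()
sem = x.std(ddof=1) / np.sqrt(x.size)     # standard error of the mean

z = 1.96                                  # ~95% confidence factor (Gaussian)
lower, upper = mean - z * sem, mean + z * sem   # confidence limits
print(f"95% confidence interval: [{lower:.3f}, {upper:.3f}]")
```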
In addition to estimating the random process parameter from an analysis of the whole received realization x(t), a sequential estimation method is used; essentially, this is the application of sequential statistical analysis to the estimation of the random process parameter. The basic idea of sequential estimation is to determine the analysis time of the received realization x(t) within which we are able to obtain the parameter estimate with a preset reliability. In the case of the point estimate, the root-mean-square deviation of the estimate, or another convenient function characterizing the deviation of the estimate from the true value of the estimated random process parameter, can be considered the measure of reliability. From the viewpoint of interval sequential estimation, the estimate reliability can be defined by the length of the confidence interval at a given confidence coefficient.
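A sketch of the interval-sequential idea: keep extending the observation (here, drawing more samples) until the confidence interval is shorter than a preset length. The stopping threshold, confidence level, and data model are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
target_length = 0.5      # preset confidence-interval length (reliability criterion)
z = 1.96                 # ~95% confidence factor

samples = list(rng.normal(1.0, 2.0, size=10))   # initial observations
while True:
    x = np.asarray(samples)
    sem = x.std(ddof=1) / np.sqrt(x.size)
    if 2 * z * sem < target_length:             # interval short enough -> stop
        break
    samples.append(rng.normal(1.0, 2.0))        # otherwise observe longer

print(f"stopped after {len(samples)} samples, estimate = {np.mean(samples):.3f}")
```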
To make a point estimation means that some number from the interval of possible values of the estimated random process parameter must correspond to each possible received realization; this number is called the point estimate. Owing to its random character, the point estimate of the random process parameter is characterized by a conditional pdf. This is the general and complete characteristic of the point estimate: the shape of this pdf defines the quality of the point estimate and, consequently, all of its properties.
There are several approaches to defining the desired properties of point estimates: (a) it is natural to seek a point estimate whose conditional pdf is grouped as closely as possible around the estimated value; (b) it is desirable that, as the observation interval increases, the estimate stochastically approach or coincide with the true value of the estimated random process parameter (in this case, we say that the estimate is consistent); (c) the estimate must be unbiased or, at least, asymptotically unbiased; (d) the estimate must be the best by some criterion, for example, it must have minimal dispersion or variance at zero or constant bias; and (e) the estimate must be a statistically sufficient measure.
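Properties (b) and (c) can be observed empirically: for the sample mean, the spread of the estimate around the true value shrinks as the observation interval grows, while the bias stays near zero. A Monte Carlo sketch (the Gaussian model and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
l_true = 1.0

for n in (10, 100, 1000, 10000):
    # 500 independent estimates, each from a sample of size n.
    est = rng.normal(l_true, 2.0, size=(500, n)).mean(axis=1)
    print(f"n={n:6d}: bias={est.mean() - l_true:+.4f}, std={est.std():.4f}")
```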
Owing to the random character of the observed realization, we can expect errors under any decision-making rule; that is, the decision may not coincide with the true value of the parameter.
Applying various decision-making rules, different errors appear with different probabilities. Since a nonzero probability of error always exists, we need to characterize the quality of different estimates in one way or another. For this purpose, the loss function is introduced in decision theory. This function defines a definite loss for each combination of decision and parameter. The physical sense of the loss function is the following: a definite nonnegative weight is assigned to each incorrect decision, and, depending on the purposes for which the estimate is obtained, the most undesirable decisions are assigned the greatest weights. The choice of a definite loss function depends on the specific problem of estimating the random process parameter; there is no general rule for selecting the loss function, and each choice rests on a subjective principle. Any arbitrariness in selecting losses leads to definite difficulties in applying the theory of statistical decisions. The following types of loss functions are widely used in practice: the simple loss function, the linear modulo loss function, the quadratic loss function, and the rectangle loss function.
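For concreteness, the four loss functions can be written as functions of the estimation error ε = γ − l. In the sketch below the window parameters `delta` (playing the role of the Dirac-delta width in the simple loss) and `width` (the rectangle tolerance) are illustrative assumptions:

```python
import numpy as np

def simple_loss(eps, delta=1e-3):
    # Simple loss: unit loss unless the error is (numerically) zero;
    # delta approximates the vanishing width of the Dirac delta.
    return np.where(np.abs(eps) < delta, 0.0, 1.0)

def linear_modulo_loss(eps):
    # Linear modulo loss: proportional to |error|.
    return np.abs(eps)

def quadratic_loss(eps):
    # Quadratic loss: proportional to the squared error.
    return eps**2

def rectangle_loss(eps, width=0.5):
    # Rectangle loss: no loss inside a tolerance window, unit loss outside.
    return np.where(np.abs(eps) <= width / 2, 0.0, 1.0)

eps = np.linspace(-2.0, 2.0, 5)
print(quadratic_loss(eps), linear_modulo_loss(eps))
```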
Estimation of Mathematical Expectation
$$\begin{aligned}
\upsilon(t) ={}& \frac{1}{4\sigma^2\alpha\omega_0^2}\Big[s''''(t) + 2\big(\omega_0^2 - 2\alpha^2\big)s''(t) + \omega_0^4\,s(t)\Big] \\
&+ \frac{1}{2\sigma^2\alpha\omega_0^2}\Big\{\big[s'''(0) + \big(\omega_0^2 - 4\alpha^2\big)s'(0) + 2\alpha\omega_0^2\,s(0)\big]\,\delta(t) \\
&\quad - \big[s'''(T) + \big(\omega_0^2 - 4\alpha^2\big)s'(T) - 2\alpha\omega_0^2\,s(T)\big]\,\delta(t - T) \\
&\quad + \big[s''(0) - 2\alpha\,s'(0) + \omega_0^2\,s(0)\big]\,\delta'(t) \\
&\quad - \big[s''(T) + 2\alpha\,s'(T) + \omega_0^2\,s(T)\big]\,\delta'(t - T)\Big\}. \qquad (12.16)
\end{aligned}$$
The notations s′(t), s″(t), s‴(t), and s⁗(t) denote the derivatives of the first, second, third, and fourth order with respect to t, respectively. If the function s(t) and its derivatives vanish at t = 0 and t = T, then (12.15) and (12.16) take a simple form. As applied to a stochastic process with s(t) = 1 = const, (12.15) and (12.16) become:
$$\upsilon(t) = \frac{\alpha}{2\sigma^2} + \frac{1}{\sigma^2}\big[\delta(t) + \delta(T - t)\big], \quad (12.17)$$
$$\upsilon(t) = \frac{\omega_0^2}{4\alpha\sigma^2} + \frac{1}{\sigma^2}\big[\delta(t) + \delta(t - T)\big] + \frac{1}{2\alpha\sigma^2}\big[\delta'(t) - \delta'(t - T)\big]. \quad (12.18)$$
The spectral densities

$$S(\omega) = \frac{2\alpha\sigma^2}{\alpha^2 + \omega^2} \quad (12.19)$$
and

$$S(\omega) = \frac{4\alpha\sigma^2\big(\omega_1^2 + \alpha^2\big)}{\omega^4 - 2\omega^2\big(\omega_1^2 - \alpha^2\big) + \big(\omega_1^2 + \alpha^2\big)^2} \quad (12.20)$$
correspond to the correlation functions given by (12.13) and (12.14), respectively. It should be noted that there is no general procedure for solving (12.8). However, if the correlation function of the stochastic process depends on the absolute value of the difference of its arguments, |t₂ − t₁|, and the observation time T is much greater than the correlation interval defined as
$$\tau_{\mathrm{cor}} = \frac{1}{\sigma^2}\int_0^{\infty} |R(\tau)|\,d\tau = \int_0^{\infty} |\rho(\tau)|\,d\tau, \quad (12.21)$$

where

$$\rho(\tau) = \frac{R(\tau)}{\sigma^2} \quad (12.22)$$
is the normalized correlation function, and if the function s(t) and its derivatives vanish at t = 0 and t = T, it is possible to obtain an approximate solution of the integral equation (12.8) using the Fourier transform. Applying the Fourier transform to the left- and right-hand sides of the equation
$$\int_{-\infty}^{\infty} R(t - \tau)\,\upsilon(\tau)\,d\tau = s(t), \quad (12.23)$$
it is not difficult to use the inverse Fourier transform in order to obtain

$$\upsilon(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{S_s(\omega)}{S(\omega)}\,\exp\{j\omega t\}\,d\omega, \quad (12.24)$$
where S(ω) is the Fourier transform of the correlation function R(τ), and Sₛ(ω) is the Fourier transform of the mathematical expectation s(t); they are defined as
$$S(\omega) = \int_{-\infty}^{\infty} R(\tau)\,\exp\{-j\omega\tau\}\,d\tau, \quad (12.25)$$

$$S_s(\omega) = \int_{-\infty}^{\infty} s(t)\,\exp\{-j\omega t\}\,dt. \quad (12.26)$$
The inverse Fourier transform gives the following formulas:

$$R(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\omega)\,\exp\{j\omega\tau\}\,d\omega, \quad (12.27)$$

$$s(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_s(\omega)\,\exp\{j\omega t\}\,d\omega. \quad (12.28)$$
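As a numeric cross-check of the transform pair (12.25), the spectral density of the exponential correlation function R(τ) = σ²e^(−α|τ|) computed by direct numerical integration should match the closed form (12.19). A minimal sketch (the truncation range and grid size are illustrative assumptions):

```python
import numpy as np

alpha, sigma2 = 1.5, 2.0
tau = np.linspace(-40.0, 40.0, 200001)      # truncated integration grid
R = sigma2 * np.exp(-alpha * np.abs(tau))   # exponential correlation function

for omega in (0.0, 1.0, 3.0):
    # Numerical (12.25): S(omega) = integral of R(tau) exp(-j omega tau) dtau.
    S_num = np.trapz(R * np.exp(-1j * omega * tau), tau).real
    S_exact = 2 * alpha * sigma2 / (alpha**2 + omega**2)   # closed form (12.19)
    print(f"omega={omega}: numeric {S_num:.4f} vs closed form {S_exact:.4f}")
```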
If the function s(t) and its derivatives do not vanish at t = 0 and t = T, and the function S(ω) is a ratio of two polynomials in ω² of orders p and d, respectively, with d > p, then delta functions δ(t) and their derivatives taken at t = 0 and t = T must be added. Thus, the solution of Equation 12.8 must be sought in the following form:
$$\upsilon(t) = \tilde{\upsilon}(t) + \sum_{\mu=0}^{d-1}\Big[b_\mu\,\delta^{(\mu)}(t) + c_\mu\,\delta^{(\mu)}(t - T)\Big]. \quad (12.29)$$
Here, $\tilde{\upsilon}(t)$ is the regular part of the solution given by (12.24); the coefficients $b_\mu$ and $c_\mu$ are defined from the equations obtained by substituting (12.29) into (12.8); and $\delta^{(\mu)}(t)$ is the derivative of the delta function of order μ with respect to time.
In the case of a stationary stochastic process, we have s(t) = 1. In this case, the spectral density $S_s(\omega)$ takes the following form:
$$S_s(\omega) = 2\pi\delta(\omega). \quad (12.30)$$
From (12.24) and (12.29), we have

$$\upsilon(t) = S^{-1}(\omega = 0) + \sum_{\mu=0}^{d-1}\Big[b_\mu\,\delta^{(\mu)}(t) + c_\mu\,\delta^{(\mu)}(t - T)\Big]. \quad (12.31)$$
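The structure of (12.17), a constant level plus boundary delta functions, can be reproduced numerically by discretizing the integral equation ∫₀ᵀ R(t − τ)υ(τ)dτ = 1 and solving the resulting linear system: the interior of the discrete solution approaches α/(2σ²), while the endpoint samples grow like 1/Δt, mimicking the delta functions. A sketch under the exponential-correlation assumption, with illustrative grid size and parameters:

```python
import numpy as np

alpha, sigma2, T, n = 1.0, 1.0, 10.0, 400
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

# Discretize the integral equation: sum_j R(t_i - t_j) w_j v_j = 1.
R = sigma2 * np.exp(-alpha * np.abs(t[:, None] - t[None, :]))
w = np.full(n, dt); w[0] = w[-1] = dt / 2          # trapezoidal weights
v = np.linalg.solve(R * w[None, :], np.ones(n))

# Interior level should approach alpha / (2 sigma^2) from (12.17);
# the endpoint samples blow up ~ 1/dt, mimicking the delta functions.
print(f"interior value {v[n // 2]:.4f} vs alpha/(2 sigma^2) = {alpha/(2*sigma2):.4f}")
print(f"endpoint samples: {v[0]:.2f}, {v[-1]:.2f} (delta-function spikes)")
```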
