
Figure 4.10 Illustration of a symmetric and an asymmetric pdf and their respective mode, mean and median, and the relations to the MAP, MAVE and MMSE estimates.

4.2.6 The Influence of the Prior on Estimation Bias and Variance

The use of a prior pdf introduces a bias in the estimate towards the range of parameter values with a relatively high prior pdf, and reduces the variance of the estimate. To illustrate the effects of the prior pdf on the bias and the variance of an estimate, we consider the following examples in which the bias and the variance of the ML and the MAP estimates of the mean of a process are compared.

Example 4.6 Consider the ML estimation of a random scalar parameter θ, observed in a zero-mean additive white Gaussian noise (AWGN) n(m), and expressed as

y(m) = \theta + n(m), \quad m = 0, \ldots, N-1    (4.55)

It is assumed that, for each realisation of the parameter θ, N observation samples are available. Note that, since the noise is assumed to be a zero-mean process, this problem is equivalent to estimation of the mean of the process y(m). The likelihood of an observation vector y=[y(0), y(1), …, y(N–1)] and a parameter value of θ is given by

 

f_{Y|\Theta}(y\,|\,\theta) = \prod_{m=0}^{N-1} f_N\big(y(m)-\theta\big) = \frac{1}{(2\pi\sigma_n^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma_n^2}\sum_{m=0}^{N-1}\big[y(m)-\theta\big]^2\right)    (4.56)

 

 


From Equation (4.56) the log-likelihood function is given by

 

\ln f_{Y|\Theta}(y\,|\,\theta) = -\frac{N}{2}\ln(2\pi\sigma_n^2) - \frac{1}{2\sigma_n^2}\sum_{m=0}^{N-1}\big[y(m)-\theta\big]^2    (4.57)

The ML estimate of θ, obtained by setting the derivative of ln f_{Y|Θ}(y|θ) with respect to θ to zero, is given by

 

\hat{\theta}_{ML} = \frac{1}{N}\sum_{m=0}^{N-1} y(m) = \bar{y}    (4.58)

 

where ȳ denotes the time average of y(m). From Equation (4.56), we note that the ML solution is an unbiased estimate:

 

 

 

E\big[\hat{\theta}_{ML}\big] = E\!\left[\frac{1}{N}\sum_{m=0}^{N-1}\big(\theta + n(m)\big)\right] = \theta    (4.59)

 

 

 

 

 

 

 

 

 

and the variance of the ML estimate is given by

 

 

 

 

 

 

 

 

 

\mathrm{Var}\big[\hat{\theta}_{ML}\big] = E\big[(\hat{\theta}_{ML}-\theta)^2\big] = E\!\left[\left(\frac{1}{N}\sum_{m=0}^{N-1} y(m) - \theta\right)^{\!2}\right] = \frac{\sigma_n^2}{N}    (4.60)

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Note that the variance of the ML estimate decreases with increasing length of observation.
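A rough numerical check (not from the book) of Example 4.6 is sketched below in Python: the sample-mean (ML) estimator of Equation (4.58) is unbiased and its variance falls as σ_n²/N, Equations (4.59)–(4.60). All parameter values are illustrative assumptions.

```python
# Monte Carlo check of the bias and variance of the ML (sample-mean) estimate.
import numpy as np

rng = np.random.default_rng(0)
theta, sigma_n, n_trials = 2.5, 1.0, 20000   # assumed true value, noise std, trials

for N in (10, 100, 1000):
    y = theta + sigma_n * rng.standard_normal((n_trials, N))   # y(m) = theta + n(m)
    theta_ml = y.mean(axis=1)                                   # Eq. (4.58)
    print(f"N={N:5d}  bias={theta_ml.mean() - theta:+.4f}  "
          f"var={theta_ml.var():.5f}  sigma_n^2/N={sigma_n**2 / N:.5f}")
```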

Example 4.7 Estimation of a uniformly-distributed parameter observed in AWGN. Consider the effects of using a uniform parameter prior on the mean and the variance of the estimate in Example 4.6. Assume that the prior for the parameter θ is given by

f_\Theta(\theta) = \begin{cases} \dfrac{1}{\theta_{max}-\theta_{min}}, & \theta_{min} \le \theta \le \theta_{max} \\ 0, & \text{otherwise} \end{cases}    (4.61)

 

 

as illustrated in Figure 4.11. From Bayes’ rule, the posterior pdf is given by

Figure 4.11 Illustration of the effects of a uniform prior (likelihood, prior and posterior pdfs, with θ_ML, θ_MAP and θ_MMSE marked).

f_{\Theta|Y}(\theta\,|\,y) = \frac{1}{f_Y(y)}\, f_{Y|\Theta}(y\,|\,\theta)\, f_\Theta(\theta)
= \begin{cases} \dfrac{1}{\theta_{max}-\theta_{min}}\;\dfrac{1}{f_Y(y)\,(2\pi\sigma_n^2)^{N/2}} \exp\!\left(-\dfrac{1}{2\sigma_n^2}\sum_{m=0}^{N-1}\big[y(m)-\theta\big]^2\right), & \theta_{min} \le \theta \le \theta_{max} \\ 0, & \text{otherwise} \end{cases}    (4.62)

The MAP estimate is obtained by maximising the posterior pdf:

\hat{\theta}_{MAP}(y) = \begin{cases} \theta_{min}, & \hat{\theta}_{ML}(y) < \theta_{min} \\ \hat{\theta}_{ML}(y), & \theta_{min} \le \hat{\theta}_{ML}(y) \le \theta_{max} \\ \theta_{max}, & \hat{\theta}_{ML}(y) > \theta_{max} \end{cases}    (4.63)

Note that the MAP estimate is constrained to the range θmin to θmax. This constraint is desirable and moderates estimates that, due to, say, a low signal-to-noise ratio, fall outside the range of possible values of θ. It is easy to see that the variance of an estimate constrained to the range θmin to θmax is less than the variance of the ML estimate, in which there is no constraint on the range of the parameter estimate:

\mathrm{Var}\big[\hat{\theta}_{MAP}\big] = \int_{\theta_{min}}^{\theta_{max}} (\hat{\theta}_{MAP}-\theta)^2\, f_{Y|\Theta}(y\,|\,\theta)\, dy \;\le\; \mathrm{Var}\big[\hat{\theta}_{ML}\big] = \int_{-\infty}^{\infty} (\hat{\theta}_{ML}-\theta)^2\, f_{Y|\Theta}(y\,|\,\theta)\, dy    (4.64)
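The following hedged sketch illustrates Equations (4.63)–(4.64): with a uniform prior on [θmin, θmax], the MAP estimate is simply the ML estimate clipped to that interval, and the clipping can only reduce the estimation variance. The low-SNR scenario and the prior support used here are illustrative assumptions.

```python
# Compare the variance of the ML estimate and of its clipped (MAP) version.
import numpy as np

rng = np.random.default_rng(1)
theta, sigma_n, N = 0.2, 2.0, 5              # assumed true value, noise std, short record
theta_min, theta_max = -1.0, 1.0             # assumed prior support containing theta
n_trials = 50000

y = theta + sigma_n * rng.standard_normal((n_trials, N))
theta_ml = y.mean(axis=1)                                 # Eq. (4.58)
theta_map = np.clip(theta_ml, theta_min, theta_max)       # Eq. (4.63)

print("Var[theta_ML ]:", np.mean((theta_ml - theta) ** 2))
print("Var[theta_MAP]:", np.mean((theta_map - theta) ** 2))
```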

Figure 4.12 Illustration of the posterior pdf as the product of the likelihood and the prior.

Example 4.8 Estimation of a Gaussian-distributed parameter observed in AWGN. In this example, we consider the effect of a Gaussian prior on the mean and the variance of the MAP estimate. Assume that the parameter θ is Gaussian-distributed with a mean µθ and a variance σθ2 as

f_\Theta(\theta) = \frac{1}{(2\pi\sigma_\theta^2)^{1/2}} \exp\!\left(-\frac{(\theta-\mu_\theta)^2}{2\sigma_\theta^2}\right)    (4.65)

 

 

 

 

 

 

From Bayes' rule, the posterior pdf is given as the product of the likelihood and the prior pdfs:

f_{\Theta|Y}(\theta\,|\,y) = \frac{1}{f_Y(y)}\, f_{Y|\Theta}(y\,|\,\theta)\, f_\Theta(\theta)
= \frac{1}{f_Y(y)}\,\frac{1}{(2\pi\sigma_n^2)^{N/2}}\,\frac{1}{(2\pi\sigma_\theta^2)^{1/2}} \exp\!\left(-\frac{1}{2\sigma_n^2}\sum_{m=0}^{N-1}\big[y(m)-\theta\big]^2 - \frac{(\theta-\mu_\theta)^2}{2\sigma_\theta^2}\right)    (4.66)

The maximum posterior solution is obtained by setting the derivative of the log-posterior function, ln f_{Θ|Y}(θ|y), with respect to θ to zero:

\hat{\theta}_{MAP}(y) = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\bar{y} + \frac{\sigma_n^2/N}{\sigma_\theta^2 + \sigma_n^2/N}\,\mu_\theta    (4.67)

where \bar{y} = \sum_{m=0}^{N-1} y(m)/N.

Note that the MAP estimate is an interpolation between the ML estimate ȳ and the mean of the prior pdf, µθ, as shown in Figure 4.12.

Figure 4.13 Illustration of the effect of increasing length of observation (N2 >> N1) on the variance of an estimator.

The expectation of the MAP estimate is obtained by noting that the only random variable on the right-hand side of Equation (4.67) is the term ȳ, and that E[ȳ] = θ:

E\big[\hat{\theta}_{MAP}(y)\big] = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\theta + \frac{\sigma_n^2/N}{\sigma_\theta^2 + \sigma_n^2/N}\,\mu_\theta    (4.68)

 

 

and the variance of the MAP estimate is given as

 

 

 

 

 

 

 

 

\mathrm{Var}\big[\hat{\theta}_{MAP}(y)\big] = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N} \times \mathrm{Var}[\bar{y}\,] = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\frac{\sigma_n^2}{N}    (4.69)

 

 

Substitution of Equation (4.58) in Equation (4.67) yields

 

 

\mathrm{Var}\big[\hat{\theta}_{MAP}(y)\big] = \frac{1}{1 + \mathrm{Var}\big[\hat{\theta}_{ML}(y)\big]\big/\sigma_\theta^2}\; \mathrm{Var}\big[\hat{\theta}_{ML}(y)\big]    (4.70)

 

 

 

 

Note that as σ_θ², the variance of the parameter θ, increases, the influence of the prior decreases and the variance of the MAP estimate tends towards the variance of the ML estimate.
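A minimal sketch (assumed values, not from the book) of Example 4.8 follows: it forms the MAP estimate of Equation (4.67) and compares its empirical error variance with Equation (4.70). Here θ is drawn from its Gaussian prior, so the averaging matches the Bayesian setting of the example.

```python
# MAP estimation of a Gaussian-distributed parameter observed in AWGN.
import numpy as np

rng = np.random.default_rng(2)
mu_theta, sigma_theta = 1.0, 0.5          # assumed prior mean and std of theta
sigma_n, N, n_trials = 1.0, 8, 100000

theta = mu_theta + sigma_theta * rng.standard_normal(n_trials)            # theta drawn from the prior
y_bar = theta + (sigma_n / np.sqrt(N)) * rng.standard_normal(n_trials)    # sample mean of N noisy samples

w = sigma_theta**2 / (sigma_theta**2 + sigma_n**2 / N)       # weight on the observation
theta_map = w * y_bar + (1.0 - w) * mu_theta                 # Eq. (4.67)

var_ml = sigma_n**2 / N                                      # Eq. (4.60)
print("Var[theta_ML ] (empirical):", np.mean((y_bar - theta) ** 2))
print("Var[theta_MAP] (empirical):", np.mean((theta_map - theta) ** 2))
print("Var[theta_MAP] (Eq. 4.70) :", var_ml / (1.0 + var_ml / sigma_theta**2))
```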

4.2.7 The Relative Importance of the Prior and the Observation

A fundamental issue in the Bayesian inference method is the relative influence of the observation signal and the prior pdf on the outcome. The importance of the observation depends on the confidence in the observation, and the confidence in turn depends on the length of the observation and on the signal-to-noise ratio (SNR). In general, as the number of observation samples and the SNR increase, the variance of the estimate and the influence of the prior decrease. From Equation (4.67) for the estimation of a Gaussian distributed parameter observed in AWGN, as the length of the observation N increases, the importance of the prior decreases, and the MAP estimate tends to the ML estimate:

 

 

 

 

\lim_{N\to\infty}\hat{\theta}_{MAP}(y) = \lim_{N\to\infty}\left(\frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_n^2/N}\,\bar{y} + \frac{\sigma_n^2/N}{\sigma_\theta^2 + \sigma_n^2/N}\,\mu_\theta\right) = \bar{y} = \hat{\theta}_{ML}    (4.71)

 

 

 

 

 

 

 

As illustrated in Figure 4.13, as the length of the observation N tends to infinity, both the MAP and the ML estimates of the parameter should tend to its true value θ.
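The short sketch below simply evaluates the two weights in Equation (4.67) for growing N, illustrating the limit in Equation (4.71); the prior and noise variances are assumed values.

```python
# As N grows, the weight on the data tends to 1 and the weight on the prior
# mean tends to 0, so the MAP estimate tends to the ML estimate y_bar.
sigma_theta2, sigma_n2 = 0.25, 1.0   # assumed prior and noise variances
for N in (1, 10, 100, 1000, 10000):
    w_data = sigma_theta2 / (sigma_theta2 + sigma_n2 / N)
    w_prior = (sigma_n2 / N) / (sigma_theta2 + sigma_n2 / N)
    print(f"N={N:6d}  weight on y_bar = {w_data:.4f}  weight on mu_theta = {w_prior:.4f}")
```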

Example 4.9 MAP estimation of a signal in additive noise. Consider the estimation of a scalar-valued Gaussian signal x(m), observed in an additive Gaussian white noise n(m), and modelled as

y(m) = x(m) + n(m)    (4.72)

The posterior pdf of the signal x(m) is given by

f_{X|Y}\big(x(m)\,|\,y(m)\big) = \frac{1}{f_Y(y(m))}\, f_{Y|X}\big(y(m)\,|\,x(m)\big)\, f_X\big(x(m)\big) = \frac{1}{f_Y(y(m))}\, f_N\big(y(m)-x(m)\big)\, f_X\big(x(m)\big)    (4.73)

where f_X(x(m)) = N(x(m), µ_x, σ_x²) and f_N(n(m)) = N(n(m), µ_n, σ_n²) are the Gaussian pdfs of the signal and noise respectively. Substitution of the signal and noise pdfs in Equation (4.73) yields

f_{X|Y}\big(x(m)\,|\,y(m)\big) = \frac{1}{f_Y(y(m))}\,\frac{1}{(2\pi\sigma_n^2)^{1/2}} \exp\!\left(-\frac{\big[y(m)-x(m)-\mu_n\big]^2}{2\sigma_n^2}\right) \frac{1}{(2\pi\sigma_x^2)^{1/2}} \exp\!\left(-\frac{\big[x(m)-\mu_x\big]^2}{2\sigma_x^2}\right)    (4.74)

This equation can be rewritten as


 

 

 

 

 

f_{X|Y}\big(x(m)\,|\,y(m)\big) = \frac{1}{f_Y(y(m))\,2\pi\sigma_n\sigma_x} \exp\!\left(-\frac{\sigma_x^2\big[y(m)-x(m)-\mu_n\big]^2 + \sigma_n^2\big[x(m)-\mu_x\big]^2}{2\sigma_x^2\sigma_n^2}\right)    (4.75)

To obtain the MAP estimate we set the derivative of the log-likelihood function ln f_{X|Y}(x(m) | y(m)) with respect to x(m) to zero:

 

 

 

\frac{\partial}{\partial x(m)}\big[\ln f_{X|Y}\big(x(m)\,|\,y(m)\big)\big] = -\,\frac{-2\sigma_x^2\big(y(m)-x(m)-\mu_n\big) + 2\sigma_n^2\big(x(m)-\mu_x\big)}{2\sigma_x^2\sigma_n^2} = 0    (4.76)

From Equation (4.76) the MAP signal estimate is given by

 

 

 

\hat{x}(m) = \frac{\sigma_x^2}{\sigma_x^2+\sigma_n^2}\,\big[y(m)-\mu_n\big] + \frac{\sigma_n^2}{\sigma_x^2+\sigma_n^2}\,\mu_x    (4.77)

 

 

 

Note that the estimate x̂(m) is a weighted linear interpolation between the unconditional mean of x(m), µ_x, and the observed value (y(m) − µ_n). At a very poor SNR, i.e. when σ_x² << σ_n², we have x̂(m) ≈ µ_x; on the other hand, for a noise-free signal, σ_n² = 0 and µ_n = 0, and we have x̂(m) = y(m).
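A hedged sketch of Example 4.9 follows: the scalar MAP estimator of Equation (4.77) applied to synthetic Gaussian signal and noise, compared against the raw noisy observation. The signal and noise statistics are illustrative assumptions.

```python
# MAP estimation of a Gaussian signal in additive Gaussian noise, Eq. (4.77).
import numpy as np

rng = np.random.default_rng(3)
mu_x, sigma_x = 0.0, 1.0      # assumed signal mean and standard deviation
mu_n, sigma_n = 0.2, 0.7      # assumed noise mean and standard deviation
M = 10000

x = mu_x + sigma_x * rng.standard_normal(M)
n = mu_n + sigma_n * rng.standard_normal(M)
y = x + n                                              # Eq. (4.72)

w = sigma_x**2 / (sigma_x**2 + sigma_n**2)
x_map = w * (y - mu_n) + (1.0 - w) * mu_x              # Eq. (4.77)

print("MSE of noisy observation y:", np.mean((y - x) ** 2))
print("MSE of MAP estimate       :", np.mean((x_map - x) ** 2))
```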

Example 4.10 MAP estimate of a Gaussian–AR process observed in AWGN. Consider a vector of N samples x from an autoregressive (AR) process observed in an additive Gaussian noise, and modelled as

y = x + n    (4.78)

From Chapter 8, a vector x from an AR process may be expressed as

e = Ax    (4.79)

where A is a matrix of the AR model coefficients, and the vector e is the input signal of the AR model. Assuming that the signal x is Gaussian, and that the P initial samples x0 are known, the pdf of the signal x is given by


 

 

 

 

 

 

 

 

f_X(x\,|\,x_0) = f_E(e) = \frac{1}{(2\pi\sigma_e^2)^{N/2}} \exp\!\left(-\frac{1}{2\sigma_e^2}\, x^{\mathrm{T}} A^{\mathrm{T}} A\, x\right)    (4.80)

 

 

 

 

 

where it is assumed that the input signal e of the AR model is a zero-mean uncorrelated process with variance σ_e².
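As an aside (not from the book), the sketch below shows one way the matrix A of Equation (4.79) can be formed for an AR(P) model x(m) = Σ_k a_k x(m−k) + e(m): a lower-triangular prediction-error matrix with 1 on the diagonal and −a_k on the k-th subdiagonal. For simplicity the P initial samples are taken as zero here rather than treated as known side information; the AR(2) coefficients are assumed values.

```python
# Build the prediction-error matrix A so that A @ x recovers the excitation e.
import numpy as np

def ar_matrix(a, N):
    """N x N prediction-error matrix A for AR coefficients a = [a_1, ..., a_P]."""
    A = np.eye(N)
    for k, ak in enumerate(a, start=1):
        A -= ak * np.eye(N, k=-k)      # place -a_k on the k-th subdiagonal
    return A

a, N = [1.2, -0.5], 200                # assumed stable AR(2) coefficients
rng = np.random.default_rng(4)
e = rng.standard_normal(N)             # white excitation, sigma_e^2 = 1

# Generate the AR signal recursively, then recover the excitation with A.
x = np.zeros(N)
for m in range(N):
    x[m] = e[m] + sum(a[k] * x[m - 1 - k] for k in range(len(a)) if m - 1 - k >= 0)

A = ar_matrix(a, N)
print("max |A x - e| =", np.max(np.abs(A @ x - e)))    # ~0 up to rounding error
```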

 

1

 

 

 

 

 

1

 

 

f N (n) =

 

 

 

 

 

exp

 

nTΣ nn−1 n

(4.81)

(2π )N / 2

 

Σ nn

 

1/ 2

 

 

 

 

 

 

2

 

 

From Bayes’ rule, the pdf of the signal given the noisy observation is

f_{X|Y}(x\,|\,y) = \frac{f_{Y|X}(y\,|\,x)\, f_X(x)}{f_Y(y)} = \frac{1}{f_Y(y)}\, f_N(y-x)\, f_X(x)    (4.82)

 

 

 

 

Substitution of the pdfs of the signal and noise in Equation (4.82) yields

 

 

 

f_{X|Y}(x\,|\,y) = \frac{1}{f_Y(y)\,(2\pi)^{N}\,\sigma_e^{N}\,|\Sigma_{nn}|^{1/2}} \exp\!\left(-\frac{1}{2}\left[(y-x)^{\mathrm{T}} \Sigma_{nn}^{-1}(y-x) + \frac{x^{\mathrm{T}} A^{\mathrm{T}} A\, x}{\sigma_e^2}\right]\right)    (4.83)

The MAP estimate corresponds to the minimum of the argument of the exponential function in Equation (4.83). Assuming that the argument of the exponential function is differentiable, and has a well-defined minimum, we can obtain the MAP estimate from

\hat{x}_{MAP}(y) = \arg\operatorname{zero}_{x}\; \frac{\partial}{\partial x}\left[(y-x)^{\mathrm{T}} \Sigma_{nn}^{-1}(y-x) + \frac{x^{\mathrm{T}} A^{\mathrm{T}} A\, x}{\sigma_e^2}\right]    (4.84)

 

 

 

 

 

 

 

 

 

 

 

 

The MAP estimate is

\hat{x}_{MAP}(y) = \left(I + \frac{1}{\sigma_e^2}\,\Sigma_{nn} A^{\mathrm{T}} A\right)^{-1} y    (4.85)

where I is the identity matrix.
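The following sketch applies Equation (4.85) to a synthetic AR(2) signal in white Gaussian noise. The AR coefficients, the noise level and the white-noise covariance Σ_nn = σ_n² I are illustrative assumptions; the ar_matrix helper repeats the earlier sketch for Equation (4.79).

```python
# MAP estimate of a Gaussian-AR signal observed in AWGN, Eq. (4.85).
import numpy as np

def ar_matrix(a, N):
    A = np.eye(N)
    for k, ak in enumerate(a, start=1):
        A -= ak * np.eye(N, k=-k)
    return A

rng = np.random.default_rng(5)
a, N, sigma_e, sigma_n = [1.2, -0.5], 300, 1.0, 0.8   # assumed model and noise level

# Synthesise the AR signal and add white Gaussian noise.
e = sigma_e * rng.standard_normal(N)
x = np.zeros(N)
for m in range(N):
    x[m] = e[m] + sum(a[k] * x[m - 1 - k] for k in range(len(a)) if m - 1 - k >= 0)
y = x + sigma_n * rng.standard_normal(N)                                  # Eq. (4.78)

A = ar_matrix(a, N)
Sigma_nn = sigma_n**2 * np.eye(N)                                         # white noise covariance
x_map = np.linalg.solve(np.eye(N) + Sigma_nn @ A.T @ A / sigma_e**2, y)   # Eq. (4.85)

print("MSE of noisy observation:", np.mean((y - x) ** 2))
print("MSE of MAP estimate     :", np.mean((x_map - x) ** 2))
```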


4.3 The Estimate–Maximise (EM) Method

The EM algorithm is an iterative likelihood maximisation method with applications in blind deconvolution, model-based signal interpolation, spectral estimation from noisy observations, estimation of a set of model parameters from a training data set, etc. The EM is a framework for solving problems where it is difficult to obtain a direct ML estimate either because the data is incomplete or because the problem is difficult.

To define the term incomplete data, consider a signal x from a random process X with an unknown parameter vector θ and a pdf fX;Θ(x). The notation fX;Θ(x) expresses the dependence of the pdf of X on the value of the unknown parameter θ. The signal x is the so-called complete data and the ML estimate of the parameter vector θ may be obtained from fX;Θ(x). Now assume that the signal x goes through a many-to-one non-invertible transformation (e.g. when a number of samples of the vector x are lost) and is observed as y. The observation y is the so-called incomplete data.

Maximisation of the likelihood of the incomplete data, fY;Θ(y), with respect to the parameter vector θ is often a difficult task, whereas maximisation of the likelihood of the complete data fX;Θ(x) is relatively easy. Since the complete data is unavailable, the parameter estimate is obtained through maximisation of the conditional expectation of the log-likelihood of the complete data defined as

E\big[\ln f_{X;\Theta}(x;\theta)\,\big|\,y\big] = \int_X f_{X|Y;\Theta}(x\,|\,y;\theta)\,\ln f_{X;\Theta}(x;\theta)\, dx    (4.86)

 

In Equation (4.86), the computation of the term fX|Y;Θ(x|y) requires an estimate of the unknown parameter vector θ. For this reason, the expectation of the likelihood function is maximised iteratively starting with an initial estimate of θ, and updating the estimate as described in the following.

 

 

Figure 4.14 Illustration of the transformation of the complete data x, with pdf fX;Θ(x;θ), through a non-invertible transformation to the incomplete data y, with pdf fY;Θ(y;θ).

EM Algorithm

Step 1: Initialisation Select an initial parameter estimate θ0 , and for i = 0, 1, ... until convergence:

Step 2: Expectation Compute

U(\theta,\hat{\theta}_i) = E\big[\ln f_{X;\Theta}(x;\theta)\,\big|\,y;\hat{\theta}_i\big] = \int_X f_{X|Y;\Theta}(x\,|\,y;\hat{\theta}_i)\,\ln f_{X;\Theta}(x;\theta)\, dx    (4.87)

Step 3: Maximisation Select

\hat{\theta}_{i+1} = \arg\max_{\theta}\, U(\theta,\hat{\theta}_i)    (4.88)

 

 

Step 4: Convergence test If not converged then go to Step 2.
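As a concrete, hedged illustration of Steps 1–4 (this example is not from the book), the sketch below runs EM on a two-component one-dimensional Gaussian mixture: the complete data would be the observations together with their component labels, but only the observations (the incomplete data) are seen. The mixture model, the known equal variances and all numerical values are assumptions.

```python
# EM for a two-component 1-D Gaussian mixture with unknown means and weights.
import numpy as np

rng = np.random.default_rng(6)
y = np.concatenate([rng.normal(-2.0, 1.0, 400), rng.normal(3.0, 1.0, 600)])

sigma = 1.0                                  # assumed known component std
mu = np.array([-1.0, 1.0])                   # Step 1: initial parameter estimates
w = np.array([0.5, 0.5])

for i in range(200):
    # Step 2 (Expectation): posterior probability of each component given y and
    # the current estimates -- the conditional expectation over the hidden labels.
    resp = w * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2)
    resp /= resp.sum(axis=1, keepdims=True)
    # Step 3 (Maximisation): parameters maximising the expected complete-data
    # log-likelihood U(theta, theta_i).
    mu_new = (resp * y[:, None]).sum(axis=0) / resp.sum(axis=0)
    w = resp.mean(axis=0)
    # Step 4 (Convergence test): stop when the means no longer move.
    if np.max(np.abs(mu_new - mu)) < 1e-8:
        break
    mu = mu_new

print("estimated means  :", mu)    # should approach the true values -2 and 3
print("estimated weights:", w)     # should approach 0.4 and 0.6
```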

4.3.1 Convergence of the EM Algorithm

In this section, it is shown that the EM algorithm converges to a maximum of the likelihood of the incomplete data fY;Θ(y;θ). The likelihood of the complete data can be written as

f_{X,Y;\Theta}(x,y;\theta) = f_{X|Y;\Theta}(x\,|\,y;\theta)\, f_{Y;\Theta}(y;\theta)    (4.89)

where fX,Y;Θ(x,y) is the likelihood of x and y with θ as a parameter. From Equation (4.89), the log-likelihood of the incomplete data is obtained as

\ln f_{Y;\Theta}(y;\theta) = \ln f_{X,Y;\Theta}(x,y;\theta) - \ln f_{X|Y;\Theta}(x\,|\,y;\theta)    (4.90)

Using an estimate θˆi of the parameter vector θ, and taking the expectation of Equation (4.90) over the space of the complete signal x, we obtain

\ln f_{Y;\Theta}(y;\theta) = U(\theta;\hat{\theta}_i) - V(\theta;\hat{\theta}_i)    (4.91)

 

where, for a given y, the expectation of ln fY;Θ(y;θ) is itself, and the function U(θ;θ̂_i) is the conditional expectation of ln fX,Y;Θ(x,y;θ):
