

where the criterion function is defined as

(4.34)

By well-known properties of the trace operator¹ Tr{·}, the normalized, negative log-likelihood function can be expressed as

$l(\theta, S, \sigma^2) = \log|R| + \mathrm{Tr}\{R^{-1}\hat{R}\}$ ,   (4.35)

where $\hat{R}$ is the sample covariance

$\hat{R} = \frac{1}{N}\sum_{i=1}^{N} x(t_i)x^H(t_i)$ .   (4.36)

With some algebraic effort, the ML criterion function can be concentrated with respect to S and σ² [4.24, 46, 53], thus reducing the dimension of the required numerical optimization to pd. The SML estimates of the signal covariance matrix and the noise power are obtained by inserting the SML estimate of θ in the following expressions:

$\hat{S}(\theta) = A^{\dagger}(\theta)\,[\hat{R} - \hat{\sigma}^2(\theta)I]\,A^{\dagger H}(\theta)$   (4.37)

$\hat{\sigma}^2(\theta) = \frac{1}{m-d}\,\mathrm{Tr}\{P_A^{\perp}(\theta)\hat{R}\}$ ,   (4.38)

where $A^{\dagger}$ is the pseudo-inverse of A, and $P_A^{\perp}$ is the orthogonal projector onto the null space of $A^H$, i.e.,

$A^{\dagger} = (A^H A)^{-1} A^H$   (4.39)

$P_A = A A^{\dagger}$   (4.40)

$P_A^{\perp} = I - P_A$ .   (4.41)
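To make the projector algebra concrete, the following NumPy sketch evaluates (4.39-41). The array model is not fixed by this section, so the steering matrix used here (a uniform linear array with half-wavelength spacing) and all function names are illustrative assumptions, not the chapter's prescription.

```python
import numpy as np

def steering_matrix(theta, m):
    # Assumed array model for illustration: uniform linear array with
    # half-wavelength spacing, a(theta)_k = exp(j*pi*k*sin(theta)), k = 0..m-1.
    k = np.arange(m).reshape(-1, 1)
    return np.exp(1j * np.pi * k * np.sin(np.atleast_1d(theta)))

def projectors(A):
    # Pseudo-inverse (4.39), projector onto range(A) (4.40), and the
    # orthogonal projector onto the null space of A^H (4.41).
    A_pinv = np.linalg.solve(A.conj().T @ A, A.conj().T)
    P_A = A @ A_pinv
    P_A_perp = np.eye(A.shape[0]) - P_A
    return A_pinv, P_A, P_A_perp
```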

The concentrated form of the SML criterion is now obtained by substituting (4.37-38) into (4.35). The signal parameter estimates are obtained by solving the following optimization problem:

$\hat{\theta} = \arg\min_{\theta}\, V_{SML}(\theta)$   (4.42)

$V_{SML}(\theta) = \log\left|A(\theta)\hat{S}(\theta)A^H(\theta) + \hat{\sigma}^2(\theta)I\right|$ .   (4.43)
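A minimal sketch of the concentrated criterion follows, reusing the helpers above: it forms the sample covariance (4.36), then evaluates (4.38), (4.37), and finally (4.43) for a candidate θ. This only illustrates the formulas; it is not the numerical optimization procedure of Sect. 4.6.

```python
def sample_covariance(X):
    # R_hat of (4.36); the N snapshots x(t_i) are the columns of X (m x N).
    return (X @ X.conj().T) / X.shape[1]

def V_SML(theta, R_hat):
    # Concentrated SML criterion (4.43): log|A S_hat A^H + sigma2_hat I|.
    m = R_hat.shape[0]
    A = steering_matrix(theta, m)
    d = A.shape[1]
    A_pinv, _, P_A_perp = projectors(A)
    sigma2 = np.real(np.trace(P_A_perp @ R_hat)) / (m - d)           # (4.38)
    S_hat = A_pinv @ (R_hat - sigma2 * np.eye(m)) @ A_pinv.conj().T  # (4.37)
    _, logdet = np.linalg.slogdet(A @ S_hat @ A.conj().T + sigma2 * np.eye(m))
    return logdet
```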

¹ For a scalar a, Tr{a} = a; for matrices A and B of appropriate dimensions, Tr{AB} = Tr{BA} and Tr{A} + Tr{B} = Tr{A + B}.


Remark 4.1 It is possible to include the obvious a priori information that S is positive semi-definite. Since (4.37) may be indefinite, this yields a potentially different ML estimator [4.46]. In general, if the rank of S is known to be d', a different parametrization should be employed, for instance, S = LLᴴ, where L is a d × d', "lower triangular" matrix. If d' = d, these possible modifications will have no effect for "large enough N", since (4.37) is a consistent estimate of S. Even if d' < d, it can be shown that the asymptotic (for large N) statistical properties of the SML estimate cannot be improved by the square-root parametrization. Since the latter leads to a significantly more complicated optimization problem, the unrestricted parametrization of S appears to be preferable. We will, therefore, always refer to the minimizer of (4.43) as being the SML estimate of the signal parameters. □

Although the dimension of the parameter space is reduced substantially, the form of the resulting criterion function (4.43) is complicated and, in general, the minimizing θ cannot be found analytically. In Sect. 4.6, a numerical procedure is described for carrying out the required optimization.

4.4.2 Deterministic Maximum Likelihood Method

In some applications, for example, radar and radio communication, the signal waveforms are often far from being Gaussian random variables. The deterministic model is then a natural one, since it makes no assumptions at all on the signals. Instead, s(t_i), i = 1, …, N, are regarded as unknown parameters that have to be estimated. In fact, in some applications, such as communications, estimation of s(t_i) is of more interest than estimation of θ. The ML estimator for this model is termed the DML method.

Similar to the stochastic signal model, the observation process, x(t_i), is Gaussian distributed, given the unknown quantities. The first- and second-order moments are different, though:

$E\{x(t_i)\} = A(\theta)s(t_i)$   (4.44)

$E\{(x(t_i) - E\{x(t_i)\})(x(t_j) - E\{x(t_j)\})^H\} = \sigma^2 I\,\delta_{ij}$   (4.45)

$E\{(x(t_i) - E\{x(t_i)\})(x(t_j) - E\{x(t_j)\})^T\} = 0$ .   (4.46)

The unknown parameters are, in this case, θ, s(t_i), i = 1, …, N, and σ².

The joint probability distribution of the observations is formed by conditioning on the parameters of the deterministic model: the signal parameters, the noise variance, and the waveforms. As the snapshots are independent, the conditional density is given by

$p(X_N \mid \theta, \sigma^2, S_N) = \prod_{i=1}^{N} (\pi\sigma^2)^{-m} \exp\{-\sigma^{-2}\|x(t_i) - A(\theta)s(t_i)\|^2\}$ ,   (4.47)


 

and the negative log-likelihood function has the following form:

$-\log p(\theta, \sigma^2, S_N) = Nm\log(\pi\sigma^2) + \sigma^{-2}\,\mathrm{Tr}\{[X_N - A(\theta)S_N]^H [X_N - A(\theta)S_N]\}$
$\qquad = Nm\log(\pi\sigma^2) + \sigma^{-2}\,\|X_N - A(\theta)S_N\|_F^2$ ,   (4.48)

where ‖·‖_F is the Frobenius norm² of a matrix, and X_N and S_N are defined in (4.16). The deterministic maximum likelihood estimates are the minimizing arguments of (4.48). For fixed θ and S_N, the minimum with respect to σ² is readily derived as

$\hat{\sigma}^2(\theta, S_N) = \frac{1}{Nm}\,\|X_N - A(\theta)S_N\|_F^2$ .   (4.49)

Substituting (4.49) into (4.48) shows that $\hat{\theta}$ and $\hat{S}_N$ are obtained by solving the non-linear least-squares problem

$[\hat{\theta}, \hat{S}_N] = \arg\min_{\theta, S_N} \|X_N - A(\theta)S_N\|_F^2$ .   (4.50)

 

Since the above criterion function is quadratic in the signal waveform parameters, it is easy to minimize with respect to S_N [4.22, 23, 54]. This results in the following estimates:

$\hat{S}_N = A^{\dagger}(\hat{\theta})X_N$   (4.51)

$\hat{\theta} = \arg\min_{\theta}\, V_{DML}(\theta)$   (4.52)

$V_{DML}(\theta) = \mathrm{Tr}\{P_A^{\perp}(\theta)\hat{R}\}$ .   (4.53)

Comparing (4.43) and (4.53), we see that the DML criterion depends on θ in a simpler way than does the SML criterion. It is, however, important to note that (4.52-53) is also a nonlinear multidimensional minimization problem, and the criterion function often possesses a large number of local minima.
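For comparison with the SML sketch, the DML criterion (4.53) costs a single trace per candidate θ. The sketch below, under the same assumed ULA model as before, also shows a coarse grid search for a single source; because of the many local minima just noted, some such globalization step is needed before local refinement (cf. Sect. 4.6).

```python
def V_DML(theta, R_hat):
    # DML criterion (4.53): Tr{P_A_perp(theta) R_hat}.
    A = steering_matrix(theta, R_hat.shape[0])
    _, _, P_A_perp = projectors(A)
    return np.real(np.trace(P_A_perp @ R_hat))

def dml_grid_search(R_hat, n_grid=721):
    # Coarse 1-D grid search for d = 1; for d > 1 the search is
    # pd-dimensional and local refinement would follow.
    grid = np.linspace(-np.pi / 2, np.pi / 2, n_grid)
    return grid[np.argmin([V_DML(t, R_hat) for t in grid])]
```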

4.4.3 Bounds on Estimation Accuracy

In any practical application involving the estimation of signal parameters, it is of utmost importance to assess the performance of various estimation procedures. However, any accuracy measure may be of limited interest unless one has an idea of what the best possible performance is. An important measure of how well a particular method performs is the covariance matrix of the estimation errors.

² The Frobenius norm of a matrix is given by the square root of the sum of the squared moduli of its elements.


Several lower bounds on the estimation error covariance are available in the literature [4.55-57]. Of these, the Cramér-Rao Lower Bound (CRLB) is by far the most commonly used. The main reason for this is its simplicity, and also the fact that it is often (asymptotically) tight, i.e., an estimator exists that (asymptotically) achieves the CRLB. Such an estimator is said to be (asymptotically) efficient. Unless otherwise explicitly stated, the word "asymptotically" is used throughout this chapter to mean that the amount of data (N) is large.

Theorem 4.1 Let $\hat{\eta}$ be an unbiased estimate of the real parameter vector $\eta_0$, i.e., $E\{\hat{\eta}\} = \eta_0$, based on the observations $X_N$. The Cramér-Rao lower bound on the estimation error covariance is then given by

$E\{(\hat{\eta} - \eta_0)(\hat{\eta} - \eta_0)^T\} \ge \left[-E\left\{\frac{\partial^2 \log p(X_N \mid \eta)}{\partial\eta\,\partial\eta^T}\right\}\right]^{-1}$ .   (4.54)

□

The matrix within square brackets in (4.54) (i.e., the inverse of the CRLB) is referred to as the Fisher Information Matrix (FIM).

a) Cramér-Rao Lower Bound

The CRLB based on the Gaussian signal model is discussed in [4.21, 28], and is easily derived from the normalized negative log-likelihood function in (4.35). Let η represent the vector of unknown parameters in the stochastic model

$\eta = [\theta^T, \bar{s}_{11}, \tilde{s}_{11}, \bar{s}_{12}, \ldots, \bar{s}_{dd}, \sigma^2]^T$ ,   (4.55)

where $\bar{s}_{ij} = \mathrm{Re}\{s_{ij}\}$ and $\tilde{s}_{ij} = \mathrm{Im}\{s_{ij}\}$. Introduce the short-hand notation

$R_i = \partial R/\partial\eta_i$ ,   (4.56)

and recall the following differentiation rules [4.58]:

$\frac{\partial}{\partial\eta_i}\log|R| = \mathrm{Tr}\{R^{-1}R_i\}$   (4.57)

$\frac{\partial}{\partial\eta_i}R^{-1} = -R^{-1}R_i R^{-1}$   (4.58)

$\frac{\partial}{\partial\eta_i}\mathrm{Tr}\{R^{-1}\hat{R}\} = -\mathrm{Tr}\{R^{-1}R_i R^{-1}\hat{R}\}$ .   (4.59)

The first derivative of (4.35) with respect to the ith component of the parameter vector is given by

$l_i'(\eta) = \mathrm{Tr}\{R^{-1}R_i(I - R^{-1}\hat{R})\}$ .   (4.60)


Equation (4.61) gives the ijth element of the inverse of the CRLB:

$\{FIM\}_{ij} = N\,E\left\{\frac{\partial^2 l(\eta)}{\partial\eta_i\,\partial\eta_j}\right\}\bigg|_{\eta=\eta_0}$
$\quad = N\,E\{\mathrm{Tr}[[(R^{-1})_j R_i + R^{-1}R_{ij}](I - R^{-1}\hat{R}) - R^{-1}R_i(R^{-1})_j\hat{R}]\}$
$\quad = N\,\mathrm{Tr}[-R^{-1}R_i(R^{-1})_j R] = N\,\mathrm{Tr}[R^{-1}R_i R^{-1}R_j]$ ,   (4.61)

where the last two equalities follow since $E\{\hat{R}\} = R$ and $(R^{-1})_j = -R^{-1}R_j R^{-1}$.

The appearance of N on the right-hand side above is due to the normalization of (4.35). In many applications, only the signal parameters are of interest. However, the above formula involves derivatives with respect to all d² + pd + 1 components of η and, in general, none of the elements of (4.61) vanish. A compact expression for the CRLB on the covariance matrix of the signal parameters only is presented in [4.29, 32, 59].
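As a sanity check on (4.61), the FIM can also be evaluated numerically: given any smooth parametrization η ↦ R(η), form the derivative matrices R_i by central differences and assemble N Tr{R⁻¹R_iR⁻¹R_j}. The finite-difference sketch below is purely illustrative; in practice, closed-form derivatives of R would be used.

```python
def fim_stochastic(R_of_eta, eta0, N, eps=1e-6):
    # {FIM}_ij = N * Tr{R^{-1} R_i R^{-1} R_j}, cf. (4.61), with the
    # derivatives R_i = dR/deta_i approximated by central differences.
    eta0 = np.asarray(eta0, dtype=float)
    R_inv = np.linalg.inv(R_of_eta(eta0))
    n = eta0.size
    R_derivs = []
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        R_derivs.append((R_of_eta(eta0 + e) - R_of_eta(eta0 - e)) / (2 * eps))
    F = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            F[i, j] = N * np.real(
                np.trace(R_inv @ R_derivs[i] @ R_inv @ R_derivs[j]))
    return F
```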

 

 

 

 

 

 

 

The Cramér-Rao inequality for θ is given by

$E\{(\hat{\theta} - \theta_0)(\hat{\theta} - \theta_0)^T\} \ge B_{STO}$ ,   (4.62)

where

$\{B_{STO}^{-1}\}_{ij} = \frac{2N}{\sigma^2}\,\mathrm{Re}\{\mathrm{Tr}[A_i^H P_A^{\perp} A_j S A^H R^{-1} A S]\}$ ,  i, j = 1, …, pd ,   (4.63)

with $A_i = \partial A(\theta)/\partial\theta_i$.

For the special case when there is one parameter associated with each signal (p = 1), the CRLB for the signal parameters can be put in a simple matrix form

$B_{STO} = \frac{\sigma^2}{2N}\left[\mathrm{Re}\{(D^H P_A^{\perp} D) \odot (S A^H R^{-1} A S)^T\}\right]^{-1}$ ,   (4.64)

where ⊙ denotes the Hadamard (or Schur) product, i.e., element-wise multiplication, and

$D = \left[\frac{\partial a(\theta)}{\partial\theta}\bigg|_{\theta=\theta_1}, \ldots, \frac{\partial a(\theta)}{\partial\theta}\bigg|_{\theta=\theta_d}\right]$ .   (4.65)
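Expressed in code, (4.64-65) read as follows for the uniform linear array assumed in the earlier sketches, for which ∂a(θ)/∂θ = jπk cos(θ) a(θ); both the array model and the function name remain assumptions made for illustration only.

```python
def crb_stochastic(theta, S, sigma2, m, N):
    # B_STO of (4.64); the columns of D are da/dtheta per (4.65),
    # specialized to the assumed ULA steering vector.
    theta = np.atleast_1d(theta)
    A = steering_matrix(theta, m)
    _, _, P_A_perp = projectors(A)
    k = np.arange(m).reshape(-1, 1)
    D = 1j * np.pi * k * np.cos(theta) * A
    R_inv = np.linalg.inv(A @ S @ A.conj().T + sigma2 * np.eye(m))
    M = (D.conj().T @ P_A_perp @ D) * (S @ A.conj().T @ R_inv @ A @ S).T
    return (sigma2 / (2 * N)) * np.linalg.inv(np.real(M))
```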

 

b) Deterministic Cramér-Rao Lower Bound

The CRLB for deterministic signals is derived in [4.30, 31], and is restated here in its asymptotic form, and for the case p = 1. The emitter signals are arbitrary, second-order ergodic sequences, and with some abuse of notation, the limiting signal sample covariance matrix is denoted

$S = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} s(t_i)s^H(t_i)$ .   (4.66)

If the signal waveforms happen to be realizations of stationary stochastic processes, the limiting signal sample covariance will indeed coincide with the


signal covariance matrix under mild assumptions (e.g., bounded fourth-order moments).

Let $\hat{\theta}$ be an asymptotically unbiased estimate of the true parameter vector, $\theta_0$. For large N, the Cramér-Rao inequality for the signal parameters can then be expressed as

$E\{(\hat{\theta} - \theta_0)(\hat{\theta} - \theta_0)^T\} \ge B_{DET}$ ,   (4.67)

where

$B_{DET} = \frac{\sigma^2}{2N}\left[\mathrm{Re}\{(D^H P_A^{\perp} D) \odot S^T\}\right]^{-1}$ .   (4.68)

It should be noted that the above inequality implicitly assumes that asymptotically unbiased estimates of all unknown parameters in the deterministic model, i.e., θ, S_N, and σ², are available. Since no assumptions on the signal waveforms are made, the inequality also applies if they happen to be realizations of Gaussian processes. One may, therefore, guess that B_STO is tighter than B_DET. We shall prove this statement later in this section.
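A corresponding sketch of (4.68) is given below; comparing its output with that of crb_stochastic numerically anticipates the ordering established in Sect. 4.4.5. The same illustrative ULA assumptions as in the earlier sketches apply.

```python
def crb_deterministic(theta, S, sigma2, m, N):
    # B_DET of (4.68); here S is the limiting signal sample
    # covariance matrix of (4.66).
    theta = np.atleast_1d(theta)
    A = steering_matrix(theta, m)
    _, _, P_A_perp = projectors(A)
    k = np.arange(m).reshape(-1, 1)
    D = 1j * np.pi * k * np.cos(theta) * A
    M = (D.conj().T @ P_A_perp @ D) * S.T
    return (sigma2 / (2 * N)) * np.linalg.inv(np.real(M))
```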

4.4.4 Asymptotic Properties of Maximum Likelihood Estimates

The ML estimator has a number of attractive properties that hold for general, sufficiently regular, likelihood functions. The most interesting one for our purposes states that if the ML estimates are consistent³, then they are also asymptotically efficient. In this respect, the ML method has the best asymptotic properties possible.

a) Stochastic Maximum Likelihood

The SML likelihood function is regular, and the general theory of ML estimation can be applied to yield the following result:

Theorem 4.2 Under the Gaussian signal assumption, the SML parameter estimates are consistent, and the normalized estimation error, $\sqrt{N}(\hat{\eta} - \eta_0)$, has a limiting zero-mean, normal distribution with covariance matrix equal to N times the CRLB on η.

Proof See Chap. 6.4 in [4.56] for a proof. □

From Theorem 4.2 we conclude that for the SML method, the asymptotic distribution of $\sqrt{N}(\hat{\theta} - \theta_0)$ is $N(0, C_{SML})$, where

$C_{SML} = N B_{STO}$ ,   (4.69)

and where B_STO is given by (4.63).

³ An estimate is consistent if it converges to the true value as the amount of data tends to infinity.


b) Deterministic Maximum Likelihood

The deterministic model for the sensor array problem has an important drawback. Since the signal waveforms themselves are regarded as unknown parameters, it follows that the dimension of the parameter vector grows without bound with increasing N. For this reason, consistent estimation of all model parameters is impossible. More precisely, the DML estimate of θ is consistent, whereas the estimate of S_N is inconsistent. To verify the consistency of $\hat{\theta}$, observe that under mild conditions, the criterion function (4.53) converges w.p.1 and uniformly in θ to the limit function

$V_{DML}(\theta) = \mathrm{Tr}\{P_A^{\perp}(\theta)R\} = \mathrm{Tr}\{P_A^{\perp}(\theta)[A(\theta_0)SA^H(\theta_0) + \sigma^2 I]\}$ ,   (4.70)

as N tends to infinity. Hence, $\hat{\theta}$ converges to the minimizing argument of $V_{DML}(\theta)$. It is readily verified that

$V_{DML}(\theta) \ge \sigma^2\,\mathrm{Tr}\{P_A^{\perp}(\theta)\} = \sigma^2(m - d) = V_{DML}(\theta_0)$

(recall that the trace of a projection matrix equals the dimension of the subspace onto which it projects). Let S = LLᴴ be the Cholesky factorization of the signal covariance matrix, where L is d × d'. Clearly, $V_{DML}(\theta) = \sigma^2(m - d)$ holds if, and only if, $P_A^{\perp}(\theta)A(\theta_0)L = 0$, in which case

$A(\theta_0)L = A(\theta)L_1$   (4.71)

for some d × d' matrix L₁ of full rank. By the UP assumption and (4.27), the relation (4.71) is possible if, and only if, θ = θ₀. Thus, we conclude that the minimizer of (4.53) converges w.p.1 to the true value θ₀.

The signal waveform estimates are, however, inconsistent, since

$\hat{s}(t_i) = A^{\dagger}(\hat{\theta})x(t_i) \to A^{\dagger}(\theta_0)x(t_i) = s(t_i) + A^{\dagger}(\theta_0)n(t_i) \ne s(t_i)$ .   (4.72)

Owing to the inconsistency of $\hat{S}_N$, the general properties of ML estimators are not valid here. Thus, as observed in [4.30], the asymptotic covariance matrix of the signal parameter estimate does not coincide with the deterministic CRLB. Note that the deterministic Cramér-Rao inequality (4.67) is indeed applicable, as the DML estimate of S_N can be shown (with some effort) to be asymptotically unbiased, in spite of its inconsistency.
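The inconsistency is easy to see numerically: even with θ̂ = θ₀ exactly, the waveform estimate (4.51) retains the projected noise term of (4.72). A small illustration, under the same assumed array model as the earlier sketches:

```python
rng = np.random.default_rng(0)
m, theta0 = 8, np.array([0.2])
A = steering_matrix(theta0, m)
A_pinv, _, _ = projectors(A)
s = np.array([1.0 + 1.0j])                                 # true waveform sample
n = (rng.standard_normal(m) + 1j * rng.standard_normal(m)) * 0.1
x = A @ s + n                                              # one snapshot
print(A_pinv @ x - s)        # equals A_pinv @ n: noise does not average out
```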

The asymptotic distribution of the DML signal parameter estimate is derived in [4.39, 44], and is given next for the case of one parameter per emitter signal.

 

 

 

 

 

 

 

Theorem 4.3 Let $\hat{\theta}$ be obtained from (4.53). Then, $\sqrt{N}(\hat{\theta} - \theta_0)$ converges in distribution to $N(0, C_{DML})$, where

$\frac{1}{N} C_{DML} = B_{DET} + 2N\,B_{DET}\,\mathrm{Re}\{(D^H P_A^{\perp} D) \odot (A^H A)^{-T}\}\,B_{DET}$ ,   (4.73)

with B_DET the asymptotic deterministic CRLB as defined in (4.68).
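In code, (4.73) combines the deterministic bound with the Hadamard-product correction term. A sketch under the same illustrative ULA assumptions, building on crb_deterministic above:

```python
def cov_dml(theta, S, sigma2, m, N):
    # (1/N) C_DML of (4.73):
    # B_DET + 2N * B_DET * Re{(D^H P_A_perp D) o (A^H A)^{-T}} * B_DET.
    theta = np.atleast_1d(theta)
    B = crb_deterministic(theta, S, sigma2, m, N)
    A = steering_matrix(theta, m)
    _, _, P_A_perp = projectors(A)
    k = np.arange(m).reshape(-1, 1)
    D = 1j * np.pi * k * np.cos(theta) * A
    H = np.real((D.conj().T @ P_A_perp @ D) * np.linalg.inv(A.conj().T @ A).T)
    return B + 2 * N * (B @ H @ B)
```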

From (4.73), it is clearly seen that the covariance of the DML estimate is strictly greater than the deterministic CRLB. However, these two matrices approach the


same limit as the number of sensors, m, increases [4.32]. The requirement of the DML method to estimate the signal waveforms thus has a deteriorating effect on the DOA estimates, unless m is large. In many applications, the signal waveform estimates may be of importance themselves. Though the DML technique provides such estimates, it should be remarked that they are not guaranteed to be the most accurate ones, unless m is large enough.

4.4.5 Order Relations

As discussed above, the two models for the sensor array problem, corresponding to deterministic and stochastic modeling of the emitter signals, respectively, lead to different ML criteria and CRLBs. The following result, due to [4.32, 60], relates the covariance matrices of the stochastic and deterministic ML estimates, and the corresponding CRLBs.

Theorem 4.4 Let C_SML and C_DML denote the asymptotic covariances of $\sqrt{N}(\hat{\theta} - \theta_0)$ for the stochastic and deterministic ML estimates, respectively. Furthermore, let B_STO and B_DET be the stochastic CRLB and the deterministic CRLB. The following (in)equalities then hold:

$C_{DML} \ge C_{SML} = N B_{STO} \ge N B_{DET}$ .   (4.74)

Proof Theorem 4.2 shows the middle equality in (4.74). The left inequality follows by applying the DML method under the Gaussian signal assumption. The Cramér-Rao inequality then implies that $N^{-1}C_{DML} \ge B_{STO}$. To prove the right inequality in (4.74), apply the matrix inversion lemma [4.61, Lemma A.1] to obtain

$S A^H R^{-1} A S = S - S[I - A^H(ASA^H + \sigma^2 I)^{-1} A S] = S - S(I + \sigma^{-2} A^H A S)^{-1}$ .   (4.75)

Since the matrix $S(I + \sigma^{-2} A^H A S)^{-1}$ is Hermitian (by the equality above) and positive semi-definite, it follows that $S A^H R^{-1} A S \le S$. Hence, application of [4.39, Lemma A.2] yields

$\mathrm{Re}\{(D^H P_A^{\perp} D) \odot (S A^H R^{-1} A S)^T\} \le \mathrm{Re}\{(D^H P_A^{\perp} D) \odot S^T\}$ .   (4.76)

By inverting both sides of (4.76), the desired inequality follows. If the matrices $D^H P_A^{\perp} D$ and S are both positive definite, the inequality (4.76) is strict, showing that the stochastic bound is, in this case, strictly tighter than the deterministic bound. □
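The chain of inequalities in (4.74) can be spot-checked numerically with the sketches above. Dividing by N, it states C_DML/N ≥ B_STO ≥ B_DET, so both differences below should have nonnegative eigenvalues; this is a check under the assumed ULA model and an arbitrary test scenario, not a proof.

```python
theta = np.array([0.1, 0.35])                   # two sources, illustrative DOAs
S = np.array([[1.0, 0.5], [0.5, 1.0]], dtype=complex)
sigma2, m, N = 0.5, 10, 100
B_sto = crb_stochastic(theta, S, sigma2, m, N)
B_det = crb_deterministic(theta, S, sigma2, m, N)
print(np.linalg.eigvalsh(cov_dml(theta, S, sigma2, m, N) - B_sto))  # >= 0
print(np.linalg.eigvalsh(B_sto - B_det))                            # >= 0
```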

Remark 4.2 It is, of course, natural that the SML estimator is more accurate than the DML method under the Gaussian signal assumption. However, this relation remains true for arbitrary second-order ergodic emitter signals, which is


more surprising. This is a consequence of the asymptotic robustness property of both ML estimators: the asymptotic distribution of the signal parameter estimates is completely specified by $\lim_{N\to\infty}(1/N)\sum_{i=1}^{N} s(t_i)s^H(t_i)$. As shown in [4.32, 60], the actual signal waveform sequence (or its distribution) is immaterial. The fact that the SML method always outperforms the DML method provides strong justification for the stochastic model being appropriate for the sensor array problem. Indeed, the asymptotic robustness and efficiency of the SML method implies that $N B_{STO} = C_{SML}$ is a lower bound on the covariance matrix of the normalized estimation error for any asymptotically robust method. □

4.5 Large Sample Maximum Likelihood Approximations

Sect. 4.4 dealt with optimal (in the ML sense) approaches to the sensor array problem. Since these techniques are often deemed exceedingly complex, suboptimal methods are of interest. In the present section, several subspace techniques are presented based on geometrical properties of the data model.

The focus here is on subspace based techniques where the vector of unknown signal parameters is estimated by performing a multidimensional search on a pd-dimensional criterion. This is in contrast to techniques such as the MUSIC algorithm [4.12, 13], where the location of d peaks in a p-dimensional, so-called spatial spectrum determines the signal parameter estimates. Multidimensional versions of the MUSIC approach are discussed in [4.28, 31, 62, 63]. A multidimensional, subspace based technique termed MODE (Method Of Direction Estimation) is presented and extensively analyzed in [4.32, 41, 42]. In [4.27, 44, 64], a related subspace fitting formulation of the sensor array problem is analyzed, and the WSF (weighted subspace fitting) method is proposed.

This section ties together many of the concepts and methods presented in the papers above, and discusses their relation to the ML techniques of the previous section. A statistical analysis shows that appropriate selections of certain weighting matrices give the subspace methods estimation accuracy similar (indeed, optimal) to that of the ML techniques, at a reduced computational cost.

4.5.1 Subspace Based Approach

All subspace based methods rely on geometrical properties of the spectral decomposition of the array covariance matrix, R. The early approaches, such as MUSIC, suffer from a large finite sample bias and are unable to cope with coherent signals⁴. This problem is inherent in the one-dimensional search (p-dimensional search when more than one parameter is associated with each signal) of the parameter space. Means of reducing the susceptibility of these techniques to coherent signals have been proposed for special array structures

⁴ Two signals are said to be coherent if they are identical up to amplitude scaling and phase shift.


[4.65]. In the general case, methods based on a pd-dimensional search need to be employed.

If the signal waveforms are noncoherent, the signal covariance matrix, S, has full rank. However, in radar applications where specular multipath is common, S may be ill-conditioned or even rank deficient. Let the signal covariance matrix have rank d'. The covariance of the array output is

$R = A(\theta_0)SA^H(\theta_0) + \sigma^2 I$ .   (4.77)

It is clear that any vector in the null space of the matrix ASAᴴ is an eigenvector of R with corresponding eigenvalue σ². Since A has full rank, ASAᴴ is positive semi-definite and has rank d'. Hence, σ² is the smallest eigenvalue of R, with multiplicity m − d'. Let λ₁, …, λ_m denote the eigenvalues of R in non-increasing order, and let e₁, …, e_m be the corresponding orthonormal eigenvectors. The spectral decomposition of R then takes the form

$R = \sum_{i=1}^{m} \lambda_i e_i e_i^H = E_s \Lambda_s E_s^H + \sigma^2 E_n E_n^H$ ,   (4.78)

where

$\Lambda_s = \mathrm{diag}[\lambda_1, \ldots, \lambda_{d'}], \quad E_s = [e_1, \ldots, e_{d'}], \quad E_n = [e_{d'+1}, \ldots, e_m]$ .   (4.79)

The diagonal matrix Λ_s contains the so-called signal eigenvalues, and these are assumed to be distinct. From the above discussion, it is clear that E_n is orthogonal to ASAᴴ, which implies that the d'-dimensional range space of E_s is contained in the d-dimensional range space of A(θ₀):

$\mathcal{R}\{E_s\} \subseteq \mathcal{R}\{A(\theta_0)\}$ .   (4.80)

If the signal covariance S has full rank, these subspaces coincide, since they have the same dimension, d' = d. The range space of E_s is referred to as the signal subspace, and its orthogonal complement, the range space of E_n, is called the noise subspace. The signal and noise subspaces can be consistently estimated from the eigendecomposition of the sample covariance

$\hat{R} = \sum_{i=1}^{m} \hat{\lambda}_i \hat{e}_i \hat{e}_i^H = \hat{E}_s \hat{\Lambda}_s \hat{E}_s^H + \hat{E}_n \hat{\Lambda}_n \hat{E}_n^H$ .   (4.81)
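In code, the estimated decomposition (4.81) is a single Hermitian eigendecomposition. A small sketch follows; the rank d' is assumed known here (its detection is a separate problem), and the noise power is estimated as the average of the m − d' smallest eigenvalues.

```python
def subspaces(R_hat, d_prime):
    # Eigendecomposition (4.81): bases for the estimated signal and noise
    # subspaces, plus a noise power estimate from the smallest eigenvalues.
    eigval, eigvec = np.linalg.eigh(R_hat)          # ascending order
    eigval, eigvec = eigval[::-1], eigvec[:, ::-1]  # non-increasing, as in (4.78)
    E_s = eigvec[:, :d_prime]                       # signal subspace basis
    E_n = eigvec[:, d_prime:]                       # noise subspace basis
    sigma2_hat = eigval[d_prime:].mean()
    return E_s, E_n, sigma2_hat
```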

a) Signal Subspace Formulation

The relation in (4.80) implies that there exists a d × d' matrix T of full rank, such that

$E_s = A(\theta_0)T$ .   (4.82)

In general, there is no value of θ such that E_s = A(θ)T when the signal subspace