
and show that a is a k × k identity matrix, d is an m × m identity matrix, and that b and c contain all zeros, so that the right-hand matrix is indeed an identity matrix.
The conditional pdf for Y given X follows directly from the definitions
as
\[
f_{Y|X}(y|x) = \frac{f_{XY}(x,y)}{f_X(x)}
= \frac{(2\pi)^{-(k+m)/2}(\det K_U)^{-1/2}
\exp\!\left(-\frac{1}{2}\,\big((x-m_X)^t,\ (y-m_Y)^t\big)\,K_U^{-1}
\begin{pmatrix} x-m_X \\ y-m_Y \end{pmatrix}\right)}
{(2\pi)^{-k/2}(\det K_X)^{-1/2}
\exp\!\left(-\frac{1}{2}(x-m_X)^t K_X^{-1}(x-m_X)\right)}
\]
\[
= (2\pi)^{-m/2}\left(\frac{\det K_U}{\det K_X}\right)^{-1/2}
\exp\!\left\{-\frac{1}{2}\left[\big((x-m_X)^t,\ (y-m_Y)^t\big)\,K_U^{-1}
\begin{pmatrix} x-m_X \\ y-m_Y \end{pmatrix}
-(x-m_X)^t K_X^{-1}(x-m_X)\right]\right\}.
\]
Again using some brute force linear algebra, it can be shown that the quadratic terms in the exponential can be expressed in the form
\[
\big((x-m_X)^t,\ (y-m_Y)^t\big)\,K_U^{-1}
\begin{pmatrix} x-m_X \\ y-m_Y \end{pmatrix}
-(x-m_X)^t K_X^{-1}(x-m_X)
= \big(y-m_Y-K_{YX}K_X^{-1}(x-m_X)\big)^t K_{Y|X}^{-1}\,\big(y-m_Y-K_{YX}K_X^{-1}(x-m_X)\big).
\]
Defining
\[
m_{Y|x} = m_Y + K_{YX}K_X^{-1}(x-m_X) \tag{4.40}
\]
the conditional density simplifies to
\[
f_{Y|X}(y|x) = (2\pi)^{-m/2}\left(\frac{\det K_U}{\det K_X}\right)^{-1/2}
\exp\!\left\{-\frac{1}{2}(y-m_{Y|x})^t K_{Y|X}^{-1}(y-m_{Y|x})\right\}, \tag{4.41}
\]
which shows that conditioned on X = x, Y has a Gaussian density. This means that we can immediately recognize the conditional expectation of Y given X as
\[
E(Y|X = x) = m_{Y|x} = m_Y + K_{YX}K_X^{-1}(x-m_X), \tag{4.42}
\]
so that the conditional expectation is an affine function of the vector x. We can also infer from the form that KY |X is the (conditional) covariance
\[
K_{Y|X} = E[(Y - E(Y|X = x))(Y - E(Y|X = x))^t \,|\, x], \tag{4.43}
\]
which unlike the conditional mean does not depend on the vector x! Furthermore, since we know how the normalization must relate to the covariance matrix, we have that
\[
\det(K_{Y|X}) = \frac{\det(K_U)}{\det(K_X)}. \tag{4.44}
\]
These relations completely describe the conditional densities of one subvector of a Gaussian vector given another subvector. We shall see, however, that the importance of these results goes beyond the above evaluation and provides some fundamental results regarding optimal nonlinear estimation for Gaussian vectors and optimal linear estimation in general.
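Because (4.40)–(4.44) are just matrix computations, they are easy to check numerically. The following is a minimal sketch, assuming NumPy; the randomly generated covariance and the helper name `conditional_gaussian` are illustrative choices, not from the text, and the explicit form of K_{Y|X} used in the helper is the one given below in (4.50).

```python
# A minimal sketch of (4.40) and (4.44), assuming NumPy.  The randomly
# generated covariance and the helper name `conditional_gaussian` are
# illustrative, not from the text.
import numpy as np

def conditional_gaussian(m_X, m_Y, K_X, K_XY, K_YX, K_Y, x):
    """Return m_{Y|x} from (4.40) and the conditional covariance K_{Y|X}
    (using the explicit form K_Y - K_YX K_X^{-1} K_XY given below in (4.50))."""
    K_X_inv = np.linalg.inv(K_X)
    m_Y_given_x = m_Y + K_YX @ K_X_inv @ (x - m_X)
    K_Y_given_X = K_Y - K_YX @ K_X_inv @ K_XY
    return m_Y_given_x, K_Y_given_X

rng = np.random.default_rng(0)
k, m = 2, 2
B = rng.standard_normal((k + m, k + m))
K_U = B @ B.T + 0.5 * np.eye(k + m)          # covariance of U = (X^t, Y^t)^t
K_X, K_XY = K_U[:k, :k], K_U[:k, k:]
K_YX, K_Y = K_U[k:, :k], K_U[k:, k:]
m_X, m_Y = np.zeros(k), np.zeros(m)

m_cond, K_cond = conditional_gaussian(m_X, m_Y, K_X, K_XY, K_YX, K_Y,
                                      x=np.array([1.0, -0.5]))
# Check the determinant relation (4.44): det(K_{Y|X}) = det(K_U)/det(K_X).
print(np.isclose(np.linalg.det(K_cond),
                 np.linalg.det(K_U) / np.linalg.det(K_X)))   # True
```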
4.8 Expectation as Estimation
Suppose that one is asked to guess the value that a random variable Y will take on, knowing the distribution of the random variable. What is the best guess or estimate, say Ŷ? Obviously there are many ways to define a best estimate, but one of the most popular ways to define a cost or distortion resulting from estimating the “true” value of Y by Ŷ is to look at the expected value of the square of the error Y − Ŷ, that is, E[(Y − Ŷ)²], the so-called mean squared error or MSE. Many arguments have been advanced in support of this approach, perhaps the simplest being that if one views the error as a voltage, then the average squared error is the average energy in the error. The smaller the energy, the weaker the signal in some sense. Perhaps a more honest reason for the popularity of the measure is its tractability in a wide variety of problems: it often leads to nice solutions that indeed work well in practice. As an example, we show that the optimal estimate of the value of an unknown random variable is in fact the mean of the random variable, a result that is highly intuitive. Rather than use calculus to prove this result (a tedious approach requiring setting derivatives to zero and then looking at second derivatives to verify that the stationary point is indeed a minimum), we directly prove the global optimality of the result.
Suppose that our estimate is Ŷ = a, some constant. We will show that this estimate can never have mean squared error smaller than that resulting from using the expected value of Y as an estimate. This is accomplished by a simple sequence of equalities and inequalities. Begin by adding and subtracting the mean, expanding the square, and using the second and third properties of expectation as
E[(Y − a)2] = E[(Y − EY + EY − a)2]
= E[(Y − EY )2] + 2E[(Y − EY )(EY − a)] + (EY − a)2.
The cross product is evaluated using the linearity of expectation and the fact that EY is a constant as
E[(Y − EY )(EY − a)] = (EY )2 − aEY − (EY )2 + aEY = 0
and hence from Property 1 of expectation,
E[(Y − a)2] = E[(Y − EY )2] + (EY − a)2 ≥ E[(Y − EY )2], (4.45)
which is the mean squared error resulting from using the mean of Y as an estimate. Thus the mean of a random variable is the minimum mean squared error estimate (MMSE) of the value of a random variable in the absence of any a priori information.
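A short simulation makes the point concrete. The sketch below, assuming NumPy, sweeps over candidate constants a and confirms that the empirical MSE is smallest near a = EY; the exponential distribution used here is an arbitrary illustrative choice.

```python
# A small Monte Carlo check of (4.45), assuming NumPy: among constant guesses
# a, the mean EY gives the smallest mean squared error.  The exponential
# distribution is an arbitrary illustrative choice.
import numpy as np

rng = np.random.default_rng(1)
Y = rng.exponential(scale=2.0, size=100_000)      # EY = 2

candidates = np.linspace(0.0, 4.0, 81)            # constant estimates to try
mse = [np.mean((Y - a) ** 2) for a in candidates]
best = candidates[int(np.argmin(mse))]
print(best, Y.mean())                             # both are close to 2
```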
What if one is given a priori information? For example, suppose that now you are told that X = x. What then is the best estimate of Y, say Ŷ(X)? This problem is easily solved by modifying the previous derivation to use conditional expectation, that is, by using the conditional distribution for Y given X instead of the a priori distribution for Y. Once again we try to minimize the mean squared error:
\[
E[(Y - \hat{Y}(X))^2] = E\big(E[(Y - \hat{Y}(X))^2 \,|\, X]\big)
= \sum_x p_X(x)\, E[(Y - \hat{Y}(X))^2 \,|\, x].
\]
Each of the terms in the sum, however, is just a mean squared error between a random variable and an estimate of that variable with respect to a distribution, here the conditional distribution pY |X(·|x). By the same argument as was used in the unconditional case, the best estimate is the mean, but now the mean with respect to the conditional distribution, i.e.,
E(Y|x). In other words, for each x the best Ŷ(x) in the sense of minimizing the mean squared error is E(Y|x). Plugging in the random variable X in place of the dummy variable x, we have the following interpretation:
The conditional expectation E(Y |X) of a random variable Y given a random variable X is the minimum mean squared estimate of Y given X.
A direct proof of this result without invoking the conditional version of the result for unconditional expectation follows from general iterated expectation. Suppose that g(X) is an estimate of Y given X. Then the resulting mean squared error is
\begin{align*}
E[(Y - g(X))^2] &= E[(Y - E(Y|X) + E(Y|X) - g(X))^2] \\
&= E[(Y - E(Y|X))^2] + 2E[(Y - E(Y|X))(E(Y|X) - g(X))] + E[(E(Y|X) - g(X))^2].
\end{align*}
Expanding the cross term yields
\[
E[(Y - E(Y|X))(E(Y|X) - g(X))]
= E[Y\,E(Y|X)] - E[Y g(X)] - E[E(Y|X)^2] + E[E(Y|X)g(X)].
\]
From the general iterated expectation (4.36), E[Y E(Y |X)] = E[E(Y |X)2] (setting g(X) of the lemma to E(Y |X) and h(X, Y ) = Y ) and E[Y g(X)] = E[E(Y |X)g(X)] (setting g(X) of the lemma to the g(X) used here and h(X, Y ) = Y ). Hence the cross term is zero and E[(Y − g(X))2] ≥ E[(Y − E(Y |X))2], with equality when g(X) = E(Y |X).
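The same conclusion is easy to check by simulation. In the sketch below (assuming NumPy), the model Y = X² + noise and the competing estimator g(X) = 2X are arbitrary illustrative choices; the empirical conditional mean yields the smaller mean squared error.

```python
# A tiny simulation (assuming NumPy) that the conditional mean beats another
# function of X in mean squared error.  The model Y = X^2 + noise and the
# competitor g(X) = 2X are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(5)
X = rng.integers(0, 3, size=100_000)               # X uniform on {0, 1, 2}
Y = X ** 2 + rng.normal(scale=1.0, size=X.size)    # so E(Y|X = x) = x^2

cond_mean = np.array([Y[X == x].mean() for x in range(3)])   # approx (0, 1, 4)
mse_cond = np.mean((Y - cond_mean[X]) ** 2)        # MSE of E(Y|X)
mse_other = np.mean((Y - 2.0 * X) ** 2)            # MSE of g(X) = 2X
print(mse_cond, mse_other)                         # mse_cond is smaller
```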
As with ordinary expectation, the ideas of conditional expectation can be extended to continuous random variables by substituting conditional pdf’s for the unconditional pdf’s. As is the case with conditional probability, however, this constructive definition has its limitations and only makes sense when the pdf’s are well defined. The rigorous development of conditional expectation is, like conditional probability, analogous to the rigorous treatment of the Dirac delta: it is defined by its behavior underneath the integral sign rather than by a construction. When the constructive definition makes sense, the two approaches agree.
One of the unfortunately rare examples for which conditional expectations can be explicitly evaluated is the case of jointly Gaussian random
variables. In this case we can immediately identify from (3.61) that
\[
E[Y|X] = m_Y + \rho\,(\sigma_Y/\sigma_X)(X - m_X). \tag{4.46}
\]
It will prove important that this is in fact an affine function of X.
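A quick numerical check of (4.46), assuming NumPy, is given below; the means, standard deviations, and correlation coefficient are arbitrary illustrative values.

```python
# A quick numerical check of (4.46), assuming NumPy.  The means, standard
# deviations, and correlation coefficient are arbitrary illustrative values.
import numpy as np

rng = np.random.default_rng(2)
m_X, m_Y, s_X, s_Y, rho = 1.0, -2.0, 2.0, 0.5, 0.7
cov = [[s_X ** 2, rho * s_X * s_Y],
       [rho * s_X * s_Y, s_Y ** 2]]
X, Y = rng.multivariate_normal([m_X, m_Y], cov, size=200_000).T

x0 = 2.5                                     # condition on X being near x0
mask = np.abs(X - x0) < 0.05
print(Y[mask].mean())                        # empirical conditional mean
print(m_Y + rho * (s_Y / s_X) * (x0 - m_X))  # affine formula (4.46)
```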
The same ideas extend from scalars to vectors. Suppose we observe a real-valued column vector X = (X0, · · · , Xk−1)t and we wish to predict or estimate a second random vector Y = (Y0, · · · , Ym−1)t. Note that the dimensions of the two vectors need not be the same.
The prediction Ŷ = Ŷ(X) is to be chosen as a function of X which yields the smallest possible mean squared error, as in the scalar case. The mean squared error is defined as
\[
\epsilon^2(\hat{Y}) \stackrel{\Delta}{=} E[\|Y - \hat{Y}\|^2]
= E[(Y - \hat{Y})^t (Y - \hat{Y})]
= \sum_{i=0}^{m-1} E[(Y_i - \hat{Y}_i)^2]. \tag{4.47}
\]
An estimator or predictor is said to be optimal within some class of predictors if it minimizes the mean squared error over all predictors in the given class.
Two specific examples of vector estimation are of particular interest. In the first case, the vector X consists of k consecutive samples from a
stationary random process, say X = (Xn−1, Xn−2, . . . , Xn−k) and Y is the next, or “future”, sample Y = Xn. In this case the goal is to find the best one-step predictor given the finite past. In the second example, Y is a rectangular subblock of pixels in a sampled image intensity raster and X consists of similar subgroups above and to the left of Y . Here the goal is to use portions of an image already coded or processed to predict a new portion of the same image. This vector prediction problem is depicted in Figure 4.1 where subblocks A, B, and C would be used to predict subblock D.
[Figure 4.1: Vector Prediction of Image Subblocks. Subblocks A and B lie above subblocks C and D; A, B, and C are used to predict D.]
The following theorem shows that the best nonlinear predictor of Y given X is simply the conditional expectation of Y given X. Intuitively, our best guess of an unknown vector is its expectation or mean given whatever observations that we have. This extends the interpretation of a conditional expectation as an optimal estimator to the vector case.
Theorem 4.5 Given two random vectors Y and X, the minimum mean squared error estimate of Y given X is
\[
\hat{Y}(X) = E(Y|X). \tag{4.48}
\]
Proof: As in the scalar case, the proof does not require calculus or Lagrange minimizations. Suppose that Ŷ is the claimed optimal estimate and that Ỹ is some other estimate. We will show that Ỹ must yield a mean squared error no smaller than does Ŷ. To see this consider
\begin{align*}
\epsilon^2(\tilde{Y}) = E[\|Y - \tilde{Y}\|^2]
&= E[\|Y - \hat{Y} + \hat{Y} - \tilde{Y}\|^2] \\
&= E[\|Y - \hat{Y}\|^2] + E[\|\hat{Y} - \tilde{Y}\|^2] + 2E[(Y - \hat{Y})^t(\hat{Y} - \tilde{Y})] \\
&\geq \epsilon^2(\hat{Y}) + 2E[(Y - \hat{Y})^t(\hat{Y} - \tilde{Y})].
\end{align*}
We will prove that the rightmost term is zero and hence that ε²(Ỹ) ≥ ε²(Ŷ), which will prove the theorem. Recall that Ŷ = E(Y|X) and hence E[(Y − Ŷ)|X] = 0.
|||||
ˆ |
˜ |
|
|
|
|
|
|
Since Y |
− Y is a deterministic function of X, |
|
|
||||
|
|
|
ˆ t |
ˆ |
˜ |
|
|
|
E[(Y − Y ) |
(Y |
− Y )|X] = 0. |
|
|
||
Then, by iterated expectation applied to vectors, we have |
|||||||
|
ˆ t |
ˆ |
˜ |
|
ˆ t |
ˆ |
˜ |
|
E(E[(Y − Y ) |
(Y |
− Y )|X]) = E[(Y − Y ) |
(Y |
− Y )] = 0 |
as claimed, which proves the theorem.
As in the scalar case, the conditional expectation is in general a difficult function to evaluate, with the notable exception of jointly Gaussian vectors. Recall from (4.41)–(4.44) that the conditional pdf for jointly Gaussian vectors Y and X with $K_{(X,Y)} = E[((X^t, Y^t) - (m_X^t, m_Y^t))^t((X^t, Y^t) - (m_X^t, m_Y^t))]$, $K_Y = E[(Y - m_Y)(Y - m_Y)^t]$, $K_X = E[(X - m_X)(X - m_X)^t]$, $K_{XY} = E[(X - m_X)(Y - m_Y)^t]$, and $K_{YX} = E[(Y - m_Y)(X - m_X)^t]$ is
\[
f_{Y|X}(y|x) = (2\pi)^{-m/2} (\det(K_{Y|X}))^{-1/2}
\exp\!\left\{-\frac{1}{2}(y - m_{Y|x})^t K_{Y|X}^{-1}(y - m_{Y|x})\right\}, \tag{4.49}
\]
where
\[
K_{Y|X} \stackrel{\Delta}{=} K_Y - K_{YX}K_X^{-1}K_{XY}
= E[(Y - E(Y|X))(Y - E(Y|X))^t \,|\, X], \tag{4.50}
\]
\[
\det(K_{Y|X}) = \frac{\det(K_{(Y,X)})}{\det(K_X)}, \tag{4.51}
\]
and
\[
E(Y|X = x) = m_{Y|x} = m_Y + K_{YX}K_X^{-1}(x - m_X), \tag{4.52}
\]
and hence the minimum mean squared estimate of Y given X is
\[
E(Y|X) = m_Y + K_{YX}K_X^{-1}(X - m_X), \tag{4.53}
\]
which is an affine (linear plus constant) function of X! The resulting mean squared error is (using iterated expectation)
\begin{align*}
E[(Y - E(Y|X))^t(Y - E(Y|X))]
&= E\big(E[(Y - E(Y|X))^t(Y - E(Y|X)) \,|\, X]\big) \tag{4.54} \\
&= E\big(E[\,\mathrm{Tr}[(Y - E(Y|X))(Y - E(Y|X))^t]\,|\, X]\big) \\
&= \mathrm{Tr}(K_{Y|X}). \tag{4.55}
\end{align*}
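The estimator (4.53) and the error expression (4.55) can be verified by simulation. The sketch below, assuming NumPy, draws jointly Gaussian vectors with an arbitrary randomly generated covariance and compares the empirical mean squared error of the affine estimator with Tr(K_{Y|X}).

```python
# A sketch (assuming NumPy) checking the estimator (4.53) and the error
# formula (4.55) by simulation.  The randomly generated covariance is an
# arbitrary illustrative choice.
import numpy as np

rng = np.random.default_rng(3)
k, m = 3, 2
B = rng.standard_normal((k + m, k + m))
K_U = B @ B.T + 0.5 * np.eye(k + m)            # covariance of (X^t, Y^t)^t
K_X, K_XY = K_U[:k, :k], K_U[:k, k:]
K_YX, K_Y = K_U[k:, :k], K_U[k:, k:]

U = rng.multivariate_normal(np.zeros(k + m), K_U, size=200_000)
X, Y = U[:, :k], U[:, k:]

A = K_YX @ np.linalg.inv(K_X)                  # A = K_YX K_X^{-1}
Y_hat = X @ A.T                                # E(Y|X) for zero means, (4.53)
empirical_mse = np.mean(np.sum((Y - Y_hat) ** 2, axis=1))
K_Y_given_X = K_Y - K_YX @ np.linalg.inv(K_X) @ K_XY
print(empirical_mse, np.trace(K_Y_given_X))    # approximately equal, (4.55)
```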
In the special case where $X = X^n = (X_0, X_1, \ldots, X_{n-1})$ and $Y = X_n$, the so-called one-step linear prediction problem, the solution takes an interesting form. For this case define the nth order covariance matrix as the n × n matrix
\[
K_X^{(n)} = E[(X^n - E(X^n))(X^n - E(X^n))^t], \tag{4.56}
\]
i.e., the $(k, j)$ entry of $K_X^{(n)}$ is $E[(X_k - E(X_k))(X_j - E(X_j))]$, $k, j = 0, 1, \ldots, n-1$. Then if $X^{n+1}$ is Gaussian, the optimal one-step predictor for $X_n$ given $X^n$ is
\[
\hat{X}_n(X^n) = E(X_n) + E[(X_n - E(X_n))(X^n - E(X^n))^t]\,(K_X^{(n)})^{-1}(X^n - E(X^n)), \tag{4.57}
\]
which has an affine form
\[
\hat{X}_n(X^n) = A X^n + b, \tag{4.58}
\]
where
\[
A = r^t (K_X^{(n)})^{-1}, \qquad
r = \begin{pmatrix} K_X(n,0) \\ K_X(n,1) \\ \vdots \\ K_X(n, n-1) \end{pmatrix}, \tag{4.59}
\]
and
\[
b = E(X_n) - A\,E(X^n). \tag{4.60}
\]
The resulting mean squared error is
\[
\mathrm{MMSE} = E[(X_n - \hat{X}_n(X^n))^2]
= \mathrm{Tr}(K_Y - K_{YX}K_X^{-1}K_{XY})
= \sigma_{X_n}^2 - r^t (K_X^{(n)})^{-1} r \tag{4.61}
\]
or
\[
\mathrm{MMSE} = E[(X_n - \hat{X}_n(X^n))^2] = \sigma^2_{X_n|X^n}, \tag{4.62}
\]
which from (4.51) can be expressed as
\[
\mathrm{MMSE} = \frac{\det(K_X^{(n+1)})}{\det(K_X^{(n)})}, \tag{4.63}
\]
a classical result from minimum mean squared error estimation theory.
If the Xn are samples of a weakly stationary random process with zero mean, then this simplifies to
\[
\hat{X}_n(X^n) = r^t (K_X^{(n)})^{-1} X^n, \tag{4.64}
\]
where $r$ is the $n$-dimensional vector
\[
r = \begin{pmatrix} K_X(n) \\ K_X(n-1) \\ \vdots \\ K_X(1) \end{pmatrix}. \tag{4.65}
\]
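As an illustration of (4.64)–(4.65) and of the error expressions (4.61) and (4.63), the sketch below, assuming NumPy, uses the covariance function K_X(k) = σ_X² a^{|k|} of a first-order autoregression; this particular process is an illustrative choice, not an example from the text. The recovered predictor weights put essentially all of their mass on the most recent sample, as expected for such a process.

```python
# A sketch (assuming NumPy) of the one-step predictor (4.64)-(4.65).  The
# covariance function K_X(k) = sigma2 * a**|k| of a first-order autoregression
# is an illustrative choice, not an example from the text.
import numpy as np

a, sigma2, n = 0.8, 1.0, 5
def K_X(k):                                    # covariance function of the process
    return sigma2 * a ** abs(k)

# n-th order covariance matrix K_X^(n) and the vector r of (4.65).
Kn = np.array([[K_X(i - j) for j in range(n)] for i in range(n)])
r = np.array([K_X(n - i) for i in range(n)])   # (K_X(n), ..., K_X(1))^t

weights = r @ np.linalg.inv(Kn)                # predictor weights r^t (K_X^(n))^{-1}
print(weights)                                 # approx (0, 0, 0, 0, 0.8): a * X_{n-1}

# One-step prediction error two ways: (4.61) and the determinant ratio (4.63).
Kn1 = np.array([[K_X(i - j) for j in range(n + 1)] for i in range(n + 1)])
print(sigma2 - r @ np.linalg.inv(Kn) @ r)      # sigma_X^2 - r^t (K_X^(n))^{-1} r
print(np.linalg.det(Kn1) / np.linalg.det(Kn))  # determinant ratio
```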
4.9 Implications for Linear Estimation
The development of optimal mean squared estimation for the Gaussian case provides a preview of and an approach to the problem of optimal mean squared estimation for the situation of completely general random vectors (not necessarily Gaussian) where only linear or affine estimators are allowed (to avoid the problem of possibly intractable conditional expectations in the non-Gaussian case). This topic will be developed in some detail in a later section, but the key results will here be shown to follow directly from the Gaussian case by reinterpreting the results.
The key fact is that the optimal estimator for a vector Y given a vector X when the two are jointly Gaussian was found to be an affine estimator, that is, to have the form
\[
\hat{Y}(X) = AX + b.
\]
Since it was found that the lowest possible MSE over all possible estimators was achieved by an estimator of this form with $A = K_{YX}K_X^{-1}$ and $b = E(Y) - A\,E(X)$, with a resulting MSE of $\mathrm{MMSE} = \mathrm{Tr}(K_Y - K_{YX}K_X^{-1}K_{XY})$, it is obviously true that this MMSE must be the minimum achievable MSE over all affine estimators, i.e., that for all $m \times k$ matrices $A$ and $m$-dimensional vectors $b$ it is true that
\[
\mathrm{MMSE}(A,b) = \mathrm{Tr}\!\left(E\!\left[(Y - AX - b)(Y - AX - b)^t\right]\right)
\geq \mathrm{Tr}(K_Y - K_{YX}K_X^{-1}K_{XY}) \tag{4.66}
\]
and that equality holds if and only if $A = K_{YX}K_X^{-1}$ and $b = E(Y) - A\,E(X)$. We shall now see that this version of the result has nothing to do with Gaussianity and that the inequality and solution are true for any distribution (provided, of course, that $K_X$ is invertible).
Expanding the MSE and using some linear algebra results in
\begin{align*}
\mathrm{MMSE}(A,b) &= \mathrm{Tr}\!\left(E\!\left[(Y - AX - b)(Y - AX - b)^t\right]\right)\\
&= \mathrm{Tr}\Big(E\Big[\big((Y - m_Y) - A(X - m_X) - (b - m_Y + A m_X)\big)\\
&\qquad\qquad\times\big((Y - m_Y) - A(X - m_X) - (b - m_Y + A m_X)\big)^t\Big]\Big)\\
&= \mathrm{Tr}\!\left(K_Y - A K_{XY} - K_{YX}A^t + A K_X A^t\right)
+ (b - m_Y + A m_X)^t (b - m_Y + A m_X),
\end{align*}
where all the remaining cross terms are zero. Regardless of $A$, the final term is nonnegative and hence is bounded below by 0, a minimum achieved by the choice
\[
b = m_Y - A m_X. \tag{4.67}
\]
Thus the inequality we wish to prove becomes
\[
\mathrm{Tr}\!\left(K_Y - A K_{XY} - K_{YX}A^t + A K_X A^t\right)
\geq \mathrm{Tr}(K_Y - K_{YX}K_X^{-1}K_{XY}) \tag{4.68}
\]
or
\[
\mathrm{Tr}\!\left(K_{YX}K_X^{-1}K_{XY} + A K_X A^t - A K_{XY} - K_{YX}A^t\right) \geq 0. \tag{4.69}
\]
Since $K_X$ is a covariance matrix it is Hermitian, and since it has an inverse it must be positive definite. Hence it has a well-defined square root $K_X^{1/2}$ (see Section A.4), and the left-hand side of (4.69) can be written as
\[
\mathrm{Tr}\!\left((A K_X^{1/2} - K_{YX}K_X^{-1/2})(A K_X^{1/2} - K_{YX}K_X^{-1/2})^t\right) \tag{4.70}
\]
(just expand this expression to verify that it is the same as the previous expression). But this has the form $\mathrm{Tr}(BB^t) = \sum_{i,j} b_{i,j}^2$, which is nonnegative, proving the inequality. Plugging in $A = K_{YX}K_X^{-1}$ achieves the lower bound with equality.
We summarize the result in the following theorem.
Theorem 4.6 Given random vectors $X$ and $Y$ with $K_{(X,Y)} = E[((X^t, Y^t) - (m_X^t, m_Y^t))^t((X^t, Y^t) - (m_X^t, m_Y^t))]$, $K_Y = E[(Y - m_Y)(Y - m_Y)^t]$, $K_X = E[(X - m_X)(X - m_X)^t]$, $K_{XY} = E[(X - m_X)(Y - m_Y)^t]$, and $K_{YX} = E[(Y - m_Y)(X - m_X)^t]$, assume that $K_X$ is invertible (e.g., it is positive definite). Then
\[
\min_{A,b} \mathrm{MMSE}(A,b)
= \min_{A,b} \mathrm{Tr}\!\left(E\!\left[(Y - AX - b)(Y - AX - b)^t\right]\right)
= \mathrm{Tr}(K_Y - K_{YX}K_X^{-1}K_{XY}) \tag{4.71}
\]
and the minimum is achieved by $A = K_{YX}K_X^{-1}$ and $b = E(Y) - A\,E(X)$.
In particular, this result does not require that the vectors be jointly Gaussian.
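To emphasize the point, the sketch below (assuming NumPy) applies Theorem 4.6 to data that are decidedly non-Gaussian and nonlinearly related; the data-generating model is an arbitrary illustrative choice. The affine estimator built from sample means and covariances achieves a mean squared error close to the trace expression in (4.71).

```python
# A sketch (assuming NumPy) of Theorem 4.6 on non-Gaussian data: the best
# affine estimator is built from means and covariances alone.  The
# data-generating model below is arbitrary.
import numpy as np

rng = np.random.default_rng(4)
N, k, m = 200_000, 3, 2
X = rng.uniform(-1.0, 1.0, size=(N, k))              # not Gaussian
W = rng.laplace(scale=0.3, size=(N, m))              # not Gaussian
C = rng.standard_normal((m, k))
Y = np.tanh(X @ C.T) + W                             # nonlinear dependence on X

m_X, m_Y = X.mean(axis=0), Y.mean(axis=0)
K_X = np.cov(X, rowvar=False)
K_YX = (Y - m_Y).T @ (X - m_X) / N                   # sample cross-covariance
K_Y = np.cov(Y, rowvar=False)

A = K_YX @ np.linalg.inv(K_X)                        # A = K_YX K_X^{-1}
b = m_Y - A @ m_X                                    # b = E(Y) - A E(X)
Y_hat = X @ A.T + b

print(np.mean(np.sum((Y - Y_hat) ** 2, axis=1)))     # achieved mean squared error
print(np.trace(K_Y - K_YX @ np.linalg.inv(K_X) @ K_YX.T))  # trace bound of (4.71)
```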
As in the Gaussian case, the results can be specialized to the situation where $Y = X_n$ and $X = X^n$ and $\{X_n\}$ is a weakly stationary process to obtain that the optimal linear estimator of $X_n$ given $(X_0, \ldots, X_{n-1})$ in the sense of minimizing the mean squared error is
\[
\hat{X}_n(X^n) = r^t (K_X^{(n)})^{-1} X^n, \tag{4.72}
\]
where $r$ is the $n$-dimensional vector
\[
r = \begin{pmatrix} K_X(n) \\ K_X(n-1) \\ \vdots \\ K_X(1) \end{pmatrix}. \tag{4.73}
\]
The resulting minimum mean squared error (called the “linear least squares error”) is
\begin{align*}
\mathrm{LLSE} &= \sigma_X^2 - r^t (K_X^{(n)})^{-1} r \tag{4.74}\\
&= \frac{\det(K_X^{(n+1)})}{\det(K_X^{(n)})}, \tag{4.75}
\end{align*}
a classical result of linear estimation theory. Note that the equation with the determinant form does not require a Gaussian density, although a Gaussian density was used to identify the first form with the determinant form (both being $\sigma^2_{X_n|X^n}$ in the Gaussian case).
4.10 Correlation and Linear Estimation
As an example of the application of correlations, we consider a constrained form of the minimum mean squared error estimation problem that provided an application and interpretation for conditional expectation. A problem with the earlier result is that in some applications the conditional expectation will be complicated or unknown, but the simpler correlation might be known or at least one can approximate it based on observed data. While the conditional expectation provides the optimal estimator over all possible estimators, the correlation turns out to provide an optimal estimator over a restricted class of estimators.
Suppose again that the value of X is observed and that a good estimate of Y, say Ŷ(X), is desired. Once again the quality of an estimator will be measured by the resulting mean squared error, but this time we do not