

Theorem 4.3 Two random variables X and Y are independent if and only if g(X) and h(Y ) are uncorrelated for all functions g and h with finite expectations, that is, if (4.18) holds. More generally, random variables Xi, i = 1, . . . , n, are mutually independent if and only if for all functions gi, i = 1, . . . , n, the random variables gi(Xi) are uncorrelated.

This theorem is useful as a means of showing that two random variables are not independent: If we can find any functions g and h such that E(g(X)h(Y )) ≠ (Eg(X))(Eh(Y )), then the random variables are not independent. The theorem also provides a simple and general proof of the fact that the characteristic function of the sum of independent random variables is the product of the characteristic functions of the random variables being summed.
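As a numerical illustration of this use of Theorem 4.3 (an added sketch, not part of the text; it assumes NumPy and uses arbitrarily chosen distributions), take X standard Gaussian and Y = X^2. The pair is uncorrelated, but choosing g(x) = x^2 and h(y) = y exposes the dependence, while for a genuinely independent pair any choice of g and h agrees up to sampling error:

import numpy as np

rng = np.random.default_rng(8)
n = 300_000
x = rng.normal(size=n)

# Dependent pair: Y = X^2 is uncorrelated with X ...
y = x ** 2
print(np.mean(x * y), x.mean() * y.mean())    # both near 0: X and Y are uncorrelated

# ... but g(X) = X^2, h(Y) = Y reveals the dependence.
g, h = x ** 2, y
print(np.mean(g * h), g.mean() * h.mean())    # E[gh] = E[X^4] = 3  vs  (Eg)(Eh) = 1

# Independent pair: the two quantities agree for any g, h.
y_ind = rng.exponential(size=n)
g2, h2 = np.sin(x), y_ind ** 2
print(np.mean(g2 * h2), g2.mean() * h2.mean())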

Corollary 4.1 Given a sequence of mutually independent random variables

X1, X2, . . . , Xn, define

Yn = Σ_{i=1}^n Xi .

Then

MYn(ju) = Π_{i=1}^n MXi(ju) .

Proof. Successive application of Theorem 4.3, which states that functions of independent random variables are uncorrelated, yields

MYn(ju) = E[e^{ju Yn}] = E[e^{ju Σ_{i=1}^n Xi}] = E[ Π_{i=1}^n e^{ju Xi} ]

        = Π_{i=1}^n E[e^{ju Xi}] = Π_{i=1}^n MXi(ju) .
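As a quick numerical check of Corollary 4.1 (an added sketch, not from the text; it assumes NumPy, and the exponential and uniform distributions are arbitrary choices), one can compare the empirical characteristic function of a sum of independent random variables with the product of the individual empirical characteristic functions:

import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
u = 0.7  # evaluation point for the characteristic function

# Independent random variables (distributions chosen arbitrarily for illustration).
x1 = rng.exponential(scale=2.0, size=n_samples)
x2 = rng.uniform(-1.0, 3.0, size=n_samples)
y = x1 + x2

def emp_char(u, samples):
    # Empirical characteristic function M_X(ju) = E[exp(juX)].
    return np.mean(np.exp(1j * u * samples))

lhs = emp_char(u, y)                      # M_Y(ju)
rhs = emp_char(u, x1) * emp_char(u, x2)   # M_X1(ju) * M_X2(ju)
print(lhs, rhs)                           # agree up to Monte Carlo error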

4.5.2 Covariance

The idea of uncorrelation can be stated conveniently in terms of another quantity, which we now define. Given two random variables X and Y , define their covariance, COV (X, Y ) by

COV (X, Y ) = E[(X − EX)(Y − EY )] .

As you can see, the covariance of two random variables equals the correlation of the two “centralized” random variables, X − EX and Y − EY , that are formed by subtracting the means from the respective random variables. Keeping in mind that EX and EY are constants, it is seen that centralized random variables are zero-mean random variables; i.e., E(X − EX) = E(X) − E(EX) = EX − EX = 0. Expanding the product in the definition, the covariance can be written in terms of the correlation and means of the random variables. Again remembering that EX and EY are constants, we get

COV (X, Y ) = E[XY − Y EX − XEY + (EX)(EY )]

            = E(XY ) − (EY )(EX) − (EX)(EY ) + (EX)(EY )

            = E(XY ) − (EX)(EY ) .                                (4.20)

Thus the covariance is the correlation minus the product of the means. Using this fact and the definition of uncorrelated, we have the following statement:

Corollary 4.2 Two random variables X and Y are uncorrelated if and only if their covariance is zero; that is, if COV (X, Y ) = 0.

If we set Y = X, the correlation of X with itself, E(X^2), results; this is called the second moment of the random variable X. The covariance COV (X, X) is called the variance of the random variable and is given the special notation σX^2. Its square root, σX = (σX^2)^{1/2}, is called the standard deviation of X. From the definition of covariance and (4.19),

σX^2 = E[(X − EX)^2] = E(X^2) − (EX)^2 .

By the first property of expectation, the variance is nonnegative, yielding the simple but powerful inequality

|EX| ≤ [E(X^2)]^{1/2} ,                                (4.21)

a special case of the Cauchy-Schwarz inequality (see problem 4.17 with the random variable Y set equal to the constant 1).
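A small numerical check of (4.20), of σX^2 = E(X^2) − (EX)^2, and of (4.21) follows (an added sketch, not from the text; sample averages stand in for expectations and the distributions are arbitrary):

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.5, size=100_000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=100_000)

# COV(X, Y) = E[(X - EX)(Y - EY)] = E(XY) - (EX)(EY)
cov_def   = np.mean((x - x.mean()) * (y - y.mean()))
cov_short = np.mean(x * y) - x.mean() * y.mean()

# Variance: sigma_X^2 = E[(X - EX)^2] = E(X^2) - (EX)^2
var_def   = np.mean((x - x.mean()) ** 2)
var_short = np.mean(x ** 2) - x.mean() ** 2

print(cov_def, cov_short)                           # equal up to floating-point error
print(var_def, var_short)
print(abs(x.mean()) <= np.sqrt(np.mean(x ** 2)))    # inequality (4.21) holds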

4.5.3 Covariance Matrices

The fundamental theorem of expectation of functions of several random variables can also be extended to vector or even matrix functions g of random vectors. There are two primary examples, the covariance matrix treated here and the multivariable characteristic functions treated next.

Suppose that we are given an n-dimensional random vector X = (X0, X1, . . . , Xn−1). The mean vector m = (m0, m1, . . . , mn−1)^t is defined as the vector of the means, i.e., mk = E(Xk) for all k = 0, 1, . . . , n − 1. This can be written more conveniently as a single vector expectation

m = E(X) = ∫_{R^n} x fX(x) dx ,                                (4.22)

where the random vector X and the dummy integration vector x are both n-dimensional and the integral of a vector is simply the vector of the integrals of the individual components. Similarly we could define for each k, l = 0, 1, . . . , n − 1 the covariance KX(k, l) = E[(Xk − mk)(Xl − ml)] and then collect these together to form the covariance matrix

K = {KX(k, l); k = 0, 1, . . . , n − 1; l = 0, 1, . . . , n − 1}.

Alternatively, we can use the outer product notation of linear algebra and the fundamental theorem of expectation to write

K = E[(X − m)(X − m)^t] = ∫_{R^n} (x − m)(x − m)^t fX(x) dx ,                                (4.23)

where the outer product of a vector a with a vector b, ab^t, has (k, j) entry equal to ak bj.
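The following sketch (added for illustration, not from the text; it assumes NumPy and the mean and covariance values are arbitrary) forms the sample analogues of (4.22) and (4.23): the mean vector is the componentwise average, and the covariance matrix is the average of the outer products of the centered vectors.

import numpy as np

rng = np.random.default_rng(2)
n_samples = 50_000
X = rng.multivariate_normal(mean=[1.0, -2.0, 0.5],
                            cov=[[2.0, 0.3, 0.0],
                                 [0.3, 1.0, 0.4],
                                 [0.0, 0.4, 0.7]],
                            size=n_samples)          # shape (n_samples, 3)

m = X.mean(axis=0)                                   # sample mean vector, cf. (4.22)
centered = X - m
# Average of outer products (x - m)(x - m)^t, entry (k, l) estimates KX(k, l), cf. (4.23)
K = centered.T @ centered / n_samples
print(m)
print(K)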

In particular, by straightforward but tedious multiple integration, it can be shown that the mean vector and the covariance matrix of a Gaussian random vector are indeed the mean and covariance, i.e., using the fundamental theorem

m = E(X) = ∫_{R^n} x (2π)^{−n/2} (det K)^{−1/2} e^{−(1/2)(x − m)^t K^{−1}(x − m)} dx                                (4.24)

K = E[(X − m)(X − m)^t]

  = ∫_{R^n} (x − m)(x − m)^t (2π)^{−n/2} (det K)^{−1/2} e^{−(1/2)(x − m)^t K^{−1}(x − m)} dx .                                (4.25)

4.5.4 Multivariable Characteristic Functions

The fundamental theorem of expectation of functions of several random variables can also be extended to vector functions g of random vectors. In fact we implicitly assumed this to be the case in the evaluation of the characteristic function of a Gaussian random variable (since e^{juX} is a complex function of ω and hence a vector function) and of the multidimensional characteristic function of a Gaussian random vector in (3.126): if a Gaussian random vector X has a mean vector m and covariance matrix Λ, then

MX(ju) = e^{ju^t m − (1/2) u^t Λ u}

       = exp( j Σ_{k=0}^{n−1} uk mk − (1/2) Σ_{k=0}^{n−1} Σ_{m=0}^{n−1} uk Λ(k, m) um ) .

This representation for the characteristic function yields the proof of the following important result:

Theorem 4.4 Let X be a k-dimensional Gaussian random vector with mean mX and covariance matrix ΛX. Let Y be the new random vector formed by a linear operation of X:

Y = HX + b ,

(4.26)

where H is an n × k matrix and b is an n-dimensional vector. Then Y is a Gaussian random vector of dimension n with mean

mY = HmX + b

(4.27)

and covariance matrix

 

ΛY = H ΛX H^t .

(4.28)

Proof. The characteristic function of Y is found by direct substitution of the expression for Y in terms of X into the definition, a little matrix algebra, and (3.126):

MY (ju) = E[e^{ju^t Y}]

        = E[e^{ju^t (HX + b)}]

        = e^{ju^t b} E[e^{j(H^t u)^t X}]

        = e^{ju^t b} MX(jH^t u)

        = e^{ju^t b} e^{ju^t H mX − (1/2)(H^t u)^t ΛX (H^t u)}

        = e^{ju^t (H mX + b)} e^{−(1/2) u^t H ΛX H^t u} .

It can be seen by reference to (3.126) that the resulting characteristic function is the transform of a Gaussian random vector pdf with mean vector H mX + b and covariance matrix H ΛX H^t. This completes the proof.
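As a sanity check on Theorem 4.4 (a sketch added here, not from the text; it assumes NumPy, and the particular H, b, mX, and ΛX are arbitrary), one can draw samples of X, apply Y = HX + b, and compare the empirical mean and covariance of Y with H mX + b and H ΛX H^t:

import numpy as np

rng = np.random.default_rng(3)
m_x = np.array([1.0, -1.0])
lam_x = np.array([[1.0, 0.5],
                  [0.5, 2.0]])
H = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])        # n x k with n = 3, k = 2
b = np.array([0.5, 0.0, -2.0])

X = rng.multivariate_normal(m_x, lam_x, size=200_000)   # samples of X
Y = X @ H.T + b                                          # Y = HX + b, applied row-wise

print(Y.mean(axis=0), H @ m_x + b)        # empirical mean vs. H m_X + b
print(np.cov(Y, rowvar=False))            # empirical covariance ...
print(H @ lam_x @ H.T)                    # ... vs. H Lambda_X H^t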


The following observation is trivial, but it emphasizes a useful fact. Suppose that X is a Gaussian random vector of dimension, say, k, and we form a new vector Y by subsampling X, that is, by selecting a subset of the components (X0, X1, . . . , Xk−1), say Yi = Xl(i), i = 0, 1, . . . , m, with m < k. Then we can write Y = AX, where A is a matrix that has Ai,l(i) = 1 for i = 0, 1, . . . , m and 0's everywhere else. The preceding result implies immediately that Y is Gaussian and shows how to compute the mean and covariance. Thus any subvector of a Gaussian vector is also a Gaussian vector. This could also have been proved by a derived distribution and messy multidimensional integrals, but the previous result provides a nice shortcut.
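For concreteness, here is a small sketch of such a selection matrix A (an addition, not from the text) for a hypothetical index map l = (0, 2, 3) applied to a k = 5 dimensional vector, so that Y = AX picks out X0, X2, X3:

import numpy as np

k = 5
l = [0, 2, 3]                      # hypothetical subsampling indices
A = np.zeros((len(l), k))
for i, li in enumerate(l):
    A[i, li] = 1.0                 # A[i, l(i)] = 1, zeros elsewhere

x = np.arange(k, dtype=float)      # stand-in vector (X0, ..., X4)
print(A @ x)                       # -> [0., 2., 3.], i.e. (X0, X2, X3)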

4.5.5 Example: Differential Entropy of a Gaussian Vector

Suppose that X = (X0, X1, . . . , Xn−1) is a Gaussian random vector described by a pdf fX specified by a mean vector m and a covariance matrix KX. The differential entropy of a continuous vector X is defined by

h(X) = −∫ fX(x) log fX(x) dx

     = −∫ fX0,X1,...,Xn−1(x0, x1, . . . , xn−1) log fX0,X1,...,Xn−1(x0, x1, . . . , xn−1) dx0 dx1 · · · dxn−1 ,                                (4.29)

where the units are called “bits” if the logarithm is base 2 and “nats” if the logarithm is base e. The differential entropy plays a fundamental role in Shannon information theory for continuous alphabet random processes. See, for example, Cover and Thomas [?]. It will also prove to be a very useful aspect of a random vector when considering linear prediction or estimation. We here use the fundamental theorem of expectation for functions of several variables to evaluate the differential entropy h(X) of a Gaussian random vector.

Plugging in the density for the Gaussian pdf and using natural logarithms results in

h(X) = −∫ fX(x) ln fX(x) dx

     = ∫ [ (1/2) ln((2π)^n det K) + (1/2)(x − m)^t K^{−1}(x − m) ] (2π)^{−n/2} (det K)^{−1/2} e^{−(1/2)(x − m)^t K^{−1}(x − m)} dx .


The final term can be easily evaluated by a trick. From linear algebra we can write, for any n-dimensional vector a and n × n matrix A,

a^t A a = Tr(A a a^t) ,                                (4.30)

where Tr is the trace, or sum of the diagonal elements, of the matrix. Thus using the linearity of expectation we can rewrite the previous equation as

h(X) = (1/2) ln((2π)^n det K) + (1/2) E[(X − m)^t K^{−1}(X − m)]

     = (1/2) ln((2π)^n det K) + (1/2) E[ Tr(K^{−1}(X − m)(X − m)^t) ]

     = (1/2) ln((2π)^n det K) + (1/2) Tr(K^{−1} E[(X − m)(X − m)^t])

     = (1/2) ln((2π)^n det K) + (1/2) Tr(K^{−1} K)

     = (1/2) ln((2π)^n det K) + (1/2) Tr(I)

     = (1/2) ln((2π)^n det K) + n/2

     = (1/2) ln((2πe)^n det K) nats.                                (4.31)
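The closed form (4.31) can be checked against a Monte Carlo estimate of −E[ln fX(X)] (a sketch added here, not from the text; it assumes NumPy and SciPy, and the particular m and K are arbitrary):

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(5)
m = np.array([0.0, 1.0])
K = np.array([[2.0, 0.6],
              [0.6, 1.0]])
n = len(m)

# Closed form: h(X) = (1/2) ln((2*pi*e)^n det K) nats
h_closed = 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(K))

# Monte Carlo: h(X) = -E[ln f_X(X)]
samples = rng.multivariate_normal(m, K, size=200_000)
h_mc = -np.mean(multivariate_normal(mean=m, cov=K).logpdf(samples))

print(h_closed, h_mc)    # agree up to Monte Carlo error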

4.6 Conditional Expectation

Expectation is essentially a weighted integral or summation with respect to a probability distribution. If one uses a conditional distribution, then the expectation is also conditional. For example, suppose that (X, Y ) is a random vector described by a joint pmf pX,Y . The ordinary expectation of Y is defined as usual by EY = Σ_y y pY (y). Suppose, however, that one is told that X = x and hence one has the conditional (a posteriori) pmf pY|X. Then one can define the conditional expectation of Y given X = x by

E(Y |x) = Σ_{y∈AY} y pY|X(y|x) ,                                (4.32)

that is, the usual expectation, but with respect to the pmf pY|X(·|x). So far, this is an almost trivial generalization. Perhaps unfortunately, however, (4.32) is not in fact what is usually defined as conditional expectation. The actual derivation might appear to be only slightly different, but there is a fundamental difference and a potential for confusion because of the notation. As we have defined it so far, the conditional expectation of Y given X = x is a function of the independent variable x, say g(x). In other words,

g(x) = E(Y |x).

If we take any function g(x) of x and replace the independent variable x by a random variable X, we get a new random variable g(X). If we simply replace the independent variable x in E(Y |x) by the random variable X, the resulting quantity is a random variable and is denoted by E(Y |X). It is this random variable that is defined as the conditional expectation of Y given X. The previous definition E(Y |x) can be considered as a sample value of the random variable E(Y |X). Note that we can write the definition

as

E(Y |X) = Σ_{y∈AY} y pY|X(y|X) ,                                (4.33)

but the reader must beware the dual use of X: in the subscript it denotes as usual the name of the random variable, in the argument it denotes the random variable itself, i.e., E(Y |X) is a function of the random variable X and hence is itself a random variable.

Since E(Y |X) is a random variable, we can evaluate its expectation using the fundamental theorem of expectation. The resulting formula has wide application in probability theory. Taking this expectation we have that

E[E(Y |X)] = Σ_{x∈AX} pX(x) E(Y |x)

           = Σ_{x∈AX} pX(x) Σ_{y∈AY} y pY|X(y|x)

           = Σ_{y∈AY} y Σ_{x∈AX} pX,Y (x, y)

           = Σ_{y∈AY} y pY (y)

           = EY,

a result known as iterated expectation or nested expectation. Roughly speaking it states that if we wish to find the expectation of a random variable Y , then we can first find its conditional expectation with respect to another random variable, E(Y |X), and then take the expectation of the resulting random variable to obtain

EY = E[E(Y |X)].

(4.34)
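A tiny discrete example of (4.34) follows (an added sketch, not from the text; the joint pmf below is made up): compute E(Y |x) for each x, average over pX, and compare with EY computed directly from the marginal of Y.

import numpy as np

# Made-up joint pmf p_{X,Y}(x, y), rows indexed by x, columns by y.
p_xy = np.array([[0.10, 0.20, 0.05],
                 [0.15, 0.10, 0.40]])
x_vals = np.array([0.0, 1.0])
y_vals = np.array([-1.0, 0.0, 2.0])

p_x = p_xy.sum(axis=1)                       # marginal pmf of X
p_y_given_x = p_xy / p_x[:, None]            # conditional pmf p_{Y|X}(y|x)

e_y_given_x = p_y_given_x @ y_vals           # E(Y|x) for each x
ey_iterated = p_x @ e_y_given_x              # E[E(Y|X)]
ey_direct = p_xy.sum(axis=0) @ y_vals        # EY from the marginal of Y

print(ey_iterated, ey_direct)                # identical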

In the next section we shall see an interpretation of conditional expectation as an estimator of one random variable given another. A simple example now, however, helps point out how this result can be useful.


Suppose that one has a random process {Xk; k = 0, 1, . . . }, with identically distributed random variables Xn, and a random variable N that takes on positive integer values. Suppose also that the random variables Xk are all independent of N. Suppose that one defines a new random variable

Y = Σ_{k=0}^{N−1} Xk ,

that is, the sum of a random number of random variables. How does one evaluate the expectation EY ? Finding the derived distribution is daunting, but iterated expectation comes to the rescue. Iterated expectation states that EY = E[E(Y |N)], where E(Y |N) is found by evaluating E(Y |n) and replacing n by N. But given N = n, the random variable Y is simply Y = Σ_{k=0}^{n−1} Xk, since the distribution of the Xk is not affected by the event N = n (the Xk are independent of N). Hence by the linearity of expectation,

E(Y |n) = Σ_{k=0}^{n−1} EXk ,

where the identically distributed assumption implies that the EXk are all equal, say EX. Thus E(Y |n) = nEX and hence E(Y |N) = NEX. Then iterated expectation implies that

EY = E(NEX) = (EN)(EX),

(4.35)

the product of the two means. Try finding this result without using iterated expectation. As a particular example, if the random variables are Bernoulli random variables with parameter p and N has a Poisson distribution with parameter λ, then Pr(Xi = 1) = p for all i, so that EX = p and EN = λ, and hence

EY = λp.
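The result EY = (EN)(EX) = λp is easy to confirm by simulation (a sketch I am adding, not from the text; it assumes NumPy, and λ and p are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
lam, p = 4.0, 0.3
n_trials = 500_000

N = rng.poisson(lam, size=n_trials)   # random number of Bernoulli terms
# The sum of N i.i.d. Bernoulli(p) variables is Binomial(N, p), with N independent of the terms.
Y = rng.binomial(N, p)

print(Y.mean(), lam * p)              # empirical EY vs. lambda * p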

Iterated expectation has a more general form. Just as constants can be pulled out of ordinary expectations, quantities depending only on the variable conditioned on can be pulled out of conditional expectations. We

state and prove this formally.

 

Lemma 4.2 General Iterated Expectation

 

Suppose that X and Y are discrete random variables and that g(X) and h(X, Y ) are functions of these random variables. Then

 

E[g(X)h(X, Y )] = E (g(X)E[h(X, Y )|X]) .

(4.36)


Proof:

E[g(X)h(X, Y )] = Σ_{x,y} g(x)h(x, y) pX,Y (x, y)

                = Σ_x pX(x)g(x) Σ_y h(x, y) pY|X(y|x)

                = Σ_x pX(x)g(x) E[h(X, Y )|x]

                = E (g(X)E[h(X, Y )|X]) .

As with ordinary iterated expectation, this is primarily an interpretation of an algebraic rewriting of the definition of expectation. Note that if we take g(x) = x and h(x, y) = 1, this general form reduces to the previous form.

In a similar vein, one can extend the idea of conditional expectation to continuous random variables by using pdf’s instead of pmf’s. For example,

E(Y |x) = ∫ y fY|X(y|x) dy ,

and E(Y |X) is defined by replacing x by X in the above formula. Both iterated expectation and its general form extend to this case by replacing sums by integrals.

4.7 Jointly Gaussian Vectors

Gaussian vectors provide an interesting example of a situation where conditional expectations can be explicitly computed, and this in turn provides additional fundamental, if unsurprising, properties of Gaussian vectors. Instead of considering a single Gaussian random vector X = (X0, X1, . . . , XN−1)^t, say, consider a random vector

U = [ X ]
    [ Y ]

formed by concatenating two vectors X and Y of dimensions, say, k and m, respectively. For this section we will drop the boldface notation for vectors. If U is Gaussian, then we say that X and Y are jointly Gaussian. From Theorem 4.4 it follows that if X and Y are jointly Gaussian, then they are individually Gaussian with, say, means mX and mY , respectively, and covariance matrices KX and KY , respectively. The goal of this section is to develop the conditional second-order moments for Y given X and to show in the process that, given X, Y has a Gaussian density. Thus not only is any subcollection of a Gaussian random vector Gaussian, it is also true that the conditional densities of any subvector of a Gaussian vector given a disjoint subvector of the Gaussian vector are Gaussian. This generalizes (3.61) from two jointly Gaussian scalar random variables to two jointly Gaussian random vectors. The idea behind the proof is the same, but the algebra is messier in higher dimensions.

Begin by writing

KU = E[(U − mU)(U − mU)^t]

   = E[ ( X − mX )
        ( Y − mY ) ( (X − mX)^t   (Y − mY)^t ) ]

   = [ E[(X − mX)(X − mX)^t]    E[(X − mX)(Y − mY)^t] ]
     [ E[(Y − mY)(X − mX)^t]    E[(Y − mY)(Y − mY)^t] ]

   = [ KX     KXY ]
     [ KYX    KY  ] ,                                (4.37)

where KX and KY are ordinary covariance matrices and KXY and KYX = KXY^t are called cross-covariance matrices. We shall also denote KU by K(X,Y ), where the subscript is meant to emphasize that it is the covariance of the cascade vector of both X and Y in distinction to KXY , the cross covariance of X and Y .

The key to recognizing the conditional moments and densities is the following admittedly unpleasant matrix equation, which can be proved with a fair amount of brute force linear algebra:

 

[ KX     KXY ]^{-1}
[ KYX    KY  ]

  = [ KX^{-1} + KX^{-1} KXY KY|X^{-1} KYX KX^{-1}      −KX^{-1} KXY KY|X^{-1} ]
    [ −KY|X^{-1} KYX KX^{-1}                            KY|X^{-1}             ] ,                                (4.38)

where

KY|X = KY − KYX KX^{-1} KXY .                                (4.39)
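Equations (4.38) and (4.39) can be checked numerically for any particular positive definite KU (an added sketch, not from the text; it assumes NumPy, and the partition sizes and the matrix below are arbitrary):

import numpy as np

rng = np.random.default_rng(7)
k, m_dim = 2, 3
A = rng.normal(size=(k + m_dim, k + m_dim))
K_U = A @ A.T + (k + m_dim) * np.eye(k + m_dim)   # arbitrary positive definite matrix

K_X  = K_U[:k, :k]
K_XY = K_U[:k, k:]
K_YX = K_U[k:, :k]
K_Y  = K_U[k:, k:]

K_Xi = np.linalg.inv(K_X)
K_YgX = K_Y - K_YX @ K_Xi @ K_XY                  # Schur complement, eq. (4.39)
K_YgXi = np.linalg.inv(K_YgX)

# Assemble the block form of (4.38) and compare with the direct inverse.
top = np.hstack([K_Xi + K_Xi @ K_XY @ K_YgXi @ K_YX @ K_Xi, -K_Xi @ K_XY @ K_YgXi])
bot = np.hstack([-K_YgXi @ K_YX @ K_Xi, K_YgXi])
block_inv = np.vstack([top, bot])

print(np.allclose(block_inv, np.linalg.inv(K_U)))   # True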

The determined reader who wishes to verify the above should do the block matrix multiplication

[ a   b ]   [ KX^{-1} + KX^{-1} KXY KY|X^{-1} KYX KX^{-1}      −KX^{-1} KXY KY|X^{-1} ] [ KX     KXY ]
[ c   d ] = [ −KY|X^{-1} KYX KX^{-1}                            KY|X^{-1}             ] [ KYX    KY  ]

and verify that the result is the identity matrix, i.e., that a = I, b = 0, c = 0, and d = I.
