Since the denominator of the conditional pmf does not depend on x (only on y), given y the denominator has no effect on the maximization
$$\hat{X}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y) = \operatorname*{argmax}_x f_W(y - x)\, p_X(x).$$
Assume for simplicity that X is equally likely to be 0 or 1 so that the rule becomes
$$\hat{X}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y) = \operatorname*{argmax}_x \frac{1}{\sqrt{2\pi\sigma_W^2}}\, e^{-\frac{(x-y)^2}{2\sigma_W^2}}.$$
The constant in front of the pdf does not affect the maximization. In addition, the exponential is a monotonically decreasing function of |x − y|, so the exponential is maximized by minimizing this magnitude difference, i.e.,
$$\hat{X}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y) = \operatorname*{argmin}_x |x - y|, \tag{3.99}$$
which yields a final simple rule: choose whichever of x = 0 or x = 1 is closer to y as the best guess of x. This choice yields the MAP detection and hence the minimum probability of error. In our example this gives the rule
$$\hat{X}(y) = \begin{cases} 0 & y < 0.5 \\ 1 & y > 0.5 \end{cases}. \tag{3.100}$$
Because the optimal detector chooses the x that minimizes the Euclidean distance |x−y| to the observation y, it is called a minimum distance detector or rule. Because the guess can be computed by comparing the observation to a threshold (the value midway between the two possible values of x), the detector is also called a threshold detector.
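As a concrete illustration, here is a minimal Python sketch of the minimum distance rule; the function name and parameter values are illustrative, and numpy is assumed to be available.

```python
import numpy as np

def threshold_detector(y, levels=(0.0, 1.0)):
    """Minimum distance rule: guess whichever level is closer to y.

    For levels 0 and 1 this reduces to comparing y to the midpoint 0.5.
    """
    threshold = 0.5 * (levels[0] + levels[1])
    return np.where(y > threshold, levels[1], levels[0])

# Example: detect a single noisy observation.
rng = np.random.default_rng(0)
x = rng.integers(0, 2)            # equiprobable signal in {0, 1}
w = rng.normal(0.0, 0.4)          # Gaussian noise, sigma_W = 0.4 (illustrative)
y = x + w
print(x, threshold_detector(y))
```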
Assumptions have been made to keep things fairly simple. The reader is invited to work out what happens if the random variable X is biased and if its alphabet is taken to be {−1, 1} instead of {0, 1}. It is instructive to sketch the conditional pmf’s for these cases.
Having derived the optimal detector, it is reasonable to look at the resulting, minimized, probability of error. This can be found using conditional probability:
$$\begin{aligned}
P_e &= \Pr(\hat{X}(Y) \neq X) \\
&= \Pr(\hat{X}(Y) \neq 0 \mid X = 0)\,p_X(0) + \Pr(\hat{X}(Y) \neq 1 \mid X = 1)\,p_X(1) \\
&= \Pr(Y > 0.5 \mid X = 0)\,p_X(0) + \Pr(Y < 0.5 \mid X = 1)\,p_X(1) \\
&= \Pr(W + X > 0.5 \mid X = 0)\,p_X(0) + \Pr(W + X < 0.5 \mid X = 1)\,p_X(1) \\
&= \Pr(W > 0.5 \mid X = 0)\,p_X(0) + \Pr(W + 1 < 0.5 \mid X = 1)\,p_X(1) \\
&= \Pr(W > 0.5)\,p_X(0) + \Pr(W < -0.5)\,p_X(1)
\end{aligned}$$
where we have used the independence of W and X. These probabilities can be stated in terms of the Φ function of (2.78) as in (2.82), which combined with the assumption that X is uniform and (2.84) yields
$$P_e = \frac{1}{2}\left(1 - \Phi\!\left(\frac{0.5}{\sigma_W}\right) + \Phi\!\left(-\frac{0.5}{\sigma_W}\right)\right) = \Phi\!\left(-\frac{1}{2\sigma_W}\right). \tag{3.101}$$
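A quick Monte Carlo experiment makes (3.101) tangible. The following sketch, assuming numpy and scipy are available and using an illustrative noise level σ_W = 0.4, estimates the error probability of the threshold detector empirically and compares it with the closed form.

```python
import numpy as np
from scipy.stats import norm   # norm.cdf is the Phi function

rng = np.random.default_rng(1)
sigma_w, n = 0.4, 200_000
x = rng.integers(0, 2, size=n)            # equiprobable bits
y = x + rng.normal(0.0, sigma_w, size=n)  # observation Y = X + W
x_hat = (y > 0.5).astype(int)             # threshold detector of (3.100)

pe_empirical = np.mean(x_hat != x)
pe_theory = norm.cdf(-0.5 / sigma_w)      # Phi(-1/(2 sigma_W)), eq. (3.101)
print(pe_empirical, pe_theory)            # the two should agree closely
```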
3.11 Statistical Estimation
Discrete conditional probabilities were seen to provide a method for guessing an unknown class from an observation: if all incorrect choices have equal costs so that the overall optimality criterion is to minimize the probability of error, then the optimal classification rule is to guess the class X = k for which $p_{X|Y}(k|y) = \max_x p_{X|Y}(x|y)$, the maximum a posteriori or MAP decision rule. There is an analogous problem and solution in the continuous case, but the result does not have as strong an interpretation as in the discrete case. A more complete analogy will be derived in the next chapter.
As in the discrete case, suppose that a random variable Y is observed and the goal is to make a good guess $\hat{X}(Y)$ of another random variable X that is jointly distributed with Y. Unfortunately, in the continuous case it does not make sense to measure the quality of such a guess by the probability of its being correct because now that probability is usually zero. For example, if Y is formed by adding a Gaussian signal X to an independent Gaussian noise W to form an observation Y = X + W as in the previous section, then no rule is going to recover X perfectly from Y. Nonetheless, intuitively there should be reasonable ways to make such guesses in continuous situations. Since X is continuous, such guesses are referred to as "estimation" or "prediction" of X rather than as "classification" or "detection" as used in the discrete case. In the statistical literature the general problem is referred to as "regression".
One approach is to mimic the discrete approach on intuitive grounds. If the best guess in the classification problem of a random variable X given an observation Y is the MAP classifier $\hat{X}_{\mathrm{MAP}}(y) = \operatorname*{argmax}_x p_{X|Y}(x|y)$, then a natural analog in the continuous case is the so-called MAP estimator defined by
$$\hat{X}_{\mathrm{MAP}}(y) = \operatorname*{argmax}_x f_{X|Y}(x|y), \tag{3.102}$$
the value of x maximizing the conditional pdf given y. The advantage of this estimator is that it is easy to describe and provides an immediate application of conditional pdf's paralleling that of classification for discrete conditional probability. The disadvantage is that we cannot argue that this estimate is "optimal" in the sense of optimizing some specified criterion; it is essentially an ad hoc (but reasonable) rule. As an example of its use, consider the Gaussian signal plus noise of the previous section. There it was found that the pdf $f_{X|Y}(x|y)$ is Gaussian with mean $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_W^2}\, y$. Since the Gaussian density has its peak at its mean, in this case the MAP estimate of X given Y = y is given by the conditional mean $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_W^2}\, y$.
Knowledge of the conditional pdf is all that is needed to define another estimator: the maximum likelihood or ML estimate of X given Y = y is defined as the value of x that maximizes the conditional pdf $f_{Y|X}(y|x)$, the pdf with the roles of input and output reversed from that of the MAP estimator. Thus
$$\hat{X}_{\mathrm{ML}}(y) = \operatorname*{argmax}_x f_{Y|X}(y|x). \tag{3.103}$$
Thus in the Gaussian case treated above, $\hat{X}_{\mathrm{ML}}(y) = y$.
The main interest in the ML estimator in some applications is that it is sometimes simpler and that it does not require any assumption on the input statistics. The MAP estimator depends strongly on $f_X$; the ML estimator does not depend on it at all. It is easy to see that if the input pdf is uniform, the MAP estimator and the ML estimator are the same.
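For the Gaussian signal-plus-noise example the two estimators are easy to compare side by side. The sketch below is illustrative only; the variances and the observed value are made up for the example.

```python
import numpy as np

def map_estimate(y, var_x, var_w):
    # Conditional mean of X given Y = y for Gaussian X plus Gaussian W.
    return (var_x / (var_x + var_w)) * y

def ml_estimate(y):
    # Maximizing f_{Y|X}(y|x) over x places the noise at zero: x = y.
    return y

y = 0.8                                        # an observed value (illustrative)
print(map_estimate(y, var_x=1.0, var_w=0.5))   # 0.533..., shrunk toward the mean 0
print(ml_estimate(y))                          # 0.8, no shrinkage
```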
3.12 Characteristic Functions
We have seen that summing two independent random variables produces a new random variable whose pmf or pdf is found by convolving the two pmf's or pdf's of the original random variables. Anyone with an engineering background will likely have had experience with convolution and recall that it can be somewhat messy to evaluate. To make matters worse, if one wishes to sum additional independent random variables onto the existing sum, say to form $Y = \sum_{k=1}^{N} X_k$ from an iid collection $\{X_k\}$, then the result will be an N-fold convolution, a potential nightmare in all but the simplest of cases. As in other engineering applications such as circuit design, convolutions can be avoided by Fourier transform methods, and in this subsection we describe the method as an alternative approach for the examples to come. We begin with the discrete case.
Historically, the transforms used in probability theory have been slightly different from those used in traditional Fourier analysis. For a discrete random variable with pmf $p_X$, define the characteristic function $M_X$ of the random variable (or of the pmf) as
$$M_X(ju) = \sum_x p_X(x)\, e^{jux}, \tag{3.104}$$
where u is usually assumed to be real. Recalling the definition (2.34) of the expectation of a function g defined on a sample space, choosing $g(\omega) = e^{juX(\omega)}$ shows that the characteristic function can be more simply defined as
$$M_X(ju) = E[e^{juX}]. \tag{3.105}$$
Thus characteristic functions, like probabilities, can be viewed as special cases of expectations.
This transform, which is also referred to as an exponential transform or operational transform, bears a strong resemblance to the discrete-parameter Fourier transform
$$F_\nu(p_X) = \sum_x p_X(x)\, e^{-j2\pi\nu x} \tag{3.106}$$
and the z-transform
$$Z_z(p_X) = \sum_x p_X(x)\, z^x. \tag{3.107}$$
In particular, $M_X(ju) = F_{-u/2\pi}(p_X) = Z_{e^{ju}}(p_X)$. As a result, all of the properties of characteristic functions follow immediately from (are equivalent to) similar properties of Fourier or z-transforms. As with Fourier and z-transforms, the original pmf $p_X$ can be recovered from the transform $M_X$ by suitable inversion. For example, given a pmf $p_X(k);\ k \in \mathbb{Z}_N$,
$$\begin{aligned}
\frac{1}{2\pi}\int_{-\pi}^{\pi} M_X(ju)\, e^{-juk}\, du
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} \left(\sum_x p_X(x)\, e^{jux}\right) e^{-juk}\, du \\
&= \sum_x p_X(x)\, \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{ju(x-k)}\, du \\
&= \sum_x p_X(x)\, \delta_{k-x} = p_X(k). \tag{3.108}
\end{aligned}$$
Consider again the problem of summing two independent random variables X and W with pmf's $p_X$ and $p_W$ and characteristic functions $M_X$ and $M_W$, respectively. If Y = X + W as before, we can evaluate the characteristic function of Y as
$$M_Y(ju) = \sum_y p_Y(y)\, e^{juy}$$
where from the inverse image formula
$$p_Y(y) = \sum_{x,w:\, x+w=y} p_{X,W}(x,w)$$
so that
$$\begin{aligned}
M_Y(ju) &= \sum_y \left(\sum_{x,w:\, x+w=y} p_{X,W}(x,w)\right) e^{juy} \\
&= \sum_y \sum_{x,w:\, x+w=y} p_{X,W}(x,w)\, e^{juy} \\
&= \sum_y \sum_{x,w:\, x+w=y} p_{X,W}(x,w)\, e^{ju(x+w)} \\
&= \sum_{x,w} p_{X,W}(x,w)\, e^{ju(x+w)}
\end{aligned}$$
where the last equality follows because each of the sums for distinct y collects together different x and w, and together the sums over all y gather all of the x and w. This last sum factors, however, as
$$\begin{aligned}
M_Y(ju) &= \sum_{x,w} p_X(x)\, p_W(w)\, e^{jux} e^{juw} \\
&= \left(\sum_x p_X(x)\, e^{jux}\right)\left(\sum_w p_W(w)\, e^{juw}\right) \\
&= M_X(ju)\, M_W(ju), \tag{3.109}
\end{aligned}$$
|
which shows that the transform of the pmf of the sum of independent random variables is simply the product of the transforms.
Iterating (3.109) several times gives an extremely useful result that we state formally as a theorem. It can be proved by repeating the above argument, but we shall later see a shorter proof.
Theorem 3.1 If $\{X_i;\ i = 1, \ldots, N\}$ are independent random variables with characteristic functions $M_{X_i}$, then the characteristic function of the random variable $Y = \sum_{i=1}^{N} X_i$ is
$$M_Y(ju) = \prod_{i=1}^{N} M_{X_i}(ju). \tag{3.110}$$
If the $X_i$ are independent and identically distributed with common characteristic function $M_X$, then
$$M_Y(ju) = M_X^N(ju). \tag{3.111}$$
As a simple example, the characteristic function of a binary random variable X with parameter $p = p_X(1) = 1 - p_X(0)$ is easily found to be
$$M_X(ju) = \sum_{k=0}^{1} e^{juk}\, p_X(k) = (1-p) + p\, e^{ju}. \tag{3.112}$$
If $\{X_i;\ i = 1, \ldots, n\}$ are independent Bernoulli random variables with identical distributions and $Y_n = \sum_{i=1}^{n} X_i$, then $M_{Y_n}(ju) = \left[(1-p) + p\, e^{ju}\right]^n$ and hence
$$\begin{aligned}
M_{Y_n}(ju) &= \sum_{k=0}^{n} p_{Y_n}(k)\, e^{juk} \\
&= \left((1-p) + p\, e^{ju}\right)^n \\
&= \sum_{k=0}^{n} \left[\binom{n}{k} (1-p)^{n-k} p^k\right] e^{juk},
\end{aligned}$$
where we have invoked the binomial theorem in the last step. For the equality to hold, however, we have from the uniqueness of transforms that pYn (k) must be the bracketed term, that is, the binomial pmf
$$p_{Y_n}(k) = \binom{n}{k} (1-p)^{n-k} p^k;\quad k \in \mathbb{Z}_{n+1}. \tag{3.113}$$
As in the discrete case, convolutions can be avoided by transforming the densities involved. The derivation is exactly analogous to the discrete case, with integrals replacing sums in the usual way.
For a continuous random variable X with pdf $f_X$, define the characteristic function $M_X$ of the random variable (or of the pdf) as
$$M_X(ju) = \int_{-\infty}^{\infty} f_X(x)\, e^{jux}\, dx. \tag{3.114}$$
As in the discrete case, this can be considered as a special case of expectation for continuous random variables as defined in (2.34) so that
$$M_X(ju) = E[e^{juX}]. \tag{3.115}$$
The characteristic function is related to the continuous-parameter Fourier transform
$$F_\nu(f_X) = \int_{-\infty}^{\infty} f_X(x)\, e^{-j2\pi\nu x}\, dx \tag{3.116}$$
and the Laplace transform
$$L_s(f_X) = \int_{-\infty}^{\infty} f_X(x)\, e^{sx}\, dx \tag{3.117}$$
by $M_X(ju) = F_{-u/2\pi}(f_X) = L_{ju}(f_X)$. As a result, all of the properties of characteristic functions of densities follow immediately from (are equivalent to) similar properties of Fourier or Laplace transforms. For example, given a well-behaved density $f_X(x);\ x \in \mathbb{R}$, with characteristic function $M_X(ju)$,
$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} M_X(ju)\, e^{-jux}\, du. \tag{3.118}$$
Consider again the problem of summing two independent random variables X and W with pdf's $f_X$ and $f_W$ and characteristic functions $M_X$ and $M_W$, respectively. As in the discrete case it can be shown that
$$M_Y(ju) = M_X(ju)\, M_W(ju). \tag{3.119}$$
Rather than mimic the proof of the discrete case, however, we postpone the proof to a more general treatment of characteristic functions in chapter 4.
As in the discrete case, iterating (3.119) several times yields the following result, which now includes both discrete and continuous cases.
Theorem 3.2 If $\{X_i;\ i = 1, \ldots, N\}$ are independent random variables with characteristic functions $M_{X_i}$, then the characteristic function of the random variable $Y = \sum_{i=1}^{N} X_i$ is
$$M_Y(ju) = \prod_{i=1}^{N} M_{X_i}(ju). \tag{3.120}$$
If the $X_i$ are independent and identically distributed with common characteristic function $M_X$, then
$$M_Y(ju) = M_X^N(ju). \tag{3.121}$$
As an example of characteristic functions and continuous random variables, consider the Gaussian random variable. The evaluation requires a bit of effort, either using the "complete the square" technique of calculus or by looking it up in published tables. Assume that X is a Gaussian random variable with mean m and variance $\sigma^2$. Then
$$\begin{aligned}
M_X(ju) &= E(e^{juX}) \\
&= \int_{-\infty}^{\infty} \frac{1}{(2\pi\sigma^2)^{1/2}}\, e^{-(x-m)^2/2\sigma^2}\, e^{jux}\, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{(2\pi\sigma^2)^{1/2}}\, e^{-(x^2 - 2mx - 2\sigma^2 jux + m^2)/2\sigma^2}\, dx \\
&= \left(\int_{-\infty}^{\infty} \frac{1}{(2\pi\sigma^2)^{1/2}}\, e^{-(x - (m + ju\sigma^2))^2/2\sigma^2}\, dx\right) e^{jum - u^2\sigma^2/2} \\
&= e^{jum - u^2\sigma^2/2}. \tag{3.122}
\end{aligned}$$
|||
Thus the characteristic function of a Gaussian random variable with mean m and variance $\sigma^2$ is
$$M_X(ju) = e^{jum - u^2\sigma^2/2}. \tag{3.123}$$
If $\{X_i;\ i = 1, \ldots, n\}$ are independent Gaussian random variables with identical densities $\mathcal{N}(m, \sigma^2)$ and $Y_n = \sum_{i=1}^{n} X_i$, then
$$M_{Y_n}(ju) = \left[e^{jum - u^2\sigma^2/2}\right]^n = e^{ju(nm) - u^2(n\sigma^2)/2}, \tag{3.124}$$
which is the characteristic function of a Gaussian random variable with mean $nm$ and variance $n\sigma^2$.
The following maxim should be kept in mind whenever faced with sums of independent random variables:

When given a derived distribution problem involving the sum of independent random variables, first find the characteristic function of the sum by taking the product of the characteristic functions of the individual random variables. Then find the corresponding probability function by inverting the transform. This technique is valid if the random variables are independent; they do not have to be identically distributed.
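As a sanity check on (3.124) and the maxim, one can draw sums of iid Gaussian variables and compare the sample mean and variance with nm and nσ². A minimal sketch, assuming numpy and using illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma = 5, 1.0, 0.7
# Each row is one realization of (X_1, ..., X_n); sum the rows to get Y_n.
samples = rng.normal(m, sigma, size=(100_000, n)).sum(axis=1)

print(samples.mean(), n * m)           # sample mean     vs  n m
print(samples.var(), n * sigma**2)     # sample variance vs  n sigma^2
```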
3.13 Gaussian Random Vectors
A random vector is said to be Gaussian if its density is Gaussian, that is, if its distribution is described by the multidimensional pdf explained in chapter 2. The component random variables of a Gaussian random vector are said to be jointly Gaussian random variables. Note that the symmetric matrix Λ of the k-dimensional vector pdf has $k(k+1)/2$ parameters and that the vector m has k parameters. On the other hand, the k marginal pdf's together have only 2k parameters. Again we note the impossibility of constructing joint pdf's without more specification than the marginal pdf's alone. As previously, the marginals will suffice to describe the entire vector if we also know that the vector has independent components, e.g., the vector is iid. In this case the matrix Λ is diagonal.
Although difficult to describe, Gaussian random vectors have several nice properties. One of the most important of these is that linear or affine operations on Gaussian random vectors produce Gaussian random vectors. This result can be demonstrated with only a modest amount of work using multidimensional characteristic functions, the extension of transforms from scalars to vectors.
The multidimensional characteristic function of a distribution is defined as follows: Given a random vector X = (X0, . . . , Xn−1) and a vector parameter u = (u0, . . . , un−1), the n-dimensional characteristic function MX(ju) is defined by
$$\begin{aligned}
M_X(ju) &= M_{X_0, \ldots, X_{n-1}}(ju_0, \ldots, ju_{n-1}) \\
&= E\left[e^{ju^t X}\right] \\
&= E\left[\exp\left(j \sum_{k=0}^{n-1} u_k X_k\right)\right]. \tag{3.125}
\end{aligned}$$
It can be shown using multivariable calculus (problem 3.49) that a Gaussian random vector with mean vector m and covariance matrix Λ has characteristic function
$$\begin{aligned}
M_X(ju) &= e^{ju^t m - \frac{1}{2} u^t \Lambda u} \\
&= \exp\left(j \sum_{k=0}^{n-1} u_k m_k - \frac{1}{2} \sum_{k=0}^{n-1} \sum_{m=0}^{n-1} u_k\, \Lambda(k,m)\, u_m\right). \tag{3.126}
\end{aligned}$$
Observe that the Gaussian characteristic function has the same form as the Gaussian pdf, an exponential quadratic in its argument. However, unlike the pdf, the characteristic function depends on the covariance matrix directly, whereas the pdf contains the inverse of the covariance matrix. Thus the Gaussian characteristic function is in some sense simpler than the Gaussian pdf. As a further consequence of the direct dependence on the covariance matrix, it is interesting to note that, unlike the Gaussian pdf, the characteristic function is well-defined even if Λ is only nonnegative definite and not strictly positive definite. Previously we gave a definition of a Gaussian random vector in terms of its pdf. Now we can give an alternate, more general (in the sense that a strictly positive definite covariance matrix is not required) definition of a Gaussian random vector (and hence random process):
A random vector is Gaussian if and only if it has a characteristic function of the form of (3.126).
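Equation (3.126) can be spot-checked by Monte Carlo: estimate $E[e^{ju^t X}]$ from samples of a Gaussian vector and compare with the closed form. The mean vector, covariance matrix, and u below are illustrative, and numpy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(3)
m = np.array([1.0, -0.5])            # mean vector (illustrative)
cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])         # covariance matrix Lambda (illustrative)
u = np.array([0.4, -0.2])            # transform argument (illustrative)

x = rng.multivariate_normal(m, cov, size=200_000)
empirical = np.mean(np.exp(1j * x @ u))               # sample estimate of E[e^{ju^t X}]
closed_form = np.exp(1j * u @ m - 0.5 * u @ cov @ u)  # eq. (3.126)
print(empirical, closed_form)                         # should agree closely
```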
3.14 Examples: Simple Random Processes
In this section several examples of random processes defined on simple probability spaces are given to illustrate the basic definition of an infinite collection of random variables defined on a single space. In the next section more complicated examples are considered by defining random variables on a probability space which is the output space for another random process, a setup that can be viewed as signal processing.
[3.22] Consider the binary probability space (Ω, F, P ) with Ω = {0, 1}, F the usual event space, and P induced by the pmf p(0) = α and p(1) = 1 − α, where α is some constant, 0 ≤ α ≤ 1. Define a random process on this space as follows:
$$X(t, \omega) = \cos(\omega t) = \begin{cases} \cos(t), & \text{all } t, & \text{if } \omega = 1 \\ 1, & \text{all } t, & \text{if } \omega = 0 \end{cases}.$$
Thus if a 1 occurs a cosine is sent forever, and if a 0 occurs a constant 1 is sent forever.
This process clearly has continuous time and at first glance it might appear to also have continuous amplitude, but only two waveforms are possible, a cosine and a constant. Thus the alphabet at each time contains at most two values and these possible values change with time. Hence this process is in fact a discrete amplitude process and random vectors drawn from this source are described by pmf's. We can consider the alphabet of the process to be either $\mathbb{R}^T$ or $[-1, 1]^T$, among other possibilities. Fix time at $t = \pi/2$. Then $X(\pi/2)$ is a random variable with pmf
$$p_{X(\pi/2)}(x) = \begin{cases} \alpha, & \text{if } x = 1 \\ 1 - \alpha, & \text{if } x = 0 \end{cases}.$$
The reader should try other instances of time. What happens at $t = 0, 2\pi, 4\pi, \ldots$?
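A short sketch makes the two-waveform structure of this process concrete; α, the function names, and the sample times below are illustrative.

```python
import numpy as np

def sample_path(alpha, rng):
    """Draw omega in {0, 1} and return the resulting waveform X(t, omega)."""
    omega = int(rng.random() < 1 - alpha)   # P(omega = 1) = 1 - alpha
    return lambda t: np.cos(omega * t)      # cos(t) if omega = 1, else constant 1

rng = np.random.default_rng(4)
x = sample_path(alpha=0.5, rng=rng)
t = np.array([0.0, np.pi / 2, np.pi, 2 * np.pi])
print(x(t))   # approximately [1, 0, -1, 1] (cosine) or [1, 1, 1, 1] (constant)
```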
