
The ⊕ operation corresponds to an "exclusive or" in logic; that is, it produces a 1 if one or the other but not both of its arguments is 1. Modulo 2 addition can also be thought of as a parity check, producing a 1 if there is an odd number of 1's being summed and a 0 otherwise. An equivalent definition for the conditional pmf is
p_{Y|X}(y|x) = ε^{x ⊕ y} (1 − ε)^{1 − (x ⊕ y)};  x, y = 0, 1.   (3.67)
For example, the channel over which the bit is being sent is noisy in that it occasionally makes an error. Suppose that it is known that the probability of such an error is ε. The error probability might be very small on a good phone line, but it might be very large if an evil hacker is trying to corrupt your data.
Given the observed Y, what is the best guess X̂(Y) of what is actually sent? In other words, what is the best decision rule or detection rule for guessing the value of X given the observed value of Y? A reasonable criterion for judging an arbitrary rule X̂ is the resulting probability of error

P_e(X̂) = Pr(X̂(Y) ≠ X).   (3.68)
A decision rule is optimal if it yields the smallest possible probability of error over all possible decision rules. A little probability manipulation quickly yields the optimal decision rule. Instead of minimizing the error probability, we maximize the probability of being correct:
Pr(X̂ = X) = 1 − P_e(X̂)
          = Σ_{(x,y): X̂(y)=x} p_{X,Y}(x, y)
          = Σ_{(x,y): X̂(y)=x} p_{X|Y}(x|y) p_Y(y)
          = Σ_y p_Y(y) Σ_{x: X̂(y)=x} p_{X|Y}(x|y)
          = Σ_y p_Y(y) p_{X|Y}(X̂(y)|y).
To maximize this sum, we want to maximize the terms within the sum for each y. Clearly the maximum value of the conditional probability
p_{X|Y}(X̂(y)|y), namely max_u p_{X|Y}(u|y), will be achieved if we define the decision rule X̂(y) to be the value of u achieving the maximum of p_{X|Y}(u|y) over u, that is, define X̂(y) to be argmax_u p_{X|Y}(u|y) (also denoted max⁻¹_u p_{X|Y}(u|y)).
In words: the optimal estimate of X given the observation Y in the sense
of minimizing the probability of error is the most probable value of X given the observation. This is called the maximum a posteriori or MAP decision rule. In our binary example it reduces to choosing x̂ = y if ε < 1/2 and x̂ = 1 − y if ε > 1/2. If ε = 1/2 you can give up and flip a coin or make an arbitrary decision. (Why?) Thus the minimum (optimal) error probability over all possible rules is min(ε, 1 − ε).
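The rule is simple enough to verify by simulation. The following sketch (Python with NumPy) applies the MAP rule; the crossover probability and the equally likely input bits are assumed values for illustration, and the empirical error rate is compared with min(ε, 1 − ε).

```python
import numpy as np

# A minimal sketch of the MAP rule for the binary example, assuming the
# input bit X is equally likely to be 0 or 1 and an assumed crossover
# probability eps (the reduction to x_hat = y for eps < 1/2 uses that prior).
def map_decision(y, eps):
    # For eps < 1/2 the received bit is the most probable input;
    # otherwise the complemented bit is more probable.
    return y if eps < 0.5 else 1 - y

rng = np.random.default_rng(0)
eps = 0.1
x = rng.integers(0, 2, size=100_000)          # transmitted bits
w = (rng.random(x.size) < eps).astype(int)    # channel errors
y = x ^ w                                     # received bits (modulo 2 addition)
x_hat = map_decision(y, eps)                  # works elementwise on the array
print((x_hat != x).mean(), min(eps, 1 - eps)) # empirical vs. theoretical P_e
```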
The astute reader will notice that having introduced conditional pmf's p_{Y|X}, the example considered the alternative pmf p_{X|Y}. The two are easily related by Bayes' rule (3.49).
A generalization of the simple binary detection problem provides the typical form of a statistical classification system. Suppose that Nature selects a "class" H, a random variable described by a pmf pH(h), which is no longer assumed to be binary. Once the class is selected, Nature then generates a random "observation" X according to a pmf pX|H. For example, the class might be a medical condition and the observations the results of blood pressure, the patient's age, medical history, and other information regarding the patient's health. Alternatively, the class might be an "input signal" put into a noisy channel which has the observation X as an "output signal." The
question is: Given the observation X = x, what is the best guess Ĥ(x) of the unseen class? If by "best" we adopt the criterion that the best guess is the one that minimizes the error probability P_e = Pr(Ĥ(X) ≠ H), then the optimal classifier is again the MAP rule argmax_u p_{H|X}(u|x). More generally we might assign a cost C_{y,h} resulting if the true class is h and we guess y. Typically it is assumed that C_{h,h} = 0, that is, the cost is zero if our guess is correct. (In fact it can be shown that this assumption involves no real
loss of generality.) Given a classifier (classification rule, decision rule) ĥ(x), the Bayes risk is then defined as

B(ĥ) = Σ_{x,h} C_{ĥ(x),h} p_{H,X}(h, x),   (3.69)
which reduces to the probability of error if the cost function is given by
C_{y,h} = 1 − δ_{y,h}.   (3.70)
The optimal classifier in the sense of minimizing the Bayes risk is then found by observing that the inequality
B(ĥ) = Σ_x p_X(x) Σ_h C_{ĥ(x),h} p_{H|X}(h|x)
     ≥ Σ_x p_X(x) min_y Σ_h C_{y,h} p_{H|X}(h|x),
which lower bound is achieved by the classifier

ĥ(x) = argmin_y Σ_h C_{y,h} p_{H|X}(h|x),   (3.71)
the minimum average Bayes risk classifier. This reduces to the MAP detection rule when Cy,h = 1 − δy,h.
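The following minimal sketch illustrates the computation in (3.71) for a single observation; the posterior vector and cost matrix are hypothetical values chosen only to show the mechanics, and with the 0-1 cost it collapses to the MAP rule.

```python
import numpy as np

def bayes_risk_classify(posterior, cost):
    """Minimum average Bayes risk guess, as in (3.71).

    posterior : posterior[h] = p_{H|X}(h | x) for the observed x.
    cost      : cost[y, h] = C_{y,h}, the cost of guessing y when H = h.
    """
    expected_cost = cost @ posterior     # sum_h C_{y,h} p_{H|X}(h|x), one per y
    return int(np.argmin(expected_cost))

posterior = np.array([0.2, 0.5, 0.3])    # hypothetical p_{H|X}(.|x)
zero_one = 1.0 - np.eye(3)               # C_{y,h} = 1 - delta_{y,h}
print(bayes_risk_classify(posterior, zero_one))   # 1, the MAP guess
print(int(np.argmax(posterior)))                  # same answer with 0-1 cost
```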
3.9 Additive Noise
The next examples of the use of conditional distributions treat the distributions arising when one random variable (thought of as a "noise" term) is added to another, independent random variable (thought of as a "signal" term). This is an important example of a derived distribution problem that yields an interesting conditional probability. The problem also suggests a valuable new tool which will provide a simpler way of solving many similar derived distribution problems: the characteristic function of random variables.
Discrete Additive Noise
Consider two independent random variables X and W and form a new random variable Y = X + W. For example, this could be a description of how errors are actually caused in a noisy communication channel connecting a binary information source to a user. In order to apply the detection and classification signal processing methods, we must first compute the appropriate conditional probabilities of the output Y given the input X. To do this we begin by computing the joint pmf of X and Y using the inverse image formula:
p_{X,Y}(x, y) = Pr(X = x, Y = y)
             = Pr(X = x, X + W = y)
             = Σ_{α,β: α=x, α+β=y} p_{X,W}(α, β)
             = p_{X,W}(x, y − x)
             = p_X(x) p_W(y − x).   (3.72)
Note that this formula only makes sense if y − x is one of the values in the range space of W . Thus from the definition of conditional pmf’s:
p_{Y|X}(y|x) = p_{X,Y}(x, y)/p_X(x) = p_W(y − x),   (3.73)
an answer that should be intuitive: given the input is x, the output will equal a certain value y if and only if the noise exactly makes up the difference, i.e., W = y − x. Note that the marginal pmf for the output Y can be found by summing the joint probability:
p_Y(y) = Σ_x p_{X,Y}(x, y) = Σ_x p_X(x) p_W(y − x),   (3.74)
a formula that is known as a discrete convolution or convolution sum. Anyone familiar with convolutions knows that they can be unpleasant to evaluate, so we postpone further consideration to the next section and turn to the continuous analog.
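Although a closed-form treatment is postponed, the convolution sum (3.74) is easy to evaluate numerically. A minimal sketch, assuming hypothetical pmfs on {0, 1, 2}:

```python
import numpy as np

# Hypothetical pmfs of X and W on {0, 1, 2}; the pmf of Y = X + W is the
# convolution sum p_Y(y) = sum_x p_X(x) p_W(y - x) of (3.74).
p_X = np.array([0.5, 0.3, 0.2])
p_W = np.array([0.7, 0.2, 0.1])

p_Y = np.convolve(p_X, p_W)     # pmf of Y on {0, 1, ..., 4}
print(p_Y, p_Y.sum())           # the probabilities still sum to 1
```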
The above development assumed ordinary arithmetic, but it is worth pointing out that for discrete random variables sometimes other types of arithmetic are appropriate, e.g., modulo 2 arithmetic for binary random variables. The binary example of section 3.8 can be considered as an additive noise example if we define a random variable W which is independent of X and has a pmf p_W(w) = ε^w (1 − ε)^{1−w}; w = 0, 1, and where Y = X + W is interpreted in modulo 2 arithmetic, that is, as Y = X ⊕ W. This additive noise definition is easily seen to yield the conditional pmf of (3.64) and the output pmf via a convolution. To be precise,
p_{X,Y}(x, y) = Pr(X = x, Y = y)
             = Pr(X = x, X ⊕ W = y)
             = Σ_{α,β: α=x, α⊕β=y} p_{X,W}(α, β)
             = p_{X,W}(x, y ⊕ x)
             = p_X(x) p_W(y ⊕ x)   (3.75)
and hence

p_{Y|X}(y|x) = p_{X,Y}(x, y)/p_X(x) = p_W(y ⊕ x)   (3.76)
and

p_Y(y) = Σ_x p_{X,Y}(x, y) = Σ_x p_X(x) p_W(y ⊕ x),   (3.77)
a modulo 2 convolution.
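A minimal numerical sketch of this modulo 2 convolution, with assumed values for ε and Pr(X = 1):

```python
import numpy as np

eps, p = 0.1, 0.3                      # assumed crossover and input probabilities
p_X = np.array([1 - p, p])             # p_X(0), p_X(1)
p_W = np.array([1 - eps, eps])         # p_W(0), p_W(1)

# p_Y(y) = sum_x p_X(x) p_W(y (+) x), with (+) the modulo 2 sum (XOR).
p_Y = np.array([sum(p_X[x] * p_W[y ^ x] for x in (0, 1)) for y in (0, 1)])
print(p_Y)                             # [p_Y(0), p_Y(1)]
```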
Continuous Additive Noise
An entirely analogous formula arises in the continuous case. Again suppose that X is a random variable, a signal, with pdf fX, and that W is a random variable, the noise, with pdf fW. The random variables X and W are assumed to be independent. Form a new random variable Y = X + W, an observed signal plus noise. The problem is to find the conditional pdf's fY|X(y|x) and fX|Y(x|y). The operation of producing an output Y from an input signal X is called an additive noise channel in communications systems. The channel is completely described by fY|X. The second pdf, fX|Y, will prove useful later when we try to estimate X given an observed value of Y.
Independence of X and W implies that the joint pdf is fX,W (x, w) = fX(x)fW (w). To find the needed joint pdf fX,Y , first evaluate the joint cdf and then take the appropriate derivative. The cdf is a straightforward derived distribution problem:
F_{X,Y}(x, y) = Pr(X ≤ x, Y ≤ y)
             = Pr(X ≤ x, X + W ≤ y)
             = ∫∫_{α,β: α≤x, α+β≤y} f_{X,W}(α, β) dα dβ
             = ∫_{−∞}^{x} dα ∫_{−∞}^{y−α} dβ f_X(α) f_W(β)
             = ∫_{−∞}^{x} dα f_X(α) F_W(y − α).
Taking the derivatives yields
f_{X,Y}(x, y) = f_X(x) f_W(y − x)

and hence

f_{Y|X}(y|x) = f_W(y − x).   (3.78)
The marginal pdf for the sum Y = X + W is then found as
f_Y(y) = ∫ f_{X,Y}(x, y) dx = ∫ f_X(x) f_W(y − x) dx,   (3.79)
a convolution integral of the pdf's f_X and f_W, analogous to the convolution sum found when adding independent discrete random variables. Thus the evaluation of the pdf of the sum of two independent continuous random variables is the same as the evaluation of the output of a linear system with an input signal f_X and an impulse response f_W.
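As a quick numerical illustration of (3.79), the sketch below approximates the convolution integral by a Riemann sum for two assumed uniform(0, 1) densities; the result is the triangular pdf on [0, 2].

```python
import numpy as np

dx = 0.001
grid = np.arange(0.0, 1.0, dx)
f_X = np.ones_like(grid)              # uniform(0, 1) pdf (assumed signal)
f_W = np.ones_like(grid)              # uniform(0, 1) pdf (assumed noise)

f_Y = np.convolve(f_X, f_W) * dx      # Riemann-sum approximation of (3.79)
print(f_Y.sum() * dx)                 # integrates to (approximately) 1
```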
We will later see an easy way to accomplish this using transforms. The pdf f_{X|Y} follows from Bayes' rule:
f_{X|Y}(x|y) = f_X(x) f_W(y − x) / [∫ f_X(α) f_W(y − α) dα].   (3.80)
It is instructive to work through the details of the previous example for the special case of Gaussian random variables. For simplicity the means are assumed to be zero and hence it is assumed that f_X is N(0, σ_X²), that f_W is N(0, σ_W²), and that, as in the example, X and W are independent and Y = X + W. From (3.78)
f_{Y|X}(y|x) = f_W(y − x)
             = (1/√(2πσ_W²)) e^{−(y−x)²/(2σ_W²)},   (3.81)
from which the conditional pdf can be immediately recognized as being Gaussian with mean x and variance σ_W², that is, as N(x, σ_W²).
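A short simulation sketch of this fact, with assumed values for the input x and the noise standard deviation:

```python
import numpy as np

rng = np.random.default_rng(1)
x0, sigma_W = 1.5, 0.7                     # assumed input value and noise std
w = rng.normal(0.0, sigma_W, size=500_000)
y_given_x = x0 + w                         # samples of Y conditioned on X = x0
print(y_given_x.mean(), y_given_x.var())   # close to x0 and sigma_W**2
```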
To evaluate the pdf f_{X|Y} using Bayes' rule, we begin with the denominator f_Y of (3.54) and write
f_Y(y) = ∫_{−∞}^{∞} f_{Y|X}(y|α) f_X(α) dα
       = ∫_{−∞}^{∞} (1/√(2πσ_W²)) e^{−(y−α)²/(2σ_W²)} (1/√(2πσ_X²)) e^{−α²/(2σ_X²)} dα   (3.82)
       = (1/(2πσ_X σ_W)) ∫_{−∞}^{∞} e^{−(1/2)[(y² − 2αy + α²)/σ_W² + α²/σ_X²]} dα
       = (e^{−y²/(2σ_W²)}/(2πσ_X σ_W)) [ ∫_{−∞}^{∞} e^{−(1/2)[α²(1/σ_X² + 1/σ_W²) − 2αy/σ_W²]} dα ].   (3.83)
This convolution of two Gaussian "signals" can be accomplished using an old trick called "completing the square." Call the integral in the square brackets at the end of the above equation I and note that the integrand resembles

e^{−(1/2)((α − m)/σ)²}
which we know from (B.15) in appendix B integrates to
∫_{−∞}^{∞} e^{−(1/2)((α − m)/σ)²} dα = √(2πσ²)   (3.84)
since a Gaussian pdf integrates to 1. The trick is to modify I to resemble this integral with an additional factor. Compare the two exponents:
−(1/2)[α²(1/σ_X² + 1/σ_W²) − 2αy/σ_W²]

vs.

−(1/2)((α − m)/σ)² = −(1/2)[α²/σ² − 2αm/σ² + m²/σ²].
The exponent from I will equal the left two terms of the expanded exponent in the known integral if we choose
1/σ² = 1/σ_X² + 1/σ_W²
or, equivalently,

σ² = σ_X²σ_W²/(σ_X² + σ_W²),   (3.85)
and if we choose

y/σ_W² = m/σ²
or, equivalently,

m = (σ²/σ_W²) y.   (3.86)
Using (3.85)–(3.86) we have that

α²(1/σ_X² + 1/σ_W²) − 2αy/σ_W² = ((α − m)/σ)² − m²/σ²,
where the addition and subtraction of the term m²/σ² is what is called "completing the square." With this identification and again using (3.85)–(3.86) we have that
I = ∫_{−∞}^{∞} e^{−(1/2)[((α − m)/σ)² − m²/σ²]} dα
  = √(2πσ²) e^{m²/(2σ²)},   (3.87)
which implies that

f_Y(y) = (e^{−y²/(2σ_W²)}/(2πσ_X σ_W)) √(2πσ²) e^{m²/(2σ²)}
       = (1/√(2π(σ_X² + σ_W²))) e^{−y²/(2(σ_X² + σ_W²))}.   (3.88)
In other words, f_Y is N(0, σ_X² + σ_W²) and we have shown that the sum of two zero mean independent Gaussian random variables is another zero mean Gaussian random variable with variance equal to the sum of the variances of the two random variables being added.
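A quick Monte Carlo check of this result, with assumed variances:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_X, sigma_W = 1.0, 2.0                 # assumed standard deviations
x = rng.normal(0.0, sigma_X, size=1_000_000)
w = rng.normal(0.0, sigma_W, size=1_000_000)
y = x + w                                   # sum of independent Gaussians
print(y.mean(), y.var())                    # near 0 and sigma_X**2 + sigma_W**2
```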
Finally we turn to the a posteriori probability fX|Y . From Bayes’ rule
and a lot of algebra

f_{X|Y}(x|y) = f_{Y|X}(y|x) f_X(x) / f_Y(y)
             = [ (1/√(2πσ_W²)) e^{−(y−x)²/(2σ_W²)} (1/√(2πσ_X²)) e^{−x²/(2σ_X²)} ] / [ (1/√(2π(σ_X² + σ_W²))) e^{−y²/(2(σ_X² + σ_W²))} ]
             = (1/√(2π σ_X²σ_W²/(σ_X² + σ_W²))) e^{−(1/2)[(y² − 2yx + x²)/σ_W² + x²/σ_X² − y²/(σ_X² + σ_W²)]}
             = (1/√(2π σ_X²σ_W²/(σ_X² + σ_W²))) e^{−(x − σ_X²y/(σ_X² + σ_W²))² / (2σ_X²σ_W²/(σ_X² + σ_W²))}.   (3.89)
In words: f_{X|Y}(x|y) is a Gaussian pdf

N( σ_X²y/(σ_X² + σ_W²), σ_X²σ_W²/(σ_X² + σ_W²) ).
The mean of a conditional distribution is called a conditional mean and the variance of a conditional distribution is called a conditional variance.
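A simulation sketch of the conditional mean and variance implied by (3.89), with assumed parameter values; conditioning on Y near a fixed y0 is approximated by keeping only samples with Y in a narrow window.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma_X, sigma_W, y0 = 1.0, 0.5, 0.8                # assumed values
x = rng.normal(0.0, sigma_X, size=2_000_000)
w = rng.normal(0.0, sigma_W, size=2_000_000)
y = x + w
near = np.abs(y - y0) < 0.01                        # samples with Y close to y0
gain = sigma_X**2 / (sigma_X**2 + sigma_W**2)
print(x[near].mean(), gain * y0)                    # conditional mean of (3.89)
print(x[near].var(), gain * sigma_W**2)             # conditional variance
```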
Continuous Additive Noise with Discrete Input
Additive noise provides a situation in which mixed distributions having both discrete and continuous parts naturally arise. Suppose that the signal X is binary, say with pmf p_X(x) = p^x (1 − p)^{1−x}. The noise term W is assumed to be a continuous random variable described by pdf fW(w), independent of X, with variance σ_W². The observation is defined by Y = X + W. In this case the joint distribution is not defined by a joint pmf or a joint pdf, but by a combination of the two. Some thought may lead to the reasonable guess that the continuous observation given the discrete signal should be describable by a conditional pdf fY|X(y|x) = fW(y − x), where now the conditional pdf is of the elementary variety since the given event
has nonzero probability. To prove that this is in fact correct, consider the elementary conditional probability Pr(Y ≤ y|X = x), for x = 0, 1. This is recognizable as the conditional cdf for Y given X = x, so that the desired conditional density is given by
f_{Y|X}(y|x) = (d/dy) Pr(Y ≤ y | X = x).   (3.90)
The required probability is evaluated using the independence of X and W as
Pr(Y ≤ y | X = x) = Pr(X + W ≤ y | X = x)
                  = Pr(x + W ≤ y | X = x)
                  = Pr(W ≤ y − x)
                  = F_W(y − x).
Differentiating gives

f_{Y|X}(y|x) = f_W(y − x).   (3.91)
The joint distribution is described in this case by a combination of a pmf and a pdf. For example, the computation of the joint probability that X ∈ F and Y ∈ G is accomplished by
Pr(X ∈ F and Y ∈ G) = Σ_{x∈F} p_X(x) ∫_G f_{Y|X}(y|x) dy
                    = Σ_{x∈F} p_X(x) ∫_G f_W(y − x) dy.   (3.92)
Choosing F to include all possible values of X yields the output distribution
Pr(Y ∈ G) = Σ_x p_X(x) ∫_G f_{Y|X}(y|x) dy = Σ_x p_X(x) ∫_G f_W(y − x) dy.
Choosing G = (−∞, y] provides a formula for the cdf FY(y), which can be differentiated to yield the output pdf
f_Y(y) = Σ_x p_X(x) f_{Y|X}(y|x) = Σ_x p_X(x) f_W(y − x),   (3.93)
a mixed discrete convolution involving a pmf and a pdf (and exactly the formula one might expect in this mixed situation given the pure discrete and continuous examples).
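The sketch below evaluates (3.93) for a binary signal; purely for illustration the noise pdf f_W is taken to be Gaussian with an assumed variance, and p = Pr(X = 1) is also an assumed value.

```python
import numpy as np

def gauss_pdf(v, sigma):
    # zero-mean Gaussian density, used here as an assumed noise pdf f_W
    return np.exp(-0.5 * (v / sigma) ** 2) / np.sqrt(2.0 * np.pi * sigma**2)

p, sigma_W = 0.3, 0.4                      # assumed Pr(X = 1) and noise std
y = np.linspace(-2.0, 3.0, 6)
# f_Y(y) = sum_x p_X(x) f_W(y - x) = (1 - p) f_W(y) + p f_W(y - 1)
f_Y = (1 - p) * gauss_pdf(y, sigma_W) + p * gauss_pdf(y - 1.0, sigma_W)
print(f_Y)
```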
Continuing the parallel with the pure discrete and continuous cases, one might expect that Bayes’ rule could be used to evaluate the conditional
distribution in the opposite direction, which since X is discrete should be a conditional pmf:
p_{X|Y}(x|y) = f_{Y|X}(y|x) p_X(x) / f_Y(y) = f_{Y|X}(y|x) p_X(x) / [Σ_α p_X(α) f_{Y|X}(y|α)].   (3.94)
Observe that unlike previously treated conditional pmf’s, this one is not an elementary conditional probability since the conditioning event does not have nonzero probability. Thus it cannot be defined in the original manner, but must be justified in the same way as conditional pdf’s, that is, by the fact that we can rewrite the joint distribution (3.92) as
Pr(X ∈ F and Y ∈ G) = ∫_G dy f_Y(y) Pr(X ∈ F | Y = y) = ∫_G dy f_Y(y) Σ_{x∈F} p_{X|Y}(x|y),   (3.95)
so that pX|Y (x|y) indeed plays the role of a mass of conditional probability, that is,
Pr(X ∈ F | Y = y) = Σ_{x∈F} p_{X|Y}(x|y).   (3.96)
Applying these results to the specific case of the binary input and Gaussian noise, the conditional pmf of the binary input given the noisy observation is
p_{X|Y}(x|y) = f_W(y − x) p_X(x) / f_Y(y) = f_W(y − x) p_X(x) / [Σ_α p_X(α) f_W(y − α)];  y ∈ ℜ, x ∈ {0, 1}.   (3.97)
This formula now permits the analysis of a classical problem in communications, the detection of a binary signal in Gaussian noise.
3.10 Binary Detection in Gaussian Noise
The derivation of the MAP detector or classifier extends immediately to the situation of a binary input random variable and independent Gaussian noise just treated. As in the purely discrete case, the MAP detector X̂(y) of X given Y = y is given by
X̂(y) = argmax_x p_{X|Y}(x|y) = argmax_x f_W(y − x) p_X(x) / [Σ_α p_X(α) f_W(y − α)].   (3.98)