
An Introduction to Statistical Signal Processing
Chapter 4
Expectation and Averages
4.1 Averages
In engineering practice we are often interested in the average behavior of measurements on random processes. The goal of this chapter is to link the two distinct types of averages that are used — long-term time averages taken by calculations on an actual physical realization of a random process and averages calculated theoretically by probabilistic averages at some given instant of time, averages that are sometimes called expectations. As we shall see, both computations often (but by no means always) give the same answer. Such results are called laws of large numbers or ergodic theorems.
At first glance from a conceptual point of view, it seems unlikely that long-term time averages and instantaneous probabilistic averages would be the same. If we take a long-term time average of a particular realization of the random process, say {X(t, ω0); t ∈ T}, we are averaging for a particular ω0 — an ω which we cannot know or choose; we do not use probability in any way, and we are ignoring what happens with other values of ω. Here the averages are computed by summing the sequence or integrating the waveform over t while ω0 stays fixed. If, on the other hand, we take an instantaneous probabilistic average, say at the time t0, we are taking a probabilistic average and summing or integrating over ω for the random variable X(t0, ω). Thus we have two averages, one along the time axis with ω fixed, the other along the ω axis with time fixed. It seems that there should be no reason for the answers to agree. Taking a more practical point of view, however, it seems that the time and probabilistic averages must be the same in many situations. For example, suppose that you measure the percentage of time that a particular noise voltage exceeds 10 volts. If you make the measurement over a sufficiently long period of time,
the result should be a reasonably good estimate of the probability that the noise voltage exceeds 10 volts at any given instant of time — a probabilistic average value.
To proceed further, for simplicity we concentrate on a discrete alphabet, discrete time random process. Other cases are considered by converting appropriate sums into integrals. Let {Xn} be an arbitrary discrete alphabet, discrete time process. Since the process is random, we cannot predict accurately its instantaneous or short-term behavior — we can only make probabilistic statements. Based on experience with coins, dice, and roulette wheels, however, one expects that the long-term average behavior can be characterized with more accuracy. For example, if one flips a fair coin, short sequences of flips are unpredictable. However, if one flips long enough, one would expect to have an average of about 50% of the flips result in heads. This is a time average of an instantaneous function of a random process — a type of counting function that we will consider extensively. It is obvious that there are many functions that we can average, e.g., the average value, the average power, etc. We will proceed by defining one particular average, the sample average value of the random process, which is formulated as
Sn = (1/n) Σ_{i=0}^{n−1} Xi ;  n = 1, 2, 3, . . .
We will investigate the behavior of Sn for large n, i.e., for a long-term time average. Thus, for example, if the random process {Xn} is the coin-flipping model, the binary process with alphabet {0, 1}, then Sn is the number of 1’s divided by the total number of flips — the fraction of flips that produced a 1. As noted before, Sn should be close to 50% for large n if the coin is fair.
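This convergence is easy to simulate. The sketch below (plain Python; the function name, seed, and flip count are illustrative choices, not from the text) generates fair coin flips and prints the sample average Sn for increasing n:

```python
import random

# Simulate the fair-coin model with alphabet {0, 1} and compute the
# sample average S_n = (1/n) * sum_{i=0}^{n-1} X_i.
random.seed(0)  # fixed seed so the run is repeatable

def sample_average(flips):
    """Return S_n for the first n = len(flips) outcomes."""
    return sum(flips) / len(flips)

flips = [random.randint(0, 1) for _ in range(100_000)]
for n in (10, 1000, 100_000):
    print(n, sample_average(flips[:n]))   # drifts toward 0.5 as n grows
```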
Note that, as in example [3.7], for each n, Sn is a random variable that is defined on the same probability space as the random process {Xn}. This is made explicit by writing the ω dependence:
Sn(ω) = (1/n) Σ_{k=0}^{n−1} Xk(ω) .
In more direct analogy to example [3.7], we can consider the {Xn} as coordinate functions on a sequence space, say (A^Z, B(A)^Z, m), where m is the distribution of the process, in which case Sn is defined directly on the sequence space. The form of definition is simply a matter of semantics or convenience. Observe, however, that in any case {Sn; n = 1, 2, . . . } is itself a random process since it is an indexed family of random variables defined on a probability space.
For the discrete alphabet random process that we are considering, we can rewrite the sum in another form by grouping together all equal terms:
Sn(ω) = Σ_{a∈A} a ra(n)(ω)        (4.1)
where A is the range space of the discrete alphabet random variable Xn and ra(n)(ω) = (1/n) [number of occurrences of the letter a in {Xi(ω); i = 0, 1, 2, . . . , n − 1}]. The random variable ra(n) is called the nth-order relative frequency of the symbol a. Note that for the binary coin flipping example we have considered, A = {0, 1}, and Sn(ω) = r1(n)(ω), the average number of heads in the first n flips. In other words, for the binary coin-flipping example, the sample average and the relative frequency of heads are the same quantity. More generally, the reader should note that ra(n) can always be written as the sample average of the indicator function for a, 1a(x):
ra(n) = (1/n) Σ_{i=0}^{n−1} 1a(Xi) ,
where

1a(x) = 1 if x = a, and 0 otherwise.
Note that 1{a} is a more precise, but more clumsy, notation for the indicator function of the singleton set {a}. We shall use the shorter form here.
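The regrouping that leads to (4.1) can be checked on a short sample path. The sketch below (the names and the sample path are illustrative) computes each ra(n) as the sample average of the indicator 1a and confirms that the direct sample average equals the sum weighted by relative frequencies:

```python
# Relative frequency r_a^(n) as a sample average of the indicator 1_a,
# and the regrouped sum (4.1). The sample path xs is illustrative.
def indicator(a, x):
    return 1 if x == a else 0

def rel_freq(a, xs):
    """r_a^(n) = (1/n) * sum_i 1_a(X_i)."""
    return sum(indicator(a, x) for x in xs) / len(xs)

xs = [0, 1, 1, 2, 1, 0, 2, 2, 2, 1]       # sample path, alphabet {0, 1, 2}
alphabet = sorted(set(xs))

S_n = sum(xs) / len(xs)                                   # direct sample average
S_n_grouped = sum(a * rel_freq(a, xs) for a in alphabet)  # form (4.1)
print(S_n, S_n_grouped)                                   # both are 1.2
```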
Let us now assume that all of the marginal pmf’s of the given process are the same, say pX(x), x ∈ A. Based on intuition and gambling experience, one might suspect that as n goes to infinity, the relative frequency of a symbol a should go to its probability of occurrence, pX(a). To continue the example of binary coin flipping, the relative frequency of heads in n tosses of a fair coin should tend to 1/2 as n → ∞. If these statements are true, that is, if in some sense,
ra(n) → pX(a) as n → ∞ ,        (4.2)

then it follows that in a similar sense

Sn → Σ_{a∈A} a pX(a) as n → ∞ ,        (4.3)
the same expression as (4.1) with the relative frequency replaced by the pmf. The formula on the right is an example of an expectation of a random variable, a weighted average with respect to a probability measure. The

formula should be recognized as a special case of the definition of expectation of (2.34), where the pmf is pX and g(x) = x, the identity function. The previous plausibility argument motivates studying such weighted averages because they will characterize the limiting behavior of time averages in the same way that probabilities characterize the limiting behavior of relative frequencies.
Limiting statements of the form of (4.2) and (4.3) are called laws of large numbers or ergodic theorems. They relate long-run sample averages or time average behavior to probabilistic calculations made at any given instant of time. It is obvious that such laws or theorems do not always hold. If the coin we are flipping wears in a known fashion with time so that the probability of a head changes, then one could hardly expect that the relative frequency of heads would equal the probability of heads at time zero.
In order to make precise statements and to develop conditions under which the laws or theorems do hold, we first need to develop the properties of the quantities on the right-hand side of (4.2) and (4.3). In particular, we
cannot at this point make any sense out of a statement like “lim_{n→∞} Sn = Σ_{a∈A} a pX(a),” since we have no definition for such a limit of random variables
or functions of random variables. It is obvious, however, that the usual definition of a limit used in calculus will not do, because Sn is a random variable, albeit a random variable whose “randomness” decreases in some sense with increasing n. Thus the limit must be defined in some fashion that involves probability. Such limits are deferred to a later section, and we begin by looking at the definitions and calculus of expectations.
4.2 Expectation
Given a discrete alphabet random variable X specified by a pmf pX, define the expected value, probabilistic average, or mean of X by
E(X) = Σ_{x∈A} x pX(x) .        (4.4)
The expectation is also denoted by EX or E[X] or by an overbar, as X̄. The expectation is also sometimes called an ensemble average to denote averaging across the ensemble of sequences that is generated for different values of ω at a given instant of time.
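Definition (4.4) translates directly into a small helper. In the sketch below the pmf is stored as a dictionary; the fair-die pmf is an illustrative example, not one from the text:

```python
# E(X) = sum over x in A of x * p_X(x), as in (4.4).
def expectation(pmf):
    """pmf: dict mapping alphabet values x to probabilities p_X(x)."""
    assert abs(sum(pmf.values()) - 1.0) < 1e-9   # sanity check: pmf sums to 1
    return sum(x * p for x, p in pmf.items())

die = {x: 1/6 for x in range(1, 7)}   # fair six-sided die
print(expectation(die))               # close to 3.5
```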
The astute reader might note that we have really provided two definitions of the expectation of X. The definition of (4.4) has already been noted to be a special case of (2.34) with pmf pX and function g(x) = x.
Alternatively, we could use (2.34) in a more fundamental form and consider g(ω) = X(ω) as a function defined on an underlying probability space described by a pmf p or a pdf f, in which case (2.34) or (2.57) provides a different formula for finding the expectation in terms of the original probability function:
E(X) = Σ_ω X(ω) p(ω)        (4.5)

if the original space is discrete, or

E(X) = ∫ X(r) f(r) dr        (4.6)
if it is described by a pdf. Are these two versions consistent? The answer is yes, as will be proved soon by the fundamental theorem of expectation. The equivalence of these forms is essentially a change of variables formula.
The mean of a random variable is a weighted average of the possible values of the random variable with the pmf used as a weighting. Before continuing, observe that we can define an analogous quantity for a continuous random variable possessing a pdf: If the random variable X is described by a pdf fX, then we define the expectation of X by
EX = ∫ x fX(x) dx ,        (4.7)
where we have replaced the sum by an integral. Analogous to the discrete case, this formula is a special case of (2.57) with pdf f = fX and g being the identity function. We can also use (2.57) to express the expectation in terms of an underlying pdf, say f, with g = X by the formula
EX = ∫ X(r) f(r) dr .        (4.8)
The equivalence of these two formulas will be considered when the fundamental theorem of expectation is treated.
While the integral does not have the intuitive motivation involving a relative frequency converging to a pmf that the earlier sum did, we shall see that it plays the analogous role in the laws of large numbers. Roughly speaking, this is because continuous random variables can be approximated by discrete random variables arbitrarily closely by very fine quantization. Through this procedure, the integrals with pdfs are approximated by sums with pmf’s and the discrete alphabet results imply the continuous alphabet results by taking appropriate limits. Because of the direct analogy, we shall develop the properties of expectations for continuous random variables along with those for discrete alphabet random variables. Note in passing
192 |
CHAPTER 4. EXPECTATION AND AVERAGES |
that, analogous to using the Stieltjes integral as a unified notation for sums and integrals when computing probabilities, the same thing can be done for expectations. If FX is the cdf of a random variable X, define
EX = ∫ x dFX(x) =  Σ_x x pX(x)   if X is discrete,
                   ∫ x fX(x) dx  if X has a pdf.
In a similar manner, we can define the expectation of a mixture random variable having both continuous and discrete parts in a manner analogous to (3.36).
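The quantization argument mentioned above can also be seen numerically: partitioning the real line into fine bins replaces the integral against a pdf by a sum against a pmf. In the sketch below the uniform pdf on [0, 1] and the bin count are illustrative choices:

```python
# Approximate EX = integral of x f_X(x) dx by quantizing X to bin centers,
# so each bin carries pmf mass roughly f_X(center) * width.
def quantized_expectation(pdf, lo, hi, bins):
    width = (hi - lo) / bins
    total = 0.0
    for k in range(bins):
        center = lo + (k + 0.5) * width
        total += center * pdf(center) * width   # x * p_X(x)
    return total

uniform_pdf = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
print(quantized_expectation(uniform_pdf, 0.0, 1.0, 1000))   # close to 1/2
```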
4.2.1 Examples: Expectation
The following examples provide some typical expectation computations.
[4.1] As a slight generalization of the fair coin flip, consider the more general binary pmf with parameter p; that is, pX(1) = p and pX(0) = 1 − p. In this case
EX = Σ_{x=0}^{1} x pX(x) = 0 · (1 − p) + 1 · p = p .
It is interesting to note that in this example, as is generally true for discrete random variables, EX is not necessarily in the alphabet of the random variable; i.e., EX ≠ 0 or 1 unless p = 0 or 1.
[4.2] A more complicated discrete example is a geometric random variable. In this case
EX = Σ_{k=1}^{∞} k pX(k) = Σ_{k=1}^{∞} k p (1 − p)^{k−1} ,

a sum evaluated in (2.48) as 1/p.
[4.3] As an example of a continuous random variable, assume that X is a uniform random variable on [0, 1], that is, that its density is one on [0, 1]. Here
EX = ∫_0^1 x fX(x) dx = ∫_0^1 x dx = 1/2 ,

an integral evaluated in (2.67).

[4.4] If X is an exponentially distributed random variable with parameter λ, then from (2.71)
EX = ∫_0^∞ r λ e^{−λr} dr = 1/λ .        (4.9)
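The means computed in examples [4.1], [4.2], and [4.4] can be checked numerically. In the sketch below the parameter values (p = 0.3, λ = 2) and the truncation points are illustrative choices:

```python
import math

p, lam = 0.3, 2.0

# [4.1] binary pmf: EX = 0*(1 - p) + 1*p = p
bernoulli_mean = 0 * (1 - p) + 1 * p

# [4.2] geometric pmf: sum_{k>=1} k p (1 - p)^(k-1), truncated; tends to 1/p
geometric_mean = sum(k * p * (1 - p) ** (k - 1) for k in range(1, 500))

# [4.4] exponential pdf: integral of r lam e^{-lam r} dr, discretized; 1/lam
dr = 1e-4
exponential_mean = sum(i * dr * lam * math.exp(-lam * i * dr) * dr
                       for i in range(1, 200_000))

print(bernoulli_mean, geometric_mean, exponential_mean)   # near p, 1/p, 1/lam
```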
In some cases expectations can be found virtually by inspection. For example, if X has an even pdf fX — that is, if fX(−x) = fX(x) for all x — then if the integral exists, EX = 0, since xfX(x) is an odd function and hence has a zero integral. The assumption that the integral exists is necessary because not all even functions are integrable. For example, suppose that we have a pdf fX(x) = c/x² for all |x| ≥ 1, where c is a normalization constant. Then it is not true that EX is zero, even though the pdf is even, because the Riemann integral
∫_{x: |x|≥1} (x/x²) dx
does not exist. (The puzzled reader should review the definition of improper integrals. Their existence requires that the limit
lim_{T→∞} lim_{S→∞} ∫_{−T}^{S} x fX(x) dx
exists regardless of how T and S tend to infinity; in particular, the existence of the limit with the constraint T = S is not sufficient for the existence of the integral. These limits do not exist for the given example because 1/x is not integrable on [1, ∞).) Nonetheless, it is convenient to set EX to 0 in this example because of the obvious intuitive interpretation.
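The dependence on how T and S grow can be made concrete. For this pdf, c = 1/2 (so the density integrates to one), and the truncated integral of x fX(x) over [−T, −1] ∪ [1, S] has the closed form c ln S − c ln T. The sketch below (illustrative names) shows that the symmetric truncation is always zero while unbalanced truncations are not:

```python
import math

# For f_X(x) = c/x^2 on |x| >= 1 with c = 1/2, the truncated integral of
# x f_X(x) over [-T, -1] and [1, S] is c*(ln S - ln T).
c = 0.5

def truncated_EX(T, S):
    return c * (math.log(S) - math.log(T))

print(truncated_EX(1e6, 1e6))   # 0.0: the tails cancel when T = S
print(truncated_EX(10.0, 1e6))  # nonzero, and it grows as S outpaces T
```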
Sometimes the pdf is an even function about some nonzero value, that is, fX(x + m) = fX(x − m), where m is some constant. In this case, it is easily seen that if the expectation exists, then EX = m, as the reader can quickly verify by a change of variable in the integral defining the expectation. The most important example of this is the Gaussian pdf, which is even about the constant m.
The same conclusions also obviously hold for an even pmf.

4.3 Expectations of Functions of Random Variables

In addition to the expectation of a given random variable, we will often be interested in the expectations of other random variables formed as functions of the given one. In the beginning of the chapter we introduced the relative frequency function, ra(n), which counts the relative number of occurrences of the value a in a sequence of n terms. We are interested in its expected value and in the expected value of the indicator function that appears in the expression for
ra(n). More generally, given a random variable X and a function g : ℝ → ℝ, we might wish to find the expectation of the random variable Y = g(X). If X corresponds to a voltage measurement and g is a simple squaring operation, g(X) = X², then g(X) provides the instantaneous energy across a unit resistor. Its expected value, then, represents the probabilistic average energy. More generally than the square of a random variable, the moments of a random variable X are defined by E[X^k] for k = 1, 2, . . . . The mean is the first moment, the mean square is the second moment, and so on. Moments are often useful as general parameters of a distribution, providing information on its shape without requiring the complete pdf or pmf. Some distributions are completely characterized by a few moments. It is often useful to consider moments of a “centralized” random variable formed by removing its mean. The kth centralized moment is defined by E[(X − E(X))^k]. Of particular interest is the second centralized moment or variance σ² ≜ E[(X − E(X))²].
Other functions that are of interest are indicator functions of a set, 1F(x) = 1 if x ∈ F and 0 otherwise, so that 1F(X) is a binary random variable indicating whether or not the value of X lies in F, and complex exponentials e^{juX}.
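The moments and the variance defined above translate directly into code. The pmf in the sketch below is an illustrative example:

```python
# kth moment E[X^k] and variance E[(X - E(X))^2], computed from a pmf.
def moment(pmf, k):
    return sum((x ** k) * p for x, p in pmf.items())

def variance(pmf):
    m = moment(pmf, 1)   # the mean is the first moment
    return sum(((x - m) ** 2) * p for x, p in pmf.items())

pmf = {0: 0.25, 1: 0.5, 2: 0.25}
print(moment(pmf, 1), moment(pmf, 2), variance(pmf))   # 1.0 1.5 0.5
```

Note that the variance also equals the second moment minus the square of the mean, a standard identity the test below checks.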
Expectations of functions of random variables were defined in this chapter in terms of the derived distribution for the new random variable. In chapter 2, however, they were defined in terms of the original pmf or pdf in the underlying probability space, a formula not requiring that the new distribution be derived. We next show that the two formulas are consistent. First consider finding the expectation of Y by using derived distribution techniques to find the probability function for Y and then using the definition of expectation to evaluate EY. Specifically, if X is discrete, the pmf
for Y is found as before as

pY(y) = Σ_{x: g(x)=y} pX(x) ,   y ∈ AY .

EY is then found as

EY = Σ_{y∈AY} y pY(y) .
Although it is straightforward to find the probability function for Y , it can be a nuisance if it is being found only as a step in the evaluation of the expectation EY = Eg(X). A second and easier method of finding EY is normally used. Looking at the formula for EX, it seems intuitively obvious that E(g(X)) should result if x is replaced by g(x). This can be proved by the following simple procedure. Starting with the pmf for Y , then substituting for its expression in terms of the pmf of X and reordering the summation, the expectation of Y is found directly from the pmf for X
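The consistency of the two routes, deriving pY first versus summing over the pmf of X directly, can be checked on a small example; the pmf and the function g below are illustrative:

```python
from collections import defaultdict

# Route 1: derive p_Y(y) = sum of p_X(x) over {x : g(x) = y}, then sum y p_Y(y).
# Route 2: sum g(x) p_X(x) directly, with no derived distribution.
p_X = {-2: 0.2, -1: 0.3, 1: 0.3, 2: 0.2}
g = lambda x: x * x

p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p
EY_derived = sum(y * p for y, p in p_Y.items())

EY_direct = sum(g(x) * p for x, p in p_X.items())
print(EY_derived, EY_direct)   # the two agree
```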