
(5.42), yet which are still “physically realizable” in the sense that simple models describe the processes. Consider the following example suggested to the authors by A.B. Balakrishnan: Let X be a zero-mean Gaussian random variable with variance 1 and let Θ be a random variable with a uniform distribution on [−π, π) that is independent of X. Define the random process Y(t) = cos(Xt − Θ). Then, analogous to the development of the autocorrelation function for linear modulation, we have that
E[Y(t)] = E[cos(Xt − Θ)] = 0

RY(τ) = E[cos(Xt − Θ) cos(X(t − τ) − Θ)]
      = (1/2) E[cos(Xτ)]
      = (1/4) E[e^{jτX} + e^{−jτX}]
      = (1/4) (MX(jτ) + MX(−jτ))
      = (1/2) e^{−τ²/2} ,

so that the power spectral density is

SY(f) = (1/2) √(2π) e^{−(2πf)²/2} ,   (5.43)
which fails to meet the Paley-Wiener criterion.
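As a sanity check of the autocorrelation just derived, one can draw many independent pairs (X, Θ), form samples of Y(t) = cos(Xt − Θ), and compare the empirical average of Y(t)Y(t − τ) with (1/2)e^{−τ²/2}. The following Python sketch is purely illustrative; the sample sizes and variable names are our own choices, not part of the text.

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 200_000
    X = rng.standard_normal(trials)              # X ~ N(0, 1)
    Theta = rng.uniform(-np.pi, np.pi, trials)   # Theta uniform on [-pi, pi), independent of X

    t = 1.7                                      # any fixed time; the answer depends only on tau
    for tau in (0.0, 0.5, 1.0, 2.0):
        empirical = np.mean(np.cos(X * t - Theta) * np.cos(X * (t - tau) - Theta))
        theoretical = 0.5 * np.exp(-tau ** 2 / 2)
        print(tau, round(empirical, 4), round(theoretical, 4))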
5.7 Time-Averages
Recall the definitions of the mean, autocorrelation, and covariance as expectations of samples of a weakly stationary random process {Xn; n ∈ Z}:
m = E[Xn]
RX(k) = E[Xn (Xn−k)*]
KX(k) = E[(Xn − m)(Xn−k − m)*] = RX(k) − |m|² .
These are collectively considered as the second-order moments of the process. The corresponding time-average moments can be described if the
limits are assumed to exist in some suitable sense:
M = ⟨Xn⟩ = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} Xn

R̄X(k) = ⟨Xn (Xn−k)*⟩ = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} Xn (Xn−k)*

K̄X(k) = ⟨(Xn − m)(Xn−k − m)*⟩ = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} (Xn − m)(Xn−k − m)* .
Keep in mind that these quantities, if they exist at all, are random variables. For example, if we actually view a sample function {Xn(ω); n ∈ Z}, then the sample autocorrelation is
R̄X(k) = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} Xn(ω) (Xn−k(ω))* ,
also a function of the sample point ω and hence a random variable. Of particular interest is the autocorrelation for 0 lag:
PX = R̄X(0) = lim_{N→∞} (1/N) Σ_{n=0}^{N−1} |Xn|² ,
which can be considered as the sample or time average power of the sample function in the sense that it is the average power dissipated in a unit resistance if Xn corresponds to a voltage.
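To see concretely that these time averages are random variables, one can compute the finite-N average mean and average power for several independently drawn sample functions and watch the spread across sample functions shrink as N grows. A minimal Python sketch; the process and parameters are our own illustration, not from the text.

    import numpy as np

    rng = np.random.default_rng(1)

    def time_averages(x):
        """Finite-N time-average mean and time-average power (1/N) sum |X_n|^2."""
        return x.mean(), np.mean(np.abs(x) ** 2)

    # Several independent sample functions of an iid N(0, 1) process (m = 0, R_X(0) = 1).
    for N in (100, 10_000):
        stats = np.array([time_averages(rng.standard_normal(N)) for _ in range(8)])
        # Each row holds (time-average mean, time-average power) for one sample point omega;
        # the rows differ (the averages are random variables) but cluster around 0 and 1
        # more tightly for the larger N.
        print(N)
        print(np.round(stats, 3))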
Analogous to the expectations, the time-average autocorrelation function and the time-average covariance function are related by
K̄X(k) = R̄X(k) − |M|² .   (5.44)
In fact, subject to suitable technical conditions as described in the laws of large numbers, the time averages should be the same as the expectations, that is, under suitable conditions a weakly stationary random process {Xn} should have the properties that
m = M
RX(k) = R̄X(k)
KX(k) = K̄X(k),
which provides a suggestion of how the expectations can be estimated in practice. Typically the actual moments are not known a priori, but the
random process is observed over a finite time N and the results used to estimate the moments, e.g., the sample mean
MN = (1/N) Σ_{n=0}^{N−1} Xn
and the sample autocorrelation function
N−1
RN (k) = N1 n=0 XnXn−k
provide intuitive estimates of the actual moments which should converge to the true moments as N → ∞.
There are in fact many ways to estimate second-order moments, and there is a wide literature on the subject. For example, the observed samples may be weighted or “windowed” so as to diminish the impact of samples in the distant past or near the borders of separate blocks of data that are handled separately. The literature on estimating correlations and covariances is particularly rich in the speech processing area.
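As one concrete illustration of such estimators, the sketch below computes the sample mean, the plain sample autocorrelation, and an exponentially weighted variant in which samples from the distant past are down-weighted. The particular window and all names here are our own choices; they represent only one of the many possibilities mentioned above.

    import numpy as np

    def sample_moments(x, max_lag):
        """Sample mean and sample autocorrelation R_N(k) for k = 0, ..., max_lag."""
        n = len(x)
        mean = x.mean()
        r = np.array([np.sum(x[k:] * np.conj(x[:n - k])) / n for k in range(max_lag + 1)])
        return mean, r

    def windowed_autocorrelation(x, max_lag, lam=0.99):
        """Exponentially weighted estimate: recent samples count more than old ones."""
        n = len(x)
        w = lam ** np.arange(n - 1, -1, -1)       # weight decays with the age of the sample
        r = []
        for k in range(max_lag + 1):
            wk = w[k:]
            r.append(np.sum(wk * x[k:] * np.conj(x[:n - k])) / np.sum(wk))
        return np.array(r)

    rng = np.random.default_rng(2)
    x = rng.standard_normal(10_000)               # iid N(0, 1): R_X(0) = 1, R_X(k) = 0 otherwise
    m_hat, r_hat = sample_moments(x, max_lag=3)
    print(m_hat, r_hat)                           # roughly 0 and roughly [1, 0, 0, 0]
    print(windowed_autocorrelation(x, max_lag=3))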
If the process meets the conditions of the law of large numbers, then its sample average power PX will be RX(0), which is typically some nonzero
positive number. But if the limit lim_{N→∞} (1/N) Σ_{n=0}^{N−1} |Xn|² is not zero, then observe that necessarily the limit

lim_{N→∞} Σ_{n=0}^{N−1} |Xn|² = Σ_{n=0}^{∞} |Xn|²
must blow up since it lacks the normalizing N in the denominator. In other words, a sample function with nonzero average power will have infinite energy. The point of this observation is that a sample function from a perfectly reasonable random process will not meet the conditions for the existence of a Fourier transform, which suggests we might not be able to apply the considerable theory of Fourier analysis when considering the behavior of random processes in linear systems. Happily this is not the case, but Fourier analysis of random processes will be somewhat different from (as well as similar to) the Fourier analysis of deterministic finite-energy signals and of deterministic periodic signals.
To motivate a possible remedy, first “window” the sample signal {Xn; n ∈ Z} to form a new signal {Xn(N); n ∈ Z} defined by

Xn(N) = Xn if 0 ≤ n ≤ N − 1, and Xn(N) = 0 otherwise.   (5.45)

The new random process {Xn(N)} clearly has finite energy (and is absolutely summable), so it has a Fourier transform in the usual sense, which can be defined as
XN(f) = Σ_{n=−∞}^{∞} Xn(N) e^{−j2πfn} = Σ_{n=0}^{N−1} Xn e^{−j2πfn} ,
which is the Fourier transform or spectrum of the truncated sample signal. Keep in mind that this is a random variable; it depends on the underlying sample point ω through the sample waveform selected. From Parseval’s (or Plancherel’s) theorem, the energy in the truncated signal can be evaluated from the spectrum as
EN = Σ_{n=0}^{N−1} |Xn|² = ∫_{−1/2}^{1/2} |XN(f)|² df .   (5.46)
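The identity (5.46) can be checked numerically for any finite record: sampling |XN(f)|² on a sufficiently fine uniform frequency grid (an M-point DFT with M ≥ N) turns the integral into an exact sum. A small Python sketch, with our own record and sizes:

    import numpy as np

    rng = np.random.default_rng(3)
    N = 64
    x = rng.standard_normal(N)                    # one truncated record X_0, ..., X_{N-1}

    energy_time = np.sum(np.abs(x) ** 2)          # E_N = sum |X_n|^2

    # The integral of |X_N(f)|^2 over [-1/2, 1/2) equals (1/M) sum_k |DFT_M(x)[k]|^2
    # exactly whenever M >= N (here M = 4N).
    M = 4 * N
    Xf = np.fft.fft(x, n=M)
    energy_freq = np.sum(np.abs(Xf) ** 2) / M

    print(energy_time, energy_freq)               # the two agree to machine precision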
The average power is obtained by normalizing the energy by the time duration N:
PN = (1/N) Σ_{n=0}^{N−1} |Xn|² = (1/N) ∫_{−1/2}^{1/2} |XN(f)|² df .   (5.47)
Because of this formula, |XN(f)|²/N can be considered the power spectral density of the truncated waveform: analogous to a probability density or a mass density, it is a nonnegative function which, when integrated, gives the power. Unfortunately it gives only the power spectral density of a particular truncated sample function, when what is really desired is a notion of power spectral density for the entire random process. An alternative definition of power spectral density resolves these two issues by taking the expectation to get rid of the randomness and the limit to look at the entire signal; that is, the average power spectral density is defined as the limit (if it exists)
lim_{N→∞} (1/N) E(|XN(f)|²) .
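The expectation in this definition can be approximated by averaging |XN(f)|²/N over many independently generated records, which is the basic averaged-periodogram estimate of a power spectral density. The sketch below uses a made-up moving-average process whose spectral density is known in closed form; everything in it is our own illustration, not from the text.

    import numpy as np

    rng = np.random.default_rng(4)
    N, records, M = 256, 500, 1024                # record length, number of records, FFT size

    def sample_record(n):
        """One record of X_n = W_n + 0.8 W_{n-1} with W_n iid N(0, 1) (an MA(1) process)."""
        w = rng.standard_normal(n + 1)
        return w[1:] + 0.8 * w[:-1]

    # Average |X_N(f)|^2 / N over independent records to approximate E(|X_N(f)|^2) / N.
    psd_est = np.zeros(M)
    for _ in range(records):
        x = sample_record(N)
        psd_est += np.abs(np.fft.fft(x, n=M)) ** 2 / N
    psd_est /= records

    f = np.fft.fftfreq(M)                         # frequencies in [-1/2, 1/2)
    psd_true = np.abs(1 + 0.8 * np.exp(-2j * np.pi * f)) ** 2   # true PSD of this MA(1) process
    print(np.max(np.abs(psd_est - psd_true)))     # shrinks as N and the number of records grow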


5.8 Differentiating Random Processes
Suppose now that we have a continuous time random process {X(t)} and we form a new random process {Y (t)} by differentiating; that is,
Y(t) = dX(t)/dt .
In this section we will take {X(t)} to be a zero-mean random process for simplicity. Results for nonzero-mean processes are found by noting that X(t) can be written as the sum of a zero-mean random process plus the mean function m(t). That is, we can write X(t) = X0(t) + m(t) where X0(t) = X(t) − m(t). Then, the derivative of X(t) is the derivative of a zero-mean random process plus the derivative of the mean function. The derivative of the mean function is a derivative in the usual sense and hence provides no special problems.
To be strictly correct, there is a problem in interpreting what the derivative means when the thing being differentiated is a random process. A derivative is defined as a limit, and as we found in Chapter 4, there are several notions of limits of sequences of random variables. Care is required because the limit may exist in one sense but not necessarily in another. In particular, two natural definitions for the derivative of a random process correspond to convergence with probability one and convergence in mean square. As a first possibility we could assume that each sample function of Y(t) is obtained by differentiating each sample function of X(t); that is, we could use ordinary differentiation on the sample functions. This gives us a definition of the form
Y(t, ω) = (d/dt) X(t, ω)
        = lim_{∆t→0} [X(t + ∆t, ω) − X(t, ω)] / ∆t .
If P({ω : the limit exists}) = 1, then the definition of differentiation corresponds to convergence with probability one. Alternatively, we could define
Y(t) as a limit in quadratic mean of the random variables [X(t + ∆t) − X(t)]/∆t
as ∆t goes to zero (which does not require that the derivative exist with probability one on sample functions). With this definition we obtain
Y(t) = l.i.m._{∆t→0} [X(t + ∆t) − X(t)] / ∆t .
Clearly a choice of definition of derivative must be made in order to develop a theory for this simple problem and, more generally, for linear systems described by differential equations. We will completely avoid the issue here by sketching a development with the assumption that all of the derivatives

exist as required. We will blithely ignore careful specification of conditions under which the formulas make sense. (Mathematicians sometimes refer to such derivations as formal developments: techniques are used as if they were applicable, just to see what happens. This often provides the answer to a problem, which, once known, can then be proved rigorously to be correct.)
Although we make no attempt to prove it, the result we will obtain can be shown to hold under sufficient regularity conditions on the process. In engineering applications these regularity conditions are almost always either satisfied, or if they are not satisfied, the answers that we obtain can be applied anyway, with care.
Formally define a process {Y∆t(t)} for a fixed ∆t as the following difference, which approximates the derivative of X(t):
Y∆t(t) = [X(t + ∆t) − X(t)] / ∆t .
This difference process is perfectly well defined for any fixed ∆t > 0 and in some sense it should converge to the desired Y(t) as ∆t → 0. We can easily find the following correlation:
E[Y∆t(t) Y∆s(s)]
  = E[ ([X(t + ∆t) − X(t)] / ∆t) ([X(s + ∆s) − X(s)] / ∆s) ]
  = [RX(t + ∆t, s + ∆s) − RX(t + ∆t, s) − RX(t, s + ∆s) + RX(t, s)] / (∆t ∆s) .
If we now (formally) let ∆t and ∆s go to zero, then, if the various limits exist, this formula becomes
RY(t, s) = ∂²RX(t, s) / ∂t ∂s .   (5.50)
As previously remarked, we will not try to specify complete conditions under which this sleight of hand can be made rigorous. Suffice it to say that if the conditions on the X process are sufficiently strong, the formula is valid. Intuitively, since differentiation and expectation are linear operations, the formula follows from the assumption that the linear operations commute, as they usually do. There are, however, serious issues of existence involved in making the proof precise.
One obvious regularity condition to apply is that the double derivative of (5.50) exists. If it does and the processes are weakly stationary, then we can transform (5.50) by using the property of Fourier transforms that differentiation in the time domain corresponds to multiplication by j2πf in the frequency domain. Then for the double derivative to exist we obtain the
requirement that the spectral density of {X(t)} have a finite second moment; i.e., if SY(f) = (2πf)²SX(f), then
∫_{−∞}^{∞} SY(f) df < ∞ .   (5.51)
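The relation SY(f) = (2πf)²SX(f) used above is the weakly stationary specialization of (5.50); the following short worked step is our own presentation of the standard argument, not part of the text. If RX(t, s) = RX(t − s), then differentiating once in s and once in t gives

∂²RX(t − s) / ∂t ∂s = −RX''(t − s) ,

so that RY(τ) = −d²RX(τ)/dτ². Since differentiation in τ corresponds to multiplication by j2πf under the e^{−j2πfτ} convention,

SY(f) = −(j2πf)² SX(f) = (2πf)² SX(f) .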
As a rather paradoxical application of (5.50), suppose that we have a one-sided continuous time Gaussian random process {X(t); t ≥ 0} that has zero mean and a covariance function that is the continuous time analog of example [5.3]; that is, KX(t, s) = σ² min(t, s). (The Kolmogorov construction guarantees that there is such a random process; that is, that it is well defined.) This process is known as the continuous time Wiener process, a process that we will encounter again in the next chapter. Strictly speaking, the double derivative of this function does not exist because the first partial derivative of the function has a step discontinuity at t = s. From engineering intuition, however, the derivative of such a step discontinuity is an impulse, suggesting that
RY(t, s) = σ² δ(t − s) ,
the covariance function for Gaussian white noise! Because of this formal relation, Gaussian white noise is sometimes described as the formal derivative of a Wiener process. We have to play loose with mathematics to find this result, and the sloppiness cannot be removed in a straightforward manner. In fact, it is known from the theory of Wiener processes that they have the strange attribute of producing with probability one sample waveforms that are continuous but nowhere differentiable! Thus we are considering white noise as the derivative of a process that is not differentiable. In a sense, however, this is a useful intuition that is consistent with the extremely pathological behavior of sample waveforms of white noise, an idealized concept of a process that cannot really exist anyway.
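The divergence hinted at here can also be seen numerically: for a simulated Wiener process with KX(t, s) = σ² min(t, s), the difference quotient [X(t + ∆t) − X(t)]/∆t has variance σ²/∆t, which blows up as ∆t → 0, exactly the kind of behavior one expects of white noise. A small illustrative simulation; all parameters below are our own choices.

    import numpy as np

    rng = np.random.default_rng(5)
    sigma2 = 1.0

    def difference_quotients(dt, steps, trials):
        """[X(t + dt) - X(t)] / dt for a Wiener process with K_X(t, s) = sigma2 * min(t, s);
        the increments X(t + dt) - X(t) are independent N(0, sigma2 * dt)."""
        inc = rng.normal(0.0, np.sqrt(sigma2 * dt), size=(trials, steps))
        return inc / dt

    for dt in (1e-1, 1e-2, 1e-3):
        q = difference_quotients(dt, steps=1000, trials=50)
        # Empirical variance of the quotient tracks sigma2 / dt and grows without bound.
        print(dt, round(q.var(), 2), sigma2 / dt)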
5.9 Linear Estimation and Filtering
In this section we give another application of second-order moments in linear systems by showing how they arise in one of the basic problems of communication: estimating the outcomes of one random process based on observations of another process using a linear filter. The initial results can be viewed as process variations on the vector results of Section 4.10, but we develop them independently here in the process and linear filtering context for completeness. We will obtain the classical orthogonality principle and the Wiener-Hopf equation and consider solutions for various simple cases. This section provides additional practice in manipulating second-order moments of random processes and provides more evidence for their importance.
We will focus on discrete time for the usual reasons, but the continuous time analogs are found, also as usual, by replacing the sums by integrals.
Suppose that we are given a record of observations of values of one random process; e.g., we are told the values of {Yi; N < i < M}, and we are asked to form the best estimate of a particular sample, say Xn, of another, related random process {Xk; k ∈ T}. We refer to the collection of indices of observed samples by K = (N, M). We permit N and M to take on infinite values. For convenience we assume throughout this section that both processes have zero means for all time. We place the strong constraint
on the estimate that it must be linear; that is, the estimate X̂n of Xn must have the form

X̂n = Σ_{k: n−k ∈ K} hk Yn−k = Σ_{k ∈ K} hn−k Yk
for some pulse response h. We wish to find the “best” possible filter h, perhaps under additional constraints such as causality. One possible notion of best is to define the error
εn = Xn − X̂n
and define that filter to be best within some class if it minimizes the mean squared error E(εn²); that is, a filter satisfying some constraints will be considered optimum if no other filter yields a smaller expected squared error. The filter accomplishing this goal is often called a linear least squared error (LLSE) filter.
Many constraints on the filter or observation times are possible. Typical constraints on the filter and on the observations are the following:
1. We have a non-causal filter that can “see” into the infinite future and a two-sided infinite observation {Yk; k ∈ Z}. Here we consider N = −∞ and M = ∞. This is clearly not completely possible, but it may be a reasonable approximation for a system using a very long observation record to estimate a sample of a related process in the middle of the records.
2. The filter is causal (hk = 0 for k < 0), a constraint that can be incorporated by assuming that n ≥ M; that is, that samples occurring after the one we wish to estimate are not observed. When n > M the estimator is sometimes called a predictor since it estimates the value of the desired process at a time later than the last observation. Here we assume that we observe the entire past of the Y process; that is, we take N = −∞ and observe {Yk; k < M}. If, for example, the X process and the Y process are the same and M = n, then this case is called the one-step predictor (based on the semi-infinite past).
3. The filter is causal, and we have a finite record of T seconds; that is, we observe {Yk; M − T ≤ k < M}.
As one might suspect, the fewer the constraints, the easier the solution but the less practical the resulting filter. We will develop a general characterization for the optimum filters, but we will provide specific solutions only for certain special cases. We formally state the basic result as a theorem and then prove it.
Theorem 5.1 Suppose that we are given a set of observations {Yk; k ∈ K} of a zero-mean random process {Yk} and that we wish to find a linear
estimate X̂n of a sample Xn of a zero-mean random process {Xn} of the
form

X̂n = Σ_{k: n−k ∈ K} hk Yn−k .   (5.52)
If the estimation error is defined as

εn = Xn − X̂n ,
then for a fixed n no linear filter can yield a smaller expected squared error E(εn²) than a filter h (if it exists) that satisfies the relation
E(εn Yk) = 0 ;  all k ∈ K ,   (5.53)
or, equivalently,

E(Xn Yk) = Σ_{i: n−i ∈ K} hi E(Yn−i Yk) ;  all k ∈ K .   (5.54)
If RY (k, j) = E(YkYj) is the autocorrelation function of the Y process and RX,Y (k, j) = E(XkYj) is the cross correlation function of the two processes, then (5.54) can be written as
RX,Y(n, k) = Σ_{i: n−i ∈ K} hi RY(n − i, k) ;  all k ∈ K .   (5.55)
If the processes are jointly weakly stationary in the sense that both are individually weakly stationary with a cross-correlation function that depends only on the difference between the arguments, then, with the replacement of k by n − k, the condition becomes
RX,Y(k) = Σ_{i: n−i ∈ K} hi RY(k − i) ;  all k: n − k ∈ K .   (5.56)
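In the finite-record case, (5.56) reduces to a finite set of linear (normal) equations in the filter taps, which can be solved directly. The sketch below applies it to one-step prediction of a first-order autoregressive process from its own L most recent past samples; the process, record length, and variable names are our own illustration, not from the text.

    import numpy as np

    # One-step prediction of an AR(1) process X_n = a X_{n-1} + W_n from its L most recent
    # past samples, so Y = X and the filter taps h_k are nonzero only for k = 1, ..., L;
    # (5.56) then becomes a Toeplitz system of L equations.
    a, L = 0.9, 4

    def R(k):
        """Autocorrelation R_X(k) of the AR(1) process, up to a common scale factor."""
        return a ** abs(k) / (1.0 - a ** 2)

    # Normal equations: R_X(j) = sum_{k=1}^{L} h_k R_X(j - k) for j = 1, ..., L.
    A = np.array([[R(j - k) for k in range(1, L + 1)] for j in range(1, L + 1)])
    b = np.array([R(j) for j in range(1, L + 1)])
    h = np.linalg.solve(A, b)

    print(np.round(h, 6))   # ~ [a, 0, 0, 0]: only the most recent sample matters for AR(1)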