
S. Haykin, J. Litva, T. J. Shepherd (eds.): Radar Array Processing (Springer Series in Information Sciences 25), Springer-Verlag

5. Systolic Adaptive Beamforming    171
The rotations, as applied to the general (off-diagonal) elements, may then be expressed in the form
r̄k' = (β²d/d') r̄k + (δ xi*/d') xk ,    (5.48)

xk' = xk − xi r̄k .    (5.49)

It is clear that (5.48) contains no square-root operations, while (5.49) permits the freedom of choosing the updated scale factor δ' to be

δ' = (β²d/d') δ .    (5.50)
The systolic array implementation of this square-root-free Givens rotation algorithm is almost the same as that described in Sect. 5.4.3 for the more conventional form. The cells, of course, store and process different quantities: the internal cells update elements of the unit upper-triangular matrix R̄(n), while the squares of the diagonal elements of R(n) are updated in the boundary nodes. The corresponding cell instructions are given explicitly in Fig. 5.5a. Note that the cosine and sine parameters, broadcast along each row, have been replaced by the modified expressions
c̄ = β²d/d'    (5.51)

and

s̄ = δ xi*/d' ,    (5.52)

which again contain no square-roots. From (5.50) and (5.51), it can be seen that the update for δ may be performed in the boundary node and takes the particularly simple form

δ' = c̄ δ .    (5.53)
The updated scale factors δ are passed diagonally from one boundary cell to the next, as demonstrated in Fig. 5.5a. To compensate for the input data skew, the value of δ must be delayed for one clock cycle between each pair of boundary nodes, the corresponding storage elements or latches being represented by the black dots.
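The per-row recursion above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the chapter's implementation: real-valued data is assumed, the systolic cells are flattened into an ordinary loop, and the function name `srf_givens_update` is my own. With β = 1, the invariant R̄ᵀD R̄ = XᵀX provides a check against the directly computed Gram matrix.

```python
import numpy as np

def srf_givens_update(d, Rbar, x, beta=1.0):
    """One square-root-free Givens update (Gentleman's form, real data).

    d    -- squared diagonal elements of R (updated in the boundary cells)
    Rbar -- unit upper-triangular factor (updated in the internal cells)
    x    -- new data row, entering with scale factor delta = 1
    """
    x = np.asarray(x, dtype=float).copy()
    delta = 1.0                                   # input row scale factor
    p = len(d)
    for i in range(p):
        xi = x[i]
        if delta == 0.0 or xi == 0.0:             # rotation degenerates
            d[i] *= beta**2
            continue
        d_new = beta**2 * d[i] + delta * xi**2    # boundary cell, cf. (5.47)
        cbar = beta**2 * d[i] / d_new             # (5.51)
        sbar = delta * xi / d_new                 # (5.52), real case
        # internal cells: (5.48) and (5.49), both using the *old* values
        x[i+1:], Rbar[i, i+1:] = (x[i+1:] - xi * Rbar[i, i+1:],
                                  cbar * Rbar[i, i+1:] + sbar * x[i+1:])
        d[i] = d_new
        delta *= cbar                             # (5.53)
    return delta

# process a block of data row by row
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
d, Rbar = np.zeros(4), np.eye(4)
for row in X:
    srf_givens_update(d, Rbar, row)
```

Since D^{1/2}R̄ is the triangular factor R, the product R̄ᵀ diag(d) R̄ should reproduce XᵀX exactly (to rounding), even though no square root is ever taken.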
172 T.J. Shepherd and J.G. McWhirter

Golub (private communication in [5.18]) suggested a more efficient form for the square-root-free algorithm derived by Gentleman. This version, which requires only two multiplications and two additions to be performed in each internal cell, may be obtained quite simply from the one described above. Assume that xk' has been computed within the internal cell. Instead of calculating r̄k' from r̄k and xk, the computation is performed using r̄k and xk'. Substituting for xk from (5.49) into (5.48), and using the definitions in (5.50) to (5.53), we obtain
r̄k' = (c̄ + s̄ xi) r̄k + s̄ xk' .    (5.54)

However, from (5.47), (5.51) and (5.52), it follows that

c̄ + s̄ xi = 1 ,    (5.55)
and so r̄k is updated with only a single multiplication in (5.54). Moreover, there is now no need to broadcast the parameter c̄ along each row, as it appears in the update for neither xk' nor r̄k'. For this reason, the algorithm is often referred to as the "fast Givens" algorithm. The systolic array cell instructions required to implement this algorithm are specified in Fig. 5.5b. It is worth noting that Ling et al. [5.2] have derived an algorithm similar to this one (the "error-feedback algorithm") using a recursive form of the modified Gram-Schmidt orthogonalization procedure. Their algorithm is reported to have excellent numerical stability.
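The reduction from (5.48) to (5.54) is easy to confirm numerically. The sketch below uses hypothetical real scalars of my own choosing, standing in for one internal-cell update:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, delta = 0.99, 0.8                 # forget and row scale factors
d = 2.0                                 # stored squared diagonal element
xi = rng.standard_normal()              # element rotated at the boundary
rbar, xk = rng.standard_normal(2)       # one (rbar_k, x_k) internal pair

d_new = beta**2 * d + delta * xi**2     # boundary update, cf. (5.47)
cbar = beta**2 * d / d_new              # (5.51)
sbar = delta * xi / d_new               # (5.52), real case

xk_out = xk - xi * rbar                 # (5.49)
r_gentleman = cbar * rbar + sbar * xk   # (5.48): needs cbar broadcast
r_fast = rbar + sbar * xk_out           # (5.54)/(5.55): one multiply, no cbar
```

Both routes give the same updated element; the fast form simply exploits c̄ + s̄xi = 1 so that c̄ never has to leave the boundary cell.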
Since the two square-root-free algorithms derived in this section are obviously similar in many respects, it should be assumed, unless otherwise stated, that any further discussion in this chapter applies equally well to both versions.
The data scaling factor which appears in the square-root-free Givens rotation algorithm is updated as it passes from row to row of the R (or R̄) matrix
BOUNDARY CELL

  if xin = 0 or δin = 0 then
      (d ← β²d;  s̄ ← 0;  z ← xin;  δout ← δin)
  otherwise
      (z ← xin;  d' ← β²d + δin|z|²;
       c̄ ← β²d/d';  s̄ ← δin z*/d';
       d ← d';  δout ← c̄ δin)

INTERNAL CELL

  xout ← xin − z r̄
  r̄ ← r̄ + s̄ xout

Fig. 5.5b. Cell instructions for the square-root-free algorithm requiring only two multiplies and two additions in the internal cell
according to (5.53). For the purposes of normal least-squares processing, this factor is initialized to unity so that, on input to the array, x̄(n) = x(n) and, on subsequent rows, δ^{1/2} x̄k = xk. Gentleman [5.18] has noted that more general values of the initial scale factor δ serve to weight the input data, and this leads to a simple implementation of the weighted least-squares algorithm. The consequences of this observation are explored further in Appendix 5.C.
As with the conventional Givens rotation array described in Sect. 5.4.3, cells in the right-hand column of the square-root-free least-squares processor perform the same function as those internal to the main triangular array. The factorization applies to these also, and so they update elements of the vector ū(n), where

u(n) = D^{1/2}(n) ū(n) .    (5.56)

Thus the whole array updates the matrix R̄(n) and the vector ū(n). From (5.27), (5.42), and (5.56), it is clear that the least-squares weight vector w(n) is simply given by

R̄(n)w(n) + ū(n) = 0 ,    (5.57)

and hence the method of back-substitution, as described in Sect. 5.4.3, could also be used in conjunction with the square-root-free algorithms.
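Back-substitution on (5.57) is straightforward. The helper below is my own real-valued sketch; it works equally for the conventional pair (R, u) and the square-root-free pair (R̄, ū), since the common factor D^{1/2}(n) cancels from both sides of the equation.

```python
import numpy as np

def back_substitute(R, u):
    """Solve R w + u = 0 for w, with R upper triangular, cf. (5.57)."""
    p = len(u)
    w = np.zeros(p)
    for i in range(p - 1, -1, -1):
        # component i is fixed once all later components are known
        w[i] = -(u[i] + R[i, i+1:] @ w[i+1:]) / R[i, i]
    return w

# quick self-check against a known weight vector
rng = np.random.default_rng(2)
R = np.triu(rng.standard_normal((4, 4))) + 4.0 * np.eye(4)
w_true = rng.standard_normal(4)
u = -R @ w_true
w = back_substitute(R, u)
```

The diagonal shift in the example merely keeps R well away from singularity so that the recovery of w_true is exact to rounding.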
5.4.5 Sensitivity to Arithmetic Precision
One of the main reasons for choosing the method of QR decomposition rather than the conventional Sample Matrix Inversion technique for adaptive beamforming is its superior numerical properties. These should lead to improved performance under conditions of limited arithmetic precision. We conclude this section, therefore, by presenting some typical results from a detailed computer simulation which was carried out to compare the effect of reduced arithmetic wordlength on the performance of two adaptive cancellation processors - one based on Sample Matrix Inversion and the other based on recursive QR decomposition. The results indicate quite distinctly the improved performance offered by the QR decomposition method under conditions of limited arithmetic precision. In both cases the sequence of data samples was generated and applied to a constraint pre-processor (to be described in Sect. 5.7). The pre-processor applied a look direction constraint towards the desired signal and was implemented at full computer precision. The transformed data were then truncated to the chosen arithmetic wordlength, which was retained throughout the subsequent QR decomposition and Sample Matrix Inversion computation. To ensure a fair comparison between the two basic approaches, the Sample Matrix Inversion solution as defined by (5.14) was actually computed by performing a QR decomposition on the estimated covariance matrix. The only distinction, therefore, is that the recursive QR decomposition algorithm applies its orthogonal transformation to the original data matrix rather than the associated covariance matrix. In both cases the back-substitution was performed using full computer precision.
[Plot: output signal-to-noise ratio (dB) versus number of data samples (10-100) for an N = 8 element array, with 3 jammers at 0 dB, the desired signal at -35 dB and the thermal noise floor at -50 dB; curves shown for QR decomposition and Sample Matrix Inversion.]

Fig. 5.6. Comparison of signal to noise ratios computed using QR decomposition and Sample Matrix Inversion
Figure 5.6 shows a typical comparative result generated using a 24-bit floating point wordlength (16-bit mantissa and 8-bit exponent). The expected signal to noise ratio at the output of an 8-element array is plotted as a function of the number of data samples used to compute the least-squares weight vector. In this particular example we have modelled the effect of three equal power jamming signals received individually at levels of 0 dB relative to a thermal noise floor of -50 dB at the antenna array elements. The complex envelope of each jammer was described by an independent, narrowband Gaussian process. The model also incorporated a desired signal received by the array at a level of 15 dB above the thermal noise floor, but approximately 40 dB below the total received jamming.
From Fig. 5.6 it can be seen that the initial rate of adaptation is extremely rapid for both the Sample Matrix Inversion and the data domain QR decomposition algorithms. In each case, a substantial degree of jamming cancellation is obtained after about 10 to 20 data samples. However, with Sample Matrix Inversion there is clear evidence of an unstable weight vector as reflected by extreme fluctuations in the adaptive response curve. In contrast, the data domain QR decomposition method shows no sign of numerical instability and it is found that, over the timescale shown on these plots, the signal to noise ratio performance gets progressively better as the statistical accuracy (in the form of the updated R matrix) increases with time.
For this particular example, it was found that the Sample Matrix Inversion technique required a floating point wordlength of 32 bits (24-bit mantissa and 8-bit exponent) to achieve comparable performance with the data domain QR decomposition algorithm. It cannot be assumed, of course, that this wordlength would be sufficient for an arbitrary dynamic range environment. One should only conclude that the wordlength required by the Sample Matrix Inversion approach will always be significantly greater than that for the data domain QR decomposition method.
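The underlying precision advantage is easy to reproduce in a few lines, with library routines standing in for the systolic arrays. This is a hypothetical sketch of my own: an ill-conditioned data matrix is built so that cond(XᵀX) = cond(X)² overwhelms single precision, and the weight vector is then computed both by forming the covariance matrix (the Sample Matrix Inversion route) and by an orthogonal decomposition of the data itself.

```python
import numpy as np

# Synthetic data: cond(X) ~ 1e4, hence cond(X^T X) ~ 1e8, which is
# beyond single precision (machine epsilon ~ 1.2e-7).
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((100, 4)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
X = (U * np.logspace(0, -4, 4)) @ V.T      # singular values 1 ... 1e-4
w_true = np.ones(4)
y = X @ w_true

X32, y32 = X.astype(np.float32), y.astype(np.float32)

# covariance-domain route (Sample Matrix Inversion)
w_smi = np.linalg.solve(X32.T @ X32, X32.T @ y32)
# data-domain route (QR decomposition of X itself)
w_qr, *_ = np.linalg.lstsq(X32, y32, rcond=None)

err_smi = np.linalg.norm(w_smi - w_true)
err_qr = np.linalg.norm(w_qr - w_true)
```

On a typical run the covariance route loses essentially all accuracy while the data-domain route retains useful accuracy, mirroring the roughly doubled mantissa length reported for Sample Matrix Inversion.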
5.5 Direct Residual Extraction

5.5.1 Definition of Residuals
In many least-squares problems, and particularly in adaptive beamforming, the least-squares weight vector w(n) is not the principal object of interest. Of more direct concern is the corresponding residual, since this constitutes the noise-reduced output signal from a fully adaptive array [5.19]. This particular form of residual, computed using the most recent value of the weight vector w(n), is known as the "a posteriori" residual and denoted e(n, n). It is given explicitly by

e(n, n) = x^T(tn) w(n) + y(tn) .    (5.58)
A second form of residual, which occurs naturally in least-squares algorithms such as the least-squares lattice (or "ladder") [5.20] and fast Kalman (or "fast transversal filter") algorithms [5.21], is the "a priori" residual, denoted e(n, n - 1). Defined in terms of the previously computed weight vector w(n - 1) and the latest data vector [x^T(tn), y(tn)], it takes the form

e(n, n - 1) = x^T(tn) w(n - 1) + y(tn) .    (5.59)
Either of these residuals may be required at each instant tn, and in this section we demonstrate how they may be obtained directly from the systolic array described in Sect. 5.4 without explicit computation of the weight vector w(n) [5.22].

In order to proceed, it is necessary to consider the structure of the rotation matrix Q(n), which plays a fundamental role within the recursive QR decomposition process. In the following subsection we derive various properties of this matrix, some of which are not essential for the analysis in this section but are required later in the chapter or have proved to be very useful during our research on adaptive beamforming.
5.5.2 Properties of Rotation Matrix Q(n)
From (5.40), it is clear that the matrix Q(n) must take the form

          [ A(n)     0        φ(n) ]
   Q(n) = [ 0        I_{n-p}  0    ] ,    (5.60)
          [ ψ^H(n)   0        γ(n) ]
where A(n) is a (p - 1) x (p - 1) matrix, I_{n-p} denotes the (n - p) x (n - p) unit matrix, φ(n) and ψ(n) are (p - 1) x 1 vectors, and γ(n) is a scalar. Now since Q(n) represents a sequence of Givens rotations, it may also be expressed as a product of the form
Q(n) = Q_{p-1}(n) Q_{p-2}(n) ... Q_2(n) Q_1(n) ,    (5.61)

where Qi(n) denotes the elementary rotation matrix which differs from the n x n unit matrix in only four elements,

[Qi(n)]_{i,i} = [Qi(n)]_{n,n} = ci(n) ,  [Qi(n)]_{i,n} = si*(n) ,  [Qi(n)]_{n,i} = -si(n)   (i = 1, 2, ..., p - 1) ,    (5.62)

in which all off-diagonal elements are zero except those in the (i, n) and (n, i) locations. It follows directly that
γ(n) = ∏_{i=1}^{p-1} ci(n) ,    (5.63)
i.e., γ(n) is the product of the cosine terms in the p - 1 rotations represented by Q(n). It can also be shown that
φ(n) = [s1*(n), s2*(n)c1(n), s3*(n)c1(n)c2(n), ... , s_{p-1}*(n)c1(n)c2(n) ... c_{p-2}(n)]^T    (5.64)

and that A(n) is a (p - 1) x (p - 1) lower triangular matrix with diagonal elements [c1(n), c2(n), ... , c_{p-1}(n)]. Note that the determinant of A(n) is equivalent to the scalar γ(n) defined above.
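These structural claims are easy to confirm numerically. The sketch below is my own real-valued illustration (the nth position becomes Python index -1): it forms Q(n) as the product (5.61) of elementary rotations and checks the block structure (5.60) together with (5.63) and (5.64).

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 6, 4                              # p - 1 = 3 elementary rotations
theta = rng.uniform(0.0, 2.0 * np.pi, p - 1)
c, s = np.cos(theta), np.sin(theta)

def elementary(i):
    """Q_i(n) of (5.62): a rotation in the (i, n) plane (real case)."""
    Q = np.eye(n)
    Q[i, i] = Q[-1, -1] = c[i]
    Q[i, -1] = s[i]                      # the (i, n) location
    Q[-1, i] = -s[i]                     # the (n, i) location
    return Q

Q = np.eye(n)
for i in range(p - 1):                   # Q = Q_{p-1} ... Q_2 Q_1, cf. (5.61)
    Q = elementary(i) @ Q

gamma = np.prod(c)                       # (5.63)
A = Q[:p-1, :p-1]                        # the A(n) block of (5.60)
phi = Q[:p-1, -1]                        # the phi(n) block of (5.60)
```

The assertions confirm that A(n) is lower triangular with the cosines on its diagonal, that its determinant equals γ(n), and that φ(n) follows the pattern (5.64).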
Relationships between the constituent submatrices of Q(n) may be derived by exploiting the unitarity of Q(n). From (5.60) and the property

Q(n)Q^H(n) = I_n    (5.65)

we obtain

A(n)A^H(n) + φ(n)φ^H(n) = I_{p-1} ,    (5.66)

φ(n) = - A(n)ψ(n)/γ(n) ,    (5.67)

and

ψ^H(n)ψ(n) = 1 - γ²(n) .    (5.68)
From (5.60) and the complementary property

Q^H(n)Q(n) = I_n    (5.69)

we obtain

A^H(n)A(n) + ψ(n)ψ^H(n) = I_{p-1} ,    (5.70)

ψ(n) = - A^H(n)φ(n)/γ(n) ,    (5.71)

and

φ^H(n)φ(n) = 1 - γ²(n) .    (5.72)
Equations (5.68) and (5.72) show that the vectors φ(n) and ψ(n) share the same norm.
Some insight into the significance of these submatrices of Q(n) may be gleaned by substituting the expression (5.60) for Q(n) into the update equation (5.40). This furnishes the relationships
A(n) βR(n - 1) + φ(n)x^T(tn) = R(n)    (5.73)

and

ψ^H(n) βR(n - 1) + γ(n)x^T(tn) = 0 .    (5.74)

Elimination of x^T(tn) from these equations yields

F(n) βR(n - 1) = R(n) ,    (5.75)

where F(n) is the (p - 1) x (p - 1) upper triangular matrix given by

F(n) = A(n) - φ(n)ψ^H(n)/γ(n) .    (5.76)
Multiplying both sides of (5.40) by Q^H(n) and using (5.60) and (5.69) leads directly to the relationships

βR(n - 1) = A^H(n)R(n)    (5.77)

and

x^T(tn) = φ^H(n)R(n) ,    (5.78)

and from (5.76) and (5.77) it follows that

F(n) = A^{-H}(n) .    (5.79)

Clearly A^{-H}(n) (where the superscript -H signifies the inverse of the matrix Hermitian conjugate) may be regarded as an operator which updates the triangular matrix βR(n - 1) to R(n). Note, however, that elements of A^{-H}(n) are functions of the elements of βR(n - 1) and x^T(tn).
5.5.3 A Posteriori Residual Extraction
For the purposes of direct a posteriori residual extraction, we must derive for the residual e(n, n) defined in (5.58) an alternative expression which does not depend explicitly on the least-squares weight vector w(n). Substituting for x^T(tn) from (5.78) leads to the expression

e(n, n) = φ^H(n)R(n)w(n) + y(tn) .    (5.80)
Now recall that (5.78) was derived by multiplying both sides of (5.40) by Q^H(n). We perform the same operation on (5.41) to obtain the relationship

y(tn) = φ^H(n)u(n) + γ(n)α(n) .    (5.81)
Substituting (5.81) into (5.80) produces the expression

e(n, n) = φ^H(n)R(n)w(n) + φ^H(n)u(n) + γ(n)α(n)    (5.82)

and, from (5.27), it follows immediately that

e(n, n) = γ(n)α(n) .    (5.83)
This expression, which does not involve the weight vector w(n), permits the residual to be computed as a simple by-product of the recursive QR decomposition process using the systolic array in Fig. 5.4. As noted in Sect. 5.4.3, when the conventional Givens rotation algorithm is employed, the parameter α(n) is generated quite naturally within the triangularization process and simply emerges from the bottom cell of the right-hand column in Fig. 5.4 eight (in general 2p - 2) clock cycles after the first element of x(tn) enters the array. The scalar γ(n), as given by (5.63), may also be computed very simply. The product of cosine terms is generated recursively by the parameter γ as it passes from cell to cell along the chain of boundary processors. As with the δ parameter in Fig. 5.5, correct timing of the systolic array requires this parameter to be delayed by one clock cycle at the output of each boundary cell. The simple product required to form the a posteriori residual in (5.83) may then be computed by the final processing cell F in Fig. 5.4.
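As a concrete check of (5.83), the conventional Givens recursion can be run in plain NumPy. This is a real-valued sketch with β = 1 and my own function name: γ is accumulated as it would be along the boundary chain, α is the value left at the bottom of the right-hand column, and their product is compared with the residual computed explicitly from the back-substituted weight vector.

```python
import numpy as np

def givens_update(R, u, x, y):
    """Conventional Givens update of R(n) and u(n) (real case, beta = 1).
    Returns gamma(n), the product of the rotation cosines, and alpha(n)."""
    p = len(u)
    x = np.asarray(x, dtype=float).copy()
    gamma = 1.0
    for i in range(p):
        r = np.hypot(R[i, i], x[i])
        if r == 0.0:
            continue                         # degenerate rotation
        cos, sin = R[i, i] / r, x[i] / r
        Ri, ui = R[i, i:].copy(), u[i]
        R[i, i:] = cos * Ri + sin * x[i:]    # rotate row i of [R | u]
        x[i:] = cos * x[i:] - sin * Ri
        u[i] = cos * ui + sin * y
        y = cos * y - sin * ui
        gamma *= cos
    return gamma, y                          # y now holds alpha(n)

rng = np.random.default_rng(5)
p = 3
R, u = np.zeros((p, p)), np.zeros(p)
rows, ys = rng.standard_normal((12, p)), rng.standard_normal(12)
for xr, yr in zip(rows, ys):
    gamma, alpha = givens_update(R, u, xr, yr)

w = np.linalg.solve(R, -u)              # back-substitution on (5.27)
e_direct = rows[-1] @ w + ys[-1]        # a posteriori residual (5.58)
```

The identity e(n, n) = γ(n)α(n) holds algebraically at every step, so the two residuals agree to rounding error even though no weight vector was ever formed inside the recursion.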
The a posteriori residual may be computed in a similar manner when the square-root-free algorithm is employed. The square-root-free algorithm delivers from the bottom cell in the right-hand column a scalar ᾱ(n) given by

δ^{1/2}(n) ᾱ(n) = α(n) ,    (5.84)

where δ(n) is the scaling parameter appropriate to the (p - 1)th row at time tn, and it has been assumed that δ(n) is initialized to unity on input to the array. From (5.83) and (5.84), it follows that

e(n, n) = γ(n) δ^{1/2}(n) ᾱ(n) ,    (5.85)
where γ(n) is the product of cosine terms which arise in the conventional Givens rotation algorithm. However, δ(n), as computed by the boundary processors for the square-root-free algorithm, is simply the product of all the c̄ terms, and it can easily be shown that this is equivalent to the product of the squares of the conventional cosine parameters. Thus,

δ(n) = γ²(n)    (5.86)

and

e(n, n) = δ(n) ᾱ(n) .    (5.87)
Hence the a posteriori residual e(n, n) may also be obtained directly from the square-root-free processor array, using a final multiplier cell F as illustrated in Fig. 5.5a.
Avoiding the need to derive an explicit solution for the least-squares weight vector w(n), and hence the problems associated with back-substitution, greatly simplifies the overall processor architecture, as is evident from Figs. 5.4 and 5.5. It also enhances the numerical stability of the adaptive combiner since the solution of (5.27) could, in general, be ill-conditioned. Note, for example, that the processor array in Fig. 5.4 or 5.5 will produce the correct (zero) residual even if n < p and the matrix X(n) is not of full rank. This sort of unconditional stability is most important in the design of real time signal processing systems. In many respects, the systolic array in Fig. 5.4 or 5.5 constitutes an extremely sophisticated digital filter which acts like a funnel, taking in several streams of sampled data and outputting a single stream of processed data. The fact that an advanced numerical algorithm can be mapped onto such a simple and regular structure highlights the benefit of coupling the algorithm and architecture designs as closely as possible. This hybrid discipline is well described by the term "Algorithmic Engineering", further examples of which will be found later in this chapter.
5.5.4 A Priori Residual Extraction
The a priori residual e(n, n - 1) as defined in (5.59) may also be obtained directly from the triangular processor array as we shall now show. By substituting for xT(tn ) from (5.74) into (5.59), we obtain the expression
e(n, n - 1) = [- ψ^H(n) βR(n - 1) w(n - 1) + γ(n) y(tn)] / γ(n) .    (5.88)
Now recall that (5.74) was derived by substituting the expression (5.60) for Q(n) into the basic update relationship (5.40). In a similar manner, (5.41) may be used to derive the corresponding equation
γ(n) y(tn) = α(n) - ψ^H(n) βu(n - 1)    (5.89)
and substituting this expression for γ(n) y(tn) into (5.88) allows the a priori residual to be written in the form
e(n, n - 1) = [- ψ^H(n) βR(n - 1) w(n - 1) + α(n) - ψ^H(n) βu(n - 1)] / γ(n) .    (5.90)
It then follows from (5.27) that

e(n, n - 1) = α(n)/γ(n) .    (5.91)
|
Thus the a priori residual is computed from the same quantities as the a posteriori residual. They are related by the expression

e(n, n) = γ²(n) e(n, n - 1)    (5.92)
|
and it follows that α(n) may be written in the form

α(n) = [e(n, n) e(n, n - 1)]^{1/2} .    (5.93)
In the context of least-squares lattice and fast Kalman algorithms, a scalar of the form of α(n) has been termed a "rationalized" residual [5.23], and the quantity [1 - γ²(n)] a "log-likelihood" variable [5.20], owing to its occurrence in the description of multivariate Gaussian processes.
As regards the square-root-free algorithms, it follows from (5.84), (5.86), and (5.91) that
iX(n) = e(n, n - 1) |
(5.94) |
and so the scalar which emerges naturally from the bottom cell in the right-hand column of Fig. 5.5 is the corresponding a priori residual. Clearly, no final multiplier cell is required if this residual constitutes the desired output of the processor array.
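The residual identities can be checked without any of the array machinery: combining (5.72) with (5.78) gives, for β = 1, γ²(n) = 1 - x^T(tn)(X^H(n)X(n))^{-1}x(tn). The sketch below is my own, with real data and a library least-squares routine in place of the systolic array; it verifies (5.91-93) numerically.

```python
import numpy as np

def weights(X, y):
    """Least-squares w minimizing ||X w + y||, i.e. solving (5.27)."""
    w, *_ = np.linalg.lstsq(X, -y, rcond=None)
    return w

rng = np.random.default_rng(6)
p, n = 3, 15
X, y = rng.standard_normal((n, p)), rng.standard_normal(n)
xn, yn = X[-1], y[-1]

e_post = xn @ weights(X, y) + yn             # a posteriori residual (5.58)
e_prior = xn @ weights(X[:-1], y[:-1]) + yn  # a priori residual (5.59)

# gamma^2(n) from (5.72) and (5.78), beta = 1
gamma_sq = 1.0 - xn @ np.linalg.solve(X.T @ X, xn)
alpha = gamma_sq**0.5 * e_prior              # alpha(n), cf. (5.91)
```

Both residuals use the same final data pair; only the weight vector differs, yet the conversion factor between them is exactly γ²(n), as (5.92) states.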
5.6 Weight Freezing and Flushing
5.6.1 Basic Concept
In Sect. 5.5 it was shown that if a data vector [x^T(tn), y(tn)] is input to the triangular array in Fig. 5.4, the corresponding a posteriori least-squares residual e(n, n) emerges from the final cell F after the appropriate number of clock cycles. In order to achieve this result, the array must effectively perform two distinct functions: (1) generate the updated triangular matrix R(n) and the corresponding vector u(n) [or D(n), R̄(n) and ū(n) for the square-root-free algorithm] and hence, implicitly, the updated weight vector w(n); (2) act as a simple linear filter which applies the updated weight vector to the input data according to (5.58).
If the array is subsequently "frozen" by suppressing any update of the stored values, but allowed to function normally in all other respects, it will continue to