
5. Systolic Adaptive Beamforming


where w_mj is the scalar weight for the mth tap on the jth channel, and each channel has M0 taps. The associated least-squares minimization then proceeds in a fashion similar to that for the narrow-band beamformer. It may appear at first sight that the least-squares solution would be prohibitively costly to obtain: just as the narrow-band least-squares problem requires the effective inversion of a (p − 1) × (p − 1) matrix, so here the order of the matrix requiring inversion appears to be N(p − 1) × N(p − 1), an increase of two orders of magnitude in computational complexity, the solution being the (p − 1) × N matrix with elements w_mj [which may be expressed equivalently as a vector of length (p − 1)N]. Fortunately, it proves possible to use the shift invariance of the data for the time series in each auxiliary channel to reduce the computational load. The data vector [x_j(t_n), x_j(t_{n−1}), ..., x_j(t_{n−M0})]^T for each channel j, and at time t_n, differs from the vector at time t_{n−1} only by the addition of the datum x_j(t_n) and the loss of the datum x_j(t_{n−M0−1}), while the other elements are shifted one place to the right. Since the least-squares estimator is defined as an average over time, the shift invariance of the data can be exploited to construct so-called "fast" least-squares algorithms, one particularly stable example of which is the lattice algorithm [5.20]. The specific form of residual defined in (5.217) demands a version known as the "multichannel lattice joint estimator": in the present example the term "multichannel" refers to the p − 1 auxiliary channels, and "joint estimator" effectively implies that the primary channel has unit weighting.
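In code, this shift invariance is just the familiar tapped-delay-line update. The following minimal NumPy sketch (all names are ours) makes explicit that each new snapshot changes only one datum per channel, which is precisely the property the fast algorithms exploit:

```python
import numpy as np

# Tapped-delay-line state: one length-(M0 + 1) buffer per auxiliary channel.
p1, M0 = 4, 8                              # p - 1 channels, M0 delays
taps = np.zeros((p1, M0 + 1))

def shift_in(taps, x_new):
    """Advance one snapshot: each channel gains x_j(t_n) at tap 0 and loses
    x_j(t_{n-M0-1}); all other elements shift one place."""
    taps = np.roll(taps, 1, axis=1)        # shift every tap one place
    taps[:, 0] = x_new                     # overwrite with the new data
    return taps
```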

The type of multi-channel lattice algorithm developed by Morf and coworkers [5.20] requires M0 modular linear prediction lattice stages, in parallel with M0 joint estimator stages. The p − 1 auxiliary channels are processed in successive lattice stages as "forward" and "backward" residual vectors of length p − 1. Unfortunately, the algorithm requires the inversion of two (p − 1) × (p − 1) matrices within every lattice module at each time step; these inversions serve effectively to decorrelate the (p − 1) channels within every residual vector. The feasibility of using parallel processing structures, however, has led to methods of avoiding the processing bottleneck due to matrix inversion at each lattice stage. One such algorithm, developed by Ling and Proakis [5.47], and Lewis [5.48], involves the introduction of two QR decomposition triangular arrays inside each lattice module, and leads to the structure shown schematically in Fig. 5.24. The large-scale structure of the network resembles a conventional single-channel least-squares lattice filter, in which the single-channel interconnections and reflection coefficient multipliers have been replaced by multichannel QR decomposition blocks of Givens rotation cells, which perform the least-squares channel decorrelation in each lattice stage. The whole network operates in a completely parallel and pipelined manner, and, as Ling [5.49] has emphasised, requires only three basic component cells, corresponding to the boundary and internal cells for the QR decomposition networks, described in Sect. 5.4, together with simple delay cells, necessary for the linear prediction processing.

[Fig. 5.24: Structure of broad-band beamforming systolic multi-channel lattice filter, with forward and backward output residual vectors]
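For reference, the arithmetic of the boundary and internal cells mentioned here can be sketched in a few lines for real data. This is a bare sketch of the Givens rotation of Sect. 5.4, with our own function names, omitting the delay cells, pipelining and exponential forgetting of a practical array:

```python
import numpy as np

def boundary_cell(r, x_in):
    """Boundary cell: from the stored element r and the incoming datum x_in,
    compute the rotation (c, s) that annihilates x_in, and update r."""
    r_new = np.hypot(r, x_in)              # sqrt(r**2 + x_in**2)
    if r_new == 0.0:
        return 1.0, 0.0, 0.0               # all-zero input: pass the identity
    return r / r_new, x_in / r_new, r_new

def internal_cell(r, x_in, c, s):
    """Internal cell: apply the rotation (c, s) received from the left to the
    stored element r and the incoming datum x_in; pass the rotated datum on."""
    return c * r + s * x_in, -s * r + c * x_in

def qr_update(R, x):
    """Rotate one new data row x into the upper-triangular factor R in place,
    as the triangular array does once per snapshot."""
    x = np.asarray(x, dtype=float).copy()
    for k in range(len(x)):
        c, s, R[k, k] = boundary_cell(R[k, k], x[k])
        for j in range(k + 1, len(x)):
            R[k, j], x[j] = internal_cell(R[k, j], x[j], c, s)
    return R
```

Feeding successive data rows through qr_update reproduces the recursive triangularization that each QR decomposition block performs on its residual vectors.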

It should be pointed out that alternative schemes of systolic multichannel processing for broad-band beamforming have been suggested in addition to


those cited above. In particular, we should like to draw attention to the work of Lev-Ari [5.50], who has developed a modular multichannel lattice containing rectangular blocks of elemental lattice processing nodes, and the work of Mansour [5.51], who has devised a multichannel lattice structure based on a beamforming processor array due to Sharman and Durrani [5.5, 52]. A detailed comparison of these algorithms and architectures remains to be performed; the subject of least-squares broad-band beamformer design is likely to remain an extremely fertile area of research.

5.10.4 QR Decomposition and Neural Networks

QR decomposition may also have a leading role to play in the expanding field of neural networks. The principal functions of a neural network, as applied to pattern recognition or classification, are those of "learning" and "generalizing": the network should be capable of learning by absorbing representative input and output data from a given system of a priori unknown function, and subsequently of mimicking, or generalizing that system to a sufficient degree upon the input of further "test" data. One form of neural network, the multilayer perceptron, employs layers of nonlinear "hidden" units, fully interconnected between adjacent layers, and with variously weighted connections [5.53]. The connection strengths are varied during the learning phase to minimize the difference between outputs of the perceptron and the unknown system. The realization that this procedure amounts to a form of multidimensional curve fitting and interpolation has led Broomhead and Lowe [5.54] to formalize the method in terms of a well-defined mathematical procedure in which the unknown function is modelled by a limited set of Radial Basis Functions (RBFs) [5.55], and a linear least-squares fit is performed. We shall briefly describe their method, and a systolic network with the ability to implement the algorithm in an efficient fashion.

We assume that there exists a nonlinear vector function f(x) to be estimated, where f is K-dimensional and real, and the space of the vector x is J-dimensional and real. This defines K graphs, each in a (J + 1)-dimensional space. The function f(x) represents a nonlinear system into which the n data


points {x_m | m = 1, 2, ..., n} are passed, sampling the function f to output the n vectors {y_m | m = 1, 2, ..., n}. (y_m will only take the value f(x_m) in the absence of noise.)

We wish to fit K graphs to these n sets of training data {x_m, y_m | m = 1, 2, ..., n} and subsequently to interpolate between these points. The

RBF method consists of estimating f using a linear combination of nonlinear functions to give the estimate f̂(x):

f̂(x) = − Σ_{i=1}^{Nc} φ(||x − x_i^c||) w_i ,   (5.218)

 

where x_i^c (i = 1, 2, ..., Nc) is a given set of Nc "centre" vectors in data space, φ(r) is a given nonlinear scalar function (Gaussian, linear, cubic, etc.), and w_i (i = 1, 2, ..., Nc) is a set of Nc real weight vectors to be determined, each of dimensionality K. A straightforward fit is obtained when n = Nc, so that

y_m = − Σ_{i=1}^{Nc} φ(||x_m − x_i^c||) w_i ,   (m = 1, 2, ..., n) .   (5.219)

This is a set of linear matrix equations for the weight vectors w_i, and can be solved by inverting the square matrix Φ with elements Φ_mi = φ(||x_m − x_i^c||) [5.56]. However, we require f to be estimated using a large data set and relatively few parameters (i.e., we have n > Nc). Equation (5.219) now becomes overdetermined, and the weight vectors {w_i} must be evaluated using an alternative method such as least-squares. The least-squares procedure involves defining the residual error vectors

e_m = y_m + Σ_{i=1}^{Nc} φ(||x_m − x_i^c||) w_i ,   (m = 1, 2, ..., n)   (5.220)

and minimizing each of the K diagonal elements of the matrix Σ_{m=1}^{n} e_m e_m^T with respect to the weights w_i.
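As a concrete, hypothetical illustration of this least-squares fit, the following NumPy sketch builds the design matrix Φ for a Gaussian φ (one of the admissible choices above) and solves the overdetermined system in one batch. All names, dimensions and the toy "unknown" system are our own; the chapter's sign convention e_m = y_m + Σ_i φ(·)w_i is assumed, so the fitted weights satisfy ΦW ≈ −Y:

```python
import numpy as np

def phi(r):
    """Assumed radial basis function: a Gaussian."""
    return np.exp(-r**2)

rng = np.random.default_rng(0)
J, K, Nc, n = 2, 2, 10, 200            # input dim, output dim, centres, samples

centres = rng.uniform(-1.0, 1.0, size=(Nc, J))   # the centre vectors x_i^c
X = rng.uniform(-1.0, 1.0, size=(n, J))          # training inputs x_m

def f(x):                                        # toy "unknown" system f(x)
    return np.array([np.sin(3.0 * x[0]), x[0] * x[1]])

Y = np.array([f(x) for x in X])                  # noiseless samples y_m = f(x_m)

# Design matrix with elements Phi[m, i] = phi(||x_m - x_i^c||).
Phi = np.array([[phi(np.linalg.norm(x - c)) for c in centres] for x in X])

# With e_m = y_m + sum_i phi(.) w_i, minimizing the summed squared residuals
# is the overdetermined solve Phi @ W ~ -Y.
W, *_ = np.linalg.lstsq(Phi, -Y, rcond=None)     # W is Nc x K

f_hat = lambda x: -np.array([phi(np.linalg.norm(x - c)) for c in centres]) @ W
```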

In their original treatment Broomhead and Lowe employed a singular value decomposition of the (now rectangular) matrix Φ, and effectively inverted (5.220) by means of the Moore-Penrose pseudoinverse. Renals [5.57] has used a method which involves the direct inversion of the matrix Φ^T Φ. However, it is possible to solve each of the above K least-squares problems simultaneously and recursively in time (i.e., as each new set of training data is received) using an extended QR decomposition array. This is illustrated in Fig. 5.25, and consists of several systolic blocks. The block ABCD (of diamond-shaped cells) is a preprocessing network in whose columns are stored the centre vectors {x_i^c}. During data processing the time-staggered data vectors {x_m} are input to the left of the block, and the norm-squares {||x − x_i^c||²} progressively computed down each column. These norms are output to the row of lozenge-shaped cells EF, which compute the required nonlinear function φ for each norm. Subsequent processing requires a conventional QR decomposition array GHI, followed by a

succession of right-hand columns (the block JKLM), into which the data vectors {y_m} are entered through the edge JK. The resulting residuals, if required, may be output in the usual way from the edge ML.

[Fig. 5.25: Systolic network for performing multidimensional interpolation]

While the block GKLI is adaptively performing least-squares minimization, the system is in the "training" mode, and the state of the system is given by the latest values of the weights {w_i}, stored implicitly in the block. The "test" or "generalizing" mode is simply accessed by freezing those weights in the manner described in Sect. 5.6, and entering test data x_t while inputting zeros for the data y. The vectors e^T, of the form (e_1, e_2, ..., e_K), successively output from the edge ML, give the associated interpolated values of the estimated function f̂ for the corresponding test vectors according to

f̂(x_t) = − e .   (5.221)
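In conventional full-rank terms (continuing the NumPy sketch given after (5.220), whose names phi, centres, W and f are assumed here), the test mode reduces to evaluating the residual with a zero y input and reversing its sign:

```python
# Test ("generalizing") mode: the weights W are frozen and a zero target is
# entered, so the output residual equals Phi(x_t) @ W; its negative is the
# interpolated estimate of f(x_t), as in (5.221).
x_t = np.array([0.3, -0.2])
e_t = np.array([phi(np.linalg.norm(x_t - c)) for c in centres]) @ W
print("estimate:", -e_t, " true value:", f(x_t))
```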


The complete system thus implements a "learning" algorithm, and resembles a feed-forward multilayer perceptron network. Stated alternatively, the system constitutes an efficient nonlinear adaptive filter, offering rapid convergence in adaptation. As a result of the component triangular network contained in the system, the whole network possesses considerable versatility if many of the additional features described in this chapter are incorporated; for example, some input data can be assigned infinite weight if known with perfect certainty, and hence can be imposed as a pre-processor constraint, as described in Sect. 5.7. More generally, it is anticipated that linear least-squares techniques will be found useful in many alternative nonlinear parameter estimation systems, and processors for orthogonal decomposition will invariably find use in facilitating associated computational requirements.

5.11 Comments and Conclusions

We have attempted in this chapter to demonstrate methods by which least-squares digital adaptive beamforming may be performed using parallel and pipelined algorithms, and how these algorithms map naturally onto systolic array architectures, leading to potentially greatly enhanced data throughput capability. In the development of this material we have chosen to focus our attention upon the triangular QR decomposition array, and the manner in which it can satisfy the central processing requirement in several beamforming tasks. Indeed, it is an advantageous property of the triangular array that it proves possible to retain the integrity of the QR decomposition processor when applied to a variety of beamforming problems; that is to say that the fundamental structure and operation of the processor array require little or no modification when employed in alternative roles. This generally results in the possibility of maintaining continuous data throughput as, for example, in the application to parallel weight extraction, described in Sect. 5.10. It is anticipated that many further processing structures for diverse applications will be discovered by the kind of "Algorithmic Engineering" used for the developments described in the foregoing sections of this chapter.

In the remainder of this section we should like to mention several additional current topics of research which are directly related to the material presented in this chapter. Some are concerned explicitly with adaptive beamforming, some concern features of the algorithm, and are therefore of general applicability, and the remainder relate to the use of the QR decomposition processor in other, related applications. Lack of space prohibits more than just a few words about each, but we hope that we shall have stimulated the reader's interest sufficiently to study further some of the accompanying references. We also wish to point out that the list of subjects is by no means exhaustive, and apologize in advance for any omissions.


5.11.1 Additional Topics

2-Dimensional Antenna Array Processing: In this chapter we have concentrated upon linear antenna arrays, and the associated signal processing. In practice, two-dimensional antenna arrays are more likely to be employed, and of those, the planar array appears the most favoured. Ho and Litva [5.58] have addressed the problem of designing a systolic array for two-dimensional adaptive beamforming, based upon the QR decomposition array. The resulting structure, the "cosmic cube", is a three-dimensional network, designed to maximize the throughput from the rows and columns of the planar antenna array. Further details are given in a separate chapter of this volume.

Multiple-Constraint Post-processing: In Sect. 5.7 we described in some detail how multiple simultaneous constraints could be implemented on a pre-processor to the Kung-Gentleman array. In Sect. 5.8 a single constraint was imposed using a post-processor to the network. Yang and Böhme [5.59] have shown that it proves possible to impose multiple constraints also using a post-processor network. The method involves the explicit inclusion of hyperbolic rotations, which may be carried out using CORDIC processor cells [5.60]. It is shown in Appendix C that such rotations are also possible within the processing cells described in this chapter. Yang and Böhme employ a linear processor array for performing all operations; however, the method can be extended to two-dimensional systolic arrays (of the type described in this chapter). In such an array the hyperbolic rotations would need to be carried out within an additional triangular structure fitting in below the constraint post-processor columns (Fig. 5.16).
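For readers unfamiliar with hyperbolic rotations: whereas a Givens rotation satisfies c² + s² = 1 and adds a row to a triangular factor, a hyperbolic rotation satisfies c² − s² = 1 and effectively removes (downdates) one. A minimal real-arithmetic sketch, with our own naming and not the CORDIC realization cited above:

```python
import numpy as np

def hyperbolic_rotation(a, b):
    """Return (c, s) with c**2 - s**2 = 1 such that the hyperbolic rotation
    [[c, -s], [-s, c]] maps (a, b) to (sqrt(a**2 - b**2), 0).
    Requires |a| > |b|, i.e. the downdate must leave a valid factor."""
    if abs(b) >= abs(a):
        raise ValueError("hyperbolic rotation undefined for |b| >= |a|")
    t = b / a
    c = 1.0 / np.sqrt(1.0 - t * t)
    return c, t * c

a, b = 5.0, 3.0
c, s = hyperbolic_rotation(a, b)
print(c * a - s * b, -s * a + c * b)   # -> 4.0 (= sqrt(25 - 9)) and 0.0
```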

Linear Processor Arrays: Linear systolic arrays for QR decomposition sometimes provide an attractive alternative to two-dimensional structures. Although a degree of parallelism is lost, resulting in a reduction in data throughput rate as compared with a two-dimensional array, a linear array has the advantage that the number of processing cells required varies only linearly with the system order, whereas that of a two-dimensional array varies with the square of the order. In addition, linear processors are more readily accessible with existing digital technology. Besides the array of Yang and Böhme [5.37, 59] mentioned above, linear arrays for QR decomposition have been proposed by Chen and Yao [5.61], and also by Rader [5.62], who has developed an architecture suitable for wafer scale integration. Partitioning the QR decomposition problem, so that high-order systems can run on low-order processors, has been addressed by several authors, including Schreiber and Kuekes [5.17], Heller [5.63], and Torralba and Navarro [5.64].

Other Processing Arrays for Orthogonal Decomposition: Alternative two-dimensional arrays have also been proposed for least-squares problems via QR decomposition. Each of these employs a different architecture from the triangular array described here. Kung and Gal-Ezer [5.65] have designed a square array for symmetric matrices, while Heller and Ipsen [5.66] (see also Heller [5.63] and


Schreiber and Tang [5.4]) have proposed a rectangular array suitable for decomposing banded matrices, and which outputs the triangular R matrix. An alternative triangular two-dimensional processing structure has been put forward by Sharman and Durrani [5.5, 52]. Their network is constructed from cells designed initially for the least-squares lattice filter, used in time series processing. The whole processor effectively performs simultaneous forward and backward spatial linear prediction (that is, it solves the forward and backward canonical problem) for all orders up to the order of the system (order p, in this chapter). Several algorithms and networks related to the Sharman-Durrani algorithm have also appeared in the literature, for example, in papers by Mansour [5.51], and Yuen et al. [5.67].

Error Analysis and Fault Detection and Location: Stability and accuracy are two essential ingredients in any signal processing algorithm. Although there have been several studies of the numerical accuracy associated with orthogonal decomposition techniques (e.g., references [5.25, 68, 69]), only recently has a detailed stability analysis of the direct residual extraction method appeared: Luk and Qiao [5.70] perform this analysis, and suggest a modification, "Algorithm Estimate", which gives more accurate residuals for ill-conditioned data matrices. Fault and error detection and location within the comparatively complex hardware networks required for QR decomposition are also important issues. Fortunately, these tasks prove relatively simple to accommodate within the basic structure: Luk et al. [5.71], and Liu and Yao [5.72] present algorithms which provide these facilities, for which the principle entails monitoring the norm of a residual vector formed from a linear combination of the auxiliary channel inputs. If the computation is numerically and arithmetically exact, this vector should be identically zero. A departure from zero of the vector norm then flags the presence of an error.
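The principle can be demonstrated in a few lines of NumPy (a toy model with our own naming, not any of the cited schemes): a check column that is an exact linear combination of the auxiliary channels has an identically zero least-squares residual, so any appreciable residual norm betrays an arithmetic fault.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                 # auxiliary channel inputs
check = X @ np.array([1.0, -2.0, 0.5, 3.0])   # known linear combination

def residual_norm(X, y):
    """Norm of the least-squares residual of y on the columns of X."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.norm(y - X @ w)

print(residual_norm(X, check))                # ~1e-13: no fault detected

X_faulty = X.copy()
X_faulty[40, 2] += 1e-3                       # inject a single-cell error
print(residual_norm(X_faulty, check))         # clearly nonzero: fault flagged
```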

Real Versus Complex Arithmetic: In earlier sections of this chapter we have assumed the ability of the processor cells to compute using complex arithmetic. A systolic processor based on real Givens rotations, however, might have some practical advantages. For example, the basic cell design could be greatly simplified, and its input/output bandwidth reduced, leading to a pipelined array with higher throughput potential. Further, the use of complex arithmetic requires that the I and Q (in-phase and quadrature) components of the input data should be in exact quadrature, with the two channels balanced in both gain and phase response. Phase and amplitude imbalance give rise to reduction of interference cancellation in adaptive nulling applications. Ward et al. [5.73] have shown how to perform adaptive beamforming using a real data representation, with approximately the same number of real operations as for the complex arithmetic array. In this arrangement the I and Q residuals are computed independently, and two independent weight sets calculated. This leads to the possibility of achieving a higher degree of jamming cancellation at the beamformer output, and obviates the necessity for auxiliary I and Q signals to be perfectly orthogonal at the outset.
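A toy NumPy comparison of the two formulations (our own construction, not Ward et al.'s array): a single complex weight set couples the I and Q residuals, whereas the real representation gives each residual its own weight set over all 2(p − 1) real channels, so its residual power can never be larger.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1 = 200, 4                          # snapshots and auxiliary channels
X = rng.normal(size=(n, p1)) + 1j * rng.normal(size=(n, p1))
y = X @ (rng.normal(size=p1) + 1j * rng.normal(size=p1)) \
    + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))

# Complex arithmetic: one complex weight set, residual e = X w + y.
w, *_ = np.linalg.lstsq(X, -y, rcond=None)
e_complex = X @ w + y

# Real representation: independent real weight sets for the I and Q
# residuals, each drawing on all 2(p-1) real channels.
Xr = np.hstack([X.real, X.imag])
wi, *_ = np.linalg.lstsq(Xr, -y.real, rcond=None)
wq, *_ = np.linalg.lstsq(Xr, -y.imag, rcond=None)
e_real = np.hstack([Xr @ wi + y.real, Xr @ wq + y.imag])

print(np.linalg.norm(e_complex), np.linalg.norm(e_real))  # real <= complex
```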


Singular Value Decomposition and Eigen-Analysis: Singular Value Decomposition (SVD) and eigenvalue analysis of a matrix are yet more sophisticated mathematical techniques than orthogonal decomposition. They have many signal processing applications, among which may be included further high-resolution beamforming and spectral estimation techniques, e.g., the methods of Kumaresan and Tufts [5.74], and the MUSIC algorithm of Schmidt [5.75]. SVD and eigen-analysis, unlike the procedures for matrix inversion and orthogonal decomposition, necessarily require iterative methods of computation for the singular values or eigenvalues, which cannot be calculated in a finite number of operations. Nevertheless, they are still amenable to systolic methods of computation. In some methods the first stage is the triangularization of the input matrix. Clearly, the Kung-Gentleman array is useful for this purpose. Methods involving Jacobi rotations, Kogbetliantz transformations, or QR iterations have been put forward for the subsequent systolic processing. In particular, Luk [5.76] has proposed a triangular architecture for algorithms which are capable of performing the initial QR matrix decomposition, followed by a series of Kogbetliantz transformations for SVD. More recently, de Villiers [5.77] has demonstrated improved performance in finding singular values using a Gentleman-Kung architecture, and algorithms for systolic updating of SVD have been described by Moonen et al. [5.78].

Systolic Kalman Filtering: Finally, we should like to mention a further application of the QR decomposition array in a field other than adaptive beamforming - the extensive area of nonstationary linear estimation embodied in Kalman filtering [5.79]. Orthogonal decomposition was introduced into Kalman filtering when it was established that square-root filtering offered improved numerical stability over the conventional approach, which involves propagation of an error covariance matrix. See the survey by Kaminski et al. [5.80] and the book by Bierman [5.81] for further details. Paige and Saunders [5.82] applied orthogonal decomposition techniques to the problem, re-cast in a weighted linear least-squares form [5.83]. These developments have given rise to a number of proposed structures in which the triangular array of Kung and Gentleman can perform the necessary decomposition, within both the measurement and time updating stages.

While Andrew [5.84] has parallelized the Kalman filter using the U-D algorithm described by Bierman [5.81], and Jover and Kailath [5.85] have gone a stage further by working with the triangularized matrix-form square-root Kalman equations, the resulting algorithms perform only the measurement updates required by the filter. Chen and Yao [5.86, 87] (who employ the method of Paige and Saunders [5.82] in their algorithm) and Sung and Hu [5.88] describe methods for handling both the measurement and time updates. Further improvements in algorithm efficiency and processor utilization have been made by Kung and Hwang [5.89], Gaston and Irwin [5.90], and by Gosling et al. [5.91]. These methods are all based on propagation of the (square-root) information filter formulation of the Kalman filter, and a more detailed review of much


of this work is given in Chap. 8 of the book by Kung [5.24]. A systolic algorithm based on the covariance filter formulation has been described by Irwin and Gaston [5.92, 93].

Appendix 5.A Modified Gram-Schmidt Algorithm

In this appendix the QR decomposition algorithm is compared with another method commonly used in adaptive beamforming, the Modified Gram-Schmidt (MGS) algorithm [5.25, 26]. The MGS algorithm is well-conditioned and provides an exact solution to the least-squares minimization problem. However, it is not based on recursive update techniques, and so is disadvantageous from the point of view of circuit architecture. Although its mathematical formulation is quite independent, we shall show how the QR decomposition and MGS algorithms are closely related.

The operation of the MGS algorithm can be described as follows. We define the n × p matrix Φ_p(n) as

Φ_p(n) = [φ_1(n), φ_2(n), ..., φ_p(n)] = [X(n), y(n)]   (5.A.1)

and take the first column of Φ_p(n) (i.e., the vector of all time-sampled data entering the first channel up to time t_n) as the first vector q_1(n) of a new orthogonal set. The remaining vectors φ_j(n) (j = 2, 3, ..., p) are then made orthogonal to q_1(n) by applying the simple projection operations φ'_j(n) = P(q_1(n), φ_j(n)), where

P(a, b) = b − a (a^H b / a^H a) .   (5.A.2)

The vector φ'_2(n) is taken as the second member q_2(n) of the new orthogonal set, and this in turn is made orthogonal to the remaining vectors φ'_j(n) (j = 3, 4, ..., p) by applying the projection operations φ''_j(n) = P(q_2(n), φ'_j(n)). Note that since q_1(n) is orthogonal to φ'_j(n) for j ≥ 2 it must also be orthogonal to φ''_j(n). The vector φ''_3(n) is then taken as the third member q_3(n) of the orthogonal set, and the process is continued until a complete set of p orthogonal vectors is obtained. Now, the column vector q_p(n) has been constructed by adding to the vector y(n) a linear combination of the columns in the matrix X(n). It must therefore be of the form

q_p(n) = X(n) ξ(n) + y(n) ,   (5.A.3)

where the vector of coefficients ξ(n) is to be determined. Since q_p(n) is orthogonal to the other vectors q_1(n), q_2(n), ..., q_{p−1}(n), and hence to the columns of X(n), it must correspond to the minimum norm residual vector e(n). This is easily


demonstrated by multiplying (5.A.3) by X^H(n). Equation (5.A.3) is thus identical to (5.10), and so the vector ξ(n) is given by

ξ(n) = w(n) ,   (5.A.4)

the least-squares weight vector. The vector q_p(n) therefore provides the required output from the adaptive combiner.

Although the MGS algorithm is known to have good numerical properties [5.25], it does not lead to a particularly good circuit architecture. The type of circuit required is illustrated (for p = 4) by means of the block diagram in Fig. 5.26. It comprises a triangular array of processors which implement the individual projections and produce the orthogonal vectors q_1(n) to q_p(n) in sequence. Note that the orthogonalization procedure is carried out column-by-column, and so the entire data matrix must be accessed before the operation can commence. This leads to a considerable overhead in the amount of memory and control circuitry required. It also means that as each row of data is received, the entire procedure must be repeated in order to update the least-squares estimate. As a result, the MGS algorithm tends to be used in block sequential mode, where the updated solution is computed using the next complete block of data.

[Fig. 5.26: Network for performing Modified Gram-Schmidt orthogonalization]
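As a compact summary of the procedure just described, the following NumPy sketch (our own naming) runs MGS over the block Φ_p(n) = [X, y] and returns q_p(n), verifying against a direct least-squares solve that q_p is indeed the residual e = Xw + y of (5.A.3, 4):

```python
import numpy as np

def proj(a, b):
    """The projection operation P(a, b) = b - a (a^H b / a^H a) of (5.A.2)."""
    return b - a * (a.conj() @ b) / (a.conj() @ a)

def mgs_residual(X, y):
    """Modified Gram-Schmidt over Phi = [X, y]; the last orthogonal vector
    q_p equals the least-squares residual e = X w + y."""
    Phi = np.column_stack([X, y]).astype(complex)
    p = Phi.shape[1]
    for k in range(p - 1):
        q = Phi[:, k]                    # take next column as q_{k+1}
        for j in range(k + 1, p):        # orthogonalize remaining columns
            Phi[:, j] = proj(q, Phi[:, j])
    return Phi[:, -1]                    # q_p(n): the adaptive combiner output

# Agreement with a direct least-squares solve of min ||X w + y||:
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 3)) + 1j * rng.normal(size=(50, 3))
y = rng.normal(size=50) + 1j * rng.normal(size=50)
w, *_ = np.linalg.lstsq(X, -y, rcond=None)
print(np.linalg.norm(mgs_residual(X, y) - (X @ w + y)))   # ~1e-15
```

Note that mgs_residual must see the whole data block before any output appears, which is exactly the architectural drawback, relative to the recursively updated QR array, discussed above.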