
S. Haykin, J. Litva, T. J. Shepherd (eds.), Radar Array Processing, Springer Series in Information Sciences 25, Springer-Verlag


5. Systolic Adaptive Beamforming


Processor (WAP) anywhere in the world. We conclude the section by presenting some practical results which have been obtained to date.

5.9.1 Wavefront Array Processor

The central beamforming processor was based upon the triangular array architecture illustrated in Fig. 5.5, with a single linear constraint pre-processor of the kind described in Sect. 5.7.1. As stated above, it was designed to operate as a wavefront array, and this was found to offer several advantages over its systolic counterpart. First, it was not necessary to impose a temporal skew on the data input to the processor, since the associated processing wavefront develops naturally within the array. Second, it avoided the problems associated with high-speed clock distribution. Furthermore, using an array of transputer emulators, Broomhead et al. [5.39] have demonstrated that, in situations where the processing time associated with each node is data dependent, a WAP can actually achieve higher throughput than its systolic counterpart. In order to operate in this mode, every processing element must, of course, incorporate some additional circuitry to implement a bi-directional handshake on each of its input/output links and thereby ensure that the necessary communication protocol is observed. This represents an overhead which is not negligible but which was easily absorbed within each node of the array.
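The throughput argument can be illustrated with a toy model. The sketch below is not the simulation of Broomhead et al. [5.39]; it assumes a simple linear pipeline (rather than the triangular array), node times drawn at random, and enough buffering between nodes that a node never stalls on its output. All function names are hypothetical. Under these assumptions the clocked array must run at the worst-case node time, whereas the handshake-driven array advances as soon as data and node are both ready.

```python
import random

def systolic_finish_time(times):
    """Clocked pipeline: the global clock period must cover the slowest
    node operation, and item j leaves node i after i + j + 1 periods."""
    n_nodes, n_items = len(times), len(times[0])
    period = max(max(row) for row in times)
    return (n_nodes + n_items - 1) * period

def wavefront_finish_time(times):
    """Handshake-driven pipeline: node i starts item j once the item has
    arrived from node i-1 and the node has finished item j-1."""
    n_nodes, n_items = len(times), len(times[0])
    finish = [[0.0] * n_items for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_items):
            arrived = finish[i - 1][j] if i > 0 else 0.0
            node_free = finish[i][j - 1] if j > 0 else 0.0
            finish[i][j] = max(arrived, node_free) + times[i][j]
    return finish[-1][-1]

def random_times(n_nodes, n_items, seed=0):
    """Data-dependent node times, here simply uniform on [0.5, 1.0]."""
    rng = random.Random(seed)
    return [[rng.uniform(0.5, 1.0) for _ in range(n_items)]
            for _ in range(n_nodes)]

times = random_times(n_nodes=6, n_items=1000)
print("systolic :", systolic_finish_time(times))
print("wavefront:", wavefront_finish_time(times))
```

In this model the wavefront finish time can never exceed the systolic one, and the two coincide when every node operation takes the same time, which is consistent with the advantage appearing only for data-dependent processing times.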

The processor subsystem can accept up to six inputs in either analogue or digital form. The zero IF boards (one per channel) operate at an intermediate frequency of 184 MHz and have a bandwidth of 26 kHz with a gain of about 40 dB. The maximum sampling frequency of the A/D converters is 200 k samples/sec and the digital data is output as In-phase and Quadrature (I and Q) channels in 12-bit two's complement format. Alternatively, digital baseband signals can be input directly to the processor array.

The adaptive beamformer comprises 33 identical processing nodes, 21 of which constitute the Triangular Wavefront Array Processor (TWAP). The remaining 12 nodes are required for the Data Correction Wavefront Array Processor (DCWAP). The first 6 of these are used to correct for DC offsets and I/Q imbalance in the receiver channels, while the other 6 constitute a linear constraint pre-processor of the type illustrated in Fig. 5.12. Two processing nodes have been fitted onto one extended printed circuit board measuring 10 in. × 9 in., and 17 such boards are used to accommodate the 33 active processors. Ten other cards provide the data acquisition, system timing, and interface control functions. The processor subsystem is housed in a 4 ft long racking enclosure along with power supplies and fan cooling. The total power consumed by this unit is approximately 500 watts.

It was decided to base the processing node design on a standard programmable Digital Signal Processing (DSP) chip in order to reduce the system development cost and provide flexibility in the choice of test-bed algorithm for a range of applications. The Texas Instruments TMS32010 was chosen since the chips and the software development system were both readily available at the time. As illustrated in Fig. 5.20, additional hardware is used to provide multiple 16-bit input/output ports with input FIFO ("first-in/first-out") buffering, a look-up table (for reciprocals, etc.), floating point renormalization logic, localized control and a bit-serial diagnostic data link. The node design also incorporates program memory ROM (containing "fixed" bootstrap and node algorithm code), and program memory RAM, which allows one to download additional node programs and thereby investigate the performance of other algorithms.

T.J. Shepherd and J.G. McWhirter

Fig. 5.20. Schematic diagram of components for wavefront array processor (WAP). [Diagram labels: input ports 1 and 2, output ports 1 and 2, serial link.]

The processor subsystem is controlled via a terminal from which the operating mode can be set up and selected. The link to an external computer allows programs to be downloaded to the array nodes (each of which has its own unique address) and the node contents to be dumped back to the host during software development. The final operational code for each node can be programmed into on-board ROM, whereupon the array may operate in an entirely stand-alone manner.

Basing the development on a wavefront array processor greatly simplified the overall system design in several ways. For example, it was only necessary to design and test one processor board for both the triangular array and the data correction processor. Furthermore, since the sequence and timing of operations for each processor are controlled autonomously, the global system control is trivial compared with that which would be required for a more conventional design. Avoiding the problems of high-speed clock distribution simplified the electrical design and contributed significantly to the fact that the hardware (which sustains a processing rate of approximately $150 \times 10^6$ TMS32010 instruction cycles/s) functioned perfectly first time and gives bit-for-bit agreement with the software emulator developed on a mainframe computer. With an asynchronous system, of course, great care must be taken to ensure that the communication links do not fail due to the problem of metastable states. In this respect, the test-bed array design relies entirely on the asynchronous characteristics of the TMS32010 and, so far, no problems have been experienced in practice.

Like most programmable DSP chips, the TMS32010 uses a 16-bit fixed point number representation, and so the floating point operations required for the test-bed array have to be programmed in software. This consumes a surprisingly large number of machine cycles and places a significant restriction on the throughput rate which can be achieved. For example, about 150 cycles are required to implement one complex 24-bit floating point multiplication, although the exact number is, of course, data dependent. Given the 200 ns instruction cycle for the TMS32010, the corresponding computation time is about 30 μs. Since the internal node of the TWAP must perform two complex multiply/accumulate operations per sample time, and this constitutes the main processing bottleneck, the adaptive beamforming network can process about $10^4$ complex samples/s from each of the six receiver channels. Although this is too slow for most real-time applications, it is more than sufficient to carry out a wide range of laboratory experiments and trials.
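The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted in the text (200 ns cycle, about 150 cycles per complex multiplication, two complex multiply/accumulates per sample); the constant and function names are illustrative, and the result is an upper bound because the accumulations and any communication overhead are ignored.

```python
# Figures quoted in the text; the names are illustrative only.
CYCLE_TIME_S = 200e-9            # TMS32010 instruction cycle
CYCLES_PER_COMPLEX_MULT = 150    # approximate and data dependent
MACS_PER_SAMPLE = 2              # complex multiply/accumulates per internal node

def complex_mult_time(cycles=CYCLES_PER_COMPLEX_MULT, cycle_time=CYCLE_TIME_S):
    """Time for one software floating point complex multiplication."""
    return cycles * cycle_time

def max_sample_rate(macs=MACS_PER_SAMPLE):
    """Upper bound on the sample rate set by the internal-node bottleneck."""
    return 1.0 / (macs * complex_mult_time())

print(f"complex multiply : {complex_mult_time() * 1e6:.0f} microseconds")
print(f"sample-rate bound: {max_sample_rate():.3g} samples/s")
```

The bound comes out somewhat above $10^4$ samples/s, which is consistent with the quoted "about $10^4$" once the remaining per-sample work is allowed for.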

Tests to date have included a systematic evaluation of the cancellation performance achieved using IF sources in the laboratory together with initial performance assessments in an anechoic chamber. The cancellation performance against multiple signals is well illustrated by the example in Fig. 5.21. The top illustration shows the combined signal spectrum before adaptation and indicates the individual signal and jammer power levels. The lower illustration shows the combined signal spectrum after adaptation subject to a constraint which maintains the gain in a wanted signal direction. It can be seen that all unwanted signals have been suppressed below the desired signal level which has not been affected. This represents an improvement in output signal-to-noise ratio of approximately 33 dB. The adaptive antenna processor test-bed constitutes an invaluable research tool which can demonstrate the performance achievable with a wide range of advanced adaptive algorithms. It is hoped that the success of this hardware development will lead to the use of wavefront array processors for solving other computationally intensive problems.

The throughput rate of the adaptive antenna processor test-bed is only limited by the very conservative node processor design. It is worth pointing out, for example, that STL have just completed the design of a very high performance programmable DSP chip [5.38, 40] intended as the node processor for a wide range of systolic and wavefront arrays. Using one of these chips, when available, for each of the 33 processing nodes would allow the throughput rate to be increased to at least $2 \times 10^6$ samples/s (equivalent to about $10^9$ floating point operations/s). Furthermore, Lackey et al. [5.41] from the Hazeltine Corporation have recently reported the development of a very high performance systolic array processor based on the architecture in Fig. 5.4. Designed using Weitek floating point chips, it can achieve an overall processing rate in excess of $10^9$ floating point operations/s and serves to highlight the processing power which can be achieved using the architecture described in this paper.

5.10 Further Developments

Previous sections have described in some detail the basic functions of the triangular QR decomposition array as required for least-squares adaptive beamforming. As we have stated in the introduction, however, the list of topics covered here is by no means exhaustive, and the area remains a subject of intensive investigation. In order to illustrate this point we should like to outline in this section a brief description of a variety of further developments which represent various topics of recently completed or current research. The following subsections contain the essentials of the topics covered; a more cursory review of other, related subjects is deferred until the final section.

5.10.1 Parallel Weight Extraction

In Sect. 5.6 a method was described whereby the least-squares weight vector could be obtained by flushing the unit matrix through the frozen network. The elements of the vector then appear at the residual output as an "impulse response" to the equivalent linear filter. Although apparently straightforward to apply, this method interrupts data flow for O(p) time units for a weight vector of length p. Recent work, however, has revealed how the weight vector may be obtained at each time epoch without the need to interrupt input data flow; the penalty for this advantage is the requirement of about twice the original amount of processing hardware. We have termed the associated procedure "parallel weight extraction".

The method is based upon the observation that if the MVDR post-processor method, as described in Sect. 5.8, is employed for the $p - 1$ unit constraint vectors defined in (5.107), it will result in a network consisting of the triangular QR decomposition array alongside $p - 1$ post-processor columns, the $i$th of which ($i = 1, 2, \ldots, p - 1$) contains the elements of the vector obtained by applying $R^{-H}(n)$ to the $i$th unit constraint vector. Thus the $p - 1$ columns together contain all the elements of the matrix $R^{-H}(n)$; moreover, according to the analysis of Sect. 5.8, the Givens rotation parameters generated in the left-hand triangle will update the elements of $R^{-H}$ from epoch to epoch, using the same internal rotation cells. Since $R^{-H}$ is a lower triangular matrix, $\tfrac{1}{2}(p - 1)(p - 2)$ of the post-processor cells will contain zeros, and are thus superfluous. The associated network takes the form of two adjacent triangles, and hence constitutes a parallelogram structure.
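The claim that the rotations generated while updating the triangular factor also update its inverse Hermitian transpose can be checked numerically. The following NumPy sketch is an illustration under stated assumptions, not the cell-level algorithm of the chapter: it assumes no exponential forgetting, a triangular factor with real non-negative diagonal, and a null vector fed into the inverse-factor side; the function names are hypothetical.

```python
import numpy as np

def rotate_in_row(R, S, x):
    """Absorb one data row x into upper-triangular R with Givens rotations
    and apply the same rotations to S = R^{-H}, whose input row is zero."""
    R, S = R.astype(complex), S.astype(complex)    # astype returns copies
    x = np.array(x, dtype=complex)
    z = np.zeros(len(x), dtype=complex)            # null vector fed to S
    for k in range(R.shape[0]):
        r = R[k, k].real                           # diagonal is real, >= 0
        r_new = np.hypot(r, abs(x[k]))
        c, s = (1.0, 0.0) if r_new == 0 else (r / r_new, x[k] / r_new)
        R_row, S_row = R[k].copy(), S[k].copy()
        # Same (c, s) pair rotates both the R row and the S row.
        R[k], x = c * R_row + np.conj(s) * x, -s * R_row + c * x
        S[k], z = c * S_row + np.conj(s) * z, -s * S_row + c * z
    return R, S

def check_inverse_tracking(p=4, n=12, seed=0):
    """Build R from random data, rotate in one new row, return the pieces."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
    R = np.linalg.cholesky(X.conj().T @ X).conj().T   # upper triangular
    S = np.linalg.inv(R).conj().T                      # R^{-H}, lower triangular
    x = rng.standard_normal(p) + 1j * rng.standard_normal(p)
    R_new, S_new = rotate_in_row(R, S, x)
    return R, x, R_new, S_new

R, x, R_new, S_new = check_inverse_tracking()
print(np.allclose(S_new, np.linalg.inv(R_new).conj().T))
```

The rotated matrix stays lower triangular, so in this sketch the entries above its diagonal remain zero, in line with the remark that those post-processor cells are superfluous.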

Several algorithms have been derived [5.42, 43] to extract the least-squares weight vector using this structure. In all cases the elements of each vector $w(n)$ emerge in parallel (although in conventional systolic time-staggered fashion) from the base of the right-hand triangle of processors; in addition, the two constituent triangles must be separated by a further column of rotation processors, which contain and update elements of the vector $u(n)$ of (5.23), for the canonical problem, or the vector $a(n)$ of (5.188), for the constrained problem in which $a(n) = R^{-H}(n)c^*$, with $c$ the required steering vector.

We now provide an explicit example of a parallel weight extraction algorithm, as applied to the canonical problem. Consider first the QR decomposition of the extended data matrix

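$$\tilde{X}(n) = [X(n),\, y(n)], \tag{5.207}$$

so that

$$Q(n)\tilde{X}(n) = \begin{bmatrix} \tilde{R}(n) \\ O \end{bmatrix}, \tag{5.208}$$

where $\tilde{R}(n)$ is a $p \times p$ upper triangular matrix, and $Q(n)$ is the corresponding $n \times n$ unitary matrix. It is easy to show that $\tilde{R}(n)$ has the structure

$$\tilde{R}(n) = \begin{bmatrix} R(n) & u(n) \\ 0^T & r(n) \end{bmatrix}, \tag{5.209}$$

where $R(n)$ and $u(n)$ are the matrices defined in (5.20) and (5.23), respectively. [Equation (5.209) may be proved algebraically, but is obvious when the effect of an enlarged Gentleman-Kung array is considered.] $r(n)$ is defined merely as the $(p, p)$th element of the matrix $\tilde{R}(n)$. Now define the following extended vectors of length $p$,

$$\tilde{w}(n) = \begin{bmatrix} w(n) \\ 1 \end{bmatrix}, \qquad s = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \tag{5.210a}$$

In terms of these vectors, (5.27) may be written in the form

$$\tilde{R}(n)\tilde{w}(n) = s\, r(n). \tag{5.210b}$$

Solving for $\tilde{w}(n)$, we obtain

$$\tilde{w}(n) = r(n)\tilde{R}^{-1}(n)s = r(n) \times [\text{last row of } \tilde{R}^{-H}(n)]^{*}. \tag{5.211}$$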

Thus $w(n)$ may be obtained directly from the matrix $\tilde{R}^{-H}(n)$. The relevant network is shown in Fig. 5.22. It comprises a parallelogram network for the extended matrices $\tilde{R}(n)$ and $\tilde{R}^{-H}(n)$, updated as described in Sect. 5.4. Equation (5.211) shows that $w(n)$ may be extracted by outputting the complex conjugate of the final row of the right-hand network, and multiplying by $r(n)$, which is locally available from the left-hand network.

Fig. 5.22. Network for parallel weight extraction, with requisite additional cell functions. [Legible cell annotations include a rotation-generating cell: $r' \leftarrow (r^2 + |x_{in}|^2)^{1/2}$; if $x_{in} = 0$ then $(c \leftarrow 1;\ s \leftarrow 0)$, else $(c \leftarrow r/r';\ s \leftarrow x_{in}/r')$; $r \leftarrow r'$.]
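The extraction rule can be verified numerically. The NumPy sketch below is an illustration only: it assumes the canonical residual has the form $Xw + y$ (so that the least-squares weights minimize the norm of $Xw + y$), builds the extended triangular factor by a Cholesky factorization rather than by a systolic array, and compares the result with a direct least-squares solve. The function names are hypothetical.

```python
import numpy as np

def weights_from_inverse_factor(X, y):
    """Return r * conj(last row of R_ext^{-H}), where R_ext is the
    triangular factor of the extended data matrix [X, y]."""
    X_ext = np.column_stack([X, y])
    # Upper-triangular factor with real positive diagonal (Cholesky route).
    R_ext = np.linalg.cholesky(X_ext.conj().T @ X_ext).conj().T
    r = R_ext[-1, -1].real                 # the (p, p)th element
    S = np.linalg.inv(R_ext).conj().T      # R_ext^{-H}, lower triangular
    return r * S[-1, :].conj()

def compare_with_direct_solve(n=20, n_aux=3, seed=0):
    """Random complex data: direct least squares versus the extraction rule."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, n_aux)) + 1j * rng.standard_normal((n, n_aux))
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    w_direct = np.linalg.lstsq(X, -y, rcond=None)[0]   # minimizes ||X w + y||
    w_ext = weights_from_inverse_factor(X, y)
    return w_direct, w_ext

w_direct, w_ext = compare_with_direct_solve()
print(np.allclose(w_ext[:-1], w_direct), np.isclose(w_ext[-1], 1.0))
```

Under the stated sign convention, the first $p - 1$ elements of the extracted vector reproduce the least-squares weights and the final element equals one.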

In the more general processor structure, the left-hand triangle is a conventional QR decomposition array, into which enters data $x^T$. The central column of processors contains the vector $u(n)$ or $a(n)$, and accepts primary channel data $y$ or zero, depending upon whether the network is required to perform canonical or constrained least-squares processing. Rotation parameters from the boundary cells of the left-hand network serve to update elements in the left-hand triangle ($R$), the central column ($u$ or $a$), and the right-hand triangle ($R^{-H}$). Null vectors are fed into the top of this right-hand section. In general, a further row of processors is necessary at the base of this section to perform a final stage in the computation of the weights: the actual function of these processors depends upon the particular choice of algorithm.

Much work has yet to be carried out to refine the technique just described. In particular, a detailed comparison of the several suggested algorithms must be made with regard to relative stability, efficiency, and the practicality of hardware implementation. It is expected, however, that the general method of parallel weight extraction will find use in beamforming systems, although the method may eventually prove of equal importance in extracting the state vector (the equivalent of the least-squares weight vector) within the context of systolic Kalman filtering, which will be mentioned in Sect. 5.11.

5.10.2 QR-with-Feedback

Thus far we have considered purely digital methods of adaptive beamforming. In practice it may prove necessary to employ analogue techniques to apply the beamformer weights, especially in situations where high-bandwidth signals (i.e., in excess of about 10 MHz) are involved; in this regime digital multiplication at the required precision is impossible. The conventional methods for analogue beam-steering, however, have traditionally used gradient descent algorithms, such as those mentioned in Sect. 5.3.1, and although various accelerated gradient techniques have been suggested, none possesses the optimality of the least-squares methods.

Recently, a novel scheme has been suggested [5.44] (see also [5.45]) which combines the advantages of analogue beamforming with least-squares digital weight control. As will become apparent, the system possesses further desirable features; we shall mention these after a brief description of the network and method.

Consider an analogue beamformer comprising inputs from a primary channel $y(t)$, and $p - 1$ auxiliaries $x(t)$ which are weighted by the elements of the vector $w$. After analogue-to-digital (A/D) conversion the beamformer output may be written as a vector $e$ of block length $k$,

$$e = X_k w + y_k, \tag{5.212}$$

where $X_k$ is the $k \times (p - 1)$ data matrix for the block, and $y_k$ is the corresponding primary channel vector. If $w_0$ is the set of weights that minimizes the norm of the output vector $e_0$ over that block, i.e.,

$$\| e_0 \| = \| X_k w_0 + y_k \| = \min, \tag{5.213}$$

then we need to correct $w$ by an amount $\Delta w$,

$$w_0 = w + \Delta w, \tag{5.214}$$

at the end of the block. Using (5.212) to (5.214) we find

$$\| e_0 \| = \| X_k \Delta w + e \| = \min. \tag{5.215}$$

Thus the necessary weight correction $\Delta w$ is obtained as the least-squares weight vector corresponding to the canonical problem with auxiliary data matrix $X_k$, and primary channel vector equal to the actual beamformer output vector $e$.

It is by now clear how the least-squares solution may be computed very efficiently using the QR decomposition network shown in Fig. 5.23. In the arrangement shown, the analogue channel inputs $x_1, x_2, \ldots, x_{p-1}$ and $y$ are simultaneously fed into the analogue beamformer, and also, in digitized form, into the QR decomposition network. The analogue output, $e$, of the beamformer is digitized and fed into the right-hand channel of the network. At the beginning of each data block the network is initialized to contain a null matrix $[R, u]$. At the end of each block the weight vector $\Delta w$ can be flushed out of the network, converted to analogue format, and fed back to correct the analogue weights $w$. The weight flushing procedure does, of course, interrupt data flow for about $p$ time units, and this can be avoided if the parallel weight extraction method, as discussed in Sect. 5.10.1, is employed. In either case the network must be reinitialized at the beginning of every $k$ data samples.

Fig. 5.23. Schematic diagram of principal components for QR-with-feedback network

More generally, nonstationarities in the data can be tracked by including a gain factor $B$ in the update of the weight vector,

$$w_{i+1} = w_i + B\,\Delta w_i, \tag{5.216}$$

where the subscript $i$ denotes the $i$th block update. Ward et al. [5.44] have shown that convergence of $w$ under steady-state conditions is totally independent of the signal data statistics, and will occur provided that $0 < B < 2$.
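The block-feedback update can be sketched in a few lines of NumPy. This is a toy simulation under loudly stated assumptions, not the analysis of Ward et al. [5.44]: the data are stationary and noise-free, the beamformer output is taken as the auxiliary data times the current weights plus the primary channel, and the weight correction for each block is obtained with a direct least-squares solve in place of the QR network. The function and parameter names are hypothetical.

```python
import numpy as np

def run_feedback(gain, n_blocks=20, k=32, n_aux=3, seed=0):
    """Iterate w <- w + gain * dw over successive blocks and return the
    weight error norm after each block."""
    rng = np.random.default_rng(seed)
    w_opt = rng.standard_normal(n_aux) + 1j * rng.standard_normal(n_aux)
    w = np.zeros(n_aux, dtype=complex)
    errors = []
    for _ in range(n_blocks):
        X = rng.standard_normal((k, n_aux)) + 1j * rng.standard_normal((k, n_aux))
        y = -X @ w_opt                    # noise-free primary channel (assumption)
        e = X @ w + y                     # beamformer output for this block
        # Correction = least-squares weights with e as the primary channel.
        dw = np.linalg.lstsq(X, -e, rcond=None)[0]
        w = w + gain * dw
        errors.append(np.linalg.norm(w - w_opt))
    return np.array(errors)

for gain in (0.5, 1.0, 1.5, 2.5):
    print(gain, run_feedback(gain)[-1])
```

In this idealized setting each block multiplies the weight error by $1 - B$, so the error shrinks exactly when $|1 - B| < 1$, which matches the quoted range for the gain; with noisy or nonstationary data the behaviour would be less clean than this sketch suggests.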

In addition to its usefulness in high-bandwidth systems, the "QR-with-feedback" method has other advantages over conventional direct data processing: because the digital network processes an output error, rather than the primary channel data itself, it is capable of correcting for circuit errors introduced by the system electronics. In particular, the scheme has been shown to be robust against errors arising from nonlinearities present in the beamformer weights: simulation of a system containing a sinusoidal weight nonlinearity, giving rise to 60% peak-to-peak compression of the linear weight scale, has been shown to result in negligible reduction in convergence rate using this scheme, although it produced a slightly increased steady-state mean-square residual [5.44].

5.10.3 Structures for Broad-Band Adaptive Beamforming

The basic combiner illustrated in Fig. 5.2 and described in Sect. 5.3 is sufficient for nulling or locating narrow-band sources of radiation. In many situations, however, the receiver must be capable of dealing with signals of significant bandwidth, the effects of which can be quite deleterious if the fractional bandwidth exceeds a few per cent [5.11]. In effect, a single source divides into a range of sources, each of which may require nulling individually. Further degrees of freedom must then be introduced into the beamforming array in order to compensate for these sources.

A particularly effective extension of the narrow-band array that supplies these degrees of freedom involves replacing each beamforming weight $w_i$ by a tapped delay line (or finite impulse response filter) so that the input signal becomes weighted in both space and time [5.46]. For a single primary channel the generalized canonical output residual takes the form,

$$e(n) = y(t_n) + \sum_{j=1}^{p-1} \sum_{m=1}^{M_0} w_{mj}\, x_j(t_{n-m}), \tag{5.217}$$