Signal Processing in Radar Systems
[Figure 7.16: multilevel graph of 20 tops. Each top i is labeled with its weight Ni (number of operations) and, in parentheses, its rank. Labels recoverable from the figure: N1 = 3,000 (11,230); N2 = 750 (7,480); N3 = 3,000 (8,230); N4 = 1,500 (6,730); N5 = 100 (5,230); N6 = 1,500 (5,130); N7 = 500 (1,060); N8 = 2,500 (810); N9 = 750 (3,630); N10 = 25 (560); N11 = 30 (2,880); N12 = 250 (535); N14 = 500 (2,850); N15 = 15 (285); N16 = 750 (2,350); N17 = 250 (270); N18 = 1,500 (1,600); N19 = 20 (20); N20 = 100 (100). The label of top 13 and the edges between the tops are not recoverable.]
FIGURE 7.16 Multilevel graph.
Now let us distribute the graph tops (vertices) directly among the microprocessor subsystems. The distribution is carried out in the following sequence:
1. The graph tops are ranked: each top is assigned a weight equal to the longest path (by number of operations) leading from that top to the end top. The ranks are given in the last column of Table 7.5 and are shown in parentheses in Figure 7.16.
2. The top with the maximal rank (top 1) is assigned to the first microprocessor subsystem. Among the tops that do not require the results of operations performed by the first microprocessor subsystem, the one with the maximal rank (top 2) is then assigned to the second microprocessor subsystem.
3. The remaining tops are assigned in order of decreasing rank, provided that the data they require are available. If the required data are not yet available, the microprocessor subsystem stays in standby mode until it obtains them from the other microprocessor subsystem.
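The three steps above can be sketched as a rank-based list scheduler. This is a minimal illustration under stated assumptions, not the procedure behind Figure 7.17: the edges of Figure 7.16 are not available here, so the five-top graph `WEIGHTS`/`SUCC` and the function names `rank_tops` and `list_schedule` are hypothetical.

```python
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # top -> list of successor tops


def rank_tops(weights: Dict[int, int], succ: Graph) -> Dict[int, int]:
    """Rank of a top: its own weight plus the largest rank among its successors."""
    ranks: Dict[int, int] = {}

    def rank(v: int) -> int:
        if v not in ranks:
            ranks[v] = weights[v] + max((rank(s) for s in succ.get(v, [])), default=0)
        return ranks[v]

    for v in weights:
        rank(v)
    return ranks


def list_schedule(weights: Dict[int, int], succ: Graph,
                  n_proc: int = 2) -> Tuple[List[Tuple[int, int, int, int]], int]:
    """Assign tops to subsystems by decreasing rank; returns (schedule, makespan).

    Each schedule entry is (subsystem, top, start, end), measured in operations.
    """
    ranks = rank_tops(weights, succ)
    preds: Dict[int, List[int]] = {v: [] for v in weights}
    for v, successors in succ.items():
        for s in successors:
            preds[s].append(v)

    finish: Dict[int, int] = {}   # top -> completion time
    free_at = [0] * n_proc        # time at which each subsystem becomes free
    schedule = []
    remaining = set(weights)
    while remaining:
        p = min(range(n_proc), key=lambda i: free_at[i])  # earliest free subsystem
        t = free_at[p]
        # Tops whose input data are already available at time t.
        ready = [v for v in sorted(remaining)
                 if all(u in finish and finish[u] <= t for u in preds[v])]
        if not ready:
            # Standby mode: wait until the next running macro-operation finishes.
            free_at[p] = min(f for f in finish.values() if f > t)
            continue
        v = max(ready, key=lambda x: ranks[x])  # maximal rank first
        finish[v] = t + weights[v]
        schedule.append((p, v, t, finish[v]))
        free_at[p] = finish[v]
        remaining.remove(v)
    return schedule, max(finish.values())


# Hypothetical five-top graph (NOT the graph of Figure 7.16).
WEIGHTS = {1: 4, 2: 3, 3: 3, 4: 2, 5: 1}
SUCC = {1: [3], 2: [3, 4], 3: [5], 4: [5]}

schedule, makespan = list_schedule(WEIGHTS, SUCC)
for proc, top, start, end in schedule:
    print(f"subsystem {proc + 1}: top {top} from {start} to {end}")
print("makespan:", makespan)
```

In this toy graph the second subsystem spends two units in standby before top 5 can start, which is the same effect that leaves the second subsystem underloaded in the example discussed in the text.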
The loading schedule of the microprocessor subsystems is presented in Figure 7.17. As follows from Figure 7.17, when the algorithm is paralleled over two microprocessor subsystems, the threshold number of operations per microprocessor subsystem is Nth = 11,230 operations, given that the total work content of the considered algorithm is Mtotal = 16,290 operations. The second microprocessor subsystem is loaded to only 45%. The coefficient of microprocessor subsystem loading, as a whole, is determined in the following form:
$$K_{\text{load}} = \frac{M_{\text{total}}}{2N_{\text{th}}} = 0.725. \qquad (7.50)$$
Thus, paralleling the considered algorithm in this way is far from ideal from the viewpoint of loading the two microprocessor subsystems. To increase the loading coefficient Kload of the microprocessor subsystems, the macro-operations must be shortened and the computations inside each macro-operation must themselves be paralleled.
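The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two values stated in the text. The function name `loading_coefficient`, the generalization to `n_proc` subsystems, and the assumption that the first subsystem is busy for the whole interval Nth are illustrative and are not taken from the source.

```python
def loading_coefficient(m_total: int, n_th: int, n_proc: int = 2) -> float:
    """Share of the available subsystem time that is spent on useful work."""
    return m_total / (n_proc * n_th)


M_TOTAL = 16_290  # total work content, operations
N_TH = 11_230     # threshold number of operations per subsystem

k_load = loading_coefficient(M_TOTAL, N_TH)

# Assumption: the first subsystem is busy for all N_TH operations,
# so the second one carries out the remaining work.
second_load = (M_TOTAL - N_TH) / N_TH

print(f"K_load = {k_load:.3f}")                    # K_load = 0.725
print(f"second subsystem load = {second_load:.0%}")  # second subsystem load = 45%
```

Under that assumption the two loads, 100% and 45%, average to the 0.725 given by (7.50).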
resulting in increased performance and flexibility and reduced size and cost. Advances in ADC and DAC technologies are pushing the border between analog and digital processing closer and closer to the antenna. In the past, implementing a real-time radar digital signal processor typically required the design of a custom computing machine, using thousands of high-performance ICs. These machines were very difficult to design, develop, and modify. Digital technology has advanced to the point where several implementation alternatives exist that make the processor more programmable, and easier to design and change.
The parallel microprocessor subsystem architecture employs multiple general-purpose processors that are connected via high-speed communication networks. Included in this class are high-end servers and embedded processor architectures. Servers are typically homogeneous, that is, all of the processing nodes are identical and are connected by a very high-performance data bus architecture. Embedded processor architectures are typically composed of single-board computers (blades) that contain multiple general-purpose processors and plug into a standard backplane architecture. This configuration offers the flexibility of supporting a heterogeneous architecture, where a variety of different processing blades or interface boards can be plugged into the standard backplane to configure a total system. It is apparent that high-speed serial links will be the primary communication mechanism for multiprocessor subsystems into the future, with ever-increasing data bandwidths. These parallel microprocessor subsystem architectures offer the benefit of being programmable in high-level languages, such as C and C++. A related advantage is that programmers can design the system without knowing the intimate details of the hardware. In addition, the software developed to implement the system can typically be moved relatively easily to a new hardware architecture as part of a technology refresh cycle. On the negative side, these systems can be difficult to program to support real-time signal processing. The required operations need to be split up appropriately among the available processors, and the results need to be properly merged to form the final result. A major challenge in these applications is to meet the processing latency requirements of the system, which define the maximum length of time allowed to produce a result.
The latency of the microprocessor subsystem is defined as the amount of time required to observe the effect of a change at a processor's input on its output. Achieving latency goals often requires assigning smaller pieces of the workload to individual processors, leading to more processors and a more expensive system. Another challenge facing these systems in radar applications is reset time. In a military application, when a system needs to be reset in order to fix a problem, it must return to full operation in a very short period of time. These microprocessor subsystems typically take a long time to reboot from a central program store and, hence, have difficulty meeting reset requirements. Developing techniques to address these deficiencies is an active area of research. Finally, these processors are generally used for non-real-time or near-real-time data processing, as in target tracking and display processing. Since the 1990s, they have started to be applied to real-time signal processing applications. Although they might be cost-effective for relatively narrowband systems, their use in wideband DSP systems in the early twenty-first century is typically prohibitively expensive due to the large number of processors required. This situation should improve over time as faster processors become available.
The introduction of the FPGA in the 1980s heralded a revolution in the way real-time DSP systems were designed. FPGAs are integrated circuits that consist of a large array of configurable logic elements connected by a programmable interconnect structure. At the time of this writing, FPGAs can also incorporate hundreds of multipliers that can be clocked at rates up to a half billion operations per second, as well as memory blocks, microprocessors, and serial communication links that can support multigigabit-per-second data transfers. FPGAs allow the designer to fabricate complex signal processing architectures very efficiently. In typical large applications, FPGA-based processors can be a factor of 10 (or more) smaller and less costly than systems based on general-purpose processors. This is because most microprocessors have only one or very few processing elements, whereas FPGAs have an enormous number of programmable logic elements and multipliers. On the negative side, utilizing an FPGA to its best advantage typically requires the designer to have a thorough understanding of the resources available in the device. This typically makes efficient FPGA-based systems harder to design than radar systems based on general-purpose processors, where a detailed understanding of the microprocessor subsystem architecture is not necessarily required. In addition, FPGA designs tend to be aimed at a particular family of devices and take full advantage of the resources provided by that family. Hardware vendors are constantly introducing new products, invariably incorporating new and improved capabilities. Over time, the older devices become obsolete and need to be replaced during a technology refresh cycle. When a technology refresh occurs several years down the road, the resources available in the latest FPGAs have typically changed, or a totally different device family is used, which probably requires a redesign. By contrast, software developed for general-purpose processors may only need to be recompiled in order to move it to a new processor. Tools currently exist that synthesize C or MATLAB code into an FPGA design, but these tools are typically not very efficient. The evolution of design tools for FPGAs to address these problems is an area of much research and development.
Design Principles of Complex Algorithm Computational Process in Radar Systems
The complex algorithm of the computational process is a set of elementary DSP algorithms for all stages and CRS modes of operation. Off-line complex algorithms of individual stages of DSP, which are not associated with one another by information processes and control operations, are possible, too. To design the complex algorithm of the computational process, we need an unambiguous definition of how the microprocessor subsystem functions when solving goal-oriented DSP problems. This definition must include the elementary DSP and control algorithms, the sequence of their application, the conditions under which each elementary DSP and control algorithm is executed, and the intercommunication between the DSP and control algorithms through input and output information. A general form of such a definition and description can be presented using the logical and graph flowcharts of the DSP and control algorithms.
One of the best-known ways to specify the complex algorithm, depending on its complexity, is the logical or formula-logical algorithmic flowchart. In the logical algorithm flowchart, the elementary operators and recognizers are presented as geometrical forms (rectangles, diamonds, trapeziums, etc.) connected with each other by arrows in accordance with the order in which the operators and recognizers are executed in the complex algorithm. Titles of elementary operations (DSP algorithms) are written inside the geometrical forms. Sometimes the formulas of the logical operations carried out and of the logical conditions under test are written inside the geometrical forms. In this case, the corresponding logical flowchart of the algorithm is called the formula-logical block diagram.
The main problems solved with the graph flowcharts of complex algorithms are the definition of rational ways to present these algorithms and the choice of computational software tools and microprocessor subsystems to realize the complex DSP algorithm of a radar system. In short, the problem of optimizing the computational process is posed. Solving this problem allows us to reduce the realization time significantly and to simplify the complex DSP algorithm of the radar system. Similar problems arise in network planning and control.
The deterministic network model cannot represent the functioning of a complex DSP algorithm in a CRS, since it is impossible to predict the set of elementary DSP algorithms and the sequence of their realization for each practical situation. Therefore, the stochastic network model, in which the transitions in the network graph are defined by transition probabilities determined by the specific conditions of CRS functioning, is more suitable for representing and analyzing the realization of a complex algorithm by microprocessor subsystems. Once the network model has been constructed, the problem arises of estimating the time needed to complete all operations by microprocessor subsystems with a given effective speed of operations. This time cannot exceed the total duration of the complex algorithm along the most unfavorable route from the initial graph top a_n to the final graph top a_k, that is, along the route that generates the maximal duration of operations. This route is called extreme. In the stochastic network model, the extreme route cannot be presented in explicit form as it can, for example, in a network model with a given structure. Because of this, the analysis of stochastic network models addresses the problem of defining the average time or the average number of operations required to realize the complex algorithm. If the statistical characteristics and parameters of the noise and target environment inside the radar coverage are known and the parameters of the target pip beginning algorithm are selected, we are able to determine the transition probabilities in the network graph of the complex DSP algorithm. However, this possibility does not always exist. In some cases, the transition probabilities can only be estimated by computer simulation of the complex DSP algorithm.
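The average number of operations in a stochastic network can be illustrated with a small sketch. Everything in it is assumed for illustration: the four-top network `NETWORK`, its work contents and transition probabilities, and the function names are hypothetical and do not come from the source. The analytic mean is valid only for an acyclic network; the Monte Carlo estimate stands in for the computer simulation mentioned in the text.

```python
import random

# Hypothetical stochastic network:
# top -> (work content in operations, [(next top, transition probability), ...])
NETWORK = {
    "a0": (100, [("a1", 0.7), ("a2", 0.3)]),
    "a1": (400, [("ak", 1.0)]),
    "a2": (50, [("a1", 0.5), ("ak", 0.5)]),
    "ak": (20, []),
}


def expected_work(network, top):
    """Analytic mean work from `top` to the final top (acyclic networks only)."""
    work, transitions = network[top]
    return work + sum(p * expected_work(network, nxt) for nxt, p in transitions)


def simulate_work(network, start, rng):
    """Work content of one random realization of the algorithm."""
    total, top = 0, start
    while True:
        work, transitions = network[top]
        total += work
        if not transitions:  # final top reached
            return total
        next_tops, probs = zip(*transitions)
        top = rng.choices(next_tops, weights=probs)[0]


def mean_work(network, start, runs=20_000, seed=1):
    """Monte Carlo estimate of the average work content."""
    rng = random.Random(seed)
    return sum(simulate_work(network, start, rng) for _ in range(runs)) / runs


print("analytic mean :", expected_work(NETWORK, "a0"))
print("simulated mean:", mean_work(NETWORK, "a0"))
```

For this toy network the extreme route is a0, a2, a1, ak with 570 operations, while the average over all routes is 475 operations, which shows why the average rather than the extreme route is the natural figure of merit here.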
The results of the analytical calculation of the required number of arithmetical operations are obtained separately for additions and subtractions, multiplications, and divisions. These counts must then be converted into a number of reduced arithmetical operations. As a rule, the addition (a short operation) is used as the reduction operation. The number of reduced arithmetical operations is determined for each microprocessor subsystem, taking into consideration the known ratio between the execution times of the ith long operation and the short operation. The DSP of target return signals has a pronounced information-logical character. Logical operations and transition operations account for about 80% of the total number of elementary DSP operations (cycles) in the realization of the complex DSP algorithm required for CRS functioning. Consequently, the nonarithmetical operations must also be taken into consideration when computing the work content of the elementary DSP algorithms. When defining the work content, we must additionally take into consideration that the number of microprocessor operations depends on the mode of programming.
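The conversion to reduced operations is a weighted sum, sketched below. The operation counts and the time ratios relative to an addition are invented for illustration, and the name `reduced_operations` is hypothetical; the real ratios depend on the particular microprocessor subsystem.

```python
def reduced_operations(counts, time_ratio):
    """Convert per-type operation counts into equivalent short operations.

    counts:     operation type -> number of operations
    time_ratio: operation type -> execution time relative to one addition
    """
    return sum(n * time_ratio[op] for op, n in counts.items())


# Hypothetical figures for one elementary DSP algorithm.
counts = {"add_sub": 1200, "mul": 300, "div": 20}
time_ratio = {"add_sub": 1, "mul": 4, "div": 10}

n_reduced = reduced_operations(counts, time_ratio)
print(n_reduced)  # 1200*1 + 300*4 + 20*10 = 2600
```

Nonarithmetical operations are not covered by this sum; the text accounts for them separately.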
The network model of the complex DSP algorithm allows us to define, in principle, the average work content. If we know the realization time of a single reduced operation, then we can compute the average realization time of the complex DSP algorithm. Conversely, if a limitation on the average realization time of the complex DSP algorithm is given, we are able to determine the required work content of the microprocessor subsystems that realize it. Sometimes, to solve problems of computational resource analysis, we also need to know the work content variance.
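The two statements above can be written compactly. The notation is assumed for illustration and is not taken from the source: $\bar{M}$ is the average work content in reduced operations, $\tau_0$ the realization time of one reduced operation, $\bar{T}$ the average realization time, $T_{\lim}$ the permitted average realization time, and $V_{\mathrm{eff}}$ the effective speed of the microprocessor subsystem. Reading the "required work content of the microprocessor subsystems" as a required effective speed is also an interpretation.

```latex
\bar{T} = \bar{M}\,\tau_{0},
\qquad
\bar{T} \le T_{\lim}
\;\Longleftrightarrow\;
V_{\mathrm{eff}} = \frac{1}{\tau_{0}} \ge \frac{\bar{M}}{T_{\lim}} .
```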
Hence, the nonarithmetical operations must be taken into consideration by means of the corresponding coefficient of reduction Kredna. For example, let Kredna = 3; then the total number of operations required of a microprocessor subsystem for a single realization of the considered complex digital signal reprocessing algorithm is M ≈ 2 × 10^4 operations. Thus, in the considered example, 2 × 10^4 microprocessor operations are needed on average to process a single target pip. Naturally, this number corresponds only to the considered algorithm and can be reduced significantly if we are able to upgrade the algorithm of target pip identification, to simplify the algorithm of target track parameter smoothing, etc. The main purpose of this example is to show that the work content of the complex DSP algorithm can be calculated and, at the same time, to indicate some problems that arise in the course of these calculations.
The results of the work content evaluation of the complex DSP algorithm give us the initial information for selecting the structure and elements of the microprocessor subsystems assigned to realize this algorithm in a CRS. To ensure the required work content and operational reliability, the designed computational system must, as a rule, include several microprocessor subsystems. The main peculiarity of these microprocessor subsystems is instrument or programmable parallelism of the computational process. To organize the parallel computational process, the complex DSP algorithm must be paralleled. In the general case, the paralleling of complex DSP algorithms can be considered only for a specific problem, taking into consideration the supposed structure of the computational subsystem. Consequently, in the course of designing, the problems of selecting the structure of a computational subsystem based on the microprocessor subsystems and of transforming the algorithm in accordance with the proposed structure are closely related. There is a set of general statements and methods of algorithmic solution paralleling.
A simple consideration of the multilevel graph indicates that the computational process can be paralleled when realizing the corresponding complex DSP algorithms. As follows from Figures 7.14 and 7.15, several macro-operations (from 1 to 3) can be carried out simultaneously in the realization of the linear filtering algorithm. Consequently, several microprocessor subsystems, namely, from 1 to 3, can participate in the computational process. In doing so, the realization time of the complex DSP algorithm can be reduced substantially, since instead of 20 macro-operations carried out in sequence by one microprocessor subsystem, each microprocessor subsystem in the parallel scheme carries out no more than 10 macro-operations.
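How many macro-operations can run simultaneously can be read off a level decomposition of the graph. The sketch below assigns each top the length of the longest chain of predecessors leading to it. Tops on the same level cannot depend on each other, so they may be executed in parallel. The five-top graph and the name `graph_levels` are hypothetical, since the edges of Figures 7.14 through 7.16 are not reproduced here.

```python
from collections import defaultdict


def graph_levels(tops, succ):
    """Group tops of an acyclic graph by longest-path distance from the inputs."""
    preds = defaultdict(list)
    for v, successors in succ.items():
        for s in successors:
            preds[s].append(v)

    level = {}

    def level_of(v):
        if v not in level:
            # Input tops (no predecessors) land on level 0.
            level[v] = 1 + max((level_of(u) for u in preds[v]), default=-1)
        return level[v]

    layers = defaultdict(list)
    for v in tops:
        layers[level_of(v)].append(v)
    return [layers[i] for i in sorted(layers)]


# Hypothetical graph: tops 1 and 2 are inputs, top 5 is the end top.
TOPS = [1, 2, 3, 4, 5]
SUCC = {1: [3], 2: [3, 4], 3: [5], 4: [5]}

levels = graph_levels(TOPS, SUCC)
print(levels)                             # [[1, 2], [3, 4], [5]]
print("max parallelism:", max(len(layer) for layer in levels))
```

The widest level gives an upper estimate of the number of subsystems that can be kept busy at the same moment by this level-by-level arrangement.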
Further transformations of the multilevel graphs can be carried out in two directions:
1. Definition of a rational number of microprocessor subsystems realizing the paralleled complex DSP algorithm within the limits of the given minimal time
2. Optimal distribution of macro-operations among the microprocessor subsystems if the number of microprocessor subsystems is given, their characteristics are known, and the minimal realization time is the criterion of effectiveness
To solve either problem, additional information is needed about the weights of the graph tops, that is, the number of elementary operations carried out during the realization of the macro-operations by which the graph tops are marked.
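Once the weights are known, a standard lower bound on the realization time with $n$ identical subsystems follows directly. It is added here as an illustration and is not stated in the source; the notation $T(n)$ and $N_{\mathrm{crit}}$ is assumed. $N_{\mathrm{crit}}$ is the maximal rank in the graph, 11,230 operations for top 1 in Figure 7.16, and $M_{\mathrm{total}} = 16{,}290$ operations is the total work content quoted for this example.

```latex
T(n) \;\ge\; \max\!\left( N_{\mathrm{crit}},\ \frac{M_{\mathrm{total}}}{n} \right),
\qquad
T(2) \;\ge\; \max\!\left( 11{,}230,\ \frac{16{,}290}{2} \right)
      = \max\left( 11{,}230,\ 8{,}145 \right) = 11{,}230 .
```

The two-subsystem schedule with $N_{\mathrm{th}} = 11{,}230$ already reaches this bound, so adding a third subsystem could not shorten the realization time unless the macro-operations on the longest route are themselves split.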
The loading schedule of the microprocessor subsystems is presented in Figure 7.17. As follows from Figure 7.17, when the algorithm is paralleled over two microprocessor subsystems, the threshold number of operations per microprocessor subsystem is Nth = 11,230 operations, given that the total work content of the considered algorithm is Mtotal = 16,290 operations. The second microprocessor subsystem is loaded to only 45%. The coefficient of microprocessor subsystem loading Kload can be defined by (7.50). Thus, paralleling the considered algorithm in this way is far from ideal from the viewpoint of loading the two microprocessor subsystems. To increase the loading coefficient Kload of the microprocessor subsystems, the macro-operations must be shortened and the computations inside each macro-operation must themselves be paralleled.
In complex DSP, we deal with a set of objects, namely, the target return signals, target pips, target tracks, and so on. Information about these objects must be processed using the same complex DSP algorithms. If the considered objects are independent, then information about each object can be processed independently. In this case, we use independent object paralleling.
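Independent object paralleling maps naturally onto a worker pool: the same routine is applied to every object, and no object needs the result of another. The sketch below illustrates only this structure. The routine `smooth_track` (a simple exponential smoother), its parameter `alpha`, and the sample tracks are hypothetical stand-ins for the real per-object algorithm. Python threads are used to keep the example self-contained; for compute-bound work a process pool or separate processors would be the closer analogue of several microprocessor subsystems.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_track(measurements, alpha=0.5):
    """Hypothetical per-object processing: exponential smoothing of one track."""
    estimate = measurements[0]
    for z in measurements[1:]:
        estimate += alpha * (z - estimate)
    return estimate


def process_objects(tracks, n_workers=2):
    """Apply the same algorithm to every independent object in parallel."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves the input order of the objects.
        return list(pool.map(smooth_track, tracks))


# Three independent tracks with made-up measurements.
tracks = [[10.0, 12.0, 14.0], [5.0, 5.0, 5.0], [0.0, 4.0]]
print(process_objects(tracks))  # [12.5, 5.0, 2.0]
```

Because the objects are independent, the parallel result is identical to processing the tracks one after another; only the elapsed time changes.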
The discussed parallel computer system, which consists of microprocessor subsystems and carries out the digital signal reprocessing algorithms, has high reliability and operability and, additionally, ensures high-quality user information about the tracked targets. The disadvantage of this way of paralleling the complex digital signal reprocessing algorithm is the need for a very complex associative addressing device, especially if the number of tracked targets is high.
