
- Contents
- 1 Introduction
- 1.1 Objectives
- 1.2 Overview
- 2 Background
- 2.1 Digital Design for DSP Engineers
- 2.1.2 The Field-Programmable Gate Array
- 2.1.3 Arithmetic on FPGAs
- 2.2 DSP for Digital Designers
- 2.3 Computation Graphs
- 2.4 The Multiple Word-Length Paradigm
- 2.5 Summary
- 3 Peak Value Estimation
- 3.1 Analytic Peak Estimation
- 3.1.1 Linear Time-Invariant Systems
- 3.1.2 Data-range Propagation
- 3.2 Simulation-based Peak Estimation
- 3.3 Hybrid Techniques
- 3.4 Summary
- 4 Word-Length Optimization
- 4.1 Error Estimation
- 4.1.1 Word-Length Propagation and Conditioning
- 4.1.2 Linear Time-Invariant Systems
- 4.1.3 Extending to Nonlinear Systems
- 4.2 Area Models
- 4.3.1 Convexity and Monotonicity
- 4.4 Optimization Strategy 1: Heuristic Search
- 4.5 Optimization Strategy 2: Optimum Solutions
- 4.5.1 Word-Length Bounds
- 4.5.2 Adders
- 4.5.3 Forks
- 4.5.4 Gains and Delays
- 4.5.5 MILP Summary
- 4.6 Some Results
- 4.6.1 Linear Time-Invariant Systems
- 4.6.2 Nonlinear Systems
- 4.6.3 Limit-cycles in Multiple Word-Length Implementations
- 4.7 Summary
- 5 Saturation Arithmetic
- 5.1 Overview
- 5.2 Saturation Arithmetic Overheads
- 5.3 Preliminaries
- 5.4 Noise Model
- 5.4.1 Conditioning an Annotated Computation Graph
- 5.4.2 The Saturated Gaussian Distribution
- 5.4.3 Addition of Saturated Gaussians
- 5.4.4 Error Propagation
- 5.4.5 Reducing Bound Slackness
- 5.4.6 Error estimation results
- 5.5 Combined Optimization
- 5.6 Results and Discussion
- 5.6.1 Area Results
- 5.6.2 Clock frequency results
- 5.7 Summary
- 6 Scheduling and Resource Binding
- 6.1 Overview
- 6.2 Motivation and Problem Formulation
- 6.3 Optimum Solutions
- 6.3.1 Resources, Instances and Control Steps
- 6.3.2 ILP Formulation
- 6.4 A Heuristic Approach
- 6.4.1 Overview
- 6.4.2 Word-Length Compatibility Graph
- 6.4.3 Resource Bounds
- 6.4.4 Latency Bounds
- 6.4.5 Scheduling with Incomplete Word-Length Information
- 6.4.6 Combined Binding and Word-Length Selection
- 6.5 Some Results
- 6.6 Summary
- 7 Conclusion
- 7.1 Summary
- 7.2 Future Work
- A.1 Sets and functions
- A.2 Vectors and Matrices
- A.3 Graphs
- A.4 Miscellaneous
- A.5 Pseudo-Code
- References
- Index
4.6 Some Results
positive to a negative value, or vice-versa, all of these MSBs will toggle. Thus the overall switching activity in a realization can be reduced dramatically by applying scaling optimization. Secondly, when a sampled signal is in a period of relatively low frequency (with respect to the Nyquist rate), the activity amongst low-order bits is, on average, likely to be significantly larger than that amongst high-order bits, because the signal value changes slowly. Thus word-length optimization, which specifically targets the low-order bits of each signal, is likely to lead to a significant reduction in the overall activity level. In addition, it is likely that a large portion of the power consumption due to logic activity in DSP systems derives from multiplier cores. In multipliers, the power consumption is far more sensitive to reductions in the switching activity of low-order input bits than to that of high-order input bits [MS01]. These explanations are supported by the plot of Fig. 4.28(b), which shows the power saving of the proposed method over scaling optimization alone increasing rapidly for low SNR. This is because a low SNR allows word-length optimization to aggressively target more low-order bits.
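The two activity arguments above can be checked numerically with a minimal sketch. It is not taken from the text: the 16-bit word-length, the sinusoidal test signal, its amplitude and the helper name `toggle_rates` are all assumptions chosen for illustration. The sketch counts how often each bit of a two's complement word changes between consecutive samples of a slowly varying signal.

```python
import math

def toggle_rates(samples, width):
    """Per-bit toggle rate of a sample stream held in `width`-bit two's complement.

    Returns a list indexed by bit position (0 = LSB); each entry is the fraction
    of consecutive-sample transitions on which that bit changes.
    """
    mask = (1 << width) - 1
    counts = [0] * width
    prev = samples[0] & mask
    for s in samples[1:]:
        cur = s & mask                 # two's complement bit pattern
        diff = prev ^ cur              # 1 wherever a bit toggled
        for b in range(width):
            counts[b] += (diff >> b) & 1
        prev = cur
    transitions = len(samples) - 1
    return [c / transitions for c in counts]

# All values below are illustrative assumptions, not figures from the text.
WIDTH = 16            # uniform word-length
N = 4096              # number of samples
CYCLES = 4            # sinusoid cycles over N samples: far below the Nyquist rate
AMPLITUDE = 1 << 12   # peak value 4096, so bits 13..15 all replicate the sign

signal = [round(AMPLITUDE * math.sin(2 * math.pi * CYCLES * n / N)) for n in range(N)]
rates = toggle_rates(signal, WIDTH)

# Truncating the 6 low-order bits keeps only bits 6..15 of each word.
truncated = [s >> 6 for s in signal]
rates_truncated = toggle_rates(truncated, WIDTH - 6)

for bit, r in enumerate(rates):
    print(f"bit {bit:2d}: toggle rate {r:.4f}")
print("total activity, 16-bit words:       ", sum(rates))
print("total activity after dropping 6 LSBs:", sum(rates_truncated))
```

In this sketch the redundant sign-extension bits toggle together at every sign change, which is the activity that scaling removes, while the low-order bits toggle far more often than the high-order bits, which is the activity that word-length reduction removes.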
4.6.3 Limit-cycles in Multiple Word-Length Implementations
The multiple word-length design paradigm, combined with a word-length optimization technique, has been shown to be highly effective at optimizing system area for a given user-specified bound on truncation noise. However, a finite precision implementation can additionally suffer from certain types of noise not considered in Section 4.1.2. A finite precision implementation of an IIR filter is essentially a finite state machine (FSM). Under any unchanging input vector, an FSM may exhibit one of two steady-state behaviours: it may either settle in an ‘attractor state’, or it may cycle around a finite number of states. The latter of the two behaviours can result in output oscillations in a finite precision implementation that would not be present in the infinite precision case. In Digital Signal Processing, this inherently nonlinear behaviour is referred to as limit-cycle behaviour [Mit98]. There have been several studies into limit cycles [LMV88, BB90, PKBL96], generally focussing on conditions for the non-existence of limit cycles in uniform word-length implementations, and identifying regions of the coefficient space guaranteed to be limit-cycle free. While limit-cycle behaviour is not considered by the optimization procedure developed in this chapter, it is nevertheless important from a user’s perspective that the limit-cycle behaviour of the optimized multiple word-length systems is not generally worse than that of more traditional implementation schemes.
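The finite state machine view can be illustrated with a minimal sketch. The first-order recursion, the round-half-up quantizer, the coefficients ±7/8 and the function names `step` and `steady_state` are assumptions made for this example, not details from the text. With zero input, the integer state is iterated until it repeats, and the recurring states are reported.

```python
def step(y, num, den):
    """One zero-input update y <- round(num/den * y), rounding half up.

    Exact integer arithmetic: floor(num*y/den + 1/2) = (2*num*y + den) // (2*den).
    """
    return (2 * num * y + den) // (2 * den)

def steady_state(y0, num, den):
    """Iterate from state y0 until a state repeats; return the recurring states."""
    first_seen = {}
    trajectory = []
    y = y0
    while y not in first_seen:
        first_seen[y] = len(trajectory)
        trajectory.append(y)
        y = step(y, num, den)
    return trajectory[first_seen[y]:]

# Hypothetical coefficients +7/8 and -7/8, initial state 100 LSBs.
attractor = steady_state(100, 7, 8)     # a single recurring state
oscillation = steady_state(100, -7, 8)  # a set of states visited in turn

print("a = +7/8:", attractor)
print("a = -7/8:", oscillation)
```

With infinite precision both recursions would decay to zero. In this sketch the positive coefficient leaves the state stuck at a nonzero attractor, and the negative coefficient leaves it alternating between two states, which is a limit cycle of period two.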
In order to compare the limit-cycle behaviour of uniform word-length and optimized multiple word-length systems, the following experimental procedure has been followed. Fifty thousand second-order auto-regressive filters have been generated, with coefficients uniformly selected from the coefficient regions likely to result in limit cycles of period one or two [LMV88]. Each point in Fig. 4.29 illustrates a single such coefficient vector. Each of these filters has then been synthesized using the optimum uniform word-length for a range of specified maximum error variances, and the resulting truncation error has been estimated. Each of the truncation errors forms the specification for a multiple word-length implementation of the same filter. Comparison of uniform and multiple word-length implementations may be achieved by exciting each filter with a large impulse and measuring statistics on the output signal once the transient effects have died away. The peak and the power of each limit cycle are shown in Fig. 4.30.
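The measurement step of this procedure (impulse excitation, discarding the transient, then recording the peak and power of what remains) can be sketched as follows. This is only an illustration under stated assumptions and does not reproduce the synthesis or the multiple word-length optimization: the filter form y[n] = x[n] - a1*y[n-1] - a2*y[n-2], truncation of the accumulated sum, 12-bit coefficient quantization, the absence of overflow modelling and the function name `limit_cycle` are all choices made here. Rather than waiting a fixed time for the transient to die away, the sketch iterates until the state (y[n-1], y[n-2]) repeats, which identifies the recurring output sequence exactly.

```python
def limit_cycle(a1, a2, frac_bits, impulse=1.0, coef_bits=12, max_steps=1_000_000):
    """Zero-input steady state of y[n] = x[n] - a1*y[n-1] - a2*y[n-2] after an impulse.

    Assumptions (not from the text): the accumulated sum is truncated (floored)
    to `frac_bits` fractional bits, coefficients are rounded to `coef_bits`
    fractional bits, and overflow is not modelled.
    Returns (period, peak, power) of the recurring output sequence.
    """
    c1 = round(a1 * (1 << coef_bits))
    c2 = round(a2 * (1 << coef_bits))
    # State (y[n-1], y[n-2]) in LSB units, immediately after the impulse sample.
    state = (round(impulse * (1 << frac_bits)), 0)
    first_seen = {}
    outputs = []
    while state not in first_seen:
        if len(outputs) >= max_steps:
            raise RuntimeError("no recurring state found; is the filter stable?")
        first_seen[state] = len(outputs)
        outputs.append(state[0])
        y1, y2 = state
        y = (-c1 * y1 - c2 * y2) >> coef_bits   # arithmetic shift = truncation
        state = (y, y1)
    # Keep only the recurring part of the output, converted back from LSB units.
    cycle = [v / (1 << frac_bits) for v in outputs[first_seen[state]:]]
    peak = max(abs(v) for v in cycle)
    power = sum(v * v for v in cycle) / len(cycle)
    return len(cycle), peak, power

# Hypothetical stable coefficient vector, measured at two uniform word-lengths.
for bits in (8, 12):
    period, peak, power = limit_cycle(-1.8, 0.95, frac_bits=bits)
    print(f"{bits} fractional bits: period {period}, peak {peak:.3e}, power {power:.3e}")
```

A period of one with zero peak corresponds to a filter that settles to zero, that is, a case with no limit cycle. Any other outcome gives the peak and power statistics of the kind compared in the text.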
[Figure: scatter plot titled "coefficient space explored for limit cycles"; horizontal axis a1 from −2 to 2, vertical axis a2 from −1 to 1.]
Fig. 4.29. Coefficient space searched for limit cycles
The results of Fig. 4.30 are provided both on a log-log scale, in order to observe the spread of results, and on a linear-linear scale, in order for the cases without limit cycles to be observable. A total of 17170 out of 50407 (34%) of uniform word-length implementations exhibited no limit-cycle behaviour, whereas 20030 out of 50407 (40%) of multiple word-length implementations exhibited no limit-cycle behaviour. For those cases where both implementations exhibited limit-cycle behaviour, a histogram of the relative power of the two limit cycles is shown in Fig. 4.31. For these cases, a multiple word-length implementation has on average a 0.8 dB lower limit-cycle power than the equivalent uniform word-length implementation.
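As a quick check on these figures, the proportions and the decibel value can be written out. The symbols P_multiple and P_uniform are introduced here for illustration, and the conversion assumes that the 0.8 dB figure is ten times the base-10 logarithm of a power ratio.

```latex
\[
\frac{17170}{50407} \approx 0.341, \qquad \frac{20030}{50407} \approx 0.397,
\]
\[
10\log_{10}\frac{P_{\mathrm{multiple}}}{P_{\mathrm{uniform}}} = -0.8\ \mathrm{dB}
\;\Longrightarrow\;
\frac{P_{\mathrm{multiple}}}{P_{\mathrm{uniform}}} = 10^{-0.08} \approx 0.83 .
\]
```

Under that assumption, a 0.8 dB reduction corresponds to a limit-cycle power roughly 17% lower.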
It can be concluded that multiple word-length implementations are somewhat more likely than their uniform word-length equivalent to be free of period

[Figure: four scatter plots, each titled "LC errors": multiple LC peak against uniform LC peak, and multiple LC power against uniform LC power, each shown on log-log and linear-linear scales.]
Fig. 4.30. Limit cycle peak and power
[Figure: histogram; horizontal axis log( uniform LC power / multiple LC power ), vertical axis frequency.]
Fig. 4.31. Relative error power of limit cycles