12.10 The Engine Controller

This section returns to the example from Section 10.16, An Engine Controller. This ASIC gathers sampled temperature measurements from sensors, converts the temperature values from Fahrenheit to Centigrade, averages them, and stores them in a FIFO before passing the values to a microprocessor on a three-state bus. We receive the following message from the logic synthesizer when we use the FIFO-controller code shown in Table 10.25:

Warning: Made latches to store values on: net d(4), d(5), d(6), d(7), d(8), d(9), d(10), d(11), in module fifo_control

This message often indicates that we forgot to initialize a variable.

Here is the part of the code from Table 10.25 that assigns to the vector D (the warning refers to d in lowercase; remember that VHDL is case insensitive):

case sel is
  when "01" => D <= D_1 after TPD; r1 <= '1' after TPD;
  when "10" => D <= D_2 after TPD; r2 <= '1' after TPD;
  when "00" => D(3) <= f1 after TPD; D(2) <= f2 after TPD;
               D(1) <= e1 after TPD; D(0) <= e2 after TPD;
  when others => D <= "ZZZZZZZZZZZZ" after TPD;
end case;

When sel = "00", there is no assignment to D(4) through D(11). This did not matter in simulation, but to reproduce the exact behavior of the HDL code, the logic synthesizer generates latches to remember the values of D(4) through D(11).

This problem may be corrected by replacing the "00" choice with the following:

when "00" => D(3) <= f1 after TPD; D(2) <= f2 after TPD;
             D(1) <= e1 after TPD; D(0) <= e2 after TPD;
             D(11 downto 4) <= "ZZZZZZZZ" after TPD;

The synthesizer recognizes the assignment of the high-impedance logic value 'Z' to a signal as an indication to implement a three-state buffer. However, there are two kinds of three-state buffers: core logic three-state buffers and three-state I/O cells. We want a three-state I/O cell containing a bonding pad, not a three-state buffer located in the core logic. If we synthesize the code in Table 10.25, we get a three-state buffer in the core. Table 12.9 shows the modified code that will synthesize to three-state I/O cells. The signal OE_b drives the active-low output enable of the three-state buffers. Table 12.10 shows the top-level code, including all the I/O cells.

TABLE 12.9 A modified version of the FIFO controller to drive three-state I/O cells.

library IEEE; use IEEE.STD_LOGIC_1164.all; use IEEE.NUMERIC_STD.all;

entity fifo_control is
  generic (TPD : TIME := 1 ns);
  port (D_1, D_2 : in UNSIGNED(11 downto 0);
        sel : in UNSIGNED(1 downto 0);
        read, f1, f2, e1, e2 : in STD_LOGIC;
        r1, r2, w12 : out STD_LOGIC;
        D : out UNSIGNED(11 downto 0);
        OE_b : out STD_LOGIC);  -- active-low output enable for the three-state I/O cells
end;

architecture rtl of fifo_control is
begin
  process (read, sel, D_1, D_2, f1, f2, e1, e2)
  begin
    r1 <= '0' after TPD; r2 <= '0' after TPD; OE_b <= '0' after TPD;
    if (read = '1') then
      w12 <= '0' after TPD;
      case sel is
        when "01" => D <= D_1 after TPD; r1 <= '1' after TPD;
        when "10" => D <= D_2 after TPD; r2 <= '1' after TPD;
        when "00" => D(3) <= f1 after TPD; D(2) <= f2 after TPD;
                     D(1) <= e1 after TPD; D(0) <= e2 after TPD;
                     D(11 downto 4) <= "00000000" after TPD;
        when others => OE_b <= '1' after TPD;
      end case;
    elsif (read = '0') then
      OE_b <= '0' after TPD; w12 <= '1' after TPD;
    else
      OE_b <= '0' after TPD;
    end if;
  end process;
end rtl;

TABLE 12.10 The top-level VHDL code for the engine controller ASIC.

library COMPASS_LIB, IEEE;
use IEEE.STD_LOGIC_1164.all; use IEEE.NUMERIC_STD.all;
use COMPASS_LIB.STDCOMP.all; use COMPASS_LIB.COMPASS.all;

entity t_control_ASIC is port (
  PadTri : out STD_LOGIC_VECTOR (11 downto 0);
  PadClk, PadInreset, PadInreadv : in STD_LOGIC_VECTOR (0 downto 0);
  PadInp1, PadInp2 : in STD_LOGIC_VECTOR (11 downto 0);
  PadInSens : in STD_LOGIC_VECTOR (1 downto 0));
end t_control_ASIC;

architecture structure of t_control_ASIC is
  for all : asPadIn use entity COMPASS_LIB.aspadIn(aspadIn);
  for all : asPadClk use entity COMPASS_LIB.aspadClk(aspadClk);
  for all : asPadTri use entity COMPASS_LIB.aspadTri(aspadTri);
  for all : asPadVdd use entity COMPASS_LIB.aspadVdd(aspadVdd);
  for all : asPadVss use entity COMPASS_LIB.aspadVss(aspadVss);
  component pc3c01 port (cclk : in STD_LOGIC; cp : out STD_LOGIC); end component;
  component t_control port (T_in1, T_in2 : in UNSIGNED(11 downto 0);
    SENSOR : in UNSIGNED(1 downto 0); clk, rd, rst : in STD_LOGIC;
    D : out UNSIGNED(11 downto 0); oe_b : out STD_LOGIC); end component;
  signal T_in1_sv, T_in2_sv : STD_LOGIC_VECTOR(11 downto 0);
  signal T_in1_un, T_in2_un : UNSIGNED(11 downto 0);
  signal sensor_sv : STD_LOGIC_VECTOR(1 downto 0);
  signal sensor_un : UNSIGNED(1 downto 0);
  signal clk_sv, rd_fifo_sv, reset_sv : STD_LOGIC_VECTOR (0 downto 0);
  signal clk_core, oe_b : STD_LOGIC;
  signal D_un : UNSIGNED(11 downto 0); signal D_sv : STD_LOGIC_VECTOR(11 downto 0);
begin --compass dontTouch u* -- synopsys dont_touch etc.
  u1 : asPadIn generic map (12,"2:13") port map (t_in1_sv,PadInp1);
  u2 : asPadIn generic map (12,"14:25") port map (t_in2_sv,PadInp2);
  u3 : asPadIn generic map (2,"26:27") port map (sensor_sv, PadInSens);
  u4 : asPadIn generic map (1,"29") port map (rd_fifo_sv, PadInReadv);
  u5 : asPadIn generic map (1,"30") port map (reset_sv, PadInreset);
  u6 : asPadIn generic map (1,"32") port map (clk_sv, PadClk);
  u7 : pc3c01 port map (clk_sv(0), clk_core);
  u8 : asPadTri generic map (12,"35:38,41:44,47:50") port map (PadTri,D_sv,oe_b);
  u9 : asPadVdd generic map ("1,31,34,40,45,52") port map (Vdd);
  u10: asPadVss generic map ("28,33,39,46,51,53") port map (Vss);
  T_in1_un <= UNSIGNED(T_in1_sv); T_in2_un <= UNSIGNED(T_in2_sv);
  sensor_un <= UNSIGNED(sensor_sv); D_sv <= STD_LOGIC_VECTOR(D_un);
  v_1 : t_control port map
    (T_in1_un,T_in2_un,sensor_un, Clk_core, rd_fifo_sv(0), reset_sv(0),D_un, oe_b);
end;

12.12 Optimization of the Viterbi Decoder

Returning to the Viterbi decoder example (from Section 12.4), we first set the environment for the design using the following range of operating conditions: a die temperature of 25 °C (fastest logic) to 120 °C (slowest logic); a power-supply voltage of VDD = 5.5 V (fastest logic) to VDD = 4.5 V (slowest logic); and best process (fastest logic) to worst process (slowest logic). Assume that this ASIC should run at a clock frequency of at least 33 MHz (a clock period of about 30 ns). An initial synthesis run gives a critical-path delay of about 25 ns at nominal conditions (the default setting) and nearly 35 ns under worst-case conditions, using a high-density 0.6 µm standard-cell target library.
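A quick check of the figures in this paragraph (plain arithmetic on the numbers quoted above, not tool output; the symbol names are introduced here for illustration) shows why the nominal result is acceptable but the worst-case result is not:

```latex
\begin{aligned}
T_{\mathrm{clk}} &= \frac{1}{f_{\mathrm{clk}}} = \frac{1}{33\,\mathrm{MHz}} \approx 30.3\,\mathrm{ns} \\
t_{\mathrm{crit,nominal}} &\approx 25\,\mathrm{ns} < T_{\mathrm{clk}} \quad \text{(met)} \\
t_{\mathrm{crit,worst}} &\approx 35\,\mathrm{ns} > T_{\mathrm{clk}} \quad \text{(violated by about 5 ns)} \\
\frac{30\,\mathrm{ns}}{35\,\mathrm{ns}} &\approx 0.86
\end{aligned}
```

So, before any I/O timing is taken into account, the worst-case critical path has to shrink by roughly 14%.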

Estimates (using simulation and calculation) show that data arrives at the input pins 5 ns (worst-case) after the rising edge of the clock. The reset signal arrives 10 ns (worst-case) after the rising edge of the clock. The outputs of the Viterbi decoder must be stable at least 4 ns before the rising edge of the clock. This allows these signals to be driven to another ASIC in time to be clocked. These timing constraints are particularly devastating. Together they effectively reduce the clock period that is available for use by 9 ns. However, these figures are typical for board-level delays.
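The 9 ns figure is simply the input arrival time plus the output required time. The sketch below follows the text's simplified view and ignores flip-flop setup, clock-to-Q delay, and clock skew; the symbols t_in, t_out, and t_rst are introduced here for illustration:

```latex
\begin{aligned}
T_{\mathrm{avail}} &= T_{\mathrm{clk}} - t_{\mathrm{in}} - t_{\mathrm{out}}
  = 30\,\mathrm{ns} - 5\,\mathrm{ns} - 4\,\mathrm{ns} = 21\,\mathrm{ns} \\
T_{\mathrm{avail,reset}} &= T_{\mathrm{clk}} - t_{\mathrm{rst}}
  = 30\,\mathrm{ns} - 10\,\mathrm{ns} = 20\,\mathrm{ns}
\end{aligned}
```

The second line is only an inference from the quoted numbers: logic driven by the late-arriving reset signal has about 20 ns to settle before the next clock edge.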

The initial synthesis runs reveal that the critical path passes through the following six modules:

subset_decode -> compute_metric -> compare_select -> reduce -> metric -> output_decision

The logic synthesizer can do little or no optimization across these module boundaries. The next step, then, is to rearrange the design hierarchy for synthesis. Flattening (merging or ungrouping) the six modules into a new cell, called critical, allows the synthesizer to reduce the critical-path delay by optimizing one large module.

At present, the last module in the critical path is output_decision. This combinational logic adds 2 to 3 ns to the output delay requirement of 4 ns (this means the outputs of the module metric must be stable 6 to 7 ns before the rising clock edge). Registering the output reduces this overhead and removes the module output_decision from the critical path. The disadvantage is an increase in latency of one clock cycle, but the latency is already 12 clock cycles in this design. If registering the output reduces the critical-path delay to less than 12/13 of its previous value, performance will still improve.
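The two numerical claims in this paragraph can be written out as follows. This is a sketch that assumes the clock period equals the critical-path delay; T_1 and T_2 are symbols introduced here for that delay before and after adding the output register:

```latex
\begin{aligned}
t_{\mathrm{metric,req}} &= t_{\mathrm{out}} + t_{\mathrm{output\_decision}}
  = 4\,\mathrm{ns} + (2\ \text{to}\ 3)\,\mathrm{ns} = 6\ \text{to}\ 7\,\mathrm{ns} \\
13\,T_2 &< 12\,T_1 \iff T_2 < \tfrac{12}{13}\,T_1 \approx 0.923\,T_1
\end{aligned}
```

For example, if T_1 were 30 ns, the total latency of 12 × 30 ns = 360 ns would improve only if T_2 fell below 360 ns / 13 ≈ 27.7 ns. If the decoder delivers one result per clock cycle, throughput depends only on the clock period, so the 12/13 condition concerns total latency rather than throughput.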

To register the output, alter the code (on pages 575–576) as follows:

module viterbi_ASIC
...
wire [2:0] Out, Out_r;                         // Change: add Out_r.
...
asPadOut #(3,"30,31,32") u30 (padOut, Out_r);  // Change: Out_r.
Outreg o_1 (Out, Out_r, Clk, Res);             // Change: add output register.
...
endmodule

module Outreg (Out, Out_r, Clk, Res);          // Change: add this module.
input [2:0] Out; input Clk, Res; output [2:0] Out_r;
dff #(3) reg1(Out, Out_r, Clk, Res);           // 3-bit register: clocked by Clk, reset by Res
endmodule

These changes move the performance closer to the target. Prelayout estimates indicate that the die size set by the perimeter needed for the I/O pads leaves more than enough area to hold the core logic. Since there is unused area in the core, it makes sense to switch to a high-performance standard-cell library with a slightly larger cell height (96λ versus 72λ). This cell library is less dense, but faster.
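To put the two cell heights in proportion, here is a sketch. The value λ = 0.3 µm is an assumption based on the common convention that λ is half the minimum feature size (here 0.6 µm); this section does not state it.

```latex
\begin{aligned}
\frac{96\lambda}{72\lambda} &= \frac{4}{3} \approx 1.33 \\
96\lambda &= 96 \times 0.3\,\mu\mathrm{m} = 28.8\,\mu\mathrm{m}, \qquad
72\lambda = 72 \times 0.3\,\mu\mathrm{m} = 21.6\,\mu\mathrm{m}
\end{aligned}
```

Each row of the high-performance library is therefore about one third taller. That is affordable here because the pad ring, not the core logic, sets the die size.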

Typically, at this point, the design is improved by altering the HDL, the hierarchy, and the synthesis controls in an iterative manner until the desired performance is achieved. However, remember that there is still no information from the layout. The best that can be done is to estimate the contribution of the interconnect using wire-load models. As soon as possible, the netlist should be passed to the floorplanner (or to the place-and-route software in the absence of a floorplanner) to generate better estimates of interconnect delays.

TABLE 12.13 Critical-path timing report for the Viterbi decoder.

Instance name           Delay information¹
                        inPin --> outPin   incr   arrival  trs  rampDel  cap(pF)  cell
v_1.u100
  u1.subout5.Q_ff_b0    CP --> QN          1.65    1.65    F    .20      .10      dfctnb
  B1_i67                A1 --> ZN           .63    2.27    R    .14      .08      ao01d1
  B1_i66                B --> ZN            .84    3.12    F    .15      .08      ao04d1
  B1_i64                B2 --> ZN           .91    4.03    F    .35      .17      fn03d1
  B1_i68                I --> ZN            .39    4.43    R    .23      .12      in01d1
  B1_i316               S --> Z             .91    5.33    F    .34      .17      mx21d1
  u3.add_rip1.u4        B0 --> CO          2.20    7.54    F    .24      .14      ad02d1
  ... 28 other cell instances omitted ...
  u5.sub_rip1.u6        B0 --> CO          2.25   23.17    F    .23      .13      ad02d1
  u5.sub_rip1.u8        CI --> CO           .53   23.70    F    .21      .09      ad01d1
  B1_i301               A1 --> Z            .69   24.39    R    .19      .07      xo02d1
  u2.metric3.Q_ff_b4    setup: D --> CP     .17   24.56    R    .00      .00      dfctnb
                                                  slack: MET .44

Table 12.13 is a timing report for the Viterbi decoder. It shows that the critical path starts at a sequential logic cell (a D flip-flop in the present example) and ends at a sequential logic cell (another D flip-flop), with 37 combinational logic cells in between. The first delay is the clock-to-Q delay of the first flip-flop. The last delay is the setup time of the last flip-flop. The critical-path delay is 24.56 ns, which gives a slack of 0.44 ns against the constraint of 25 ns (reduced from 30 ns to give an extra margin). We have met the timing constraint (otherwise we say it is violated).
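The slack figure follows directly from the last two rows of the report. T_req, t_arrival, and t_setup are names introduced here for illustration:

```latex
\begin{aligned}
T_{\mathrm{req}} &= 30\,\mathrm{ns} - 5\,\mathrm{ns}\ \text{(margin)} = 25\,\mathrm{ns} \\
t_{\mathrm{arrival}} &= 24.39\,\mathrm{ns} + 0.17\,\mathrm{ns}\ (t_{\mathrm{setup}}) = 24.56\,\mathrm{ns} \\
\text{slack} &= T_{\mathrm{req}} - t_{\mathrm{arrival}} = 25.00\,\mathrm{ns} - 24.56\,\mathrm{ns} = 0.44\,\mathrm{ns} \ge 0
  \;\Rightarrow\; \text{MET}
\end{aligned}
```

A negative slack would mean the constraint is violated.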

In Table 12.13, all instances in the critical path are inside instance v_1.u100. Instance name u100 is the new cell (cell name critical) formed by merging six blocks in module viterbi (instance name v_1).

The second column in Table 12.13 shows the timing arc of each cell on the critical path. For example, CP --> QN represents the path from the clock pin, CP, to the flip-flop output pin, QN, of a D flip-flop (cell name dfctnb). The pin names and their functions come from the library data book. Each company adopts a different naming convention (in this case, CP represents a positive clock edge, for example). The conventions are not always explicitly shown in the data books but are normally easy to discover by looking at examples. As another example, B0 --> CO represents the path from the B input to the carry output of a 2-bit full adder (cell name ad02d1).

The third column (incr) represents the incremental delay contribution of the logic cell to the critical path.

The fourth column (arrival) shows the arrival time of the signal at the output pin of the logic cell. This is the cumulative delay to that point on the critical path.

The fifth column (trs) describes whether the transition at the output node is rising (R) or falling (F). The timing analyzer examines each possible combination of rising and falling delays to find the critical path.

The sixth column (rampDel) is a measure of the input slope (ramp delay, or slew rate). In submicron ASIC design, this is an important contribution to delay.

The seventh column (cap, in pF) is the capacitance at the output node of the logic cell. This determines the logic cell delay and also the signal slew rate at the node.

The last column (cell) is the cell name (from the cell-library data book). In this library, the suffix 'd1' represents normal drive strength, with 'd0', 'd2', and 'd5' being the other available strengths.
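As a check on the arrival column, each arrival value should equal the previous row's arrival plus the current row's incr. The symbols a_k and incr_k are introduced here. The last rows of Table 12.13 agree exactly. A few earlier rows differ by 0.01 ns (for example, 1.65 + 0.63 = 2.28 against the printed 2.27), presumably because the report rounds the displayed values.

```latex
\begin{aligned}
a_1 &= 1.65\,\mathrm{ns} \quad \text{(clock-to-Q of u1.subout5.Q\_ff\_b0)} \\
a_k &= a_{k-1} + \mathrm{incr}_k \\
a_{\mathrm{u5.sub\_rip1.u8}} &= 23.17 + 0.53 = 23.70\,\mathrm{ns} \\
a_{\mathrm{B1\_i301}} &= 23.70 + 0.69 = 24.39\,\mathrm{ns}
\end{aligned}
```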

1. See the text for explanations of the column headings.
