Coprocessor Interface

11.5 Data transfer

Data transfers are managed by the LSU on the core side, and the pipeline itself on the coprocessor side. Transfers can be a single value or a vector. In the latter case, the coprocessor effectively converts a multiple transfer into a series of single transfers by iterating the instruction in the issue stage. This creates an instance of the load/store instruction for each item to be transferred.

The instruction stays in the coprocessor issue stage while it iterates, creating copies of itself that move down the pipeline. Figure 11-9 on page 11-16 illustrates this process for a load instruction.

The first of the iterated instructions, shown in uppercase, is the head, and the others, shown in lowercase, are the tails. In the example shown, the vector length is four, so there is one head and three tails. At the first iteration of the instruction, the tail flag is set so that subsequent iterations send tail instructions down the pipeline. In Figure 11-9 on page 11-16, instruction B has stalled in the Ex1 stage, which might be caused by the cancel queue being empty. As a result, instruction C does not iterate during its first cycle in the issue stage, and only starts to iterate after the stall has been removed.
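The head and tail iteration described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware implementation: the names `Instance`, `iterate`, and `label` are hypothetical, and the sketch ignores stalls such as the one affecting instruction C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    tag: int        # tag shared by every iteration of one instruction
    index: int      # which vector element this instance transfers
    is_tail: bool   # False for the head, True for every later copy


def iterate(tag: int, vector_length: int) -> list[Instance]:
    """Expand one load/store into a head followed by tail instances."""
    instances = []
    tail_flag = False  # clear until the first iteration has been issued
    for index in range(vector_length):
        instances.append(Instance(tag, index, tail_flag))
        tail_flag = True  # set at the first iteration; later copies are tails
    return instances


def label(name: str, instances: list[Instance]) -> str:
    """Render instances as the figure does: head uppercase, tails lowercase."""
    return " ".join(name.lower() if i.is_tail else name.upper() for i in instances)


print(label("c", iterate(tag=3, vector_length=4)))  # C c c c
```

With a vector length of four, the sketch produces one head and three tails, matching the example in the text.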

Figure 11-8 shows the extra paths required for passing data to and from the coprocessor.

[Figure: coprocessor pipeline stages I and Ex1 to Ex6, with the labels "To LSU Add stage", "Store data", "From LSU WBls stage", and "Load data".]

Figure 11-8 Coprocessor data transfer

Two data paths are required:

• One passes store data from the coprocessor to the core. This path requires a queue, which is maintained by the core.

• The other passes load data from the core to the coprocessor. It requires no queue, only two pipeline registers.

Figure 11-9 on page 11-16 shows instruction iteration for loads.

ARM DDI 0333H	Copyright © 2004-2009 ARM Limited. All rights reserved.	11-15
ID012410	Non-Confidential, Unrestricted Access

[Figure: instances of instructions B and C moving through the issue stage and stages Ex1 to Ex6 over time.]

Figure 11-9 Instruction iteration for loads

Only the head instruction is involved in token exchange with the core pipeline, which does not iterate instructions in this way. The tail instructions pass down the pipeline silently.

When an iterated load/store instruction is cancelled or flushed, all the tail instructions, bearing the same tag, must be removed from the pipeline. Only the head instruction becomes a phantom when cancelled. Any tail instruction can be left intact in the pipeline because it has no other effect.

Because the cancel token is received in the coprocessor Ex1 stage, a cancelled iterated instruction always consists of a head instruction in Ex1 and a single tail instruction in the issue stage.

11.5.1 Loads

Load data emerge from the WBls stage of the core LSU and are received by the coprocessor Ex6 stage. Each item in a vectored load is picked up by one instance of the iterated load instruction.

The pipeline timing ensures that a load instruction is always ready in Ex6, or has just arrived there, to pick up each data item. If a load instruction has arrived in Ex6 but the load data has not yet appeared, the load instruction must stall in Ex6, stalling the rest of the coprocessor pipeline.

The following signals are driven by the core to pass load data across to the coprocessor:

ACPLDVALID

This signal, when set, indicates that the associated data are valid.

ACPLDDATA[63:0]

This is the information passed from the core to the coprocessor.
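As a rough illustration of how these two signals interact with the Ex6 stall rule, the following Python sketch models a single cycle. The function `ex6_step` and its return convention are hypothetical; only the signal names ACPLDVALID and ACPLDDATA come from the text.

```python
MASK64 = (1 << 64) - 1  # ACPLDDATA is 64 bits wide


def ex6_step(load_in_ex6: bool, acpldvalid: bool, acplddata: int):
    """Return (stall, data) for one cycle of a hypothetical Ex6 stage."""
    if not load_in_ex6:
        return False, None          # no load instruction waiting for data
    if not acpldvalid:
        return True, None           # load present but no valid data: stall
    return False, acplddata & MASK64  # pick up the 64-bit data item
```

A stall returned here would hold the rest of the coprocessor pipeline, as described above; the sketch does not model that propagation.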

Load buffers

To achieve correct alignment of the load data with the load instruction in the coprocessor Ex6 stage, the data must be double buffered when they arrive at the coprocessor. Figure 11-10 on page 11-17 shows an example.

[Figure: Data and Valid signals pass from the core WBls stage, across the interconnect, to the coprocessor Ex6 stage.]

Figure 11-10 Load data buffering

The load data buffers function as pipeline registers and so require no flow control and are not required to carry any tags. Only the data and a valid bit are required. For load transfers to work:

• instructions must always arrive in the coprocessor Ex6 stage coincident with, or before, the arrival of the corresponding instruction in the core WBls stage

• finish tokens from the core must arrive at the same time as the corresponding load data items arrive at the end of the load data pipeline buffers

• the LSU must see the token from the accept queue before it enables a load instruction to move on from its Add stage.
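The load data buffers themselves, two pipeline registers carrying only data and a valid bit, can be modeled as a simple two-stage delay line. This is a minimal sketch under that assumption; the class `LoadBuffers` and its `clock` method are hypothetical names, and the model does not check the three timing conditions listed above.

```python
class LoadBuffers:
    """Two back-to-back registers, each holding a (valid, data) pair."""

    def __init__(self):
        self.stage = [(False, 0), (False, 0)]

    def clock(self, valid: bool, data: int):
        """Advance one cycle and return what the second register now holds."""
        # No flow control and no tags: both registers simply shift every cycle.
        self.stage = [(valid, data), self.stage[0]]
        return self.stage[1]


buffers = LoadBuffers()
buffers.clock(True, 0xAA)          # data enters the first register
print(buffers.clock(False, 0))     # (True, 170): data reaches the far end
print(buffers.clock(False, 0))     # (False, 0): the entry has drained away
```

Because the registers shift unconditionally, an item that is not picked up simply disappears on the next cycle, which is why the instruction must already be waiting when its data arrives.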

Loads and flushes

If a flush does not involve the core WBls stage it cannot affect the load data buffers, and the load transfer completes normally. If a flush is initiated by an instruction in the core WBls stage, this is not a load instruction because load instructions cannot trigger a flush. Any coprocessor load instructions behind the flush point find themselves stalled if they get as far as the Ex6 stage, for the lack of a finish token, so no data transfers can have taken place. Any data in the load data buffers expires naturally during the flush dead period while the pipeline reloads.

Loads and cancels

If a load instruction is canceled, both the head and any tails must be removed. Because the cancellation happens in the coprocessor Ex1 stage, no data transfers can have taken place and therefore no special measures are required to deal with load data.

Loads and retirement

When a load instruction reaches the bottom of the coprocessor pipeline it must find a data item at the end of the load data buffer. This applies to both head and tail instructions. Load instructions do not use the finish queue.

11.5.2 Stores

Store data emerge from the coprocessor issue stage and are received by the core LSU DC1 stage. Each item of a vectored store is generated because the store instruction iterates in the coprocessor issue stage. The iterated store instructions then pass down the pipeline but have no other use, except to act as place markers for flushes and cancels.

The following signals control the transfer of store data across the coprocessor interface:

CPASTDATAV

This signal is asserted when valid data is available from the coprocessor.

CPASTDATAT[3:0]

This is the tag associated with the data being passed to the core.

CPASTDATA[63:0]

This is the information passed from the coprocessor to the core.

ACPSTSTOP

This signal from the core prevents additional transfers from the coprocessor to the core, and is raised when the store queue, maintained by the core, can no longer accept any more data. When the signal is deasserted, data transfers can resume.

When ACPSTSTOP is asserted, the data previously placed onto CPASTDATA must be left there, until new data can be transferred. This enables the core to leave data on CPASTDATA until there is sufficient space in the store data queue.
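The hold behavior of CPASTDATA under ACPSTSTOP can be sketched as follows. This is a simplified model of the data bus only: the class `StoreDataBus` is hypothetical, and the sketch does not model CPASTDATAV or CPASTDATAT, whose exact behavior during a hold is not described here.

```python
from collections import deque


class StoreDataBus:
    """Coprocessor-side model of the value driven on CPASTDATA."""

    def __init__(self, items):
        self.pending = deque(items)   # store data still to be transferred
        self.cpastdata = None         # value currently on the bus

    def clock(self, acpststop: bool):
        """Advance one cycle and return the value driven on CPASTDATA."""
        if not acpststop and self.pending:
            # Transfers may proceed: place the next item on the bus.
            self.cpastdata = self.pending.popleft()
        # While ACPSTSTOP is asserted the previous value is left in place.
        return self.cpastdata


bus = StoreDataBus([1, 2, 3])
print([bus.clock(stop) for stop in (False, True, True, False, False)])
# [1, 1, 1, 2, 3]
```

The first item stays on the bus for the two cycles in which ACPSTSTOP is asserted, and the transfer resumes once the signal is deasserted.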

Store data queue

Because the store data transfer can be stopped at any time by the LSU, a store data queue is required. Additionally, because store data vectors can be of arbitrary length, flow control is required. A queue length of three slots is sufficient to enable flow control to be used without loss of data.

Stores and flushes

When a store instruction is involved in a flush, the store data queue must be flushed by the core. Because of the signal propagation delay, the queue continues to fill for two cycles after the core notifies the coprocessor of the flush, so the core must wait for two cycles before carrying out the store data queue flush. The dead period after the flush extends sufficiently far to enable this to be done.
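The reason for the two-cycle wait can be seen in a toy simulation. It assumes, purely for illustration, that one stale item arrives per cycle after the flush notification; the function `simulate_flush` and its parameters are hypothetical.

```python
def simulate_flush(initial, in_flight, flush_delay):
    """Return the store queue contents after a flush applied flush_delay cycles late.

    initial     -- items already queued when the core notifies the coprocessor
    in_flight   -- items still arriving, one per cycle, after the notification
    flush_delay -- cycles the core waits before clearing the queue
    """
    queue = list(initial)
    arriving = list(in_flight)
    cycle = 0
    while arriving or cycle <= flush_delay:
        if cycle == flush_delay:
            queue.clear()                     # the core flushes the queue
        if arriving:
            queue.append(arriving.pop(0))     # stale data still trickles in
        cycle += 1
    return queue


print(simulate_flush(["s0"], ["s1", "s2"], flush_delay=0))  # ['s1', 's2']
print(simulate_flush(["s0"], ["s1", "s2"], flush_delay=2))  # []
```

Flushing immediately leaves the two late items in the queue, whereas waiting two cycles removes them as well.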

Stores and cancels

If the core cancels a store instruction, the coprocessor must ensure that it sends no store data for that instruction. It can achieve this by either:

• delaying the start of the store data until the corresponding cancel token has been received in the Ex1 stage

• looking ahead into the cancel queue and starting the store data transfer when the correct token is seen.
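Both options reduce to the same rule: no store data is sent until the cancel token for the instruction is known. A minimal sketch of that rule follows, using a hypothetical function `store_items_to_send` and a three-valued token (`None` when the token has not yet been seen) as an illustrative encoding.

```python
from typing import Optional


def store_items_to_send(items: list, cancel_token: Optional[bool]) -> list:
    """Decide which store data items may be sent to the core.

    cancel_token -- None: token not yet seen; True: cancelled; False: proceed
    """
    if cancel_token is None:
        return []            # outcome unknown: hold the store data back
    if cancel_token:
        return []            # cancelled: send no data for this instruction
    return list(items)       # not cancelled: the transfer can start
```

The two options differ only in when the token becomes visible: on receipt in Ex1, or earlier by looking ahead into the cancel queue.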

Stores and retirement

Because store instructions do not use the finish token queue, they are retired as soon as they leave the Ex1 stage of the pipeline.
