
Cycle Timings and Interlock Behavior

16.12 Load and Store Multiple Instructions

This section describes the cycle timing behavior for the LDM and STM instructions.

These instructions take one cycle to issue but then use multiple memory cycles to load or store all the registers. Because the memory datapath is 64 bits wide, two registers can be loaded or stored on each cycle. Following non-dependent, non-memory instructions can execute in the integer pipeline while these instructions complete. A dependent instruction is one that either:

• writes a register that has not yet been stored

• reads a register that has not yet been loaded.
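The dependency rule above can be sketched as a small predicate. This is only an illustrative model: the function name `is_dependent` and the idea of passing in the set of registers that are still pending are assumptions made for the example, not part of the processor documentation.

```python
def is_dependent(kind, pending, reads, writes):
    """Return True if a following instruction depends on an in-flight LDM/STM.

    kind    -- "LDM" or "STM" (the multiple transfer still completing)
    pending -- registers the LDM has not yet loaded, or the STM has not yet stored
    reads   -- registers the following instruction reads
    writes  -- registers the following instruction writes
    """
    pending = set(pending)
    if kind == "STM":
        # Writing a register that has not yet been stored makes it dependent.
        return bool(pending & set(writes))
    if kind == "LDM":
        # Reading a register that has not yet been loaded makes it dependent.
        return bool(pending & set(reads))
    raise ValueError("kind must be 'LDM' or 'STM'")
```

For instance, `is_dependent("STM", {"R6"}, reads={"R6"}, writes={"R12"})` is false under this model, because reading a register that is still waiting to be stored does not conflict with the store.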

Before a load or store multiple can begin, all the registers in the register list must be available. For example, an STM cannot begin until all outstanding loads for registers in the register list have completed.

To prevent an instruction that follows a store multiple from writing to a register before the store multiple has stored it, each register in the register list has a lock latency. The lock latency determines how many cycles must pass before a subsequent instruction that writes to that register can start.

16.12.1 Load and Store Multiples, other than load multiples including the PC

In all cases the base register, Rx, is an Early Reg.

Table 16-18 lists the cycle timing behavior of load and store multiples, other than load multiples including the PC.

Table 16-18 Cycle timing behavior of Load and Store Multiples, other than load multiples including the PC

Example Instruction               Cycles  Memory cycles  Result Latency (LDM)  Register Lock Latency (STM)

First address 64-bit aligned
LDMIA Rx,{R1}                     1       1              3                     1
LDMIA Rx,{R1,R2}                  1       1              3,3                   1,2
LDMIA Rx,{R1,R2,R3}               1       2              3,3,4                 1,2,2
LDMIA Rx,{R1,R2,R3,R4}            1       2              3,3,4,4               1,2,2,3
LDMIA Rx,{R1,R2,R3,R4,R5}         1       3              3,3,4,4,5             1,2,2,3,3
LDMIA Rx,{R1,R2,R3,R4,R5,R6}      1       3              3,3,4,4,5,5           1,2,2,3,3,4
LDMIA Rx,{R1,R2,R3,R4,R5,R6,R7}   1       4              3,3,4,4,5,5,6         1,2,2,3,3,4,4

First address not 64-bit aligned
LDMIA Rx,{R1}                     1       1              3                     1
LDMIA Rx,{R1,R2}                  1       2              3,4                   1,2
LDMIA Rx,{R1,R2,R3}               1       2              3,4,4                 1,2,2
LDMIA Rx,{R1,R2,R3,R4}            1       3              3,4,4,5               1,2,2,3
LDMIA Rx,{R1,R2,R3,R4,R5}         1       3              3,4,4,5,5             1,2,2,3,4
LDMIA Rx,{R1,R2,R3,R4,R5,R6}      1       4              3,4,4,5,5,6           1,2,2,3,4,4
LDMIA Rx,{R1,R2,R3,R4,R5,R6,R7}   1       4              3,4,4,5,5,6,6         1,2,2,3,4,4,5
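The memory cycle counts and LDM result latencies in Table 16-18 follow a regular pattern for register lists of one to seven registers, so they can be reproduced with short formulas. The following is a minimal sketch under that assumption: the formulas are fitted to the table rows rather than stated by the manual, the function names are hypothetical, and behavior for longer register lists is not covered by the table. The register lock latencies are kept as a lookup because the unaligned values do not follow an equally simple pattern.

```python
def memory_cycles(num_regs, aligned):
    """Memory cycles for an LDM/STM without the PC, fitted to Table 16-18."""
    if num_regs < 1:
        raise ValueError("register list must not be empty")
    # Two registers move per cycle over the 64-bit datapath. The unaligned
    # rows are consistent with only one register moving in the first cycle.
    return (num_regs + 1) // 2 if aligned else num_regs // 2 + 1


def result_latencies(num_regs, aligned):
    """Per-register LDM result latencies, in register-list order."""
    offset = 0 if aligned else 1
    return [3 + (i + offset) // 2 for i in range(num_regs)]


# Register lock latencies (STM) copied from the seven-register table rows.
# Shorter register lists use a prefix of the same sequence.
_LOCK_LATENCY = {
    True: [1, 2, 2, 3, 3, 4, 4],   # first address 64-bit aligned
    False: [1, 2, 2, 3, 4, 4, 5],  # first address not 64-bit aligned
}


def lock_latencies(num_regs, aligned):
    """Per-register STM lock latencies for 1 to 7 registers (table lookup)."""
    if not 1 <= num_regs <= 7:
        raise ValueError("Table 16-18 only lists 1 to 7 registers")
    return _LOCK_LATENCY[aligned][:num_regs]
```

Keeping the lock latencies as data rather than a formula avoids guessing at a rule that the table does not clearly support.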

ARM DDI 0333H

Copyright © 2004-2009 ARM Limited. All rights reserved.

16-21

ID012410

Non-Confidential, Unrestricted Access

 


16.12.2 Load Multiples, where the PC is in the register list

If an LDM loads the PC then the PC access is performed first to accelerate the branch, followed by the rest of the register loads. The cycle timings and all register load latencies for LDMs with the PC in the list are one greater than those for the same LDM without the PC in the list.

The processor includes a three-entry return stack that can predict procedure returns. Any LDM to the PC with the stack pointer, R13, as the base register, and that does not restore the SPSR to the CPSR, is predicted as a procedure return.

For condition code failing cycle counts, use the cycles for the non-PC destination variants. These are all single-cycle issue; consequently, a condition code failing LDM to the PC takes one cycle.

In all cases the base register, Rx, is an Early Reg, and requires an extra cycle of result latency to provide its value.

Table 16-19 lists the cycle timing behavior of Load Multiples, where the PC is in the register list.

Table 16-19 Cycle timing behavior of Load Multiples, where the PC is in the register list

Example instruction          Cycles  Memory Cycles  Result Latency  Comments

LDMIA sp!,{...,pc}           4       1+n (a)        4,…             Correctly return stack predicted
LDMIA sp!,{...,pc}           9       1+n (a)        4,…             Return stack mispredicted
LDMIA <cond> sp!,{...,pc}    9       1+n (a)        4,…             Conditional return, or empty return stack
LDMIA rx,{...,pc}            8       1+n (a)        4,…             Not return stack predicted

a. Where n is the number of memory cycles for this instruction if the PC had not been in the register list.
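The cases in Table 16-19 can be collected into a small lookup, as in the sketch below. The case labels and function names are hypothetical and chosen only for this example. The caller supplies `n`, the number of memory cycles the same LDM would take without the PC in the register list.

```python
# Issue cycles from Table 16-19, keyed by how the return is handled.
# The key names are illustrative labels, not terms from the manual.
LDM_PC_CYCLES = {
    "predicted": 4,                    # correctly return stack predicted
    "mispredicted": 9,                 # return stack mispredicted
    "conditional_or_empty_stack": 9,   # conditional return, or empty return stack
    "not_predicted": 8,                # base register is not sp
}


def ldm_pc_cycles(case, condition_passed=True):
    """Issue cycles for an LDM with the PC in the register list."""
    if not condition_passed:
        # A condition code failing LDM to the PC uses the non-PC timing,
        # which is single-cycle issue.
        return 1
    return LDM_PC_CYCLES[case]


def ldm_pc_memory_cycles(n):
    """Memory cycles: one more than the same LDM without the PC (1+n)."""
    return 1 + n
```

For example, a correctly predicted return whose register list would otherwise need two memory cycles gives `ldm_pc_cycles("predicted")` of 4 and `ldm_pc_memory_cycles(2)` of 3.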

16.12.3 Example Interlocks

The following sequence, which contains an LDM instruction, takes five cycles because R3 has a result latency of four cycles:

LDMIA R0, {R1-R7}

ADD R10, R10, R3

The following sequence, which contains an STM instruction, takes five cycles to execute because R6 has a register lock latency of four cycles:

STMIA R0, {R1-R7}

ADD R6, R10, R11
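Both five-cycle totals are consistent with a simple rule: the dependent instruction cannot start until the relevant latency has elapsed, and it then takes one cycle itself. The sketch below assumes that rule, a 64-bit aligned first address, and a single-cycle ADD. The latency values are copied from the {R1-R7} row of Table 16-18, and the helper name `pair_cycles` is hypothetical.

```python
REGS = ["R1", "R2", "R3", "R4", "R5", "R6", "R7"]

# Table 16-18, first address 64-bit aligned, register list {R1-R7}.
LDM_RESULT_LATENCY = dict(zip(REGS, [3, 3, 4, 4, 5, 5, 6]))
STM_LOCK_LATENCY = dict(zip(REGS, [1, 2, 2, 3, 3, 4, 4]))


def pair_cycles(latency, issue_cycles=1, following_cycles=1):
    """Cycles for an LDM/STM followed by one dependent instruction.

    Assumed model: the following instruction starts after the larger of the
    issue cycles and the latency on the register it depends on.
    """
    return max(issue_cycles, latency) + following_cycles


# LDMIA R0, {R1-R7} followed by ADD R10, R10, R3 (reads R3)
ldm_then_add = pair_cycles(LDM_RESULT_LATENCY["R3"])

# STMIA R0, {R1-R7} followed by ADD R6, R10, R11 (writes R6)
stm_then_add = pair_cycles(STM_LOCK_LATENCY["R6"])
```

Under these assumptions both `ldm_then_add` and `stm_then_add` evaluate to 5, matching the totals stated above.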
