
Cycle Timings and Interlock Behavior

16.10 Single load and store instructions

This section describes the cycle timing behavior for the LDR, LDRT, LDRB, LDRBT, LDRSB, LDRH, LDRSH, LDREX, LDREXB, LDREXH, LDREXD, STR, STRT, STRB, STRBT, STRH, STREX, STREXB, STREXH, STREXD, and PLD instructions.

Table 16-13 lists the cycle timing behavior for stores and loads, other than loads to the PC. You can replace LDR with any of the above single load or store instructions. The following rules apply:

They are single-cycle issue if a constant offset is used, or if a positive register offset with no shift or with a shift of LSL #2 is used. Both the base and any offset register are Early Regs.

They are two-cycle issue if either a negative register offset or a shift other than LSL #2 is used. Only the offset register is an Early Reg.

If ARMv6 unaligned support is enabled, an access to an address that is not aligned to the access size generates two memory accesses, and so consumes the load/store unit for an additional cycle. This extra cycle is required whenever the base or the offset is not aligned to the access size, because the final address is then potentially unaligned, even if it turns out to be aligned.

If ARMv6 unaligned support is enabled and the final access address is unaligned, there is an extra cycle of result latency.

PLD, the data preload hint instruction, has the same cycle timing behavior as a load instruction. Because it has no destination register, result latency is not applicable. Because all levels of cache treat a PLD instruction as any other load instruction, standard data-dependency rules and eviction procedures are followed. The PLD instruction is ignored if an address translation fault, a cache hit, or an abort occurs during any stage of PLD execution. Only use the PLD instruction to preload from cacheable Normal memory.

The updated base register has a result latency of one. For back-to-back load/store instructions with base write back, the updated base is available to the following load/store instruction with a result latency of 0.
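The alignment rules above can be sketched as a small classifier. This is an illustrative model only, not part of the processor documentation: the function name `classify_access` and the example addresses are hypothetical, and the model assumes ARMv6 unaligned support is enabled.

```python
def classify_access(base: int, offset: int, size: int) -> str:
    """Classify a load/store under ARMv6 unaligned support (illustrative model).

    base, offset: the values the address is formed from.
    size: the access size in bytes (for example 2 for LDRH, 4 for LDR).
    """
    final_aligned = (base + offset) % size == 0
    inputs_aligned = base % size == 0 and offset % size == 0

    if not final_aligned:
        # Two memory accesses, plus an extra cycle of result latency.
        return "unaligned"
    if not inputs_aligned:
        # The final address happens to be aligned, but the load/store unit
        # is still occupied for the additional cycle.
        return "potentially unaligned"
    return "aligned"
```

For example, a word load with base 0x1002 and offset 2 forms the aligned address 0x1004, yet it is classified as potentially unaligned because neither input is word aligned.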

Table 16-13 Cycle timing behavior for stores and loads, other than loads to the PC

Example instruction              Cycles  Memory cycles  Result latency  Comments
LDR <Rd>, <addr_md_1cycle> (a)   1       1              3               Legacy access / ARMv6 aligned access
LDR <Rd>, <addr_md_2cycle> (a)   2       2              4               Legacy access / ARMv6 aligned access
LDR <Rd>, <addr_md_1cycle> (a)   1       2              3               Potentially ARMv6 unaligned access
LDR <Rd>, <addr_md_2cycle> (a)   2       3              4               Potentially ARMv6 unaligned access
LDR <Rd>, <addr_md_1cycle> (a)   1       2              4               ARMv6 unaligned access
LDR <Rd>, <addr_md_2cycle> (a)   1       2              4               ARMv6 unaligned access

a. See Table 16-15 on page 16-17 for an explanation of <addr_md_1cycle> and <addr_md_2cycle>.

ARM DDI 0333H

Copyright © 2004-2009 ARM Limited. All rights reserved.

16-16

ID012410

Non-Confidential, Unrestricted Access

 


Table 16-14 lists the cycle timing behavior for loads to the PC.

Table 16-14 Cycle timing behavior for loads to the PC

Example instruction             Cycles  Memory cycles  Result latency  Comments
LDR pc, [sp, #cns] (!)          4       1              -               Return stack correctly predicted
LDR pc, [sp], #cns              4       1              -               Return stack correctly predicted
LDR pc, [sp, #cns] (!)          9       1              -               Return stack mispredicted
LDR pc, [sp], #cns              9       1              -               Return stack mispredicted
LDR <cond> pc, [sp, #cns] (!)   8       1              -               Conditional return, or empty return stack
LDR <cond> pc, [sp], #cns       8       1              -               Conditional return, or empty return stack
LDR pc, <addr_md_1cycle> (a)    8       1              -               -
LDR pc, <addr_md_2cycle> (a)    9       2              -               -

a. See Table 16-15 for an explanation of <addr_md_1cycle> and <addr_md_2cycle>.

Only cycle times for aligned accesses are given, because unaligned accesses to the PC are not supported.

The processor includes a three-entry return stack that can predict procedure returns. Any load to the PC that uses an immediate offset and the stack pointer, R13, as the base register is considered a procedure return.

If the condition code fails, use the cycle counts for the non-PC destination variants.
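As a rough aid to reading Table 16-14, the rows can be expressed as a lookup. This is a sketch, not a definitive timing model: the function name `pc_load_cycles` and its parameters are hypothetical, and it assumes the condition code passes (a failing condition uses the non-PC timings instead).

```python
def pc_load_cycles(is_return: bool,
                   conditional: bool = False,
                   stack_empty: bool = False,
                   predicted_ok: bool = True,
                   two_cycle_mode: bool = False) -> int:
    """Issue cycles for a load to the PC, transcribed from Table 16-14.

    is_return: the load uses SP (R13) as base with an immediate offset,
               so the processor treats it as a procedure return.
    two_cycle_mode: the addressing mode is <addr_md_2cycle>; only
                    consulted when the load is not a procedure return.
    """
    if is_return:
        if conditional or stack_empty:
            return 8   # conditional return, or empty return stack
        return 4 if predicted_ok else 9
    return 9 if two_cycle_mode else 8
```

With these assumptions, an unconditional `LDR pc, [sp], #cns` costs 4 cycles when the return stack predicts it correctly and 9 cycles when it does not.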

Table 16-15 explains the <addr_md_1cycle> and <addr_md_2cycle> placeholders that Table 16-13 on page 16-16 and Table 16-14 use.

Table 16-15 <addr_md_1cycle> and <addr_md_2cycle> LDR example instruction explanation

Example instruction                       Early Reg    Comment

<addr_md_1cycle>
LDR <Rd>, [<Rn>, #cns] (!)                <Rn>         If an immediate offset, or a positive register offset with no shift or shift LSL #2, then one issue cycle.
LDR <Rd>, [<Rn>, <Rm>] (!)                <Rn>, <Rm>
LDR <Rd>, [<Rn>, <Rm>, LSL #2] (!)        <Rn>, <Rm>
LDR <Rd>, [<Rn>], #cns                    <Rn>
LDR <Rd>, [<Rn>], <Rm>                    <Rn>, <Rm>
LDR <Rd>, [<Rn>], <Rm>, LSL #2            <Rn>, <Rm>

<addr_md_2cycle>
LDR <Rd>, [<Rn>, -<Rm>] (!)               <Rm>         If negative register offset, or shift other than LSL #2, then two issue cycles.
LDR <Rd>, [<Rn>, -<Rm> <shf> <cns>] (!)   <Rm>
LDR <Rd>, [<Rn>], -<Rm>                   <Rm>
LDR <Rd>, [<Rn>], -<Rm> <shf> <cns>       <Rm>
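The classification in Table 16-15 can be summarized as a short decision function. This is an illustrative sketch: the name `issue_info` and the way the addressing mode is encoded (a flag for register offset, a flag for a negative offset, and an optional shift tuple) are assumptions made for the example.

```python
from typing import Optional, Tuple


def issue_info(register_offset: bool,
               negative: bool = False,
               shift: Optional[Tuple[str, int]] = None):
    """Return (issue_cycles, early_regs) for a single load or store.

    register_offset: False for an immediate offset, True for <Rm>.
    negative: True for a -<Rm> register offset.
    shift: None for no shift, else (operation, amount), e.g. ("LSL", 2).
    """
    if not register_offset:
        # Immediate offset: one issue cycle, only the base is an Early Reg.
        return 1, ("Rn",)
    if negative or (shift is not None and shift != ("LSL", 2)):
        # Negative register offset, or any shift other than LSL #2.
        return 2, ("Rm",)
    # Positive register offset with no shift or LSL #2.
    return 1, ("Rn", "Rm")
```

For instance, `issue_info(True, shift=("LSL", 2))` describes `LDR <Rd>, [<Rn>, <Rm>, LSL #2]`, while `issue_info(True, negative=True)` describes `LDR <Rd>, [<Rn>, -<Rm>]`.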

 

 

 

 

16.10.1 Base register update

The base register update for load or store instructions occurs in the ALU pipeline. To prevent an interlock for back-to-back load or store instructions reusing the same base register, there is a local forwarding path to recycle the updated base register around the ADD stage.

For example, the following instruction sequence takes three cycles to execute:

    LDR R5, [R2, #4]!
    LDR R6, [R2, #0x10]!
    LDR R7, [R2, #0x20]!
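To see why each load in this sequence depends on the one before it, the following sketch models only the architectural effect of pre-indexed addressing with write back; it does not model pipeline timing. The function name `run_preindexed_loads` and the starting value 0x1000 for R2 are hypothetical.

```python
def run_preindexed_loads(base: int, offsets):
    """Model LDR Rd, [Rn, #offset]! executed once per offset.

    Returns the list of addresses accessed and the final value of Rn.
    """
    addresses = []
    for offset in offsets:
        # Pre-indexed with write back: the address is Rn + offset,
        # and Rn is then updated to that address.
        base = (base + offset) & 0xFFFFFFFF
        addresses.append(base)
    return addresses, base


# R2 starts at the hypothetical value 0x1000.
addresses, r2 = run_preindexed_loads(0x1000, [4, 0x10, 0x20])
print([hex(a) for a in addresses], hex(r2))
# prints ['0x1004', '0x1014', '0x1034'] 0x1034
```

Each address is formed from the base written back by the previous instruction, which is the dependency that the local forwarding path resolves without an interlock.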
