CS 220 / ARM / ARM1176JZ-S Technical Reference Mmanual.pdf

Cycle Timings and Interlock Behavior

16.7 Multiplies

The multiplier is a three-cycle pipeline. Early result forwarding is not possible, except to the internal accumulate path. For a subsequent multiply accumulate, the result is therefore available one cycle earlier than for all other uses of the result.

Certain multiplies require:

• more than one cycle to execute
• more than one pipeline issue to produce a result.

Multiplies with 64-bit results require two cycles to write the results. Consequently they have two result latencies, with the low half of the result always available first. The multiplicand and multiplier are Early Regs because both are required at the start of MAC1.

Table 16-10 lists the cycle timing behavior of example multiply instructions.

Table 16-10 Example multiply instruction cycle timing behavior

Example instruction   Cycles   Cycles if sets flags   Early Reg    Late Reg   Result Latency
-------------------   ------   --------------------   ----------   --------   --------------
MUL(S)                2        5                      <Rm>, <Rs>   -          4
MLA(S)                2        5                      <Rm>, <Rs>   <Rn>       4
SMULL(S)              3        6                      <Rm>, <Rs>   -          4/5
UMULL(S)              3        6                      <Rm>, <Rs>   -          4/5
SMLAL(S)              3        6                      <Rm>, <Rs>   <RdLo>     4/5
UMLAL(S)              3        6                      <Rm>, <Rs>   <RdLo>     4/5
SMULxy                1        -                      <Rm>, <Rs>   -          3
SMLAxy                1        -                      <Rm>, <Rs>   -          3
SMULWy                1        -                      <Rm>, <Rs>   -          3
SMLAWy                1        -                      <Rm>, <Rs>   -          3
SMLALxy               2        -                      <Rm>, <Rs>   <RdHi>     3/4
SMUAD, SMUADX         1        -                      <Rm>, <Rs>   -          3
SMLAD, SMLADX         1        -                      <Rm>, <Rs>   -          3
SMUSD, SMUSDX         1        -                      <Rm>, <Rs>   -          3
SMLSD, SMLSDX         1        -                      <Rm>, <Rs>   -          3
SMMUL, SMMULR         2        -                      <Rm>, <Rs>   -          4
SMMLA, SMMLAR         2        -                      <Rm>, <Rs>   <Rn>       4
SMMLS, SMMLSR         2        -                      <Rm>, <Rs>   <Rn>       4
SMLALD, SMLALDX       2        -                      <Rm>, <Rs>   <RdHi>     3/4
SMLSLD, SMLSLDX       2        -                      <Rm>, <Rs>   <RdHi>     3/4
UMAAL                 3        -                      <Rm>, <Rs>   <RdLo>     4/5

ARM DDI 0333H

Copyright © 2004-2009 ARM Limited. All rights reserved.

16-12

ID012410

Non-Confidential, Unrestricted Access

 


Note

Result Latency is one less if the result is used as the accumulate register for a subsequent multiply accumulate.
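The Note can be combined with the Result Latency column of Table 16-10 in a small lookup. The following is a minimal sketch, not part of the manual: the names `RESULT_LATENCY` and `result_latency` are hypothetical, only a subset of the table rows is copied, and it assumes the one-cycle reduction applies to both halves of a 64-bit result.

```python
# Result latencies in cycles, copied from Table 16-10 (subset of rows).
# For 64-bit results the tuple is (low half, high half).
RESULT_LATENCY = {
    "MUL": (4,),
    "MLA": (4,),
    "SMULL": (4, 5),
    "UMULL": (4, 5),
    "SMLAL": (4, 5),
    "UMLAL": (4, 5),
    "SMULxy": (3,),
    "SMLALxy": (3, 4),
    "SMMUL": (4,),
    "UMAAL": (4, 5),
}


def result_latency(instr, as_accumulator=False):
    """Return the result latency tuple for a multiply instruction.

    When the result is used as the accumulate register of a subsequent
    multiply accumulate, each latency is one cycle less (per the Note).
    Assumption: the reduction applies to both halves of a 64-bit result.
    """
    adjust = 1 if as_accumulator else 0
    return tuple(latency - adjust for latency in RESULT_LATENCY[instr])
```

For example, `result_latency("MUL")` gives the tabulated latency, and `result_latency("MUL", as_accumulator=True)` gives the latency seen by a following multiply accumulate that accumulates into the MUL result.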


16.8 Branches

This section describes the cycle timing behavior for the B, BL, and BLX instructions.

Branches are subject to dynamic, static and return stack predictions. Table 16-11 lists example branch instructions and their cycle timing behavior.

Table 16-11 Branch instruction cycle timing behavior

Example instruction                   Cycles    Comment
-----------------------------------   -------   -----------------------------------
B <immed>                             0         Folded dynamic prediction
B <immed>, BL <immed>, BLX <immed>    1         Not-folded dynamic prediction
B <immed>, BL <immed>, BLX <immed>    1         Correct not-taken static prediction
B <immed>, BL <immed>, BLX <immed>    4         Correct taken static prediction
B <immed>, BL <immed>, BLX <immed>    5-7 (a)   Incorrect dynamic/static prediction
BX R14                                4         Correct return stack prediction
BX R14                                7         Incorrect return stack prediction
BX R14                                5         Empty return stack
BX <cond> R14                         5-7 (a)   Conditional return
BX <cond> <reg>, BLX <cond> <reg>     1         If not taken
BX <cond> <reg>, BLX <cond> <reg>     5-7 (a)   If taken

a. Mispredicted branches, including taken unpredicted branches, take a varying number of cycles to execute, depending on their distance from a flag-setting instruction. The timing behavior is:

Cycles = MAX(MaxCycles - FlagCycleDistance, MinCycles).
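The footnote formula can be evaluated directly. The sketch below is illustrative only: the function name is hypothetical, and the defaults `max_cycles=7` and `min_cycles=5` are an assumption taken from the 5-7 range shown in Table 16-11, since this section does not define the terms explicitly.

```python
def mispredicted_branch_cycles(flag_cycle_distance, max_cycles=7, min_cycles=5):
    """Cycles taken by a mispredicted (or taken unpredicted) branch.

    flag_cycle_distance: cycles between the flag-setting instruction
    and the branch. The defaults 7 and 5 are assumed from the 5-7 range
    in Table 16-11.
    """
    return max(max_cycles - flag_cycle_distance, min_cycles)


# The cost falls as the branch moves away from the flag setter,
# then stays at the minimum.
costs = [mispredicted_branch_cycles(d) for d in range(4)]
```

Under these assumptions, placing independent instructions between the flag-setting instruction and the branch reduces the misprediction cost, but never below the minimum.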


16.9 Processor state updating instructions

This section describes the cycle timing behavior for the MSR, MRS, CPS, and SETEND instructions. Table 16-12 lists processor state updating instructions and their cycle timing behavior.

Table 16-12 Processor state updating instructions cycle timing behavior

Instruction                      Cycles   Comments
------------------------------   ------   --------------------------------
MRS                              1        All MRS instructions
MSR CPSR_f, s, fs                2        MSRs to CPSR flags and/or status
MSR                              4        All other MSRs to the CPSR
MSR SPSR                         5        All MSRs to the SPSR
CPS <effect> <iflags>            1        Interrupt masks only
CPS <effect> <iflags>, #<mode>   2        Mode changing
SETEND                           1        -
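The three MSR rows of Table 16-12 amount to a simple classification by destination register and field mask. A minimal sketch follows, with a hypothetical function name, assuming the field mask is passed as the letters `c`, `x`, `s`, `f` used in MSR syntax:

```python
def msr_cycles(target, fields):
    """Cycle count for an MSR, following Table 16-12.

    target: "CPSR" or "SPSR".
    fields: field mask letters, for example "f", "fs" or "cxsf".
    """
    if target == "SPSR":
        return 5  # all MSRs to the SPSR
    if target != "CPSR":
        raise ValueError("target must be 'CPSR' or 'SPSR'")
    if set(fields) <= {"f", "s"}:
        return 2  # flags and/or status fields only
    return 4      # all other MSRs to the CPSR
```

For instance, `msr_cycles("CPSR", "f")` falls in the 2-cycle row, while any mask that includes the control field, such as `"fc"`, falls in the 4-cycle row.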

 

 

 
