CS 220 / ARM / ARM1176JZ-S Technical Reference Mmanual.pdf

Chapter 5

Program Flow Prediction

This chapter describes how program flow prediction locates branches in the instruction stream and the strategies used for determining if a branch is likely to be taken or not. It also describes the two architecturally-defined SVC functions required for backwards-compatibility with earlier architectures for flushing the Prefetch Unit (PU) buffers. It contains the following sections:

About program flow prediction on page 5-2

Branch prediction on page 5-4

Return stack on page 5-7

Memory Barriers on page 5-8

ARM1176JZ-S IMB implementation on page 5-10.

ARM DDI 0333H

Copyright © 2004-2009 ARM Limited. All rights reserved.

5-1

ID012410

Non-Confidential, Unrestricted Access

 


5.1 About program flow prediction

Program flow prediction in the processor is carried out by:

The integer core: implements static branch prediction and the Return Stack.

The Prefetch Unit (PU): implements dynamic branch prediction.

The processor is responsible for handling branches the first time they are executed, that is, when no historical information is available for dynamic prediction by the PU.

The integer core makes static predictions about the likely outcome of a branch early in its pipeline and then resolves those predictions when the outcome of conditional execution is known. Condition codes are evaluated at three points in the integer core pipeline, and branches are resolved as soon as the flags are guaranteed not to be modified by a preceding instruction.

When a branch is resolved, the integer core passes information to the PU so that it can make a Branch Target Address Cache (BTAC) allocation or update an existing entry as appropriate. The integer core is also responsible for identifying likely procedure calls and returns to predict the returns. It can handle nested procedures up to three deep.
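The three-deep nesting limit can be illustrated with a small software model. The following is a minimal sketch in C, not the hardware implementation. The names `rs_model`, `rs_init`, `rs_push`, and `rs_pop` are hypothetical, and the overflow policy (discard the oldest entry when a fourth call is pushed) is an assumption made for illustration; the text above states only the depth.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RS_DEPTH 3  /* nesting depth stated in the text */

/* Hypothetical software model of a return stack; not the hardware design. */
typedef struct {
    uint32_t addr[RS_DEPTH];
    int      count;          /* number of valid entries, 0..RS_DEPTH */
} rs_model;

void rs_init(rs_model *rs) {
    assert(rs != NULL);
    rs->count = 0;
}

/* Record the return address of a likely procedure call.
 * Assumption: when the stack is full, the oldest entry is discarded. */
void rs_push(rs_model *rs, uint32_t return_addr) {
    assert(rs != NULL);
    if (rs->count == RS_DEPTH) {
        for (int i = 1; i < RS_DEPTH; i++)
            rs->addr[i - 1] = rs->addr[i];
        rs->count--;
    }
    rs->addr[rs->count++] = return_addr;
}

/* Predict the target of a likely procedure return.
 * Returns false when the stack is empty, meaning no prediction is available. */
bool rs_pop(rs_model *rs, uint32_t *predicted) {
    assert(rs != NULL && predicted != NULL);
    if (rs->count == 0)
        return false;
    *predicted = rs->addr[--rs->count];
    return true;
}
```

In this model, a call chain nested four deep loses the outermost return address, so only the three innermost returns can be predicted.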

The integer core includes:

a Static Branch Predictor (SBP)

a Return Stack (RS)

branch resolution logic

a BTAC update interface to the PU

a BTAC allocate interface to the PU.

The processor PU is responsible for fetching instructions from the memory system as required by the integer core and the coprocessors. The PU buffers up to seven instructions in its FIFO to:

detect branch instructions ahead of the integer core requirement

dynamically predict those that it considers are to be taken

provide branch folding of predicted branches if possible

identify unconditional procedure return instructions.

This reduces the cycle time of branch instructions and so increases processor performance.

The PU includes:

a BTAC

branch update and allocate logic

a Dynamic Branch Predictor (DBP), and associated update mechanism

branch folding logic.

It is responsible for providing the integer core with instructions, and for requesting cache accesses. The pattern of cache accesses is based on the predicted instruction stream as determined by the dynamic branch prediction mechanism or the integer core flush mechanism.

The BTAC can:

be globally flushed by a CP15 instruction

have individual entries flushed by a CP15 instruction

be enabled or disabled by a CP15 instruction.

For details of the CP15 instructions, see c7, Cache operations on page 3-69 and Flush operations on page 3-79.
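The effect of the three BTAC controls can be sketched with a small software model. This is an illustration only, under stated assumptions: the type and function names (`btac_model`, `btac_flush_all`, and so on) are hypothetical, the eight-entry size and the direct-mapped indexing are arbitrary choices for the sketch, and the model assumes that a disabled BTAC neither allocates nor predicts. The actual CP15 encodings are given in the sections referenced above and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BTAC_MODEL_ENTRIES 8u  /* arbitrary size, for illustration only */

typedef struct {
    bool     valid;
    uint32_t branch_addr;  /* address of the branch instruction */
    uint32_t target_addr;  /* predicted branch target */
} btac_entry;

typedef struct {
    bool       enabled;
    btac_entry entry[BTAC_MODEL_ENTRIES];
} btac_model;

/* Assumed direct-mapped indexing on word-aligned addresses. */
size_t btac_index(uint32_t branch_addr) {
    return (branch_addr >> 2) % BTAC_MODEL_ENTRIES;
}

/* Models the global flush: every entry becomes invalid. */
void btac_flush_all(btac_model *b) {
    assert(b != NULL);
    for (size_t i = 0; i < BTAC_MODEL_ENTRIES; i++)
        b->entry[i].valid = false;
}

/* Models the single-entry flush for one branch address. */
void btac_flush_entry(btac_model *b, uint32_t branch_addr) {
    assert(b != NULL);
    btac_entry *e = &b->entry[btac_index(branch_addr)];
    if (e->valid && e->branch_addr == branch_addr)
        e->valid = false;
}

/* Models the enable/disable control. */
void btac_set_enabled(btac_model *b, bool enabled) {
    assert(b != NULL);
    b->enabled = enabled;
}

void btac_init(btac_model *b) {
    btac_flush_all(b);
    b->enabled = false;
}

/* Allocate or update an entry; ignored while the model is disabled. */
void btac_allocate(btac_model *b, uint32_t branch_addr, uint32_t target) {
    assert(b != NULL);
    if (!b->enabled)
        return;
    btac_entry *e = &b->entry[btac_index(branch_addr)];
    e->valid = true;
    e->branch_addr = branch_addr;
    e->target_addr = target;
}

/* Returns true and writes *target on a hit; false on a miss or when disabled. */
bool btac_lookup(const btac_model *b, uint32_t branch_addr, uint32_t *target) {
    assert(b != NULL && target != NULL);
    if (!b->enabled)
        return false;
    const btac_entry *e = &b->entry[btac_index(branch_addr)];
    if (!e->valid || e->branch_addr != branch_addr)
        return false;
    *target = e->target_addr;
    return true;
}
```

Flushing one entry leaves predictions for other branches intact, whereas the global flush forces every branch back to first-time handling.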

The BTAC is globally flushed for:

Main TLB FCSE PID changes


Main TLB context ID changes

Global instruction cache invalidation

Switches by the integer core from Non-secure to Secure state.

When the processor switches from the Secure to the Non-secure state the Secure Monitor code is responsible for flushing the BTAC if necessary.
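The split between automatic and software-managed flushing can be summarized as a lookup. The sketch below is a hedged illustration in C: the enumeration `pfp_event` and the function `btac_flushed_by_hardware` are hypothetical names, not part of any ARM interface. It encodes only what the text above states, namely which events cause a global BTAC flush without software involvement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event names for the cases listed in the text. */
typedef enum {
    EV_FCSE_PID_CHANGE,
    EV_CONTEXT_ID_CHANGE,
    EV_GLOBAL_ICACHE_INVALIDATE,
    EV_SWITCH_NONSECURE_TO_SECURE,
    EV_SWITCH_SECURE_TO_NONSECURE
} pfp_event;

/* Returns true when the event globally flushes the BTAC automatically.
 * For a Secure to Non-secure switch it returns false: the Secure Monitor
 * code must flush the BTAC itself if that is necessary. */
bool btac_flushed_by_hardware(pfp_event ev) {
    switch (ev) {
    case EV_FCSE_PID_CHANGE:
    case EV_CONTEXT_ID_CHANGE:
    case EV_GLOBAL_ICACHE_INVALIDATE:
    case EV_SWITCH_NONSECURE_TO_SECURE:
        return true;
    case EV_SWITCH_SECURE_TO_NONSECURE:
        return false;
    }
    assert(!"unknown event");
    return false;
}
```

The asymmetry matters for Secure Monitor code: leaving the Secure state is the one listed transition where stale Secure-world branch history remains in the BTAC unless software removes it.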

The PU prefetches all instruction types regardless of the state of the integer core. That is, it performs prefetches in ARM state, Thumb state, and Jazelle state. However, the rate at which the PU is drained depends on the state, as does the behavior of the branch prediction hardware. Branch prediction is performed in all three states, but branch folding operates only in ARM and Thumb states.

The PU is responsible for fetching the instruction stream as dictated by:

the Program Counter

the dynamic branch predictor

static prediction results in the integer core

procedure calls and returns signaled by the Return Stack residing in the integer core

exceptions, instruction aborts, and interrupts signaled by the integer core.
