
30	Guide to RISC Processors

Table 2.7 Impact of using the knowledge of past n branches on prediction accuracy (%)

	Type of mix
n	Compiler	Business	Scientific
0	64.1	64.4	70.4
1	91.9	95.2	86.6
2	93.3	96.5	90.8
3	93.7	96.6	91.0
4	94.5	96.8	91.8
5	94.7	97.0	92.0

Dynamic Branch Prediction The dynamic strategy looks at the run-time history to make more accurate predictions. The basic idea is to take the past n executions of the branch in question and use this information to predict the next one. Will this work in practice? How much additional benefit can we derive over the static approach? The empirical study by Lee and Smith [15] suggests that we can get a significant improvement in prediction accuracy. A summary of their study is presented in Table 2.7. The algorithm they implemented is simple: the prediction for the next branch is the majority outcome of the previous n executions. For example, for n = 3, if the branch was taken in two or more of the past three executions, the prediction is that it will be taken.
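The majority rule can be made concrete with a minimal Python sketch. It replays the outcome sequence for one branch and measures how often the rule predicts correctly. The function name is our own. So is the tie-breaking choice: we predict "taken" when there is no history yet or the history is evenly split. Neither comes from Lee and Smith's study.

```python
from collections import deque
from typing import Iterable


def majority_predictor_accuracy(outcomes: Iterable[bool], n: int) -> float:
    """Fraction of executions predicted correctly by the majority-of-last-n rule.

    outcomes: actual results of one branch instruction, True = taken.
    Assumption: with no history, or with a tie, we predict "taken".
    """
    history = deque(maxlen=n)  # the last n outcomes; maxlen=0 keeps nothing
    correct = total = 0
    for taken in outcomes:
        # Predict "taken" if the branch was taken in at least half of the
        # recorded executions.
        predict_taken = 2 * sum(history) >= len(history)
        if predict_taken == taken:
            correct += 1
        total += 1
        history.append(taken)
    return correct / total if total else 0.0


# A loop branch: taken nine times, then falls through; the loop runs ten times.
loop_branch = ([True] * 9 + [False]) * 10
for n in (0, 1, 3):
    print(n, majority_predictor_accuracy(loop_branch, n))
```

For this loop branch, n = 1 mispredicts both the loop exit and the first iteration of the next loop. With n = 3, only the exit is mispredicted.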

The data in Table 2.7 suggest that looking at the past two branch executions gives over 90% prediction accuracy for all three mixes. Beyond that, we get only marginal improvement. This is good from the implementation point of view: we need just two bits to keep the history of the past two branch executions. The basic idea is simple: keep the current prediction unless the past two predictions were wrong. Specifically, we do not want to change our prediction just because our last prediction was wrong. This policy can be expressed using the four-state finite state machine shown in Figure 2.7.

In this state diagram, the left bit represents the prediction and the right bit indicates the branch status (branch taken or not). If the left bit is zero, our prediction is “branch not taken”; otherwise we predict that the branch will be taken. The right bit gives the actual result of the branch instruction: 0 means that the branch instruction did not jump (“not taken”), and 1 means that the branch was taken. For example, state 00 represents that we predicted that the branch would not be taken (left zero bit) and the branch was indeed not taken (right zero bit). Therefore, as long as the branch is not taken, we remain in state 00. If our prediction is wrong, we move to state 01. However, we still predict “branch not taken” because we were wrong only once. If our next prediction is right, we go back to state 00. If our prediction is wrong again (i.e., two times in a row), we change our prediction to “branch taken” and move to state 11. You can verify that it always takes two wrong predictions in a row to change our prediction.

Chapter 2 • Processor Design Issues	31

[Figure 2.7 State diagram for branch prediction. States 00 and 01 predict “no branch”; states 10 and 11 predict “branch”. Transitions are labeled “Branch” and “No branch”.]

[Figure 2.8 Implementation of dynamic branch prediction: (a) using a 2-bit branch history, where each entry holds a valid bit, the branch instruction address, and the prediction bits; (b) including the target address facilitates prefetching.]
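The four states of Figure 2.7 can be encoded as a transition table in Python. Each state is written as the two-bit string used in the figure. The table and function names are illustrative only; a hardware implementation would store just the two bits.

```python
# Figure 2.7 as a table: a state is "<prediction bit><last outcome bit>".
# Left bit 1 = predict "branch"; right bit 1 = the branch was taken.
NEXT_STATE = {
    # (current state, branch taken?) -> next state
    ("00", False): "00",  # predicted "no branch", correct
    ("00", True): "01",   # first miss: keep predicting "no branch"
    ("01", False): "00",  # correct again
    ("01", True): "11",   # second miss in a row: switch to "branch"
    ("11", True): "11",   # predicted "branch", correct
    ("11", False): "10",  # first miss: keep predicting "branch"
    ("10", True): "11",   # correct again
    ("10", False): "00",  # second miss in a row: switch to "no branch"
}


def predict_taken(state: str) -> bool:
    """The left bit of the state is the prediction."""
    return state[0] == "1"


def run(outcomes, state="00"):
    """Return the prediction made before each outcome, and the final state."""
    predictions = []
    for taken in outcomes:
        predictions.append(predict_taken(state))
        state = NEXT_STATE[(state, taken)]
    return predictions, state


# A single not-taken outcome does not change a "branch" prediction.
print(run([True, True, False, True]))
```

Note that the prediction changes only on the transitions from 01 to 11 and from 10 to 00. Each of those is the second miss in a row.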

Implementation of this strategy requires maintaining two bits for each branch instruction, as shown in Figure 2.8a. These two bits correspond to the two bits of the finite state machine in Figure 2.7. This works well for direct branch instructions, where the address of the target is specified as part of the instruction. However, in indirect branch instructions, the target is not known until the instruction executes. Therefore, predicting whether the branch is taken is not particularly useful for filling the pipeline if we do not know the target address in advance. It is reasonable to assume that a taken branch jumps to the same target address as it did the last time. Thus, if we store the target address along with the branch instruction, we can use this target address to prefetch instructions to fill the pipeline. This scenario is shown in Figure 2.8b. In Part III we look at some processors that use the dynamic branch prediction strategy.
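Here is a minimal sketch of the table in Figure 2.8b. A Python dictionary keyed by branch address stands in for the hardware table. The class and method names are hypothetical, and so is the choice to start a new entry in state 00. The transitions repeat those of Figure 2.7.

```python
from dataclasses import dataclass
from typing import Optional

# Two-bit transitions from Figure 2.7: (state, taken) -> next state.
NEXT_STATE = {
    ("00", False): "00", ("00", True): "01",
    ("01", False): "00", ("01", True): "11",
    ("11", True): "11", ("11", False): "10",
    ("10", True): "11", ("10", False): "00",
}


@dataclass
class Entry:
    valid: bool
    target: int         # target address used the last time the branch was taken
    state: str = "00"   # the two prediction bits


class BranchTargetBuffer:
    """Hypothetical model of the table in Figure 2.8b, keyed by branch address."""

    def __init__(self):
        self.entries = {}  # branch instruction address -> Entry

    def prefetch_address(self, branch_addr: int) -> Optional[int]:
        """Where to fetch next if we predict "taken"; None means fall through."""
        entry = self.entries.get(branch_addr)
        if entry is None or not entry.valid or entry.state[0] != "1":
            return None
        return entry.target

    def update(self, branch_addr: int, taken: bool, target: int) -> None:
        """Record the resolved outcome (and target) of one branch execution."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            entry = Entry(valid=True, target=target)
            self.entries[branch_addr] = entry
        entry.state = NEXT_STATE[(entry.state, taken)]
        if taken:
            entry.target = target  # assume the next jump goes to the same place
```

With this design, a branch must be taken twice before the buffer supplies its target for prefetching. A single not-taken outcome afterwards leaves the target prediction in place.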

Instruction Set Design Issues

There are several design issues that influence the instruction set of a processor. We have already discussed one issue, the number of addresses used in an instruction. In this section, we present some other design issues.

Operand Types

Processor instructions typically support only the basic data types. These include characters, integers, and floating-point numbers. Because most memories are byte addressable, representing characters does not require special treatment. In a byte-addressable memory, the smallest memory unit we can address, and therefore access, is one byte. We can, however, use multiple bytes to represent larger operands. Processors provide instructions to load various operand sizes. Often, the same instruction is used to load operands of different sizes. For example, the IA-32 instruction

mov	AL,address	; Loads an 8-bit value

loads the AL register with an 8-bit value from memory at address. The same instruction can also be used to load 16- and 32-bit values, as shown in the following two instructions:

mov	AX,address	; Loads a 16-bit value
mov	EAX,address ; Loads a 32-bit value

In these instructions, the size of the operand is given indirectly by the size of the register used. AL, AX, and EAX are 8-, 16-, and 32-bit registers, respectively. In instructions that do not use a register, we can use size specifiers. This type of specification is typical of CISC processors.
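The IA-32 is little-endian: the byte at the lowest address is the least significant one. As a result, the 8-, 16-, and 32-bit loads from the same address all begin with the same low-order byte. The sketch below models memory as a Python bytes object. The load helper is hypothetical; it only mimics what the three mov instructions above read.

```python
def load(memory: bytes, address: int, size: int) -> int:
    """Hypothetical helper: read `size` bytes at `address` as an unsigned
    little-endian value, the byte order the IA-32 uses."""
    return int.from_bytes(memory[address:address + size], "little")


memory = bytes([0x78, 0x56, 0x34, 0x12])  # bytes at addresses 0..3

al = load(memory, 0, 1)    # like mov AL,address
ax = load(memory, 0, 2)    # like mov AX,address
eax = load(memory, 0, 4)   # like mov EAX,address
print(hex(al), hex(ax), hex(eax))
```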

RISC processors specify the operand size in their load and store operations. Note that only the load and store instructions move data between memory and registers. All other instructions operate on register-wide data. Below we give some examples of the MIPS load instructions:

lb	Rdest,address	; Loads a byte
lh	Rdest,address	; Loads a halfword (16 bits)
lw	Rdest,address	; Loads a word (32 bits)
ld	Rdest,address	; Loads a doubleword (64 bits)

The last instruction is available only on 64-bit processors. In general, when the size of the data moved is smaller than the destination register, it is sign-extended to the size of Rdest. There are separate instructions to handle unsigned values. For unsigned numbers, we use lbu and lhu instead of lb and lh, respectively.
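The difference between lb and lbu, and between lh and lhu, lies only in how the upper bits of the destination register are filled. This Python sketch assumes a 32-bit register. The helper functions are hypothetical stand-ins for the instructions' behavior, not an emulator.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Read the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


# Hypothetical models of what each load leaves in a 32-bit Rdest.
def lb(byte: int) -> int:
    return sign_extend(byte, 8) & MASK32   # copy bit 7 into bits 8..31


def lbu(byte: int) -> int:
    return byte & 0xFF                     # fill bits 8..31 with zeros


def lh(half: int) -> int:
    return sign_extend(half, 16) & MASK32  # copy bit 15 into bits 16..31


def lhu(half: int) -> int:
    return half & 0xFFFF                   # fill bits 16..31 with zeros


print(hex(lb(0x80)), hex(lbu(0x80)))  # same byte, different register values
```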

Similar instructions are available for store operations. In store operations, the size is reduced to fit the target memory size. For example, storing a byte from a 32-bit register causes only the lower byte to be stored at the specified address. SPARC also uses a similar set of instructions.
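A companion sketch models stores. The byte store writes only the low byte of the register. For the halfword store we assume little-endian byte order; MIPS implementations can be configured for either byte order, so this is a choice, not a given. The function names sb and sh mirror the MIPS mnemonics but are only models.

```python
def sb(memory: bytearray, address: int, reg: int) -> None:
    """Model of a byte store: only the low 8 bits of reg are written."""
    memory[address] = reg & 0xFF


def sh(memory: bytearray, address: int, reg: int) -> None:
    """Model of a halfword store: only the low 16 bits are written
    (little-endian byte order assumed)."""
    memory[address:address + 2] = (reg & 0xFFFF).to_bytes(2, "little")


memory = bytearray(4)
sb(memory, 0, 0x12345678)
sh(memory, 2, 0x12345678)
print(memory.hex())
```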

So far we have seen operations on operands located either in registers or in memory. In most instructions, we can also use constants. These constants are called immediate values because the constants are encoded as part of the instruction. In RISC processors, instructions excluding the load and store use registers only; any nonregister value is treated as a constant. In most assembly languages, a special notation is used to indicate registers. For example, in MIPS assembly language, the instruction

add	$t0,$t0,-32	; $t0 = $t0 - 32

subtracts 32 from the $t0 register and places the result back in the $t0 register. Notice the special notation to represent registers. But there is no special notation for constants. Some assemblers, however, use the “#” sign to indicate a constant.

Addressing Modes

Addressing mode refers to how the operands are specified. As we saw in the previous section, operands can be in one of three places: in a register, in memory, or part of the instruction as a constant. Specifying a constant as an operand is called the immediate addressing mode. Similarly, specifying an operand that is in a register is called the register addressing mode. All processors support these two addressing modes.

The difference between the RISC and CISC processors is in how they specify the operands in memory. CISC designs support a large variety of memory addressing modes. RISC designs, on the other hand, support just one or two addressing modes in their load and store instructions. Most RISC architectures support the following two memory addressing modes.

• The address of the memory operand is computed by adding the contents of a register and a constant. If this constant is zero, the contents of the register are treated as the operand address. In this mode, the memory address is computed as

	Address = contents of a register + constant.

• The address of the memory operand is computed by adding the contents of two registers. If one of the registers contains zero, this addressing mode becomes the same as the one above with a zero constant. In this mode, the memory address is computed as

	Address = contents of register 1 + contents of register 2.
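Both modes reduce to a single addition. The sketch below assumes a dictionary of 32-bit register values with hypothetical register names, and it wraps each sum to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def ea_register_plus_constant(regs: dict, base: str, constant: int) -> int:
    """First mode: Address = contents of a register + constant."""
    return (regs[base] + constant) & MASK32


def ea_register_plus_register(regs: dict, reg1: str, reg2: str) -> int:
    """Second mode: Address = contents of register 1 + contents of register 2."""
    return (regs[reg1] + regs[reg2]) & MASK32


# Hypothetical register names and contents.
regs = {"base": 0x1000, "index": 0x20, "zero": 0}

print(hex(ea_register_plus_constant(regs, "base", 8)))    # base + displacement
print(hex(ea_register_plus_constant(regs, "base", -4)))   # negative constant
print(hex(ea_register_plus_register(regs, "base", "index")))
```

Adding a register that holds zero gives the same address as the first mode with a zero constant. This is the equivalence the text describes.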

Among the RISC processors we discuss, ARM and Itanium provide slightly different addressing modes. The Itanium uses the computed address to update the register. For example, in the first addressing mode, the register is loaded with the value obtained by adding the constant to the contents of the register.

The IA-32 provides a variety of addressing modes. The main motivation for this is the desire to support high-level language data structures. For example, one of its addressing modes can be used to access elements of a two-dimensional array.
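The two-dimensional array case can be illustrated with the IA-32's general address form, base + index × scale + displacement, where scale is 1, 2, 4, or 8. In the sketch below, the displacement is the array's starting address, the base register holds the byte offset of the row, and the index register holds the column. That is roughly what an operand such as [array+EBX+ESI*4] computes for a row-major array of 4-byte elements. The function names and the row-major layout are our assumptions.

```python
VALID_SCALES = (1, 2, 4, 8)


def ia32_effective_address(base: int, index: int, scale: int,
                           displacement: int) -> int:
    """Address = base + index * scale + displacement, wrapped to 32 bits."""
    if scale not in VALID_SCALES:
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


def element_address(array_start: int, row: int, col: int,
                    num_cols: int, elem_size: int = 4) -> int:
    """Address of array[row][col] for a row-major array (hypothetical helper)."""
    row_offset = row * num_cols * elem_size   # kept in the base register
    return ia32_effective_address(base=row_offset, index=col,
                                  scale=elem_size, displacement=array_start)


# A 10-column array of 4-byte integers starting at address 0x2000.
print(hex(element_address(0x2000, row=2, col=3, num_cols=10)))
```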

Instruction Types

Instruction sets provide different types of instructions. We describe some of these instruction types here.

Data Movement Instructions All instruction sets support data movement instructions. The type of instructions supported depends on the architecture. We can divide these instructions into two groups: instructions that move data between memory and registers, and instructions that move data between registers. Some instruction sets have special data movement instructions. For example, the IA-32 has special instructions such as push and pop to move data to and from the stack.

In RISC processors, data movement between memory and registers is restricted to load and store instructions. Some RISC processors do not provide any explicit instructions to move data between registers. This data transfer is accomplished indirectly. For example, we can use the add instruction

add	Rdest,Rsrc,0	; Rdest = Rsrc + 0

to copy contents of Rsrc to Rdest. The IA-32 provides an explicit mov instruction to copy data. The instruction

mov dest,src

copies the contents of src to dest. The src and dest can be either registers or memory. In addition, src can be a constant. The only restriction is that src and dest cannot both be located in memory. Thus, we can use the mov instruction to transfer data between registers as well as between memory and registers.


Arithmetic and Logical Instructions Arithmetic instructions support floating-point as well as integer operations. Most processors provide instructions to perform the four basic arithmetic operations: addition, subtraction, multiplication, and division. Because the 2’s complement number system is used, addition and subtraction operations do not need separate instructions for unsigned and signed integers. However, the other two arithmetic operations need separate instructions for signed and unsigned numbers.
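The following sketch works on 32-bit patterns in Python to show why addition can be shared while multiplication cannot. The low 32 bits of a product are the same under both interpretations. The upper 32 bits of the full 64-bit product differ, which is one reason processors provide separate signed and unsigned multiply instructions. The helper names are our own.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(pattern: int) -> int:
    """Read a 32-bit pattern as a two's complement integer."""
    pattern &= MASK
    return pattern - (1 << BITS) if pattern >> (BITS - 1) else pattern


def add32(a: int, b: int) -> int:
    """One adder result serves both signed and unsigned operands."""
    return (a + b) & MASK


def umul64(a: int, b: int) -> int:
    """Full 64-bit product of two unsigned 32-bit operands."""
    return (a & MASK) * (b & MASK)


def smul64(a: int, b: int) -> int:
    """Full 64-bit product of two signed 32-bit operands, as a bit pattern."""
    return (to_signed(a) * to_signed(b)) & ((1 << 64) - 1)


x = 0xFFFFFFFF  # 4294967295 as unsigned, -1 as signed
print(add32(x, 3), to_signed(add32(x, 3)))
print(hex(umul64(x, 2)), hex(smul64(x, 2)))  # same low word, different high word
```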

Some processors do not provide division instructions, whereas others support division only partially. What do we mean by partially? Remember that the division operation produces two outputs: a quotient and a remainder. We say that the division operation is fully supported if the division instruction produces both results. For example, the IA-32 and MIPS provide full division support. On the other hand, SPARC and PowerPC provide only the quotient.
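When an instruction returns only the quotient, the remainder can be recovered with one multiply and one subtract: r = a - q*b. The sketch below assumes the quotient is truncated toward zero. The function names are hypothetical and do not model any particular processor's instruction.

```python
def quotient_only_divide(a: int, b: int) -> int:
    """Hypothetical model of a divide that returns only the quotient,
    truncated toward zero (assumed rounding)."""
    q = abs(a) // abs(b)  # raises ZeroDivisionError if b == 0
    return q if (a < 0) == (b < 0) else -q


def remainder(a: int, b: int) -> int:
    """Recover the remainder with one multiply and one subtract."""
    q = quotient_only_divide(a, b)
    return a - q * b


for a, b in [(7, 2), (-7, 2), (7, -2)]:
    print(a, b, quotient_only_divide(a, b), remainder(a, b))
```

With truncation toward zero, the remainder always takes the sign of the dividend.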

Logical instructions provide the basic bitwise logical operations. Processors typically provide the logical AND and OR operations. Other logical operations, including NOT and XOR, are also supported by most processors.

Most of these instructions set the condition code bits, either by default or when explicitly instructed. In the IA-32 architecture, the condition code bits are set by default. In other processors, two versions of arithmetic and logical instructions are provided. For example, in SPARC, ADD does not update the condition codes, whereas the ADDcc instruction updates the condition codes.

Instruction Formats

Processors use two basic types of instruction format: fixed-length and variable-length. In fixed-length encoding, all (or most) instructions have the same size. In variable-length encoding, the instruction length varies quite a bit. Typically, RISC processors use fixed-length instructions and CISC designs use variable-length instructions.

All 32-bit RISC architectures discussed in this book, such as SPARC, MIPS, ARM, and PowerPC, use instructions that are 32 bits wide. The Intel Itanium, which is a 64-bit processor, uses fixed-length, 41-bit instructions. We discuss instruction encoding schemes of these processors in Part II of the book.

The size of the instruction depends on the number of addresses and whether these addresses identify registers or memory locations. Figure 2.1 shows how the size of the instruction varies with the number of addresses when all operands are located in registers. This format assumes that eight bits are reserved for the operation code (opcode). Thus we can have 256 different instructions. Each operand address is five bits long, which means we can have 32 registers. This is the case in architectures like the MIPS. The Itanium, in contrast, uses seven bits because it has 128 registers.
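The field widths quoted above follow from a base-2 logarithm rounded up: giving n registers or opcodes distinct codes takes ceil(log2 n) bits. A small Python helper makes the arithmetic explicit. The helper names, and the defaults of an 8-bit opcode and 32 registers, are illustrative.

```python
def field_bits(count: int) -> int:
    """Bits needed to give `count` items distinct codes:
    ceil(log2(count)), but at least 1."""
    if count < 1:
        raise ValueError("count must be positive")
    return max(1, (count - 1).bit_length())


def register_format_bits(num_addresses: int, opcode_bits: int = 8,
                         num_registers: int = 32) -> int:
    """Instruction size when every operand is a register (hypothetical helper)."""
    return opcode_bits + num_addresses * field_bits(num_registers)


print(field_bits(256), field_bits(32), field_bits(128))  # opcode, MIPS, Itanium
print([register_format_bits(k) for k in range(4)])
```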

As you can see from this figure, using fewer addresses reduces the length of the instruction. The size of the instruction also depends on whether the operands are in memory or in registers. As mentioned before, RISC designs keep their operands in registers. In CISC architectures, operands can be in memory. If we use 32-bit memory addresses for each of the two addresses, we would need 72 bits for each instruction (see Figure 2.9), whereas the register-based instruction requires only 18 bits. For this and other efficiency reasons, the IA-32 does not permit both addresses to be memory addresses. It allows at most one address to be a memory address.

[Figure 2.9 Instruction size depends on whether the operands are in registers or memory.
	Register format (18 bits):	Opcode (8 bits) | Rdest (5 bits) | Rsrc (5 bits)
	Memory format (72 bits):	Opcode (8 bits) | destination address (32 bits) | source address (32 bits)]
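The 18-bit and 72-bit totals in Figure 2.9 are just the opcode plus two operand fields. This sketch reproduces them. For comparison, it also sizes a format with one register and one memory address, the most the IA-32 permits. The 5-bit and 32-bit field widths come from the figure, not from the IA-32's actual encoding.

```python
OPCODE_BITS = 8
REGISTER_FIELD_BITS = 5    # 32 registers, as in Figure 2.9
MEMORY_FIELD_BITS = 32     # a full 32-bit memory address


def two_address_bits(dest_bits: int, src_bits: int) -> int:
    """Opcode plus two operand fields, as laid out in Figure 2.9."""
    return OPCODE_BITS + dest_bits + src_bits


register_format = two_address_bits(REGISTER_FIELD_BITS, REGISTER_FIELD_BITS)
memory_format = two_address_bits(MEMORY_FIELD_BITS, MEMORY_FIELD_BITS)
# At most one memory operand (field widths still taken from Figure 2.9).
mixed_format = two_address_bits(REGISTER_FIELD_BITS, MEMORY_FIELD_BITS)
print(register_format, memory_format, mixed_format)
```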

The instruction size in IA-32 varies from one byte to several bytes. Part of the reason for using variable length instructions is that CISC tends to provide complex addressing modes. For example, in the IA-32 architecture, if we use register-based operands, we need just 3 bits to identify a register. On the other hand, if we use a memory-based operand, we need up to 32 bits. In addition, if we use an immediate operand, we need an additional 32 bits to encode this value into the instruction. Thus, an instruction that uses a memory address and an immediate operand needs 8 bytes just for these two components. You can realize from this description that providing flexibility in specifying an operand leads to dramatic variations in instruction sizes.

The opcode is typically partitioned into two fields: one identifies the major operation type, and the other defines the exact operation within that group. For example, the major operation could be a branch operation, and the exact operation could be “branch on equal.” These points become clearer as we describe the instruction formats of various processors in later chapters.
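In code, the two-field split of an opcode amounts to simple bit slicing. The field widths and group codes below are hypothetical: a 3-bit major field and a 5-bit minor field in an 8-bit opcode, with made-up codes for the branch group and "branch on equal". They illustrate the idea only and do not describe any real processor's encoding.

```python
MAJOR_BITS = 3   # hypothetical: selects the operation group (e.g., branches)
MINOR_BITS = 5   # hypothetical: selects the exact operation in that group


def make_opcode(major: int, minor: int) -> int:
    """Pack the two fields into one opcode."""
    if not (0 <= major < (1 << MAJOR_BITS) and 0 <= minor < (1 << MINOR_BITS)):
        raise ValueError("field value out of range")
    return (major << MINOR_BITS) | minor


def split_opcode(opcode: int) -> tuple[int, int]:
    """Split an opcode into (major group, exact operation)."""
    major = opcode >> MINOR_BITS
    minor = opcode & ((1 << MINOR_BITS) - 1)
    return major, minor


BRANCH_GROUP = 0b101       # hypothetical major code for branch operations
BRANCH_ON_EQUAL = 0b00001  # hypothetical minor code for "branch on equal"

opcode = make_opcode(BRANCH_GROUP, BRANCH_ON_EQUAL)
print(f"{opcode:08b}", split_opcode(opcode))
```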

Summary

When designing a processor, several design choices will have to be made. These choices are dictated by the available technology as well as the requirements of the target user group. Processor designers will have to make compromises in order to come up with the best design. This chapter looked at some of the important design issues involved in such an endeavor.

Here we looked at how the processor design at the ISA level is affected by various design choices. We stated that the number of addresses in an instruction is one of the choices that can have an impact on the instruction set design. It is possible to have instruction sets with zero, one, two, or three addresses; however, most recent processors use the three-address format. The IA-32, on the other hand, uses the two-address format.

The addressing mode is another characteristic that affects the instruction set. RISC designs tend to use the load/store architecture and use simple addressing modes. Often, they support just one or two addressing modes. In contrast, CISC architectures provide a wide variety of addressing modes.

Both of these choices (the number of addresses and the complexity of addressing modes) affect the instruction format. RISC architectures use fixed-length instructions and support simple addressing modes. In contrast, CISC designs use variable-length instructions to accommodate various complex addressing modes.

RISC Principles

In the last chapter, we presented many details on the processor design space as well as the CISC and RISC architectures. It is time we consolidated our discussion to give details of RISC principles. That’s what we do in this chapter. We describe the historical reasons for designing CISC processors. Then we identify the reasons for the popularity of RISC designs. We end our discussion with a list of the principal characteristics of RISC designs.

Introduction

The dominant architecture in the PC market, the Intel IA-32, belongs to the Complex Instruction Set Computer (CISC) design. The obvious reason for this classification is the “complex” nature of its Instruction Set Architecture (ISA). The motivation for designing such complex instruction sets is to provide an instruction set that closely supports the operations and data structures used by Higher-Level Languages (HLLs). However, the side effects of this design effort are far too serious to ignore.

The decision of CISC processor designers to provide a variety of addressing modes leads to variable-length instructions. For example, instruction length increases if an operand is in memory as opposed to in a register. This is because we have to specify the memory address as part of instruction encoding, which takes many more bits. This complicates instruction decoding and scheduling. The side effect of providing a wide range of instruction types is that the number of clocks required to execute instructions varies widely. This again leads to problems in instruction scheduling and pipelining.

For these and other reasons, in the early 1980s designers started looking at simple ISAs. Because these ISAs tend to produce instruction sets with far fewer instructions, the designers coined the term Reduced Instruction Set Computer (RISC). Even though the main goal was to reduce complexity rather than the number of instructions, the term has stuck.

There is no precise definition of what constitutes a RISC design. However, we can identify certain characteristics that are present in most RISC systems. We identify these RISC design principles after looking at why the designers took the route of CISC in the first place. Because CISC and RISC have their advantages and disadvantages, modern processors take features from both classes. For example, the PowerPC, which follows the RISC philosophy, has quite a few complex instructions.

Evolution of CISC Processors

The evolution of CISC designs can be attributed to the desire of early designers to efficiently use two of the most expensive resources, memory and processor, in a computer system. In the early days of computing, memory was very expensive and small in capacity. This forced the designers to devise high-density code: that is, each instruction should do more work so that the total program size could be reduced. Because instructions are implemented in hardware, this goal could not be achieved until the late 1950s due to implementation complexity.

The introduction of microprogramming facilitated cost-effective implementation of complex instructions by using microcode. Microprogramming has not only aided in implementing complex instructions; it has also provided some additional advantages. Microprogrammed control units use small, fast memories to hold the microcode; therefore, the impact of memory access latency on performance could be reduced. Microprogramming also facilitates development of low-cost members of a processor family by simply changing the microcode.

Another advantage of implementing complex instructions in microcode is that the instructions can be tailored to high-level language constructs such as while loops. For example, the loop instruction of the IA-32 can be used to implement for loops. Similarly, memory block copying can be done by using its string instructions. Thus, by using these complex instructions, we close the “semantic gap” between HLLs and machine languages.

So far, we have concentrated on the memory resource. In the early days, effective processor utilization was also important. High code density also helps improve execution efficiency. As an example, consider the VAX-11/780, the ultimate CISC processor. It was introduced in 1978 and supported 22 addressing modes as opposed to 11 on the Intel 486 that was introduced more than a decade later. The VAX instruction size can range from 2 to 57 bytes, as shown in Table 3.1.

To illustrate how code density affects execution efficiency, consider the autoincrement addressing mode of the VAX processor. In this addressing mode, a single instruction can read data from memory, add contents of a register to it, write back the result to memory, and increment the memory pointer. Actions of this instruction are summarized below:

(R2) = (R2) + R3; R2 = R2 + 1

In this example, the R2 register holds the memory pointer. To implement this CISC instruction, we need four RISC instructions:
