- •V. Ya. Krakovsky, M. B. Fesenko
- •In Computer Systems and Networks
- •Contents
- •Preface
- •Introduction
- •Module I. Basic Components of Digital Computers
- •1. The Structure of a Digital Computer
- •1.1. Introduction to Digital Computers
- •Questions for Self-Testing
- •1.2. The Computer Work Stages Implementation Sequence
- •Questions for Self-Testing
- •1.3. Register Gating and Timing of Data Transfers
- •Questions for Self-Testing
- •1.4. Computer Interface Organization
- •Questions for Self-Testing
- •1.5. Computer Control Organization
- •Questions for Self-Testing
- •1.6. Function and Construction of Computer Memory
- •Questions for Self-Testing
- •1.7. Architecturally-Structural Memory Organization Features
- •Questions for Self-Testing
- •2. Data Processing Fundamentals in Digital Computers
- •2.1. Element Base Development Influence on Data Processing
- •Questions for Self-Testing
- •2.2. Computer Arithmetic
- •Questions for Self-Testing
- •2.3. Operands Multiplication Operation
- •Questions for Self-Testing
- •2.4. Integer Division
- •Questions for Self-Testing
- •2.5. Floating-Point Numbers and Operations
- •Questions for Self-Testing
- •Questions for Self-Testing on Module I
- •Problems for Self-Testing on Module I
- •Module II. Digital Computer Organization
- •3. Processors, Memory, and the Evolution of Instruction Systems
- •3.1. CISC and RISC Microprocessors
- •Questions for Self-Testing
- •3.2. Pipelining
- •Questions for Self-Testing
- •3.3. Interrupts
- •Questions for Self-Testing
- •3.4. Superscalar Processing
- •Questions for Self-Testing
- •3.5. Designing Instruction Formats
- •Questions for Self-Testing
- •3.6. Building a Stack Frame
- •Questions for Self-Testing
- •4. The Structures of Digital Computers
- •4.1. Microprocessors, Microcontrollers, and Systems
- •Questions for Self-Testing
- •4.2. Stack Computers
- •Questions for Self-Testing
- •4.4. Features of Organization Structure of the Pentium Processors
- •Questions for Self-Testing
- •4.5. Computer Systems on a Chip
- •Multicore Microprocessors.
- •Questions for Self-Testing
- •4.6. Principles of Constructing Reconfigurable Computing Systems
- •Questions for Self-Testing
- •4.7. Types of Digital Computers
- •Questions for Self-Testing
- •Questions for Self-Testing on Module II
- •Problems for Self-Testing on Module II
- •Module III. Parallelism and Scalability
- •5. Superscalar Processors
- •5.1. The SPARC Architecture
- •Questions for Self-Testing
- •5.2. SPARC Addressing Modes and Instruction Set
- •Questions for Self-Testing
- •5.3. Floating-Point on the SPARC
- •Questions for Self-Testing
- •5.4. The SPARC Computer Family
- •Questions for Self-Testing
- •6. Cluster Superscalar Processors
- •6.1. The Power Architecture
- •Questions for Self-Testing
- •6.2. Multithreading
- •Questions for Self-Testing
- •6.3. Power Microprocessors
- •Questions for Self-Testing
- •6.4. Microarchitecture Level Power-Performance Fundamentals
- •Questions for Self-Testing
- •6.5. The Design Space of Register Renaming Techniques
- •Questions for Self-Testing
- •Questions for Self-Testing on Module III
- •Problems for Self-Testing on Module III
- •Module IV. Explicitly Parallel Instruction Computing
- •7. The Itanium Processors
- •7.1. Parallel Instruction Computing and Instruction Level Parallelism
- •Questions for Self-Testing
- •7.2. Predication
- •Questions for Self-Testing
- •7.4. The Itanium Processor Microarchitecture
- •Questions for Self-Testing
- •7.5. Deep Pipelining (10 stages)
- •Questions for Self-Testing
- •7.6. Efficient Instruction and Operand Delivery
- •Instruction bundles capable of full-bandwidth dispersal
- •Questions for Self-Testing
- •7.7. High ILP Execution Core
- •Questions for Self-Testing
- •7.8. The Itanium Organization
- •Implementation of cache hints
- •Questions for Self-Testing
- •7.9. Instruction-Level Parallelism
- •Questions for Self-Testing
- •7.10. Global Code Scheduler and Register Allocation
- •Questions for Self-Testing
- •8. Digital Computers on the Basis of VLIW
- •Questions for Self-Testing
- •8.2. Synthesis of Parallelism and Scalability
- •Questions for Self-Testing
- •8.3. The MAJC Architecture
- •Questions for Self-Testing
- •8.4. SCIT – Ukrainian Supercomputer Project
- •Questions for Self-Testing
- •8.5. Components of Cluster Supercomputer Architecture
- •Questions for Self-Testing
- •Questions for Self-Testing on Module IV
- •Problems for Self-Testing on Module IV
- •Conclusion
- •List of literature
- •Index and Used Abbreviations
- •03680, Kyiv-680, 1 Cosmonaut Komarov Avenue.
Questions for Self-Testing
1. How are floating-point numbers represented in a computer?
2. What is the range of a floating-point number representation?
3. What is the difference in representation of fixed-point and floating-point numbers?
4. What is the reason for floating-point number representation?
5. What format do floating-point numbers have?
6. What are the advantages of floating-point numbers compared with other representation forms?
7. What are the peculiarities and rules of arithmetic operations using a floating-point format?
8. What is the reason for the necessity of rounding?
9. What are the three most common methods of rounding?
10. What is the idea of the von Neumann rounding?
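The rounding methods asked about above can be illustrated directly on bit strings. A minimal Python sketch of chopping, von Neumann rounding (jamming), and rounding to nearest, assuming a mantissa given as a bit string with guard bits appended; the function names are ours:

```python
def chop(bits, k):
    """Chopping: simply discard the bits beyond position k."""
    return bits[:k]

def von_neumann(bits, k):
    """Von Neumann rounding (jamming): if any discarded bit is 1,
    force the least significant retained bit to 1."""
    kept = bits[:k]
    if '1' in bits[k:]:
        kept = kept[:-1] + '1'
    return kept

def round_nearest(bits, k):
    """Rounding: add 1 at the first discarded position if it is 1.
    (A real unit would renormalize on overflow out of the k bits;
    that step is omitted in this sketch.)"""
    carry = 1 if len(bits) > k and bits[k] == '1' else 0
    return format(int(bits[:k], 2) + carry, f'0{k}b')

# Keeping 4 of 6 mantissa bits:
print(chop('101101', 4))           # '1011'
print(von_neumann('101001', 4))    # '1011' (low bit jammed to 1)
print(round_nearest('101110', 4))  # '1100' (rounded up)
```

Note the trade-off the text develops: jamming never needs a carry chain, at the cost of a slightly larger worst-case error than true rounding.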
Questions for Self-Testing on Module I
1. What is the architecture of a computing system?
2. What are the peculiarities of the Harvard architecture?
3. What is the structure of a computer?
4. What are the peculiarities of the von Neumann structure?
5. What are the CPU components?
6. What is an algorithm and what are its contents?
7. What memory types do you know?
8. What does symmetric multiprocessing (SMP) mean?
9. What are the differences between UMA and NUMA systems?
10. What are the peculiarities of MPP-systems?
11. What does the cache-only memory architecture mean?
12. What levels in the memory hierarchy are used in up-to-date computers? What are their sizes?
13. What does processor-in-memory mean?
14. What is the structural organization of the PCI bus?
15. What is the SCSI bus designed for?
Problems for Self-Testing on Module I
1. Represent each of the decimal values 26, -37, 497, and -123 as signed 10-bit numbers in the following binary formats: (a) Sign and magnitude; (b) 1’s complement; (c) 2’s complement.
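As a check on Problem 1, the three encodings can be produced mechanically. A minimal Python sketch, assuming a 10-bit width with the sign bit as the most significant bit (the function name is ours, not from the text):

```python
def signed_formats(x, n=10):
    """Return (sign-and-magnitude, 1's complement, 2's complement)
    bit patterns of integer x as n-bit strings."""
    mask = (1 << n) - 1
    mag = abs(x)
    sm   = mag | ((1 << (n - 1)) if x < 0 else 0)  # set sign bit for negatives
    ones = mag if x >= 0 else (~mag) & mask        # flip all magnitude bits
    twos = x & mask                                # value modulo 2^n
    return tuple(format(v, f'0{n}b') for v in (sm, ones, twos))

print(signed_formats(26))   # all three agree for positive values
print(signed_formats(-37))  # ('1000100101', '1111011010', '1111011011')
```

Note that the three representations coincide for non-negative values and differ only in how a negative sign is encoded.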
2. (a) Express the decimal values 0.5, -0.123, -0.75, and -0.1 as signed 6-bit numbers in the binary formats. (b) What is the maximum representation error e involved in using only 5 significant bits after the binary point?
(c) Calculate the number of bits needed after the binary point so that (1) e < 1/10, (2) e < 1/100, (3) e < 1/1000, and (4) e < 1/10^6.
3. The 1's-complement and 2's-complement binary representation methods are special cases of the (b-1)'s-complement and b's-complement representation techniques in base b number systems. For example, consider the decimal system. The sign and magnitude values +527 and -382 have four-digit signed-number representations in each of the complement systems as shown in Table Q1.1.

Table Q1.1 Signed numbers in base 10

Representation        Two examples
Sign and magnitude    +527    -382
9's complement        0527    9617
10's complement       0527    9618

Now consider the base 3 (ternary) system, where the positive five-digit number t4t3t2t1t0 has the value t4 × 3^4 + t3 × 3^3 + t2 × 3^2 + t1 × 3^1 + t0 × 3^0, with 0 ≤ ti ≤ 2. Express the ternary sign and magnitude numbers +2120, -1212, +10, and -201 as five-digit signed ternary numbers in both the 2's-complement and 3's-complement systems.
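The same mechanical rules extend to any base: for a negative value v, the b's complement is b^n + v and the (b-1)'s complement is b^n - 1 + v. A Python sketch for the five-digit ternary case (helper names are ours, chosen for illustration):

```python
def to_base(x, b, n):
    """Render non-negative x as an n-digit base-b string."""
    digits = []
    for _ in range(n):
        digits.append(str(x % b))
        x //= b
    return ''.join(reversed(digits))

def complements(value, b=3, n=5):
    """Return ((b-1)'s-complement, b's-complement) n-digit strings."""
    M = b ** n
    bs  = value % M                               # b's (radix) complement
    bm1 = value if value >= 0 else M - 1 + value  # (b-1)'s (diminished)
    return to_base(bm1, b, n), to_base(bs, b, n)

# -1212 (ternary) = -(27 + 18 + 3 + 2) = -50 decimal:
print(complements(-50))  # ('21010', '21011')
```

As in the decimal example of Table Q1.1, positive values keep their plain n-digit form in both systems, and the two complements of a negative value differ by one in the last digit.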
4. Using the "paper-and-pencil" methods, perform the operations A×B and A÷B on the 5-bit positive numbers A = 10101 and B = 00101.
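Problem 4's paper-and-pencil procedures correspond to shift-and-add multiplication and restoring long division. A Python sketch of both, using A = 10101 (21) and B = 00101 (5); the function names are ours:

```python
def shift_add_multiply(a, b, n=5):
    """Add the multiplicand, shifted, for every 1 bit of the multiplier."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:
            product += a << i
    return product

def restoring_divide(a, b, n=5):
    """Restoring division: shift in dividend bits from the top and
    subtract the divisor whenever the partial remainder permits."""
    q, r = 0, 0
    for i in range(n - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)
        if r >= b:
            r -= b
            q |= 1 << i
    return q, r

print(shift_add_multiply(0b10101, 0b00101))  # 105 = 1101001
print(restoring_divide(0b10101, 0b00101))    # (4, 1): quotient 100, remainder 1
```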
5. Consider the binary numbers in the following add and subtract problems to be signed 6-bit values in the 2's-complement representation. Perform the operations indicated, specify the cases where overflow occurs and where no overflow occurs, and check your answers by converting operands and results to decimal sign and magnitude representation.
  010110    101011    111111    011001    110111    010101
+ 001001  + 100101  + 000111  + 010000  + 111001  + 101011

  010110    111110    100001    111111    000111    011010
- 011111  - 100101  - 011101  - 000111  - 111000  - 100010
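A quick way to check the overflow cases in Problem 5: in 2's-complement addition, overflow occurs exactly when both operands have the same sign and the result's sign differs. A Python sketch for 6-bit patterns (function names are ours):

```python
def add2c(x, y, n=6):
    """Add two n-bit 2's-complement patterns; report (result, overflow)."""
    mask = (1 << n) - 1
    s = (x + y) & mask
    sx, sy, ss = (x >> (n - 1)) & 1, (y >> (n - 1)) & 1, (s >> (n - 1)) & 1
    return s, (sx == sy) and (ss != sx)

def sub2c(x, y, n=6):
    """Subtract by adding the 2's complement of y."""
    mask = (1 << n) - 1
    return add2c(x, (-y) & mask, n)

# 010110 + 001001 = 011111, no overflow (22 + 9 = 31, in range)
print(add2c(0b010110, 0b001001))
# 101011 + 100101 overflows (-21 + -27 = -48, outside -32..31)
print(add2c(0b101011, 0b100101))
```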
6. Show how the multiplication and division operations in Question 5 would be performed by the hardware in Fig. 2.8.a (p. 62) and 2.22 (p. 72), respectively, by constructing the equivalent of Fig. 2.8.b (p. 62) and 2.24 (p. 74) for the above A and B operands.
7. Multiply the signed 6-bit numbers A = 010111 (multiplicand) and B = 110110 (multiplier) using the Booth algorithm.
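Problem 7 can be cross-checked with the Booth recoding identity: scanning the multiplier pattern from the LSB with an appended 0, each bit pair (q[i-1], q[i]) contributes (q[i-1] - q[i]) · M · 2^i to the product. A Python sketch (here A = 010111 = +23 and B = 110110 = -10, so the product should be -230):

```python
def booth_product(m, q_bits, n):
    """Booth multiplication: m is the signed multiplicand value,
    q_bits the n-bit 2's-complement pattern of the multiplier."""
    acc, prev = 0, 0             # prev plays the role of the appended q(-1) = 0
    for i in range(n):
        bit = (q_bits >> i) & 1
        acc += (prev - bit) * (m << i)  # -M on a 0->1 edge, +M on a 1->0 edge
        prev = bit
    return acc

p = booth_product(23, 0b110110, 6)
print(p)                          # -230
print(format(p & 0xFFF, '012b'))  # 12-bit 2's-complement product pattern
```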
8. Repeat Question 7 using bit pairing of the multiplier.
9. Construct the equivalent of Fig. 2.19 (p. 69) for the multiplication operation in Question 7.
10. Derive logic equations that specify the ADD/SUB and SR outputs of the combinational CONTROL network in Fig. 2.26 (p. 80).
11. In Section 2.5, we used a practical 32-bit format for floating-point numbers. In this Question, we will use a shortened format that retains all the pertinent concepts but is manageable for working through numerical exercises. Consider that floating-point numbers are represented in a 12-bit format as follows: the scale factor has an implied base of 4 and a 5-bit, excess-16 exponent. The 6-bit mantissa is normalized.
(a) What does “normalized” mean in the context of this format?
(b) Represent the numbers +1.7, -0.012, +19, and 1/8 in this format.
(c) What is the range representable in this format?
(d) How does the range calculated in (c) compare to 12-bit integer or fraction ranges?
(e) Perform Add, Subtract, Multiply, and Divide operations on the operands:
12. Consider the representation of the decimal number 0.1 as a signed 8-bit binary fraction in the representation discussed in Section 2.5. If this number does not convert exactly in this 8-bit format, give the approximations to it that are developed by all three of the truncation methods discussed in Section “Guard Bits and Rounding” (p. 78-79).
13. It is required to build a modulo 10 adder for adding BCD digits. Modulo 10 addition of two numbers A = A3A2A1A0 and B = B3B2B1B0 can be achieved in two stages: 1. Add A to B (binary addition). 2. If the result is an illegal code, that is, greater than or equal to 1010 (binary), add 6 (decimal); otherwise, add 0. Ignore overflow from this stage of the adder.
(a) Show that the above algorithm will give correct results for:
(1) A = 0101 B = 0110; (2) A = 0011 B = 0100.
(b) Design a BCD digit adder using logic gates and the 4-bit MSI chip described in Section 2.2. The inputs are A3A2A1A0, B3B2B1B0, and a carry-in. The outputs are the sum digit S3S2S1S0 and the carry-out. A cascade of such blocks can form a ripple-carry BCD adder.
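The two-stage algorithm of Problem 13 is easy to sanity-check in software before designing the gate network. A Python sketch of one BCD digit stage and a ripple cascade (names are ours):

```python
def bcd_digit_add(a, b, cin=0):
    """Stage 1: binary add. Stage 2: if the result is >= 10 (an illegal
    BCD code or a decimal carry), add 6 and emit a carry-out."""
    s = a + b + cin
    if s >= 10:
        return (s + 6) & 0xF, 1
    return s, 0

def bcd_add(a_digits, b_digits):
    """Ripple-carry BCD adder; digit lists are least significant first."""
    carry, out = 0, []
    for a, b in zip(a_digits, b_digits):
        s, carry = bcd_digit_add(a, b, carry)
        out.append(s)
    return out, carry

print(bcd_digit_add(0b0101, 0b0110))  # (1, 1): 5 + 6 = 11
print(bcd_digit_add(0b0011, 0b0100))  # (7, 0): 3 + 4 = 7
print(bcd_add([5, 2], [6, 3]))        # ([1, 6], 0): 25 + 36 = 61
```

These two cases are exactly parts (a)(1) and (a)(2) of the problem: the first needs the +6 correction, the second does not.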
14. If gate fan-in is limited to four, how can the hex digit SHIFTER in Fig. 2.26 (p. 80) be implemented combinationally?
15. (a) Sketch a logic-gate network that implements the multiplexer MPX in Fig. 2.26 (p. 80). (b) Relate the structure of the SWAP network in Fig. 2.26 (p. 80) to your solution to part (a).
16. How can the “leading hex zeros detector” of Fig. 2.26 (p. 80) be implemented combinationally?
17. The Mantissa adder-subtractor in Fig. 2.26 (p. 80) operates on positive, unsigned binary fractions and must produce a sign and magnitude result. In the discussion accompanying Fig. 2.26 (p. 80), we stated that 1’s complement arithmetic is convenient because of the required format for input and output operands. When adding two signed numbers in 1’s-complement notation, the carry-out from the sign position must be added to the result to obtain the correct signed answer. This is called “end-around carry” correction. Consider the following examples of addition using signed 4-bit encodings of operands and results in the 1’s-complement system:
1’s-complement arithmetic is convenient when a sign and magnitude result is to be generated because a negative number in 1’s-complement notation can be converted to sign and magnitude by complementing the bits to the right of the sign-bit position. Using 2’s-complement arithmetic, an addition of +1 is needed to convert a negative value into sign and magnitude form.
With the above discussion as a guideline, give the complete design of the 1’s-complement adder-subtractor in Fig. 2.26 (p. 80).
18. Give the complete design of the 4-bit ALU circuit block shown in Fig. 2.6,b (p. 59). Omit the control inputs and assume that the circuit performs only 4-bit addition. Carry lookahead logic is to be used for all the internal carries c1, c2, and c3 as well as for the block output c4.
Your answer can be in the form of a sketch of the logic-gate network required or a listing of logic equations that describe the network.
19. Four 4-bit adder circuits shown in Fig. 2.6,b (p. 59) can be cascaded to form a 16-bit adder. In this cascade, the output c4 from the low-order circuit is connected as the carry-in to the next circuit. Its carry-out, c8, is connected as the carry-in to the third circuit, etc.
(a) A faster adder can be constructed by using external logic to generate the carry-in variables for the three high-order circuits. Give the logic design for an integrated circuit “carry lookahead” chip that has outputs c4, c8, and c12. Its inputs are c0, PI0, GI0, PI1, GI1, PI2, and GI2. Estimate the increase in adder speed achievable by using this circuit in conjunction with the four 4-bit adder circuits, as opposed to using a cascade of the adder circuits, in building a 16-bit adder.
(b) Extend your part (a) design by assuming that the carry lookahead chip has PI3 and GI3 as additional inputs and provides additional outputs PII0 and GII0. These higher-level propagate and generate functions are defined by
PII0 = PI3PI2PI1PI0
GII0 = GI3 + PI3GI2 + PI3PI2GI1 + PI3PI2PI1GI0
Note that this extended circuit requires 16 external pin connections, consisting of 14 for input and output variables and 2 for power connections, so that it can be accommodated in a standard 16-pin IC package.
(c) Design a 64-bit adder that uses sixteen 4-bit adder circuits, some carry lookahead circuits defined by the design from part (b), and some additional logic to generate c16, c32, and c48 from c0, and the PIIi and GIIi variables generated by the lookahead circuits. What is the relationship of the additional logic to the logic inside each lookahead circuit?
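The lookahead equations in Problem 19 can be prototyped in Python before the logic design: each carry is the group generate ORed with propagated lower-order terms, and the second-level PII0/GII0 follow the equations stated in part (b). A sketch with illustrative names, operating on single-bit 0/1 values:

```python
def lookahead_carries(c0, P, G):
    """c4, c8, c12 from the carry-in and the group propagate/generate
    signals PI0..PI2 (list P) and GI0..GI2 (list G)."""
    c4  = G[0] | (P[0] & c0)
    c8  = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c12 = (G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
           | (P[2] & P[1] & P[0] & c0))
    return c4, c8, c12

def second_level_pg(P, G):
    """PII0 and GII0 for the part (b) extension (PI0..PI3, GI0..GI3)."""
    pii0 = P[3] & P[2] & P[1] & P[0]
    gii0 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
            | (P[3] & P[2] & P[1] & G[0]))
    return pii0, gii0

# All groups propagate: a carry-in reaches every carry output at once.
print(lookahead_carries(1, [1, 1, 1], [0, 0, 0]))  # (1, 1, 1)
```

The payoff is latency: all three carries are two gate levels away from the inputs, instead of rippling through three 4-bit adder stages.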
20. Fig. Q1.1 shows a two-dimensional array of combinational logic cells for 3 × 3 positive-number multiplication. Analyze the typical cells and data flow to verify that it computes products properly.
(a) Compare the total delay in developing the product in this type of array with that of Fig. 2.7 (p. 61) as a function of the input operand length n.
(b) Which of the cells in Fig. Q1.1 and its n × (n + 1) extension perform no useful function? This cellular logic arrangement is a basis for some of the VLSI multiplier chips that are commercially available.
21. The array described in Question 20 is an example of the application of the principle of carry-save addition. This principle can be applied in any situation where a number of summands need to be added to generate a sum.
The simplest example is in the case of adding three numbers W, X, and Y. The straightforward way to do this is to add W to X and then add the resulting sum to Y to generate the answer. An alternative, using carry-save addition, is to combine the three operands with n full adders to generate two numbers, S (sum bits) and C (carry bits), which are then added in a ripple-carry adder to generate the desired sum. This process is shown in Fig. Q1.2.
(a) Apply this idea to the multiplication of two n-bit positive numbers. Start by considering the problem as one of adding summands, appropriately shifted, as in the paper-and-pencil method. Reduce these n summands to about 2n/3 numbers by performing carry-save additions on them in parallel in groups of three. Then reduce the results, etc., until only two numbers remain. They are finally added to generate the desired product. This principle is used in many high-performance computers.
(b) Compare the speed and cost of this type of multiplier with that of Question 20 or Fig. 2.7 (p. 61).
(c) Can bit grouping of the multiplier be combined with carry-save addition to configure a faster and/or cheaper multiplier?
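The carry-save identity used in Problem 21 (three operands reduce to a sum word and a carry word whose ordinary sum equals the three-operand sum) can be stated in two lines of Python, a sketch on machine-word integers:

```python
def carry_save_add(w, x, y):
    """One level of carry-save addition: n full adders in parallel.
    Each bit position outputs sum = w^x^y and carry = majority(w, x, y),
    with the carry word shifted left one place."""
    s = w ^ x ^ y
    c = ((w & x) | (w & y) | (x & y)) << 1
    return s, c

for w, x, y in [(5, 9, 13), (21, 21, 21), (63, 1, 2)]:
    s, c = carry_save_add(w, x, y)
    assert s + c == w + x + y   # only the final add ripples carries
print("carry-save identity holds")
```

Because no level of the reduction tree propagates carries, each level costs only one full-adder delay regardless of operand width, which is exactly why the summand tree of part (a) is fast.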
22. (a) Multiply the signed 2's-complement 8-bit numbers A = 00011010 (Multiplicand) and B = 11000111 (Multiplier) using the Booth algorithm and generating a 16-bit product.
(b) Compute the sign and magnitude decimal representations of A, B, and the product from part (a).
23. Repeat problem 22 using bit pairing of the multiplier.
24. Consider a 16-bit floating-point number in a format similar to that of Question 11 with a 6-bit exponent and a 9-bit normalized fractional mantissa. The base of the scale factor is 8 and the exponent is represented in excess-32 format.
(a) Add the numbers A = 0 100001 111111110 and B = 0 011111 101010101 which are expressed in the above format, giving the answer in normalized form. Use rounding as the truncation method when producing the final normalized 9-bit mantissa.
(b) Using decimal numbers w, x, y, and z, express the magnitude of the largest and smallest (nonzero) values representable in the above normalized floating-point format, in the form: Largest = w × 8x, Smallest = y × 8-z.
25. Show how to modify the circuit diagram of Fig. 2.8 (p. 62) to implement multiplication of signed, 2's-complement, n-bit numbers using the Booth algorithm.
26. Suppose an “Enable” input is added to an integrated circuit that performs the decoding function. When Enable = 0, all the outputs are held at 0; when Enable = 1, the circuit performs the decode function. This can be implemented internally by adding Enable as an input to each of the AND gates that generates the decoder outputs.
Show how to use two-input decoders that have Enable inputs to implement a single four-input decoder.
27. Give a block diagram, similar to Fig. 1.26 (p. 34), for a 256K × 16 memory using 64K × 1 memory chips.
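For Problem 27, the chip count follows from simple capacity arithmetic: 16 one-bit-wide chips side by side give the 16-bit word, and 256K/64K = 4 such rows cover the address space (the 2 high-order address bits select the row). A quick Python check, a sketch of the sizing only, not the block diagram itself:

```python
words, word_bits = 256 * 1024, 16     # required memory: 256K x 16
chip_words, chip_bits = 64 * 1024, 1  # available chips: 64K x 1

chips_per_row = word_bits // chip_bits  # chips in parallel for one word
rows = words // chip_words              # rows selected by high address bits
total = chips_per_row * rows

print(chips_per_row, rows, total)  # 16 4 64
```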
