- •V. Ya. Krakovsky, M. B. Fesenko
- •In Computer Systems and Networks
- •Contents
- •Preface
- •Introduction
- •Module I. Basic Components of Digital Computers
- •1. The Structure of a Digital Computer
- •1.1. Introduction to Digital Computers
- •Questions for Self-Testing
- •1.2. The Computer Work Stages Implementation Sequence
- •Questions for Self-Testing
- •1.3. Register Gating and Timing of Data Transfers
- •Questions for Self-Testing
- •1.4. Computer Interface Organization
- •Questions for Self-Testing
- •1.5. Computer Control Organization
- •Questions for Self-Testing
- •1.6. Function and Construction of Computer Memory
- •Questions for Self-Testing
- •1.7. Architecturally-Structural Memory Organization Features
- •Questions for Self-Testing
- •2. Data Processing Fundamentals in Digital Computers
- •2.1. Element Base Development Influence on Data Processing
- •Questions for Self-Testing
- •2.2. Computer Arithmetic
- •Questions for Self-Testing
- •2.3. Operands Multiplication Operation
- •Questions for Self-Testing
- •2.4. Integer Division
- •Questions for Self-Testing
- •2.5. Floating-Point Numbers and Operations
- •Questions for Self-Testing
- •Questions for Self-Testing on Module I
- •Problems for Self-Testing on Module I
- •Module II. Digital Computer Organization
- •3. Processors, Memory, and the Evolution System of Instructions
- •3.1. CISC and RISC Microprocessors
- •Questions for Self-Testing
- •3.2. Pipelining
- •Questions for Self-Testing
- •3.3. Interrupts
- •Questions for Self-Testing
- •3.4. Superscalar Processing
- •Questions for Self-Testing
- •3.5. Designing Instruction Formats
- •Questions for Self-Testing
- •3.6. Building a Stack Frame
- •Questions for Self-Testing
- •4. The Structures of Digital Computers
- •4.1. Microprocessors, Microcontrollers, and Systems
- •Questions for Self-Testing
- •4.2. Stack Computers
- •Questions for Self-Testing
- •Questions for Self-Testing
- •4.4. Features of Organization Structure of the Pentium Processors
- •Questions for Self-Testing
- •4.5. Computer Systems on a Chip
- •Multicore Microprocessors.
- •Questions for Self-Testing
- •4.6. Principles of Constructing Reconfigurable Computing Systems
- •Questions for Self-Testing
- •4.7. Types of Digital Computers
- •Questions for Self-Testing
- •Questions for Self-Testing on Module II
- •Problems for Self-Testing on Module II
- •Module III. Parallelism and Scalability
- •5. Superscalar Processors
- •5.1. The SPARC Architecture
- •Questions for Self-Testing
- •5.2. SPARC Addressing Modes and Instruction Set
- •Questions for Self-Testing
- •5.3. Floating-Point on the SPARC
- •Questions for Self-Testing
- •5.4. The SPARC Computer Family
- •Questions for Self-Testing
- •6. Cluster Superscalar Processors
- •6.1. The Power Architecture
- •Questions for Self-Testing
- •6.2. Multithreading
- •Questions for Self-Testing
- •6.3. Power Microprocessors
- •Questions for Self-Testing
- •6.4. Microarchitecture Level Power-Performance Fundamentals
- •Questions for Self-Testing
- •6.5. The Design Space of Register Renaming Techniques
- •Questions for Self-Testing
- •Questions for Self-Testing on Module III
- •Problems for Self-Testing on Module III
- •Module IV. Explicitly Parallel Instruction Computing
- •7. The Itanium Processors
- •7.1. Parallel Instruction Computing and Instruction Level Parallelism
- •Questions for Self-Testing
- •7.2. Predication
- •Questions for Self-Testing
- •Questions for Self-Testing
- •7.4. The Itanium Processor Microarchitecture
- •Questions for Self-Testing
- •7.5. Deep Pipelining (10 stages)
- •Questions for Self-Testing
- •7.6. Efficient Instruction and Operand Delivery
- •Instruction bundles capable of full-bandwidth dispersal
- •Questions for Self-Testing
- •7.7. High ILP Execution Core
- •Questions for Self-Testing
- •7.8. The Itanium Organization
- •Implementation of cache hints
- •Questions for Self-Testing
- •7.9. Instruction-Level Parallelism
- •Questions for Self-Testing
- •7.10. Global Code Scheduler and Register Allocation
- •Questions for Self-Testing
- •8. Digital Computers on the Basis of VLIW
- •Questions for Self-Testing
- •8.2. Synthesis of Parallelism and Scalability
- •Questions for Self-Testing
- •8.3. The MAJC Architecture
- •Questions for Self-Testing
- •8.4. SCIT – Ukrainian Supercomputer Project
- •Questions for Self-Testing
- •8.5. Components of Cluster Supercomputer Architecture
- •Questions for Self-Testing
- •Questions for Self-Testing on Module IV
- •Problems for Self-Testing on Module IV
- •Conclusion
- •List of literature
- •Index and Used Abbreviations
- •03680, Kyiv-680, 1 Kosmonavta Komarova Avenue.
Problems for Self-Testing on Module III
1. A SPARC implementation has K register windows. What is the number N of physical registers?
2. SPARC lacks a number of instructions commonly found on CISC machines. Some of these are easily simulated using either register R0, which is always set to 0, or a constant operand. These simulated instructions are called pseudoinstructions and are recognized by the SPARC compiler. Show how to simulate each of the following pseudoinstructions with a single SPARC instruction. In all of them, src and dst refer to registers. (Hint: A store to R0 has no effect.)
a. MOV src, dst b. COMPARE src1, dst2 c. TEST src1
d. NOT dst e. NEG dst f. INC dst
g. DEC dst h. CLR dst i. NOP
3. Shared-memory multiprocessors and message-passing multicomputers are two architectures that support the parallel execution of tasks that interact with one another. Which of them can more easily emulate the other? Explain your answer briefly.
4. The processor and memory are implemented on the same chip. Is cache memory necessary in such a system? Explain your answer.
5. The branch instruction of the UltraSPARC II processor has an Annul bit. If this bit is set by the compiler and the branch is not taken, the instruction in the delay slot is removed from the pipeline. Alternatively, the instruction could be removed when the branch is taken. What are the advantages of each of these approaches?
6. A program loop ends with a conditional branch back to the beginning of the loop. How can this loop be implemented on a pipelined processor that uses delayed branches with one delay slot? Under what conditions can the delay slot be filled with a useful instruction?
7. A computer supports one delay slot. The instruction in this slot is executed regardless of the predicted branch outcome, but it is canceled if the branch is not taken. Propose an effective method of implementing program loops on such a computer.
8. Delayed branches are used in a pipelined processor. One of two variants of the processor architecture must be chosen. In the first, the processor has a 4-stage pipeline and one delay slot. In the second, the processor has a 6-stage pipeline and two delay slots. Compare the throughputs of these two architectures, given that 20% of the executed instructions are branch instructions. The probability of filling one delay slot is 80% (variant 1) and 25% (variant 2).
9. A system has two general-purpose processors, each able to perform an addition or a multiplication in one time slot. Assume that reading and writing operand data take no time and that storage for intermediate results is always available. What is the minimum number of time slots needed to execute the following program fragment?
a=c*d+e;
b=v*t+u;
f=x*y+z.
10. It is known that a program runs well on a superscalar processor with a certain set of independent units. Does this mean that the same program will run well on a VLIW processor with the same set of units? Is the converse true?
11. Give an example of an algorithmic structure that forms a “ring”.
12. How many immediate neighbors does each processor have in the “three-dimensional torus” topology?
13. What does a user have to take into account when moving from an SMP computer to a computer with NUMA architecture?
14. What must be taken into account when writing efficient programs for computers with NUMA architecture and for computers with distributed memory?
15. What is the most effective way to interconnect the processors in a cluster?
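
Answers to the two quantitative problems above (8 and 9) can be checked numerically. The Python sketch below is one possible way to do so, not the authors' solution. It rests on assumptions that the problem statements do not fix: for problem 8, the pipeline otherwise completes one instruction per cycle, each unfilled delay slot wastes exactly one cycle, and the slots are filled independently with the stated probability (any difference in clock rate between the 4-stage and 6-stage designs is left out); for problem 9, every addition and multiplication takes exactly one time slot. The function names and the `fragment` dictionary are hypothetical and chosen for illustration only.

```python
from math import ceil


def cycles_per_instruction(branch_fraction, delay_slots, fill_probability):
    """Average cycles per instruction under a simple delay-slot model.

    Assumption (not from the problem text): the pipeline otherwise completes
    one instruction per cycle, and every delay slot that cannot be filled
    holds a NOP that wastes exactly one cycle.
    """
    wasted_per_branch = delay_slots * (1.0 - fill_probability)
    return 1.0 + branch_fraction * wasted_per_branch


def schedule_length(deps, processors):
    """Greedy list scheduling of unit-time operations.

    deps maps each operation name to the operations it must wait for.
    Returns the number of time slots used. Greedy scheduling is not optimal
    in general, so compare the result with a lower bound.
    """
    done = set()
    slots = 0
    while len(done) < len(deps):
        # An operation is ready once all of its inputs were computed
        # in an earlier time slot.
        ready = [op for op in deps
                 if op not in done and all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle")
        done.update(ready[:processors])
        slots += 1
    return slots


# Problem 8: 20% branches; variant 1 has one slot filled 80% of the time,
# variant 2 has two slots, each filled 25% of the time (assumed independent).
print(f"variant 1 CPI: {cycles_per_instruction(0.2, 1, 0.80):.2f}")
print(f"variant 2 CPI: {cycles_per_instruction(0.2, 2, 0.25):.2f}")

# Problem 9: each sum depends on its own product.
fragment = {
    "c*d": [], "v*t": [], "x*y": [],
    "a": ["c*d"], "b": ["v*t"], "f": ["x*y"],
}
print("time slots used:", schedule_length(fragment, processors=2))
print("lower bound:", ceil(len(fragment) / 2))
```

When the number of slots used equals the lower bound (the operation count divided by the processor count, rounded up), the greedy schedule is optimal for that fragment. For problem 8, a complete comparison of throughput also needs an assumption about how much faster the 6-stage pipeline can be clocked, which this sketch deliberately omits.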
