- •V. Ya. Krakovsky, M. B. Fesenko
- •In Computer Systems and Networks
Questions for Self-Testing
1. What are the peculiarities of multithreading implementation in Systems on a Chip?
2. What is the essence of simultaneous multithreading (SMT)?
3. What are the architecture peculiarities of dual-core Systems on a Chip?
4. How is the TOP500 list composed?
5. What is the future of multi-core Systems on a Chip?
6. What are the key peculiarities of the Tukwila processor?
7. What is the purpose of the FSB bus?
8. What are the main peculiarities and improvements of the Nehalem microarchitecture?
9. What peculiarities does the set of AVX vector instructions have?
10. What components does the Larrabee concept include?
4.6. Principles of Constructing Reconfigurable Computing Systems
The structure of a reconfigurable computing system may be presented as consisting of two parts: a constant (or «fixed») part F, the Host-computer, and a variable part V, the so-called «reconfigurable device» (RD), which may be combined into different configurations (Fig. 4.20).
Let us consider the three main types of reconfigurable computing systems [51].
1. Computing systems oriented to the Host-computer. Systems of this type are characterized by the following features: the main computing power is concentrated in the Host-computer; the reconfigurable device increases throughput only for a narrow class of problems; switching off the RD does not, in general, cause the system to fail.
2. Computing systems oriented to the RD. In systems of this type the Host-computer is used mainly for auxiliary functions (service, input/output), and the algorithms are executed mainly by the RD. The RD may have its own set of external devices (via extension cards) or share with the Host-computer a common set of external devices to which the RD has direct access. The Host-computer may be used in the following variants:
the Host-computer is absent, in which case the RD is an autonomous device;
the Host-computer implements only components of the system software;
the Host-computer performs only input/output functions;
the Host-computer performs dispatching and switching functions;
the Host-computer executes separate parts of user problems (parts that are not parallelized and operators that the RD cannot implement efficiently).
3. Reconfigurable computing systems. In these systems the Host-computer and the RD can execute a common instruction stream, with each of them selecting its own instructions, and no interrupts or input/output instructions are needed to access the RD.
Such systems, in turn, are subdivided into two types.
Loosely coupled systems. In such systems the Host-computer and the RD have approximately the same complexity. The RD is oriented to solving computationally intensive problems, and the Host-computer provides support for translation, input/output, service, etc.
Tightly coupled systems. In such systems the RDs are used as coprocessors.
Field Programmable Gate Array (FPGA). An FPGA is a matrix of logic elements with a small number of inputs, flip-flops, and segments of interconnect lines joined by switches (jumpers) built from field-effect transistors [57]. An FPGA is programmed by changing the level of the electric field at the control gates of these transistors. The control gates of all the “programming” field-effect transistors are connected to the outputs of the flip-flops of one long shift register, which is loaded when the Erasable Programmable Logic Device (EPLD) is programmed. Some sections of this register can also serve as Read Only Memory (ROM) cells. The configuration image (the so-called firmware) is usually kept in a ROM located next to the EPLD. After the power supply has been switched off and on again, or on a reset signal, the image is automatically rewritten into the programmable shift register of the EPLD. This process is called configuring the EPLD. Because the EPLD is built on flip-flops, which hold the image loaded during configuration, the EPLD is manufactured using static RAM technology.
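The configuration step described above can be modeled in a few lines of Python. This is only a minimal sketch under stated assumptions, not a description of any real device: the class name `ConfigShiftRegister`, the bit order, and the sample ROM contents are hypothetical and chosen purely for illustration.

```python
class ConfigShiftRegister:
    """Toy model of the long configuration shift register of an SRAM-based device."""

    def __init__(self, length):
        # Volatile storage: a freshly powered device holds no configuration.
        self.bits = [0] * length

    def shift_in(self, bit):
        # A new bit enters at position 0; the oldest bit falls off the far end.
        self.bits = [bit & 1] + self.bits[:-1]

    def configure(self, rom_image):
        # Reload the whole register from the external ROM image,
        # as happens after power-up or on a reset signal.
        if len(rom_image) != len(self.bits):
            raise ValueError("ROM image must match the register length")
        # Shift the last bit in first so that rom_image[0] ends at position 0.
        for bit in reversed(rom_image):
            self.shift_in(bit)


rom_image = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical configuration image
reg = ConfigShiftRegister(len(rom_image))
reg.configure(rom_image)                # reg.bits now mirrors the ROM image
```

Creating a new `ConfigShiftRegister` corresponds to losing power: the register starts out cleared and has to be configured again from the ROM image before it is useful.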
Structural features of the EPLD. The basic logic element of an EPLD is a lookup table (LUT), which is a one-bit-wide RAM. The flip-flops of the lookup table are part of the programmable shift register, and their initial state is set while the EPLD is being configured. The EPLD uses programmable D flip-flops. The die of the EPLD carries a matrix of configurable logic blocks (CLBs) and a matrix of interconnect line segments overlaid with matrices of field-effect transistor switches; blocks of configurable RAM are placed along the edges of the die, and the I/O blocks and the peripheral interconnect bus are placed around its perimeter.
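To illustrate how a lookup table acts as a logic element, the sketch below models a k-input LUT as a RAM of 2^k one-bit cells: the inputs form the address, and the addressed cell is the output. The class name `LookupTable` and the convention that the first input is the most significant address bit are assumptions made for this example only.

```python
class LookupTable:
    """k-input LUT modeled as a RAM with 2**k one-bit cells."""

    def __init__(self, num_inputs, config_bits):
        if len(config_bits) != 2 ** num_inputs:
            raise ValueError("need exactly 2**num_inputs configuration bits")
        self.num_inputs = num_inputs
        # In a real device these cells would be set during configuration.
        self.cells = [bit & 1 for bit in config_bits]

    def read(self, *inputs):
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        # The logic inputs are simply the address lines of the RAM.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.cells[address]


# Truth tables listed for addresses 00, 01, 10, 11 (first input is the MSB).
xor_lut = LookupTable(2, [0, 1, 1, 0])
and_lut = LookupTable(2, [0, 0, 0, 1])
```

The same hardware structure implements XOR or AND depending only on the stored bits, which is why loading a different configuration changes the logic function without changing the circuit.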
The S5530 processor is powered by the Stretch S5 engine. A Stretch processor can replace multiple digital signal processors (DSPs) or combinations of processors and FPGAs, thereby reducing system and design costs. In addition, the software-only environment eliminates the complexity of hardware/software co-development. The S5 engine combines a RISC processor core with the Stretch Instruction Set Extension Fabric, a software-configurable data path based on proprietary programmable logic. Using it, system designers extend the processor instruction set and define the new instructions using only their C/C++ code. As a result, developers get the performance of dedicated logic with the simplicity of C/C++ development, which shortens development and reduces cost. The S5 engine addresses the following two major RISC bottlenecks:
1. Granularity of computations. Unlike the ALUs of typical RISC processors, which perform low-level operations such as shift, add, and multiply, the Instruction Set Extension Fabric can execute thousands of operations as a single instruction.
2. Data and compute bandwidth. The S5 uses 32 registers, each 128 bits wide, coupled with 128-bit-wide access to memory, to feed data to the Instruction Set Extension Fabric at a bandwidth claimed to be unavailable on any other processor.
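The two points above can be made concrete with a small Python model. It is a sketch under stated assumptions and does not reproduce the Stretch tools or instruction set: the function `sad16` stands for a hypothetical extension instruction (a sum of absolute differences over sixteen 8-bit lanes), and `pack` is a helper invented for the example. Only the register count and width are taken from the text.

```python
REGISTER_COUNT = 32    # wide registers, as stated in the text
REGISTER_BITS = 128    # width of each register and of one memory access
LANES = REGISTER_BITS // 8   # sixteen 8-bit lanes per register


def pack(values):
    """Pack 16 byte values into one 128-bit integer (lane 0 in the low byte)."""
    values = list(values)
    if len(values) != LANES:
        raise ValueError("expected exactly 16 byte values")
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def sad16(a, b):
    """Hypothetical extension instruction: sum of absolute differences of 16 lanes.

    A plain RISC ALU would need a load, subtract, absolute value, and add
    per lane; here the whole computation counts as one instruction.
    """
    total = 0
    for i in range(LANES):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        total += abs(x - y)
    return total


# Bandwidth arithmetic derived from the figures in the text.
register_file_bytes = REGISTER_COUNT * REGISTER_BITS // 8   # 512 bytes in total
bytes_per_access = REGISTER_BITS // 8                       # 16 bytes per access

a = pack(range(16))
b = pack([0] * 16)
result = sad16(a, b)   # 0 + 1 + ... + 15 = 120
```

In this model one 128-bit memory access delivers all sixteen operands of a lane-wise operation at once, which is the point of coupling wide registers with wide memory access.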
