

DATA TYPES
See Section 4.9.2, “Floating-Point Exception Priority” for a description of the rules for exception precedence when more than one floating-point exception condition is detected for an instruction.
4.9.1 Floating-Point Exception Conditions
The following sections describe the various conditions that cause a floating-point exception to be generated and the masked response of the processor when these conditions are detected. Chapter 3, Instruction Set Reference A-M and Chapter 4, Instruction Set Reference N-Z of the IA-32 Intel Architecture Software Developer’s Manual, Volumes 2A & 2B list the floating-point exceptions that can be signaled for each floating-point instruction.
4.9.1.1 Invalid Operation Exception (#I)
The processor reports an invalid operation exception in response to one or more invalid arithmetic operands. If the invalid operation exception is masked, the processor sets the IE flag and returns an indefinite value or a QNaN. This value overwrites the destination register specified by the instruction. If the invalid operation exception is not masked, the IE flag is set, a software exception handler is invoked, and the operands remain unaltered.
See Section 4.8.3.6, “Using SNaNs and QNaNs in Applications” for information about the result returned when an exception is caused by an SNaN.
The processor can detect a variety of invalid arithmetic operations that can be coded in a program. These operations generally indicate a programming error, such as dividing ∞ by ∞. See the following sections for information regarding the invalid-operation exception when detected while executing x87 FPU or SSE/SSE2/SSE3 instructions:
•x87 FPU; Section 8.5.1, “Invalid Operation Exception”
•SIMD floating-point exceptions; Section 11.5.2.1, “Invalid Operation Exception (#I)”
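The masked response can be observed from a high-level language. The following is a minimal sketch in Python, assuming the interpreter runs with floating-point exceptions masked (its usual environment) and that floats are IEEE 754 doubles; the helper names `double_bits` and `is_qnan` are illustrative and not part of the manual. It divides ∞ by ∞ and checks that the value delivered is a quiet NaN. The exact sign and payload of that NaN are platform-dependent, so the sketch does not assume them.

```python
import math
import struct

def double_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 encoding of x as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def is_qnan(x: float) -> bool:
    """True if x is a NaN whose most significant fraction bit (the quiet bit) is set."""
    bits = double_bits(x)
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and (fraction >> 51) == 1

inf = float("inf")

# Dividing infinity by infinity is an invalid operation. With the exception
# masked, no handler runs and a quiet NaN is delivered as the result.
result = inf / inf
print(math.isnan(result), is_qnan(result))
```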
4.9.1.2 Denormal Operand Exception (#D)
The processor reports the denormal-operand exception if an arithmetic instruction attempts to operate on a denormal operand (see Section 4.8.3.2, “Normalized and Denormalized Finite Numbers”). When the exception is masked, the processor sets the DE flag and proceeds with the instruction. Operating on denormal numbers will produce results at least as good as, and often better than, what can be obtained when denormal numbers are flushed to zero. Programmers can mask this exception so that a computation may proceed, then analyze any loss of accuracy when the final result is delivered.
When a denormal-operand exception is not masked, the DE flag is set, a software exception handler is invoked, and the operands remain unaltered. When denormal operands have reduced significance due to loss of low-order bits, it may be advisable to not operate on them. Precluding denormal operands from computations can be accomplished by an exception handler that responds to unmasked denormal-operand exceptions.
See the following sections for information regarding the denormal-operand exception when detected while executing x87 FPU or SSE/SSE2/SSE3 instructions:
•x87 FPU; Section 8.5.2, “Denormal Operand Exception (#D)”
•SIMD floating-point exceptions; Section 11.5.2.2, “Denormal-Operand Exception (#D)”
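The claim that operating on denormals gives results at least as good as flushing them to zero can be illustrated with a small sketch. It assumes Python floats are IEEE 754 doubles and that the platform is not running in a flush-to-zero mode; `flush_to_zero` is a hypothetical helper that only imitates flushing in software.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022 for double precision

def flush_to_zero(x: float) -> float:
    """Hypothetical helper: replace a denormal value with a zero of the same sign."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x

a = 1.5 * SMALLEST_NORMAL
b = SMALLEST_NORMAL
difference = a - b  # exactly 2**-1023, which is a denormal value

# With denormals, two distinct values never have a zero difference.
print(a != b, difference != 0.0)
# If the denormal result were flushed, the difference would collapse to zero.
print(flush_to_zero(difference) == 0.0)
```

With denormals retained, `a != b` and `a - b != 0` agree; after flushing, they no longer do, which is one way accuracy is lost.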
4.9.1.3 Divide-By-Zero Exception (#Z)
The processor reports the floating-point divide-by-zero exception whenever an instruction attempts to divide a finite non-zero operand by 0. The masked response for the divide-by-zero exception is to set the ZE flag and return an infinity signed with the exclusive OR of the sign of the operands. If the divide-by-zero exception is not masked, the ZE flag is set, a software exception handler is invoked, and the operands remain unaltered.
See the following sections for information regarding the divide-by-zero exception when detected while executing x87 FPU or SSE/SSE2 instructions:
•x87 FPU; Section 8.5.3, “Divide-By-Zero Exception (#Z)”
•SIMD floating-point exceptions; Section 11.5.2.3, “Divide-By-Zero Exception (#Z)”
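The masked response described above can be modeled in a few lines. This is a sketch only: Python itself raises `ZeroDivisionError` for float division by zero rather than returning an infinity, so the function below (a hypothetical name, `masked_divide_by_zero`) computes the value the text describes instead of performing the division.

```python
import math

def masked_divide_by_zero(dividend: float, divisor: float) -> float:
    """Model the masked #Z response: an infinity signed with the XOR of the operand signs."""
    if divisor != 0.0 or dividend == 0.0 or not math.isfinite(dividend):
        raise ValueError("not a finite non-zero operand divided by zero")
    dividend_negative = math.copysign(1.0, dividend) < 0
    divisor_negative = math.copysign(1.0, divisor) < 0   # distinguishes -0.0 from +0.0
    result_negative = dividend_negative != divisor_negative  # exclusive OR of the signs
    return -math.inf if result_negative else math.inf

print(masked_divide_by_zero(1.0, -0.0))
```

Note that the sign of a zero divisor matters, which is why the sketch reads the sign with `math.copysign` rather than comparing against zero.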
4.9.1.4 Numeric Overflow Exception (#O)
The processor reports a floating-point numeric overflow exception whenever the rounded result of an instruction exceeds the largest allowable finite value that will fit into the destination operand. Table 4-9 shows the threshold range for numeric overflow for each of the floating-point formats; overflow occurs when a rounded result falls at or outside this threshold range.
Table 4-9. Numeric Overflow Thresholds
Floating-Point Format        Overflow Thresholds
Single Precision             | x | ≥ 1.0 × 2^128
Double Precision             | x | ≥ 1.0 × 2^1024
Double Extended Precision    | x | ≥ 1.0 × 2^16384
When a numeric-overflow exception occurs and the exception is masked, the processor sets the OE flag and returns one of the values shown in Table 4-10, according to the current rounding mode. See Section 4.8.4, “Rounding”.
When numeric overflow occurs and the numeric-overflow exception is not masked, the OE flag is set, a software exception handler is invoked, and the source and destination operands either remain unchanged or a biased result is stored in the destination operand (depending on whether the overflow exception was generated during an SSE/SSE2/SSE3 floating-point operation or an x87 FPU operation).
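The thresholds in Table 4-9 can be checked with exact rational arithmetic. The sketch below assumes Python floats are IEEE 754 doubles; the dictionary keys and the function name `overflows` are illustrative. It confirms that the largest finite double lies just below the double-precision threshold of 1.0 × 2^1024.

```python
import sys
from fractions import Fraction

# Overflow thresholds from Table 4-9, held as exact rationals.
OVERFLOW_THRESHOLD = {
    "single": Fraction(2) ** 128,
    "double": Fraction(2) ** 1024,
    "double_extended": Fraction(2) ** 16384,
}

def overflows(rounded_result: Fraction, fmt: str) -> bool:
    """True if a rounded result falls at or outside the threshold for the format."""
    return abs(rounded_result) >= OVERFLOW_THRESHOLD[fmt]

# The largest finite double, converted exactly: (2 - 2**-52) x 2**1023.
largest_double = Fraction(sys.float_info.max)

print(overflows(largest_double, "double"))            # below the threshold
print(overflows(Fraction(2) ** 1024, "double"))       # at the threshold
```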
Table 4-10. Masked Responses to Numeric Overflow
Rounding Mode    Sign of True Result    Result
To nearest       +                      +∞
                 –                      –∞
Toward –∞        +                      Largest finite positive number
                 –                      –∞
Toward +∞        +                      +∞
                 –                      Largest finite negative number
Toward zero      +                      Largest finite positive number
                 –                      Largest finite negative number
See the following sections for information regarding the numeric overflow exception when detected while executing x87 FPU instructions or while executing SSE/SSE2/SSE3 instructions:
•x87 FPU; Section 8.5.4, “Numeric Overflow Exception (#O)”
•SIMD floating-point exceptions; Section 11.5.2.4, “Numeric Overflow Exception (#O)”
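Table 4-10 amounts to a lookup keyed on rounding mode and the sign of the true result. A minimal sketch for double precision follows; the mode names (`"nearest"`, `"down"`, `"up"`, `"zero"`) and the function name are assumptions made for illustration, not identifiers from the manual.

```python
import math
import sys

LARGEST_FINITE = sys.float_info.max  # largest finite double-precision value

# (rounding mode, sign of true result) -> masked overflow result, per Table 4-10
MASKED_OVERFLOW_RESULT = {
    ("nearest", "+"): math.inf,
    ("nearest", "-"): -math.inf,
    ("down", "+"): LARGEST_FINITE,    # toward -infinity
    ("down", "-"): -math.inf,
    ("up", "+"): math.inf,            # toward +infinity
    ("up", "-"): -LARGEST_FINITE,
    ("zero", "+"): LARGEST_FINITE,
    ("zero", "-"): -LARGEST_FINITE,
}

def masked_overflow_result(rounding_mode: str, sign: str) -> float:
    """Return the value delivered for a masked overflow in the given rounding mode."""
    return MASKED_OVERFLOW_RESULT[(rounding_mode, sign)]

print(masked_overflow_result("zero", "-"))
```

The pattern in the table is that an infinity is returned only when the rounding direction points away from the finite range on that side.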
4.9.1.5 Numeric Underflow Exception (#U)
The processor detects a floating-point numeric underflow condition whenever the result of rounding with unbounded exponent (taking into account precision control for x87) is tiny; that is, less than the smallest possible normalized, finite value that will fit into the destination operand. Table 4-11 shows the threshold range for numeric underflow for each of the floating-point formats (assuming normalized results); underflow occurs when a rounded result falls strictly within the threshold range. The ability to detect and handle underflow is provided to prevent a very small result from propagating through a computation and causing another exception (such as overflow during division) to be generated at a later time.
Table 4-11. Numeric Underflow (Normalized) Thresholds
Floating-Point Format        Underflow Thresholds*
Single Precision             | x | < 1.0 × 2^−126
Double Precision             | x | < 1.0 × 2^−1022
Double Extended Precision    | x | < 1.0 × 2^−16382
* Where ‘x’ is the result rounded to destination precision with an unbounded exponent range.
How the processor handles an underflow condition depends on two related conditions:
•creation of a tiny result
•creation of an inexact result; that is, a result that cannot be represented exactly in the destination format
Which of these events causes an underflow exception to be reported and how the processor responds to the exception condition depends on whether the underflow exception is masked:
•Underflow exception masked: The underflow exception is reported (the UE flag is set) only when the result is both tiny and inexact. The processor returns a denormalized result to the destination operand, regardless of inexactness.
•Underflow exception not masked: The underflow exception is reported when the result is tiny, regardless of inexactness. The processor leaves the source and destination operands unaltered or stores a biased result in the destination operand (depending on whether the underflow exception was generated during an SSE/SSE2/SSE3 floating-point operation or an x87 FPU operation) and invokes a software exception handler.
See the following sections for information regarding the numeric underflow exception when detected while executing x87 FPU instructions or while executing SSE/SSE2/SSE3 instructions:
•x87 FPU; Section 8.5.5, “Numeric Underflow Exception (#U)”
•SIMD floating-point exceptions; Section 11.5.2.5, “Numeric Underflow Exception (#U)”
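The interplay of "tiny" and "inexact" can be sketched for double precision using exact rationals. The code assumes round-to-nearest-even and IEEE 754 doubles; all function names are hypothetical. It models only when the underflow exception is reported, not the stored result.

```python
from fractions import Fraction

PRECISION = 53                                # significand bits, double precision
UNDERFLOW_THRESHOLD = Fraction(1, 2 ** 1022)  # 1.0 x 2**-1022 (Table 4-11)

def round_unbounded(x: Fraction, precision: int = PRECISION) -> Fraction:
    """Round x to `precision` significant bits (nearest-even) with an unbounded exponent."""
    if x == 0:
        return x
    magnitude = abs(x)
    # floor(log2(magnitude)), computed exactly from integer bit lengths
    exponent = magnitude.numerator.bit_length() - magnitude.denominator.bit_length()
    if Fraction(2) ** exponent > magnitude:
        exponent -= 1
    ulp = Fraction(2) ** (exponent - precision + 1)
    rounded = round(magnitude / ulp) * ulp    # round() on a Fraction rounds half to even
    return rounded if x > 0 else -rounded

def is_tiny(x: Fraction) -> bool:
    """Tiny: non-zero and, after rounding with unbounded exponent, below the threshold."""
    return x != 0 and abs(round_unbounded(x)) < UNDERFLOW_THRESHOLD

def is_inexact(x: Fraction) -> bool:
    """Inexact: the delivered double differs from the exact value."""
    return Fraction(float(x)) != x

def underflow_reported(x: Fraction, masked: bool) -> bool:
    """Masked: tiny and inexact. Not masked: tiny, regardless of inexactness."""
    return is_tiny(x) and is_inexact(x) if masked else is_tiny(x)

exact_tiny = Fraction(1, 2 ** 1030)           # representable as a denormal
inexact_tiny = Fraction(1, 3 * 2 ** 1030)     # not representable
print(underflow_reported(exact_tiny, masked=True),
      underflow_reported(exact_tiny, masked=False),
      underflow_reported(inexact_tiny, masked=True))
```

A value slightly below 2^−1022 that rounds up to 2^−1022 at 53 bits is not tiny under this definition, which is why the rounding step comes before the comparison.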
4.9.1.6 Inexact-Result (Precision) Exception (#P)
The inexact-result exception (also called the precision exception) occurs if the result of an operation is not exactly representable in the destination format. For example, the fraction 1/3 cannot be precisely represented in binary floating-point form. This exception occurs frequently and indicates that some (normally acceptable) accuracy will be lost due to rounding. The exception is supported for applications that need to perform exact arithmetic only. Because the rounded result is generally satisfactory for most applications, this exception is commonly masked.
If the inexact-result exception is masked when an inexact-result condition occurs and a numeric overflow or underflow condition has not occurred, the processor sets the PE flag and stores the rounded result in the destination operand. The current rounding mode determines the method used to round the result. See Section 4.8.4, “Rounding”.
If the inexact-result exception is not masked when an inexact result occurs and numeric overflow or underflow has not occurred, the PE flag is set, the rounded result is stored in the destination operand, and a software exception handler is invoked.
If an inexact result occurs in conjunction with numeric overflow or underflow, one of the following operations is carried out:
•If an inexact result occurs along with masked overflow or underflow, the OE flag or UE flag and the PE flag are set and the result is stored as described for the overflow or underflow exceptions; see Section 4.9.1.4, “Numeric Overflow Exception (#O)” or Section 4.9.1.5, “Numeric Underflow Exception (#U)”. If the inexact result exception is unmasked, the processor also invokes a software exception handler.
•If an inexact result occurs along with unmasked overflow or underflow and the destination operand is a register, the OE or UE flag and the PE flag are set, the result is stored as described for the overflow or underflow exceptions, and a software exception handler is invoked.
If an unmasked numeric overflow or underflow exception occurs and the destination operand is a memory location (which can happen only for a floating-point store), the inexact-result condition is not reported and the C1 flag is cleared.
See the following sections for information regarding the inexact-result exception when detected while executing x87 FPU or SSE/SSE2/SSE3 instructions:
•x87 FPU; Section 8.5.6, “Inexact-Result (Precision) Exception (#P)”
•SIMD floating-point exceptions; Section 11.5.2.6, “Inexact-Result (Precision) Exception (#P)”
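The 1/3 example from this section can be made concrete by comparing the stored double with the exact quotient. The sketch assumes IEEE 754 doubles and round-to-nearest; `rounding_error` and `quotient_is_inexact` are illustrative names.

```python
from fractions import Fraction

def rounding_error(numerator: int, denominator: int) -> Fraction:
    """Exact difference between the stored double quotient and the true quotient."""
    exact = Fraction(numerator, denominator)
    stored = Fraction(numerator / denominator)  # the rounded double, converted exactly
    return stored - exact

def quotient_is_inexact(numerator: int, denominator: int) -> bool:
    """True if the quotient cannot be represented exactly as a double."""
    return rounding_error(numerator, denominator) != 0

print(quotient_is_inexact(1, 3))  # 1/3 has no finite binary expansion
print(quotient_is_inexact(1, 2))  # 1/2 is exactly representable
```

For 1/3 the error is non-zero but no larger than half a unit in the last place, which is the "normally acceptable" loss the text refers to.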
4.9.2 Floating-Point Exception Priority
The processor handles exceptions according to a predetermined precedence. When an instruction generates two or more exception conditions, the exception precedence sometimes results in the higher-priority exception being handled and the lower-priority exceptions being ignored. For example, dividing an SNaN by zero can potentially signal an invalid-operation exception (due to the SNaN operand) and a divide-by-zero exception. Here, if both exceptions are masked, the processor handles the higher-priority exception only (the invalid-operation exception), returning a QNaN to the destination. Alternatively, a denormal-operand or inexact-result exception can accompany a numeric underflow or overflow exception with both exceptions being handled.
The precedence for floating-point exceptions is as follows:
1. Invalid-operation exception, subdivided as follows:
   a. stack underflow (occurs with x87 FPU only)
   b. stack overflow (occurs with x87 FPU only)
   c. operand of unsupported format (occurs with x87 FPU only when using the double extended-precision floating-point format)
   d. SNaN operand
2. QNaN operand. Though this is not an exception, the handling of a QNaN operand has precedence over lower-priority exceptions. For example, a QNaN divided by zero results in a QNaN, not a zero-divide exception.
3. Any other invalid-operation exception not mentioned above or a divide-by-zero exception.
4. Denormal-operand exception. If masked, then instruction execution continues and a lower-priority exception can occur as well.
5. Numeric overflow and underflow exceptions; possibly in conjunction with the inexact-result exception.
6. Inexact-result exception.
Invalid operation, zero divide, and denormal operand exceptions are detected before a floating-point operation begins. Overflow, underflow, and precision exceptions are not detected until a true result has been computed. When an unmasked pre-operation exception is detected, the destination operand has not yet been updated, and it appears as if the offending instruction has not been executed. When an unmasked post-operation exception is detected, the destination operand may be updated with a result, depending on the nature of the exception (except for SSE/SSE2/SSE3 instructions, which do not update their destination operands in such cases).
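The precedence list can be expressed as a ranking. The sketch below is a simplified model with hypothetical condition labels: it only orders the conditions detected for one instruction and does not model which lower-priority conditions are also handled (for example, inexact-result accompanying overflow).

```python
# Rank of each condition in the precedence list (lower rank = higher priority).
PRECEDENCE_RANK = {
    "stack_underflow": 1,      # 1a, x87 FPU only
    "stack_overflow": 2,       # 1b, x87 FPU only
    "unsupported_format": 3,   # 1c, x87 FPU only
    "snan_operand": 4,         # 1d
    "qnan_operand": 5,         # 2, not an exception but takes precedence
    "other_invalid": 6,        # 3
    "divide_by_zero": 6,       # 3
    "denormal_operand": 7,     # 4
    "overflow": 8,             # 5
    "underflow": 8,            # 5
    "inexact_result": 9,       # 6
}

def order_by_precedence(conditions):
    """Return the detected conditions sorted from highest to lowest priority."""
    return sorted(conditions, key=lambda name: (PRECEDENCE_RANK[name], name))

# Dividing an SNaN by zero: the SNaN operand outranks the divide-by-zero.
print(order_by_precedence({"divide_by_zero", "snan_operand"})[0])
```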
4.9.3 Typical Actions of a Floating-Point Exception Handler
After the floating-point exception handler is invoked, the processor handles the exception in the same manner that it handles non-floating-point exceptions. The floating-point exception handler is normally part of the operating system or executive software, and it usually invokes a user-registered floating-point exception handler.
A typical action of the exception handler is to store state information in memory. Other typical exception handler actions include:
•Examining the stored state information to determine the nature of the error
•Taking actions to correct the condition that caused the error
•Clearing the exception flags
•Returning to the interrupted program and resuming normal execution
In lieu of writing recovery procedures, the exception handler can do the following:
•Increment in software an exception counter for later display or printing
•Print or display diagnostic information (such as the state information)
•Halt further program execution
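The handler actions listed above (examine saved state, count, report, clear flags) can be sketched as a software simulation. This is not a real handler: it operates on an integer standing in for a saved status value, and the function and variable names are hypothetical. The flag positions follow the layout shared by the x87 FPU status word and MXCSR, where IE, DE, ZE, OE, UE, and PE occupy bits 0 through 5.

```python
from collections import Counter

# Exception flag masks (bits 0 through 5 of the saved status value).
EXCEPTION_FLAGS = {
    "IE": 1 << 0,  # invalid operation
    "DE": 1 << 1,  # denormal operand
    "ZE": 1 << 2,  # divide-by-zero
    "OE": 1 << 3,  # numeric overflow
    "UE": 1 << 4,  # numeric underflow
    "PE": 1 << 5,  # inexact result (precision)
}
ALL_EXCEPTION_FLAGS = 0x3F

exception_counts = Counter()  # software exception counters for later display

def handle_exception(saved_status: int) -> int:
    """Examine saved state, count and report the flagged exceptions, clear the flags."""
    raised = [name for name, mask in EXCEPTION_FLAGS.items() if saved_status & mask]
    exception_counts.update(raised)
    print("floating-point exception flags:", ", ".join(raised) or "none")
    return saved_status & ~ALL_EXCEPTION_FLAGS  # same value with exception flags cleared
```

Calling `handle_exception(0b100100)` would report ZE and PE, increment their counters, and return the status value with those flags cleared.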

CHAPTER 5
INSTRUCTION SET SUMMARY
This chapter provides an abridged overview of IA-32 instructions, divided into the following groups:
•General purpose
•x87 FPU
•x87 FPU and SIMD state management
•Intel MMX technology
•SSE extensions
•SSE2 extensions
•SSE3 extensions
•System instructions
•IA-32e mode: 64-bit mode instructions
Table 5-1 lists the groups and IA-32 processors that support each group. Within these groups, most instructions are collected into functional subgroups.
Table 5-1. Instruction Groups and IA-32 Processors
Instruction Set Architecture        IA-32 Processor Support
General Purpose                     All IA-32 processors
x87 FPU                             Intel486, Pentium, Pentium with MMX Technology, Celeron, Pentium Pro,
                                    Pentium II, Pentium II Xeon, Pentium III, Pentium III Xeon, Pentium 4,
                                    Intel Xeon processors
x87 FPU and SIMD State Management   Pentium II, Pentium II Xeon, Pentium III, Pentium III Xeon, Pentium 4,
                                    Intel Xeon processors
MMX Technology                      Pentium with MMX Technology, Celeron, Pentium II, Pentium II Xeon,
                                    Pentium III, Pentium III Xeon, Pentium 4, Intel Xeon processors
SSE Extensions                      Pentium III, Pentium III Xeon, Pentium 4, Intel Xeon processors
SSE2 Extensions                     Pentium 4, Intel Xeon processors
SSE3 Extensions                     Pentium 4 supporting HT Technology (built on 90nm process technology)
IA-32e: 64-Bit Mode                 Pentium 4, Intel Xeon processors
System Instructions                 All IA-32 processors