POST-32NM PROCESSOR INSTRUCTIONS
NOTES:
1. If a source is denormal relative to input format with DM masked and at least one of PM or UM unmasked, then an exception will be raised with DE, UE and PE set.
8.4 VECTOR INSTRUCTION EXCEPTION SPECIFICATION
The exception behavior of instructions operating on YMM states follows the updated classification in Table 8-11. The instructions VCVTPH2PS and VCVTPS2PH are described by type 11.
Table 8-11. Exception class description
Exception Class | NI Family | Mem arg | Floating-Point Exceptions (#XM)
Type 1 | AVX, Legacy SSE | 16/32 byte explicitly aligned | none
Type 2 | AVX, FMA, Legacy SSE | 16/32 byte; not explicitly aligned with VEX prefix; explicitly aligned without VEX | yes
Type 3 | AVX, FMA, Legacy SSE | < 16 byte | yes
Type 4 | AVX, Legacy SSE | 16/32 byte; not explicitly aligned with VEX prefix; explicitly aligned without VEX | no
Type 5 | AVX, Legacy SSE | < 16 byte | no
Type 6 | AVX (no Legacy SSE) | Varies | (At present, none do)
Type 7 | AVX, Legacy SSE | none | none
Type 8 | AVX | none | none
Type 9 | AVX | 4 byte | none
Type 10 | AVX, Legacy SSE | 16/32 byte; not explicitly aligned | no
Type 11 | AVX | Not explicitly aligned, no AC# | yes
Ref. # 319433-011 |
8.4.1 Exception Type 11 (VEX-only, mem arg no AC, floating-point exceptions)
Exception | Real | Virtual 80x86 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | | | VEX prefix
Invalid Opcode, #UD | | | X | X | VEX prefix: If XFEATURE_ENABLED_MASK[2:1] != ‘11b’. If CR4.OSXSAVE[bit 18]=0.
Invalid Opcode, #UD | X | X | X | X | If preceded by a LOCK prefix (F0H)
Invalid Opcode, #UD | | | X | X | If any REX, F2, F3, or 66 prefixes precede a VEX prefix
Invalid Opcode, #UD | X | X | X | X | If any corresponding CPUID feature flag is ‘0’
Device Not Available, #NM | X | X | X | X | If CR0.TS[bit 3]=1
Stack, SS(0) | | | X | | For an illegal address in the SS segment
Stack, SS(0) | | | | X | If a memory address referencing the SS segment is in a non-canonical form
General Protection, #GP(0) | | | X | | For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
General Protection, #GP(0) | | | | X | If the memory address is in a non-canonical form.
General Protection, #GP(0) | X | X | | | If any part of the operand lies outside the effective address space from 0 to FFFFH
Page Fault #PF(fault-code) | | X | X | X | For a page fault
SIMD Floating-Point Exception, #XM | X | X | X | X | If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1
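The #UD rows of the table for protected and 64-bit modes can be read as a software-side precondition check. The following is a minimal sketch in C, not a definitive detection sequence: the function name and parameters are hypothetical, the caller is assumed to have already obtained CPUID.01H:ECX and the XFEATURE_ENABLED_MASK (XCR0) value, and the CPUID bit positions used (bit 27 for OSXSAVE, which mirrors CR4.OSXSAVE, and bit 29 for F16C in the example) are not stated in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption (not from this section): CPUID.01H:ECX bit 27 reports OSXSAVE. */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)
/* From the table: XFEATURE_ENABLED_MASK[2:1] must equal 11b. */
#define XCR0_SSE_YMM_MASK  UINT64_C(0x6)

/*
 * Hypothetical helper: returns true when none of the software-visible #UD
 * causes for protected/64-bit mode apply. feature_mask selects the
 * "corresponding CPUID feature flag" bits in CPUID.01H:ECX.
 */
bool vex_insn_usable(uint32_t cpuid1_ecx, uint64_t xcr0, uint32_t feature_mask)
{
    if ((cpuid1_ecx & CPUID1_ECX_OSXSAVE) == 0)
        return false;                      /* CR4.OSXSAVE = 0 */
    if ((xcr0 & XCR0_SSE_YMM_MASK) != XCR0_SSE_YMM_MASK)
        return false;                      /* XFEATURE_ENABLED_MASK[2:1] != 11b */
    return (cpuid1_ecx & feature_mask) == feature_mask;
}

/* Usage sketch with made-up register values; bit 29 is assumed to be F16C. */
void vex_insn_usable_examples(void)
{
    const uint32_t f16c = UINT32_C(1) << 29;
    assert(vex_insn_usable(CPUID1_ECX_OSXSAVE | f16c, 0x7, f16c));
    assert(!vex_insn_usable(CPUID1_ECX_OSXSAVE | f16c, 0x3, f16c));
    assert(!vex_insn_usable(CPUID1_ECX_OSXSAVE, 0x7, f16c));
}
```

Passing the register values in as arguments keeps the logic independent of how a given toolchain exposes CPUID and XGETBV.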
8.5 FS/GS BASE SUPPORT FOR 64-BIT SOFTWARE
64-bit code can use new instructions to access and modify FS and GS base. These new instructions are available to software in all privilege levels. CR4 register bit 16 allows system software to control the availability of these instructions to software.
CR4.FSGSBASE
FSGSBASE-Enable Bit (bit 16 of CR4): Enables the RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE instructions in all privilege levels when set. When clear, the RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE instructions cause #UD in all privilege levels. The default value of this bit is zero after RESET.
The RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE instructions are available only in the 64-bit sub-mode of IA-32e mode. Access to CR4.FSGSBASE is available in all operating modes if CPUID.(EAX=07H, ECX=0H):EBX.FSGSBASE is 1.
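The two availability conditions above can be combined into one predicate. This is only an illustrative sketch: the function name is hypothetical, and the caller is assumed to already know the CR4 value and whether the code is running in 64-bit sub-mode, both of which are normally visible to system software only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4.FSGSBASE is bit 16 of CR4, as stated above. */
#define CR4_FSGSBASE (UINT64_C(1) << 16)

/* Hypothetical helper: true when RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE can execute. */
bool fsgsbase_insns_available(uint64_t cr4, bool in_64bit_mode)
{
    return in_64bit_mode && (cr4 & CR4_FSGSBASE) != 0;
}

/* Usage sketch with made-up CR4 values. */
void fsgsbase_insns_available_examples(void)
{
    assert(fsgsbase_insns_available(CR4_FSGSBASE, true));
    assert(!fsgsbase_insns_available(0, true));             /* bit clear: #UD */
    assert(!fsgsbase_insns_available(CR4_FSGSBASE, false)); /* not 64-bit sub-mode */
}
```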
NOTE
It is highly recommended that the REX.W prefix be used with these instructions to read or write the full 64-bit value. If the REX.W prefix is omitted when reading from a segment base, the upper 32 bits of the base are ignored and the upper 32 bits of the destination register are set to zero. If the REX.W prefix is omitted when writing to a segment base, the upper 32 bits of the source register are ignored and the corresponding bits of the segment base are set to zero. Additionally, if the OS enables these instructions, it must also save and restore the FS and GS bases on context switches so that any changes applications make to the segment bases are preserved.
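The effect of omitting REX.W can be modeled in plain C. The sketch below is a software model of the behavior described in the note, not code that executes the instructions; the function names and the `rex_w` flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of RDFSBASE/RDGSBASE: value left in the destination register. */
uint64_t rdbase_model(uint64_t segment_base, int rex_w)
{
    /* Without REX.W only the low 32 bits are read; the result is zero-extended. */
    return rex_w ? segment_base : (uint64_t)(uint32_t)segment_base;
}

/* Model of WRFSBASE/WRGSBASE: segment base after the write. */
uint64_t wrbase_model(uint64_t source_reg, int rex_w)
{
    /* Without REX.W the upper 32 bits of the source are ignored and the
       upper 32 bits of the base become zero. */
    return rex_w ? source_reg : (uint64_t)(uint32_t)source_reg;
}

/* Usage sketch with an arbitrary 64-bit value. */
void fsgsbase_rexw_examples(void)
{
    const uint64_t v = UINT64_C(0x1122334455667788);
    assert(rdbase_model(v, 1) == v);
    assert(rdbase_model(v, 0) == UINT64_C(0x55667788));
    assert(wrbase_model(v, 0) == UINT64_C(0x55667788));
}
```

The model makes the risk concrete: a base above 4 GB silently loses its upper half when the prefix is missing.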
8.6 USING RDRAND INSTRUCTION AND INTRINSIC
The RDRAND instruction returns a random number. All Intel processors that support the RDRAND instruction indicate its availability by reporting CPUID.01H:ECX.RDRAND[bit 30] = 1.
RDRAND returns random numbers that are supplied by a cryptographically secure, deterministic random bit generator (DRBG). The DRBG is designed to meet the NIST SP 800-90 standard. The DRBG is re-seeded frequently from an on-chip non-deterministic entropy source to guarantee that data returned by RDRAND is statistically uniform, non-periodic and non-deterministic.
In order for the hardware design to meet its security goals, the random number generator continuously tests itself and the random data it is generating. Runtime failures in the random number generator circuitry, or statistically anomalous data occurring by chance, are detected by the self-test hardware, which flags the resulting data as bad. In such extremely rare cases, the RDRAND instruction returns no data instead of bad data.
Under heavy load, with multiple cores executing RDRAND in parallel, it is possible, though unlikely, for the demand for random numbers by software processes/threads to exceed the rate at which the random number generator hardware can supply them. This causes the RDRAND instruction to return no data transitorily. The RDRAND instruction indicates the occurrence of this rare situation by clearing the CF flag.
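Because RDRAND can transiently return no data, a caller has to check the CF result of every invocation. The following is a minimal sketch of a bounded retry wrapper under stated assumptions: `rdrand_step_fn`, `rdrand_retry` and `fake_step` are hypothetical names, the callback returns 1 for CF = 1 and 0 for CF = 0, and `fake_step` is a simulated source used only so the example runs on any machine. On a toolchain that provides the `_rdrand64_step` intrinsic in `<immintrin.h>`, a thin wrapper around it could be passed in place of the simulated source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns 1 (CF = 1) with data in *out, or 0 (CF = 0). */
typedef int (*rdrand_step_fn)(uint64_t *out);

/* Calls step up to max_attempts times; returns 1 on success, 0 if every
   attempt reported CF = 0. *out is only meaningful on success. */
int rdrand_retry(rdrand_step_fn step, int max_attempts, uint64_t *out)
{
    assert(step != NULL && out != NULL);
    for (int i = 0; i < max_attempts; ++i) {
        if (step(out))
            return 1;
    }
    return 0;
}

/* Simulated source for illustration only: reports CF = 0 twice, then succeeds. */
static int fake_calls;
int fake_step(uint64_t *out)
{
    if (++fake_calls <= 2)
        return 0;
    *out = UINT64_C(0x0123456789abcdef);
    return 1;
}
```

The attempt limit is left to the caller because a suitable bound is not given at this point in the text.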
The RDRAND instruction returns with the carry flag set (CF = 1) to indicate valid data is returned. It is recommended that software using the RDRAND instruction to get random numbers retry for a limited number of iterations while RDRAND returns CF=0