INSTRUCTION SET REFERENCE - VEX-ENCODED GPR INSTRUCTIONS
INVPCID - Invalidate Processor Context ID
| Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
|---|---|---|---|---|
| 66 0F 38 82 /r INVPCID r32, m128 | A | NE/V | INVPCID | Invalidates entries in the TLBs and paging-structure caches based on invalidation type in r32 and descriptor in m128. |
| 66 0F 38 82 /r INVPCID r64, m128 | A | V/NE | INVPCID | Invalidates entries in the TLBs and paging-structure caches based on invalidation type in r64 and descriptor in m128. |
Instruction Operand Encoding
| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| A | ModRM:reg (R) | ModRM:r/m (R) | NA | NA |
Description
Invalidates mappings in the translation lookaside buffers (TLBs) and paging-structure caches based on the invalidation type specified in the first operand and the processor context identifier (PCID) invalidate descriptor specified in the second operand. The INVPCID descriptor is specified as a 16-byte memory operand and has no alignment restriction.
The layout of the INVPCID descriptor is shown in Figure 7-3. In 64-bit mode, the linear address field (bits 127:64) of the INVPCID descriptor must be a canonical address unless the linear address field is ignored.
| Bits 127:64 | Bits 63:12 | Bits 11:0 |
|---|---|---|
| Linear Address | Reserved (must be zero) | PCID |

Figure 7-3. INVPCID Descriptor
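
The descriptor layout in Figure 7-3 can be mirrored in C as two 64-bit words. The following is a minimal sketch, not part of the instruction set definition: the names `invpcid_desc`, `make_invpcid_desc` and `desc_reserved_ok` are hypothetical, and the mapping of `low` to bits 63:0 assumes the little-endian memory order used by Intel 64 and IA-32 processors.

```c
#include <stdbool.h>
#include <stdint.h>

/* In-memory image of the 128-bit INVPCID descriptor (Figure 7-3).
 * Hypothetical type name; assumes little-endian byte order. */
struct invpcid_desc {
    uint64_t low;   /* bits 11:0 = PCID, bits 63:12 = reserved (must be zero) */
    uint64_t high;  /* bits 127:64 = linear address */
};

/* Build a descriptor. Only the low 12 bits of pcid are kept, so the
 * reserved field (bits 63:12) is always written as zero. */
static inline struct invpcid_desc make_invpcid_desc(uint16_t pcid,
                                                    uint64_t linear_addr)
{
    struct invpcid_desc d;
    d.low  = (uint64_t)(pcid & 0xFFFu);
    d.high = linear_addr;
    return d;
}

/* True when reserved bits 63:12 are all zero. */
static inline bool desc_reserved_ok(const struct invpcid_desc *d)
{
    return (d->low >> 12) == 0;
}
```

Masking the PCID in the builder is a design choice for this sketch; a caller that prefers to detect an out-of-range PCID could reject it instead of truncating it.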
Outside IA-32e mode, the register operand is always 32 bits, regardless of the value of CS.D. In 64-bit mode the register operand has 64 bits; however, if bits 63:32 of the register operand are not zero, INVPCID fails due to an attempt to use an unsupported INVPCID type (see below).
Ref. # 319433-011 |
The INVPCID types supported by a logical processor are:
•Individual-address invalidation: If the INVPCID type is 0, the logical processor invalidates mappings, except global translations, for the linear address and PCID specified in the INVPCID descriptor. The instruction may also invalidate global translations, mappings for other linear addresses, or mappings tagged with other PCIDs.
•Single-context invalidation: If the INVPCID type is 1, the logical processor invalidates all mappings tagged with the PCID specified in the INVPCID descriptor except global translations. In some cases, it may invalidate mappings for other PCIDs as well.
•All-context invalidation: If the INVPCID type is 2, the logical processor invalidates all mappings tagged with any PCID, including global translations.
•All-context invalidation, retaining global translations: If the INVPCID type is 3, the logical processor invalidates all mappings tagged with any PCID except global translations, ignoring the INVPCID descriptor. The instruction may also invalidate global translations.
If an unsupported INVPCID type is specified, or if the reserved field in the descriptor is not zero, the instruction fails.
Outside IA-32e mode, the processor treats INVPCID as if all mappings are associated with PCID 000H.
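
The failure conditions above, together with the CR4.PCIDE condition given in the exception lists for this instruction, can be collected into one software pre-check. This is an illustrative sketch only: `invpcid_operands_valid` is a hypothetical helper, and it models just the operand checks, not privilege level, segment, or paging faults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-check of the INVPCID operands.
 *   type      - value of the register operand (full 64 bits in 64-bit mode)
 *   desc_low  - bits 63:0 of the descriptor (PCID plus reserved field)
 *   cr4_pcide - current value of CR4.PCIDE
 * Returns true when none of the modelled failure conditions applies. */
static inline bool invpcid_operands_valid(uint64_t type, uint64_t desc_low,
                                          bool cr4_pcide)
{
    if (type > 3)                    /* unsupported type; this also rejects */
        return false;                /* any nonzero bit in 63:32            */
    if ((desc_low >> 12) != 0)       /* reserved bits 63:12 must be zero    */
        return false;
    if (!cr4_pcide && (desc_low & 0xFFF) != 0 && type <= 1)
        return false;                /* nonzero PCID while PCIDs are off    */
    return true;
}
```

For example, `invpcid_operands_valid(1, 0x001, false)` is false because a nonzero PCID is used with type 1 while CR4.PCIDE is 0, whereas the same descriptor with type 2 passes.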
Operation
INVPCID_TYPE ← value of register operand; // must be in the range of 0-3
INVPCID_DESC ← value of memory operand;
CASE INVPCID_TYPE OF
    0: // individual-address invalidation retaining global translations
        OP_PCID ← INVPCID_DESC[11:0];
        ADDR ← INVPCID_DESC[127:64];
        Invalidate mappings for ADDR tagged with OP_PCID except global translations;
        BREAK;
    1: // single PCID invalidation retaining globals
        OP_PCID ← INVPCID_DESC[11:0];
        Invalidate all mappings tagged with OP_PCID except global translations;
        BREAK;
    2: // all PCID invalidation
        Invalidate all mappings tagged with any PCID;
        BREAK;
    3: // all PCID invalidation retaining global translations
        Invalidate all mappings tagged with any PCID except global translations;
        BREAK;
ESAC;
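
The CASE statement above can be exercised against a toy software model of a PCID-tagged TLB. The sketch below is an assumption-laden illustration: `tlb_entry` and `model_invpcid` are hypothetical names, pages are assumed to be 4 KBytes, and the model invalidates only the minimum set of entries, whereas a real processor is permitted to invalidate more.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one PCID-tagged TLB entry (hypothetical structure). */
struct tlb_entry {
    bool     valid;
    bool     global;  /* global translation */
    uint16_t pcid;    /* PCID the mapping is tagged with */
    uint64_t page;    /* linear page number (address >> 12, 4-KByte pages) */
};

/* Apply one INVPCID operation to an array of entries.
 * Returns the number of entries invalidated, or -1 for an unsupported type. */
static inline int model_invpcid(struct tlb_entry *tlb, size_t n,
                                uint64_t type, uint16_t pcid, uint64_t addr)
{
    int dropped = 0;

    if (type > 3)
        return -1;

    for (size_t i = 0; i < n; i++) {
        struct tlb_entry *e = &tlb[i];
        bool hit;

        if (!e->valid)
            continue;

        switch (type) {
        case 0:  /* individual address, retaining global translations */
            hit = !e->global && e->pcid == pcid && e->page == (addr >> 12);
            break;
        case 1:  /* single PCID, retaining global translations */
            hit = !e->global && e->pcid == pcid;
            break;
        case 2:  /* all PCIDs */
            hit = true;
            break;
        default: /* type 3: all PCIDs, retaining global translations */
            hit = !e->global;
            break;
        }

        if (hit) {
            e->valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

In this model a global entry survives types 0, 1 and 3 and is removed only by type 2, which matches the pseudocode comments above.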
Intel C/C++ Compiler Intrinsic Equivalent
INVPCID void _invpcid(unsigned __int32 type, void * descriptor);
SIMD Floating-Point Exceptions
None
Protected Mode Exceptions
| #GP(0) | If the current privilege level is not 0. |
| | If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. |
| | If the DS, ES, FS, or GS register contains an unusable segment. |
| | If the source operand is located in an execute-only code segment. |
| | If an invalid type is specified in the register operand, i.e., INVPCID_TYPE > 3. |
| | If bits 63:12 of INVPCID_DESC are not all zero. |
| | If CR4.PCIDE=0, INVPCID_DESC[11:0] is not zero, and INVPCID_TYPE is either 0 or 1. |
| #PF(fault-code) | If a page fault occurs in accessing the memory operand. |
| #SS(0) | If the memory operand effective address is outside the SS segment limit. |
| | If the SS register contains an unusable segment. |
| #UD | If CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) = 0. |
| | If the LOCK prefix is used. |
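
Because the instruction raises #UD when CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) is 0, software normally tests that bit before using INVPCID. The following is a minimal sketch: `ebx_reports_invpcid` and `cpu_has_invpcid` are hypothetical helper names, and the second function assumes a GCC- or Clang-compatible compiler that provides `<cpuid.h>` on an x86 target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INVPCID_EBX_BIT 10u  /* CPUID.(EAX=07H, ECX=0H):EBX bit 10 */

/* Decode the feature flag from an EBX value already read from
 * CPUID leaf 07H, sub-leaf 0. */
static inline bool ebx_reports_invpcid(uint32_t ebx)
{
    return ((ebx >> INVPCID_EBX_BIT) & 1u) != 0;
}

#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Query the running processor. Returns false when leaf 07H is not
 * available or the INVPCID flag is clear. */
static inline bool cpu_has_invpcid(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 7)   /* highest basic leaf supported */
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    return ebx_reports_invpcid(ebx);
}
#endif
```

Keeping the bit test separate from the CPUID query lets the decoding logic be checked on any host, including non-x86 build machines.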
Real-Address Mode Exceptions
| #GP(0) | If an invalid type is specified in the register operand, i.e., INVPCID_TYPE > 3. |
| | If bits 63:12 of INVPCID_DESC are not all zero. |
| | If CR4.PCIDE=0, INVPCID_DESC[11:0] is not zero, and INVPCID_TYPE is either 0 or 1. |
| #UD | If CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) = 0. |
| | If the LOCK prefix is used. |
Virtual-8086 Mode Exceptions
#UD The INVPCID instruction is not recognized in virtual-8086 mode.
Compatibility Mode Exceptions
Same exceptions as in protected mode.
64-Bit Mode Exceptions
| #GP(0) | If the current privilege level is not 0. |
| | If the memory operand is in the CS, DS, ES, FS, or GS segments and the memory address is in a non-canonical form. |
| | If an invalid type is specified in the register operand, i.e., INVPCID_TYPE > 3. |
| | If bits 63:12 of INVPCID_DESC are not all zero. |
| | If CR4.PCIDE=0, INVPCID_DESC[11:0] is not zero, and INVPCID_TYPE is either 0 or 1. |
| | If INVPCID_TYPE is 0 and INVPCID_DESC[127:64] is not a canonical address. |
| #PF(fault-code) | If a page fault occurs in accessing the memory operand. |
| #SS(0) | If the memory operand is in the SS segment and the memory address is in a non-canonical form. |
| #UD | If the LOCK prefix is used. |
| | If CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) = 0. |
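
Several of the 64-bit mode conditions above depend on whether an address is canonical. As a rough sketch, assuming an implementation with 48-bit linear addresses (an assumption of this example, not something stated in this section), an address is canonical when bits 63:47 are all zeros or all ones. The helper name `is_canonical48` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is canonical for 48-bit linear addresses, i.e. bits 63:47
 * are all equal (assumed address width; hypothetical helper name). */
static inline bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;            /* bits 63:47, 17 bits wide */
    return upper == 0 || upper == 0x1FFFF;  /* all zeros or all ones    */
}
```

Under this assumption, software preparing a type 0 descriptor could apply the check to the linear address field (bits 127:64) before executing INVPCID.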