- •Chapter 1 Intel® Advanced Vector Extensions
- •1.1 About This Document
- •1.2 Overview
- •1.3.2 Instruction Syntax Enhancements
- •1.3.3 VEX Prefix Instruction Encoding Support
- •1.4 Overview AVX2
- •1.5 Functional Overview
- •1.6 General Purpose Instruction Set Enhancements
- •2.1 Detection of PCLMULQDQ and AES Instructions
- •2.2 Detection of AVX and FMA Instructions
- •2.2.1 Detection of FMA
- •2.2.3 Detection of AVX2
- •2.3.1 FMA Instruction Operand Order and Arithmetic Behavior
- •2.4 Accessing YMM Registers
- •2.5 Memory alignment
- •2.7 Instruction Exception Specification
- •2.7.1 Exceptions Type 1 (Aligned memory reference)
- •2.7.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned)
- •2.7.3 Exceptions Type 3 (<16 Byte memory argument)
- •2.7.5 Exceptions Type 5 (<16 Byte mem arg and no FP exceptions)
- •2.7.7 Exceptions Type 7 (No FP exceptions, no memory arg)
- •2.7.8 Exceptions Type 8 (AVX and no memory argument)
- •2.8.1 Clearing Upper YMM State Between AVX and Legacy SSE Instructions
- •2.8.3 Unaligned Memory Access and Buffer Size Management
- •2.9 CPUID Instruction
- •3.1 YMM State, VEX Prefix and Supported Operating Modes
- •3.2 YMM State Management
- •3.2.1 Detection of YMM State Support
- •3.2.2 Enabling of YMM State
- •3.2.4 The Layout of XSAVE Area
- •3.2.5 XSAVE/XRSTOR Interaction with YMM State and MXCSR
- •3.2.6 Processor Extended State Save Optimization and XSAVEOPT
- •3.2.6.1 XSAVEOPT Usage Guidelines
- •3.3 Reset Behavior
- •3.4 Emulation
- •4.1 Instruction Formats
- •4.1.1 VEX and the LOCK prefix
- •4.1.2 VEX and the 66H, F2H, and F3H prefixes
- •4.1.3 VEX and the REX prefix
- •4.1.4 The VEX Prefix
- •4.1.4.1 VEX Byte 0, bits[7:0]
- •4.1.4.2 VEX Byte 1, bit [7] - ‘R’
- •4.1.5 Instruction Operand Encoding and VEX.vvvv, ModR/M
- •4.1.6 The Opcode Byte
- •4.1.7 The MODRM, SIB, and Displacement Bytes
- •4.1.8 The Third Source Operand (Immediate Byte)
- •4.1.9.1 Vector Length Transition and Programming Considerations
- •4.1.10 AVX Instruction Length
- •4.2 Vector SIB (VSIB) Memory Addressing
- •4.3 VEX Encoding Support for GPR Instructions
- •5.1 Interpreting Instruction Reference Pages
- •5.1.1 Instruction Format
- •5.1.2 Opcode Column in the Instruction Summary Table
- •5.1.3 Instruction Column in the Instruction Summary Table
- •5.1.4 Operand Encoding column in the Instruction Summary Table
- •5.1.5 64/32 bit Mode Support column in the Instruction Summary Table
- •5.1.6 CPUID Support column in the Instruction Summary Table
- •5.2 Summary of Terms
- •5.3 Instruction Set Reference
- •MPSADBW - Multiple Sum of Absolute Differences
- •PALIGNR - Byte Align
- •PBLENDW - Blend Packed Words
- •PHADDW/PHADDD - Packed Horizontal Add
- •PHADDSW - Packed Horizontal Add with Saturation
- •PHSUBW/PHSUBD - Packed Horizontal Subtract
- •PHSUBSW - Packed Horizontal Subtract with Saturation
- •PMOVSX - Packed Move with Sign Extend
- •PMOVZX - Packed Move with Zero Extend
- •PMULDQ - Multiply Packed Doubleword Integers
- •PMULHRSW - Multiply Packed Unsigned Integers with Round and Scale
- •PMULHUW - Multiply Packed Unsigned Integers and Store High Result
- •PMULHW - Multiply Packed Integers and Store High Result
- •PMULLW/PMULLD - Multiply Packed Integers and Store Low Result
- •PMULUDQ - Multiply Packed Unsigned Doubleword Integers
- •POR - Bitwise Logical Or
- •PSADBW - Compute Sum of Absolute Differences
- •PSHUFB - Packed Shuffle Bytes
- •PSHUFD - Shuffle Packed Doublewords
- •PSHUFLW - Shuffle Packed Low Words
- •PSIGNB/PSIGNW/PSIGND - Packed SIGN
- •PSLLDQ - Byte Shift Left
- •PSLLW/PSLLD/PSLLQ - Bit Shift Left
- •PSRAW/PSRAD - Bit Shift Arithmetic Right
- •PSRLDQ - Byte Shift Right
- •PSRLW/PSRLD/PSRLQ - Shift Packed Data Right Logical
- •PSUBB/PSUBW/PSUBD/PSUBQ - Packed Integer Subtract
- •PSUBSB/PSUBSW - Subtract Packed Signed Integers with Signed Saturation
- •PSUBUSB/PSUBUSW - Subtract Packed Unsigned Integers with Unsigned Saturation
- •PXOR - Exclusive Or
- •VPBLENDD - Blend Packed Dwords
- •VPERMD - Full Doublewords Element Permutation
- •VPERMPD - Permute Double-Precision Floating-Point Elements
- •VPERMPS - Permute Single-Precision Floating-Point Elements
- •VPERMQ - Qwords Element Permutation
- •VPSLLVD/VPSLLVQ - Variable Bit Shift Left Logical
- •VPSRAVD - Variable Bit Shift Right Arithmetic
- •VPSRLVD/VPSRLVQ - Variable Bit Shift Right Logical
- •VGATHERDPD/VGATHERQPD - Gather Packed DP FP values Using Signed Dword/Qword Indices
- •VGATHERDPS/VGATHERQPS - Gather Packed SP FP values Using Signed Dword/Qword Indices
- •VPGATHERDD/VPGATHERQD - Gather Packed Dword values Using Signed Dword/Qword Indices
- •VPGATHERDQ/VPGATHERQQ - Gather Packed Qword values Using Signed Dword/Qword Indices
- •6.1 FMA Instruction Set Reference
- •Chapter 7 Instruction Set Reference - VEX-Encoded GPR Instructions
- •7.1 Instruction Format
- •7.2 INSTRUCTION SET REFERENCE
- •BZHI - Zero High Bits Starting with Specified Bit Position
- •INVPCID - Invalidate Processor Context ID
- •Chapter 8 Post-32nm Processor Instructions
- •8.1 Overview
- •8.2 CPUID Detection of New Instructions
- •8.4 Vector Instruction Exception Specification
- •8.6 Using RDRAND Instruction and Intrinsic
- •8.7 Instruction Reference
- •A.1 AVX Instructions
- •A.2 Promoted Vector Integer Instructions in AVX2
- •B.1 Using Opcode Tables
- •B.2 Key to Abbreviations
- •B.2.1 Codes for Addressing Method
- •B.2.2 Codes for Operand Type
- •B.2.3 Register Codes
- •B.2.4 Opcode Look-up Examples for One, Two, and Three-Byte Opcodes
- •B.2.4.1 One-Byte Opcode Instructions
- •B.2.4.2 Two-Byte Opcode Instructions
- •B.2.4.3 Three-Byte Opcode Instructions
- •B.2.4.4 VEX Prefix Instructions
- •B.2.5 Superscripts Utilized in Opcode Tables
- •B.3 One, Two, and Three-Byte Opcode Maps
- •B.4.1 Opcode Look-up Examples Using Opcode Extensions
- •B.4.2 Opcode Extension Tables
- •B.5 Escape Opcode Instructions
- •B.5.1 Opcode Look-up Examples for Escape Instruction Opcodes
- •B.5.2 Escape Opcode Instruction Tables
- •B.5.2.1 Escape Opcodes with D8 as First Byte
- •B.5.2.2 Escape Opcodes with D9 as First Byte
- •B.5.2.3 Escape Opcodes with DA as First Byte
- •B.5.2.4 Escape Opcodes with DB as First Byte
- •B.5.2.5 Escape Opcodes with DC as First Byte
- •B.5.2.6 Escape Opcodes with DD as First Byte
- •B.5.2.7 Escape Opcodes with DE as First Byte
- •B.5.2.8 Escape Opcodes with DF As First Byte
INSTRUCTION SET REFERENCE - VEX-ENCODED GPR INSTRUCTIONS
CHAPTER 7 INSTRUCTION SET REFERENCE - VEX-ENCODED GPR INSTRUCTIONS
This chapter describes the general-purpose instructions, the majority of which are encoded using the VEX prefix.
7.1 INSTRUCTION FORMAT
The format used to describe each instruction, illustrated by the example below, is described in Chapter 5.
ANDN - Logical And Not
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDS.LZ.0F38.W0 F2 /r | ANDN r32a, r32b, r/m32 | A | V/V | BMI1 | Bitwise AND of inverted r32b with r/m32, store result in r32a.
VEX.NDS.LZ.0F38.W1 F2 /r | ANDN r64a, r64b, r/m64 | A | V/NE | BMI1 | Bitwise AND of inverted r64b with r/m64, store result in r64a.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | VEX.vvvv (r) | ModRM:r/m (r) | NA
7.2 INSTRUCTION SET REFERENCE
Ref. # 319433-011 |
ANDN — Logical AND NOT
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDS.LZ.0F38.W0 F2 /r | ANDN r32a, r32b, r/m32 | A | V/V | BMI1 | Bitwise AND of inverted r32b with r/m32, store result in r32a.
VEX.NDS.LZ.0F38.W1 F2 /r | ANDN r64a, r64b, r/m64 | A | V/NE | BMI1 | Bitwise AND of inverted r64b with r/m64, store result in r64a.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (W) | VEX.vvvv (R) | ModRM:r/m (R) | NA
Description
Performs a bitwise logical AND of the inverted second operand (the first source operand) with the third operand (the second source operand). The result is stored in the first operand (the destination operand).
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
Operation
DEST ← (NOT SRC1) bitwiseAND SRC2;
SF ← DEST[OperandSize -1];
ZF ← (DEST = 0);
Flags Affected
SF and ZF are updated based on result. OF and CF flags are cleared. AF and PF flags are undefined.
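The Operation and Flags Affected text above can be modeled in portable C. The following is a minimal sketch for the 32-bit form only; the names `andn32_model`, `andn32_example`, and `struct andn_flags` are hypothetical helpers introduced for illustration and are not part of any Intel header. AF and PF are undefined and are therefore not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags that ANDN defines; AF and PF are undefined and not modeled. */
struct andn_flags {
    bool sf, zf, of, cf;
};

/* Hypothetical software model of 32-bit ANDN: DEST = (NOT SRC1) AND SRC2. */
uint32_t andn32_model(uint32_t src1, uint32_t src2, struct andn_flags *flags)
{
    uint32_t dest = ~src1 & src2;

    if (flags) {
        flags->sf = ((dest >> 31) & 1u) != 0; /* SF = DEST[OperandSize-1] */
        flags->zf = (dest == 0);              /* ZF = (DEST = 0) */
        flags->of = false;                    /* OF is cleared */
        flags->cf = false;                    /* CF is cleared */
    }
    return dest;
}

/* Usage: clear in the second source every bit that is set in the first. */
void andn32_example(void)
{
    struct andn_flags f;
    uint32_t r = andn32_model(0x0000FF00u, 0x12345678u, &f);

    assert(r == 0x12340078u);
    assert(!f.zf && !f.sf && !f.of && !f.cf);
}
```

Note the operand order: the inverted value is the first source operand, so the mask of bits to clear goes first and the data goes second.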
Intel C/C++ Compiler Intrinsic Equivalent
Auto-generated from high-level language.
SIMD Floating-Point Exceptions
None
Other Exceptions
See Table 2-22.
BEXTR — Bit Field Extract
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDS¹.LZ.0F38.W0 F7 /r | BEXTR r32a, r/m32, r32b | A | V/V | BMI1 | Contiguous bitwise extract from r/m32 using r32b as control; store result in r32a.
VEX.NDS¹.LZ.0F38.W1 F7 /r | BEXTR r64a, r/m64, r64b | A | V/N.E. | BMI1 | Contiguous bitwise extract from r/m64 using r64b as control; store result in r64a.
NOTES:
1. ModRM:r/m is used to encode the first source operand (second operand) and VEX.vvvv encodes the second source operand (third operand).
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (W) | ModRM:r/m (R) | VEX.vvvv (R) | NA
Description
Extracts contiguous bits from the first source operand (the second operand) using an index value and a length value specified in the second source operand (the third operand). Bits 7:0 of the second source operand specify the starting bit position (START) of the extraction. A START value exceeding the operand size will not extract any bits from the first source operand. Bits 15:8 of the second source operand specify the maximum number of bits (LENGTH) to extract, beginning at the START position. Only bit positions up to (OperandSize - 1) of the first source operand are extracted. The extracted bits are written to the destination register, starting from the least significant bit. All higher-order bits in the destination operand (starting at bit position LENGTH) are zeroed. The destination register is cleared if no bits are extracted.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
Operation
START ← SRC2[7:0];
LEN ← SRC2[15:8];
TEMP ← ZERO_EXTEND_TO_512 (SRC1 );
DEST ← ZERO_EXTEND(TEMP[START+LEN -1: START]);
ZF ← (DEST = 0);
Flags Affected
ZF is updated based on the result. AF, SF, and PF are undefined. All other flags are cleared.
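The START/LEN decoding in the Operation section can be sketched in portable C. This is a hedged model of the 32-bit form under the assumption that the zero extension of SRC1 means any bit position at or above 32 reads as zero; `bextr32_model`, `bextr_control`, and `bextr32_example` are hypothetical names introduced here, not Intel-provided functions. Only ZF is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack START into bits 7:0 and LEN into bits 15:8. */
uint32_t bextr_control(uint32_t start, uint32_t len)
{
    return (start & 0xFFu) | ((len & 0xFFu) << 8);
}

/* Hypothetical software model of 32-bit BEXTR; writes ZF if zf is non-null. */
uint32_t bextr32_model(uint32_t src1, uint32_t src2, bool *zf)
{
    uint32_t start = src2 & 0xFFu;        /* START = SRC2[7:0]  */
    uint32_t len = (src2 >> 8) & 0xFFu;   /* LEN   = SRC2[15:8] */
    uint32_t dest = 0;

    /* Positions at or above bit 32 read as zero, so nothing is extracted
     * when START is out of range or LEN is zero. */
    if (start < 32 && len > 0) {
        dest = src1 >> start;
        if (len < 32)
            dest &= (UINT32_C(1) << len) - 1; /* keep the low LEN bits */
    }
    if (zf)
        *zf = (dest == 0);
    return dest;
}

/* Usage: extract the 12-bit field that starts at bit 8. */
void bextr32_example(void)
{
    bool zf;
    uint32_t r = bextr32_model(0x12345678u, bextr_control(8, 12), &zf);

    assert(r == 0x456u);
    assert(!zf);
}
```

The explicit range checks avoid shifting a 32-bit value by 32 or more, which is undefined behavior in C even though the instruction itself defines the result.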
Intel C/C++ Compiler Intrinsic Equivalent
BEXTR unsigned __int32 _bextr_u32(unsigned __int32 src, unsigned __int32 start, unsigned __int32 len);
BEXTR unsigned __int64 _bextr_u64(unsigned __int64 src, unsigned __int32 start, unsigned __int32 len);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Table 2-22.
BLSI — Extract Lowest Set Isolated Bit
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /3 | BLSI r32, r/m32 | A | V/V | BMI1 | Extract lowest set bit from r/m32 and set that bit in r32.
VEX.NDD.LZ.0F38.W1 F3 /3 | BLSI r64, r/m64 | A | V/N.E. | BMI1 | Extract lowest set bit from r/m64 and set that bit in r64.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | VEX.vvvv (W) | ModRM:r/m (R) | NA | NA
Description
Extracts the lowest set bit from the source operand and sets the corresponding bit in the destination register. All other bits in the destination operand are zeroed. If no bits are set in the source operand, BLSI sets all the bits in the destination to 0, sets ZF, and clears CF.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
Operation
temp ← (-SRC) bitwiseAND (SRC);
SF ← temp[OperandSize -1];
ZF ← (temp = 0);
IF SRC = 0
    CF ← 0;
ELSE
    CF ← 1;
FI
DEST ← temp;
Flags Affected
ZF and SF are updated based on the result. CF is set if the source is not zero. The OF flag is cleared. AF and PF flags are undefined.
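As an illustration of the Operation above, the following portable C sketch models the 32-bit form of BLSI, including the defined flags. The names `blsi32_model`, `blsi32_example`, and `struct blsi_flags` are hypothetical and exist only for this example; AF and PF are undefined and not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags that BLSI defines; AF and PF are undefined and not modeled. */
struct blsi_flags {
    bool sf, zf, cf, of;
};

/* Hypothetical software model of 32-bit BLSI: temp = (-SRC) AND SRC. */
uint32_t blsi32_model(uint32_t src, struct blsi_flags *flags)
{
    /* Unsigned subtraction from zero gives the two's-complement negation. */
    uint32_t temp = (uint32_t)(0u - src) & src;

    if (flags) {
        flags->sf = ((temp >> 31) & 1u) != 0; /* SF = temp[OperandSize-1] */
        flags->zf = (temp == 0);
        flags->cf = (src != 0);               /* CF = 0 only when SRC = 0 */
        flags->of = false;                    /* OF is cleared */
    }
    return temp;
}

/* Usage: 0x58 is binary 0101 1000, so the lowest set bit is 0x08. */
void blsi32_example(void)
{
    struct blsi_flags f;

    assert(blsi32_model(0x58u, &f) == 0x08u);
    assert(f.cf && !f.zf && !f.sf);

    assert(blsi32_model(0u, &f) == 0u);
    assert(f.zf && !f.cf);
}
```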
Intel C/C++ Compiler Intrinsic Equivalent
BLSI unsigned __int32 _blsi_u32(unsigned __int32 src);
BLSI unsigned __int64 _blsi_u64(unsigned __int64 src);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Table 2-22.
BLSMSK — Get Mask Up to Lowest Set Bit
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /2 | BLSMSK r32, r/m32 | A | V/V | BMI1 | Set all lower bits in r32 to “1” starting from bit 0 to lowest set bit in r/m32.
VEX.NDD.LZ.0F38.W1 F3 /2 | BLSMSK r64, r/m64 | A | V/N.E. | BMI1 | Set all lower bits in r64 to “1” starting from bit 0 to lowest set bit in r/m64.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | VEX.vvvv (W) | ModRM:r/m (R) | NA | NA
Description
Sets all the lower bits of the destination operand to “1” up to and including the lowest set bit (=1) in the source operand. If the source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
Operation
temp ← (SRC-1) XOR (SRC);
SF ← temp[OperandSize -1];
ZF ← 0;
IF SRC = 0
    CF ← 1;
ELSE
    CF ← 0;
FI
DEST ← temp;
Flags Affected
SF is updated based on the result. CF is set if the source is zero. ZF and OF flags are cleared. AF and PF flags are undefined.
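The XOR formulation in the Operation section can be checked with a small portable C model. This is a sketch of the 32-bit form only; `blsmsk32_model`, `blsmsk32_example`, and `struct blsmsk_flags` are hypothetical names chosen for this example. AF and PF are undefined and not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flags that BLSMSK defines; AF and PF are undefined and not modeled. */
struct blsmsk_flags {
    bool sf, zf, cf, of;
};

/* Hypothetical software model of 32-bit BLSMSK: temp = (SRC-1) XOR SRC. */
uint32_t blsmsk32_model(uint32_t src, struct blsmsk_flags *flags)
{
    /* Unsigned wraparound makes SRC-1 all ones when SRC is zero. */
    uint32_t temp = (src - 1u) ^ src;

    if (flags) {
        flags->sf = ((temp >> 31) & 1u) != 0; /* SF = temp[OperandSize-1] */
        flags->zf = false;                    /* ZF is always cleared */
        flags->cf = (src == 0);               /* CF = 1 only when SRC = 0 */
        flags->of = false;                    /* OF is cleared */
    }
    return temp;
}

/* Usage: 0x58 is binary 0101 1000, so bits 3:0 form the mask 0x0F. */
void blsmsk32_example(void)
{
    struct blsmsk_flags f;

    assert(blsmsk32_model(0x58u, &f) == 0x0Fu);
    assert(!f.cf && !f.zf && !f.sf);

    assert(blsmsk32_model(0u, &f) == 0xFFFFFFFFu);
    assert(f.cf && f.sf && !f.zf);
}
```

The result can never be zero because bit 0 of SRC-1 and SRC always differ, which is consistent with ZF being unconditionally cleared.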
Intel C/C++ Compiler Intrinsic Equivalent
BLSMSK unsigned __int32 _blsmsk_u32(unsigned __int32 src);
BLSMSK unsigned __int64 _blsmsk_u64(unsigned __int64 src);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Table 2-22.
BLSR - Reset Lowest Set Bit
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /1 | BLSR r32, r/m32 | A | V/V | BMI1 | Reset lowest set bit of r/m32, keep all other bits of r/m32 and write result to r32.
VEX.NDD.LZ.0F38.W1 F3 /1 | BLSR r64, r/m64 | A | V/N.E. | BMI1 | Reset lowest set bit of r/m64, keep all other bits of r/m64 and write result to r64.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | VEX.vvvv (W) | ModRM:r/m (R) | NA | NA
Description
Copies all bits from the source operand to the destination operand and resets (=0) the bit position in the destination operand that corresponds to the lowest set bit of the source operand. If the source operand is zero, BLSR sets CF.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
Operation
temp ← (SRC-1) bitwiseAND (SRC);
SF ← temp[OperandSize -1];
ZF ← (temp = 0);
IF SRC = 0
    CF ← 1;
ELSE
    CF ← 0;
FI
DEST ← temp;
Flags Affected
ZF and SF flags are updated based on the result. CF is set if the source is zero. OF flag is cleared. AF and PF flags are undefined.
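A portable C sketch of the Operation above follows, together with one common use of the same expression: counting set bits by clearing the lowest one until the value is zero. The names `blsr32_model`, `count_set_bits`, and `struct blsr_flags` are hypothetical and introduced only for illustration; only the 32-bit form is modeled, and AF and PF are left out because they are undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flags that BLSR defines; AF and PF are undefined and not modeled. */
struct blsr_flags {
    bool sf, zf, cf, of;
};

/* Hypothetical software model of 32-bit BLSR: temp = (SRC-1) AND SRC. */
uint32_t blsr32_model(uint32_t src, struct blsr_flags *flags)
{
    uint32_t temp = (src - 1u) & src;

    if (flags) {
        flags->sf = ((temp >> 31) & 1u) != 0; /* SF = temp[OperandSize-1] */
        flags->zf = (temp == 0);
        flags->cf = (src == 0);               /* CF = 1 only when SRC = 0 */
        flags->of = false;                    /* OF is cleared */
    }
    return temp;
}

/* Usage: each iteration clears one set bit, so the loop runs once per bit. */
unsigned count_set_bits(uint32_t x)
{
    unsigned n = 0;

    while (x != 0) {
        x = blsr32_model(x, NULL);
        n++;
    }
    return n;
}

/* 0x58 is binary 0101 1000; clearing bit 3 leaves 0x50. */
void blsr32_example(void)
{
    struct blsr_flags f;

    assert(blsr32_model(0x58u, &f) == 0x50u);
    assert(!f.cf && !f.zf);
    assert(count_set_bits(0x58u) == 3);
}
```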
Intel C/C++ Compiler Intrinsic Equivalent
BLSR unsigned __int32 _blsr_u32(unsigned __int32 src);
BLSR unsigned __int64 _blsr_u64(unsigned __int64 src);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Table 2-22.