
- Chapter 1 Intel® Advanced Vector Extensions
- 1.1 About This Document
- 1.2 Overview
- 1.3.2 Instruction Syntax Enhancements
- 1.3.3 VEX Prefix Instruction Encoding Support
- 1.4 Overview AVX2
- 1.5 Functional Overview
- 1.6 General Purpose Instruction Set Enhancements
- 2.1 Detection of PCLMULQDQ and AES Instructions
- 2.2 Detection of AVX and FMA Instructions
- 2.2.1 Detection of FMA
- 2.2.3 Detection of AVX2
- 2.3.1 FMA Instruction Operand Order and Arithmetic Behavior
- 2.4 Accessing YMM Registers
- 2.5 Memory Alignment
- 2.7 Instruction Exception Specification
- 2.7.1 Exceptions Type 1 (Aligned memory reference)
- 2.7.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned)
- 2.7.3 Exceptions Type 3 (<16 Byte memory argument)
- 2.7.5 Exceptions Type 5 (<16 Byte mem arg and no FP exceptions)
- 2.7.7 Exceptions Type 7 (No FP exceptions, no memory arg)
- 2.7.8 Exceptions Type 8 (AVX and no memory argument)
- 2.8.1 Clearing Upper YMM State Between AVX and Legacy SSE Instructions
- 2.8.3 Unaligned Memory Access and Buffer Size Management
- 2.9 CPUID Instruction
- 3.1 YMM State, VEX Prefix and Supported Operating Modes
- 3.2 YMM State Management
- 3.2.1 Detection of YMM State Support
- 3.2.2 Enabling of YMM State
- 3.2.4 The Layout of XSAVE Area
- 3.2.5 XSAVE/XRSTOR Interaction with YMM State and MXCSR
- 3.2.6 Processor Extended State Save Optimization and XSAVEOPT
- 3.2.6.1 XSAVEOPT Usage Guidelines
- 3.3 Reset Behavior
- 3.4 Emulation
- 4.1 Instruction Formats
- 4.1.1 VEX and the LOCK prefix
- 4.1.2 VEX and the 66H, F2H, and F3H prefixes
- 4.1.3 VEX and the REX prefix
- 4.1.4 The VEX Prefix
- 4.1.4.1 VEX Byte 0, bits[7:0]
- 4.1.4.2 VEX Byte 1, bit [7] - ‘R’
- 4.1.5 Instruction Operand Encoding and VEX.vvvv, ModR/M
- 4.1.6 The Opcode Byte
- 4.1.7 The MODRM, SIB, and Displacement Bytes
- 4.1.8 The Third Source Operand (Immediate Byte)
- 4.1.9.1 Vector Length Transition and Programming Considerations
- 4.1.10 AVX Instruction Length
- 4.2 Vector SIB (VSIB) Memory Addressing
- 4.3 VEX Encoding Support for GPR Instructions
- 5.1 Interpreting Instruction Reference Pages
- 5.1.1 Instruction Format
- 5.1.2 Opcode Column in the Instruction Summary Table
- 5.1.3 Instruction Column in the Instruction Summary Table
- 5.1.4 Operand Encoding Column in the Instruction Summary Table
- 5.1.5 64/32 bit Mode Support Column in the Instruction Summary Table
- 5.1.6 CPUID Support Column in the Instruction Summary Table
- 5.2 Summary of Terms
- 5.3 Instruction Set Reference
- MPSADBW - Multiple Sum of Absolute Differences
- PALIGNR - Byte Align
- PBLENDW - Blend Packed Words
- PHADDW/PHADDD - Packed Horizontal Add
- PHADDSW - Packed Horizontal Add with Saturation
- PHSUBW/PHSUBD - Packed Horizontal Subtract
- PHSUBSW - Packed Horizontal Subtract with Saturation
- PMOVSX - Packed Move with Sign Extend
- PMOVZX - Packed Move with Zero Extend
- PMULDQ - Multiply Packed Doubleword Integers
- PMULHRSW - Packed Multiply High with Round and Scale
- PMULHUW - Multiply Packed Unsigned Integers and Store High Result
- PMULHW - Multiply Packed Integers and Store High Result
- PMULLW/PMULLD - Multiply Packed Integers and Store Low Result
- PMULUDQ - Multiply Packed Unsigned Doubleword Integers
- POR - Bitwise Logical Or
- PSADBW - Compute Sum of Absolute Differences
- PSHUFB - Packed Shuffle Bytes
- PSHUFD - Shuffle Packed Doublewords
- PSHUFLW - Shuffle Packed Low Words
- PSIGNB/PSIGNW/PSIGND - Packed SIGN
- PSLLDQ - Byte Shift Left
- PSLLW/PSLLD/PSLLQ - Bit Shift Left
- PSRAW/PSRAD - Bit Shift Arithmetic Right
- PSRLDQ - Byte Shift Right
- PSRLW/PSRLD/PSRLQ - Shift Packed Data Right Logical
- PSUBB/PSUBW/PSUBD/PSUBQ - Packed Integer Subtract
- PSUBSB/PSUBSW - Subtract Packed Signed Integers with Signed Saturation
- PSUBUSB/PSUBUSW - Subtract Packed Unsigned Integers with Unsigned Saturation
- PXOR - Exclusive Or
- VPBLENDD - Blend Packed Dwords
- VPERMD - Full Doublewords Element Permutation
- VPERMPD - Permute Double-Precision Floating-Point Elements
- VPERMPS - Permute Single-Precision Floating-Point Elements
- VPERMQ - Qwords Element Permutation
- VPSLLVD/VPSLLVQ - Variable Bit Shift Left Logical
- VPSRAVD - Variable Bit Shift Right Arithmetic
- VPSRLVD/VPSRLVQ - Variable Bit Shift Right Logical
- VGATHERDPD/VGATHERQPD - Gather Packed DP FP Values Using Signed Dword/Qword Indices
- VGATHERDPS/VGATHERQPS - Gather Packed SP FP Values Using Signed Dword/Qword Indices
- VPGATHERDD/VPGATHERQD - Gather Packed Dword Values Using Signed Dword/Qword Indices
- VPGATHERDQ/VPGATHERQQ - Gather Packed Qword Values Using Signed Dword/Qword Indices
- 6.1 FMA Instruction Set Reference
- Chapter 7 Instruction Set Reference - VEX-Encoded GPR Instructions
- 7.1 Instruction Format
- 7.2 Instruction Set Reference
- BZHI - Zero High Bits Starting with Specified Bit Position
- INVPCID - Invalidate Processor Context ID
- Chapter 8 Post-32nm Processor Instructions
- 8.1 Overview
- 8.2 CPUID Detection of New Instructions
- 8.4 Vector Instruction Exception Specification
- 8.6 Using RDRAND Instruction and Intrinsic
- 8.7 Instruction Reference
- A.1 AVX Instructions
- A.2 Promoted Vector Integer Instructions in AVX2
- B.1 Using Opcode Tables
- B.2 Key to Abbreviations
- B.2.1 Codes for Addressing Method
- B.2.2 Codes for Operand Type
- B.2.3 Register Codes
- B.2.4 Opcode Look-up Examples for One, Two, and Three-Byte Opcodes
- B.2.4.1 One-Byte Opcode Instructions
- B.2.4.2 Two-Byte Opcode Instructions
- B.2.4.3 Three-Byte Opcode Instructions
- B.2.4.4 VEX Prefix Instructions
- B.2.5 Superscripts Utilized in Opcode Tables
- B.3 One, Two, and Three-Byte Opcode Maps
- B.4.1 Opcode Look-up Examples Using Opcode Extensions
- B.4.2 Opcode Extension Tables
- B.5 Escape Opcode Instructions
- B.5.1 Opcode Look-up Examples for Escape Instruction Opcodes
- B.5.2 Escape Opcode Instruction Tables
- B.5.2.1 Escape Opcodes with D8 as First Byte
- B.5.2.2 Escape Opcodes with D9 as First Byte
- B.5.2.3 Escape Opcodes with DA as First Byte
- B.5.2.4 Escape Opcodes with DB as First Byte
- B.5.2.5 Escape Opcodes with DC as First Byte
- B.5.2.6 Escape Opcodes with DD as First Byte
- B.5.2.7 Escape Opcodes with DE as First Byte
- B.5.2.8 Escape Opcodes with DF as First Byte
INSTRUCTION SET REFERENCE
for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AVX/FP16/RDRAND/AVX2/BMI1/BMI2/LZCNT support) that indicate processor support for the instruction. If the corresponding flag is ‘0’, the instruction generates #UD.
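As an illustration of how such a feature check gates an AVX2 instruction, the following is a minimal sketch that only tests bits. It assumes the caller has already obtained CPUID.(EAX=01H):ECX, CPUID.(EAX=07H,ECX=0):EBX and the XCR0 value returned by XGETBV; the function name `avx2_usable` is hypothetical. It follows the commonly documented AVX2 recipe: OSXSAVE set, XCR0[2:1] = 11b, and the AVX2 flag (bit 5) set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decides whether AVX2 instructions may be executed,
   given register values the caller has already read with CPUID and XGETBV.
   It performs no CPUID/XGETBV itself, so it can be exercised on any host. */
static bool avx2_usable(uint32_t cpuid1_ecx,  /* CPUID.(EAX=01H):ECX */
                        uint32_t cpuid7_ebx,  /* CPUID.(EAX=07H,ECX=0):EBX */
                        uint64_t xcr0)        /* XCR0 as returned by XGETBV */
{
    bool osxsave = (cpuid1_ecx >> 27) & 1u;    /* OS has enabled XGETBV/XSAVE */
    bool ymm_enabled = (xcr0 & 0x6u) == 0x6u;  /* XCR0[2:1] = 11b: XMM and YMM state */
    bool avx2 = (cpuid7_ebx >> 5) & 1u;        /* AVX2 feature flag */
    /* If any check fails, executing an AVX2 instruction would raise #UD. */
    return osxsave && ymm_enabled && avx2;
}
```

Keeping the CPUID and XGETBV reads out of the function is a design choice that makes the decision logic testable without AVX2 hardware; a real dispatcher would collect the three values first.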
5.2 SUMMARY OF TERMS
•“Legacy SSE”: Refers to SSE, SSE2, SSE3, SSSE3, SSE4, and any future instruction sets referencing XMM registers and encoded without a VEX prefix.
•XGETBV, XSETBV, XSAVE, XRSTOR are defined in IA-32 Intel Architecture Software Developer’s Manual, Volume 3A and Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.
•VEX: refers to a two-byte or three-byte prefix. AVX and FMA instructions are encoded using a VEX prefix.
•VEX.vvvv: the VEX bitfield specifying a source or destination register (in 1’s complement form).
•rm_field: shorthand for the ModR/M r/m field and any REX.B.
•reg_field: shorthand for the ModR/M reg field and any REX.R.
•VLMAX: the maximum vector register width pertaining to the instruction. This is not the vector-length encoding in the instruction's prefix; it is determined by the current value of XCR0 (also referred to as XFEATURE_ENABLED_MASK). For existing processors, VLMAX is 256 whenever XFEATURE_ENABLED_MASK.YMM[bit 2] is 1. Future processors may define new bits in XFEATURE_ENABLED_MASK whose setting may imply other values for VLMAX.
VLMAX Definition

| XCR0 Component | VLMAX |
|---|---|
| XCR0.YMM | 256 |
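To make the VEX.vvvv and reg_field terms concrete, the sketch below decodes the byte that follows C5H in a two-byte VEX prefix. It assumes the standard two-byte layout: bit 7 holds R inverted, bits 6:3 hold vvvv inverted (1’s complement), bit 2 is VEX.L, and bits 1:0 are pp. The struct and function names are hypothetical. For instance, in the encoding C5 F1 FE C2 (VPADDD xmm0, xmm1, xmm2), byte F1H decodes to vvvv = 1 (xmm1), L = 0 and pp = 01b (66H). Note that VEX.L is the per-instruction vector length, which is distinct from VLMAX.

```c
#include <stdint.h>

/* Hypothetical container for the fields of byte 1 of a two-byte VEX prefix
   (the byte that follows C5H). */
struct vex2_fields {
    unsigned r;     /* VEX.R after un-inverting (extends ModR/M reg to 4 bits) */
    unsigned vvvv;  /* register number after un-inverting the 1's complement */
    unsigned l;     /* VEX.L: 0 = 128-bit, 1 = 256-bit vector length */
    unsigned pp;    /* implied prefix: 0 = none, 1 = 66H, 2 = F3H, 3 = F2H */
};

static struct vex2_fields decode_vex2_byte1(uint8_t b)
{
    uint8_t inv = (uint8_t)~b;  /* R and vvvv are stored inverted */
    struct vex2_fields f;
    f.r    = (inv >> 7) & 1u;
    f.vvvv = (inv >> 3) & 0xFu;
    f.l    = (b >> 2) & 1u;     /* L and pp are stored as-is */
    f.pp   = b & 3u;
    return f;
}

/* reg_field: ModR/M reg (bits 5:3) extended with R to select registers 0-15. */
static unsigned reg_field(unsigned r, uint8_t modrm)
{
    return (r << 3) | ((modrm >> 3) & 7u);
}
```

With byte 71H instead of F1H, the decoded R becomes 1, so the same ModR/M byte C2H selects register 8 rather than register 0 as the destination.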
5.3 INSTRUCTION SET REFERENCE
<AVX2 instructions are listed below>
Ref. # 319433-011 |
5-7 |
MPSADBW - Multiple Sum of Absolute Differences
| Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
|---|---|---|---|---|---|
| 66 0F3A 42 /r ib | MPSADBW xmm1, xmm2/m128, imm8 | A | V/V | SSE4_1 | Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in xmm1 and xmm2/m128 and writes the results in xmm1. Starting offsets within xmm1 and xmm2/m128 are determined by imm8. |
| VEX.NDS.128.66.0F3A.WIG 42 /r ib | VMPSADBW xmm1, xmm2, xmm3/m128, imm8 | B | V/V | AVX | Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in xmm2 and xmm3/m128 and writes the results in xmm1. Starting offsets within xmm2 and xmm3/m128 are determined by imm8. |
| VEX.NDS.256.66.0F3A.WIG 42 /r ib | VMPSADBW ymm1, ymm2, ymm3/m256, imm8 | B | V/V | AVX2 | Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in ymm2 and ymm3/m256 and writes the results in ymm1. Starting offsets within ymm2 and ymm3/m256 are determined by imm8. |
Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
| B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA |
Description
(V)MPSADBW sums the absolute differences between 4 unsigned bytes (block_2) in the second source operand and sequential groups of 4 unsigned bytes (block_1) in the first source operand. The immediate byte provides bit fields that specify the initial offset of block_1 within the first source operand and the offset of block_2 within the second source operand. The offset granularity in both source operands is 32 bits. The sum-absolute-difference (SAD) operation is repeated 8 times for (V)MPSADBW between the same block_2 (fixed offset within the second source operand) and a variable block_1 (offset is shifted by 8 bits for each SAD operation) in the first source operand. Each 16-bit result of the eight SAD operations is written to the respective word in the destination operand.
128-bit Legacy SSE version: Imm8[1:0]*32 specifies the bit offset of block_2 within the second source operand. Imm8[2]*32 specifies the initial bit offset of block_1 within the first source operand. The first source operand and destination operand are the same. The first source and destination operands are XMM registers. The second source operand is either an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM destination register remain unchanged. Bits 7:3 of the immediate byte are ignored.
VEX.128 encoded version: Imm8[1:0]*32 specifies the bit offset of block_2 within the second source operand. Imm8[2]*32 specifies the initial bit offset of block_1 within the first source operand. The first source and destination operands are XMM registers. The second source operand is either an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed. Bits 7:3 of the immediate byte are ignored.
VEX.256 encoded version: The sum-absolute-difference (SAD) operation is repeated 8 times for VMPSADBW between the same block_2 (fixed offset within the second source operand) and a variable block_1 (offset is shifted by 8 bits for each SAD operation) in the first source operand, with the offsets of block_2 and block_1 selected by Imm8[1:0] and Imm8[2] as in the 128-bit forms. Each 16-bit result of the eight SAD operations between block_2 and block_1 is written to the respective word in the lower 128 bits of the destination operand.
Additionally, VMPSADBW performs another eight SAD operations on block_4 of the second source operand and block_3 of the first source operand. (Imm8[4:3]*32 + 128) specifies the bit offset of block_4 within the second source operand. (Imm8[5]*32 + 128) specifies the initial bit offset of block_3 within the first source operand. Each 16-bit result of the eight SAD operations between block_4 and block_3 is written to the respective word in the upper 128 bits of the destination operand.
The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. Bits 7:6 of the immediate byte are ignored.
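A scalar reference model can help check the offset arithmetic described above. The sketch below is written in C and assumes register contents are held as little-endian byte arrays; the function names are hypothetical, and it models only the arithmetic result, not exceptions or memory-operand behavior. Bit offsets are converted to byte offsets (Imm8[1:0]*32 bits equals Imm8[1:0]*4 bytes).

```c
#include <stdint.h>
#include <stdlib.h>

/* One 128-bit lane of (V)MPSADBW. sel is this lane's 3-bit selector:
   bits 1:0 choose the 4-byte block in src2, bit 2 chooses the starting
   dword of the sliding block in src1. */
static void mpsadbw_lane(const uint8_t src1[16], const uint8_t src2[16],
                         unsigned sel, uint16_t dst[8])
{
    unsigned off2 = (sel & 3u) * 4u;         /* Imm8[1:0]*32 bits, in bytes */
    unsigned off1 = ((sel >> 2) & 1u) * 4u;  /* Imm8[2]*32 bits, in bytes */
    for (unsigned j = 0; j < 8; j++) {       /* eight SADs; block_1 slides by one byte */
        unsigned sum = 0;
        for (unsigned k = 0; k < 4; k++)
            sum += (unsigned)abs((int)src1[off1 + j + k] - (int)src2[off2 + k]);
        dst[j] = (uint16_t)sum;
    }
}

/* 128-bit forms: only Imm8[2:0] is used. */
static void mpsadbw128(const uint8_t src1[16], const uint8_t src2[16],
                       uint8_t imm8, uint16_t dst[8])
{
    mpsadbw_lane(src1, src2, imm8 & 7u, dst);
}

/* VEX.256 form: the lower lane uses Imm8[2:0], the upper lane Imm8[5:3]. */
static void vmpsadbw256(const uint8_t src1[32], const uint8_t src2[32],
                        uint8_t imm8, uint16_t dst[16])
{
    mpsadbw_lane(src1, src2, imm8 & 7u, dst);
    mpsadbw_lane(src1 + 16, src2 + 16, (imm8 >> 3) & 7u, dst + 8);
}
```

Because block_1 starts at byte 0 or byte 4 and slides across eight positions with a 4-byte window, the highest byte read within a lane is byte 14, which corresponds to SRC1_BYTE10 at the largest SRC1_OFFSET in the Operation section.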
Figure 5-1. VMPSADBW Operation (lower 128 bits: Src2 block at bit offset Imm[1:0]*32 and Src1 starting at bit offset Imm[2]*32 produce eight 16-bit results in Destination[127:0]; upper 128 bits: Src2 block at bit offset Imm[4:3]*32+128 and Src1 starting at bit offset Imm[5]*32+128 produce eight 16-bit results in Destination[255:128])
Operation
VMPSADBW (VEX.256 encoded version)
SRC2_OFFSET ← imm8[1:0]*32
SRC1_OFFSET ← imm8[2]*32
SRC1_BYTE0 ← SRC1[SRC1_OFFSET+7:SRC1_OFFSET]
SRC1_BYTE1 ← SRC1[SRC1_OFFSET+15:SRC1_OFFSET+8]
SRC1_BYTE2 ← SRC1[SRC1_OFFSET+23:SRC1_OFFSET+16]
SRC1_BYTE3 ← SRC1[SRC1_OFFSET+31:SRC1_OFFSET+24]
SRC1_BYTE4 ← SRC1[SRC1_OFFSET+39:SRC1_OFFSET+32]
SRC1_BYTE5 ← SRC1[SRC1_OFFSET+47:SRC1_OFFSET+40]
SRC1_BYTE6 ← SRC1[SRC1_OFFSET+55:SRC1_OFFSET+48]
SRC1_BYTE7 ← SRC1[SRC1_OFFSET+63:SRC1_OFFSET+56]
SRC1_BYTE8 ← SRC1[SRC1_OFFSET+71:SRC1_OFFSET+64]
SRC1_BYTE9 ← SRC1[SRC1_OFFSET+79:SRC1_OFFSET+72]
SRC1_BYTE10 ← SRC1[SRC1_OFFSET+87:SRC1_OFFSET+80]
SRC2_BYTE0 ← SRC2[SRC2_OFFSET+7:SRC2_OFFSET]
SRC2_BYTE1 ← SRC2[SRC2_OFFSET+15:SRC2_OFFSET+8]
SRC2_BYTE2 ← SRC2[SRC2_OFFSET+23:SRC2_OFFSET+16]
SRC2_BYTE3 ← SRC2[SRC2_OFFSET+31:SRC2_OFFSET+24]
TEMP0 ← ABS(SRC1_BYTE0 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE1 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE2 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE3 - SRC2_BYTE3)
DEST[15:0] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE1 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE2 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE3 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE4 - SRC2_BYTE3)
DEST[31:16] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE2 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE3 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE4 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE5 - SRC2_BYTE3)
DEST[47:32] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE3 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE4 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE5 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE6 - SRC2_BYTE3)
DEST[63:48] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE4 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE5 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE6 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE7 - SRC2_BYTE3)
DEST[79:64] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE5 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE6 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE7 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE8 - SRC2_BYTE3)
DEST[95:80] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE6 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE7 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE8 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE9 - SRC2_BYTE3)
DEST[111:96] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE7 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE8 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE9 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE10 - SRC2_BYTE3)
DEST[127:112] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
SRC2_OFFSET ← imm8[4:3]*32 + 128
SRC1_OFFSET ← imm8[5]*32 + 128
SRC1_BYTE0 ← SRC1[SRC1_OFFSET+7:SRC1_OFFSET]
SRC1_BYTE1 ← SRC1[SRC1_OFFSET+15:SRC1_OFFSET+8]
SRC1_BYTE2 ← SRC1[SRC1_OFFSET+23:SRC1_OFFSET+16]
SRC1_BYTE3 ← SRC1[SRC1_OFFSET+31:SRC1_OFFSET+24]
SRC1_BYTE4 ← SRC1[SRC1_OFFSET+39:SRC1_OFFSET+32]
SRC1_BYTE5 ← SRC1[SRC1_OFFSET+47:SRC1_OFFSET+40]
SRC1_BYTE6 ← SRC1[SRC1_OFFSET+55:SRC1_OFFSET+48]
SRC1_BYTE7 ← SRC1[SRC1_OFFSET+63:SRC1_OFFSET+56]
SRC1_BYTE8 ← SRC1[SRC1_OFFSET+71:SRC1_OFFSET+64]
SRC1_BYTE9 ← SRC1[SRC1_OFFSET+79:SRC1_OFFSET+72]
SRC1_BYTE10 ← SRC1[SRC1_OFFSET+87:SRC1_OFFSET+80]
SRC2_BYTE0 ← SRC2[SRC2_OFFSET+7:SRC2_OFFSET]
SRC2_BYTE1 ← SRC2[SRC2_OFFSET+15:SRC2_OFFSET+8]
SRC2_BYTE2 ← SRC2[SRC2_OFFSET+23:SRC2_OFFSET+16]
SRC2_BYTE3 ← SRC2[SRC2_OFFSET+31:SRC2_OFFSET+24]
TEMP0 ← ABS(SRC1_BYTE0 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE1 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE2 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE3 - SRC2_BYTE3)
DEST[143:128] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE1 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE2 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE3 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE4 - SRC2_BYTE3)
DEST[159:144] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE2 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE3 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE4 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE5 - SRC2_BYTE3)
DEST[175:160] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE3 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE4 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE5 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE6 - SRC2_BYTE3)
DEST[191:176] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE4 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE5 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE6 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE7 - SRC2_BYTE3)
DEST[207:192] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE5 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE6 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE7 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE8 - SRC2_BYTE3)
DEST[223:208] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE6 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE7 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE8 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE9 - SRC2_BYTE3)
DEST[239:224] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE7 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE8 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE9 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE10 - SRC2_BYTE3)
DEST[255:240] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
VMPSADBW (VEX.128 encoded version)
SRC2_OFFSET ← imm8[1:0]*32
SRC1_OFFSET ← imm8[2]*32
SRC1_BYTE0 ← SRC1[SRC1_OFFSET+7:SRC1_OFFSET]
SRC1_BYTE1 ← SRC1[SRC1_OFFSET+15:SRC1_OFFSET+8]
SRC1_BYTE2 ← SRC1[SRC1_OFFSET+23:SRC1_OFFSET+16]
SRC1_BYTE3 ← SRC1[SRC1_OFFSET+31:SRC1_OFFSET+24]
SRC1_BYTE4 ← SRC1[SRC1_OFFSET+39:SRC1_OFFSET+32]
SRC1_BYTE5 ← SRC1[SRC1_OFFSET+47:SRC1_OFFSET+40]
SRC1_BYTE6 ← SRC1[SRC1_OFFSET+55:SRC1_OFFSET+48]
SRC1_BYTE7 ← SRC1[SRC1_OFFSET+63:SRC1_OFFSET+56]
SRC1_BYTE8 ← SRC1[SRC1_OFFSET+71:SRC1_OFFSET+64]
SRC1_BYTE9 ← SRC1[SRC1_OFFSET+79:SRC1_OFFSET+72]
SRC1_BYTE10 ← SRC1[SRC1_OFFSET+87:SRC1_OFFSET+80]
SRC2_BYTE0 ← SRC2[SRC2_OFFSET+7:SRC2_OFFSET]
SRC2_BYTE1 ← SRC2[SRC2_OFFSET+15:SRC2_OFFSET+8]
SRC2_BYTE2 ← SRC2[SRC2_OFFSET+23:SRC2_OFFSET+16]
SRC2_BYTE3 ← SRC2[SRC2_OFFSET+31:SRC2_OFFSET+24]
TEMP0 ← ABS(SRC1_BYTE0 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE1 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE2 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE3 - SRC2_BYTE3)
DEST[15:0] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE1 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE2 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE3 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE4 - SRC2_BYTE3)
DEST[31:16] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE2 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE3 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE4 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE5 - SRC2_BYTE3)
DEST[47:32] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE3 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE4 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE5 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE6 - SRC2_BYTE3)
DEST[63:48] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE4 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE5 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE6 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE7 - SRC2_BYTE3)
DEST[79:64] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE5 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE6 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE7 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE8 - SRC2_BYTE3)
DEST[95:80] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE6 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE7 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE8 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE9 - SRC2_BYTE3)
DEST[111:96] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(SRC1_BYTE7 - SRC2_BYTE0)
TEMP1 ← ABS(SRC1_BYTE8 - SRC2_BYTE1)
TEMP2 ← ABS(SRC1_BYTE9 - SRC2_BYTE2)
TEMP3 ← ABS(SRC1_BYTE10 - SRC2_BYTE3)
DEST[127:112] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
DEST[VLMAX:128] ← 0
MPSADBW (128-bit Legacy SSE version)
SRC_OFFSET ← imm8[1:0]*32
DEST_OFFSET ← imm8[2]*32
DEST_BYTE0 ← DEST[DEST_OFFSET+7:DEST_OFFSET]
DEST_BYTE1 ← DEST[DEST_OFFSET+15:DEST_OFFSET+8]
DEST_BYTE2 ← DEST[DEST_OFFSET+23:DEST_OFFSET+16]
DEST_BYTE3 ← DEST[DEST_OFFSET+31:DEST_OFFSET+24]
DEST_BYTE4 ← DEST[DEST_OFFSET+39:DEST_OFFSET+32]
DEST_BYTE5 ← DEST[DEST_OFFSET+47:DEST_OFFSET+40]
DEST_BYTE6 ← DEST[DEST_OFFSET+55:DEST_OFFSET+48]
DEST_BYTE7 ← DEST[DEST_OFFSET+63:DEST_OFFSET+56]
DEST_BYTE8 ← DEST[DEST_OFFSET+71:DEST_OFFSET+64]
DEST_BYTE9 ← DEST[DEST_OFFSET+79:DEST_OFFSET+72]
DEST_BYTE10 ← DEST[DEST_OFFSET+87:DEST_OFFSET+80]
SRC_BYTE0 ← SRC[SRC_OFFSET+7:SRC_OFFSET]
SRC_BYTE1 ← SRC[SRC_OFFSET+15:SRC_OFFSET+8]
SRC_BYTE2 ← SRC[SRC_OFFSET+23:SRC_OFFSET+16]
SRC_BYTE3 ← SRC[SRC_OFFSET+31:SRC_OFFSET+24]
TEMP0 ← ABS(DEST_BYTE0 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE1 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE2 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE3 - SRC_BYTE3)
DEST[15:0] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE1 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE2 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE3 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE4 - SRC_BYTE3)
DEST[31:16] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE2 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE3 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE4 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE5 - SRC_BYTE3)
DEST[47:32] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE3 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE4 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE5 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE6 - SRC_BYTE3)
DEST[63:48] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE4 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE5 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE6 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE7 - SRC_BYTE3)
DEST[79:64] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE5 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE6 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE7 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE8 - SRC_BYTE3)
DEST[95:80] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE6 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE7 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE8 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE9 - SRC_BYTE3)
DEST[111:96] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
TEMP0 ← ABS(DEST_BYTE7 - SRC_BYTE0)
TEMP1 ← ABS(DEST_BYTE8 - SRC_BYTE1)
TEMP2 ← ABS(DEST_BYTE9 - SRC_BYTE2)
TEMP3 ← ABS(DEST_BYTE10 - SRC_BYTE3)
DEST[127:112] ← TEMP0 + TEMP1 + TEMP2 + TEMP3
DEST[VLMAX:128] (Unmodified)
Intel C/C++ Compiler Intrinsic Equivalent
(V)MPSADBW __m128i _mm_mpsadbw_epu8 (__m128i s1, __m128i s2, const int mask);
VMPSADBW __m256i _mm256_mpsadbw_epu8 (__m256i s1, __m256i s2, const int mask);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4
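The legacy SSE and VEX.128 forms of MPSADBW compute the same 128-bit result; they differ only in what happens to bits 255:128 of the destination YMM register. The following minimal sketch models that difference, assuming VLMAX = 256 and a hypothetical byte-array image of one YMM register (`ymm_reg` and `commit_xmm_result` are illustrative names, not part of any real API).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 256-bit register image: bytes[0..15] are the XMM part,
   bytes[16..31] are bits 255:128. */
typedef struct { uint8_t bytes[32]; } ymm_reg;

/* Write a 128-bit result into a YMM register image.
   vex_encoded != 0 models a VEX.128 instruction: bits 255:128 are zeroed.
   vex_encoded == 0 models a legacy SSE instruction: bits 255:128 are unmodified. */
static void commit_xmm_result(ymm_reg *dst, const uint8_t result[16], int vex_encoded)
{
    memcpy(dst->bytes, result, 16);
    if (vex_encoded)
        memset(dst->bytes + 16, 0, 16);
}
```

The same rule applies to the other instructions on these pages whose descriptions distinguish a "128-bit Legacy SSE version" (upper bits unmodified) from a "VEX.128 encoded version" (upper bits zeroed).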
PABSB/PABSW/PABSD - Packed Absolute Value
| Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
|---|---|---|---|---|---|
| 66 0F 38 1C /r | PABSB xmm1, xmm2/m128 | A | V/V | SSSE3 | Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1. |
| 66 0F 38 1D /r | PABSW xmm1, xmm2/m128 | A | V/V | SSSE3 | Compute the absolute value of 16-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. |
| 66 0F 38 1E /r | PABSD xmm1, xmm2/m128 | A | V/V | SSSE3 | Compute the absolute value of 32-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. |
| VEX.128.66.0F38.WIG 1C /r | VPABSB xmm1, xmm2/m128 | A | V/V | AVX | Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1. |
| VEX.128.66.0F38.WIG 1D /r | VPABSW xmm1, xmm2/m128 | A | V/V | AVX | Compute the absolute value of 16-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. |
| VEX.128.66.0F38.WIG 1E /r | VPABSD xmm1, xmm2/m128 | A | V/V | AVX | Compute the absolute value of 32-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. |
| VEX.256.66.0F38.WIG 1C /r | VPABSB ymm1, ymm2/m256 | A | V/V | AVX2 | Compute the absolute value of bytes in ymm2/m256 and store UNSIGNED result in ymm1. |
| VEX.256.66.0F38.WIG 1D /r | VPABSW ymm1, ymm2/m256 | A | V/V | AVX2 | Compute the absolute value of 16-bit integers in ymm2/m256 and store UNSIGNED result in ymm1. |
| VEX.256.66.0F38.WIG 1E /r | VPABSD ymm1, ymm2/m256 | A | V/V | AVX2 | Compute the absolute value of 32-bit integers in ymm2/m256 and store UNSIGNED result in ymm1. |
Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| A | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Description
(V)PABSB/W/D computes the absolute value of each data element of the source operand (the second operand) and stores the UNSIGNED results in the destination operand (the first operand). (V)PABSB operates on signed bytes, (V)PABSW operates on signed 16-bit words, and (V)PABSD operates on signed 32-bit integers. The source operand can be an XMM register, a YMM register, a 128-bit memory location, or a 256-bit memory location. The destination operand can be an XMM or a YMM register.
VEX.256 encoded version: The source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register, and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
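The UNSIGNED result matters at the most negative input: ABS(-128) is 128 (80H), which does not fit in a signed byte but is exact as an unsigned byte. The following is a minimal scalar sketch of the per-element behavior in C; the function names are hypothetical, and each input is widened before negation so that the most negative value does not overflow in C.

```c
#include <stdint.h>

/* Per-element models of (V)PABSB/W/D: signed input, UNSIGNED result. */
static uint8_t pabsb_elem(int8_t x)
{
    int v = x;                          /* widen so -(-128) is representable */
    return (uint8_t)(v < 0 ? -v : v);
}

static uint16_t pabsw_elem(int16_t x)
{
    int32_t v = x;
    return (uint16_t)(v < 0 ? -v : v);
}

static uint32_t pabsd_elem(int32_t x)
{
    int64_t v = x;                      /* -(INT32_MIN) needs 64 bits */
    return (uint32_t)(v < 0 ? -v : v);
}

/* Whole-register byte form: n = 16 for (V)PABSB xmm, 32 for VPABSB ymm. */
static void pabsb(const int8_t *src, uint8_t *dst, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = pabsb_elem(src[i]);
}
```

A caller that reinterprets the result as signed would see -128 again for an input of -128, which is why the description stresses that the results are UNSIGNED.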
Operation
PABSB with 128 bit operands:
Unsigned DEST[7:0] ← ABS(SRC[7:0])
Repeat operation for 2nd through 15th bytes
Unsigned DEST[127:120] ← ABS(SRC[127:120])
DEST[VLMAX:128] (Unmodified)
VPABSB with 128 bit operands:
Unsigned DEST[7:0] ← ABS(SRC[7:0])
Repeat operation for 2nd through 15th bytes
Unsigned DEST[127:120] ← ABS(SRC[127:120])
DEST[VLMAX:128] ← 0
VPABSB with 256 bit operands:
Unsigned DEST[7:0] ← ABS(SRC[7:0])
Repeat operation for 2nd through 31st bytes
Unsigned DEST[255:248] ← ABS(SRC[255:248])
PABSW with 128 bit operands:
Unsigned DEST[15:0] ← ABS(SRC[15:0])
Repeat operation for 2nd through 7th 16-bit words
Unsigned DEST[127:112] ← ABS(SRC[127:112])
DEST[VLMAX:128] (Unmodified)
VPABSW with 128 bit operands:
Unsigned DEST[15:0] ← ABS(SRC[15:0])
Repeat operation for 2nd through 7th 16-bit words
Unsigned DEST[127:112] ← ABS(SRC[127:112])
DEST[VLMAX:128] ← 0
VPABSW with 256 bit operands:
Unsigned DEST[15:0] ← ABS(SRC[15:0])
Repeat operation for 2nd through 15th 16-bit words
Unsigned DEST[255:240] ← ABS(SRC[255:240])
PABSD with 128 bit operands:
Unsigned DEST[31:0] ← ABS(SRC[31:0])
Repeat operation for 2nd through 3rd 32-bit double words
Unsigned DEST[127:96] ← ABS(SRC[127:96])
DEST[VLMAX:128] (Unmodified)
VPABSD with 128 bit operands:
Unsigned DEST[31:0] ← ABS(SRC[31:0])
Repeat operation for 2nd through 3rd 32-bit double words
Unsigned DEST[127:96] ← ABS(SRC[127:96])
DEST[VLMAX:128] ← 0
VPABSD with 256 bit operands:
Unsigned DEST[31:0] ← ABS(SRC[31:0])
Repeat operation for 2nd through 7th 32-bit double words
Unsigned DEST[255:224] ← ABS(SRC[255:224])
Intel C/C++ Compiler Intrinsic Equivalent
PABSB __m128i _mm_abs_epi8 (__m128i a)
VPABSB __m128i _mm_abs_epi8 (__m128i a)
VPABSB __m256i _mm256_abs_epi8 (__m256i a)
PABSW __m128i _mm_abs_epi16 (__m128i a)
VPABSW __m128i _mm_abs_epi16 (__m128i a)
VPABSW __m256i _mm256_abs_epi16 (__m256i a)
PABSD __m128i _mm_abs_epi32 (__m128i a)
VPABSD __m128i _mm_abs_epi32 (__m128i a)
VPABSD __m256i _mm256_abs_epi32 (__m256i a)
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4
PACKSSWB/PACKSSDW - Pack with Signed Saturation
| Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
|---|---|---|---|---|---|
| 66 0F 63 /r | PACKSSWB xmm1, xmm2/m128 | A | V/V | SSE2 | Converts 8 packed signed word integers from xmm1 and from xmm2/m128 into 16 packed signed byte integers in xmm1 using signed saturation. |
| 66 0F 6B /r | PACKSSDW xmm1, xmm2/m128 | A | V/V | SSE2 | Converts 4 packed signed doubleword integers from xmm1 and from xmm2/m128 into 8 packed signed word integers in xmm1 using signed saturation. |
| VEX.NDS.128.66.0F.WIG 63 /r | VPACKSSWB xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Converts 8 packed signed word integers from xmm2 and from xmm3/m128 into 16 packed signed byte integers in xmm1 using signed saturation. |
| VEX.NDS.128.66.0F.WIG 6B /r | VPACKSSDW xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Converts 4 packed signed doubleword integers from xmm2 and from xmm3/m128 into 8 packed signed word integers in xmm1 using signed saturation. |
| VEX.NDS.256.66.0F.WIG 63 /r | VPACKSSWB ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Converts 16 packed signed word integers from ymm2 and from ymm3/m256 into 32 packed signed byte integers in ymm1 using signed saturation. |
| VEX.NDS.256.66.0F.WIG 6B /r | VPACKSSDW ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Converts 8 packed signed doubleword integers from ymm2 and from ymm3/m256 into 16 packed signed word integers in ymm1 using signed saturation. |
Instruction Operand Encoding

| Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
|---|---|---|---|---|
| A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
| B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA |
Description
The PACKSSWB or VPACKSSWB instruction converts 8 or 16 signed word integers from the first source operand and 8 or 16 signed word integers from the second source operand into 16 or 32 signed byte integers and stores the result in the destination operand. If a signed word integer value is beyond the range of a signed byte integer (that is, greater than 7FH (127) for a positive integer or less than -128 (80H as a signed byte) for a negative integer), the saturated signed byte integer value of 7FH or 80H, respectively, is stored in the destination.
The PACKSSDW or VPACKSSDW instruction packs 4 or 8 signed doublewords from the first source operand and 4 or 8 signed doublewords from the second source operand into 8 or 16 signed words in the destination operand. If a signed doubleword integer value is beyond the range of a signed word (that is, greater than 7FFFH (32767) for a positive integer or less than -32768 (8000H as a signed word) for a negative integer), the saturated signed word integer value of 7FFFH or 8000H, respectively, is stored into the destination.
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register. Packing is performed independently within each 128-bit lane: the low 128 bits of the destination are packed from the low 128 bits of both sources, and the high 128 bits from the high 128 bits of both sources.
VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
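The saturation rules and the per-lane ordering of the 256-bit form can be checked with a small scalar model. This is a minimal sketch in C with hypothetical function names; it assumes element 0 is the lowest-addressed element and models results only.

```c
#include <stdint.h>

/* Signed saturation helpers matching the ranges given in the description. */
static int8_t sat_s16_to_s8(int16_t v)
{
    if (v > 127) return 127;          /* clamp to 7FH */
    if (v < -128) return -128;        /* clamp to 80H */
    return (int8_t)v;
}

static int16_t sat_s32_to_s16(int32_t v)
{
    if (v > 32767) return 32767;      /* clamp to 7FFFH */
    if (v < -32768) return -32768;    /* clamp to 8000H */
    return (int16_t)v;
}

/* One 128-bit lane of (V)PACKSSWB: 8 words of src1, then 8 words of src2. */
static void packsswb_lane(const int16_t src1[8], const int16_t src2[8], int8_t dst[16])
{
    for (int i = 0; i < 8; i++) {
        dst[i]     = sat_s16_to_s8(src1[i]);
        dst[i + 8] = sat_s16_to_s8(src2[i]);
    }
}

/* VEX.256 VPACKSSWB: each 128-bit lane is packed independently, so the
   result is src1.lo, src2.lo, src1.hi, src2.hi rather than src1 then src2. */
static void vpacksswb256(const int16_t src1[16], const int16_t src2[16], int8_t dst[32])
{
    packsswb_lane(src1, src2, dst);
    packsswb_lane(src1 + 8, src2 + 8, dst + 16);
}

/* One 128-bit lane of (V)PACKSSDW: 4 dwords of src1, then 4 dwords of src2. */
static void packssdw_lane(const int32_t src1[4], const int32_t src2[4], int16_t dst[8])
{
    for (int i = 0; i < 4; i++) {
        dst[i]     = sat_s32_to_s16(src1[i]);
        dst[i + 4] = sat_s32_to_s16(src2[i]);
    }
}
```

The lane-wise ordering is the main practical pitfall of the 256-bit forms: code that expects a straight concatenation of the two sources typically needs a cross-lane permutation (such as VPERMQ) after the pack.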
Operation
PACKSSWB instruction (128-bit Legacy SSE version)
DEST[7:0] SaturateSignedWordToSignedByte (DEST[15:0]);
DEST[15:8] SaturateSignedWordToSignedByte (DEST[31:16]);
DEST[23:16] SaturateSignedWordToSignedByte (DEST[47:32]);
DEST[31:24] SaturateSignedWordToSignedByte (DEST[63:48]);
DEST[39:32] SaturateSignedWordToSignedByte (DEST[79:64]);
DEST[47:40] SaturateSignedWordToSignedByte (DEST[95:80]);
DEST[55:48] SaturateSignedWordToSignedByte (DEST[111:96]);
DEST[63:56] SaturateSignedWordToSignedByte (DEST[127:112]);
DEST[71:64] SaturateSignedWordToSignedByte (SRC[15:0]);
5-22 |
Ref. # 319433-011 |
INSTRUCTION SET REFERENCE
DEST[79:72] ← SaturateSignedWordToSignedByte (SRC[31:16]);
DEST[87:80] ← SaturateSignedWordToSignedByte (SRC[47:32]);
DEST[95:88] ← SaturateSignedWordToSignedByte (SRC[63:48]);
DEST[103:96] ← SaturateSignedWordToSignedByte (SRC[79:64]);
DEST[111:104] ← SaturateSignedWordToSignedByte (SRC[95:80]);
DEST[119:112] ← SaturateSignedWordToSignedByte (SRC[111:96]);
DEST[127:120] ← SaturateSignedWordToSignedByte (SRC[127:112]);
PACKSSDW instruction (128-bit Legacy SSE version)
DEST[15:0] ← SaturateSignedDwordToSignedWord (DEST[31:0]);
DEST[31:16] ← SaturateSignedDwordToSignedWord (DEST[63:32]);
DEST[47:32] ← SaturateSignedDwordToSignedWord (DEST[95:64]);
DEST[63:48] ← SaturateSignedDwordToSignedWord (DEST[127:96]);
DEST[79:64] ← SaturateSignedDwordToSignedWord (SRC[31:0]);
DEST[95:80] ← SaturateSignedDwordToSignedWord (SRC[63:32]);
DEST[111:96] ← SaturateSignedDwordToSignedWord (SRC[95:64]);
DEST[127:112] ← SaturateSignedDwordToSignedWord (SRC[127:96]);
VPACKSSWB instruction (VEX.128 encoded version)
DEST[7:0] ← SaturateSignedWordToSignedByte (SRC1[15:0]);
DEST[15:8] ← SaturateSignedWordToSignedByte (SRC1[31:16]);
DEST[23:16] ← SaturateSignedWordToSignedByte (SRC1[47:32]);
DEST[31:24] ← SaturateSignedWordToSignedByte (SRC1[63:48]);
DEST[39:32] ← SaturateSignedWordToSignedByte (SRC1[79:64]);
DEST[47:40] ← SaturateSignedWordToSignedByte (SRC1[95:80]);
DEST[55:48] ← SaturateSignedWordToSignedByte (SRC1[111:96]);
DEST[63:56] ← SaturateSignedWordToSignedByte (SRC1[127:112]);
DEST[71:64] ← SaturateSignedWordToSignedByte (SRC2[15:0]);
DEST[79:72] ← SaturateSignedWordToSignedByte (SRC2[31:16]);
DEST[87:80] ← SaturateSignedWordToSignedByte (SRC2[47:32]);
DEST[95:88] ← SaturateSignedWordToSignedByte (SRC2[63:48]);
DEST[103:96] ← SaturateSignedWordToSignedByte (SRC2[79:64]);
DEST[111:104] ← SaturateSignedWordToSignedByte (SRC2[95:80]);
DEST[119:112] ← SaturateSignedWordToSignedByte (SRC2[111:96]);
DEST[127:120] ← SaturateSignedWordToSignedByte (SRC2[127:112]);
DEST[VLMAX:128] ← 0;
VPACKSSDW instruction (VEX.128 encoded version)
DEST[15:0] ← SaturateSignedDwordToSignedWord (SRC1[31:0]);
DEST[31:16] ← SaturateSignedDwordToSignedWord (SRC1[63:32]);
DEST[47:32] ← SaturateSignedDwordToSignedWord (SRC1[95:64]);
DEST[63:48] ← SaturateSignedDwordToSignedWord (SRC1[127:96]);
DEST[79:64] ← SaturateSignedDwordToSignedWord (SRC2[31:0]);
DEST[95:80] ← SaturateSignedDwordToSignedWord (SRC2[63:32]);
DEST[111:96] ← SaturateSignedDwordToSignedWord (SRC2[95:64]);
DEST[127:112] ← SaturateSignedDwordToSignedWord (SRC2[127:96]);
DEST[VLMAX:128] ← 0;
VPACKSSWB instruction (VEX.256 encoded version)
DEST[7:0] ← SaturateSignedWordToSignedByte (SRC1[15:0]);
DEST[15:8] ← SaturateSignedWordToSignedByte (SRC1[31:16]);
DEST[23:16] ← SaturateSignedWordToSignedByte (SRC1[47:32]);
DEST[31:24] ← SaturateSignedWordToSignedByte (SRC1[63:48]);
DEST[39:32] ← SaturateSignedWordToSignedByte (SRC1[79:64]);
DEST[47:40] ← SaturateSignedWordToSignedByte (SRC1[95:80]);
DEST[55:48] ← SaturateSignedWordToSignedByte (SRC1[111:96]);
DEST[63:56] ← SaturateSignedWordToSignedByte (SRC1[127:112]);
DEST[71:64] ← SaturateSignedWordToSignedByte (SRC2[15:0]);
DEST[79:72] ← SaturateSignedWordToSignedByte (SRC2[31:16]);
DEST[87:80] ← SaturateSignedWordToSignedByte (SRC2[47:32]);
DEST[95:88] ← SaturateSignedWordToSignedByte (SRC2[63:48]);
DEST[103:96] ← SaturateSignedWordToSignedByte (SRC2[79:64]);
DEST[111:104] ← SaturateSignedWordToSignedByte (SRC2[95:80]);
DEST[119:112] ← SaturateSignedWordToSignedByte (SRC2[111:96]);
DEST[127:120] ← SaturateSignedWordToSignedByte (SRC2[127:112]);
DEST[135:128] ← SaturateSignedWordToSignedByte (SRC1[143:128]);
DEST[143:136] ← SaturateSignedWordToSignedByte (SRC1[159:144]);
DEST[151:144] ← SaturateSignedWordToSignedByte (SRC1[175:160]);
DEST[159:152] ← SaturateSignedWordToSignedByte (SRC1[191:176]);
DEST[167:160] ← SaturateSignedWordToSignedByte (SRC1[207:192]);
DEST[175:168] ← SaturateSignedWordToSignedByte (SRC1[223:208]);
DEST[183:176] ← SaturateSignedWordToSignedByte (SRC1[239:224]);
DEST[191:184] ← SaturateSignedWordToSignedByte (SRC1[255:240]);
DEST[199:192] ← SaturateSignedWordToSignedByte (SRC2[143:128]);
DEST[207:200] ← SaturateSignedWordToSignedByte (SRC2[159:144]);
DEST[215:208] ← SaturateSignedWordToSignedByte (SRC2[175:160]);
DEST[223:216] ← SaturateSignedWordToSignedByte (SRC2[191:176]);
DEST[231:224] ← SaturateSignedWordToSignedByte (SRC2[207:192]);
DEST[239:232] ← SaturateSignedWordToSignedByte (SRC2[223:208]);
DEST[247:240] ← SaturateSignedWordToSignedByte (SRC2[239:224]);
DEST[255:248] ← SaturateSignedWordToSignedByte (SRC2[255:240]);
VPACKSSDW instruction (VEX.256 encoded version)
DEST[15:0] ← SaturateSignedDwordToSignedWord (SRC1[31:0]);
DEST[31:16] ← SaturateSignedDwordToSignedWord (SRC1[63:32]);
DEST[47:32] ← SaturateSignedDwordToSignedWord (SRC1[95:64]);
DEST[63:48] ← SaturateSignedDwordToSignedWord (SRC1[127:96]);
DEST[79:64] ← SaturateSignedDwordToSignedWord (SRC2[31:0]);
DEST[95:80] ← SaturateSignedDwordToSignedWord (SRC2[63:32]);
DEST[111:96] ← SaturateSignedDwordToSignedWord (SRC2[95:64]);
DEST[127:112] ← SaturateSignedDwordToSignedWord (SRC2[127:96]);
DEST[143:128] ← SaturateSignedDwordToSignedWord (SRC1[159:128]);
DEST[159:144] ← SaturateSignedDwordToSignedWord (SRC1[191:160]);
DEST[175:160] ← SaturateSignedDwordToSignedWord (SRC1[223:192]);
DEST[191:176] ← SaturateSignedDwordToSignedWord (SRC1[255:224]);
DEST[207:192] ← SaturateSignedDwordToSignedWord (SRC2[159:128]);
DEST[223:208] ← SaturateSignedDwordToSignedWord (SRC2[191:160]);
DEST[239:224] ← SaturateSignedDwordToSignedWord (SRC2[223:192]);
DEST[255:240] ← SaturateSignedDwordToSignedWord (SRC2[255:224]);
Intel C/C++ Compiler Intrinsic Equivalent
(V)PACKSSWB __m128i _mm_packs_epi16(__m128i m1, __m128i m2)
(V)PACKSSDW __m128i _mm_packs_epi32(__m128i m1, __m128i m2)
VPACKSSWB __m256i _mm256_packs_epi16(__m256i m1, __m256i m2)
VPACKSSDW __m256i _mm256_packs_epi32(__m256i m1, __m256i m2)
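The VEX.256 pseudocode is easy to misread: packing happens independently within each 128-bit lane. DEST[127:0] is built from the low halves of SRC1 and SRC2, and DEST[255:128] from their high halves. The VEX.256 forms of VPACKSSDW, VPACKUSDW, and VPACKUSWB share this lane structure. The scalar sketch below reproduces that ordering for VPACKSSWB. Its names are hypothetical, and arrays stand in for YMM registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a signed word to the signed byte range. */
static int8_t sat_s16_to_s8(int16_t w)
{
    return w > INT8_MAX ? INT8_MAX : w < INT8_MIN ? INT8_MIN : (int8_t)w;
}

/* Hypothetical model of VPACKSSWB ymm1, ymm2, ymm3/m256.
   src1/src2 hold 16 words each: words 0-7 are the low 128-bit lane,
   words 8-15 the high lane. Each lane of dest is packed on its own:
   dest bytes 0-7   <- src1 words 0-7,   bytes 8-15  <- src2 words 0-7,
   dest bytes 16-23 <- src1 words 8-15,  bytes 24-31 <- src2 words 8-15. */
static void vpacksswb256(const int16_t src1[16], const int16_t src2[16],
                         int8_t dest[32])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 8; i++) {
            dest[lane * 16 + i]     = sat_s16_to_s8(src1[lane * 8 + i]);
            dest[lane * 16 + 8 + i] = sat_s16_to_s8(src2[lane * 8 + i]);
        }
    }
}
```

Because of this in-lane ordering, code that needs the 32 result bytes in source order commonly follows `_mm256_packs_epi16` with a 64-bit permute such as `_mm256_permute4x64_epi64(r, 0xD8)` (VPERMQ, AVX2). That permute reorders the quadwords from 0,1,2,3 to 0,2,1,3. It is a common idiom rather than something this section specifies.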
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4
PACKUSDW — Pack with Unsigned Saturation
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 2B /r PACKUSDW xmm1, xmm2/m128 | A | V/V | SSE4_1 | Convert 4 packed signed doubleword integers from xmm1 and 4 packed signed doubleword integers from xmm2/m128 into 8 packed unsigned word integers in xmm1 using unsigned saturation.
VEX.NDS.128.66.0F38.WIG 2B /r VPACKUSDW xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Convert 4 packed signed doubleword integers from xmm2 and 4 packed signed doubleword integers from xmm3/m128 into 8 packed unsigned word integers in xmm1 using unsigned saturation.
VEX.NDS.256.66.0F38.WIG 2B /r VPACKUSDW ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Convert 8 packed signed doubleword integers from ymm2 and 8 packed signed doubleword integers from ymm3/m256 into 16 packed unsigned word integers in ymm1 using unsigned saturation.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
Description
Converts 4 or 8 signed doubleword integers from the first source operand and 4 or 8 signed doubleword integers from the second source operand into 8 or 16 unsigned word integers, using unsigned saturation to handle overflow conditions, and stores the result in the destination operand. If a signed doubleword value is beyond the range of an unsigned word (that is, greater than FFFFH or less than 0), the saturated unsigned word integer value of FFFFH or 0000H, respectively, is stored in the destination.
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
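Each TMP/DEST pair in the Operation pseudocode below amounts to a clamp of one signed doubleword to the range 0..FFFFH. A scalar C sketch under the same conventions follows. The function names are hypothetical. For the legacy form, `src1` plays the role of DEST and `src2` the role of SRC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one PACKUSDW element, following the
   pseudocode: TMP is 0 for negative inputs and the low word otherwise;
   inputs above FFFFH then override TMP with FFFFH. */
static uint16_t saturate_signed_dword_to_unsigned_word(int32_t d)
{
    uint16_t tmp = (d < 0) ? 0 : (uint16_t)(d & 0xFFFF);
    return (d > 0xFFFF) ? 0xFFFF : tmp;
}

/* 128-bit (V)PACKUSDW: words 0-3 from src1, words 4-7 from src2. */
static void packusdw128(const int32_t src1[4], const int32_t src2[4],
                        uint16_t dest[8])
{
    for (int i = 0; i < 4; i++) {
        dest[i]     = saturate_signed_dword_to_unsigned_word(src1[i]);
        dest[i + 4] = saturate_signed_dword_to_unsigned_word(src2[i]);
    }
}
```

In this model, -1 packs to 0000H, 65535 stays FFFFH, and 65536 saturates to FFFFH.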
Operation
PACKUSDW (Legacy SSE instruction)
TMP[15:0] ← (DEST[31:0] < 0) ? 0 : DEST[15:0];
DEST[15:0] ← (DEST[31:0] > FFFFH) ? FFFFH : TMP[15:0];
TMP[31:16] ← (DEST[63:32] < 0) ? 0 : DEST[47:32];
DEST[31:16] ← (DEST[63:32] > FFFFH) ? FFFFH : TMP[31:16];
TMP[47:32] ← (DEST[95:64] < 0) ? 0 : DEST[79:64];
DEST[47:32] ← (DEST[95:64] > FFFFH) ? FFFFH : TMP[47:32];
TMP[63:48] ← (DEST[127:96] < 0) ? 0 : DEST[111:96];
DEST[63:48] ← (DEST[127:96] > FFFFH) ? FFFFH : TMP[63:48];
TMP[79:64] ← (SRC[31:0] < 0) ? 0 : SRC[15:0];
DEST[79:64] ← (SRC[31:0] > FFFFH) ? FFFFH : TMP[79:64];
TMP[95:80] ← (SRC[63:32] < 0) ? 0 : SRC[47:32];
DEST[95:80] ← (SRC[63:32] > FFFFH) ? FFFFH : TMP[95:80];
TMP[111:96] ← (SRC[95:64] < 0) ? 0 : SRC[79:64];
DEST[111:96] ← (SRC[95:64] > FFFFH) ? FFFFH : TMP[111:96];
TMP[127:112] ← (SRC[127:96] < 0) ? 0 : SRC[111:96];
DEST[127:112] ← (SRC[127:96] > FFFFH) ? FFFFH : TMP[127:112];
VPACKUSDW (VEX.128 encoded version)
TMP[15:0] ← (SRC1[31:0] < 0) ? 0 : SRC1[15:0];
DEST[15:0] ← (SRC1[31:0] > FFFFH) ? FFFFH : TMP[15:0];
TMP[31:16] ← (SRC1[63:32] < 0) ? 0 : SRC1[47:32];
DEST[31:16] ← (SRC1[63:32] > FFFFH) ? FFFFH : TMP[31:16];
TMP[47:32] ← (SRC1[95:64] < 0) ? 0 : SRC1[79:64];
DEST[47:32] ← (SRC1[95:64] > FFFFH) ? FFFFH : TMP[47:32];
TMP[63:48] ← (SRC1[127:96] < 0) ? 0 : SRC1[111:96];
DEST[63:48] ← (SRC1[127:96] > FFFFH) ? FFFFH : TMP[63:48];
TMP[79:64] ← (SRC2[31:0] < 0) ? 0 : SRC2[15:0];
DEST[79:64] ← (SRC2[31:0] > FFFFH) ? FFFFH : TMP[79:64];
TMP[95:80] ← (SRC2[63:32] < 0) ? 0 : SRC2[47:32];
DEST[95:80] ← (SRC2[63:32] > FFFFH) ? FFFFH : TMP[95:80];
TMP[111:96] ← (SRC2[95:64] < 0) ? 0 : SRC2[79:64];
DEST[111:96] ← (SRC2[95:64] > FFFFH) ? FFFFH : TMP[111:96];
TMP[127:112] ← (SRC2[127:96] < 0) ? 0 : SRC2[111:96];
DEST[127:112] ← (SRC2[127:96] > FFFFH) ? FFFFH : TMP[127:112];
DEST[VLMAX:128] ← 0;
VPACKUSDW (VEX.256 encoded version)
TMP[15:0] ← (SRC1[31:0] < 0) ? 0 : SRC1[15:0];
DEST[15:0] ← (SRC1[31:0] > FFFFH) ? FFFFH : TMP[15:0];
TMP[31:16] ← (SRC1[63:32] < 0) ? 0 : SRC1[47:32];
DEST[31:16] ← (SRC1[63:32] > FFFFH) ? FFFFH : TMP[31:16];
TMP[47:32] ← (SRC1[95:64] < 0) ? 0 : SRC1[79:64];
DEST[47:32] ← (SRC1[95:64] > FFFFH) ? FFFFH : TMP[47:32];
TMP[63:48] ← (SRC1[127:96] < 0) ? 0 : SRC1[111:96];
DEST[63:48] ← (SRC1[127:96] > FFFFH) ? FFFFH : TMP[63:48];
TMP[79:64] ← (SRC2[31:0] < 0) ? 0 : SRC2[15:0];
DEST[79:64] ← (SRC2[31:0] > FFFFH) ? FFFFH : TMP[79:64];
TMP[95:80] ← (SRC2[63:32] < 0) ? 0 : SRC2[47:32];
DEST[95:80] ← (SRC2[63:32] > FFFFH) ? FFFFH : TMP[95:80];
TMP[111:96] ← (SRC2[95:64] < 0) ? 0 : SRC2[79:64];
DEST[111:96] ← (SRC2[95:64] > FFFFH) ? FFFFH : TMP[111:96];
TMP[127:112] ← (SRC2[127:96] < 0) ? 0 : SRC2[111:96];
DEST[127:112] ← (SRC2[127:96] > FFFFH) ? FFFFH : TMP[127:112];
TMP[143:128] ← (SRC1[159:128] < 0) ? 0 : SRC1[143:128];
DEST[143:128] ← (SRC1[159:128] > FFFFH) ? FFFFH : TMP[143:128];
TMP[159:144] ← (SRC1[191:160] < 0) ? 0 : SRC1[175:160];
DEST[159:144] ← (SRC1[191:160] > FFFFH) ? FFFFH : TMP[159:144];
TMP[175:160] ← (SRC1[223:192] < 0) ? 0 : SRC1[207:192];
DEST[175:160] ← (SRC1[223:192] > FFFFH) ? FFFFH : TMP[175:160];
TMP[191:176] ← (SRC1[255:224] < 0) ? 0 : SRC1[239:224];
DEST[191:176] ← (SRC1[255:224] > FFFFH) ? FFFFH : TMP[191:176];
TMP[207:192] ← (SRC2[159:128] < 0) ? 0 : SRC2[143:128];
DEST[207:192] ← (SRC2[159:128] > FFFFH) ? FFFFH : TMP[207:192];
TMP[223:208] ← (SRC2[191:160] < 0) ? 0 : SRC2[175:160];
DEST[223:208] ← (SRC2[191:160] > FFFFH) ? FFFFH : TMP[223:208];
TMP[239:224] ← (SRC2[223:192] < 0) ? 0 : SRC2[207:192];
DEST[239:224] ← (SRC2[223:192] > FFFFH) ? FFFFH : TMP[239:224];
TMP[255:240] ← (SRC2[255:224] < 0) ? 0 : SRC2[239:224];
DEST[255:240] ← (SRC2[255:224] > FFFFH) ? FFFFH : TMP[255:240];
Intel C/C++ Compiler Intrinsic Equivalent
(V)PACKUSDW __m128i _mm_packus_epi32(__m128i m1, __m128i m2);
VPACKUSDW __m256i _mm256_packus_epi32(__m256i m1, __m256i m2);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4
PACKUSWB — Pack with Unsigned Saturation
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 67 /r PACKUSWB xmm1, xmm2/m128 | A | V/V | SSE2 | Converts 8 signed word integers from xmm1 and 8 signed word integers from xmm2/m128 into 16 unsigned byte integers in xmm1 using unsigned saturation.
VEX.NDS.128.66.0F.WIG 67 /r VPACKUSWB xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Converts 8 signed word integers from xmm2 and 8 signed word integers from xmm3/m128 into 16 unsigned byte integers in xmm1 using unsigned saturation.
VEX.NDS.256.66.0F.WIG 67 /r VPACKUSWB ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Converts 16 signed word integers from ymm2 and 16 signed word integers from ymm3/m256 into 32 unsigned byte integers in ymm1 using unsigned saturation.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
Description
Converts 8 or 16 signed word integers from the first source operand and 8 or 16 signed word integers from the second source operand into 16 or 32 unsigned byte integers and stores the result in the destination operand. If a signed word integer value is beyond the range of an unsigned byte integer (that is, greater than FFH or less than 00H), the saturated unsigned byte integer value of FFH or 00H, respectively, is stored in the destination.
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
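SaturateSignedWordToUnsignedByte differs from the signed helper used by PACKSSWB only in its lower bound: negative words clamp to 00H rather than 80H. A hypothetical scalar model of the 128-bit form follows, with arrays standing in for XMM registers. For the legacy form, `src1` plays the role of DEST and `src2` the role of SRC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SaturateSignedWordToUnsignedByte:
   negative words -> 00H, words above FFH -> FFH, others unchanged. */
static uint8_t saturate_signed_word_to_unsigned_byte(int16_t w)
{
    if (w < 0)    return 0x00;
    if (w > 0xFF) return 0xFF;
    return (uint8_t)w;
}

/* 128-bit (V)PACKUSWB: bytes 0-7 from src1, bytes 8-15 from src2. */
static void packuswb128(const int16_t src1[8], const int16_t src2[8],
                        uint8_t dest[16])
{
    for (int i = 0; i < 8; i++) {
        dest[i]     = saturate_signed_word_to_unsigned_byte(src1[i]);
        dest[i + 8] = saturate_signed_word_to_unsigned_byte(src2[i]);
    }
}
```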
Operation
PACKUSWB (Legacy SSE instruction)
DEST[7:0] ← SaturateSignedWordToUnsignedByte (DEST[15:0]);
DEST[15:8] ← SaturateSignedWordToUnsignedByte (DEST[31:16]);
DEST[23:16] ← SaturateSignedWordToUnsignedByte (DEST[47:32]);
DEST[31:24] ← SaturateSignedWordToUnsignedByte (DEST[63:48]);
DEST[39:32] ← SaturateSignedWordToUnsignedByte (DEST[79:64]);
DEST[47:40] ← SaturateSignedWordToUnsignedByte (DEST[95:80]);
DEST[55:48] ← SaturateSignedWordToUnsignedByte (DEST[111:96]);
DEST[63:56] ← SaturateSignedWordToUnsignedByte (DEST[127:112]);
DEST[71:64] ← SaturateSignedWordToUnsignedByte (SRC[15:0]);
DEST[79:72] ← SaturateSignedWordToUnsignedByte (SRC[31:16]);
DEST[87:80] ← SaturateSignedWordToUnsignedByte (SRC[47:32]);
DEST[95:88] ← SaturateSignedWordToUnsignedByte (SRC[63:48]);
DEST[103:96] ← SaturateSignedWordToUnsignedByte (SRC[79:64]);
DEST[111:104] ← SaturateSignedWordToUnsignedByte (SRC[95:80]);
DEST[119:112] ← SaturateSignedWordToUnsignedByte (SRC[111:96]);
DEST[127:120] ← SaturateSignedWordToUnsignedByte (SRC[127:112]);
VPACKUSWB (VEX.128 encoded version)
DEST[7:0] ← SaturateSignedWordToUnsignedByte (SRC1[15:0]);
DEST[15:8] ← SaturateSignedWordToUnsignedByte (SRC1[31:16]);
DEST[23:16] ← SaturateSignedWordToUnsignedByte (SRC1[47:32]);
DEST[31:24] ← SaturateSignedWordToUnsignedByte (SRC1[63:48]);
DEST[39:32] ← SaturateSignedWordToUnsignedByte (SRC1[79:64]);
DEST[47:40] ← SaturateSignedWordToUnsignedByte (SRC1[95:80]);
DEST[55:48] ← SaturateSignedWordToUnsignedByte (SRC1[111:96]);
DEST[63:56] ← SaturateSignedWordToUnsignedByte (SRC1[127:112]);
DEST[71:64] ← SaturateSignedWordToUnsignedByte (SRC2[15:0]);
DEST[79:72] ← SaturateSignedWordToUnsignedByte (SRC2[31:16]);
DEST[87:80] ← SaturateSignedWordToUnsignedByte (SRC2[47:32]);
DEST[95:88] ← SaturateSignedWordToUnsignedByte (SRC2[63:48]);
DEST[103:96] ← SaturateSignedWordToUnsignedByte (SRC2[79:64]);
DEST[111:104] ← SaturateSignedWordToUnsignedByte (SRC2[95:80]);
DEST[119:112] ← SaturateSignedWordToUnsignedByte (SRC2[111:96]);
DEST[127:120] ← SaturateSignedWordToUnsignedByte (SRC2[127:112]);
DEST[VLMAX:128] ← 0;
VPACKUSWB (VEX.256 encoded version)
DEST[7:0] ← SaturateSignedWordToUnsignedByte (SRC1[15:0]);
DEST[15:8] ← SaturateSignedWordToUnsignedByte (SRC1[31:16]);
DEST[23:16] ← SaturateSignedWordToUnsignedByte (SRC1[47:32]);
DEST[31:24] ← SaturateSignedWordToUnsignedByte (SRC1[63:48]);
DEST[39:32] ← SaturateSignedWordToUnsignedByte (SRC1[79:64]);
DEST[47:40] ← SaturateSignedWordToUnsignedByte (SRC1[95:80]);
DEST[55:48] ← SaturateSignedWordToUnsignedByte (SRC1[111:96]);
DEST[63:56] ← SaturateSignedWordToUnsignedByte (SRC1[127:112]);
DEST[71:64] ← SaturateSignedWordToUnsignedByte (SRC2[15:0]);
DEST[79:72] ← SaturateSignedWordToUnsignedByte (SRC2[31:16]);
DEST[87:80] ← SaturateSignedWordToUnsignedByte (SRC2[47:32]);
DEST[95:88] ← SaturateSignedWordToUnsignedByte (SRC2[63:48]);
DEST[103:96] ← SaturateSignedWordToUnsignedByte (SRC2[79:64]);
DEST[111:104] ← SaturateSignedWordToUnsignedByte (SRC2[95:80]);
DEST[119:112] ← SaturateSignedWordToUnsignedByte (SRC2[111:96]);
DEST[127:120] ← SaturateSignedWordToUnsignedByte (SRC2[127:112]);
DEST[135:128] ← SaturateSignedWordToUnsignedByte (SRC1[143:128]);
DEST[143:136] ← SaturateSignedWordToUnsignedByte (SRC1[159:144]);
DEST[151:144] ← SaturateSignedWordToUnsignedByte (SRC1[175:160]);
DEST[159:152] ← SaturateSignedWordToUnsignedByte (SRC1[191:176]);
DEST[167:160] ← SaturateSignedWordToUnsignedByte (SRC1[207:192]);
DEST[175:168] ← SaturateSignedWordToUnsignedByte (SRC1[223:208]);
DEST[183:176] ← SaturateSignedWordToUnsignedByte (SRC1[239:224]);
DEST[191:184] ← SaturateSignedWordToUnsignedByte (SRC1[255:240]);
DEST[199:192] ← SaturateSignedWordToUnsignedByte (SRC2[143:128]);
DEST[207:200] ← SaturateSignedWordToUnsignedByte (SRC2[159:144]);
DEST[215:208] ← SaturateSignedWordToUnsignedByte (SRC2[175:160]);
DEST[223:216] ← SaturateSignedWordToUnsignedByte (SRC2[191:176]);
DEST[231:224] ← SaturateSignedWordToUnsignedByte (SRC2[207:192]);
DEST[239:232] ← SaturateSignedWordToUnsignedByte (SRC2[223:208]);
DEST[247:240] ← SaturateSignedWordToUnsignedByte (SRC2[239:224]);
DEST[255:248] ← SaturateSignedWordToUnsignedByte (SRC2[255:240]);
Intel C/C++ Compiler Intrinsic Equivalent
(V)PACKUSWB __m128i _mm_packus_epi16(__m128i m1, __m128i m2);
VPACKUSWB __m256i _mm256_packus_epi16(__m256i m1, __m256i m2);
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4
PADDB/PADDW/PADDD/PADDQ — Add Packed Integers
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F FC /r PADDB xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed byte integers from xmm2/m128 and xmm1.
66 0F FD /r PADDW xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed word integers from xmm2/m128 and xmm1.
66 0F FE /r PADDD xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed doubleword integers from xmm2/m128 and xmm1.
66 0F D4 /r PADDQ xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed quadword integers from xmm2/m128 and xmm1.
VEX.NDS.128.66.0F.WIG FC /r VPADDB xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed byte integers from xmm2 and xmm3/m128 and store in xmm1.
VEX.NDS.128.66.0F.WIG FD /r VPADDW xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed word integers from xmm2 and xmm3/m128 and store in xmm1.
VEX.NDS.128.66.0F.WIG FE /r VPADDD xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed doubleword integers from xmm2 and xmm3/m128 and store in xmm1.
VEX.NDS.128.66.0F.WIG D4 /r VPADDQ xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed quadword integers from xmm2 and xmm3/m128 and store in xmm1.
VEX.NDS.256.66.0F.WIG FC /r VPADDB ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed byte integers from ymm2 and ymm3/m256 and store in ymm1.
VEX.NDS.256.66.0F.WIG FD /r VPADDW ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed word integers from ymm2 and ymm3/m256 and store in ymm1.
VEX.NDS.256.66.0F.WIG FE /r VPADDD ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed doubleword integers from ymm2 and ymm3/m256 and store in ymm1.
VEX.NDS.256.66.0F.WIG D4 /r VPADDQ ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed quadword integers from ymm2 and ymm3/m256 and store in ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
Description
The PADDB and VPADDB instructions add packed byte integers from the first source operand and the second source operand and store the packed integer results in the destination operand. When an individual result is too large to be represented in 8 bits (overflow), the result wraps around and the low 8 bits are written to the destination operand (that is, the carry is ignored).
The PADDW and VPADDW instructions add packed word integers from the first source operand and the second source operand and store the packed integer results in the destination operand. When an individual result is too large to be represented in 16 bits (overflow), the result wraps around and the low 16 bits are written to the destination operand (that is, the carry is ignored).
The PADDD and VPADDD instructions add packed doubleword integers from the first source operand and the second source operand and store the packed integer results in the destination operand. When an individual result is too large to be represented in 32 bits (overflow), the result wraps around and the low 32 bits are written to the destination operand (that is, the carry is ignored).
The PADDQ and VPADDQ instructions add packed quadword integers from the first source operand and the second source operand and store the packed integer results in the destination operand. When a quadword result is too large to be represented in 64 bits (overflow), the result wraps around and the low 64 bits are written to the destination element (that is, the carry is ignored).
Note that the (V)PADDB, (V)PADDW, (V)PADDD, and (V)PADDQ instructions can operate on either unsigned or signed (two's complement notation) packed integers; however, they do not set bits in the EFLAGS register to indicate overflow or a carry. To prevent undetected overflow conditions, software must control the ranges of values operated on.
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
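The wrap-around rule is the ordinary modular arithmetic of unsigned C integers. It is also why one instruction serves both signed and unsigned data: the stored bits are identical either way. The scalar sketch below covers the byte and quadword forms. Its names are hypothetical, and arrays stand in for XMM registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 128-bit PADDB: sixteen independent byte additions.
   The cast back to uint8_t keeps only the low 8 bits, so the carry out
   of each byte is discarded. */
static void paddb128(const uint8_t src1[16], const uint8_t src2[16],
                     uint8_t dest[16])
{
    for (int i = 0; i < 16; i++)
        dest[i] = (uint8_t)(src1[i] + src2[i]);
}

/* Hypothetical model of 128-bit PADDQ: unsigned 64-bit addition in C
   already wraps modulo 2^64, matching the discarded carry. */
static void paddq128(const uint64_t src1[2], const uint64_t src2[2],
                     uint64_t dest[2])
{
    for (int i = 0; i < 2; i++)
        dest[i] = src1[i] + src2[i];
}
```

For example, 7FH + 01H stores 80H. That value reads as 128 when treated as unsigned and as -128 when treated as signed, and neither case is flagged.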
Operation
PADDB (Legacy SSE instruction)
DEST[7:0] ← DEST[7:0] + SRC[7:0];
(* Repeat add operation for 2nd through 15th byte *)
DEST[127:120] ← DEST[127:120] + SRC[127:120];
PADDW (Legacy SSE instruction)
DEST[15:0] ← DEST[15:0] + SRC[15:0];
(* Repeat add operation for 2nd through 7th word *)
DEST[127:112] ← DEST[127:112] + SRC[127:112];
PADDD (Legacy SSE instruction)
DEST[31:0] ← DEST[31:0] + SRC[31:0];
(* Repeat add operation for 2nd and 3rd doubleword *)
DEST[127:96] ← DEST[127:96] + SRC[127:96];
PADDQ (Legacy SSE instruction)
DEST[63:0] ← DEST[63:0] + SRC[63:0];
DEST[127:64] ← DEST[127:64] + SRC[127:64];
VPADDB (VEX.128 encoded instruction)
DEST[7:0] ← SRC1[7:0] + SRC2[7:0];
(* Repeat add operation for 2nd through 15th byte *)
DEST[127:120] ← SRC1[127:120] + SRC2[127:120];
DEST[VLMAX:128] ← 0;
VPADDW (VEX.128 encoded instruction)
DEST[15:0] ← SRC1[15:0] + SRC2[15:0];
(* Repeat add operation for 2nd through 7th word *)
DEST[127:112] ← SRC1[127:112] + SRC2[127:112];
DEST[VLMAX:128] ← 0;
VPADDD (VEX.128 encoded instruction)
DEST[31:0] ← SRC1[31:0] + SRC2[31:0];
(* Repeat add operation for 2nd and 3rd doubleword *)
DEST[127:96] ← SRC1[127:96] + SRC2[127:96];
DEST[VLMAX:128] ← 0;
VPADDQ (VEX.128 encoded instruction)
DEST[63:0] ← SRC1[63:0] + SRC2[63:0];
DEST[127:64] ← SRC1[127:64] + SRC2[127:64];
DEST[VLMAX:128] ← 0;
VPADDB (VEX.256 encoded instruction)
DEST[7:0] ← SRC1[7:0] + SRC2[7:0];
(* Repeat add operation for 2nd through 31st byte *)
DEST[255:248] ← SRC1[255:248] + SRC2[255:248];
VPADDW (VEX.256 encoded instruction)
DEST[15:0] ← SRC1[15:0] + SRC2[15:0];
(* Repeat add operation for 2nd through 15th word *)
DEST[255:240] ← SRC1[255:240] + SRC2[255:240];
VPADDD (VEX.256 encoded instruction)
DEST[31:0] ← SRC1[31:0] + SRC2[31:0];
(* Repeat add operation for 2nd through 7th doubleword *)
DEST[255:224] ← SRC1[255:224] + SRC2[255:224];
VPADDQ (VEX.256 encoded instruction)
DEST[63:0] ← SRC1[63:0] + SRC2[63:0];
DEST[127:64] ← SRC1[127:64] + SRC2[127:64];
DEST[191:128] ← SRC1[191:128] + SRC2[191:128];
DEST[255:192] ← SRC1[255:192] + SRC2[255:192];
Intel C/C++ Compiler Intrinsic Equivalent
(V)PADDB __m128i _mm_add_epi8 (__m128i a, __m128i b)
(V)PADDW __m128i _mm_add_epi16 (__m128i a, __m128i b)
(V)PADDD __m128i _mm_add_epi32 (__m128i a, __m128i b)
(V)PADDQ __m128i _mm_add_epi64 (__m128i a, __m128i b)
VPADDB __m256i _mm256_add_epi8 (__m256i a, __m256i b)
VPADDW __m256i _mm256_add_epi16 (__m256i a, __m256i b)
VPADDD __m256i _mm256_add_epi32 (__m256i a, __m256i b)
VPADDQ __m256i _mm256_add_epi64 (__m256i a, __m256i b)
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4
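None of the PADD instructions report overflow, so the range control described above falls to software. One way to check a result after the fact is to apply the usual two's complement rules to each lane. This is an illustration only, not part of the instruction definition. The hypothetical helpers below do this for a single PADDW lane. Signed overflow occurred if both inputs share a sign bit that the wrapped sum does not have. An unsigned carry occurred if the wrapped sum is smaller than an input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does one PADDW lane overflow when its inputs are
   treated as signed words? Overflow occurs exactly when a and b have the
   same sign bit and the wrapped sum has the other one. */
static bool paddw_signed_overflow(int16_t a, int16_t b)
{
    uint16_t ua = (uint16_t)a, ub = (uint16_t)b;
    uint16_t r = (uint16_t)(ua + ub);   /* the value PADDW would store */
    return ((ua ^ r) & (ub ^ r) & 0x8000u) != 0;
}

/* Hypothetical helper: does one PADDW lane carry out when its inputs are
   treated as unsigned words? If so, the wrapped sum is smaller than a. */
static bool paddw_unsigned_carry(uint16_t a, uint16_t b)
{
    return (uint16_t)(a + b) < a;
}
```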
PADDSB/PADDSW — Add Packed Signed Integers with Signed Saturation
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F EC /r PADDSB xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed signed byte integers from xmm2/m128 and xmm1 and saturate the results.
66 0F ED /r PADDSW xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed signed word integers from xmm2/m128 and xmm1 and saturate the results.
VEX.NDS.128.66.0F.WIG EC /r VPADDSB xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed signed byte integers from xmm2 and xmm3/m128 and store the saturated results in xmm1.
VEX.NDS.128.66.0F.WIG ED /r VPADDSW xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed signed word integers from xmm2 and xmm3/m128 and store the saturated results in xmm1.
VEX.NDS.256.66.0F.WIG EC /r VPADDSB ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed signed byte integers from ymm2 and ymm3/m256 and store the saturated results in ymm1.
VEX.NDS.256.66.0F.WIG ED /r VPADDSW ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed signed word integers from ymm2 and ymm3/m256 and store the saturated results in ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
Description
(V)PADDSB performs a SIMD add of the packed signed byte integers from the first source operand and the second source operand, with saturation, and stores the packed integer results in the destination operand. When an individual byte result is beyond the range of a signed byte integer (that is, greater than 7FH (127) or less than 80H (-128)), the saturated value of 7FH or 80H, respectively, is written to the destination operand.
(V)PADDSW performs a SIMD add of the packed signed word integers from the first source operand and the second source operand, with saturation, and stores the packed integer results in the destination operand. When an individual word result is beyond the range of a signed word integer (that is, greater than 7FFFH (32767) or less than 8000H (-32768)), the saturated value of 7FFFH or 8000H, respectively, is written to the destination operand.
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The first source operand is an XMM register. The second operand can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
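SaturateToSignedByte and SaturateToSignedWord clamp the exact sum rather than the wrapped sum. A scalar model can therefore add in a wider C type and clamp afterwards. The sketch below uses hypothetical names, with arrays standing in for XMM registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SaturateToSignedByte: the exact sum of two
   signed bytes fits in an int, so clamp it to -128..127. */
static int8_t saturate_to_signed_byte(int sum)
{
    if (sum > INT8_MAX) return INT8_MAX;  /* 7FH */
    if (sum < INT8_MIN) return INT8_MIN;  /* 80H */
    return (int8_t)sum;
}

/* Hypothetical model of SaturateToSignedWord for word lanes. */
static int16_t saturate_to_signed_word(int32_t sum)
{
    if (sum > INT16_MAX) return INT16_MAX;  /* 7FFFH */
    if (sum < INT16_MIN) return INT16_MIN;  /* 8000H */
    return (int16_t)sum;
}

/* 128-bit (V)PADDSB: sixteen independent saturating byte additions.
   Each int8_t operand is promoted to int, so the addition is exact. */
static void paddsb128(const int8_t src1[16], const int8_t src2[16],
                      int8_t dest[16])
{
    for (int i = 0; i < 16; i++)
        dest[i] = saturate_to_signed_byte(src1[i] + src2[i]);
}
```

With these inputs, 100 + 100 gives 127 rather than the wrapped value -56 that PADDB would store.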
Operation
PADDSB (Legacy SSE instruction)
DEST[7:0] ← SaturateToSignedByte (DEST[7:0] + SRC[7:0]);
(* Repeat add operation for 2nd through 15th bytes *)
DEST[127:120] ← SaturateToSignedByte (DEST[127:120] + SRC[127:120]);
PADDSW (Legacy SSE instruction)
DEST[15:0] ← SaturateToSignedWord (DEST[15:0] + SRC[15:0]);
(* Repeat add operation for 2nd through 7th words *)
DEST[127:112] ← SaturateToSignedWord (DEST[127:112] + SRC[127:112]);
VPADDSB (VEX.128 encoded version)
DEST[7:0] ← SaturateToSignedByte (SRC1[7:0] + SRC2[7:0]);
(* Repeat add operation for 2nd through 15th bytes *)
DEST[127:120] ← SaturateToSignedByte (SRC1[127:120] + SRC2[127:120]);
DEST[VLMAX:128] ← 0;
VPADDSW (VEX.128 encoded version)
DEST[15:0] ← SaturateToSignedWord (SRC1[15:0] + SRC2[15:0]);
(* Repeat add operation for 2nd through 7th words *)
DEST[127:112] ← SaturateToSignedWord (SRC1[127:112] + SRC2[127:112]);
DEST[VLMAX:128] ← 0;
VPADDSB (VEX.256 encoded version)
DEST[7:0] ← SaturateToSignedByte (SRC1[7:0] + SRC2[7:0]);
(* Repeat add operation for 2nd through 31st bytes *)
DEST[255:248] ← SaturateToSignedByte (SRC1[255:248] + SRC2[255:248]);
VPADDSW (VEX.256 encoded version)
DEST[15:0] ← SaturateToSignedWord (SRC1[15:0] + SRC2[15:0]);
(* Repeat add operation for 2nd through 15th words *)
DEST[255:240] ← SaturateToSignedWord (SRC1[255:240] + SRC2[255:240]);
Intel C/C++ Compiler Intrinsic Equivalent
PADDSB __m128i _mm_adds_epi8 (__m128i a, __m128i b)
PADDSW __m128i _mm_adds_epi16 (__m128i a, __m128i b)
VPADDSB __m128i _mm_adds_epi8 (__m128i a, __m128i b)
VPADDSW __m128i _mm_adds_epi16 (__m128i a, __m128i b)
VPADDSB __m256i _mm256_adds_epi8 (__m256i a, __m256i b)
VPADDSW __m256i _mm256_adds_epi16 (__m256i a, __m256i b)
SIMD Floating-Point Exceptions
None
PADDUSB/PADDUSW — Add Packed Unsigned Integers with Unsigned Saturation
Opcode | Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F DC /r | PADDUSB xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed unsigned byte integers from xmm2/m128 and xmm1 and saturate the results.
66 0F DD /r | PADDUSW xmm1, xmm2/m128 | A | V/V | SSE2 | Add packed unsigned word integers from xmm2/m128 and xmm1 and saturate the results.
VEX.NDS.128.66.0F.WIG DC /r | VPADDUSB xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed unsigned byte integers from xmm2 and xmm3/m128 and store the saturated results in xmm1.
VEX.NDS.128.66.0F.WIG DD /r | VPADDUSW xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add packed unsigned word integers from xmm2 and xmm3/m128 and store the saturated results in xmm1.
VEX.NDS.256.66.0F.WIG DC /r | VPADDUSB ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed unsigned byte integers from ymm2 and ymm3/m256 and store the saturated results in ymm1.
VEX.NDS.256.66.0F.WIG DD /r | VPADDUSW ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add packed unsigned word integers from ymm2 and ymm3/m256 and store the saturated results in ymm1.
Instruction Operand Encoding
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
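To make the two operand encodings concrete: in Op/En A the destination, which is also the first source, occupies ModRM:reg; in Op/En B the first source moves into VEX.vvvv and the second source into ModRM:r/m. The sketch below hand-assembles the register-only forms. It is an illustration under stated assumptions, not an assembler: the helper names are hypothetical, only registers 0-7 are handled (so no REX prefix is needed and the two-byte C5 VEX form suffices), memory operands are not supported, and the VEX byte-1 layout (inverted R, inverted vvvv, L, pp) follows the VEX prefix description in Chapter 4. Cross-check its output against a real assembler before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: ModRM for a register-to-register form
   (mod = 11b, reg field = ModRM:reg operand, r/m field = ModRM:r/m operand). */
static uint8_t modrm_rr(unsigned reg, unsigned rm) {
    return (uint8_t)(0xC0 | ((reg & 7) << 3) | (rm & 7));
}

/* Op/En A: PADDUSB xmm_dst, xmm_src -> 66 0F DC /r. Returns bytes written. */
static size_t encode_paddusb_legacy(uint8_t *buf, unsigned dst, unsigned src) {
    assert(dst < 8 && src < 8);        /* registers 0-7 only: no REX prefix */
    buf[0] = 0x66;                     /* mandatory operand-size prefix */
    buf[1] = 0x0F;
    buf[2] = 0xDC;
    buf[3] = modrm_rr(dst, src);       /* reg = dst (r, w), r/m = src (r) */
    return 4;
}

/* Op/En B: VPADDUSB dst, src1, src2 with the two-byte VEX prefix (C5).
   VEX byte 1 = [~R | ~vvvv | L | pp]; pp = 01b stands for the 66H prefix,
   L = 0 selects 128-bit (xmm) and L = 1 selects 256-bit (ymm) operation. */
static size_t encode_vpaddusb(uint8_t *buf, unsigned dst, unsigned src1,
                              unsigned src2, int l256) {
    assert(dst < 8 && src1 < 8 && src2 < 8);
    uint8_t r_inv = 1;                          /* R = 0, stored inverted */
    uint8_t vvvv_inv = (uint8_t)(~src1 & 0xF);  /* first source, inverted */
    buf[0] = 0xC5;
    buf[1] = (uint8_t)((r_inv << 7) | (vvvv_inv << 3) | ((l256 ? 1 : 0) << 2) | 0x1);
    buf[2] = 0xDC;                              /* 0F map is implied by C5 */
    buf[3] = modrm_rr(dst, src2);               /* reg = dst (w), r/m = src2 (r) */
    return 4;
}
```

With these assumptions, PADDUSB xmm1, xmm2 assembles to 66 0F DC CA, VPADDUSB xmm1, xmm2, xmm3 to C5 E9 DC CB, and VPADDUSB ymm1, ymm2, ymm3 to C5 ED DC CB; only the L bit separates the last two.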
Description
(V)PADDUSB performs a SIMD add of the packed unsigned byte integers in the first and second source operands, with unsigned saturation, and stores the packed integer results in the destination operand. When an individual byte result is beyond the range of an unsigned byte integer (that is, greater than FFH), the saturated value of FFH is written to the destination operand.
(V)PADDUSW performs a SIMD add of the packed unsigned word integers with saturation from the first source operand and second source operand and stores the packed integer results in the destination operand. When an individual word result is beyond the range of an unsigned word integer (that is, greater than FFFFH), the saturated value of FFFFH is written to the destination operand.
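The saturation rule is easiest to see with concrete values: F0H + 20H = 110H, which exceeds FFH, so PADDUSB writes FFH, whereas a wrapping byte add (as PADDB performs) keeps only the low byte, 10H. The minimal scalar sketch below models both; add_sat_u8, add_sat_u16, add_wrap_u8 and check_add_sat_u8 are hypothetical names, not Intel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturation: sums above FFH (bytes) or FFFFH (words) are clamped.
   Unsigned addition cannot go below 0, so there is no lower clamp. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 0xFF ? 0xFF : s);
}

static uint16_t add_sat_u16(uint16_t a, uint16_t b) {
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)(s > 0xFFFF ? 0xFFFF : s);
}

/* For contrast: modular byte addition keeps only the low 8 bits. */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(a + b);
}

/* Exhaustive self-check: saturation differs from wrapping only when the true
   sum exceeds FFH; then the wrapped result is smaller than an operand (the
   carry was lost) and the saturated result is pinned at FFH. */
static void check_add_sat_u8(void) {
    for (unsigned a = 0; a <= 0xFF; a++)
        for (unsigned b = 0; b <= 0xFF; b++) {
            uint8_t w = add_wrap_u8((uint8_t)a, (uint8_t)b);
            uint8_t s = add_sat_u8((uint8_t)a, (uint8_t)b);
            assert(a + b > 0xFF ? (s == 0xFF && w < a) : (s == w));
        }
}
```

The word form behaves the same way one size up: FFF0H + 0020H = 10010H saturates to FFFFH.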
VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register or a 256-bit memory location. The destination operand is a YMM register.
VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (255:128) of the corresponding YMM register destination are zeroed.
128-bit Legacy SSE version: The first source operand is an XMM register. The second source operand can be an XMM register or a 128-bit memory location. The destination is the same register as the first source XMM register, and the upper bits (255:128) of the corresponding YMM register destination are unmodified.
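The three forms differ only in the destination and in what happens to bits 255:128 of the destination YMM register. The sketch below contrasts them by modelling one YMM register as 32 bytes, with bytes 0-15 standing for the XMM register; ymm_t and the function names are hypothetical, and the model assumes VLMAX = 256.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a YMM register: b[0..15] alias the XMM register. */
typedef struct { uint8_t b[32]; } ymm_t;

static uint8_t sat_u8(unsigned s) { return (uint8_t)(s > 0xFF ? 0xFF : s); }

/* Legacy SSE PADDUSB: DEST is also the first source; bits 255:128 untouched. */
static void paddusb_legacy(ymm_t *dest, const ymm_t *src) {
    for (int i = 0; i < 16; i++)
        dest->b[i] = sat_u8((unsigned)dest->b[i] + src->b[i]);
}

/* VEX.128 VPADDUSB: separate destination; DEST[VLMAX:128] <- 0. */
static void vpaddusb_128(ymm_t *dest, const ymm_t *src1, const ymm_t *src2) {
    uint8_t tmp[16];                    /* tmp lets dest alias a source */
    for (int i = 0; i < 16; i++)
        tmp[i] = sat_u8((unsigned)src1->b[i] + src2->b[i]);
    memcpy(dest->b, tmp, 16);
    memset(dest->b + 16, 0, 16);
}

/* VEX.256 VPADDUSB: all 32 byte lanes are computed. */
static void vpaddusb_256(ymm_t *dest, const ymm_t *src1, const ymm_t *src2) {
    uint8_t tmp[32];
    for (int i = 0; i < 32; i++)
        tmp[i] = sat_u8((unsigned)src1->b[i] + src2->b[i]);
    memcpy(dest->b, tmp, 32);
}

/* Usage check: identical operands leave different upper halves behind. */
static void demo_upper_bits(void) {
    ymm_t a, b, d;
    memset(a.b, 0xAA, 32);
    memset(b.b, 0x70, 32);
    d = a;
    paddusb_legacy(&d, &b);             /* AAH + 70H = 11AH saturates to FFH */
    assert(d.b[0] == 0xFF && d.b[16] == 0xAA);  /* upper bytes preserved */
    vpaddusb_128(&d, &a, &b);
    assert(d.b[0] == 0xFF && d.b[16] == 0x00);  /* upper bytes zeroed */
}
```

The contrast between the first two functions is the reason mixing legacy SSE and VEX.128 code needs care: the legacy form must preserve, and therefore depend on, the upper half of the register.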
Operation
PADDUSB (Legacy SSE instruction)
DEST[7:0] ← SaturateToUnsignedByte (DEST[7:0] + SRC[7:0]);
(* Repeat add operation for 2nd through 15th bytes *)
DEST[127:120] ← SaturateToUnsignedByte (DEST[127:120] + SRC[127:120]);
PADDUSW (Legacy SSE instruction)
DEST[15:0] ← SaturateToUnsignedWord (DEST[15:0] + SRC[15:0]);
(* Repeat add operation for 2nd through 7th words *)
DEST[127:112] ← SaturateToUnsignedWord (DEST[127:112] + SRC[127:112]);
VPADDUSB (VEX.128 encoded version)
DEST[7:0] ← SaturateToUnsignedByte (SRC1[7:0] + SRC2[7:0]);
(* Repeat add operation for 2nd through 15th bytes *)
DEST[127:120] ← SaturateToUnsignedByte (SRC1[127:120] + SRC2[127:120]);
DEST[VLMAX:128] ← 0
VPADDUSW (VEX.128 encoded version)
DEST[15:0] ← SaturateToUnsignedWord (SRC1[15:0] + SRC2[15:0]);
(* Repeat add operation for 2nd through 7th words *)
DEST[127:112] ← SaturateToUnsignedWord (SRC1[127:112] + SRC2[127:112]);
DEST[VLMAX:128] ← 0
VPADDUSB (VEX.256 encoded version)
DEST[7:0] ← SaturateToUnsignedByte (SRC1[7:0] + SRC2[7:0]);
(* Repeat add operation for 2nd through 31st bytes *)
DEST[255:248] ← SaturateToUnsignedByte (SRC1[255:248] + SRC2[255:248]);
VPADDUSW (VEX.256 encoded version)
DEST[15:0] ← SaturateToUnsignedWord (SRC1[15:0] + SRC2[15:0]);
(* Repeat add operation for 2nd through 15th words *)
DEST[255:240] ← SaturateToUnsignedWord (SRC1[255:240] + SRC2[255:240]);
Intel C/C++ Compiler Intrinsic Equivalent
(V)PADDUSB: __m128i _mm_adds_epu8 (__m128i a, __m128i b)
(V)PADDUSW: __m128i _mm_adds_epu16 (__m128i a, __m128i b)
VPADDUSB: __m256i _mm256_adds_epu8 (__m256i a, __m256i b)
VPADDUSW: __m256i _mm256_adds_epu16 (__m256i a, __m256i b)
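In practice the word form is usually applied across a whole buffer. The sketch below assumes GCC or Clang on an SSE2-capable x86 target: it applies _mm_adds_epu16 (PADDUSW) eight words at a time and finishes the remaining elements, or the whole array on non-SSE2 builds, with the same FFFFH clamp in scalar code. adds_u16_inplace is a hypothetical name. An AVX2 variant would use _mm256_adds_epu16 with 16 words per step, but it needs an AVX2 build flag plus a run-time CPUID check for AVX2 support, so it is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_adds_epu16, _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* a[i] = saturate(a[i] + b[i]) for i < n, unsigned word saturation. */
static void adds_u16_inplace(uint16_t *a, const uint16_t *b, size_t n) {
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    /* Vector body: 8 words (128 bits) per PADDUSW; unaligned access is allowed. */
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(a + i), _mm_adds_epu16(va, vb));
    }
#endif
    /* Scalar tail: same clamp at FFFFH as SaturateToUnsignedWord. */
    for (; i < n; i++) {
        uint32_t s = (uint32_t)a[i] + b[i];
        a[i] = (uint16_t)(s > 0xFFFF ? 0xFFFF : s);
    }
}
```

Keeping the tail in scalar code avoids reading past the end of either array when n is not a multiple of 8.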
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4