INSTRUCTION SET REFERENCE

PHADDW/PHADDD - Packed Horizontal Add

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 01 /r PHADDW xmm1, xmm2/m128 | A | V/V | SSSE3 | Add 16-bit signed integers horizontally, pack to xmm1.
66 0F 38 02 /r PHADDD xmm1, xmm2/m128 | A | V/V | SSSE3 | Add 32-bit signed integers horizontally, pack to xmm1.
VEX.NDS.128.66.0F38.WIG 01 /r VPHADDW xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add 16-bit signed integers horizontally, pack to xmm1.
VEX.NDS.128.66.0F38.WIG 02 /r VPHADDD xmm1, xmm2, xmm3/m128 | B | V/V | AVX | Add 32-bit signed integers horizontally, pack to xmm1.
VEX.NDS.256.66.0F38.WIG 01 /r VPHADDW ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add 16-bit signed integers horizontally, pack to ymm1.
VEX.NDS.256.66.0F38.WIG 02 /r VPHADDD ymm1, ymm2, ymm3/m256 | B | V/V | AVX2 | Add 32-bit signed integers horizontally, pack to ymm1.
Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
A | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
B | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
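To show how the Op/En B operands land in the VEX prefix and ModRM byte, here is a hedged sketch of an encoder for the register-only form. It assumes the 3-byte VEX prefix (C4), which is required because the opcode lives in the 0F38 map, a register second source (ModRM.mod = 11), and W = 0 for this W-ignored (WIG) encoding; memory operands are not covered. The function name encode_vphadd and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for the register-only Op/En B form:
 *   VPHADDW/VPHADDD dst, src1, src2   (xmm or ymm registers 0..15)
 * dst  -> ModRM.reg  (Operand 1, written)
 * src1 -> VEX.vvvv   (Operand 2, stored inverted)
 * src2 -> ModRM.r/m  (Operand 3, mod = 11 selects a register)
 * Writes 5 bytes to out and returns the length. */
size_t encode_vphadd(uint8_t out[5], int dword, int is256,
                     unsigned dst, unsigned src1, unsigned src2) {
    out[0] = 0xC4; /* 3-byte VEX escape; needed for the 0F38 opcode map */
    out[1] = (uint8_t)((((~dst >> 3) & 1u) << 7)   /* VEX.R: inverted bit 3 of dst */
                     | (1u << 6)                   /* VEX.X: 1, no index register */
                     | (((~src2 >> 3) & 1u) << 5)  /* VEX.B: inverted bit 3 of src2 */
                     | 0x02u);                     /* m-mmmm = 00010: 0F38 map */
    out[2] = (uint8_t)(((~src1 & 0xFu) << 3)       /* VEX.vvvv: inverted src1; W = 0 */
                     | ((is256 ? 1u : 0u) << 2)    /* VEX.L: 0 = 128-bit, 1 = 256-bit */
                     | 0x01u);                     /* pp = 01: implied 66 prefix */
    out[3] = dword ? 0x02 : 0x01;                  /* 01 = VPHADDW, 02 = VPHADDD */
    out[4] = (uint8_t)(0xC0u | ((dst & 7u) << 3) | (src2 & 7u)); /* ModRM, mod = 11 */
    return 5;
}
```

Under these assumptions, VPHADDW ymm1, ymm2, ymm3 encodes as C4 E2 6D 01 CB, and VPHADDD xmm0, xmm1, xmm2 as C4 E2 71 02 C2.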

 

 

 

 

 

 

 

 

 

Description

(V)PHADDW adds pairs of adjacent 16-bit signed integers in the first and second source operands and packs the 16-bit signed results into the destination operand. (V)PHADDD adds pairs of adjacent 32-bit signed integers in the first and second source operands and packs the 32-bit signed results into the destination operand. Within each 128-bit result, the sums taken from the first source operand occupy the low half and the sums taken from the second source operand occupy the high half. For the 128-bit forms, the first source and destination operands are XMM registers, and the second source operand is an XMM register or a 128-bit memory location.
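The following scalar sketch models the 128-bit forms element by element, assuming little-endian element numbering (element 0 occupies bits 15:0 or 31:0). The function names phaddw128 and phaddd128 are hypothetical. Each result element receives the low 16 (or 32) bits of the sum, so overflow wraps around rather than saturating, consistent with the Operation section assigning a full sum to a fixed-width field.

```c
#include <stdint.h>

/* Hypothetical scalar model of 128-bit (V)PHADDW.
 * a = first source (SRC1), b = second source (SRC2), 8 words each.
 * Pair sums from a fill dst[0..3]; pair sums from b fill dst[4..7].
 * Sums are formed in unsigned arithmetic and truncated to 16 bits,
 * so they wrap modulo 2^16. */
void phaddw128(int16_t dst[8], const int16_t a[8], const int16_t b[8]) {
    int16_t tmp[8]; /* lets dst alias a or b, as in the legacy two-operand form */
    for (int i = 0; i < 4; i++) {
        uint16_t sa = (uint16_t)((uint16_t)a[2 * i] + (uint16_t)a[2 * i + 1]);
        uint16_t sb = (uint16_t)((uint16_t)b[2 * i] + (uint16_t)b[2 * i + 1]);
        tmp[i] = (int16_t)sa;
        tmp[i + 4] = (int16_t)sb;
    }
    for (int i = 0; i < 8; i++) dst[i] = tmp[i];
}

/* Hypothetical scalar model of 128-bit (V)PHADDD: 4 dwords per operand,
 * sums wrap modulo 2^32. */
void phaddd128(int32_t dst[4], const int32_t a[4], const int32_t b[4]) {
    int32_t tmp[4];
    tmp[0] = (int32_t)((uint32_t)a[0] + (uint32_t)a[1]);
    tmp[1] = (int32_t)((uint32_t)a[2] + (uint32_t)a[3]);
    tmp[2] = (int32_t)((uint32_t)b[0] + (uint32_t)b[1]);
    tmp[3] = (int32_t)((uint32_t)b[2] + (uint32_t)b[3]);
    for (int i = 0; i < 4; i++) dst[i] = tmp[i];
}
```

For example, with SRC1 = {1, 2, ..., 8} and SRC2 = {10, 20, ..., 80}, phaddw128 produces {3, 7, 11, 15, 30, 70, 110, 150}.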

Ref. # 319433-011


Legacy SSE instructions: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. In 64-bit mode, use the REX prefix to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (255:128) of the corresponding YMM register are zeroed.
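To make the difference between the legacy SSE and VEX.128 forms concrete, here is a hedged sketch that models a YMM register as eight 32-bit elements. The type ymm_t and the functions phaddd_legacy and vphaddd_vex128 are hypothetical. The legacy form overwrites only bits 127:0 of its destination, which is also its first source, while the VEX.128 form writes a separate destination and zeroes bits 255:128.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a YMM register as eight 32-bit elements;
 * element 0 occupies bits 31:0 and elements 4..7 are bits 255:128. */
typedef struct { uint32_t d[8]; } ymm_t;

/* Legacy SSE PHADDD xmm1, xmm2: DEST is also the first source, and
 * bits 255:128 of the YMM register are left unchanged.
 * Unsigned addition gives the same bit pattern as wrapping signed addition. */
void phaddd_legacy(ymm_t *dest, const ymm_t *src) {
    uint32_t lo[4] = {
        dest->d[0] + dest->d[1], dest->d[2] + dest->d[3],
        src->d[0] + src->d[1],   src->d[2] + src->d[3],
    };
    memcpy(dest->d, lo, sizeof lo); /* dest->d[4..7] are not touched */
}

/* VEX.128 VPHADDD xmm1, xmm2, xmm3: non-destructive three-operand form;
 * bits 255:128 of the destination YMM register are zeroed. */
void vphaddd_vex128(ymm_t *dest, const ymm_t *src1, const ymm_t *src2) {
    ymm_t r = {{0}}; /* r.d[4..7] stay zero */
    r.d[0] = src1->d[0] + src1->d[1];
    r.d[1] = src1->d[2] + src1->d[3];
    r.d[2] = src2->d[0] + src2->d[1];
    r.d[3] = src2->d[2] + src2->d[3];
    *dest = r;
}
```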

VEX.256 encoded version: The sums of adjacent element pairs from the low 16 bytes of the first and second source operands are packed into the low 16 bytes of the destination operand, and the sums of adjacent element pairs from the high 16 bytes of the first and second source operands are packed into the high 16 bytes of the destination operand. The second source operand can be a YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.
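The lane-wise behavior can be sketched as a scalar model of 256-bit VPHADDD (the function name vphaddd256 is hypothetical). Using the element names of Figure 5-3, X0..X7 are the dwords of SRC1 and Y0..Y7 those of SRC2; the model produces S0 = X0+X1, S1 = X2+X3, S2 = Y0+Y1, S3 = Y2+Y3 in the low 128-bit lane and S4 = X4+X5, S5 = X6+X7, S6 = Y4+Y5, S7 = Y6+Y7 in the high lane.

```c
#include <stdint.h>

/* Hypothetical scalar model of 256-bit VPHADDD.
 * Each 128-bit lane is processed independently: within a lane, two pair
 * sums come from src1 and then two from src2. No sum crosses the lane
 * boundary. Sums wrap modulo 2^32. */
void vphaddd256(int32_t dst[8], const int32_t src1[8], const int32_t src2[8]) {
    int32_t tmp[8]; /* lets dst alias src1 or src2 */
    for (int lane = 0; lane < 2; lane++) {
        int base = 4 * lane; /* first dword of this 128-bit lane */
        tmp[base + 0] = (int32_t)((uint32_t)src1[base + 0] + (uint32_t)src1[base + 1]);
        tmp[base + 1] = (int32_t)((uint32_t)src1[base + 2] + (uint32_t)src1[base + 3]);
        tmp[base + 2] = (int32_t)((uint32_t)src2[base + 0] + (uint32_t)src2[base + 1]);
        tmp[base + 3] = (int32_t)((uint32_t)src2[base + 2] + (uint32_t)src2[base + 3]);
    }
    for (int i = 0; i < 8; i++) dst[i] = tmp[i];
}
```

With SRC1 = {1, 2, ..., 8} and SRC2 = {10, 20, ..., 80}, the result is {3, 7, 30, 70, 11, 15, 110, 150}, not the {3, 7, 11, 15, 30, 70, 110, 150} that a full-width horizontal add would give.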

Figure 5-3. 256-bit VPHADDD Instruction Operation (SRC1 elements X7..X0 and SRC2 elements Y7..Y0 are summed pairwise into Dest elements S7..S0.)

Operation

VPHADDW (VEX.256 encoded version)

DEST[15:0] ← SRC1[31:16] + SRC1[15:0]
DEST[31:16] ← SRC1[63:48] + SRC1[47:32]
DEST[47:32] ← SRC1[95:80] + SRC1[79:64]
DEST[63:48] ← SRC1[127:112] + SRC1[111:96]


DEST[79:64] ← SRC2[31:16] + SRC2[15:0]
DEST[95:80] ← SRC2[63:48] + SRC2[47:32]
DEST[111:96] ← SRC2[95:80] + SRC2[79:64]
DEST[127:112] ← SRC2[127:112] + SRC2[111:96]
DEST[143:128] ← SRC1[159:144] + SRC1[143:128]
DEST[159:144] ← SRC1[191:176] + SRC1[175:160]
DEST[175:160] ← SRC1[223:208] + SRC1[207:192]
DEST[191:176] ← SRC1[255:240] + SRC1[239:224]
DEST[207:192] ← SRC2[159:144] + SRC2[143:128]
DEST[223:208] ← SRC2[191:176] + SRC2[175:160]
DEST[239:224] ← SRC2[223:208] + SRC2[207:192]
DEST[255:240] ← SRC2[255:240] + SRC2[239:224]

VPHADDD (VEX.256 encoded version)

DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[159:128] ← SRC1[191:160] + SRC1[159:128]
DEST[191:160] ← SRC1[255:224] + SRC1[223:192]
DEST[223:192] ← SRC2[191:160] + SRC2[159:128]
DEST[255:224] ← SRC2[255:224] + SRC2[223:192]

VPHADDW (VEX.128 encoded version)

DEST[15:0] ← SRC1[31:16] + SRC1[15:0]
DEST[31:16] ← SRC1[63:48] + SRC1[47:32]
DEST[47:32] ← SRC1[95:80] + SRC1[79:64]
DEST[63:48] ← SRC1[127:112] + SRC1[111:96]
DEST[79:64] ← SRC2[31:16] + SRC2[15:0]
DEST[95:80] ← SRC2[63:48] + SRC2[47:32]
DEST[111:96] ← SRC2[95:80] + SRC2[79:64]
DEST[127:112] ← SRC2[127:112] + SRC2[111:96]
DEST[VLMAX:128] ← 0

VPHADDD (VEX.128 encoded version)

DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[VLMAX:128] ← 0

PHADDW (128-bit Legacy SSE version)

DEST[15:0] ← DEST[31:16] + DEST[15:0]


DEST[31:16] ← DEST[63:48] + DEST[47:32]
DEST[47:32] ← DEST[95:80] + DEST[79:64]
DEST[63:48] ← DEST[127:112] + DEST[111:96]
DEST[79:64] ← SRC[31:16] + SRC[15:0]
DEST[95:80] ← SRC[63:48] + SRC[47:32]
DEST[111:96] ← SRC[95:80] + SRC[79:64]
DEST[127:112] ← SRC[127:112] + SRC[111:96]
DEST[VLMAX:128] (Unmodified)

PHADDD (128-bit Legacy SSE version)

DEST[31:0] ← DEST[63:32] + DEST[31:0]
DEST[63:32] ← DEST[127:96] + DEST[95:64]
DEST[95:64] ← SRC[63:32] + SRC[31:0]
DEST[127:96] ← SRC[127:96] + SRC[95:64]
DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PHADDW: __m128i _mm_hadd_epi16 (__m128i a, __m128i b)
(V)PHADDD: __m128i _mm_hadd_epi32 (__m128i a, __m128i b)
VPHADDW: __m256i _mm256_hadd_epi16 (__m256i a, __m256i b)
VPHADDD: __m256i _mm256_hadd_epi32 (__m256i a, __m256i b)
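As a hedged usage sketch of _mm_hadd_epi32, the following sums four 32-bit integers by applying PHADDD twice. It assumes GCC or Clang (for __attribute__((target("ssse3"))) and __builtin_cpu_supports) on x86, and falls back to scalar code elsewhere; the names hsum4, hsum4_ssse3, and hsum4_scalar are hypothetical. This favors clarity; a PHADDD-based reduction may not be the fastest choice on every processor.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HSUM_HAVE_SSSE3_PATH 1
#endif

#ifdef HSUM_HAVE_SSSE3_PATH
/* Compiled for SSSE3 regardless of global -m flags (GCC/Clang extension). */
__attribute__((target("ssse3")))
static int32_t hsum4_ssse3(const int32_t v[4]) {
    __m128i x = _mm_loadu_si128((const __m128i *)v);
    x = _mm_hadd_epi32(x, x); /* {v0+v1, v2+v3, v0+v1, v2+v3} */
    x = _mm_hadd_epi32(x, x); /* every element = v0+v1+v2+v3 */
    return _mm_cvtsi128_si32(x); /* extract element 0 */
}
#endif

/* Scalar fallback with the same wraparound (mod 2^32) result. */
static int32_t hsum4_scalar(const int32_t v[4]) {
    uint32_t s = 0;
    for (int i = 0; i < 4; i++) s += (uint32_t)v[i];
    return (int32_t)s;
}

/* Uses the SSSE3 path only when the running CPU reports SSSE3 support. */
int32_t hsum4(const int32_t v[4]) {
#ifdef HSUM_HAVE_SSSE3_PATH
    if (__builtin_cpu_supports("ssse3"))
        return hsum4_ssse3(v);
#endif
    return hsum4_scalar(v);
}
```

Because PHADDD does not saturate, hsum4 of {INT32_MAX, 1, 0, 0} wraps to INT32_MIN on both paths.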

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4
