Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Лаб2012 / 319433-011.pdf
Скачиваний:
27
Добавлен:
02.02.2015
Размер:
2.31 Mб
Скачать

 

 

 

 

INSTRUCTION SET REFERENCE

PHADDSW - Packed Horizontal Add with Saturation

 

 

 

 

 

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

66 0F 38 03 /r

A

V/V

SSSE3

Add 16-bit signed integers hori-

PHADDSW xmm1, xmm2/m128

 

 

 

zontally, pack saturated inte-

 

 

 

 

gers to xmm1.

VEX.NDS.128.66.0F38.WIG 03 /r

B

V/V

AVX

Add 16-bit signed integers hori-

VPHADDSW xmm1, xmm2,

 

 

 

zontally, pack saturated inte-

xmm3/m128

 

 

 

gers to xmm1.

VEX.NDS.256.66.0F38.WIG 03 /r

B

V/V

AVX2

Add 16-bit signed integers hori-

VPHADDSW ymm1, ymm2,

 

 

 

zontally, pack saturated inte-

ymm3/m256

 

 

 

gers to ymm1.

 

 

 

 

 

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

 

 

 

 

 

Description

(V)PHADDSW adds two adjacent signed 16-bit integers horizontally from the second source and first source operands and saturates the signed results; packs the signed, saturated 16-bit results to the destination operand.

128-bit Legacy SSE version: he first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: he first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.

Operation

VPHADDSW (VEX.256 encoded version)

DEST[15:0]= SaturateToSignedWord(SRC1[31:16] + SRC1[15:0])

DEST[31:16] = SaturateToSignedWord(SRC1[63:48] + SRC1[47:32])

DEST[47:32] = SaturateToSignedWord(SRC1[95:80] + SRC1[79:64])

Ref. # 319433-011

5-77

INSTRUCTION SET REFERENCE

DEST[63:48] = SaturateToSignedWord(SRC1[127:112] + SRC1[111:96])

DEST[79:64] = SaturateToSignedWord(SRC2[31:16] + SRC2[15:0])

DEST[95:80] = SaturateToSignedWord(SRC2[63:48] + SRC2[47:32])

DEST[111:96] = SaturateToSignedWord(SRC2[95:80] + SRC2[79:64])

DEST[127:112] = SaturateToSignedWord(SRC2[127:112] + SRC2[111:96])

DEST[143:128]= SaturateToSignedWord(SRC1[159:144] + SRC1[143:128])

DEST[159:144] = SaturateToSignedWord(SRC1[191:176] + SRC1[175:160])

DEST[175:160] = SaturateToSignedWord( SRC1[223:208] + SRC1[207:192])

DEST[191:176] = SaturateToSignedWord(SRC1[255:240] + SRC1[239:224])

DEST[207:192] = SaturateToSignedWord(SRC2[127:112] + SRC2[143:128])

DEST[223:208] = SaturateToSignedWord(SRC2[159:144] + SRC2[175:160])

DEST[239:224] = SaturateToSignedWord(SRC2[191-160] + SRC2[159-128])

DEST[255:240] = SaturateToSignedWord(SRC2[255:240] + SRC2[239:224])

VPHADDSW (VEX.128 encoded version)

DEST[15:0]= SaturateToSignedWord(SRC1[31:16] + SRC1[15:0]) DEST[31:16] = SaturateToSignedWord(SRC1[63:48] + SRC1[47:32]) DEST[47:32] = SaturateToSignedWord(SRC1[95:80] + SRC1[79:64]) DEST[63:48] = SaturateToSignedWord(SRC1[127:112] + SRC1[111:96]) DEST[79:64] = SaturateToSignedWord(SRC2[31:16] + SRC2[15:0]) DEST[95:80] = SaturateToSignedWord(SRC2[63:48] + SRC2[47:32]) DEST[111:96] = SaturateToSignedWord(SRC2[95:80] + SRC2[79:64]) DEST[127:112] = SaturateToSignedWord(SRC2[127:112] + SRC2[111:96]) DEST[VLMAX:128] 0

PHADDSW (128-bit Legacy SSE version)

DEST[15:0]= SaturateToSignedWord(DEST[31:16] + DEST[15:0]) DEST[31:16] = SaturateToSignedWord(DEST[63:48] + DEST[47:32]) DEST[47:32] = SaturateToSignedWord(DEST[95:80] + DEST[79:64]) DEST[63:48] = SaturateToSignedWord(DEST[127:112] + DEST[111:96]) DEST[79:64] = SaturateToSignedWord(SRC[31:16] + SRC[15:0]) DEST[95:80] = SaturateToSignedWord(SRC[63:48] + SRC[47:32]) DEST[111:96] = SaturateToSignedWord(SRC[95:80] + SRC[79:64]) DEST[127:112] = SaturateToSignedWord(SRC[127:112] + SRC[111:96]) DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PHADDSW __m128i _mm_hadds_epi16 (__m128i a, __m128i b)

VPHADDSW __m256i _mm256_hadds_epi16 (__m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

5-78

Ref. # 319433-011

INSTRUCTION SET REFERENCE

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-79

INSTRUCTION SET REFERENCE

PHSUBW/PHSUBD - Packed Horizontal Subtract

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

66 0F 38 05 /r

A

V/V

SSSE3

Subtract 16-bit signed integers

PHSUBW xmm1, xmm2/m128

 

 

 

horizontally, pack to xmm1.

66 0F 38 06 /r

A

V/V

SSSE3

Subtract 32-bit signed integers

PHSUBD xmm1, xmm2/m128

 

 

 

horizontally, pack to xmm1.

VEX.NDS.128.66.0F38.WIG 05 /r

B

V/V

AVX

Subtract 16-bit signed integers

VPHSUBW xmm1, xmm2,

 

 

 

horizontally, pack to xmm1.

xmm3/m128

 

 

 

 

VEX.NDS.128.66.0F38.WIG 06 /r

B

V/V

AVX

Subtract 32-bit signed integers

VPHSUBD xmm1, xmm2,

 

 

 

horizontally, pack to xmm1.

xmm3/m128

 

 

 

 

VEX.NDS.256.66.0F38.WIG 05 /r

B

V/V

AVX2

Subtract 16-bit signed integers

VPHSUBW ymm1, ymm2,

 

 

 

horizontally, pack to ymm1.

ymm3/m256

 

 

 

 

VEX.NDS.256.66.0F38.WIG 06 /r

B

V/V

AVX2

Subtract 32-bit signed integers

VPHSUBD ymm1, ymm2,

 

 

 

horizontally, pack to ymm1.

ymm3/m256

 

 

 

 

 

 

 

 

 

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

 

 

 

 

 

Description

(V)PHSUBW performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the second source operand and destination operands, and packs the signed 16-bit results to the destination operand. (V)PHSUBD performs horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant doubleword from the least significant doubleword of each pair, and packs the signed 32-bit result to the destination operand.

128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory loca-

5-80

Ref. # 319433-011

INSTRUCTION SET REFERENCE

tion. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.

Operation

VPHSUBW (VEX.256 encoded version)

DEST[15:0] SRC1[15:0] - SRC1[31:16]

DEST[31:16] SRC1[47:32] - SRC1[63:48]

DEST[47:32] SRC1[79:64] - SRC1[95:80]

DEST[63:48] SRC1[111:96] - SRC1[127:112]

DEST[79:64] SRC2[15:0] - SRC2[31:16]

DEST[95:80] SRC2[47:32] - SRC2[63:48]

DEST[111:96] SRC2[79:64] - SRC2[95:80]

DEST[127:112] SRC2[111:96] - SRC2[127:112]

DEST[143:128] SRC1[143:128] - SRC1[159:144]

DEST[159:144] SRC1[175:160] - SRC1[191:176]

DEST[175:160] SRC1[207:192] - SRC1[223:208]

DEST[191:176] SRC1[239:224] - SRC1[255:240]

DEST[207:192] SRC2[143:128] - SRC2[159:144]

DEST[223:208] SRC2[175:160] - SRC2[191:176]

DEST[239:224] SRC2[207:192] - SRC2[223:208]

DEST[255:240] SRC2[239:224] - SRC2[255:240]

VPHSUBD (VEX.256 encoded version)

DEST[31:0] SRC1[31:0] - SRC1[63:32]

DEST[63:32] SRC1[95:64] - SRC1[127:96]

DEST[95:64] SRC2[31:0] - SRC2[63:32]

DEST[127:96] SRC2[95:64] - SRC2[127:96]

DEST[159:128] SRC1[159:128] - SRC1[191:160]

DEST[191:160] SRC1[223:192] - SRC1[255:224]

DEST[223:192] SRC2[159:128] - SRC2[191:160]

DEST[255:224] SRC2[223:192] - SRC2[255:224]

VPHSUBW (VEX.128 encoded version)

DEST[15:0] SRC1[15:0] - SRC1[31:16]

DEST[31:16] SRC1[47:32] - SRC1[63:48]

DEST[47:32] SRC1[79:64] - SRC1[95:80]

DEST[63:48] SRC1[111:96] - SRC1[127:112]

Ref. # 319433-011

5-81

INSTRUCTION SET REFERENCE

DEST[79:64] SRC2[15:0] - SRC2[31:16] DEST[95:80] SRC2[47:32] - SRC2[63:48] DEST[111:96] SRC2[79:64] - SRC2[95:80] DEST[127:112] SRC2[111:96] - SRC2[127:112] DEST[VLMAX:128] 0

VPHSUBD (VEX.128 encoded version)

DEST[31:0] SRC1[31:0] - SRC1[63:32]

DEST[63:32] SRC1[95:64] - SRC1[127:96]

DEST[95:64] SRC2[31:0] - SRC2[63:32]

DEST[127:96] SRC2[95:64] - SRC2[127:96]

DEST[VLMAX:128] 0

PHSUBW (128-bit Legacy SSE version)

DEST[15:0] DEST[15:0] - DEST[31:16]

DEST[31:16] DEST[47:32] - DEST[63:48]

DEST[47:32] DEST[79:64] - DEST[95:80]

DEST[63:48] DEST[111:96] - DEST[127:112]

DEST[79:64] SRC[15:0] - SRC[31:16]

DEST[95:80] SRC[47:32] - SRC[63:48]

DEST[111:96] SRC[79:64] - SRC[95:80]

DEST[127:112] SRC[111:96] - SRC[127:112]

DEST[VLMAX:128] (Unmodified)

PHSUBD (128-bit Legacy SSE version)

DEST[31:0] DEST[31:0] - DEST[63:32]

DEST[63:32] DEST[95:64] - DEST[127:96]

DEST[95:64] SRC[31:0] - SRC[63:32]

DEST[127:96] SRC[95:64] - SRC[127:96]

DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PHSUBW __m128i _mm_hsub_epi16 (__m128i a, __m128i b) (V)PHSUBD __m128i _mm_hsub_epi32 (__m128i a, __m128i b) VPHSUBW __m256i _mm256_hsub_epi16 (__m256i a, __m256i b) VPHSUBD __m256i _mm256_hsub_epi32 (__m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

5-82

Ref. # 319433-011

INSTRUCTION SET REFERENCE

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-83

Соседние файлы в папке Лаб2012