PHADDSW - Packed Horizontal Add with Saturation

Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный Технический Университет Харьковский Политехнический Институт

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

Лаб2012 / 319433-011.pdf

Скачиваний:

Добавлен:

02.02.2015

Размер:

2.31 Mб

Скачать

☆

<<< < Предыдущая 19 20 21 22 23 24 25 26 27 28 29 3031 / 7331 32 33 34 35 36 37 38 39 40 41 42 43 > Следующая >>>

				INSTRUCTION SET REFERENCE
PHADDSW - Packed Horizontal Add with Saturation

Opcode/	Op/	64/32	CPUID	Description
Instruction	En	-bit	Feature
		Mode	Flag
66 0F 38 03 /r	A	V/V	SSSE3	Add 16-bit signed integers hori-
PHADDSW xmm1, xmm2/m128				zontally, pack saturated inte-
				gers to xmm1.
VEX.NDS.128.66.0F38.WIG 03 /r	B	V/V	AVX	Add 16-bit signed integers hori-
VPHADDSW xmm1, xmm2,				zontally, pack saturated inte-
xmm3/m128				gers to xmm1.
VEX.NDS.256.66.0F38.WIG 03 /r	B	V/V	AVX2	Add 16-bit signed integers hori-
VPHADDSW ymm1, ymm2,				zontally, pack saturated inte-
ymm3/m256				gers to ymm1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (r, w)	ModRM:r/m (r)	NA	NA
B	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	NA

Description

(V)PHADDSW adds two adjacent signed 16-bit integers horizontally from the second source and first source operands and saturates the signed results; packs the signed, saturated 16-bit results to the destination operand.

128-bit Legacy SSE version: he first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: he first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.

Operation

VPHADDSW (VEX.256 encoded version)

DEST[15:0]= SaturateToSignedWord(SRC1[31:16] + SRC1[15:0])

DEST[31:16] = SaturateToSignedWord(SRC1[63:48] + SRC1[47:32])

DEST[47:32] = SaturateToSignedWord(SRC1[95:80] + SRC1[79:64])

Ref. # 319433-011

5-77

INSTRUCTION SET REFERENCE

DEST[63:48] = SaturateToSignedWord(SRC1[127:112] + SRC1[111:96])

DEST[79:64] = SaturateToSignedWord(SRC2[31:16] + SRC2[15:0])

DEST[95:80] = SaturateToSignedWord(SRC2[63:48] + SRC2[47:32])

DEST[111:96] = SaturateToSignedWord(SRC2[95:80] + SRC2[79:64])

DEST[127:112] = SaturateToSignedWord(SRC2[127:112] + SRC2[111:96])

DEST[143:128]= SaturateToSignedWord(SRC1[159:144] + SRC1[143:128])

DEST[159:144] = SaturateToSignedWord(SRC1[191:176] + SRC1[175:160])

DEST[175:160] = SaturateToSignedWord( SRC1[223:208] + SRC1[207:192])

DEST[191:176] = SaturateToSignedWord(SRC1[255:240] + SRC1[239:224])

DEST[207:192] = SaturateToSignedWord(SRC2[127:112] + SRC2[143:128])

DEST[223:208] = SaturateToSignedWord(SRC2[159:144] + SRC2[175:160])

DEST[239:224] = SaturateToSignedWord(SRC2[191-160] + SRC2[159-128])

DEST[255:240] = SaturateToSignedWord(SRC2[255:240] + SRC2[239:224])

VPHADDSW (VEX.128 encoded version)

DEST[15:0]= SaturateToSignedWord(SRC1[31:16] + SRC1[15:0]) DEST[31:16] = SaturateToSignedWord(SRC1[63:48] + SRC1[47:32]) DEST[47:32] = SaturateToSignedWord(SRC1[95:80] + SRC1[79:64]) DEST[63:48] = SaturateToSignedWord(SRC1[127:112] + SRC1[111:96]) DEST[79:64] = SaturateToSignedWord(SRC2[31:16] + SRC2[15:0]) DEST[95:80] = SaturateToSignedWord(SRC2[63:48] + SRC2[47:32]) DEST[111:96] = SaturateToSignedWord(SRC2[95:80] + SRC2[79:64]) DEST[127:112] = SaturateToSignedWord(SRC2[127:112] + SRC2[111:96]) DEST[VLMAX:128]  0

PHADDSW (128-bit Legacy SSE version)

DEST[15:0]= SaturateToSignedWord(DEST[31:16] + DEST[15:0]) DEST[31:16] = SaturateToSignedWord(DEST[63:48] + DEST[47:32]) DEST[47:32] = SaturateToSignedWord(DEST[95:80] + DEST[79:64]) DEST[63:48] = SaturateToSignedWord(DEST[127:112] + DEST[111:96]) DEST[79:64] = SaturateToSignedWord(SRC[31:16] + SRC[15:0]) DEST[95:80] = SaturateToSignedWord(SRC[63:48] + SRC[47:32]) DEST[111:96] = SaturateToSignedWord(SRC[95:80] + SRC[79:64]) DEST[127:112] = SaturateToSignedWord(SRC[127:112] + SRC[111:96]) DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PHADDSW __m128i _mm_hadds_epi16 (__m128i a, __m128i b)

VPHADDSW __m256i _mm256_hadds_epi16 (__m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

5-78	Ref. # 319433-011

INSTRUCTION SET REFERENCE

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-79

INSTRUCTION SET REFERENCE

PHSUBW/PHSUBD - Packed Horizontal Subtract

Opcode/	Op/	64/32	CPUID	Description
Instruction	En	-bit	Feature
		Mode	Flag
66 0F 38 05 /r	A	V/V	SSSE3	Subtract 16-bit signed integers
PHSUBW xmm1, xmm2/m128				horizontally, pack to xmm1.
66 0F 38 06 /r	A	V/V	SSSE3	Subtract 32-bit signed integers
PHSUBD xmm1, xmm2/m128				horizontally, pack to xmm1.
VEX.NDS.128.66.0F38.WIG 05 /r	B	V/V	AVX	Subtract 16-bit signed integers
VPHSUBW xmm1, xmm2,				horizontally, pack to xmm1.
xmm3/m128
VEX.NDS.128.66.0F38.WIG 06 /r	B	V/V	AVX	Subtract 32-bit signed integers
VPHSUBD xmm1, xmm2,				horizontally, pack to xmm1.
xmm3/m128
VEX.NDS.256.66.0F38.WIG 05 /r	B	V/V	AVX2	Subtract 16-bit signed integers
VPHSUBW ymm1, ymm2,				horizontally, pack to ymm1.
ymm3/m256
VEX.NDS.256.66.0F38.WIG 06 /r	B	V/V	AVX2	Subtract 32-bit signed integers
VPHSUBD ymm1, ymm2,				horizontally, pack to ymm1.
ymm3/m256

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (r, w)	ModRM:r/m (r)	NA	NA
B	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	NA

Description

(V)PHSUBW performs horizontal subtraction on each adjacent pair of 16-bit signed integers by subtracting the most significant word from the least significant word of each pair in the second source operand and destination operands, and packs the signed 16-bit results to the destination operand. (V)PHSUBD performs horizontal subtraction on each adjacent pair of 32-bit signed integers by subtracting the most significant doubleword from the least significant doubleword of each pair, and packs the signed 32-bit result to the destination operand.

128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory loca-

5-80	Ref. # 319433-011

INSTRUCTION SET REFERENCE

tion. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.

Operation

VPHSUBW (VEX.256 encoded version)

DEST[15:0]  SRC1[15:0] - SRC1[31:16]

DEST[31:16]  SRC1[47:32] - SRC1[63:48]

DEST[47:32]  SRC1[79:64] - SRC1[95:80]

DEST[63:48]  SRC1[111:96] - SRC1[127:112]

DEST[79:64]  SRC2[15:0] - SRC2[31:16]

DEST[95:80]  SRC2[47:32] - SRC2[63:48]

DEST[111:96]  SRC2[79:64] - SRC2[95:80]

DEST[127:112]  SRC2[111:96] - SRC2[127:112]

DEST[143:128]  SRC1[143:128] - SRC1[159:144]

DEST[159:144]  SRC1[175:160] - SRC1[191:176]

DEST[175:160]  SRC1[207:192] - SRC1[223:208]

DEST[191:176]  SRC1[239:224] - SRC1[255:240]

DEST[207:192]  SRC2[143:128] - SRC2[159:144]

DEST[223:208]  SRC2[175:160] - SRC2[191:176]

DEST[239:224]  SRC2[207:192] - SRC2[223:208]

DEST[255:240]  SRC2[239:224] - SRC2[255:240]

VPHSUBD (VEX.256 encoded version)

DEST[31:0]  SRC1[31:0] - SRC1[63:32]

DEST[63:32]  SRC1[95:64] - SRC1[127:96]

DEST[95:64]  SRC2[31:0] - SRC2[63:32]

DEST[127:96]  SRC2[95:64] - SRC2[127:96]

DEST[159:128]  SRC1[159:128] - SRC1[191:160]

DEST[191:160]  SRC1[223:192] - SRC1[255:224]

DEST[223:192]  SRC2[159:128] - SRC2[191:160]

DEST[255:224]  SRC2[223:192] - SRC2[255:224]

VPHSUBW (VEX.128 encoded version)

DEST[15:0]  SRC1[15:0] - SRC1[31:16]

DEST[31:16]  SRC1[47:32] - SRC1[63:48]

DEST[47:32]  SRC1[79:64] - SRC1[95:80]

DEST[63:48]  SRC1[111:96] - SRC1[127:112]

Ref. # 319433-011

5-81

INSTRUCTION SET REFERENCE

DEST[79:64]  SRC2[15:0] - SRC2[31:16] DEST[95:80]  SRC2[47:32] - SRC2[63:48] DEST[111:96]  SRC2[79:64] - SRC2[95:80] DEST[127:112]  SRC2[111:96] - SRC2[127:112] DEST[VLMAX:128]  0

VPHSUBD (VEX.128 encoded version)

DEST[31:0]  SRC1[31:0] - SRC1[63:32]

DEST[63:32]  SRC1[95:64] - SRC1[127:96]

DEST[95:64]  SRC2[31:0] - SRC2[63:32]

DEST[127:96]  SRC2[95:64] - SRC2[127:96]

DEST[VLMAX:128]  0

PHSUBW (128-bit Legacy SSE version)

DEST[15:0]  DEST[15:0] - DEST[31:16]

DEST[31:16]  DEST[47:32] - DEST[63:48]

DEST[47:32]  DEST[79:64] - DEST[95:80]

DEST[63:48]  DEST[111:96] - DEST[127:112]

DEST[79:64]  SRC[15:0] - SRC[31:16]

DEST[95:80]  SRC[47:32] - SRC[63:48]

DEST[111:96]  SRC[79:64] - SRC[95:80]

DEST[127:112]  SRC[111:96] - SRC[127:112]

DEST[VLMAX:128] (Unmodified)

PHSUBD (128-bit Legacy SSE version)

DEST[31:0]  DEST[31:0] - DEST[63:32]

DEST[63:32]  DEST[95:64] - DEST[127:96]

DEST[95:64]  SRC[31:0] - SRC[63:32]

DEST[127:96]  SRC[95:64] - SRC[127:96]

DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PHSUBW __m128i _mm_hsub_epi16 (__m128i a, __m128i b) (V)PHSUBD __m128i _mm_hsub_epi32 (__m128i a, __m128i b) VPHSUBW __m256i _mm256_hsub_epi16 (__m256i a, __m256i b) VPHSUBD __m256i _mm256_hsub_epi32 (__m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

5-82	Ref. # 319433-011

INSTRUCTION SET REFERENCE

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-83

<<< < Предыдущая 19 20 21 22 23 24 25 26 27 28 29 3031 / 7331 32 33 34 35 36 37 38 39 40 41 42 43 > Следующая >>>

Соседние файлы в папке Лаб2012

#
02.02.20153.33 Mб6525366517.pdf
#
02.02.20152.52 Mб25253666.pdf
#
02.02.20152.7 Mб2725366617.pdf
#
02.02.20152.09 Mб31253667.pdf
#
02.02.20152.19 Mб2625366717.pdf
#
02.02.20152.31 Mб27319433-011.pdf
#
02.02.2015162.82 Кб20ЛР1 Виконання арифм операц.doc
#
02.02.201564 Кб21ЛР10 ММХ-розширення.doc
#
02.02.201567.07 Кб22ЛР11 SSE-розширення.doc
#
02.02.201590.62 Кб18ЛР12 Windows-застосування.doc
#
02.02.2015181.25 Кб19ЛР13 Досл_дження коду програм.doc