PHADDW/PHADDD - Packed Horizontal Add

					INSTRUCTION SET REFERENCE

Opcode	Instruction	Op/En	64/32-bit Mode	CPUID Feature Flag	Description
66 0F 38 01 /r	PHADDW xmm1, xmm2/m128	A	V/V	SSSE3	Add 16-bit signed integers horizontally, pack to xmm1.
66 0F 38 02 /r	PHADDD xmm1, xmm2/m128	A	V/V	SSSE3	Add 32-bit signed integers horizontally, pack to xmm1.
VEX.NDS.128.66.0F38.WIG 01 /r	VPHADDW xmm1, xmm2, xmm3/m128	B	V/V	AVX	Add 16-bit signed integers horizontally, pack to xmm1.
VEX.NDS.128.66.0F38.WIG 02 /r	VPHADDD xmm1, xmm2, xmm3/m128	B	V/V	AVX	Add 32-bit signed integers horizontally, pack to xmm1.
VEX.NDS.256.66.0F38.WIG 01 /r	VPHADDW ymm1, ymm2, ymm3/m256	B	V/V	AVX2	Add 16-bit signed integers horizontally, pack to ymm1.
VEX.NDS.256.66.0F38.WIG 02 /r	VPHADDD ymm1, ymm2, ymm3/m256	B	V/V	AVX2	Add 32-bit signed integers horizontally, pack to ymm1.

Instruction Operand Encoding
Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (r, w)	ModRM:r/m (r)	NA	NA
B	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	NA
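
As an illustration of how the opcode bytes and the ModRM operand fields combine, the following is a minimal sketch of an encoder for the legacy register-to-register form only. The function name encode_phadd_rr is hypothetical, and the sketch assumes xmm0 through xmm7, so no REX prefix is emitted; memory operands and the VEX forms are not covered.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any real assembler API).
 * Encodes "PHADDW xmmDst, xmmSrc" (opcode3 = 0x01) or
 * "PHADDD xmmDst, xmmSrc" (opcode3 = 0x02) for xmm0..xmm7.
 * Writes 5 bytes to out and returns 5, or returns 0 on invalid input. */
size_t encode_phadd_rr(uint8_t opcode3, unsigned dst, unsigned src, uint8_t out[5])
{
    if ((opcode3 != 0x01 && opcode3 != 0x02) || dst > 7 || src > 7)
        return 0;
    out[0] = 0x66;     /* mandatory prefix */
    out[1] = 0x0F;     /* opcode bytes 0F 38 */
    out[2] = 0x38;
    out[3] = opcode3;  /* 01 = PHADDW, 02 = PHADDD */
    /* ModRM: mod = 11 (register), reg = destination/first source, r/m = second source */
    out[4] = (uint8_t)(0xC0 | (dst << 3) | src);
    return 5;
}
```

For example, PHADDW xmm1, xmm2 encodes as 66 0F 38 01 CA under these assumptions.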

Description

(V)PHADDW adds each pair of adjacent 16-bit signed integers in the first source operand and in the second source operand, and packs the 16-bit signed results into the destination operand. (V)PHADDD adds each pair of adjacent 32-bit signed integers in the first source operand and in the second source operand, and packs the 32-bit signed results into the destination operand. In the 128-bit forms, the first source and destination operands are XMM registers, and the second source operand is an XMM register or a 128-bit memory location.
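
The pairing described above can be sketched as a scalar C model of the 128-bit (V)PHADDW form. This is an illustration, not Intel's definition: the function name phaddw128_model and the array layout (element 0 holds bits 15:0) are assumptions made for the example. Sums wrap modulo 2^16 rather than saturating.

```c
#include <stdint.h>

/* Scalar model of (V)PHADDW on 128-bit operands (illustrative only).
 * Each array holds eight 16-bit elements; element 0 = bits 15:0.
 * Results 0..3 come from src1, results 4..7 from src2. */
void phaddw128_model(const int16_t src1[8], const int16_t src2[8], int16_t dest[8])
{
    int16_t tmp[8];  /* lets dest alias src1, as in the legacy two-operand form */
    for (int i = 0; i < 4; i++) {
        /* Add in unsigned arithmetic so overflow wraps; the conversion back to
         * int16_t assumes the usual two's-complement behaviour. */
        tmp[i]     = (int16_t)(uint16_t)((uint16_t)src1[2 * i] + (uint16_t)src1[2 * i + 1]);
        tmp[i + 4] = (int16_t)(uint16_t)((uint16_t)src2[2 * i] + (uint16_t)src2[2 * i + 1]);
    }
    for (int i = 0; i < 8; i++)
        dest[i] = tmp[i];
}
```

With src1 = {1, 2, 3, 4, 5, 6, 7, 8} and src2 = {10, 20, 30, 40, 50, 60, 70, 80}, the model yields {3, 7, 11, 15, 30, 70, 110, 150}.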

Ref. # 319433-011

Legacy SSE instructions: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. In 64-bit mode use the REX prefix to access additional registers.

128-bit Legacy SSE version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The second source operand can be an XMM register or a 128-bit memory location. The first source and destination operands are XMM registers. Bits (255:128) of the corresponding YMM register are zeroed.
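
The difference in upper-bit handling between the legacy and VEX.128 encodings can be sketched with a small register model. The type ymm_model and both function names are hypothetical; the sketch treats a YMM register as eight 32-bit elements and uses PHADDD/VPHADDD as the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 256-bit register: d[0] = bits 31:0, d[7] = bits 255:224. */
typedef struct { uint32_t d[8]; } ymm_model;

/* Horizontal add of the low four dwords of a and b (sums wrap modulo 2^32). */
static void hadd_dwords(const uint32_t a[4], const uint32_t b[4], uint32_t out[4])
{
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

/* Legacy PHADDD xmm1, xmm2: the destination is also the first source,
 * and bits 255:128 of the destination are left unchanged. */
void phaddd_legacy_model(ymm_model *dst, const ymm_model *src)
{
    uint32_t r[4];
    hadd_dwords(dst->d, src->d, r);
    memcpy(dst->d, r, sizeof r);
}

/* VPHADDD xmm1, xmm2, xmm3 (VEX.128): separate destination,
 * and bits 255:128 of the destination are zeroed. */
void vphaddd128_model(ymm_model *dst, const ymm_model *src1, const ymm_model *src2)
{
    uint32_t r[4];
    hadd_dwords(src1->d, src2->d, r);
    memcpy(dst->d, r, sizeof r);
    memset(dst->d + 4, 0, 4 * sizeof dst->d[0]);
}
```

The temporary r is filled before the destination is written, so the model also behaves correctly when the destination aliases a source.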

VEX.256 encoded version: The horizontal sums of adjacent data elements in the low 16 bytes of the first and second source operands are packed into the low 16 bytes of the destination operand. The horizontal sums of adjacent data elements in the high 16 bytes of the first and second source operands are packed into the high 16 bytes of the destination operand. The second source operand can be a YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.
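
The per-lane behaviour of the VEX.256 form can be sketched as a scalar C model of VPHADDD. The function name vphaddd256_model and the array layout (element 0 holds bits 31:0) are assumptions for illustration. The point to note is that no sum crosses the boundary between the low and high 16 bytes.

```c
#include <stdint.h>

/* Scalar model of VPHADDD ymm1, ymm2, ymm3/m256 (illustrative only).
 * Each array holds eight 32-bit elements; element 0 = bits 31:0.
 * Lane 0 = elements 0..3 (low 16 bytes), lane 1 = elements 4..7 (high 16 bytes). */
void vphaddd256_model(const int32_t src1[8], const int32_t src2[8], int32_t dest[8])
{
    int32_t tmp[8];
    for (int lane = 0; lane < 2; lane++) {
        int base = 4 * lane;
        for (int i = 0; i < 2; i++) {
            /* Unsigned adds so that overflow wraps modulo 2^32. */
            tmp[base + i] =
                (int32_t)((uint32_t)src1[base + 2 * i] + (uint32_t)src1[base + 2 * i + 1]);
            tmp[base + 2 + i] =
                (int32_t)((uint32_t)src2[base + 2 * i] + (uint32_t)src2[base + 2 * i + 1]);
        }
    }
    for (int i = 0; i < 8; i++)
        dest[i] = tmp[i];
}
```

With src1 = {1, ..., 8} and src2 = {10, 20, ..., 80}, the model yields {3, 7, 30, 70, 11, 15, 110, 150}: each 128-bit lane holds its own src1 sums followed by its own src2 sums.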

Figure 5-3. 256-bit VPHADDD Instruction Operation (figure not reproduced; labels: SRC2, SRC1, Dest)

Operation

VPHADDW (VEX.256 encoded version)
DEST[15:0] ← SRC1[31:16] + SRC1[15:0]
DEST[31:16] ← SRC1[63:48] + SRC1[47:32]
DEST[47:32] ← SRC1[95:80] + SRC1[79:64]
DEST[63:48] ← SRC1[127:112] + SRC1[111:96]
DEST[79:64] ← SRC2[31:16] + SRC2[15:0]
DEST[95:80] ← SRC2[63:48] + SRC2[47:32]
DEST[111:96] ← SRC2[95:80] + SRC2[79:64]
DEST[127:112] ← SRC2[127:112] + SRC2[111:96]
DEST[143:128] ← SRC1[159:144] + SRC1[143:128]
DEST[159:144] ← SRC1[191:176] + SRC1[175:160]
DEST[175:160] ← SRC1[223:208] + SRC1[207:192]
DEST[191:176] ← SRC1[255:240] + SRC1[239:224]
DEST[207:192] ← SRC2[159:144] + SRC2[143:128]
DEST[223:208] ← SRC2[191:176] + SRC2[175:160]
DEST[239:224] ← SRC2[223:208] + SRC2[207:192]
DEST[255:240] ← SRC2[255:240] + SRC2[239:224]
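
The sixteen bit ranges of the 256-bit VPHADDW operation follow a regular pattern: each result word reads one pair of adjacent words from the same 128-bit lane of a single source. A minimal sketch of that index arithmetic follows; the struct phaddw_row and the function vphaddw256_row are hypothetical names introduced for the example.

```c
/* One row of the operation listing:
 * DEST[dest_hi:dest_lo] <- SRCn[a_hi:a_lo] + SRCn[b_hi:b_lo] */
typedef struct {
    int dest_hi, dest_lo;  /* destination word */
    int src;               /* 1 = SRC1, 2 = SRC2 */
    int a_hi, a_lo;        /* upper word of the source pair */
    int b_hi, b_lo;        /* lower word of the source pair */
} phaddw_row;

/* Computes the bit ranges for result word k (0..15) of 256-bit VPHADDW. */
phaddw_row vphaddw256_row(int k)
{
    int lane = k / 8;                        /* 0 = low 128 bits, 1 = high 128 bits */
    int j    = k % 8;                        /* result word index within the lane */
    int pair = 128 * lane + 32 * (j % 4);    /* low bit of the source word pair */
    phaddw_row r;
    r.dest_hi = 16 * k + 15;
    r.dest_lo = 16 * k;
    r.src     = (j < 4) ? 1 : 2;             /* words 0..3 from SRC1, 4..7 from SRC2 */
    r.a_hi    = pair + 31;
    r.a_lo    = pair + 16;
    r.b_hi    = pair + 15;
    r.b_lo    = pair;
    return r;
}
```

For k = 12 this gives DEST[207:192], SRC2[159:144] and SRC2[143:128], so the source pair stays inside the high lane.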

VPHADDD (VEX.256 encoded version)
DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[159:128] ← SRC1[191:160] + SRC1[159:128]
DEST[191:160] ← SRC1[255:224] + SRC1[223:192]
DEST[223:192] ← SRC2[191:160] + SRC2[159:128]
DEST[255:224] ← SRC2[255:224] + SRC2[223:192]

VPHADDW (VEX.128 encoded version)
DEST[15:0] ← SRC1[31:16] + SRC1[15:0]
DEST[31:16] ← SRC1[63:48] + SRC1[47:32]
DEST[47:32] ← SRC1[95:80] + SRC1[79:64]
DEST[63:48] ← SRC1[127:112] + SRC1[111:96]
DEST[79:64] ← SRC2[31:16] + SRC2[15:0]
DEST[95:80] ← SRC2[63:48] + SRC2[47:32]
DEST[111:96] ← SRC2[95:80] + SRC2[79:64]
DEST[127:112] ← SRC2[127:112] + SRC2[111:96]
DEST[VLMAX:128] ← 0

VPHADDD (VEX.128 encoded version)
DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[VLMAX:128] ← 0

PHADDW (128-bit Legacy SSE version)
DEST[15:0] ← DEST[31:16] + DEST[15:0]
DEST[31:16] ← DEST[63:48] + DEST[47:32]
DEST[47:32] ← DEST[95:80] + DEST[79:64]
DEST[63:48] ← DEST[127:112] + DEST[111:96]
DEST[79:64] ← SRC[31:16] + SRC[15:0]
DEST[95:80] ← SRC[63:48] + SRC[47:32]
DEST[111:96] ← SRC[95:80] + SRC[79:64]
DEST[127:112] ← SRC[127:112] + SRC[111:96]
DEST[VLMAX:128] (Unmodified)

PHADDD (128-bit Legacy SSE version)
DEST[31:0] ← DEST[63:32] + DEST[31:0]
DEST[63:32] ← DEST[127:96] + DEST[95:64]
DEST[95:64] ← SRC[63:32] + SRC[31:0]
DEST[127:96] ← SRC[127:96] + SRC[95:64]
DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PHADDW: __m128i _mm_hadd_epi16 (__m128i a, __m128i b)
(V)PHADDD: __m128i _mm_hadd_epi32 (__m128i a, __m128i b)
VPHADDW: __m256i _mm256_hadd_epi16 (__m256i a, __m256i b)
VPHADDD: __m256i _mm256_hadd_epi32 (__m256i a, __m256i b)
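
As a usage illustration, two applications of _mm_hadd_epi32 reduce four 32-bit integers to their sum. This is a minimal sketch: the helper name sum4_epi32 is hypothetical, and it assumes a GCC- or Clang-style compiler that defines __SSSE3__ when SSSE3 code generation is enabled (for example with -mssse3). When that macro is absent, the sketch falls back to a scalar loop with the same result.

```c
#include <stdint.h>

#if defined(__SSSE3__)
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_hadd_epi32 */
#endif

/* Hypothetical helper: returns v[0] + v[1] + v[2] + v[3], wrapping modulo 2^32. */
int32_t sum4_epi32(const int32_t v[4])
{
#if defined(__SSSE3__)
    __m128i x = _mm_loadu_si128((const __m128i *)v);  /* unaligned 128-bit load */
    __m128i s = _mm_hadd_epi32(x, x);  /* {v0+v1, v2+v3, v0+v1, v2+v3} */
    s = _mm_hadd_epi32(s, s);          /* every element now holds the full sum */
    return _mm_cvtsi128_si32(s);       /* extract element 0 */
#else
    /* Portable fallback used when the compiler is not targeting SSSE3. */
    uint32_t s = 0;
    for (int i = 0; i < 4; i++)
        s += (uint32_t)v[i];
    return (int32_t)s;
#endif
}
```

Passing the same register as both operands is a common way to fold a single vector; whether it is faster than a shuffle-and-add sequence depends on the processor and is not claimed here.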

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4
