Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный Технический Университет Харьковский Политехнический Институт

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

Лаб2012 / 319433-011.pdf

Скачиваний:

Добавлен:

02.02.2015

Размер:

2.31 Mб

Скачать

☆

<<< < Предыдущая 23 24 25 26 27 28 29 30 31 32 33 3435 / 7335 36 37 38 39 40 41 42 43 44 45 46 47 > Следующая >>>

INSTRUCTION SET REFERENCE

PMULDQ - Multiply Packed Doubleword Integers

Opcode/	Op/	64/32	CPUID	Description
Instruction	En	-bit	Feature
		Mode	Flag
66 0F 38 28 /r	A	V/V	SSE4_1	Multiply packed signed double-
PMULDQ xmm1, xmm2/m128				word integers in xmm1 by
				packed signed doubleword inte-
				gers in xmm2/m128, and store
				the quadword results in xmm1.
VEX.NDS.128.66.0F38.WIG 28 /r	B	V/V	AVX	Multiply packed signed double-
VPMULDQ xmm1, xmm2,				word integers in xmm2 by
xmm3/m128				packed signed doubleword inte-
				gers in xmm3/m128, and store
				the quadword results in xmm1.
VEX.NDS.256.66.0F38.WIG 28 /r	B	V/V	AVX2	Multiply packed signed double-
VPMULDQ ymm1, ymm2,				word integers in ymm2 by
ymm3/m256				packed signed doubleword inte-
				gers in ymm3/m256, and store
				the quadword results in ymm1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (r, w)	ModRM:r/m (r)	NA	NA
B	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	NA

Description

Multiplies the first source operand by the second source operand and stores the result in the destination operand.

For PMULDQ and VPMULDQ (VEX.128 encoded version), the second source operand is two packed signed doubleword integers stored in the first (low) and third doublewords of an XMM register or a 128-bit memory location. The first source operand is two packed signed doubleword integers stored in the first and third doublewords of an XMM register. The destination contains two packed signed quadword integers stored in an XMM register. For 128-bit memory operands, 128 bits are fetched from memory, but only the first and third doublewords are used in the computation.

For VPMULDQ (VEX.256 encoded version), the second source operand is four packed signed doubleword integers stored in the first (low), third, fifth and seventh doublewords of an YMM register or a 256-bit memory location. The first source operand is four packed signed doubleword integers stored in the first, third, fifth and seventh doublewords of an XMM register. The destination contains four packed signed quad-

5-126

Ref. # 319433-011

INSTRUCTION SET REFERENCE

word integers stored in an YMM register. For 256-bit memory operands, 256 bits are fetched from memory, but only the first, third, fifth and seventh doublewords are used in the computation.

When a quadword result is too large to be represented in 64 bits (overflow), the result is wrapped around and the low 64 bits are written to the destination element (that is, the carry is ignored).

128-bit Legacy SSE version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source and destination operands are XMM registers. The second source operand is an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.

Operation

VPMULDQ (VEX.256 encoded version)

DEST[63:0]  SRC1[31:0] * SRC2[31:0]

DEST[127:64]  SRC1[95:64] * SRC2[95:64]

DEST[191:128]  SRC1[159:128] * SRC2[159:128]

DEST[255:192]  SRC1[223:192] * SRC2[223:192]

VPMULDQ (VEX.128 encoded version)

DEST[63:0]  SRC1[31:0] * SRC2[31:0]

DEST[127:64]  SRC1[95:64] * SRC2[95:64]

DEST[VLMAX:128]  0

PMULDQ (128-bit Legacy SSE version)

DEST[63:0]  DEST[31:0] * SRC[31:0]

DEST[127:64]  DEST[95:64] * SRC[95:64]

DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PMULDQ __m128i _mm_mul_epi32( __m128i a, __m128i b);

VPMULDQ __m256i _mm256_mul_epi32( __m256i a, __m256i b);

SIMD Floating-Point Exceptions

None

Ref. # 319433-011

5-127

INSTRUCTION SET REFERENCE

Other Exceptions

See Exceptions Type 4

5-128

Ref. # 319433-011

INSTRUCTION SET REFERENCE

PMULHRSW - Multiply Packed Unsigned Integers with Round and Scale

Opcode/	Op/	64/32	CPUID	Description
Instruction	En	-bit	Feature
		Mode	Flag
66 0F 38 0B /r	A	V/V	SSSE3	Multiply 16-bit signed words,
PMULHRSW xmm1, xmm2/m128				scale and round signed double-
				words, pack high 16 bits to
				xmm1.
VEX.NDS.128.66.0F38.WIG 0B /r	B	V/V	AVX	Multiply 16-bit signed words,
VPMULHRSW xmm1, xmm2,				scale and round signed double-
xmm3/m128				words, pack high 16 bits to
				xmm1.
VEX.NDS.256.66.0F38.WIG 0B /r	B	V/V	AVX2	Multiply 16-bit signed words,
VPMULHRSW ymm1, ymm2,				scale and round signed double-
ymm3/m256				words, pack high 16 bits to
				ymm1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (r, w)	ModRM:r/m (r)	NA	NA
B	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	NA

Description

PMULHRSW multiplies vertically each signed 16-bit integer from the first source operand with the corresponding signed 16-bit integer of the second source operand, producing intermediate, signed 32-bit integers. Each intermediate 32-bit integer is truncated to the 18 most significant bits. Rounding is always performed by adding 1 to the least significant bit of the 18-bit intermediate result. The final result is obtained by selecting the 16 bits immediately to the right of the most significant bit of each 18-bit intermediate result and packed to the destination operand.

VEX.256 encoded version: The second source operand can be an YMM register or a 256-bit memory location. The first source and destination operands are YMM registers.

Ref. # 319433-011

5-129

INSTRUCTION SET REFERENCE

Operation

VPMULHRSW (VEX.256 encoded version)

temp0[31:0]  INT32 ((SRC1[15:0] * SRC2[15:0]) >>14) + 1 temp1[31:0]  INT32 ((SRC1[31:16] * SRC2[31:16]) >>14) + 1 temp2[31:0]  INT32 ((SRC1[47:32] * SRC2[47:32]) >>14) + 1 temp3[31:0]  INT32 ((SRC1[63:48] * SRC2[63:48]) >>14) + 1 temp4[31:0]  INT32 ((SRC1[79:64] * SRC2[79:64]) >>14) + 1 temp5[31:0]  INT32 ((SRC1[95:80] * SRC2[95:80]) >>14) + 1 temp6[31:0]  INT32 ((SRC1[111:96] * SRC2[111:96]) >>14) + 1 temp7[31:0]  INT32 ((SRC1[127:112] * SRC2[127:112) >>14) + 1 temp8[31:0]  INT32 ((SRC1[143:128] * SRC2[143:128]) >>14) + 1 temp9[31:0]  INT32 ((SRC1[159:144] * SRC2[159:144]) >>14) + 1 temp10[31:0]  INT32 ((SRC1[75:160] * SRC2[175:160]) >>14) + 1 temp11[31:0]  INT32 ((SRC1[191:176] * SRC2[191:176]) >>14) + 1 temp12[31:0]  INT32 ((SRC1[207:192] * SRC2[207:192]) >>14) + 1 temp13[31:0]  INT32 ((SRC1[223:208] * SRC2[223:208]) >>14) + 1 temp14[31:0]  INT32 ((SRC1[239:224] * SRC2[239:224]) >>14) + 1 temp15[31:0]  INT32 ((SRC1[255:240] * SRC2[255:240) >>14) + 1

DEST[15:0]  temp0[16:1]

DEST[31:16]  temp1[16:1]

DEST[47:32]  temp2[16:1]

DEST[63:48]  temp3[16:1]

DEST[79:64]  temp4[16:1]

DEST[95:80]  temp5[16:1]

DEST[111:96]  temp6[16:1]

DEST[127:112]  temp7[16:1]

DEST[143:128]  temp8[16:1]

DEST[159:144]  temp9[16:1]

DEST[175:160]  temp10[16:1]

DEST[191:176]  temp11[16:1]

DEST[207:192]  temp12[16:1]

DEST[223:208]  temp13[16:1]

DEST[239:224]  temp14[16:1]

DEST[255:240]  temp15[16:1]

VPMULHRSW (VEX.128 encoded version)

5-130

Ref. # 319433-011

INSTRUCTION SET REFERENCE

temp6[31:0]  INT32 ((SRC1[111:96] * SRC2[111:96]) >>14) + 1 temp7[31:0]  INT32 ((SRC1[127:112] * SRC2[127:112) >>14) + 1 DEST[15:0]  temp0[16:1]

DEST[31:16]  temp1[16:1] DEST[47:32]  temp2[16:1] DEST[63:48]  temp3[16:1] DEST[79:64]  temp4[16:1] DEST[95:80]  temp5[16:1] DEST[111:96]  temp6[16:1] DEST[127:112]  temp7[16:1] DEST[VLMAX:128]  0

PMULHRSW (128-bit Legacy SSE version)

temp0[31:0]  INT32 ((DEST[15:0] * SRC[15:0]) >>14) + 1 temp1[31:0]  INT32 ((DEST[31:16] * SRC[31:16]) >>14) + 1 temp2[31:0]  INT32 ((DEST[47:32] * SRC[47:32]) >>14) + 1 temp3[31:0]  INT32 ((DEST[63:48] * SRC[63:48]) >>14) + 1 temp4[31:0]  INT32 ((DEST[79:64] * SRC[79:64]) >>14) + 1 temp5[31:0]  INT32 ((DEST[95:80] * SRC[95:80]) >>14) + 1 temp6[31:0]  INT32 ((DEST[111:96] * SRC[111:96]) >>14) + 1 temp7[31:0]  INT32 ((DEST[127:112] * SRC[127:112) >>14) + 1 DEST[15:0]  temp0[16:1]

Intel C/C++ Compiler Intrinsic Equivalent

(V)PMULHRSW __m128i _mm_mulhrs_epi16 (__m128i a, __m128i b)

VPMULHRSW __m256i _mm256_mulhrs_epi16 (__m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-131

<<< < Предыдущая 23 24 25 26 27 28 29 30 31 32 33 3435 / 7335 36 37 38 39 40 41 42 43 44 45 46 47 > Следующая >>>

Соседние файлы в папке Лаб2012

#
02.02.20153.33 Mб6525366517.pdf
#
02.02.20152.52 Mб25253666.pdf
#
02.02.20152.7 Mб2725366617.pdf
#
02.02.20152.09 Mб31253667.pdf
#
02.02.20152.19 Mб2625366717.pdf
#
02.02.20152.31 Mб27319433-011.pdf
#
02.02.2015162.82 Кб20ЛР1 Виконання арифм операц.doc
#
02.02.201564 Кб21ЛР10 ММХ-розширення.doc
#
02.02.201567.07 Кб22ЛР11 SSE-розширення.doc
#
02.02.201590.62 Кб18ЛР12 Windows-застосування.doc
#
02.02.2015181.25 Кб19ЛР13 Досл_дження коду програм.doc