Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Лаб2012 / 319433-011.pdf
Скачиваний:
27
Добавлен:
02.02.2015
Размер:
2.31 Mб
Скачать

INSTRUCTION SET REFERENCE

PSHUFD - Shuffle Packed Doublewords

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

66 0F 70 /r ib

A

V/V

SSE2

Shuffle the doublewords in

PSHUFD xmm1, xmm2/m128, imm8

 

 

 

xmm2/m128 based on the

 

 

 

 

encoding in imm8 and store

 

 

 

 

the result in xmm1.

VEX.128.66.0F.WIG 70 /r ib

A

V/V

AVX

Shuffle the doublewords in

VPSHUFD xmm1, xmm2/m128,

 

 

 

xmm2/m128 based on the

imm8

 

 

 

encoding in imm8 and store

 

 

 

 

the result in xmm1.

VEX.256.66.0F.WIG 70 /r ib

A

V/V

AVX2

Shuffle the doublewords in

VPSHUFD ymm1, ymm2/m256,

 

 

 

ymm2/m256 based on the

imm8

 

 

 

encoding in imm8 and store

 

 

 

 

the result in ymm1.

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

 

 

 

 

 

Description

Copies doublewords from the source operand and inserts them in the destination operand at the locations selected with the immediate control operand. Figure 5-4 shows the operation of the 256-bit VPSHUFD instruction and the encoding of the order operand. Each 2-bit field in the order operand selects the contents of one doubleword location within a 128-bit lane and copy to the target element in the destination operand. For example, bits 0 and 1 of the order operand targets the first doubleword element in the low and high 128-bit lane of the destination operand for 256-bit VPSHUFD. The encoded value of bits 1:0 of the order operand (see the field encoding in Figure 5-4) determines which doubleword element (from the respective 128-bit lane) of the source operand will be copied to doubleword 0 of the destination operand.

For 128-bit operation, only the low 128-bit lane are operative. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The order operand is an 8-bit immediate. Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.

5-154

Ref. # 319433-011

INSTRUCTION SET REFERENCE

SRC

X7

X6

X5

X4

X3

X2

X1

X0

 

 

 

 

 

 

 

 

 

DEST

 

Y7

 

Y6

Y5

 

Y4

 

 

Y3

 

Y2

 

Y1

 

Y0

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Encoding

01B - X5

 

 

 

 

 

 

 

 

 

 

 

 

 

 

of Fields in

ORDER

 

 

 

 

 

 

 

 

of Fields in

01B - X1

 

 

ORDER

10B - X6

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

ORDER

10B - X2

 

Operand

11B - X7

 

7 6 5 4 3

2 1

0

 

 

Operand

11B - X3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 5-4. 256-bit VPSHUFD Instruction Operation

Legacy SSE instructions: In 64-bit mode using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: Bits (255:128) of the destination stores the shuffled results of the upper 16 bytes of the source operand using the immediate byte as the order operand.

Note: In VEX encoded versions VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.

Operation

VPSHUFD (VEX.256 encoded version)

DEST[31:0] (SRC[127:0] >> (ORDER[1:0] * 32))[31:0];

DEST[63:32] (SRC[127:0] >> (ORDER[3:2] * 32))[31:0];

DEST[95:64] (SRC[127:0] >> (ORDER[5:4] * 32))[31:0];

DEST[127:96] (SRC[127:0] >> (ORDER[7:6] * 32))[31:0];

DEST[159:128] (SRC[255:128] >> (ORDER[1:0] * 32))[31:0];

DEST[191:160] (SRC[255:128] >> (ORDER[3:2] * 32))[31:0];

DEST[223:192] (SRC[255:128] >> (ORDER[5:4] * 32))[31:0];

DEST[255:224] (SRC[255:128] >> (ORDER[7:6] * 32))[31:0];

VPSHUFD (VEX.128 encoded version)

Ref. # 319433-011

5-155

INSTRUCTION SET REFERENCE

DEST[31:0] (SRC[127:0] >> (ORDER[1:0] * 32))[31:0]; DEST[63:32] (SRC[127:0] >> (ORDER[3:2] * 32))[31:0]; DEST[95:64] (SRC[127:0] >> (ORDER[5:4] * 32))[31:0]; DEST[127:96] (SRC[127:0] >> (ORDER[7:6] * 32))[31:0]; DEST[VLMAX:128] 0

PSHUFD (128-bit Legacy SSE version)

DEST[31:0] (SRC[255:128] >> (ORDER[1:0] * 32))[31:0]; DEST[63:32] (SRC[255:128] >> (ORDER[3:2] * 32))[31:0]; DEST[95:64] (SRC[255:128] >> (ORDER[5:4] * 32))[31:0]; DEST[127:96] (SRC[255:128] >> (ORDER[7:6] * 32))[31:0]; DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PSHUFD __m128i _mm_shuffle_epi32(__m128i a, const int n)

VPSHUFD __m256i _mm256_shuffle_epi32(__m256i a, const int n)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4

5-156

Ref. # 319433-011

 

 

 

 

INSTRUCTION SET REFERENCE

PSHUFHW — Shuffle Packed High Words

 

 

 

 

 

 

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

F3 0F 70 /r ib

A

V/V

SSE2

Shuffle the high words in

PSHUFHW xmm1, xmm2/m128,

 

 

 

xmm2/m128 based on the

imm8

 

 

 

encoding in imm8 and store

 

 

 

 

the result in xmm1.

VEX.128.F3.0F.WIG 70 /r ib

A

V/V

AVX

Shuffle the high words in

VPSHUFHW xmm1, xmm2/m128,

 

 

 

xmm2/m128 based on the

imm8

 

 

 

encoding in imm8 and store

 

 

 

 

the result in xmm1.

VEX.256.F3.0F.WIG 70 /r ib

A

V/V

AVX2

Shuffle the high words in

VPSHUFHW ymm1, ymm2/m256,

 

 

 

ymm2/m256 based on the

imm8

 

 

 

encoding in imm8 and store

 

 

 

 

the result in ymm1.

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

 

 

 

 

 

Description

Copies words from the high quadword of a 128-bit lane of the source operand and inserts them in the high quadword of the destination operand at word locations (of the respective lane) selected with the immediate operand . This 256-bit operation is similar to the in-lane operation used by the 256-bit VPSHUFD instruction, which is illustrated in Figure 5-4. For 128-bit operation, only the low 128-bit lane is operative. Each 2-bit field in the immediate operand selects the contents of one word location in the high quadword of the destination operand. The binary encodings of the immediate operand fields select words (0, 1, 2 or 3, 4) from the high quadword of the source operand to be copied to the destination operand. The low quadword of the source operand is copied to the low quadword of the destination operand, for each 128-bit lane.

Note that this instruction permits a word in the high quadword of the source operand to be copied to more than one word location in the high quadword of the destination operand.

Legacy SSE instructions: In 64-bit mode using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

Ref. # 319433-011

5-157

INSTRUCTION SET REFERENCE

128-bit Legacy SSE version: The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location. Bits (255:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The destination operand is an YMM register. The source operand can be an YMM register or a 256-bit memory location.

Note: In VEX encoded versions VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.

Operation

VPSHUFHW (VEX.256 encoded version)

DEST[63:0] SRC1[63:0]

DEST[79:64] (SRC1 >> (imm[1:0] *16))[79:64]

DEST[95:80] (SRC1 >> (imm[3:2] * 16))[79:64]

DEST[111:96] (SRC1 >> (imm[5:4] * 16))[79:64]

DEST[127:112] (SRC1 >> (imm[7:6] * 16))[79:64]

DEST[191:128] SRC1[191:128]

DEST[207192] (SRC1 >> (imm[1:0] *16))[207:192]

DEST[223:208] (SRC1 >> (imm[3:2] * 16))[207:192]

DEST[239:224] (SRC1 >> (imm[5:4] * 16))[207:192]

DEST[255:240] (SRC1 >> (imm[7:6] * 16))[207:192]

VPSHUFHW (VEX.128 encoded version)

DEST[63:0] SRC1[63:0]

DEST[79:64] (SRC1 >> (imm[1:0] *16))[79:64]

DEST[95:80] (SRC1 >> (imm[3:2] * 16))[79:64]

DEST[111:96] (SRC1 >> (imm[5:4] * 16))[79:64]

DEST[127:112] (SRC1 >> (imm[7:6] * 16))[79:64]

DEST[VLMAX:128] 0

PSHUFHW (128-bit Legacy SSE version)

DEST[63:0] SRC[63:0]

DEST[79:64] (SRC >> (imm[1:0] *16))[79:64]

DEST[95:80] (SRC >> (imm[3:2] * 16))[79:64]

DEST[111:96] (SRC >> (imm[5:4] * 16))[79:64]

DEST[127:112] (SRC >> (imm[7:6] * 16))[79:64]

DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PSHUFHW __m128i _mm_shufflehi_epi16(__m128i a, const int n)

5-158

Ref. # 319433-011

INSTRUCTION SET REFERENCE

VPSHUFHW __m256i _mm256_shufflehi_epi16(__m256i a, const int n)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-159

Соседние файлы в папке Лаб2012