Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Лаб2012 / 319433-011.pdf
Скачиваний:
27
Добавлен:
02.02.2015
Размер:
2.31 Mб
Скачать

INSTRUCTION SET REFERENCE

PXOR - Exclusive Or

Opcode/

Op/

64/3

CPUID

Description

Instruction

En

2-bit

Feature

 

 

 

Mode

Flag

 

66 0F EF /r

A

V/V

SSE2

Bitwise XOR of xmm2/m128

PXOR xmm1, xmm2/m128

 

 

 

and xmm1.

VEX.NDS.128.66.0F.WIG EF /r

B

V/V

AVX

Bitwise XOR of xmm3/m128

VPXOR xmm1, xmm2, xmm3/m128

 

 

 

and xmm2.

VEX.NDS.256.66.0F.WIG EF /r

B

V/V

AVX2

Bitwise XOR of ymm3/m256

VPXOR ymm1, ymm2, ymm3/m256

 

 

 

and ymm2.

 

 

 

 

 

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (r, w)

ModRM:r/m (r)

NA

NA

B

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

NA

 

 

 

 

 

Description

Performs a bitwise logical XOR operation on the second source operand and the first source operand and stores the result in the destination operand. Each bit of the result is set to 1 if the corresponding bits of the first and second operands differ, otherwise it is set to 0.

Legacy SSE instructions: In 64-bit mode using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source operand is an XMM register or a 128bit memory location. The first source operand and destination operands are XMM registers. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The second source operand is an XMM register or a 128bit memory location. The first source operand and destination operands are XMM registers. Bits (127:128) of the corresponding YMM register are zeroed.

VEX.256 encoded version: The second source operand is an YMM register or a 256bit memory location. The first source operand and destination operands are YMM registers.

Operation

VPXOR (VEX.256 encoded version)

DEST SRC1 XOR SRC2

5-218

Ref. # 319433-011

INSTRUCTION SET REFERENCE

VPXOR (VEX.128 encoded version)

DEST SRC1 XOR SRC2

DEST[VLMAX:128] 0

PXOR (128-bit Legacy SSE version)

DEST DEST XOR SRC

DEST[VLMAX:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

(V)PXOR __m128i _mm_xor_si128 ( __m128i a, __m128i b) VPXOR __m256i _mm256_xor_si256 ( __m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4

Ref. # 319433-011

5-219

INSTRUCTION SET REFERENCE

MOVNTDQALoad Double Quadword Non-Temporal Aligned Hint

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

66 0F 38 2A /r

A

V/V

SSE4_1

Move double quadword from m128

MOVNTDQA xmm1, m128

 

 

 

to xmm1 using non-temporal hint

 

 

 

 

if WC memory type.

VEX.128.66.0F38.WIG 2A /r

A

V/V

AVX

Move double quadword from m128

VMOVNTDQA xmm1, m128

 

 

 

to xmm using non-temporal hint if

 

 

 

 

WC memory type.

VEX.256.66.0F38.WIG 2A /r

A

V/V

AVX2

Move 256-bit data from m256 to

VMOVNTDQA ymm1, m256

 

 

 

ymm using non-temporal hint if WC

 

 

 

 

memory type.

Instruction Operand Encoding1

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

 

 

 

 

 

Description

MOVNTDQA loads a double quadword from the source operand (second operand) to the destination operand (first operand) using a non-temporal hint if the memory source is WC (write combining) memory type. For WC memory type, the nontemporal hint may be implemented by loading a temporary internal buffer with the equivalent of an aligned cache line without filling this data to the cache. Any memory-type aliased lines in the cache will be snooped and flushed. Subsequent MOVNTDQA reads to unread portions of the WC cache line will receive data from the temporary internal buffer if data is available. The temporary internal buffer may be flushed by the processor at any time for any reason, for example:

A load operation other than a MOVNTDQA which references memory already resident in a temporary internal buffer.

A non-WC reference to memory already resident in a temporary internal buffer.

Interleaving of reads and writes to a single temporary internal buffer.

Repeated (V)MOVNTDQA loads of a particular 16-byte item in a streaming line.

Certain micro-architectural conditions including resource shortages, detection of a mis-speculation condition, and various fault conditions.

The non-temporal hint is implemented by using a write combining (WC) memory type protocol when reading the data from memory. Using this protocol, the processor

1. ModRM.MOD = 011B required

5-220

Ref. # 319433-011

INSTRUCTION SET REFERENCE

does not read the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being read can override the non-temporal hint, if the memory address specified for the non-temporal read is not a WC memory region. Information on non-temporal reads and writes can be found in “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 3A.

Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with a MFENCE instruction should be used in conjunction with MOVNTDQA instructions if multiple processors might use different memory types for the referenced memory locations or to synchronize reads of a processor with writes by other agents in the system. A processor’s implementation of the streaming load hint does not override the effective memory type, but the implementation of the hint is processor dependent. For example, a processor implementation may choose to ignore the hint and process the instruction as a normal MOVDQA for any memory type. Alternatively, another implementation may optimize cache reads generated by MOVNTDQA on WB memory type to reduce cache evictions.

The 128-bit (V)MOVNTDQA addresses must be 16-byte aligned or the instruction will cause a #GP.

The 256-bit VMOVNTDQA addresses must be 32-byte aligned or the instruction will cause a #GP.

Operation

MOVNTDQA (128bitLegacy SSE form)

DEST SRC

DEST[VLMAX:128] (Unmodified)

VMOVNTDQA (VEX.128 encoded form)

DEST SRC

DEST[VLMAX:128] 0

VMOVNTDQA (VEX.256 encoded form)

DEST[255:0] SRC[255:0]

Intel C/C++ Compiler Intrinsic Equivalent

(V)MOVNTDQA __m128i _mm_stream_load_si128 (__m128i *p);

VMOVNTDQA __m256i _mm256_stream_load_si256 (const __m256i *p);

SIMD Floating-Point Exceptions

None

Ref. # 319433-011

5-221

INSTRUCTION SET REFERENCE

Other Exceptions

See Exceptions Type1; additionally

#UD

If VEX.vvvv != 1111B.

5-222

Ref. # 319433-011

 

 

 

 

INSTRUCTION SET REFERENCE

VBROADCASTBroadcast Floating-Point Data

 

 

 

 

 

 

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

VEX.128.66.0F38.W0 18 /r

A

V/V

AVX

Broadcast single-precision floating-

VBROADCASTSS xmm1, m32

 

 

 

point element in mem to four loca-

 

 

 

 

tions in xmm1.

VEX.256.66.0F38.W0 18 /r

A

V/V

AVX

Broadcast single-precision floating-

VBROADCASTSS ymm1, m32

 

 

 

point element in mem to eight loca-

 

 

 

 

tions in ymm1.

VEX.256.66.0F38.W0 19 /r

A

V/V

AVX

Broadcast double-precision float-

VBROADCASTSD ymm1, m64

 

 

 

ing-point element in mem to four

 

 

 

 

locations in ymm1.

VEX.128.66.0F38.W0 18/r

A

V/V

AVX2

Broadcast the low single-precision

VBROADCASTSS xmm1,

 

 

 

floating-point element in the

xmm2

 

 

 

source operand to four locations in

 

 

 

 

xmm1.

VEX.256.66.0F38.W0 18 /r

A

V/V

AVX2

Broadcast low single-precision

VBROADCASTSS ymm1,

 

 

 

floating-point element in the

xmm2

 

 

 

source operand to eight locations in

 

 

 

 

ymm1.

VEX.256.66.0F38.W0 19 /r

A

V/V

AVX2

Broadcast low double-precision

VBROADCASTSD ymm1,

 

 

 

floating-point element in the

xmm2

 

 

 

source operand to four locations in

 

 

 

 

ymm1.

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

 

 

 

 

 

Description

Take the low floating-point data element from the source operand (second operand) and broadcast to all elements of the destination operand (first operand).

The destination operand is a YMM register. The source operand is an XMM register, only the low 32-bit or 64-bit data element is used.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.

Ref. # 319433-011

5-223

INSTRUCTION SET REFERENCE

An attempt to execute VBROADCASTSD encoded with VEX.L= 0 will cause an #UD exception.

Operation

VBROADCASTSS (128 bit version) temp SRC[31:0]

FOR j 0 TO 3 DEST[31+j*32: j*32] temp ENDFOR

DEST[VLMAX:128] 0

VBROADCASTSS (VEX.256 encoded version) temp SRC[31:0]

FOR j 0 TO 7 DEST[31+j*32: j*32] temp ENDFOR

VBROADCASTSD (VEX.256 encoded version) temp SRC[63:0]

DEST[63:0] temp DEST[127:64] temp DEST[191:128] temp DEST[255:192] temp

Intel C/C++ Compiler Intrinsic Equivalent

VBROADCASTSS __m128 _mm_broadcastss_ps(__m128 );

VBROADCASTSS __m256 _mm256_broadcastss_ps(__m128);

VBROADCASTSD __m256d _mm256_broadcastsd_pd(__m128d);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 6; additionally

#UD

If VEX.L = 0 for VBROADCASTSD,

 

If VEX.W = 1.

5-224

Ref. # 319433-011

 

 

 

 

INSTRUCTION SET REFERENCE

VBROADCASTF128/I128Broadcast 128-Bit Data

 

 

 

 

 

Opcode/

Op/

64/32

CPUID

Description

Instruction

En

-bit

Feature

 

 

 

Mode

Flag

 

VEX.256.66.0F38.W0 1A /r

A

V/V

AVX

Broadcast 128 bits of floating-point

VBROADCASTF128 ymm1,

 

 

 

data in mem to low and high 128-

m128

 

 

 

bits in ymm1.

VEX.256.66.0F38.W0 5A /r

A

V/V

AVX2

Broadcast 128 bits of integer data

VBROADCASTI128 ymm1,

 

 

 

in mem to low and high 128-bits in

m128

 

 

 

ymm1.

 

 

 

 

 

Instruction Operand Encoding

Op/En

Operand 1

Operand 2

Operand 3

Operand 4

A

ModRM:reg (w)

ModRM:r/m (r)

NA

NA

 

 

 

 

 

Description

VBROADCASTF128 and VBROADCASTI128 load 128-bit data from the source operand (second operand) and broadcast to the destination operand (first operand).

The destination operand is a YMM register. The source operand is 128-bit memory location. Register source encodings for VBROADCASTF128 and VBROADCASTI128 are reserved and will #UD.

VBROADCASTF128 and VBROADCASTI128 are only supported as 256-bit wide versions.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD. Attempts to execute any VPBROADCAST* instruction with VEX.W = 1 will cause #UD.

An attempt to execute VBROADCASTF128 or VBROADCASTI128 encoded with VEX.L= 0 will cause an #UD exception.

Ref. # 319433-011

5-225

INSTRUCTION SET REFERENCE

m128i

X0

 

 

 

 

DEST

X0

X0

Figure 5-6. VBROADCASTI128 Operation

Operation

VBROADCASTF128/VBROADCASTI128 temp SRC[127:0]

DEST[127:0] temp DEST[255:128] temp

Intel C/C++ Compiler Intrinsic Equivalent

VBROADCASTF128 __m256 _mm_broadcast_ps(__m128 ); VBROADCASTF128 __m256d _mm_broadcast_pd(__m128d ); VBROADCASTI128 __m256i _mm_broadcastsi128_si256(__m128i );

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 6; additionally

#UD

If VEX.L = 0 ,

 

If VEX.W = 1.

5-226

Ref. # 319433-011

Соседние файлы в папке Лаб2012