
INSTRUCTION SET REFERENCE

PXOR - Exclusive Or

Opcode	Instruction	Op/En	64/32-bit Mode	CPUID Feature Flag	Description
66 0F EF /r	PXOR xmm1, xmm2/m128	A	V/V	SSE2	Bitwise XOR of xmm2/m128 and xmm1.
VEX.NDS.128.66.0F.WIG EF /r	VPXOR xmm1, xmm2, xmm3/m128	B	V/V	AVX	Bitwise XOR of xmm3/m128 and xmm2.
VEX.NDS.256.66.0F.WIG EF /r	VPXOR ymm1, ymm2, ymm3/m256	B	V/V	AVX2	Bitwise XOR of ymm3/m256 and ymm2.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (r, w)	ModRM:r/m (r)	NA	NA
B	ModRM:reg (w)	VEX.vvvv	ModRM:r/m (r)	NA

Description

Performs a bitwise logical XOR operation on the second source operand and the first source operand and stores the result in the destination operand. Each bit of the result is set to 1 if the corresponding bits of the first and second operands differ, otherwise it is set to 0.

Legacy SSE instructions: In 64-bit mode using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).

128-bit Legacy SSE version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and the destination operand are the same XMM register. Bits (255:128) of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The second source operand is an XMM register or a 128-bit memory location. The first source operand and the destination operand are XMM registers. Bits (255:128) of the corresponding YMM destination register are zeroed.

VEX.256 encoded version: The second source operand is a YMM register or a 256-bit memory location. The first source operand and the destination operand are YMM registers.

Operation

VPXOR (VEX.256 encoded version)

DEST ← SRC1 XOR SRC2

Ref. # 319433-011

VPXOR (VEX.128 encoded version)

DEST ← SRC1 XOR SRC2

DEST[VLMAX:128] ← 0

PXOR (128-bit Legacy SSE version)

DEST ← DEST XOR SRC

DEST[VLMAX:128] (Unmodified)
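
The three Operation listings differ only in how they treat bits 255:128 of the destination. The following is a minimal scalar sketch of that difference, modeling each register as a Python integer. The function names `pxor_legacy`, `vpxor128` and `vpxor256` are hypothetical and chosen for illustration; this is not how the intrinsics are implemented, and VLMAX is assumed to be 256.

```python
MASK128 = (1 << 128) - 1
MASK256 = (1 << 256) - 1


def pxor_legacy(dest, src):
    """Legacy SSE PXOR: DEST <- DEST XOR SRC on bits 127:0.

    Bits 255:128 of the destination register are left unmodified.
    """
    low = (dest ^ src) & MASK128
    upper = dest & MASK256 & ~MASK128
    return upper | low


def vpxor128(src1, src2):
    """VEX.128 VPXOR: XOR bits 127:0, then zero bits 255:128."""
    return (src1 ^ src2) & MASK128


def vpxor256(src1, src2):
    """VEX.256 VPXOR: XOR all 256 bits."""
    return (src1 ^ src2) & MASK256


if __name__ == "__main__":
    ymm = (0xAAAA << 128) | 0x0F0F   # upper half holds 0xAAAA
    xmm = 0x00FF
    print(hex(pxor_legacy(ymm, xmm)))  # upper half preserved
    print(hex(vpxor128(ymm, xmm)))     # upper half zeroed
```

Under this model, XORing a register with itself yields zero in every variant, because each result bit is 1 only where the corresponding source bits differ.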

Intel C/C++ Compiler Intrinsic Equivalent

(V)PXOR __m128i _mm_xor_si128 ( __m128i a, __m128i b)

VPXOR __m256i _mm256_xor_si256 ( __m256i a, __m256i b)

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 4

MOVNTDQA - Load Double Quadword Non-Temporal Aligned Hint

Opcode	Instruction	Op/En	64/32-bit Mode	CPUID Feature Flag	Description
66 0F 38 2A /r	MOVNTDQA xmm1, m128	A	V/V	SSE4_1	Move double quadword from m128 to xmm1 using non-temporal hint if WC memory type.
VEX.128.66.0F38.WIG 2A /r	VMOVNTDQA xmm1, m128	A	V/V	AVX	Move double quadword from m128 to xmm using non-temporal hint if WC memory type.
VEX.256.66.0F38.WIG 2A /r	VMOVNTDQA ymm1, m256	A	V/V	AVX2	Move 256-bit data from m256 to ymm using non-temporal hint if WC memory type.

Instruction Operand Encoding (see note 1)

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (w)	ModRM:r/m (r)	NA	NA

Description

MOVNTDQA loads a double quadword from the source operand (second operand) to the destination operand (first operand) using a non-temporal hint if the memory source is WC (write combining) memory type. For WC memory type, the non-temporal hint may be implemented by loading a temporary internal buffer with the equivalent of an aligned cache line without filling this data to the cache. Any memory-type aliased lines in the cache will be snooped and flushed. Subsequent MOVNTDQA reads to unread portions of the WC cache line will receive data from the temporary internal buffer if data is available. The temporary internal buffer may be flushed by the processor at any time for any reason, for example:

• A load operation other than a MOVNTDQA which references memory already resident in a temporary internal buffer.

• A non-WC reference to memory already resident in a temporary internal buffer.

• Interleaving of reads and writes to a single temporary internal buffer.

• Repeated (V)MOVNTDQA loads of a particular 16-byte item in a streaming line.

• Certain micro-architectural conditions including resource shortages, detection of a mis-speculation condition, and various fault conditions.

The non-temporal hint is implemented by using a write combining (WC) memory type protocol when reading the data from memory. Using this protocol, the processor does not read the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being read can override the non-temporal hint, if the memory address specified for the non-temporal read is not a WC memory region. Information on non-temporal reads and writes can be found in “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architecture Software Developer’s Manual, Volume 3A.

Note 1 (Instruction Operand Encoding): ModRM.MOD = 011B required

Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with an MFENCE instruction should be used in conjunction with MOVNTDQA instructions in two cases: if multiple processors might use different memory types for the referenced memory locations, or to synchronize reads of a processor with writes by other agents in the system. A processor’s implementation of the streaming load hint does not override the effective memory type, but the implementation of the hint is processor dependent. For example, a processor implementation may choose to ignore the hint and process the instruction as a normal MOVDQA for any memory type. Alternatively, another implementation may optimize cache reads generated by MOVNTDQA on WB memory type to reduce cache evictions.

The 128-bit (V)MOVNTDQA addresses must be 16-byte aligned or the instruction will cause a #GP.

The 256-bit VMOVNTDQA addresses must be 32-byte aligned or the instruction will cause a #GP.
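
The two alignment rules above reduce to one check: the address must be a multiple of the operand size in bytes. A minimal sketch, assuming a hypothetical helper name `movntdqa_alignment_ok` (the processor performs this check itself and raises #GP; nothing here is an existing API):

```python
def movntdqa_alignment_ok(address, operand_bits):
    """Return True if `address` satisfies the (V)MOVNTDQA alignment rule.

    128-bit forms need 16-byte alignment, the 256-bit form needs
    32-byte alignment. A False result corresponds to the #GP case.
    """
    if operand_bits not in (128, 256):
        raise ValueError("operand_bits must be 128 or 256")
    return address % (operand_bits // 8) == 0


if __name__ == "__main__":
    for addr in (0x1000, 0x1010, 0x1008):
        print(hex(addr),
              movntdqa_alignment_ok(addr, 128),
              movntdqa_alignment_ok(addr, 256))
```

In this sketch an address such as 0x1010 passes for the 128-bit forms but fails for the 256-bit form, since it is a multiple of 16 but not of 32.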

Operation

MOVNTDQA (128-bit Legacy SSE form)

DEST ← SRC

DEST[VLMAX:128] (Unmodified)

VMOVNTDQA (VEX.128 encoded form)

DEST ← SRC

DEST[VLMAX:128] ← 0

VMOVNTDQA (VEX.256 encoded form)

DEST[255:0] ← SRC[255:0]

Intel C/C++ Compiler Intrinsic Equivalent

(V)MOVNTDQA __m128i _mm_stream_load_si128 (__m128i *p);

VMOVNTDQA __m256i _mm256_stream_load_si256 (const __m256i *p);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 1; additionally

#UD	If VEX.vvvv != 1111B.

VBROADCAST - Broadcast Floating-Point Data

Opcode	Instruction	Op/En	64/32-bit Mode	CPUID Feature Flag	Description
VEX.128.66.0F38.W0 18 /r	VBROADCASTSS xmm1, m32	A	V/V	AVX	Broadcast single-precision floating-point element in mem to four locations in xmm1.
VEX.256.66.0F38.W0 18 /r	VBROADCASTSS ymm1, m32	A	V/V	AVX	Broadcast single-precision floating-point element in mem to eight locations in ymm1.
VEX.256.66.0F38.W0 19 /r	VBROADCASTSD ymm1, m64	A	V/V	AVX	Broadcast double-precision floating-point element in mem to four locations in ymm1.
VEX.128.66.0F38.W0 18 /r	VBROADCASTSS xmm1, xmm2	A	V/V	AVX2	Broadcast the low single-precision floating-point element in the source operand to four locations in xmm1.
VEX.256.66.0F38.W0 18 /r	VBROADCASTSS ymm1, xmm2	A	V/V	AVX2	Broadcast low single-precision floating-point element in the source operand to eight locations in ymm1.
VEX.256.66.0F38.W0 19 /r	VBROADCASTSD ymm1, xmm2	A	V/V	AVX2	Broadcast low double-precision floating-point element in the source operand to four locations in ymm1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (w)	ModRM:r/m (r)	NA	NA

Description

Take the low floating-point data element from the source operand (second operand) and broadcast to all elements of the destination operand (first operand).

The destination operand is a YMM register. The source operand is an XMM register; only the low 32-bit or 64-bit data element is used.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD.

An attempt to execute VBROADCASTSD encoded with VEX.L = 0 will cause a #UD exception.

Operation

VBROADCASTSS (128 bit version)

temp ← SRC[31:0]
FOR j ← 0 TO 3
	DEST[31+j*32: j*32] ← temp
ENDFOR
DEST[VLMAX:128] ← 0

VBROADCASTSS (VEX.256 encoded version)

temp ← SRC[31:0]
FOR j ← 0 TO 7
	DEST[31+j*32: j*32] ← temp
ENDFOR

VBROADCASTSD (VEX.256 encoded version)

temp ← SRC[63:0]
DEST[63:0] ← temp
DEST[127:64] ← temp
DEST[191:128] ← temp
DEST[255:192] ← temp
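
The pseudocode above can be traced with a small scalar model that treats each register as a Python integer. This is a sketch for illustration only: the function names `vbroadcastss` and `vbroadcastsd` are hypothetical stand-ins for the instructions, not library calls, and the element values are raw bit patterns rather than decoded floating-point numbers.

```python
def vbroadcastss(src, lanes):
    """Copy SRC[31:0] into `lanes` 32-bit elements (4 for xmm, 8 for ymm).

    Bits above the written elements stay zero, which matches
    DEST[VLMAX:128] <- 0 for the 128-bit version.
    """
    if lanes not in (4, 8):
        raise ValueError("lanes must be 4 or 8")
    temp = src & 0xFFFFFFFF
    dest = 0
    for j in range(lanes):
        dest |= temp << (32 * j)
    return dest


def vbroadcastsd(src):
    """Copy SRC[63:0] into the four 64-bit elements of a 256-bit result."""
    temp = src & 0xFFFFFFFFFFFFFFFF
    dest = 0
    for j in range(4):
        dest |= temp << (64 * j)
    return dest


if __name__ == "__main__":
    # 0x3F800000 is the IEEE 754 single-precision encoding of 1.0;
    # the upper source bits (0xDEADBEEF) are ignored by the broadcast.
    src = (0xDEADBEEF << 32) | 0x3F800000
    print(hex(vbroadcastss(src, 4)))
    print(hex(vbroadcastss(src, 8)))
```

Only the low element of the source reaches the destination; everything above bit 31 (or bit 63 for VBROADCASTSD) is discarded before the copy.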

Intel C/C++ Compiler Intrinsic Equivalent

VBROADCASTSS __m128 _mm_broadcastss_ps(__m128 );

VBROADCASTSS __m256 _mm256_broadcastss_ps(__m128);

VBROADCASTSD __m256d _mm256_broadcastsd_pd(__m128d);

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 6; additionally

#UD	If VEX.L = 0 for VBROADCASTSD,
	If VEX.W = 1.
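
The encoding-related #UD conditions stated in this section (VEX.vvvv not equal to 1111b, VEX.W = 1, and VEX.L = 0 for VBROADCASTSD) can be collected into one predicate. The sketch below is a hypothetical helper named `vbroadcast_raises_ud`; it covers only these three conditions and does not model the remaining Type 6 exception cases.

```python
def vbroadcast_raises_ud(mnemonic, vex_l, vex_w, vex_vvvv):
    """Return True if the given VEX fields would cause #UD.

    mnemonic: "VBROADCASTSS" or "VBROADCASTSD"
    vex_l:    0 for 128-bit, 1 for 256-bit
    vex_w:    VEX.W bit
    vex_vvvv: 4-bit VEX.vvvv field as stored in the prefix
    """
    if mnemonic not in ("VBROADCASTSS", "VBROADCASTSD"):
        raise ValueError("unsupported mnemonic")
    if vex_vvvv != 0b1111:   # vvvv is reserved and must be 1111b
        return True
    if vex_w == 1:           # only W0 encodings are defined
        return True
    if mnemonic == "VBROADCASTSD" and vex_l == 0:
        return True          # no 128-bit VBROADCASTSD form
    return False


if __name__ == "__main__":
    print(vbroadcast_raises_ud("VBROADCASTSS", 0, 0, 0b1111))
    print(vbroadcast_raises_ud("VBROADCASTSD", 0, 0, 0b1111))
```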

VBROADCASTF128/I128 - Broadcast 128-Bit Data

Opcode	Instruction	Op/En	64/32-bit Mode	CPUID Feature Flag	Description
VEX.256.66.0F38.W0 1A /r	VBROADCASTF128 ymm1, m128	A	V/V	AVX	Broadcast 128 bits of floating-point data in mem to low and high 128-bits in ymm1.
VEX.256.66.0F38.W0 5A /r	VBROADCASTI128 ymm1, m128	A	V/V	AVX2	Broadcast 128 bits of integer data in mem to low and high 128-bits in ymm1.

Instruction Operand Encoding

Op/En	Operand 1	Operand 2	Operand 3	Operand 4
A	ModRM:reg (w)	ModRM:r/m (r)	NA	NA

Description

VBROADCASTF128 and VBROADCASTI128 load 128-bit data from the source operand (second operand) and broadcast to the destination operand (first operand).

The destination operand is a YMM register. The source operand is a 128-bit memory location. Register source encodings for VBROADCASTF128 and VBROADCASTI128 are reserved and will #UD.

VBROADCASTF128 and VBROADCASTI128 are only supported as 256-bit wide versions.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b; otherwise the instruction will #UD. Attempts to execute any VPBROADCAST* instruction with VEX.W = 1 will cause #UD.

An attempt to execute VBROADCASTF128 or VBROADCASTI128 encoded with VEX.L = 0 will cause a #UD exception.

Figure 5-6. VBROADCASTI128 Operation (source: m128i element X0; destination: DEST)

Operation

VBROADCASTF128/VBROADCASTI128

temp ← SRC[127:0]
DEST[127:0] ← temp
DEST[255:128] ← temp
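
As a quick check of the operation above, the following sketch models the 128-bit broadcast on plain Python integers. The name `vbroadcast128` is hypothetical, and the model applies equally to VBROADCASTF128 and VBROADCASTI128 because both copy the same 128 bits without interpreting them.

```python
MASK128 = (1 << 128) - 1


def vbroadcast128(src):
    """Copy SRC[127:0] into both 128-bit halves of a 256-bit result."""
    temp = src & MASK128
    return (temp << 128) | temp


if __name__ == "__main__":
    value = 0x0123456789ABCDEFFEDCBA9876543210
    result = vbroadcast128(value)
    print(hex(result & MASK128))   # low half
    print(hex(result >> 128))      # high half, identical to the low half
```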

Intel C/C++ Compiler Intrinsic Equivalent

VBROADCASTF128 __m256 _mm256_broadcast_ps(__m128 *);

VBROADCASTF128 __m256d _mm256_broadcast_pd(__m128d *);

VBROADCASTI128 __m256i _mm256_broadcastsi128_si256(__m128i );

SIMD Floating-Point Exceptions

None

Other Exceptions

See Exceptions Type 6; additionally

#UD	If VEX.L = 0 ,
	If VEX.W = 1.
