Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный Технический Университет Харьковский Политехнический Институт

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

Лаб2012 / 319433-011.pdf

Скачиваний:

Добавлен:

02.02.2015

Размер:

2.31 Mб

Скачать

☆

<<< < Предыдущая 48 49 50 51 52 53 54 55 56 57 58 5960 / 7360 61 62 63 64 65 66 67 68 69 70 71 72 > Следующая >>>

POST-32NM PROCESSOR INSTRUCTIONS

CHAPTER 8 POST-32NM PROCESSOR INSTRUCTIONS

8.1OVERVIEW

This chapter describes additional instructions targeted for Intel 64 architecture processors in process technology smaller than 32 nm. These instructions include:

•Two instructions to support 16-bit floating-point data type conversion to and from single-precision floating-point type. Conversion to packed 16-bit floating-point values from packed single-precision floating-point values also provides rounding control using an immediate byte. These float-16 instructions convert packed data types of different sizes following the same manner as the 256-bit vector SIMD extension, AVX.

•One instruction that generates random numbers of 16/32/64 bit wide random integers. The random number generator instruction operates on general-purpose registers.

•Four instructions that allow software working in 64-bit environment to read and write FS base and GS base registers in all privileged levels.

8.2CPUID DETECTION OF NEW INSTRUCTIONS

Application using float 16 instruction must follow a detection sequence similar to AVX to ensure:

•The OS has enabled YMM state management support,

•The processor support AVX as indicated by the CPUID feature flag, i.e. CPUID.01H:ECX.AVX[bit 28] = 1.

•The processor support 16-bit floating-point conversion instructions via a CPUID feature flag (CPUID.01H:ECX.F16C[bit 29] = 1).

Application detection of Float-16 conversion instructions follow the general procedural flow in Figure 8-1.

Ref. # 319433-011

8-1

POST-32NM PROCESSOR INSTRUCTIONS

Check feature flag

CPUID.1H:ECX.OXSAVE = 1?

	OS provides processor
	extended state management
Yes	Implied HW support for
	XSAVE, XRSTOR, XGETBV, XFEATURE_ENABLED_MASK

Check enabled YMM state in		Check feature flags
Check enabled YMM state in		Check feature flags
XFEM via XGETBV	State	for AVX and F16C	ok to use
	enabled		Instructions

Figure 8-1. General Procedural Flow of Application Detection of Float-16

----------------------------------------------------------------------------------------

INT supports_f16c() { ; result in eax mov eax, 1

cpuid

and ecx, 038000000H

cmp ecx, 038000000H; check OSXSAVE, AVX, F16C feature flags jne not_supported

; processor supports AVX,F16C instructions and XGETBV is enabled by OS mov ecx, 0; specify 0 for XFEATURE_ENABLED_MASK register

XGETBV; result in EDX:EAX and eax, 06H

cmp eax, 06H; check OS has enabled both XMM and YMM state support jne not_supported

mov eax, 1 jmp done

NOT_SUPPORTED: mov eax, 0

done:

8-2	Ref. # 319433-011

POST-32NM PROCESSOR INSTRUCTIONS

}

-------------------------------------------------------------------------------

RDRAND, RDFSBASE/RDGSBASE, WRFSBASE/WRGSBASE operates on general purpose registers only.

Before software attempts to use the RDRAND instruction, it must check that the CPUID feature flag indicating processor supports RDRAND, i.e., if CPUID.01H:ECX.RDRAND[bit 30] = 1.

CPUID.(EAX=07H, ECX=0H):EBX.FSGSBASE[bit 0] = 1 indicates availability of instructions that are primarily targeting use by system software manipulating the base of FS and GS segments. These instructions require enabling by OS (see Section 8.5). An OS may provide programming interfaces indicating its support of application use of RDFSBASE/RDGSBASE, WRFSBASE/WRGSBASE instructions.

8.316-BIT FLOATING-POINT DATA TYPE SUPPORT

Two new instructions support half-precision floating-point data type with conversion to and from single-precision floating-point data types. Table 8-1 gives the length, precision, and approximate normalized range that can be represented by half and single precision data types. Denormal values are also supported in these types.

Table 8-1. Length, Precision, and Range of Floating-Point Data Types

Data Type	Length	Precision	Approximate Normalized Range
		(Bits)
		(Bits)	Binary	Decimal
			Binary	Decimal

Half Precision	16	11	2–14 to 215	3.1 × 10–5 to 6.50 × 104
Single Precision	32	24	2–126 to 2127	1.18 × 10–38 to 3.40 × 1038
Double Precision	64	53	2–1022 to 21023	2.23 × 10–308 to 1.79 × 10308

Table 8-2 shows the floating-point encodings for zeros, denormalized finite numbers, normalized finite numbers, infinities, and NaNs for each of the three floating-point data types.

Ref. # 319433-011

8-3

POST-32NM PROCESSOR INSTRUCTIONS

Table 8-2. Half-Precision Floating-Point Number and NaN Encodings

	Class		Sign	Biased Exponent		Significand

						Integer1	Fraction
Positive		+∞	0	11..11		1	00..00

		+Normals	0	11..10		1	11..11
			.	.		.	.
			.	.		.	.
			0	00..01		1	00..00

		+Denormals	0	00..00		0	11.11
			.	.		.	.
			.	.		.	.
			0	00..00		0	00..01

		+Zero	0	00..00		0	00..00

Negative		−Zero	1	00..00		0	00..00

		−Denormals	1	00..00		0	00..01
			.	.		.	.
			.	.		.	.
			1	00..00		0	11..11

		−Normals	1	00..01		1	00..00
			.	.		.	.
			.	.		.	.
			1	11..10		1	11..11

		-•	1	11..11		1	00..00

NaNs		SNaN	X	11..11		1	0X..XX2
		QNaN	X	11..11		1	1X..XX

		QNaN	1	11..11		1	10..00
		Floating-Point
		Indefinite

		Half-Precision:		← 5Bits →			← 10 Bits →
		Single-Precision;		← 8 Bits	→		← 23Bits	→
		Double-Precision;		← 11 Bits →			← 52Bits	→

NOTES:

1.Integer bit is implied and not stored in memory format.

2.The fraction for SNaN encodings must be non-zero with the most-significant bit 0.

3.The most-significant bit of the fraction for QNaN encoding must be 1.

8-4	Ref. # 319433-011

POST-32NM PROCESSOR INSTRUCTIONS

Half-precision floating-point data type consists of a sign bit, a 5-bit exponent field, and a 11-bit significand field. The 11-bit significand consists of an implied integer bit that is not stored and a 10-bit fraction field that is stored as the 10 least-significant bits along with the sign bit (bit 15) and the exponent field (bits 14:10), see Figure 8-2.

				Sign			Half Precision
							Half Precision
							Floating-Point
			15 14		9	0	Floating-Point
			15 14		9	0
		Sign					Single Precision
							Single Precision
							Floating-Point
	31 30		23 22			0	Floating-Point
	31 30		23 22			0
Sign							Double Precision


							Floating-Point
63 62	52 51					0

Figure 8-2. Floating-Point Data Types

The exponent of each floating-point data type is encoded in biased format, the bias constant is 15 for half-precision floating-point data type. When storing floating-point values in memory, half-precision values are stored in 2 consecutive bytes in memory. Table 8-3 shows how the real number 178.125 (in ordinary decimal format) is stored in IEEE Standard 754 floating-point format. The table lists a progression of real number notations that leads to the half-precision, 16-bit floating-point format.

Table 8-3. Real and Floating-Point Number Notation

Notation		Value

Ordinary Decimal	178.125

Scientific Decimal	1.78125E10 2
Scientific Binary	1.0110010001E2111
Scientific Binary	1.0110010001E210110
(Biased Exponent)

Half-Precision Format	Sign	Biased Exponent	Normalized Significand

	0	10110	0110010001
			1. (Implied)

Ref. # 319433-011

8-5

POST-32NM PROCESSOR INSTRUCTIONS

8.3.1Half-Precision Floating-Point Conversion

Half-precision floating-point values are not used by the processor directly for arithmetic operations. Two instructions, VCVTPH2PS, VCVTPS2PH, provide conversion between half-precision and single-precision floating-point values.

The conversion operations of VCVTPS2PH allow programmer to specify rounding control using control fields in an immediate byte. The effects of the immediate byte are listed in Table 8-4.

Rounding control can use Imm[2] to select an override RC field specified in Imm[1:0] or use MXCSR setting.

Table 8-4. Immediate Byte Encoding for 16-bit Floating-Point Conversion Instructions

Bits	Field Name/value	Description	Comment

Imm[1:0]	RC=00B	Round to nearest even	If Imm[2] = 0

	RC=01B	Round down

	RC=10B	Round up

	RC=11B	Truncate

Imm[2]	MS1=0	Use imm[1:0] for round-	Ignore MXCSR.RC
		ing

	MS1=1	Use MXCSR.RC for round-
		ing

Imm[7:3]	Ignored	Ignored by processor

VCVTPH2PS and VCVTPS2PH are subject to SIMD floating-point exceptions. Specifically, they can cause invalid operation exception (see Table 8-6), the result of which is shown in Table 8-5.

Table 8-5. Non-Numerical Behavior for VCVTPH2PS, VCVTPS2PH

Source Operands	Masked Result	Unmasked Result

QNaN	QNaN11	QNaN11 (not an exception)
SNaN	QNaN12	None

NOTES:

1.The half precision output QNaN1 is created from the single precision input QNaN as follows: the sign bit is preserved, the 8-bit exponent FFH is replaced by the 5-bit exponent 1FH, and the 24-bit significand is truncated to an 11-bit significand by removing its 14 least significant bits.

8-6	Ref. # 319433-011

POST-32NM PROCESSOR INSTRUCTIONS

2.The half precision output QNaN1 is created from the single precision input SNaN as follows: the sign bit is preserved, the 8-bit exponent FFH is replaced by the 5-bit exponent 1FH, and the 24-bit significand is truncated to an 11-bit significand by removing its 14 least significant bits. The second most significant bit of the significand is changed from 0 to 1 to convert the signaling NaN into a quiet NaN.

Table 8-6. Invalid Operation for VCVTPH2PS, VCVTPS2PH

Instruction	Condition	Masked Result	Unmasked Result

VCVTPH2PS	SRC = NaN	See Table 8-5	#I=1

VCVTPS2PH	SRC = NaN	SeeTable 8-5	#I=1

VCVTPS2PH can cause denormal exceptions if the value of the source operand is denormal relative to the numerical range represented by the source format (see Table 8-7).

Table 8-7. Denormal Condition for VCVTPS2PH

Instruction	Condition	Masked Result1	Unmasked Result
VCVTPH2PS	SRC is denormal relative	res = Result rounded to the	Same as masked
	to input format1	destination precision and using the	result.
		bounded exponent, but only if no
		unmasked post-computation
		exception occurs.
		#DE unchanged

VCVTPS2PH	SRC is denormal relative	res = Result rounded to the	#DE=1
	to input format1	destination precision and using the
		bounded exponent, but only if no
		unmasked post-computation
		exception occurs.
		#DE=1

NOTES:

1. Masked and unmasked result is shown in Table 7-7.

VCVTPS2PH can cause an underflow exception if the result of the conversion is less than the underflow threshold for half-precision floating-point data type , i.e. | x | < 1.0 2−14.

Ref. # 319433-011

8-7

POST-32NM PROCESSOR INSTRUCTIONS

Table 8-8. Underflow Condition for VCVTPS2PH

Instruction	Condition	Masked Result1	Unmasked Result
VCVTPS2PH	Result < smallest	Result = +0 or -0, denormal,	#UE=1,
	destination precision	normal.	#PE = 1 if the
	finial normal value2	#UE =1.	result is inexact.
		#PE = 1 if the result is inexact.

NOTES:

1.Masked and unmasked result is shown in Table 7-7.

2.If FTZ is not set ( MXCSR.FTZ = 1 ), masked and unmasked result is shown in Table 7-8. If FTZ is set (MXCSR.FTZ = 1), inexact result = +0 or - 0, #PE and #UE are reported.

VCVTPS2PH can cause an overflow exception if the result of the conversion is greater than the maximum representable value for half-precision floating-point data type, i.e. | x | ≥ 1.0 216.

Table 8-9. Overflow Condition for VCVTPS2PH

Instruction	Condition	Masked Result	Unmasked Result

VCVTPS2PH	Result ≥ largest	Result = +Inf or -Inf.	#OE=1.
	destination precision	#OE=1.
	finial normal value1

VCVTPS2PH can cause an inexact exception if the result of the conversion is not exactly representable in the destination format.

Table 8-10. Inexact Condition for VCVTPS2PH

Instruction	Condition	Masked Result1	Unmasked Result
VCVTPS2PH	The result is not	res = Result rounded to the	Only if no underflow/overflow
	representable in	destination precision and	condition occurred, or if the
	the destination	using the bounded	corresponding exceptions are
	format	exponent, but only if no	masked:
		unmasked underflow or	• Set #OE if masked overflow
		overflow conditions occur	and set result as described
		(this exception can occur in	above for masked overflow.
		(this exception can occur in
		the presence of a masked	• Set #UE if masked underflow
		underflow or overflow).	• Set #UE if masked underflow
		underflow or overflow).	and set result as described
		#PE=1.	above for masked underflow.
			If neither underflow nor
			overflow, result equals the result
			rounded to the destination
			precision and using the bounded
			exponent set #PE = 1.

8-8	Ref. # 319433-011

<<< < Предыдущая 48 49 50 51 52 53 54 55 56 57 58 5960 / 7360 61 62 63 64 65 66 67 68 69 70 71 72 > Следующая >>>

Соседние файлы в папке Лаб2012

#
02.02.20153.33 Mб6525366517.pdf
#
02.02.20152.52 Mб25253666.pdf
#
02.02.20152.7 Mб2725366617.pdf
#
02.02.20152.09 Mб31253667.pdf
#
02.02.20152.19 Mб2625366717.pdf
#
02.02.20152.31 Mб27319433-011.pdf
#
02.02.2015162.82 Кб20ЛР1 Виконання арифм операц.doc
#
02.02.201564 Кб21ЛР10 ММХ-розширення.doc
#
02.02.201567.07 Кб22ЛР11 SSE-розширення.doc
#
02.02.201590.62 Кб18ЛР12 Windows-застосування.doc
#
02.02.2015181.25 Кб19ЛР13 Досл_дження коду програм.doc