
APPLICATION PROGRAMMING MODEL

2.2.3 Detection of AVX2

Hardware support for AVX2 is indicated by CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]=1.

Application software must first identify that the hardware supports AVX, as explained in Section 2.2. It must then also detect support for AVX2 by checking CPUID.(EAX=07H, ECX=0H):EBX.AVX2[bit 5]. The recommended pseudocode sequence for detection of AVX2 is:

----------------------------------------------------------------------------------------

INT supports_avx2()
{   ; result in eax
    mov eax, 1
    cpuid
    and ecx, 018000000H
    cmp ecx, 018000000H    ; check both OSXSAVE and AVX feature flags
    jne not_supported
    ; processor supports AVX instructions and XGETBV is enabled by OS
    mov eax, 7
    mov ecx, 0
    cpuid
    and ebx, 20H
    cmp ebx, 20H           ; check AVX2 feature flags
    jne not_supported
    mov ecx, 0             ; specify 0 for XFEATURE_ENABLED_MASK register
    XGETBV                 ; result in EDX:EAX
    and eax, 06H
    cmp eax, 06H           ; check OS has enabled both XMM and YMM state support
    jne not_supported
    mov eax, 1
    jmp done
not_supported:
    mov eax, 0
done:
}

-------------------------------------------------------------------------------
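The three checks in the sequence above reduce to bit tests on register values. The following C sketch covers only that decision logic. It assumes the caller has already read CPUID.01H:ECX, CPUID.(EAX=07H, ECX=0H):EBX, and the low 32 bits of the XFEATURE_ENABLED_MASK register by some compiler-specific means; the function and parameter names are illustrative and not part of any Intel interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper (name and signature are assumptions, not an Intel API).
 * leaf1_ecx : ECX returned by CPUID with EAX=01H
 * leaf7_ebx : EBX returned by CPUID with EAX=07H, ECX=0H
 * xcr0_lo   : EAX returned by XGETBV with ECX=0 */
static bool avx2_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint32_t xcr0_lo)
{
    const uint32_t osxsave_avx = 0x18000000u; /* OSXSAVE (bit 27) and AVX (bit 28) */
    const uint32_t avx2        = 0x20u;       /* AVX2 (bit 5) */
    const uint32_t xmm_ymm     = 0x06u;       /* XMM state (bit 1) and YMM state (bit 2) */

    if ((leaf1_ecx & osxsave_avx) != osxsave_avx)
        return false;                          /* no AVX, or XGETBV not enabled by the OS */
    if ((leaf7_ebx & avx2) != avx2)
        return false;                          /* no AVX2 in hardware */
    return (xcr0_lo & xmm_ymm) == xmm_ymm;     /* OS saves both XMM and YMM state */
}
```

As in the pseudocode, XGETBV should only be executed once the OSXSAVE check has passed, so a caller would pass 0 for `xcr0_lo` when that check fails.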


Ref. # 319433-011


2.2.4 Detection of VEX-encoded GPR Instructions

VEX-encoded general-purpose instructions do not operate on YMM registers and are similar to legacy general-purpose instructions. Checking for OSXSAVE or YMM support is not required.

There are separate feature flags for the following subsets of instructions that operate on general purpose registers, and the detection requirements for hardware support are:

CPUID.(EAX=07H, ECX=0H):EBX.BMI1[bit 3]: if 1, indicates that the processor supports the first group of advanced bit manipulation extensions (ANDN, BEXTR, BLSI, BLSMK, BLSR, TZCNT);

CPUID.(EAX=07H, ECX=0H):EBX.BMI2[bit 8]: if 1, indicates that the processor supports the second group of advanced bit manipulation extensions (BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX);

CPUID.(EAX=07H, ECX=0H):EBX.INVPCID[bit 10]: if 1, indicates that the processor supports the INVPCID instruction for system software that manages processor context ID;

CPUID.EAX=80000001H:ECX.LZCNT[bit 5]: if 1, indicates that the processor supports the LZCNT instruction.
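The four feature flags listed above can be decoded independently of one another, with no OSXSAVE or YMM-state check. A minimal C sketch follows, assuming the caller supplies the raw EBX value from CPUID.(EAX=07H, ECX=0H) and the raw ECX value from CPUID.80000001H; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four flags described in the text. */
struct gpr_features {
    bool bmi1;     /* ANDN, BEXTR, BLSI, BLSMK, BLSR, TZCNT */
    bool bmi2;     /* BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX */
    bool invpcid;  /* INVPCID */
    bool lzcnt;    /* LZCNT */
};

/* leaf7_ebx: EBX from CPUID.(EAX=07H, ECX=0H); ext1_ecx: ECX from CPUID.80000001H. */
static struct gpr_features decode_gpr_features(uint32_t leaf7_ebx, uint32_t ext1_ecx)
{
    struct gpr_features f;
    f.bmi1    = (leaf7_ebx >> 3)  & 1u;  /* EBX bit 3 */
    f.bmi2    = (leaf7_ebx >> 8)  & 1u;  /* EBX bit 8 */
    f.invpcid = (leaf7_ebx >> 10) & 1u;  /* EBX bit 10 */
    f.lzcnt   = (ext1_ecx  >> 5)  & 1u;  /* ECX bit 5 of the extended leaf */
    return f;
}
```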

2.3 FUSED-MULTIPLY-ADD (FMA) NUMERIC BEHAVIOR

FMA instructions can perform fused-multiply-add operations (including fused-multiply-subtract, and other varieties) on packed and scalar data elements in the instruction operands. Separate FMA instructions are provided to handle different types of arithmetic operations on the three source operands.

FMA instruction syntax is defined using three source operands. The first source operand is updated with the result of the arithmetic operations on the data elements of the 128-bit or 256-bit operands; that is, the first source operand is also the destination operand.

The arithmetic FMA operation performed in an FMA instruction takes one of several forms, r=(x*y)+z, r=(x*y)-z, r=-(x*y)+z, or r=-(x*y)-z. Packed FMA instructions can perform eight single-precision FMA operations or four double-precision FMA operations with 256-bit vectors.

Scalar FMA instructions perform only one arithmetic operation, on the low-order data element. The content of the remaining data elements in the lower 128 bits of the destination operand is preserved. The upper 128 bits of the destination operand are filled with zeros.

An arithmetic FMA operation of the form, r=(x*y)+z, takes two IEEE-754-2008 single (double) precision values and multiplies them to form an infinite precision intermediate value. This intermediate value is added to a third single (double) precision value (also at infinite precision) and rounded to produce a single (double) precision result.


Table 2-2 describes the numerical behavior of the FMA operations r=(x*y)+z, r=(x*y)-z, r=-(x*y)+z, and r=-(x*y)-z for various input values. The input values can be 0, finite non-zero (F in Table 2-2), infinity of either sign (INF in Table 2-2), positive infinity (+INF in Table 2-2), negative infinity (-INF in Table 2-2), or NaN (including QNaN or SNaN). If any one of the input values is a NaN, the result of the FMA operation, r, may be a quietized NaN. The result can be Q(x), Q(y), or Q(z); see Table 2-2. If x is a NaN, then:

Q(x) = x if x is QNaN, or

Q(x) = the quietized NaN obtained from x if x is SNaN.

The notation for the output values in Table 2-2 is:

“+INF”: positive infinity, “-INF”: negative infinity. When the result depends on a conditional expression, both values are listed in the result column and the condition is described in the comment column.

QNaNIndefinite represents the QNaN which has the sign bit equal to 1, the most significant bit of the significand field equal to 1, and the remaining significand field bits equal to 0.
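For single precision, the Q(x) and QNaNIndefinite definitions can be expressed directly on the 32-bit encoding. The sketch below assumes the IEEE-754 binary32 layout (1 sign bit, 8 exponent bits, 23 significand bits) and the usual x86 convention that an SNaN is quietized by setting the most significant significand bit; all macro and function names are illustrative.

```c
#include <stdint.h>

#define F32_SIGN_BIT   0x80000000u  /* sign bit */
#define F32_EXP_MASK   0x7F800000u  /* 8 exponent bits, all ones for INF and NaN */
#define F32_FRAC_MASK  0x007FFFFFu  /* 23 significand field bits */
#define F32_QUIET_BIT  0x00400000u  /* most significant bit of the significand field */

/* QNaNIndefinite: sign = 1, quiet bit = 1, remaining significand bits = 0. */
static uint32_t qnan_indefinite32(void)
{
    return F32_SIGN_BIT | F32_EXP_MASK | F32_QUIET_BIT;
}

/* A value is a NaN when the exponent is all ones and the significand field is non-zero. */
static int is_nan32(uint32_t bits)
{
    return (bits & F32_EXP_MASK) == F32_EXP_MASK && (bits & F32_FRAC_MASK) != 0;
}

/* Q(x) for a NaN input: a QNaN is returned unchanged, an SNaN gets its quiet bit set. */
static uint32_t quietize32(uint32_t nan_bits)
{
    return nan_bits | F32_QUIET_BIT;
}
```

Under these assumptions the single-precision QNaNIndefinite encoding is 0xFFC00000.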

The summation or subtraction of 0s or identical values in an FMA operation can lead to the situations shown in Table 2-1.

If the FMA computation represents an invalid operation (e.g., when adding two INF with opposite signs), the invalid exception is signaled, and the MXCSR.IE flag is set.

Table 2-1. Rounding behavior of Zero Result in FMA Operation

x*y  | z    | (x*y) + z | (x*y) - z | - (x*y) + z | - (x*y) - z
-----|------|-----------|-----------|-------------|------------
(+0) | (+0) | +0 in all rounding modes | -0 when rounding down, and +0 otherwise | -0 when rounding down, and +0 otherwise | -0 in all rounding modes
(+0) | (-0) | -0 when rounding down, and +0 otherwise | +0 in all rounding modes | -0 in all rounding modes | -0 when rounding down, and +0 otherwise
(-0) | (+0) | -0 when rounding down, and +0 otherwise | -0 in all rounding modes | +0 in all rounding modes | -0 when rounding down, and +0 otherwise
(-0) | (-0) | -0 in all rounding modes | -0 when rounding down, and +0 otherwise | -0 when rounding down, and +0 otherwise | +0 in all rounding modes
F    | -F   | -0 when rounding down, and +0 otherwise | 2*F | -2*F | -0 when rounding down, and +0 otherwise
F    | F    | 2*F | -0 when rounding down, and +0 otherwise | -0 when rounding down, and +0 otherwise | -2*F

Table 2-2. FMA Numeric Behavior

x (multiplicand) | y (multiplier) | z | r=(x*y)+z | r=(x*y)-z | r=-(x*y)+z | r=-(x*y)-z | Comment
-----------------|----------------|---|-----------|-----------|------------|------------|--------
NaN | 0, F, INF, NaN | 0, F, INF, NaN | Q(x) | Q(x) | Q(x) | Q(x) | Signal invalid exception if x or y or z is SNaN
0, F, INF | NaN | 0, F, INF, NaN | Q(y) | Q(y) | Q(y) | Q(y) | Signal invalid exception if y or z is SNaN
0, F, INF | 0, F, INF | NaN | Q(z) | Q(z) | Q(z) | Q(z) | Signal invalid exception if z is SNaN
INF | F, INF | +INF | +INF | QNaNIndefinite | QNaNIndefinite | -INF | if x*y and z have the same sign
INF | F, INF | +INF | QNaNIndefinite | -INF | +INF | QNaNIndefinite | if x*y and z have opposite signs
INF | F, INF | -INF | -INF | QNaNIndefinite | QNaNIndefinite | +INF | if x*y and z have the same sign
INF | F, INF | -INF | QNaNIndefinite | +INF | -INF | QNaNIndefinite | if x*y and z have opposite signs
INF | F, INF | 0, F | +INF | +INF | -INF | -INF | if x and y have the same sign
INF | F, INF | 0, F | -INF | -INF | +INF | +INF | if x and y have opposite signs
INF | 0 | 0, F, INF | QNaNIndefinite | QNaNIndefinite | QNaNIndefinite | QNaNIndefinite | Signal invalid exception
0 | INF | 0, F, INF | QNaNIndefinite | QNaNIndefinite | QNaNIndefinite | QNaNIndefinite | Signal invalid exception
F | INF | +INF | +INF | QNaNIndefinite | QNaNIndefinite | -INF | if x*y and z have the same sign
F | INF | +INF | QNaNIndefinite | -INF | +INF | QNaNIndefinite | if x*y and z have opposite signs
F | INF | -INF | -INF | QNaNIndefinite | QNaNIndefinite | +INF | if x*y and z have the same sign
F | INF | -INF | QNaNIndefinite | +INF | -INF | QNaNIndefinite | if x*y and z have opposite signs
F | INF | 0, F | +INF | +INF | -INF | -INF | if x * y > 0
F | INF | 0, F | -INF | -INF | +INF | +INF | if x * y < 0
0, F | 0, F | INF | +INF | -INF | +INF | -INF | if z > 0
0, F | 0, F | INF | -INF | +INF | -INF | +INF | if z < 0
0 | 0 | 0 | 0 | 0 | 0 | 0 | The sign of the result depends on the sign of the operands and on the rounding mode. The product x*y is +0 or -0, depending on the signs of x and y. The summation/subtraction of the zero representing (x*y) and the zero representing z can lead to one of the four cases shown in Table 2-1.
0 | F | 0 | 0 | 0 | 0 | 0 | Same comment as for the row x=0, y=0, z=0
F | 0 | 0 | 0 | 0 | 0 | 0 | Same comment as for the row x=0, y=0, z=0
0 | 0 | F | z | -z | z | -z |
0 | F | F | z | -z | z | -z |
