Добавил:
Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Лаб2012 / 319433-011.pdf
Скачиваний:
27
Добавлен:
02.02.2015
Размер:
2.31 Mб
Скачать

INTEL® ADVANCED VECTOR EXTENSIONS

1.5FUNCTIONAL OVERVIEW

Intel AVX and FMA provide comprehensive functional improvements over previous generations of SIMD instruction extensions. The functional improvements include:

256-bit floating-point arithmetic primitives: AVX enhances existing 128-bit floating-point arithmetic instructions with 256-bit capabilities for floating-point processing. FMA provides additional set of 256-bit floating-point processing capabilities with a rich set of fused-multiply-add and fused multiply-subtract primitives.

Enhancements for flexible SIMD data movements: AVX provides a number of new data movement primitives to enable efficient SIMD programming in relation to loading non-unit-strided data into SIMD registers, intra-register SIMD data manipulation, conditional expression and branch handling, etc. Enhancements for SIMD data movement primitives cover 256-bit and 128-bit vector floatingpoint data, and 128-bit integer SIMD data processing using VEX-encoded instructions.

Several key categories of functional improvements in AVX and FMA are summarized in the following subsections.

1.5.1256-bit Floating-Point Arithmetic Processing Enhancements

Intel AVX provides 35 256-bit floating-point arithmetic instructions. The arithmetic operations cover add, subtract, multiply, divide, square-root, compare, max, min, round, etc., on single-precision and double-precision floating-point data.

The enhancement in AVX on floating-point compare operation provides 32 conditional predicates to improve programming flexibility in evaluating conditional expressions.

FMA provides 36 256-bit floating-point instructions to perform computation on 256bit vectors. The arithmetic operations cover fused multiply-add, fused multiplysubtract, fused multiply add/subtract interleave, signed-reversed multiply on fused multiply-add and multiply-subtract.

1.5.2256-bit Non-Arithmetic Instruction Enhancements

Intel AVX provides new primitives for handling data movement within 256-bit floating-point vectors and promotes many 128-bit floating data processing instructions to handle 256-bit floating-point vectors.

AVX includes 39 256-bit data processing instructions that are promoted from previous generations of SIMD instruction extensions, ranging from logical, blend, convert, test, unpacking, shuffling, load and stores.

AVX introduces 18 new data processing instructions that operate on 256-bit vectors. These new primitives cover the following operations:

Ref. # 319433-011

1-5

INTEL® ADVANCED VECTOR EXTENSIONS

Non-unit-stride fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching primitives:

broadcast of single or multiple data elements into a 256-bit destination,

masked move primitives to load or store SIMD data elements conditionally,

Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data manipulation primitives:

insert/extract multiple SIMD floating-point data elements to/from 256-bit SIMD registers

permute primitives to facilitate efficient manipulation of floating-point data elements in 256-bit SIMD registers

Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:

new variable blend instructions supports four-operand syntax with nondestructive source syntax. This is more flexible than the equivalent SSE4 instruction syntax which uses the XMM0 register as the implied mask for blend selection.

Packed TEST instructions for floating-point data.

1.5.3Arithmetic Primitives for 128-bit Vector and Scalar processing

Intel AVX provides 131 128-bit numeric processing instructions that employ VEXprefix encoding. These VEX-encoded instructions generally provide the same functionality over instructions operating on XMM register that are encoded using SIMD prefixes. The 128-bit numeric processing instructions in AVX cover floating-point and integer data processing across 128-bit vector and scalar processing.

The enhancement in AVX on 128-bit floating-point compare operation provides 32 conditional predicates to improve programming flexibility in evaluating conditional expressions. This contrasts with floating-point SIMD compare instructions in SSE and SSE2 supporting only 8 conditional predicates.

FMA provides 60 128-bit floating-point instructions to process 128-bit vector and scalar data. The arithmetic operations cover fused multiply-add, fused multiplysubtract, signed-reversed multiply on fused multiply-add and multiply-subtract.

1.5.4Non-Arithmetic Primitives for 128-bit Vector and Scalar Processing

Intel AVX provides 126 data processing instructions that employ VEX-prefix encoding. These VEX-encoded instructions generally provide the same functionality over instructions operating on XMM register that are encoded using SIMD prefixes.

1-6

Ref. # 319433-011

INTEL® ADVANCED VECTOR EXTENSIONS

The 128-bit data processing instructions in AVX cover floating-point and integer data movement primitives.

Additional enhancements in AVX on 128-bit data processing primitives include 16 new instructions with the following capabilities:

Non-unit-strided fetching of SIMD data. AVX provides several flexible SIMD floating-point data fetching primitives:

broadcast of single data element into a 128-bit destination,

masked move primitives to load or store SIMD data elements conditionally,

Intra-register manipulation of SIMD data elements. AVX provides several flexible SIMD floating-point data manipulation primitives:

permute primitives to facilitate efficient manipulation of floating-point data elements in 128-bit SIMD registers

Branch handling. AVX provides several primitives to enable handling of branches in SIMD programming:

new variable blend instructions supports four-operand syntax with nondestructive source syntax. Branching conditions dependent on floating-point data or integer data can benefit from Intel AVX. This is more flexible than non-VEX encoded instruction syntax that uses the XMM0 register as implied mask for blend selection. While variable blend with implied XMM0 syntax is supported in SSE4 using SIMD prefix encoding, VEX-encoded 128-bit variable blend instructions only support the more flexible four-operand syntax.

Packed TEST instructions for floating-point data.

1.5.5AVX2 and 256-bit Vector Integer Processing

AVX2 promotes the vast majority of 128-bit integer SIMD instruction sets to operate with 256-bit wide YMM registers. AVX2 instructions are encoded using the VEX prefix and require the same operating system support as AVX. Generally, most of the promoted 256-bit vector integer instructions follow the 128-bit lane operation, similar to the promoted 256-bit floating-point SIMD instructions in AVX.

Newer functionalities in AVX2 generally fall into the following categories:

Fetching non-contiguous data elements from memory using vector-index memory addressing. These “gather” instructions introduce a new memoryaddressing form, consisting of a base register and multiple indices specified by a vector register (either XMM or YMM). Data elements sizes of 32 and 64-bits are supported, and data types for floating-point and integer elements are also supported.

Cross-lane functionalities are provided with several new instructions for broadcast and permute operations. Some of the 256-bit vector integer instructions promoted from legacy SSE instruction sets also exhibit cross-lane behavior, e.g. VPMOVZ/VPMOVS family.

Ref. # 319433-011

1-7

Соседние файлы в папке Лаб2012