CHAPTER 12

PROGRAMMING WITH SSE3 AND SUPPLEMENTAL SSE3

The Pentium 4 processor supporting Hyper-Threading Technology introduced Streaming SIMD Extensions 3 (SSE3). The Intel Xeon processor 5100 series and the Intel Core 2 processor families introduced Supplemental Streaming SIMD Extensions 3 (SSSE3). This chapter describes SSE3/SSSE3 and provides information to assist in writing application programs that use these extensions.

12.1 SSE3/SSSE3 PROGRAMMING ENVIRONMENT AND DATA TYPES

The programming environment for using SSE3/SSSE3 is unchanged from that shown in Figure 3-1 and Figure 11-1. SSE3/SSSE3 do not introduce new data types. XMM registers are used to operate on packed integer data, single-precision floating-point data, or double-precision floating-point data.

One SSE3 instruction (FISTTP) uses the x87 FPU for x87-style programming. Two SSE3 instructions (MONITOR and MWAIT) use the general-purpose registers for thread synchronization. The MXCSR register governs SIMD floating-point operations. Note, however, that the x87 FPU control word does not affect FISTTP, the SSE3 instruction that is executed by the x87 FPU, other than by unmasking an invalid operand or inexact result exception.
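The conversion performed by FISTTP can be modelled in plain C. The sketch below is only an illustration: it assumes the documented behaviour that FISTTP always converts by truncation (rounding toward zero), which is also what a C cast from a floating-point type to an integer type does. The function name `fisttp_model` is hypothetical, and the model covers only finite values that fit in a 32-bit integer.

```c
#include <stdint.h>

/* Hypothetical scalar model of FISTTP storing a 32-bit integer.
 * A C cast from double to an integer type discards the fractional
 * part (rounds toward zero), whatever the current rounding mode is.
 * Assumption: x is finite and within int32_t range; NaN and
 * out-of-range inputs are not modelled here. */
int32_t fisttp_model(double x)
{
    return (int32_t)x;
}
```

Under this model, 2.7 converts to 2 and -2.7 converts to -2, rather than to the nearest integers 3 and -3.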

12.1.1 SSE3/SSSE3 in 64-Bit Mode and Compatibility Mode

In compatibility mode, SSE3/SSSE3 function as they do in protected mode. In 64-bit mode, eight additional XMM registers (XMM8-XMM15) are accessible; they are accessed by using REX prefixes.

Memory operands are specified using the ModR/M and SIB encoding described in Section 3.7.5.

Some SSE3 instructions may be used to operate on general-purpose registers. Use the REX.W prefix to access 64-bit general-purpose registers. Note that if a REX prefix is used when it has no meaning, the prefix is ignored.
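How a REX prefix selects XMM8-XMM15 or a 64-bit operand size can be sketched in C. The example assumes the standard REX layout (0100WRXB in binary) and a register-direct ModR/M byte; the helper names `rex_prefix` and `modrm_reg_reg` are hypothetical and are not part of any Intel tool or library.

```c
#include <stdint.h>

/* Build a REX prefix byte with the layout 0100 W R X B.
 *   w   : 1 requests a 64-bit operand size (REX.W)
 *   reg : register number 0-15 for the ModR/M reg field
 *   rm  : register number 0-15 for the ModR/M r/m field
 * Bit 3 of reg goes into REX.R and bit 3 of rm into REX.B.
 * REX.X (SIB index extension) stays 0 because this sketch
 * does not model a SIB byte. */
uint8_t rex_prefix(int w, unsigned reg, unsigned rm)
{
    unsigned r = (reg >> 3) & 1u;
    unsigned b = (rm >> 3) & 1u;
    return (uint8_t)(0x40u | (((unsigned)w & 1u) << 3) | (r << 2) | b);
}

/* ModR/M byte for register-direct addressing (mod = 11b).
 * Only the low three bits of each register number fit here;
 * the fourth bit is carried by the REX prefix. */
uint8_t modrm_reg_reg(unsigned reg, unsigned rm)
{
    return (uint8_t)(0xC0u | ((reg & 7u) << 3) | (rm & 7u));
}
```

With XMM8 as the reg operand and XMM9 as the r/m operand, `rex_prefix(0, 8, 9)` yields 0x45 and `modrm_reg_reg(8, 9)` yields 0xC1. With registers 0-7 and `w` set to 1, the prefix is 0x48, which sets only REX.W.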

12.1.2 Compatibility of SSE3/SSSE3 with MMX Technology, the x87 FPU Environment, and SSE/SSE2 Extensions

SSE3/SSSE3 do not introduce any new state to the Intel 64 and IA-32 execution environments.

For SIMD and x87 programming, the FXSAVE and FXRSTOR instructions save and restore the architectural states of XMM, MXCSR, x87 FPU, and MMX registers. The MONITOR and MWAIT instructions use general-purpose registers on input; they do not modify the contents of those registers.

12.1.3 Horizontal and Asymmetric Processing

Many SSE/SSE2/SSE3/SSSE3 instructions accelerate SIMD data processing using a model referred to as vertical computation. Using this model, data flow is vertical between the data elements of the inputs and the output.
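The vertical model can be illustrated with a scalar C sketch of a packed double-precision add, in which each output element depends only on the input elements at the same position. The function name `vertical_add_pd` is hypothetical; index 0 stands for the low element and index 1 for the high element of a 128-bit operand.

```c
/* Hypothetical scalar model of a vertical packed add on two
 * double-precision elements. Element i of the result is computed
 * only from element i of each input, so data flows straight down
 * from inputs to output. */
void vertical_add_pd(const double x[2], const double y[2], double out[2])
{
    for (int i = 0; i < 2; i++) {
        out[i] = x[i] + y[i];
    }
}
```

The same operation is applied at every position, and no element crosses from one position to another.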

Figure 12-1 illustrates the asymmetric processing of the SSE3 instruction ADDSUBPD. Figure 12-2 illustrates the horizontal data movement of the SSE3 instruction HADDPD.

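[Figure: source operands X1, X0 and Y1, Y0. The high elements feed an ADD and the low elements feed a SUB, producing X1 + Y1 in the high element and X0 - Y0 in the low element.]

Figure 12-1. Asymmetric Processing in ADDSUBPD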
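The two non-vertical patterns can be sketched as scalar C models. This is an illustration under stated assumptions, not the architectural definition: the function names `addsubpd_model` and `haddpd_model` are hypothetical, index 0 is the low element and index 1 the high element, and X denotes the first operand and Y the second. The HADDPD model assumes the documented behaviour that the low result element is the sum of the two X elements and the high result element is the sum of the two Y elements.

```c
/* Hypothetical scalar model of ADDSUBPD (asymmetric processing):
 * the low elements are subtracted and the high elements are added,
 * so different operations are applied at different positions. */
void addsubpd_model(const double x[2], const double y[2], double out[2])
{
    out[0] = x[0] - y[0];   /* X0 - Y0 */
    out[1] = x[1] + y[1];   /* X1 + Y1 */
}

/* Hypothetical scalar model of HADDPD (horizontal processing):
 * each result element combines the two elements of ONE input,
 * so data moves across positions within an operand. */
void haddpd_model(const double x[2], const double y[2], double out[2])
{
    out[0] = x[0] + x[1];   /* X0 + X1 */
    out[1] = y[0] + y[1];   /* Y0 + Y1 */
}
```

With X = (X1, X0) = (2, 1) and Y = (Y1, Y0) = (20, 10), the ADDSUBPD model gives (22, -9) and the HADDPD model gives (30, 3), written high element first. On a compiler with SSE3 enabled, the corresponding intrinsics are `_mm_addsub_pd` and `_mm_hadd_pd` from `<pmmintrin.h>`.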
