IEEE floating point representation
.pdf
Q: IEEE single precision:
(i) How is 2. represented ??
0 10000000 00000000000000000000000
(ii) What is the next biggest IEEE single precision number larger than 2 ??
0 10000000 00000000000000000000001
(iii) What is the gap between 2 and the rst IEEE single precision number larger than 2?
|
2−23 × 2 = 2−22. |
|
. |
|
. |
General Result: Let x = m×2E be a normalized single precision number, with 1 ≤ m < 2. The gap between x and the next largest single precision number is
× 2E .
11
Rounding
We use \IEEE Computer Floating Point Numbers" (IEEE CFPNs, or just Floating Point Numbers) to include ±0, subnormal & normalized CFPNs, & ±∞ in a given format, e.g. single. These form a nite set.
If Nmin & Nmax are the minimum & maximum positive normalized such numbers, we say the
binary representation x = m × 2E
of a general real number is \normalized" if
Nmin ≤ |x| ≤ Nmax, and 1 ≤ m < 2.
Q: For any number x which is not a floating point number, what are two obvious choices for the floating point approximation to x ??
x− closest floating point number less than x,
x+ the closest greater than x.
12
Rounding, ctd.
Signi cand b0.b1b2, exponent E {−1, 0, 1}:
|
- |
. . . . . .0 1 x2 3
Rounding in the Toy System.
e.g. if x = 1.7, then x− = 1.5 and x+ = 1.75.
Using IEEE single precision, if
x = (b0.b1b2 . . . b23b24b25 . . .)2 × 2E ,
is positive and normalized, then
x− = (b0.b1b2 . . . b23)2 × 2E ,
is obtained by discarding b24, b25, etc.
An algorithm for x+ is more complicated since it may involve some bit \carries". If x is negative, the situation is reversed: x+ is obtained by dropping bits b24, b25, etc.
13
Correctly Rounded Arithmetic
The IEEE standard de nes the
correctly rounded value of x, round(x) . If x is a floating point number, round(x) = x. Otherwise round(x) depends on the rounding mode in e ect:
• Round down: |
round(x) = x−. |
• Round up: |
round(x) = x+. |
•Round towards zero:
round(x) is either x− or x+, whichever is between zero and x.
•Round to nearest: round(x) is either x− or x+, whichever is nearer to x. In the case of a tie, the one with its least signi cant bit equal to zero is chosen.
14
Correctly Rounded Arithmetic, ctd.
If x is positive, then x− is between zero and x, so round down and round towards zero have the same e ect. Round towards zero simply requires truncating the binary expansion, i.e. discarding bits.
The most useful mode is round to nearest.
Signi cand b0.b1b2, exponent E {−1, 0, 1}:
|
- |
. . . . . .0 1 x2 3
Rounding in the Toy System
e.g. if x = 1.7, then x− = 1.5 and x+ = 1.75, so with x = 1.7, this gives a \rounded" value of x equal to 1.75
\Round" with no quali cation means \round to nearest".
15
Absolute Rounding Error
|round(x)−x| is called the absolute rounding error associated with x, (value depends on mode). For all modes the absolute rounding error is less than the gap between x− and x+.
For general normalized x > 0,
x = (b0.b1b2 . . . b23b24b25 . . .)2 × 2E , b0 = 1.
IEEE single x− = (b0.b1b2 . . . b23)2 × 2E .
So for any mode |round(x) − x| < 2−23 × 2E .
In general for any rounding mode:
|round(x) − x| < × 2E , |
(1) |
Q: (i) For round towards zero, could the absolute rounding error equal × 2E ??
(ii) Does (1) hold if x is subnormal, i.e. E = −126 and b0 = 0 ??
16
Relative Rounding Error
The relative rounding error is de ned to be
δ |
≡ |
round(x) |
− |
1 = round(x) − x. |
||
|
x |
|
x |
|
||
Since |round(x)−x| < ×2E , & for normalized numbers
x = ±m × 2E , |
where m ≥ 1, |
|||||
we have, for all rounding modes, |
|
|||||
δ |
| |
< |
× 2E |
= . |
(2) |
|
| |
|
|
2E |
|
||
Q: Does (2) hold if x is subnormal, |
|
|||||
i.e. E = −126 and b0 = 0 ?? Why |
?? |
|||||
Note for any normalized x
round(x) = x(1 + δ), |δ| < .
17
An Important Idea
From the de nition of δ we see
round(x) = x(1 + δ),
so the rounded value of an arbitrary normalized number x is equal to x(1 + δ), where, regardless of the rounding mode,
|δ| < .
This is very important, because you can think of the stored value of x as not exact, but as exact within a factor of 1 + . IEEE single precision numbers are good to a factor of about 1 + 10−7, which means that they have about 7 accurate decimal digits.
18
Special Case of Round to Nearest
For round to nearest, the absolute rounding error can be no more than half the gap between x− and x+. This means in IEEE single, for all |x| ≤ Nmax:
|round(x) − x| ≤ 2−24 × 2E ,
and in general
|round(x) − x| ≤ 1 × 2E . 2
The previous analysis for round to nearest then gives for normalized x:
round(x) = x(1 + δ),
|δ| ≤ |
21 |
× 2E |
= |
1 |
. |
|
2E |
2 |
19
