

codes abort when exceptions occur. We can install trap handlers that abort the program when exceptions occur.

IEEE 754 specifies that when an overflow or underflow trap handler is called, it is passed the wrapped-around result as an argument. The definition of wrapped around for overflow is that the result is computed as if to infinite precision, then divided by 2^α, and then rounded to the relevant precision. For underflow, the result is multiplied by 2^α. The exponent α is 192 for single precision and 1536 for double precision. For details of exceptions, flags, and trap handlers, see [4].

Example 11 The computation of the product ∏_{i=1}^{n} x_i can potentially overflow or underflow. One solution is to compute exp(∑_{i=1}^{n} log x_i), but this solution is less accurate and less efficient. Another solution is to use a trap handler. A global counter is initialized to 0. If p_k = ∏_{i=1}^{k} x_i overflows, the counter is increased by one and the result is divided by 2^α. If p_k underflows, the counter is decreased by one and the result is multiplied by 2^α. Thus the result is wrapped around back into range. When all multiplications are done, if the counter is zero, the final result is p_n; if the counter is positive, p_n overflows; if the counter is negative, p_n underflows.
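The counter technique can be sketched in C. Portable C has no trap-handler interface, so the sketch below (the function name wrapped_product and the flag-based detection are our own illustration, not part of the notes) uses the IEEE exception flags instead: a multiplication that overflows or underflows is redone with both factors scaled by 2^(α/2), and the counter records how many times the partial product was wrapped.

#include <fenv.h>
#include <math.h>

/* Sketch of Example 11 with exception flags in place of a trap handler.
   alpha = 1536 is the IEEE 754 double-precision wrap-around exponent;
   splitting the scaling across the two factors keeps the intermediate
   scalbn results in range. */
double wrapped_product(const double *x, int n, int *counter) {
    const int alpha = 1536;
    double p = 1.0;
    *counter = 0;
    for (int i = 0; i < n; i++) {
        feclearexcept(FE_OVERFLOW | FE_UNDERFLOW);
        double q = p * x[i];
        if (fetestexcept(FE_OVERFLOW)) {
            (*counter)++;                                  /* wrapped down by 2^alpha */
            q = scalbn(p, -alpha / 2) * scalbn(x[i], -alpha / 2);
        } else if (fetestexcept(FE_UNDERFLOW)) {
            (*counter)--;                                  /* wrapped up by 2^alpha */
            q = scalbn(p, alpha / 2) * scalbn(x[i], alpha / 2);
        }
        p = q;
    }
    /* counter == 0: p is the product; counter > 0: the product overflows;
       counter < 0: it underflows. */
    return p;
}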

Why does IEEE 754 specify a flag for each of these kinds of exception? Without flags, detecting rare creations of ∞ and NaN before they disappear requires programmed tests and branches that, besides duplicating tests already performed by the hardware, slow down the program and impel a programmer to make decisions prematurely in many cases. With flags, fewer tests and branches are necessary because they can be postponed to propitious points in the program. They almost never have to appear in innermost loops.

Using flags is the only way to distinguish 1/0, a genuine infinity, from an infinity created by overflow.

We show two examples of using flags [4].

Example 12 To compute x^n where n is an integer, we write the following function PositivePower(x, n) for positive n.

PositivePower(x, n) {
    while (n is even) {
        x = x * x;
        n = n / 2;
    }
    u = x;
    while (true) {
        n = n / 2;
        if (n == 0) return u;
        x = x * x;
        if (n is odd) u = u * x;
    }
}

When n < 0, computing PositivePower(1/x, −n) is not accurate. Instead, 1/PositivePower(x, −n) should be used. The problem is that when x^−n underflows, either the underflow trap handler is called or the underflow flag is set, and either is incorrect, because when x^−n underflows, x^n may overflow or may be in range. The solution is as follows. We first disable the underflow and overflow traps and save the underflow and overflow flag status. Then we compute 1/PositivePower(x, −n). If neither the underflow nor the overflow flag is set, we restore those flags and re-enable the traps; otherwise, we restore those flags and compute PositivePower(1/x, −n), which causes the correct exceptions to occur.
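A sketch of this strategy using the C99 fenv.h flag interface follows; traps are assumed disabled (the C default), and the names power_pos and power_int are our own illustration rather than anything from the notes.

#include <fenv.h>

/* PositivePower from Example 12, for n >= 1. */
static double power_pos(double x, int n) {
    while (n % 2 == 0) { x = x * x; n = n / 2; }
    double u = x;
    for (;;) {
        n = n / 2;
        if (n == 0) return u;
        x = x * x;
        if (n % 2 != 0) u = u * x;
    }
}

/* Sketch of the flag-based strategy for x^n with n < 0. */
double power_int(double x, int n) {
    if (n == 0) return 1.0;
    if (n > 0)  return power_pos(x, n);
    fexcept_t saved;
    fegetexceptflag(&saved, FE_OVERFLOW | FE_UNDERFLOW);   /* save flag status   */
    feclearexcept(FE_OVERFLOW | FE_UNDERFLOW);
    double r = 1.0 / power_pos(x, -n);
    int bad = fetestexcept(FE_OVERFLOW | FE_UNDERFLOW);
    fesetexceptflag(&saved, FE_OVERFLOW | FE_UNDERFLOW);   /* restore flag status */
    if (!bad) return r;
    return power_pos(1.0 / x, -n);   /* raises the correct exceptions */
}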

Example 13 We compute

arccos x = 2 arctan √( (1 − x)/(1 + x) ).

 

When x = −1, if arctan(∞) returns π/2, then we get the correct result arccos(−1) = π. The problem, however, is that when x = −1, (1 − x)/(1 + x) causes a divide-by-zero and raises the divide-by-zero flag. The solution is simple: we save the divide-by-zero flag before the computation and restore it after the computation.
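A minimal C sketch of this save/restore pattern (the function name is ours; atan(∞) is assumed to return π/2, as IEEE-conforming libraries do):

#include <fenv.h>
#include <math.h>

/* Sketch of Example 13: hide the spurious divide-by-zero raised at x = -1
   by saving and restoring the flag around the computation. */
double arccos_via_arctan(double x) {
    fexcept_t saved;
    fegetexceptflag(&saved, FE_DIVBYZERO);    /* save the flag    */
    double r = 2.0 * atan(sqrt((1.0 - x) / (1.0 + x)));
    fesetexceptflag(&saved, FE_DIVBYZERO);    /* restore the flag */
    return r;
}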

4 Error Measurements

Suppose xˆ is an approximation of x, for example, xˆ is the computed value and x is the exact answer. How do we measure the error in xˆ?

The absolute error is defined as Eabs(ˆx) = |x − xˆ|. Obviously, the size of absolute error depends on the size of x. Thus the relative error defined as

Erel(xˆ) = |x − xˆ|/|x| is independent of the size of x. From this definition, we can see that if xˆ = x(1 + ρ) then |ρ| = Erel(xˆ) and |ρ| · |x| = Eabs(xˆ). Relative error can be used to determine the number of correct significant digits. For example, if xˆ = 1.0049 is an approximation of x = 1.0000, then Erel = 4.9 × 10^−3, which indicates that xˆ agrees with x to three but not four digits.

Unit roundoff, usually denoted by u, is the most useful quantity associated with a floating-point number system and is ubiquitous in the world of rounding error analysis. The unit roundoff is given by

u = (1/2) β^(1−t),

recalling that t is the machine precision in terms of the number of digits. Suppose f l(x + y) is the floating-point addition of x and y, then the IEEE standard requires that f l(x + y) is the same as x + y rounded to the nearest floating-point number. In other words,

f l(x + y) = (x + y)(1 + δ) |δ| ≤ u.

Another error measurement useful in measuring the error in a computed result is the unit in the last place (ulp). The name itself explains its meaning. For example, if xˆ = d_0.d_1 · · · d_{t−1} × β^e is a computed result, then one ulp of xˆ is β^(e−t+1).
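For IEEE double precision (β = 2, t = 53) both quantities are easy to examine; a minimal sketch, assuming only that DBL_EPSILON = β^(1−t) as defined in float.h and that nextafter gives the adjacent floating-point number:

#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    double u = DBL_EPSILON / 2.0;              /* unit roundoff, about 1.1e-16 */
    double x = 1.0049;
    double ulp = nextafter(x, INFINITY) - x;   /* one ulp of x, here 2^(-52) */
    printf("u = %.3e  ulp(%g) = %.3e\n", u, x, ulp);
    return 0;
}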

5 Sources of Errors

Due to finite precision arithmetic, a computed result must be rounded to fit the storage format. Consequently, rounding errors are unavoidable. The IEEE standard requires that for an arithmetic operation op = +, −, ×, /, we have

f l(x op y) = (x op y)(1 + δ) |δ| ≤ u.

When an infinite series is approximated by a finite sum, truncation error is introduced. For example, if we use

 

 

1 + x + x^2/2! + x^3/3! + · · · + x^n/n!

to approximate

e^x = 1 + x + x^2/2! + x^3/3! + · · · + x^n/n! + · · · ,

then the truncation error is

 

 

 

 

 

 

 

 

 

 

 

 

x^(n+1)/(n+1)! + x^(n+2)/(n+2)! + · · · .
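A small experiment makes the truncation error visible. The sketch below (our own illustration; the name taylor_exp is ours) compares the degree-n partial sum at x = 1 with the library exp, whose own error is negligible by comparison:

#include <math.h>
#include <stdio.h>

/* Degree-n Taylor polynomial of e^x, summed term by term. */
static double taylor_exp(double x, int n) {
    double term = 1.0, sum = 1.0;
    for (int i = 1; i <= n; i++) {
        term *= x / i;      /* term is now x^i / i! */
        sum += term;
    }
    return sum;
}

int main(void) {
    double x = 1.0;
    for (int n = 2; n <= 10; n += 2)
        printf("n = %2d  |e^x - partial sum| = %.3e\n",
               n, fabs(exp(x) - taylor_exp(x, n)));
    return 0;
}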


 

 

 

 

 

 

h         yh(1)                 error
10^−1     2.85884195487388      1.40560126415e−1
10^−2     2.73191865578708      1.36368273280e−2
10^−3     2.71964142253278      1.35959407373e−3
10^−4     2.71841774707848      1.35918619434e−4
10^−5     2.71829541991231      1.35914532646e−5
10^−6     2.71828318698653      1.35852748429e−6
10^−7     2.71828196396484      1.35505794585e−7
10^−8     2.71828177744737      −5.10116753283e−8
10^−9     2.71828159981169      −2.28647355716e−7
10^−10    2.71827893527643      −2.89318261570e−6
10^−11    2.71827005349223      −1.17749668154e−5
10^−12    2.71827005349223      −1.17749668154e−5
10^−13    2.71338507218388      −4.89675627517e−3
10^−14    2.66453525910038      −5.37465693587e−2
10^−15    2.66453525910038      −5.37465693587e−2

 

Table 6: Values of yh(1) and errors using various sizes of h

When a continuous problem is approximated by a discrete one, discretization error is introduced. For example, from the expansion

f(x + h) = f(x) + h f′(x) + (h^2/2!) f″(ξ), for some ξ ∈ [x, x + h],

we can use the following approximation:

yh(x) = (f(x + h) − f(x))/h ≈ f′(x).

The discretization error is Edis = (h/2) |f″(ξ)|.

Note that both truncation error and discretization error have nothing to do with computation. If the arithmetic is perfect (no rounding errors), the discretization error decreases as h decreases. In practice, however, rounding errors are unavoidable. Consider the above example and let f(x) = e^x. We computed yh(1) on a SUN Sparc V in MATLAB 5.20. Table 6 shows that as h decreases the error first decreases and then increases. This is because of the combination of discretization error and rounding errors. In this example, the discretization error is

Edis = (h/2) |f″(ξ)| ≤ (h/2) e^(1+h) ≈ (h/2) e for small h.
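The experiment is easy to repeat. A C sketch follows; the original table was produced in MATLAB on different hardware, so the digits will not agree with Table 6 exactly, but the same pattern of decreasing and then increasing error appears:

#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 1.0, exact = exp(1.0);          /* f'(1) = e */
    for (int k = 1; k <= 15; k++) {
        double h = pow(10.0, -k);
        double yh = (exp(x + h) - exp(x)) / h; /* forward difference yh(1) */
        printf("h = 1e-%02d  yh(1) = %.14f  error = % .11e\n",
               k, yh, yh - exact);
    }
    return 0;
}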


Now, we consider the rounding errors. Let the computed yh(x) be

 

 

yˆh(x) = fl( (e^(x+h) − e^x)/h )                                                (1)
       = ( (e^((x+h)(1+δ0)) (1 + δ1) − e^x (1 + δ2)) (1 + δ3) / h ) (1 + δ4)    (2)
       ≈ ( e^(x+h) (1 + δ0 + δ1 + δ3 + δ4) − e^x (1 + δ2 + δ3 + δ4) ) / h,      (3)

 

 

 

for |δi| ≤ u (i = 0, 1, 2, 3, 4). In the above derivation, we assume that the δi are small, so that we may ignore terms like δiδj and of higher order. We also assume that e^x is computed accurately, i.e., fl(e^x) = e^x(1 + δ) where |δ| ≤ u. Thus

we have the rounding error

Eround = |yh(x) − yˆh(x)| ≈ |ξ1 e^(x+h) − ξ2 e^x| / h,   for |ξ1| ≤ 4u and |ξ2| ≤ 3u.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

When x = 1, we have

Eround ≈ (7u/h) e.

 

So the rounding error increases as h decreases. Combining both errors, we get the total error:

Etotal = Edis + Eround ≈ (h/2 + 7u/h) e.

Figure 3 plots Etotal.

To minimize the total error, we differentiate Etotal with respect to h, set the derivative to zero (1/2 − 7u/h^2 = 0), and get the optimal h:

hopt = √(14u) ≈ √u.

6 Forward and Backward Errors

Suppose a program takes an input x and computes y. We can view the output y as a function of the input x, y = f(x). Denote the computed result by yˆ; then the absolute error |y − yˆ| and the relative error |y − yˆ|/|y| are called forward errors. Alternatively, we can ask: "For what set of data have we solved our problem?" That is, the computed result yˆ is the exact result for the input x + Δx, i.e., yˆ = f(x + Δx). In general, there may be many such Δx, so we are interested in the minimal such Δx and a bound for |Δx|. This bound, possibly divided by |x|, is called the backward error.


 

Figure 3: Total Error (the total error, on a scale of 10^−5, plotted against h for 10^−10 ≤ h ≤ 10^−5)

For example, the IEEE standard requires that

fl(√x) = √x (1 + δ),  |δ| ≤ u.

Then the relative error or the forward error is |δ|, which is bounded by u.

What is the backward error? Let

fl(√x) = √x (1 + δ) = √(x + Δx),

then Δx = 2xδ + xδ^2. Thus, ignoring δ^2, we have the backward error:

|Δx|/|x| ≈ 2|δ| ≤ 2u.
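A quick numerical check in C: the single-precision square root is measured against the double-precision one as a reference, and the backward error is obtained from Δx = (fl(√x))^2 − x. The choice of test value and the variable names are ours.

#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 2.0;
    double exact  = sqrt(x);                    /* reference (double)        */
    double approx = (double)sqrtf((float)x);    /* fl(sqrt(x)) in single     */
    double fwd = fabs(approx - exact) / exact;  /* forward (relative) error  */
    double bwd = fabs(approx * approx - x) / x; /* backward error |dx|/|x|   */
    printf("forward error  = %.2e\n", fwd);
    printf("backward error = %.2e\n", bwd);     /* about twice the forward error */
    return 0;
}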

The process of bounding the backward error is called backward error analysis. The motivation is to interpret rounding errors as perturbations in the data. Consequently, it reduces the problem of estimating the forward error to perturbation theory. We will see its significance in the following sections.

To illustrate forward and backward errors, let us consider the computation of xˆ − yˆ, where xˆ and yˆ may be previously computed results. Assume that x and y are the exact results and xˆ = x(1 + δx) and yˆ = y(1 + δy); then

f l(ˆx − yˆ) = (ˆx − yˆ)(1 + δ) |δ| ≤ u.

It then follows that f l(ˆx − yˆ) = x(1 + δx)(1 + δ) − y(1 + δy )(1 + δ). Ignoring the second order terms δxδ and δy δ and letting δ1 = δx + δ and δ2 = δy + δ, we get

f l(ˆx − yˆ) = x(1 + δ1) − y(1 + δ2).


If |δx| and |δy| are small, then |δ1|, |δ2| ≤ u + max(|δx|, |δy|) are also small, i.e., the backward errors are small. However, the forward error (relative error)

Erel = |fl(xˆ − yˆ) − (x − y)| / |x − y| = |xδ1 − yδ2| / |x − y|.

If δ1 ≠ δ2, i.e., δx ≠ δy, it is possible that Erel is large when |x − y| is small, i.e., when x and y are close to each other. This is called catastrophic cancellation.

If δx = δy, in particular, if both x and y are original data (δx = δy = 0), then Erel = |δ|. This is called benign cancellation. The following example illustrates the difference between the two cancellations.

Example 14 Consider the computation of x^2 − y^2 in our small floating-point number system. Suppose x = 1.11 and y = 1.10 and assume round-to-nearest (ties to even); then fl(x × x) = 1.10 × 2^1 (error 2^−4) and fl(y × y) = 1.00 × 2^1 (error 2^−2). Thus fl(x × x − y × y) = 1.00. The exact result is 1.101 × 2^−1, so the error is 0.0011 and Erel = 0.00111. However, fl((x − y) × (x + y)) = fl(1.00 × 2^−2 × 1.10 × 2^1) = 1.10 × 2^−1. The error in fl(x − y) is 0 and the error in fl(x + y) is 2^−2. Now the total error is 0.0001 and Erel = 0.000101.
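The same contrast appears in IEEE double precision. In the sketch below (our own illustration), fma is used only to build an accurate reference value for x^2 − y^2; x*x − y*y then suffers catastrophic cancellation, while (x − y)*(x + y) subtracts exact data first and stays accurate.

#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 1.0 + 1e-8, y = 1.0;
    /* Split the squares into exact sums hi + lo with fma, giving an
       accurate reference for x*x - y*y. */
    double hix = x * x, lox = fma(x, x, -hix);
    double hiy = y * y, loy = fma(y, y, -hiy);
    double ref = (hix - hiy) + (lox - loy);
    double bad  = x * x - y * y;        /* catastrophic cancellation */
    double good = (x - y) * (x + y);    /* benign cancellation       */
    printf("x*x - y*y   : relative error %.2e\n", fabs((bad  - ref) / ref));
    printf("(x-y)*(x+y) : relative error %.2e\n", fabs((good - ref) / ref));
    return 0;
}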

7 Instability of Certain Algorithms

A method for computing y = f(x) is called backward stable if, for any x, it produces a computed yˆ with a small backward error, that is, yˆ = f(x + Δx) for some small Δx. Usually there exist many such Δx. We are interested in the smallest. If Δx turns out to be large, then the algorithm is unstable.

Example 15 Suppose β = 10 and t = 3. Consider the following system: Ax = b, where

A = [ .001   1.00 ]          b = [  1.00 ]
    [ 1.00   .200 ]   and        [ −3.00 ] .

 

 

 

 

 

 

Applying Gaussian elimination (without pivoting), we get the computed decomposition

LˆUˆ = [ 1.00    0   ] [ .001    1.00  ]
       [ 1000   1.00 ] [   0    −1000  ] .

The computed solution

xˆ = [   0  ]
     [ 1.00 ]

 

 

 

 


is the exact solution of the perturbed system

(A + ΔA)xˆ = b.

Solving for ΔA, we get

ΔA = [ ×    0   ]
     [ ×  −3.2  ] ,

where × denotes an arbitrary entry. The smallest ΔA is

ΔA = [ 0    0   ]
     [ 0  −3.2  ] ,

which is of the same size as A. This means that Gaussian elimination (without pivoting) is unstable. Note that the exact solution is

x = [ −3.20···   ]
    [  1.0032··· ] .

 

8 Sensitivity of Certain Problems

Let us start with the problem of solving a system of linear equations:

Ax = b

where A and b are known data and x is the result. The question is: How sensitive is x to changes in A and/or b? We can assume that the change is only in b, since a change in A can be transformed into a change in b. Let x˘ be the solution of the perturbed system:

Ax˘ = b + Δb.

The change in x (relative error) is ‖x˘ − x‖/‖x‖ and the change in b is ‖Δb‖/‖b‖. We use the ratio of the two errors as the measurement of the sensitivity, called the condition number:

cond = (‖x˘ − x‖/‖x‖) / (‖Δb‖/‖b‖) = (‖A^−1 Δb‖/‖x‖) · (‖Ax‖/‖Δb‖) ≤ ‖A^−1‖ ‖A‖.

So ‖A^−1‖ ‖A‖ is the condition number of the problem of solving a linear system. In general, we can view a problem with data x and result y as a function y = f(x). The result of the perturbed problem is y˘ = f(x + Δx).

The sensitivity is measured by

 

 

 

 

 

 

 

cond = (|y˘ − y|/|y|) / (|Δx|/|x|) = (|f(x + Δx) − f(x)| / |Δx|) · (|x| / |f(x)|) ≈ |f′(x)| |x| / |f(x)|.   (4)
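For a concrete scalar example (our own illustration), take f(x) = log x: since f′(x) = 1/x, formula (4) gives cond ≈ 1/|log x|, which blows up as x approaches 1, so evaluating log x near 1 is an ill-conditioned problem regardless of the algorithm used.

#include <math.h>
#include <stdio.h>

int main(void) {
    double xs[] = { 2.0, 1.1, 1.001, 1.000001 };
    for (int i = 0; i < 4; i++) {
        double x = xs[i];
        /* condition number estimate |f'(x)| |x| / |f(x)| for f = log */
        double cond = fabs(1.0 / x) * fabs(x) / fabs(log(x));
        printf("x = %-10g  cond = %.2e\n", x, cond);
    }
    return 0;
}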

 

Note that the conditioning of a problem is independent of rounding errors and of the algorithm used to solve the problem. The following example is due to Wilkinson (see [8]).

Example 16 Let p(x) = (x − 1)(x − 2) · · · (x − 19)(x − 20) = x^20 − 210x^19 + · · · .

The zeros of p(x) are 1, 2, . . . , 19, 20 and are well separated. With a floating-point number system of β = 2, t = 30, when we enter a typical coefficient into the computer it is necessary to round it to 30 significant base-2 digits. Suppose we make a change in the 30th significant base-2 digit of only one of the twenty coefficients: the coefficient of x^19 is changed from −210 to −210 + 2^−23. Let us see how much effect this small change has on the zeros of the polynomial. Here we list (using β = 2, t = 90) the roots of the equation p(x) + 2^−23 x^19 = 0, correctly rounded to the number of digits shown:

1.00000 0000      10.09526 6145 ± 0.64350 0904i
2.00000 0000      11.79363 3881 ± 1.65232 9728i
3.00000 0000      13.99235 8137 ± 2.51883 0070i
4.00000 0000      16.73073 7466 ± 2.81262 4894i
4.99999 9928      19.50243 9400 ± 1.94033 0347i
6.00000 6944      20.84690 8101
6.99969 7234
8.00726 7603
8.91725 0249

Note that the small change in the coefficient −210 has caused ten of the zeros to become complex, and that two have moved more than 2.81 units off the real axis. This means that the zeros of p(x) are very sensitive to changes in the coefficients. The roots above were computed with very high accuracy, so they are not distorted by rounding errors, nor by any ill effect of the algorithm used to solve the problem; the trouble is the sensitivity of the problem itself.

As discussed before, backward error analysis transforms rounding errors into perturbations of the data. Thus we can establish a relation between forward and backward errors and the conditioning of the problem. Clearly, (4) shows that

Eforward ≤ cond · Ebackward.

This inequality tells us that large forward errors can be caused by either ill-conditioning of the problem or an unstable algorithm, or both. The significance of backward error analysis is that it allows us to determine whether an algorithm is stable (has small backward errors). If we can prove that the algorithm is stable, then we know that large forward errors are due to the ill-conditioning of the problem. On the other hand, if we know the problem is well-conditioned, then large forward errors must be caused by an unstable algorithm.

9 Machine Parameters

As shown in the previous sections, the behavior of numerical software depends on a set of machine parameters such as the base β, the precision t, the minimum exponent emin, and the maximum exponent emax. A program paranoia, originally written by Kahan, investigates a computer's floating-point arithmetic. There are Basic, C, Modula, Pascal, and Fortran versions available [2]. The following is a list of parameters tested by paranoia.

Radix: The base of the computer number system, such as 2, 10, 16.

Precision: The number of significant radix digits.

Closest relative separation: U1 = Radix^(−Precision) = one ulp of numbers a little less than 1.0.

Adequacy of guard digits for multiplication, division, subtraction and addition: In IEEE 754 there is an extra hardware bit, called a guard digit, kept on the right of the fraction during intermediate calculations to help with accurate rounding; see [3] for details.

Is rounding on multiply, divide, and add/subtract correct?

Is the sticky bit used correctly for rounding? The sticky bit allows the computer to see the difference between 0.50···00 and 0.50···01 (in base ten) when rounding; see [3] for details.

Seeking the underflow threshold Ufhold: it is related to 2^emin. Below this value a calculation may suffer a larger relative error than mere roundoff. Also, seeking the smallest strictly positive number E0.
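As an illustration of the kind of probing paranoia performs to discover the radix, here is a Malcolm-style sketch in C (this is our own sketch, not paranoia's actual code; volatile is used so the compiler does not keep intermediate values in extended-precision registers):

#include <stdio.h>

int main(void) {
    volatile double a = 1.0, b = 1.0, t;
    /* Grow a by doubling until adding 1 no longer registers. */
    do { a += a; t = a + 1.0; } while (t - a == 1.0);
    /* Grow b by doubling until a + b differs from a; the difference is the radix. */
    do { b += b; t = a + b;   } while (t - a == 0.0);
    printf("radix = %g\n", t - a);
    return 0;
}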