Approximate Numbers

Steven Pemberton (19870211 updated 20020218).

There are two kinds of numbers in ABC: exact and approximate. Exact numbers are held internally as a fraction of two (unbounded-length) integers, reduced to their lowest common multiple. You can retuen the two numerator and denominator of these fractions with the operators */ and /*. You can convert exact numbers to approximate ones with the operator ~ and vice versa with round, floor and ceiling.

Exact numbers are the result of computations where the result can be computed exactly: addition, subtraction, multiplication, division, and some exponentiations. However, some operations, such as taking the square root, can't deliver an exact result in general, and deliver an approximate number instead.

Of course, some exact numbers (like 4), have exact square roots, but if you want exact results in these cases you have to write your own function to calculate them, something like:

HOW TO RETURN xroot x:
   IF exact x:
      PUT root */x, root /*x IN a, b
      PUT (round a)/(round b) IN r
      IF r*r = x: RETURN r
   RETURN root x

For approximate numbers, the ABC implementations use the hardware representation of floating-point, and this is the only place where ABC programs can deliver different results on different machines (apart from the workings of the random generator): some machines have greater floating-point accuracy than others.

Machines hold floating-point numbers as an encoding of a pair of numbers (f, e), where f, the fraction, has a fixed number of digits to some base b. Such a pair then represents the number f × b e.

To find out how much accuracy your machine has, you can use the following command (adapted from a routine by Malcolm (reference 1)) to find the base and number of significant digits used on your machine:

HOW TO PRINT ACCURACY:
    PRINT BASE
    PRINT FRACTION
PRINT BASE:
    PUT ~2, ~2, ~0 IN a, b, base
    WHILE a+1-a-1 = ~0: PUT a+a IN a
    WHILE base = ~0: PUT a+b-a, b+b IN base, b
    WRITE "Base =", base /
PRINT FRACTION:
    PUT 1, base IN sig, b
    WHILE b+1-b-1 = ~0: PUT sig+1, b*base IN sig, b
    WRITE "Significant digits =", sig /
    WRITE "This is about `decimal` decimal digits" /
decimal:
    RETURN 1 round (sig/(base log 10))

It works by first looking for a whole number that doesn't produce a different number when you add 1 to it. For instance, suppose the representation we are examining has 4 digits to base 10. Then you can add 1 to all 4 digit numbers, but not to 5 digit numbers, so it will try in turn 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, and 8192. Finally it will try 16384, which is only representable as 1638 × 10 1, which is 16380.

The routine then searches for a number that can be added to it which does produce a different number. Exactly which number it finds depends on whether the machine rounds the last digit of a calculation, or just chops excess digits off. So it tries to add 2, 4, and 8. If the machine rounds then 16380 + 8 = 16388, which rounded to 4 places is 16390. However, if the machine chops, then 16388 chopped is 16380, so then it tries to add 16, which gives 16396, and chopped this gives 16390, so in either case we get the same result from the two types of machine.

Now subtracting these two values, 16390 - 16380, gives the base used, in this case 10.

Having found the base, the routine then goes on to find the number of significant digits by repeatedly multiplying by the base until again 1 cannot be added. In our case, it will try 1, 10, 100, 1000, and finally 10000 to which 1 cannot be added. From the number of times that the multiplication is done, the number of significant digits can be deduced.

Here is an example of its output, for a VAX:

>>> PRINT ACCURACY
Base = 2
Significant digits = 56
This is about 16.9 decimal digits

The Consequences of Approximate Numbers

The advantage of using approximate numbers in ABC is that for large numbers they are generally faster to calculate than the equivalent exact numbers. The big disadvantage is of course that your results are only approximately correct. But how approximate is that?

To give an example, consider the value 1/((sqrt x) - (sqrt (x-1))). This is the same as (sqrt x) + (sqrt (x-1)) (you just multiply all through by ((sqrt x) + (sqrt (x-1))) / ((sqrt x) + (sqrt (x-1)))).

However, if you calculate these using approximate numbers, you will get different results. For instance:

HOW TO COMPARE VALUES FOR x:
    PUT root x, root (x-1) IN r, r1
    PUT 1/(r-r1), r+r1 IN u, v
    WRITE "1/((root x)-(root (x-1)))=", u /
    WRITE "(root x)+(root (x-1))=    ", v /
    WRITE "Difference =", u-v /
>>> COMPARE VALUES FOR 10**12
1/((root x)-(root (x-1)))= 2000042.97959778
(root x)+(root (x-1))=     1999999.9999995
Difference = 42.9795982764044
>>> COMPARE VALUES FOR 10**15
1/((root x)-(root (x-1)))= 63161283.7647059
(root x)+(root (x-1))=     63245553.2033676
Difference = -84269.4386616889

This is on a machine with nearly 17 decimal digits of accuracy, and yet the two results differ already in the third digit in the second example! The explanation for this is that if you subtract two very close numbers, you lose a lot of accuracy. Try doing it for x = 256 with 4 digits of accuracy: root 256 = 16.00, and root 255 = 15.97. Subtracting these two gives 0.03, leaving only one digit of accuracy where before you had four. The inverse of this is 33.33, while the sum of the two is 31.97, a very different value (and as close as you can get to the real value with 4 digits).

So the question arises, in a given program, how can you tell how close a result is to the real answer? The rest of this article presents a package to help answer this question.

Range Arithmetic

This package simulates approximate arithmetic, and keeps for each arithmetic value an upper and lower bound on its real value. For instance, using 4 digits, pi lies between 3.141 and 3.142.

Then for each operation, it calculates the upper and lower bounds of the result. For instance, when adding two values (l , u) and (l' , u'), the result is (rounded.down(l + l'), rounded.up(u + u')). For pi + pi, this would give (6.282, 6.284). In ABC:

HOW TO RETURN a plus b:
    PUT a, b IN (la, ua), (lb, ub)
    RETURN down (la + lb), up (ua + ub)

or shorter yet:

HOW TO RETURN (la, ua) plus (lb, ub)
    RETURN down (la + lb), up (ua + ub)

Since we want to simulate approximate arithmetic using a fixed number of digits of accuracy, the functions up and down take a number and remove excess digits, rounding either up or down. The contents of the shared location accuracy specify how many digits of accuracy we are interested in:

HOW TO RETURN up x:
    SHARE accuracy
    IF x=0: RETURN 0
    PUT 0, 10**accuracy IN shift, limit
    WHILE abs x < limit: PUT x*10, shift - 1 IN x, shift
    WHILE abs x > limit: PUT x/10, shift + 1 IN x, shift
    RETURN (ceiling x) * 10**shift

This takes a number like 3.14159265, and multiplies or divides it so that its integral part has accuracy digits, also calculating the number of digits it has been shifted. So for accuracy = 4, x ends up as 3141.59265, with shift = -3. It then applies ceiling, which gives 3142, and multiplies this by 10 -3 to shift it back to give 3.142.

The function to round down is the same, except it uses floor instead of ceiling.

>>> PUT 4 IN accuracy
>>> WRITE down pi, up pi
3.141 3.142
>>> WRITE down 123, up 123
123 123
>>> WRITE down 12345, up 12345
12340 12350

Because exact arithmetic is being used, any value for accuracy can be used. Calculating to 100 places will only take more time.

Subtraction is similar to addition:

HOW TO RETURN (la, ua) minus (lb, ub):
    RETURN down (la-ub), up (ua-lb)

Multiplication is slightly harder, mainly because of problems caused when multiplying values whose upper and lower bounds have different signs. Here we take the easy way out, and calculate all possible values and identify the maximum and minimum:

HOW TO RETURN (la, ua) times (lb, ub):
    RETURN minmax {la*lb; la*ub; ua*lb; ua*ub}

where minmax returns the minimum and maximum values from a list:

HOW TO RETURN minmax list: RETURN down min list, up max list

We can do the same for division, but additionally we have to check for division by zero:

HOW TO RETURN (la, ua) over (lb, ub):
    CHECK NOT (lb <= 0 <= ub) \Division by zero
    RETURN minmax {la/lb; la/ub; ua/lb; ua/ub}

To use the package, a method is needed for converting an ABC number like pi into a range:

HOW TO RETURN range v: RETURN down v, up v

Just to show it works:

>>> WRITE range pi
3.141 3.142
>>> WRITE range 123
123 123
>>> WRITE range 12345
12340 12350

Also useful is a way of printing ranges more visibly as ranges:

HOW TO RETURN bounds (l, u): RETURN "[`l`:`u`] "
>>> WRITE bounds range pi
[3.141:3.142]
>>> WRITE bounds range 123
[123:123]

Now finally to define a function for square root. This uses the Newton-Raphson method, which repeatedly calculates for argument a, r = half ( { a ÷ r } + r ), until two consecutive r's are sufficiently close:

HOW TO RETURN sq.root (l, u):
    RETURN down sq.root' l, up sq.root' u
HOW TO RETURN sq.root' f:
    IF f=0: RETURN 0
    PUT f, (f+1)/2 IN r, r'
    WHILE converging: PUT r', (f/r' + r') / 2 IN r, r'
    RETURN r
converging:
    REPORT down r <> down r'

The definition of closeness is whether the two approximations differ only in digits outside the accuracy we are interested in.

>>> WRITE bounds sq.root range 256
[16:16.01]
>>> WRITE bounds sq.root range 2
[1.414:1.415]

(Actually sq.root' can be speeded up by adding at the beginning

SHARE accuracy
PUT accuracy+1 IN accuracy

and replacing (f/r' + r') / 2 by up((f/r' + r') / 2). In that way, intermediate calculations don't get calculated with too great an accuracy.)

Now, finally back to our original problem:

>>> PUT range 1 IN one
>>> PUT sq.root range 256 IN r256
>>> PUT sq.root range 255 IN r255
>>> WRITE bounds (r256 plus r255)
[31.96:31.98]
>>> WRITE bounds (one over (r256 minus r255))
[20:33.34]

This makes it quite clear which of the two results is more accurate.

This last result is both shocking and sobering. But if you work it out, root 256 is [16:16.01] and root 255 is [15.96:15.97]. The difference of these two is [0.03:0.05], so obviously the inverse is [20:33.34].

Even if we use range 16 instead of r256, we still get a wide range:

>>> WRITE bounds (one over((range 16) minus r255))
[25:33.34]

For more information about range arithmetic, see reference 2.

References

1. M. A. Malcolm, Algorithms to reveal properties of floating-point arithmetic, Communications of the ACM, Volume 15, Number 11, November 1972.

2. D. E. Knuth, The Art of Computer Programming, Volume 2, Seminumerical Algorithms, Addison-Wesley, Reading, Mass., 1973.