FIRST Robotic Competition - Integer Overflow

Variable Size

One of the features of computers is that numbers are generally stored in a memory space of fixed size. For instance an int may be in 16 bits on some architectures. An attempt to compute a value that can only be represented in 17 or more bits will overflow the variable, resulting in rather odd behavior.

This is complicated by the fact that most variables are signed, which is to say that the variable can be either positive or negative. In twos compliment arithmetic, by far the most common way to store signed int, the highest order bit is the sign bit and is not available to indicate magnitude. However, arithmetic which overflows into the sign bit may cause the result to be negative. Let's look at this in more detail, but using 8 bit variables for convenience. Also for convenience, we will only consider unsigned numbers.

Overflow Example

Hex 0f
Binary 00001111
Decimal 15

Now add one to that number.

Hex 10
Binary 00010000
Decimal 16

So far so good. But what if we deal with larger numbers.

Hex f0
Binary 11110000
Decimal 240

Now add 16 to that number, truncating after the 8th bit.

Hex not 100, but 00
Binary Not 10000000, but 00000000
Decimal Not 256, but 0

Signed Numbers

Recall that the sign bit is stored in the highest order bit. That means that in the example above, the decimal value of f0 would not be 240, but a negative number.

When you are debugging your software you will occasionally encounter cases where simple math results in numbers which are wildly wrong. If you were to graph the results of s linear sequence of such calculations you would see major discontinuities in the results. When you see results like this you should consider that you may be dealing with integer overflow.

The Antidote

When you discover integer overflow in your code there are several ways to resolve it. Any of these may be sufficient. But each involves tradeoffs.

Order of Calculation

Consider the calculation of

a=b*c/d;
This calculation can overflow in the calculation of b*c, even tho d would have brought the result back into range. We can make use of this by forcing the calculation to be
a=(b/d)*c;
Now the intermediate result, as well as the final result, will not overflow. There are two problems with this solution:

Having seen the downsides to this approach, you may be tempted to use the other solutions. But bear in mind that they also have downsides.

Casting or Coercion

There are several integer data types in C. In order of (sometimes) increasing size these are:

Each of these data types are guaranteed to be at least as big as the following. They are also guaranteed to be at least as big as the previous type in the list above. (So long is at least as big as int, but is typically twice as big.)
Type Type Type Minimum Note
char unsigned char signed char 8 bits The char type, when unqualified by signed or unsigned may or may not include a sign bit. The C compiler is free to choose.
short int unsigned short int signed short int 16 bits
int unsigned int signed int 16 bits int is the natural integer type of the CPU. If the CPU naturally works in 32 bits, then int will be 32 bits.
long int unsigned long int signed long int 32 bits
Each C compiler is free to make these types any size, so long as they meet the constraints listed above. So for instance on a Pentium you may find that your int and your long are both 32 bits wide. The file "limits.h" specifies the size of each integer type, including their maximum and minimum values.

OK, so how do we make use of this information? Assuming your variables are all int, the following modification may fix the problem.

a=(long)b*(long)c/(long)d;
What this does is to promote each of the variables to a long before performing the calculation. When the result is then stored in a, it is truncated back to int. But by then hopefully the result will fit into an int.

Disadvantages to this approach.

Change Your Declarations

This approach is similar to the previous one, but instead of promoting the variables to the larger size in the middle of the calculation, it is done in the declaration. So we would change from

      int a, b, c, d;
    
to
      long a, b, c, d;
    
This also has disadvantages.

Summary

Integer overflow is the ugly underbelly of the C language. If you are not experienced with it the results can be maddeningly illogical. Resolving the issue involves tradeoffs. There is no one-size-fits-all solution.

Now that you are aware of this problem you may be asking "why did the designers of C allow such ugliness into the language"? In fact they had good reasons. They could have specified that the compiler would detect all such problems and abort the program. But doing that would add overhead to all programs, making them larger and slower. Instead the C language is designed as a language that allows you to get excellent performance out of the computer. The downside is that you, as the programmer, must ensure that your code doesn't run into problems of this sort.


Last modified 11 Dec 2006
http://brown.armoredpenguin.com/~abrown/contact.html
http://brown.armoredpenguin.com/~abrown/first/training/ComputerBasics/index.html