Posted On: 2022-08-08
Recently, I've been building a simulated system that (loosely) mimics some real-world phenomena - and I've found myself defying convention by using integer arithmetic rather than floating point. This choice was motivated in large part by the need to accumulate many tiny changes over a long time period. With integers, it is possible (if a bit tricky) - but with floating point, it's death by a million paper cuts.
Realistic simulations often use floating point arithmetic - it gives approximately the right value at blazing fast speeds, and is designed to fit most any model or system, regardless of scale*. Although these approximations sometimes cause small drifts in the system (ie. the occasional loss of a millionth of a point), the difference is generally not observably large. Unfortunately, when a system requires precision (ie. conserving a finite resource spread across the whole system), these small drifts can cause the simulation to break in unexpected ways.
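As a quick illustration of that drift (a minimal sketch, not taken from any real simulation): repeatedly adding a value that has no exact binary representation slowly walks away from the true total.

```csharp
using System;

// 0.1 has no exact binary representation, so every addition rounds slightly -
// and over a long accumulation those roundings add up.
float total = 0f;
for (int i = 0; i < 1_000_000; i++)
{
    total += 0.1f;
}

Console.WriteLine(total); // noticeably off from the expected 100000
```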
If speed (and storage space) is not an issue, there are some (third-party) data types that can be a drop-in replacement for floating point numbers - such as rational numbers. For my use case, however, speed and storage are both factors (the simulation runs on a client machine), so I opted to use integers. Fast and deterministic, integers are a great fit for many use cases. Integers are stable for addition, subtraction, and multiplication, giving a result which is both perfectly accurate* and itself an integer. When dividing integers, however, things become a bit messier: if the divisor evenly divides the dividend, then you'll get an exact integer result - otherwise, you're out of luck.
When the result of division is not an integer, C# (and most C-like languages) will pretend it's an integer by truncating the fractional part. Thus, C# will happily tell you that 5/2 is 2, and that (5/2) * 2 is not 5.
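A trivial sketch of that truncation, just to make the point concrete:

```csharp
using System;

Console.WriteLine(5 / 2);       // 2 - the fractional part is thrown away
Console.WriteLine((5 / 2) * 2); // 4, not 5
Console.WriteLine(5 % 2);       // 1 - the remainder the division dropped
```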
This quirk is, in fact, quite handy for system design. The previous point, articulated more broadly: (x/n) * n will give you the largest number that is less than or equal to x and has n as a factor*. Knowing that number can be quite handy: after all, the sum of numbers that share a common factor will also share that same factor - making this a great tool for aggregating numbers (ie. for averages, diffusion, etc.)
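Here's a minimal sketch of that idea (the cell/neighbour names are just illustrative): when diffusing a value to its neighbours, only the evenly divisible portion moves, so summing the shares back up reproduces it exactly.

```csharp
using System;

int cell = 1003;       // amount stored in one cell
int neighbours = 4;

int share = cell / neighbours;    // 250 - each neighbour's portion
int moved = share * neighbours;   // 1000 - the largest multiple of 4 that is <= 1003
int left  = cell - moved;         // 3 - stays behind in the cell

Console.WriteLine($"each neighbour receives {share}, the cell keeps {left}");
Console.WriteLine(moved + left == cell);   // True - nothing leaked
```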
A downside to integers is that the numbers that are most convenient to reason about are often too small to use in code. If your numbers are too small, integer division quickly collapses down to 0 (ie. if you're working with 2, dividing it into 8 equal parts will give you a whole lot of 0s). To work around this, it's common to multiply by some fixed amount (ie. 10000) - and use that as the starting baseline for the system. As a quick example, consider the measurements of a room. Although it might make the most sense to measure a room in meters, you won't be able to represent (or calculate using) many objects in the room - certainly nothing smaller than a meter. If, instead, you measure the room using millimeters, then most objects in the room will be represented by a reasonably large number - making the whole system much better suited to integer arithmetic.
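A sketch of that baseline trick (the constant and the measurements here are purely for illustration):

```csharp
using System;

const int UnitsPerMeter = 1000;            // work in millimeters rather than meters

int roomWidth  = 6 * UnitsPerMeter;        // a 6 m room -> 6000
int bookWidth  = 35;                       // a 35 mm book lives on the same scale
int shelfWidth = roomWidth / 8;            // 750 - in meters, 6 / 8 would have collapsed to 0

Console.WriteLine($"{shelfWidth} mm per shelf segment (book is {bookWidth} mm)");
```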
There is, of course, the opposite problem as well: if integers are too large, they risk overflowing. While that's fairly easy to mitigate at the baseline level (ie. don't multiply sizes by a billion), anticipating overflow becomes trickier when calculations are involved. Consider again measuring using meters - but this time measure the entire world. Individual objects' dimensions can be measured just fine: even the width of the Pacific Ocean (less than 20 million meters) would fit comfortably under the integer limit (roughly 2 billion). If, however, you have to measure area (square meters), suddenly things become problematic: 50000 squared is 2.5 billion - already past the limit - meaning the area of anything close to 50k wide is going to be a problem*.
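A sketch of that overflow (the width is illustrative; int.MaxValue is 2,147,483,647):

```csharp
using System;

int regionWidth = 50_000;                      // fits comfortably in an int

// 50,000 * 50,000 = 2,500,000,000, which is past int.MaxValue - by default the
// multiplication silently wraps around into a meaningless negative number.
int wrapped = regionWidth * regionWidth;
long area   = (long)regionWidth * regionWidth; // widen before multiplying

Console.WriteLine(wrapped);   // -1794967296
Console.WriteLine(area);      // 2500000000
```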
One final note about using integers: in many cases, the precision afforded by the integer won't be high enough to capture the exact value of the phenomenon being simulated. This is, of course, an issue for floating point as well - albeit under different circumstances. The advantage of integers, however, is that their behavior is deterministic - allowing the developer to intentionally design how those inaccuracies are captured, and where they ultimately go. For me, this is essential: whether it's in one place or another, inaccuracies must be accounted for within the simulation itself - rather than leaking numbers into the void, one hair's width at a time.
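As a final sketch of what that intentional accounting can look like (the setup is illustrative, not my actual simulation): split a resource, then route the rounding error somewhere deliberate so the system total never changes.

```csharp
using System;

int pool = 100;
int[] recipients = new int[3];

int share = pool / recipients.Length;                 // 33 each
for (int i = 0; i < recipients.Length; i++)
{
    recipients[i] = share;
}
recipients[^1] += pool - share * recipients.Length;   // the last one absorbs the extra 1

int total = 0;
foreach (int r in recipients) total += r;
Console.WriteLine(total == pool);   // True - the resource is conserved exactly
```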