Nobody Cares About a Few Million Nanoseconds

A Clever Programming Trick…

If you need to swap the values of two variables, this usually requires a third temporary variable (that is, if you’re not using a language like Python that supports the a, b = b, a syntax.) It looks something like this:

temp = a;
a = b;
b = temp;

But if these are integer variables, there’s a nifty trick to save yourself a little bit of memory. You can use arithmetic instead of a temporary variable:

a = a + b;
b = a - b;
a = a - b;

If the integers on your platform are 32 bits, this new swap will save you four bytes of memory.

NOBODY CARES ABOUT FOUR BYTES OF MEMORY.

This is a mistake a lot of new programmers make. The coder comes up with some clever trick that saves a few bytes of memory or shaves a few nanoseconds off of a function. You must learn that these clever tricks, called “micro-optimizations”, aren’t really worth it. Back when computers only had 64KB of memory, they made sense. More often than not these days they just make the code less readable and harder to debug. Memory is cheap, and humans won’t notice a few more milliseconds of waiting time (unless the delay is for a frequent and visible event.)

Think about it. In the time that it takes for you to read this sentence, several billion nanoseconds have already passed. Billions. A clever trick that saves a few nanoseconds at the expense of code readability is not going to be noticed. NOBODY CARES ABOUT A FEW MILLION NANOSECONDS.

“Premature optimization is the root of all evil.”

This notion is encapsulated in Don Knuth’s venerated saying, “Premature optimization is the root of all evil.” In other words, you should concentrate on writing your software first, then on making it work correctly, and only later (if you have to) on making it fast. Trying to optimize the code before then is a fool’s errand. Most likely, the software you write today will have been replaced or tossed out or forgotten three years from now. (Unless you are writing Oregon Trail, in which case old people will keep writing emulators to play it out of some silly sense of nostalgia.)

Some examples of clever tricks you should never do:

  • The integer variable swap trick shown above.
  • Reusing variables for different purposes. Separate variable names for separate values makes debugging easier.
  • Using a float instead of a double. (You save a little memory, but the lower precision can result in notoriously tricky rounding errors.)
  • Pre-computing math for the computer, such as multiplying a value by 525600 (the number of minutes in a year) instead of multiplying it by 60 * 24 * 365 (which makes the intent obvious). The compiler or interpreter will most likely fold these constants automatically anyway; see the sketch after this list.
  • Combining functions together into fewer functions to reduce the overhead of function calls.
  • “Loop unrolling/unwinding”, that is, copying/pasting code that would normally be in a loop so that you do not have the overhead of looping for each iteration. (Don’t even think about using Duff’s device.)
  • Short variable and function names. This doesn’t even make the compiled program run better; it just makes the source code slightly smaller. That the variable “mttl” means “monthly total” will be completely impossible to know, even for the programmer who wrote it, once a few weeks have elapsed. Single-letter variable names should never be used, unless it’s something like x and y for coordinates or i and j for a loop’s counter. Don’t use the variable name “n” to store a number: have the variable name describe what the number is used for.
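
As an illustration of the constant-folding point above, here is a minimal sketch (assuming CPython; the function name minutes_to_years is made up for this example) showing that the readable expression is turned into the single constant 525600 before the code ever runs:

import dis

def minutes_to_years(minutes):
    # Written for the human reader; CPython folds 60 * 24 * 365
    # into the single constant 525600 when compiling this function.
    return minutes / (60 * 24 * 365)

# The disassembly shows LOAD_CONST 525600, so nothing was lost
# by writing the math out in readable form.
dis.dis(minutes_to_years)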

Don’t Guess, Use a Profiler

When you do begin the process of optimizing your program, don’t just look through your code and guess where the slow and bloated parts are. Run your code under a profiler, which will scientifically tell you how much time (and, with some tools, how much memory) is spent in each function of your program. (For Python, the cProfile module does a good job. See the Instant User’s Manual to learn how to profile your Python code.)
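
As a minimal sketch of what that looks like (the function slow_report below is a made-up stand-in for your own code, not anything from this article):

import cProfile

def slow_report():
    # A hypothetical workload standing in for your real program.
    total = 0
    for n in range(1000000):
        total += n * n
    return total

# Prints how many times each function was called and how much time was
# spent in it, sorted by cumulative time so the hot spots stand out.
cProfile.run("slow_report()", sort="cumulative")

You can also profile an entire script from the command line with python -m cProfile -s cumulative yourscript.py.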

Unless the software is being run on a computer that is going into space, a nuclear reactor, or someone’s chest cavity, these micro-optimizations don’t matter 97% of the time. Even when programming for smartphones (which have limited system resources), you should focus on changes that yield order-of-magnitude improvements, not micro-optimizations. These usually involve caching data or using entirely different algorithms, not tiny clever tricks that make the code inscrutable.
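
To give one hedged example of the caching idea, here is a sketch using Python’s functools.lru_cache; the deliberately naive recursive fib function is an illustration, not code from this article:

from functools import lru_cache

@lru_cache(maxsize=None)  # remember results so each value is computed only once
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

# Without the cache this makes an exponential number of recursive calls;
# with it, fib(80) returns instantly. That is an orders-of-magnitude win,
# and the code stays perfectly readable.
print(fib(80))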

Code that is straightforward to read is easy to understand. Code that is easy to understand is less likely to have bugs. Code that is easy to understand is easy to extend with new features. And it’s bug-free programs with cool features that people want. Nobody cares about a few million nanoseconds.

More info about this topic here: http://en.wikipedia.org/wiki/Program_optimization

Also, to head off criticisms of this article: there are times when micro-optimizations are needed. That’s why Knuth says they’re unneeded only 97% of the time.

13 thoughts on “Nobody Cares About a Few Million Nanoseconds”

  1. One thing I often have to point out to new programmers is that there does not have to be any direct 1-to-1 mapping from variables in your code to registers or even memory locations in the compiled machine code. Reusing a variable for a different purpose doesn’t save you any space because the compiler does its register allocation independent of your variable naming.

  2. Great article! You only fall flat on “Unless the software is being run on a computer that is going into space, a nuclear reactor, or someone’s chest cavity”.
    Those are known to be the last places where anyone would trade readability, which brings reliability, for speed.
    Usually speed optimizations are mandatory in OS kernels, system libraries, CPU-heavy interactive applications like VGs, and embedded systems (the real ones, not smartphones).

  3. Actually, heavy manual optimization is frequently still necessary for spaceflight software due to the heavily constrained nature of the processors on board.

  4. “””
    a = a + b;
    b = a - b;
    a = a - b;
    “””

    that hack also has a non-obvious problem for beginners: a and b are 32-bit integers, right? now imagine a + b is either > 2^31 or < -2^31… yes, you have an overflow, and the swap will produce wrong values for a and b…

    and yeah, i think using python is fast enough for most things, and when obvious code is not, cprofile will help you. also, runsnakerun is a great gui to make sense of the results…

    Link on my game is my python game project, which shows you can do complex games in pure python/pygame… it's a clone of nintendo's smash bros games… some work left, but really enough to counter those 'python is slow' arguments… it's all about an efficient algorithm (seeing real complexity, not saving "nothings" everywhere).

  5. I’ve seen the “temp swap” done as:
    a ^= b;
    b ^= a;
    a ^= b;
    See: http://en.wikipedia.org/wiki/XOR_swap_algorithm

    I think it’s fair to say that you should never say never. In the situation where you find yourself doing integer (or float) swaps enough to warrant it, write an inline function to do it. Then you can elect to hide whatever implementation you desire as the core meaning behind the swap is clear from the function name.

    And I disagree about using doubles instead of floats! The accuracy of a float is enough to give an error of ~4m on scales approaching those needed to get a satellite into geostationary orbit. That’s better than you’ll get from a standard GPS device. Or to put it another way, on a metre scale, you can get an accuracy of about 1 nanometre, which is roughly in line with the diameter of atoms. I’d say that was enough for most people. Using doubles doesn’t eliminate rounding errors completely, it just makes them less common. If you want to use floats for representing currency, you probably need your head examining, horses for courses!

    Other than that, it’s a fine article :)

  6. I completely agree, except for your statement about floats versus doubles. On a lot of hardware doubles are much slower than floats, and it’s extremely difficult to convert code from using doubles to using floats (for the reasons you mentioned). The other problem with doubles is that the profiler doesn’t really show issues with them, since it’s not like the profiler is going to tell you that your multiplies are slow — instead you’ll just run into more cache misses and all of your math will be slightly slower.

    Other problems include talking with your GPU (since it deals in floats or smaller), because now your GPU and CPU will have different representations of the number (slower, and also a potential source of bugs).

    And, honestly, if you’re writing a game that requires the precision of a double, there’s probably something else wrong.

  7. For a moment, I thought your example was meant for a language other than python; I mean, the declarations are not on the same line, so what’s with your use of semicolons?

  8. Excellent article! I wish all programmers could think this way. I preach all day long to keep development simple, but it is not an easy task. Some developers tend to overcomplicate their code because it shows that they are “smart” (am I wrong?). But they often forget that “smart” also means writing code that is simple to maintain, human readable, and within the customer’s budget!

  9. Instead of “don’t do micro-opt”, say “do focus on algorithms and abstractions”. The fact is, depending on the application, micro-opt can be appropriate in parts of your code that are heavily used. You see plenty of this appropriately used in the python world, way more than in just nuclear reactors, etc. The problem with speeding up many programs that haven’t been designed well isn’t that you couldn’t micro-opt your way there, but that the places you would need to micro-opt are spread all over the place. If they are behind reasonable abstractions, then you can apply micro-opt (after measuring) in a judicious way. And of course, if you blow the algorithms up front and/or the use of those algorithms is appropriately abstracted, then micro-opt is often not a good enough answer.

  10. You write:
    • Pre-computing math for the computer, such as multiplying a value by 525600 (the number of minutes in a year) instead of multiplying it by 60 * 24 * 365 (which makes the intent obvious).

    And that makes it obvious that this is wrong for leap years. :)
