
The second version (test against zero) benefits from the fact that the subtraction instruction sets the Z flag automatically. So the end of the second loop is something like "sub i,1" followed by "jnz top".

The most straightforward implementation of the first loop would be "add i,1" followed by "cmp i,NUMBER" followed by "jb top" (or "jl top" if i is signed). That's one extra instruction, and depending on the specific CPU and surrounding code it may cost even more if NUMBER is a literal or a memory access (as opposed to a register value).
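For reference, here is roughly what the two loop shapes look like in C. This is only a sketch: the original loops aren't quoted in this thread, so the function names (sum_up, sum_down) and the summing body are made up for illustration, and the instruction sequences in the comments are the hand-written ones described above, not what any particular compiler is guaranteed to emit.

```c
#include <stddef.h>

/* Count up: the loop end needs an increment, a compare against n,
   and a conditional branch (add / cmp / jb in the hand-written version). */
unsigned long sum_up(const unsigned *data, size_t n) {
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Count down to zero: the decrement itself sets the zero flag, so the
   loop end can be just a subtract and a branch (sub / jnz). */
unsigned long sum_down(const unsigned *data, size_t n) {
    unsigned long total = 0;
    for (size_t i = n; i != 0; i--)
        total += data[i - 1];
    return total;
}
```

Both functions return the same sum; only the direction of the counter differs.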

My guess is that your compiler will produce code that takes advantage of the savings if you turn on optimizations, but you might have to actually do something in the body of the loop to keep the compiler from optimizing the loop completely out (or tweak flags to enable/disable specific optimization techniques). Adding a loop body will make the timing difference less noticeable, as the single-instruction savings of the zero-test version becomes a smaller percentage of the total time per iteration.

Disclaimer: This is how I'd write assembly by hand. A compiler could conceivably do something quite different, depending on the specific application code, compiler, version and flags. I also haven't yet taught myself 64-bit x86 assembly code, although I understand it's rather similar to 32-bit x86 with more registers available.



The compiler optimizing out the loop completely is one of the reasons why I didn't turn on optimizations here, although there was a loop body. gcc is smart enough to optimize loops with a fixed outcome away in some cases. I probably should have written a more complex loop body.

That said, it's very likely that the compiler will spot such an optimization and produce the quicker assembly code, much like what you just wrote.

What I wanted to say with my post was really that things like these are heavily dependent on the architecture and that comparisons could be optimized in hardware.


> The compiler optimizing out the loop completely is one of the reasons why I didn't turn on optimizations here

Use a non-constant value for the loop bound and put a dummy load inside to stop the compiler from doing loop unrolling and/or dead code elimination.

I often use time(), rand() or even argc to get dummy values when looking at assembly from compiler optimizations.
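A minimal sketch of that trick, assuming a volatile global for the dummy load; the names busy_loop and dummy are hypothetical, not from anyone's actual test code:

```c
/* Volatile, so the compiler has to perform a real read on every access
   and cannot assume the value is still 1. */
static volatile unsigned dummy = 1;

/* n is only known at run time (pass in something derived from argc,
   time() or rand()), so the result can't be computed at compile time. */
unsigned busy_loop(unsigned n) {
    unsigned acc = 0;
    for (unsigned i = n; i != 0; i--)
        acc += dummy;   /* dummy load: one volatile read per iteration */
    return acc;
}
```

From main you'd call it with something like busy_loop((unsigned)argc * 1000000u) and print the result, so the return value counts as used too.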


One can just write the loop as

  for (..; ..; ..) asm("");
GCC doesn't introspect the asm statements, and so it leaves the loop there. (At least, it did for me, even with -O3.)
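Spelled out as a complete function, that might look like the following. It's a sketch that assumes GCC or Clang: the __asm__ spelling is used so it still compiles with -std=c99, where plain asm isn't a keyword, and volatile is written out even though an asm with no operands is treated that way anyway. The name spin is made up.

```c
/* Empty loop that survives optimization: the compiler doesn't look inside
   the asm statement, so it must execute it once per iteration. */
unsigned long spin(unsigned long n) {
    unsigned long i;
    for (i = 0; i < n; i++)
        __asm__ volatile ("");   /* emits no instructions */
    return i;
}
```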



