Looking at the generated code for their watchdog example and their workaround of...

jrockway · on Feb 17, 2014

At least gcc's generated code has no possibility of working at all, and the system will reboot in a loop the first time it's powered up. So you're only 30 seconds away from finding that bug. (Of course, once you notice that the watchdog is being triggered it's going to be several hours of debugging before you realize that your compiler just optimized out the check.)

kenrose · on Feb 17, 2014

Hours of debugging? Days even!

Really, how often when you encounter a bug do you think it's a compiler bug? Never. It can't be. Compiler writers are infallible.

You'll first think it's your program, or maybe you misunderstood how volatile works, so you'll read the spec again. You'll write a small test program to isolate the problem. That will reboot each time too. Then you'll think it's some odd race condition related to volatile. But you're just doing a load and a store. The reboot happens every time, OK, that's promising. Then maybe, MAYBE, if you're awesome, you'll think to look at the generated assembly. And when you realize you have a no-op, you'll start to wonder whether you inadvertently specified something wrong in your -O parameters. Because how could the compiler be wrong? It's never wrong.

Code generation bugs are the worst.
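
For readers who haven't seen this kind of code, here is a minimal sketch of a "load and a store" on a volatile watchdog register. It is not the example from the paper: the register variable `fake_wdt_reg`, the bit name `WDT_ENABLE_BIT`, and the function `wdt_disable_if_enabled` are all made up for illustration, and an ordinary variable stands in for the memory-mapped register so the code can run on a desktop machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped watchdog register.
   On real hardware this would be a fixed address from the datasheet. */
volatile uint32_t fake_wdt_reg = 0;

/* Assumed bit layout, for illustration only. */
#define WDT_ENABLE_BIT 0x1u

/* Clear the enable bit if it is set: one volatile load and, when the bit
   is set, one volatile store. Returns 1 if a store happened, else 0. */
int wdt_disable_if_enabled(volatile uint32_t *reg)
{
    uint32_t v = *reg;                  /* volatile load */
    if (v & WDT_ENABLE_BIT) {
        *reg = v & ~WDT_ENABLE_BIT;     /* volatile store */
        return 1;
    }
    return 0;
}
```

A correct compiler has to emit both accesses at every optimization level. Compiling something like this with optimization and reading the output (for example with `gcc -O2 -S`) is the quickest way to see whether the load and the store actually survived.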

mikeash · on Feb 17, 2014

I think it depends on how many compiler bugs you've previously encountered. I'm starting to suspect them more easily, these days. Practice at tracking them down has also made it easier to act on that suspicion. The thought of a compiler bug often gets people to throw up their hands in despair, but it's not so bad: just carefully verify the assembly output and see if it's actually correct. If it's not, then figure out some reasonably reliable way to tweak your code to avoid the bug (after you file a bug with the compiler people).

Blaming the compiler (or the hardware or the OS or...) is only a problem if you do so without investigating to see if your blame is well placed. Once you can investigate properly, then it's just another possibility in your bag of tricks.
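
As a sketch of what "tweak your code" can look like, one option is to route every volatile access through small helper functions that the compiler is asked not to inline, so the access sits behind an opaque call. This is an assumption about one possible tweak, not necessarily the workaround the thread refers to. The names `vol_read_u32`, `vol_write_u32`, and `wdt_touch` are hypothetical, and the `noinline` attribute is a GCC/Clang extension; other compilers would need their own equivalent.

```c
#include <stdint.h>

/* noinline is a GCC/Clang extension; elsewhere this expands to nothing
   and the helpers may be inlined, which weakens the workaround. */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Hypothetical helper: perform exactly one volatile load. */
NOINLINE uint32_t vol_read_u32(volatile uint32_t *addr)
{
    return *addr;
}

/* Hypothetical helper: perform exactly one volatile store. */
NOINLINE void vol_write_u32(volatile uint32_t *addr, uint32_t value)
{
    *addr = value;
}

/* Read the register and write the same value back, using only the
   helpers. Returns the value that was read. */
uint32_t wdt_touch(volatile uint32_t *reg)
{
    uint32_t v = vol_read_u32(reg);
    vol_write_u32(reg, v);
    return v;
}
```

Whether this actually avoids a given miscompilation still has to be confirmed by checking the generated assembly for the helpers themselves.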

Someone · on Feb 17, 2014

Also, this is embedded. As the paper indicates, they managed to crash a couple of compilers on such inputs. In my limited experience, that is not uncommon on embedded compilers. Having the compiler suddenly crash after an innocuous change a few times does not increase one's trust in its infallibility.

cnvogel · on Feb 17, 2014

I think it's a safe bet to start debugging with the notion of a correctly working compiler...

But then, I've found bugs in compilers and standard libraries for embedded targets a few times already; they are much less battle-tested than your regular x86_64-linux-gnu-gcc. So at some point, I normally switch into "trust no one" mode and start reading disassembler output in the vicinity of the crash site ;-)

mansr · on Feb 17, 2014

That's when you start finding bugs in the disassembler.

cnvogel · on Feb 18, 2014

...and in your hex-editor? ;-)

jrockway · on Feb 18, 2014

Code generation bugs are fine. CPU bugs are the worst.

In this case, I'd probably suspect the watchdog and add some print statements around the get/set. Then it would work. Then I'd remove the print statements and it would break again. Nothing says "compiler bug" like print statements magically fixing the problem. (Could be a timing thing too, of course. This must be why drivers log so much useless information.)

yoklov · on Feb 17, 2014

Not that I disagree with the sentiment, but days is a bit much.

If I'm that low level I'll probably take a look at the generated assembly, even if I think it is my fault. I don't think that's too uncommon either. It helps filter through the abstractions.