Looking at the generated code for their watchdog example and their workaround of forcing a function evaluation, it looks like the common cause of this bug amongst all of the compilers is the optimizer not respecting volatile. It's easy to understand how this could be the case.
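For anyone who hasn't read the paper, here is roughly the pattern under discussion, as a minimal sketch rather than the paper's exact code. `WATCHDOG` is an ordinary global standing in for a memory-mapped register so this runs on a host, and the helper names `vol_read_int` and `vol_write_int` are made up for illustration.

```c
/* Stand-in for a memory-mapped watchdog register. On real hardware this
   would live at a fixed address; here it is a plain global (assumption). */
volatile int WATCHDOG = 0;

/* Direct form: semantically one volatile load followed by one volatile
   store. A buggy optimizer may treat this as a no-op and delete it. */
void reset_watchdog_direct(void) {
    WATCHDOG = WATCHDOG;
}

/* Hypothetical helpers: each volatile access happens inside a function. */
int vol_read_int(volatile int *p) {
    return *p;
}

void vol_write_int(volatile int *p, int value) {
    *p = value;
}

/* Workaround form: the load and the store are routed through calls. */
void reset_watchdog_via_calls(void) {
    vol_write_int(&WATCHDOG, vol_read_int(&WATCHDOG));
}
```

The workaround presumably only helps while the helpers stay out of line (say, in a separate translation unit); if the compiler inlines them you are back to the direct form.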
Imagine you work on a production quality C compiler. The pedestrian pieces (e.g., lexer, parser, AST) have been stable forever. The piece that you're very likely going to be working on is the optimizer (or maybe the code generator) to make use of new techniques or new instructions. While you're going about your business, you're probably not thinking about that arcane corner of the C language spec that discusses volatile. What's more, when you finally complete your feature, all of the compiler's test suites pass because there's insufficient coverage for volatile.
The biggest contribution of this paper, besides the fact that it identified this issue across various compilers, is the notion of access summary testing and advocating for it to be included as part of the test suite for C compilers.
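To give a feel for what an access summary is, here is a toy sketch. It assumes a summary is just a count of loads and stores per volatile object, and it uses counting accessor functions as a stand-in for observing the compiled binary from the outside; the struct and function names are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical record: loads and stores observed on one volatile object. */
struct access_summary {
    unsigned loads;
    unsigned stores;
};

struct access_summary observed;
volatile int WATCHDOG;

/* Counting accessors: a stand-in for external instrumentation. */
int logged_load(volatile int *p) {
    observed.loads++;
    return *p;
}

void logged_store(volatile int *p, int value) {
    observed.stores++;
    *p = value;
}

/* Run the watchdog reset once and report what was touched. */
struct access_summary summarize_reset(void) {
    observed.loads = 0;
    observed.stores = 0;
    logged_store(&WATCHDOG, logged_load(&WATCHDOG));
    return observed;
}

/* Two builds agree only if every count matches. */
bool summaries_match(struct access_summary a, struct access_summary b) {
    return a.loads == b.loads && a.stores == b.stores;
}
```

The test idea is then to compare the summary from an unoptimized build against the one from an optimized build of the same program; a mismatch, such as {1 load, 1 store} versus {0, 0}, flags a suspect volatile access.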
At least gcc's generated code has no possibility of working at all, and the system will reboot in a loop the first time it's powered up. So you're only 30 seconds away from finding that bug. (Of course, once you notice that the watchdog is being triggered it's going to be several hours of debugging before you realize that your compiler just optimized out the check.)
Really, how often when you encounter a bug do you think it's a compiler bug? Never. It can't be. Compiler writers are infallible.
You'll first think it's your program, or maybe you misunderstood how volatile works, so you'll read the spec again. You'll write a small test program to isolate the problem. That will reboot each time too. Then you'll think it's some odd race condition related to volatile. But you're just doing a load and a store. The reboot happens every time, OK, that's promising. Then maybe, MAYBE, if you're awesome, you'll think to look at the generated assembly. And when you realize you have a no-op, you'll start to wonder whether you inadvertently specified something wrong in your -O parameters. Because how could the compiler be wrong? It's never wrong.
I think it depends on how many compiler bugs you've previously encountered. I'm starting to suspect them more easily, these days. Practice at tracking them down has also made it easier to act on that suspicion. The thought of a compiler bug often gets people to throw up their hands in despair, but it's not so bad: just carefully verify the assembly output and see if it's actually correct. If it's not, then figure out some reasonably reliable way to tweak your code to avoid the bug (after you file a bug with the compiler people).
Blaming the compiler (or the hardware or the OS or...) is only a problem if you do so without investigating to see if your blame is well placed. Once you can investigate properly, then it's just another possibility in your bag of tricks.
Also, this is embedded. As the paper indicates, they managed to crash a couple of compilers on such inputs. In my limited experience, that is not uncommon on embedded compilers. Having the compiler suddenly crash after an innocuous change a few times does not increase one's trust in its infallibility.
I think it's a safe bet to start debugging with the notion of a correctly working compiler...
But then, I've found bugs in compilers and standard libraries for embedded a few times already; they are much less battle-tested than your regular x86_64-linux-gnu-gcc. So at some point, I normally switch into "trust no one" mode and start reading disassembler output in the vicinity of the crash site ;-)
Code generation bugs are fine. CPU bugs are the worst.
In this case, I'd probably suspect the watchdog and add some print statements around the get/set. Then it would work. Then I'd remove the print statements and it would break again. Nothing says "compiler bug" like print statements magically fixing the problem. (Could be a timing thing too, of course. This must be why drivers log so much useless information.)
Not that I disagree with the sentiment, but days is a bit much.
If I'm that low level I'll probably take a look at the generated assembly, even if I think it is my fault. I don't think that's too uncommon either. It helps filter through the abstractions.