Volatiles Are Miscompiled, and What to Do about It [pdf]

kenrose · on Feb 17, 2014

Looking at the generated code for their watchdog example and their workaround of forcing a function evaluation, it looks like the common cause of this bug amongst all of the compilers is the optimizer not respecting volatile. It's easy to understand how this could be the case.
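For readers without the PDF open, the pattern being discussed can be sketched roughly as below. This is an illustration, not the paper's exact listing: `WATCHDOG` is an ordinary global standing in for a memory-mapped register, and `vol_read_int` / `vol_write_int` are hypothetical helper names for the "force a function evaluation" workaround.

```c
/* Stand-in for a memory-mapped watchdog register. Assumption: on real
   hardware this would live at a fixed address, not in an ordinary global. */
volatile int WATCHDOG = 0;

/* The abstract machine performs one volatile load followed by one volatile
   store here. A compiler that emits an empty body has dropped both. */
void reset_watchdog(void)
{
    WATCHDOG = WATCHDOG;
}

/* Hypothetical helpers: each volatile access is routed through a call. */
int vol_read_int(volatile int *p) { return *p; }
void vol_write_int(volatile int *p, int v) { *p = v; }

/* Same load-then-store, expressed through the helpers. */
void reset_watchdog_workaround(void)
{
    vol_write_int(&WATCHDOG, vol_read_int(&WATCHDOG));
}
```

Both versions are meant to be equivalent at the source level. Whether the helpers have to be kept out of line for the workaround to hold up on a given compiler is not something this sketch shows.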

Imagine you work on a production quality C compiler. The pedestrian pieces (e.g., lexer, parser, AST) have been stable forever. The piece that you're very likely going to be working on is the optimizer (or maybe the code generator) to make use of new techniques or new instructions. While you're going about your business, you're probably not thinking about that arcane corner of the C language spec that discusses volatile. What's more, when you finally complete your feature, all of the compiler's test suites pass because there's insufficient coverage for volatile.

The biggest contribution of this paper, besides the fact that it identified this issue across various compilers, is the notion of access summary testing and advocating for it to be included as part of the test suite for C compilers.

jrockway · on Feb 17, 2014

At least gcc's generated code has no possibility of working at all, and the system will reboot in a loop the first time it's powered up. So you're only 30 seconds away from finding that bug. (Of course, once you notice that the watchdog is being triggered it's going to be several hours of debugging before you realize that your compiler just optimized out the check.)

kenrose · on Feb 17, 2014

Hours of debugging? Days even!

Really, how often when you encounter a bug do you think it's a compiler bug? Never. It can't be. Compiler writers are infallible.

You'll first think it's your program, or maybe you misunderstood how volatile works, so you'll read the spec again. You'll write a small test program to isolate the problem. That will reboot each time too. Then you'll think it's some odd race condition related to volatile. But you're just doing a load and a store. The reboot happens every time, OK, that's promising. Then maybe, MAYBE, if you're awesome, you'll think to look at the generated assembly. And when you realize you have a no-op, you'll start to wonder whether you inadvertently specified something wrong in your -O parameters. Because how could the compiler be wrong? It's never wrong.

Code generation bugs are the worst.

mikeash · on Feb 17, 2014

I think it depends on how many compiler bugs you've previously encountered. I'm starting to suspect them more easily, these days. Practice at tracking them down has also made it easier to act on that suspicion. The thought of a compiler bug often gets people to throw up their hands in despair, but it's not so bad: just carefully verify the assembly output and see if it's actually correct. If it's not, then figure out some reasonably reliable way to tweak your code to avoid the bug (after you file a bug with the compiler people).

Blaming the compiler (or the hardware or the OS or...) is only a problem if you do so without investigating to see if your blame is well placed. Once you can investigate properly, then it's just another possibility in your bag of tricks.

Someone · on Feb 17, 2014

Also, this is embedded. As the paper indicates, they managed to crash a couple of compilers on such inputs. In my limited experience, that is not uncommon on embedded compilers. Having the compiler suddenly crash after an innocuous change a few times does not increase one's trust in its infallibility.

cnvogel · on Feb 17, 2014

I think it's a safe bet to start debugging with the notion of a correctly working compiler...

But then, I've found bugs in compilers and standard libraries for embedded targets a few times already; they are much less battle-tested than your regular x86_64-linux-gnu-gcc. So at some point, I normally switch into "trust no one" mode and start reading disassembler output in the vicinity of the crash site ;-)

mansr · on Feb 17, 2014

That's when you start finding bugs in the disassembler.

cnvogel · on Feb 18, 2014

...and in your hex-editor? ;-)

jrockway · on Feb 18, 2014

Code generation bugs are fine. CPU bugs are the worst.

In this case, I'd probably suspect the watchdog and add some print statements around the get/set. Then it would work. Then I'd remove the print statements and it would break again. Nothing says "compiler bug" like print statements magically fixing the problem. (Could be a timing thing too, of course. This must be why drivers log so much useless information.)

yoklov · on Feb 17, 2014

Not that I disagree with the sentiment, but days is a bit much.

If I'm that low level I'll probably take a look at the generated assembly, even if I think it is my fault. I don't think that's too uncommon either. It helps filter through the abstractions.

DannyBee · on Feb 17, 2014

This paper misstates the proper behavior of volatile to start with.

In particular, it says "For every read from or write to a volatile variable that would be performed by a straightforward interpreter for C, exactly one load from or store to the memory location(s) allocated to the variable must be performed."

This is wrong. It later kind of gets it right for C, explaining about sequence points, but it entirely misses that implementations are free to combine and eliminate multiple volatile accesses within the same sequence point.

Now certainly, most of what was reported were genuine bugs (and John reports a lot of correctness issues). But it does/did nobody favors to start with an incorrect definition.

cnvogel · on Feb 17, 2014

I think if you replace "straightforward interpreter for C" with the "abstract machine" in the standard (I'm looking at ISO/IEC 9899:1999 right now), at least the first half of the sentence you quote is pretty much what the standard says:

❝An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine...❞ (§6.7.3)

As for combining multiple volatile accesses within the same statement, I don't think I can find an answer in the standard.

zurn · on Feb 17, 2014

Related: https://www.kernel.org/doc/Documentation/volatile-considered...

Sounds like "volatile" variables don't really provide good semantics for most uses even without considering compiler bugs, so it's better to just use explicit load and store macros or functions.
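A rough sketch of what "explicit load and store functions" could look like in plain C follows. The names `load_u32` and `store_u32` are hypothetical. The idea is that the object itself is not declared volatile, and volatile semantics are requested only at the access site through a cast. Mainstream compilers treat an access through a volatile-qualified lvalue this way in practice, but that is an assumption about the implementation rather than something older revisions of the standard spell out.

```c
#include <stdint.h>

/* Hypothetical accessor: read *p exactly as written, via a volatile lvalue. */
static inline uint32_t load_u32(const uint32_t *p)
{
    return *(const volatile uint32_t *)p;
}

/* Hypothetical accessor: write v to *p via a volatile lvalue. */
static inline void store_u32(uint32_t *p, uint32_t v)
{
    *(volatile uint32_t *)p = v;
}
```

The design point is that every place where the "must really touch memory" requirement applies becomes visible at the call site, instead of being implied by a qualifier on a declaration somewhere else.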

acqq · on Feb 17, 2014

That is the reason the compilers mostly didn't care: the semantics of the language's "volatile" are seldom what is needed in "real world" programs.

avian · on Feb 17, 2014

As the paper notes in the introduction, "volatile" is heavily used in embedded software where synchronization primitives like kernel's spinlocks aren't readily available.

The "buffer_ready" in the paper is a very good example that I have seen many times in the real world. If anyone can share the "better solution" that avoids "volatile" (and works on a bare-bone ARM microcontroller for example), I would love to see it.

cnvogel · on Feb 17, 2014

The solution to this problem is a memory barrier. The short version of such a barrier is: Define a class of transactions (e.g. memory writes). Then all transactions before the barrier must conclude before any transaction after the barrier starts. But this doesn't give you any guarantee about transactions of other types.
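As a hedged illustration of the barrier idea applied to a "buffer_ready"-style flag, here is a minimal sketch using C11 `<stdatomic.h>`. It assumes the toolchain provides C11 atomics for the target, which is not a given on every bare-metal setup. `BUF_LEN`, `produce`, and `try_consume` are made-up names for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define BUF_LEN 8

static int buffer[BUF_LEN];       /* plain data, not volatile */
static atomic_bool buffer_ready;  /* flag shared with the consumer */

/* Producer: fill the buffer, then publish it. The release store keeps the
   buffer writes from being moved after the flag write. */
void produce(void)
{
    for (int i = 0; i < BUF_LEN; i++)
        buffer[i] = i * i;
    atomic_store_explicit(&buffer_ready, true, memory_order_release);
}

/* Consumer: copy the data out only if the flag has been observed. The
   acquire load keeps the buffer reads from being moved before it. */
bool try_consume(int out[BUF_LEN])
{
    if (!atomic_load_explicit(&buffer_ready, memory_order_acquire))
        return false;
    for (int i = 0; i < BUF_LEN; i++)
        out[i] = buffer[i];
    return true;
}
```

The release/acquire pair here covers both compiler reordering and CPU reordering between threads. For a flag shared with an interrupt handler on a single core, or with a hardware device, the right primitive may differ.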

http://en.wikipedia.org/wiki/Memory_barrier

https://www.kernel.org/doc/Documentation/memory-barriers.txt

acqq · on Feb 17, 2014

I haven't programmed ARM myself and there are different ARMs, but as far as I understand (inspired by the post from cnvogel), on most CPUs, even non-ARM ones, you need to use intrinsics like these, which exist on ARM:

Memory barrier instructions:

http://infocenter.arm.com/help/topic/com.arm.doc.faqs/ka1404...

GCC example:

http://stackoverflow.com/questions/6751605/data-memory-barri...

As far as I understand, volatile doesn't give you the needed "barrier" semantics (what is not allowed to happen before or after, at the deep hardware level). If the code with "volatile" works, it may be just an accident.

zAy0LfpBZLC8mAC · on Feb 17, 2014

I don't know the details for ARM, but generally, microcontroller type CPUs don't do any reordering, so no need to prevent it with barriers. And if you do use volatile correctly (and the compiler is not broken), you don't need compiler barriers either. You also might not need any CPU barriers when the cache controller is configured to treat certain regions of the address space as uncacheable (which might be used for MMIO regions).

mansr · on Feb 17, 2014

Modern high-performance ARM cores such as the Cortex-A15 do plenty of reordering. Even when using strongly ordered memory mappings, you still need barriers to prevent reordering between normal and strongly ordered accesses. No amount of volatile will help with this.

zAy0LfpBZLC8mAC · on Feb 17, 2014

The parent's parent was talking about microcontroller class ARMs, though. Also, you don't necessarily have to order "normal" accesses with regard to strongly ordered accesses. If it's only about ordering accesses to MMIO, it doesn't matter when some arithmetic result gets stored to RAM relative to those accesses; all that matters is that the hardware sees the register reads and writes in the right order. Ordering only matters for stuff that is shared in some way; for private data, the illusion of naive serial execution is guaranteed anyhow.

mansr · on Feb 17, 2014

A common scenario is filling a data buffer in normal memory (cached or non-cached doesn't matter on ARM) before initiating a DMA operation by writing to a device register. In this situation, a barrier is required to prevent any of the normal writes being reordered around the DMA initiation which would then see stale data in the buffer.
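The DMA scenario can be sketched as follows. Everything device-specific is an assumption: `dma_start_reg` is a hypothetical pointer to a register that kicks off a transfer, and `__sync_synchronize()` is the GCC/Clang full-barrier builtin, used here as a stand-in for whatever barrier the target actually requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a buffer in normal memory, then start a (hypothetical) DMA transfer
   by writing to a device register. */
void start_dma(uint8_t *buf, size_t len, uint8_t fill,
               volatile uint32_t *dma_start_reg)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = fill;        /* normal-memory writes, not volatile */

    /* Full memory barrier (GCC/Clang builtin): the buffer writes above are
       not to be reordered past the register write below. */
    __sync_synchronize();

    *dma_start_reg = 1u;      /* device write that initiates the transfer */
}
```

Note that the volatile qualifier on the register only constrains that one access. Without the barrier, nothing ties the plain buffer writes to it. Which instruction a particular core needs here (for example DMB versus DSB on ARM) is beyond what this sketch shows.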

Discussions about barriers (or volatile) are only meaningful in the context of related accesses, so explicitly mentioning this isn't really necessary.

zAy0LfpBZLC8mAC · on Feb 18, 2014

I don't really get what you are trying to say. Sure, there are cases where you need barriers, both the CPU and the compiler kind; nobody denied that. Still, in low-end stuff, PIO and cache-free in-order execution are still alive and well, and volatile can be perfectly sufficient under such circumstances.

colin_mccabe · on Feb 18, 2014

"I don't really get what you are trying to say."

volatile is an antipattern. Don't use it. Don't encourage other people to use it. It is not a thing which should be used, by you. To use it would be wrong, because not using it is correct.

Use atomic instructions.

mansr · on Feb 18, 2014

The ARMv7-M architecture (which the popular Cortex-M series microcontrollers implement) allows reordering as I described. Even if a Cortex-M3 probably doesn't ever do it, nobody is making any promises. If a particular behaviour is not documented, it is wrong to rely on it.

acqq · on Feb 17, 2014

Well, with the sufficiently primitive CPU and C compiler even volatile is not needed.

zAy0LfpBZLC8mAC · on Feb 17, 2014

Sure - except that "primitive" CPUs are still widely used, while I'd think that non-optimizing C compilers are the exception.

JoeAltmaier · on Feb 17, 2014

Been writing embedded software for decades. Volatile is certainly NOT the feature of choice. Atomic operations including memory barrier are far more commonly used.

Embedded programmers are paranoid. They check such things in the debugger (examine assembly code). Not the big issue the OP seems to think.

_delirium · on Feb 17, 2014

The "buffer_ready" example was given as an incorrect use, though, which won't necessarily work even in a standards-conforming compiler, because the semantics of volatile don't forbid reordering the loop (which has no volatile accesses) to go after the buffer_ready write.
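To make the reordering point concrete, here is a sketch of the pattern, not the paper's exact code. `BUF_LEN` and the function names are made up. In the first function only the flag is volatile, so a conforming compiler may move the plain buffer stores after the flag store. In the second, the buffer is volatile too, and the compiler has to keep all of those accesses in program order.

```c
#define BUF_LEN 8

volatile int buffer_ready = 0;

/* Problematic variant: the buffer is NOT volatile. */
int plain_buffer[BUF_LEN];

void fill_plain(void)
{
    for (int i = 0; i < BUF_LEN; i++)
        plain_buffer[i] = i;  /* plain stores: may legally be moved ... */
    buffer_ready = 1;         /* ... to after this volatile store */
}

/* One possible fix: make the buffer volatile as well, so every store below
   is a volatile access and must stay in program order in the emitted code. */
volatile int vol_buffer[BUF_LEN];

void fill_volatile(void)
{
    for (int i = 0; i < BUF_LEN; i++)
        vol_buffer[i] = i;
    buffer_ready = 1;
}
```

This only addresses what the compiler may do. On a core that reorders memory accesses in hardware, a barrier would still be needed on top.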

infogulch · on Feb 17, 2014

Quote from intro:

"Although the symptoms of this compiler bug—spurious periodic reboots due to failure to reset the watchdog timer—may be relatively benign, the situation could be worse, for example, if the hardware register were used to lower control rods, cancel a missile launch, or open the pod bay doors."

So that's what the problem was with those pod bay doors.

rjzzleep · on Feb 17, 2014

so, is this kind of QuickCheck for C code generators?

i'm a little surprised at how much worse clang fared.

lstamour · on Feb 17, 2014

Note the article was written in 2008 and clang was first released in 2007. "LLVM 2.2 was released on February 11, 2008, and LLVM r53339 is a snapshot of the source code from July 9, 2008." They later in that paragraph describe the improvements in clang over such a short time.

Also, it seems there are quite a few compiler bugs being found, even today. This looks like a very productive field of study, though the same could likely be said for software correctness in general. http://blog.regehr.org/archives/1061

This particular paper's software (its modern equivalent) is at https://github.com/csmith-project/voltest