False. As with any language, C and C++ programmers (should) place a very high value on correctness. Incorrect code can be written in safe languages too, but C/C++ clearly make it much easier to kill your program spectacularly. Discipline, knowledge, and tools are required to work in C/C++.
No, just have some basic knowledge about how your compiler works and make decisions accordingly. Every major compiler lets you specify exactly the tradeoffs you want. C/C++ simply don't hold your hand by default, and let you get the best (reasonably possible) performance if you so desire. That's exactly the way it should be.
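For example (a rough sketch using GCC option spellings; app.c is just a placeholder, so check your own compiler's manual for the exact names and semantics):

gcc -O2 -fno-strict-aliasing app.c              # opt out of type-based alias analysis
gcc -O2 -fno-delete-null-pointer-checks app.c   # keep null checks the optimizer could otherwise remove

Clang has equivalents for most of these; its documentation lists them.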
No, just use a compiler that doesn't actively attempt to maliciously sabotage you. C has perfectly well-defined semantics for (e.g.) null pointer dereference: read or write memory location zero, and consequently probably crash. Not "silently rewrite the pointer to nonzero-but-still-pointing-at-address-zero-somehow and proceed to bypass subsequent sanity checks for no damn reason".
Perfect example of the mismatch between (some) programmers' expectations and what language standards and/or compilers require/implement.
> C has perfectly well-defined semantics for (e.g.) null pointer dereference: read or write memory location zero, and consequently probably crash.
C does have "perfectly well-defined semantics" for null pointer dereference: it's undefined behavior. Sure, the null pointer happens to be 0 on the vast majority of architectures most programmers work with, but apparently non-zero null pointers are still in use these days (at least in the amdgcn LLVM target, from what I understand after a quick glance at the linked page), so it's not even an "all sane modern machines" thing [0].
In any case, I'd love to see some good benchmarks showing the effect of the more "controversial" optimizations. Compiler writers say they're important, people who don't like these optimizations tend to say that they don't make a significant difference. I don't really feel comfortable taking a side until I at least see what these optimizations enable (or fail to enable). I lean towards agreeing with the compiler writers, but perhaps that's because I haven't been bitten by one of these bugs yet...
The compiler writers are right in the sense that for every one of those optimizations you can have a microbenchmark that will benefit hugely. The opponents are right too, in the sense that for most codebases the more niche optimizations don't matter at all. The right answer, as always, is to take a particular case you actually care about and benchmark yourself. There are no benchmarks that would be generally representative, there is no meaningful "average case".
Personally, I mostly use C/C++ for fairly high performance numerical code and happen to benefit greatly from all the "unsafe" stuff, including optimizations only enabled by compilers taking advantage of undefined behavior. I'm therefore naturally inclined to strongly oppose any attempts to eliminate undefined behavior from the language standards. At the same time, however, I fully recognize that most people would probably benefit from a safer set of compiler defaults (as well as actually reading the compiler manuals once in a while) or even using languages with more restrictive semantics. Ultimately, there is no free lunch and performance doesn't come without its costs.
No, C explicitly does not define the semantics; the program still has them, because the underlying hardware and ABI provide them, whether the C standard likes it or not. That's the point: if the compiler isn't actively sabotaging you, you end up with the semantics of the machine code you're compiling to, and can benefit from obvious sanity checks like not mapping anything at address zero.
There is a difference between implementation-defined and undefined. Dereferencing a null pointer is undefined and cannot be performed in a well-formed program. Implementation-defined behavior (e.g. sizeof(void *)) is dependent on the hardware and ABI.
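A minimal sketch of the distinction (my own illustrative example, not from the standard text):

#include <stdio.h>

int main(void) {
    /* Implementation-defined: the value differs between platforms/ABIs,
       but the implementation must document it and behave consistently. */
    printf("sizeof(void *) = %zu\n", sizeof(void *));

    /* Undefined: the standard places no requirements at all on what would
       happen if the next line were uncommented, on any implementation. */
    int *p = NULL;
    /* *p = 42; */
    (void)p;
    return 0;
}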
If you are dereferencing a null pointer, you're doing something undefined. Your contract with the compiler was that it would produce valid code as long as you stayed within the confines of C. Thus you cannot blame the compiler for automatically assuming that you're not doing anything wrong, because that's explicitly the instructions it was given.
You can view it as malicious sabotage, or you can view it as a potentially useful optimization (with obvious security caveats when used incorrectly). Either way, my point is that every major compiler lets you make the tradeoffs that are right for your particular use case. That's a good thing.
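To make the "optimization" concrete, here's the kind of transformation in question (an illustrative sketch, not from any real codebase): since dereferencing a null pointer is undefined, the compiler may assume the pointer is non-null from the dereference onward and delete a later check.

#include <stdio.h>

int read_length(const int *len) {
    int n = *len;          /* dereference: the compiler may now assume len != NULL */
    if (len == NULL) {     /* ...so this later check, and the error path it guards, */
        fprintf(stderr, "null length pointer\n");
        return -1;         /* can legally be removed */
    }
    return n;
}

Moving the check above the dereference keeps both the safety net and the optimization.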
I have a small program which invokes undefined behavior. How would you rewrite this so as to not do so, but maintain all existing defined behaviors?
#include <stdio.h>

int main() {
    int a, b, c;
    printf("Please enter two numbers: ");
    scanf("%d %d", &a, &b);
    c = a + b;
    printf("The sum of the numbers you entered is %d\n", c);
    return 0;
}
This is the problem. There is no non-trivial C code that doesn't use some undefined behavior somewhere. And it works just fine, right now. But who says it will on the next version of the compiler?
Nobody has mentioned that scanf has undefined behavior on numeric overflow. If INT_MAX is 2147483647 and the user enters 2147483648, the behavior is undefined. (That doesn't just mean you can get an arbitrary value for a and/or b. It means that the standard says nothing at all about how the program behaves).
Reference: N1570 (C11) 7.21.6.2 paragraph 10: "If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined."
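For completeness, one way around that particular trap (a sketch, not the only option) is to read the input as text and convert it with strtol, which reports out-of-range values through errno instead of invoking undefined behavior; parse_int here is my own hypothetical helper:

#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 0 and stores the value in *out on success, -1 on parse or range failure. */
int parse_int(const char *s, int *out) {
    char *end;
    long v;
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

Pair it with fgets for the actual input and the scanf overflow case disappears (the a + b overflow discussed elsewhere in the thread is a separate problem).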
> I have a small program which invokes undefined behavior. How would you rewrite this so as to not do so, but maintain all existing defined behaviors?
In this precise case (in C++, because I can't be arsed to look up the printf length modifier for int64), fairly easily:
#include <fmt/core.h>
#include <cstdio>
#include <cstdint>

int main() {
    int a{}, b{};
    int64_t c{};
    fmt::print("Please enter two numbers: ");
    scanf("%d %d", &a, &b);
    c = a + b;
    fmt::print("The sum of the numbers you entered is {}\n", c);
    return 0;
}
In the case where a and b were also of the largest integer type available, you could do:
#include <stdio.h>
#include <unistd.h>

int main() {
    int a, b, c;
    printf("Please enter two numbers: ");
    scanf("%d %d", &a, &b);
    /* __builtin_add_overflow returns true when the sum does not fit in c */
    if (!__builtin_add_overflow(a, b, &c))
    {
        printf("The sum of the numbers you entered is %d\n", c);
    }
    else
    {
        printf("You are in a twisty maze of compiler features, all different\n");
        execl("/usr/bin/0ad", "0ad", (char *)NULL);
    }
    return 0;
}
With regard to your first version: I had this vague feeling that the usual arithmetic conversions state that if the operands are of the same type, then no conversion is performed, and apparently this is so. If so, then would not the expression a + b potentially overflow, regardless of what it is being assigned to?
FWIW, I tried this example:
#include <stdio.h>
#include <limits.h>
#include <stdint.h>

int main() {
    int64_t x;
    int a = INT_MAX, b = 1;
    x = a + b;
    printf("%zu %zu %d %lld\n", sizeof(x), sizeof(a), a, (long long)x);
    return 0;
}
After compiling with gcc -std=c11 -fsanitize=undefined -Wall -Wpedantic -Wextra (and getting no warnings), on running I get:
c-undef.c:8:9: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
8 4 2147483647 -2147483648
Also with -O0 and -O3.
At least according to the answer in [1], signed (but not unsigned) integer overflow is undefined behavior (unless C and C++ have diverged on this matter, but I get the same result when the same source code is compiled as C++17.)
This may seem to be pedantic (if it is not just wrong), but the point is that you have to be unremittingly pedantic to avoid undefined behavior.
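For what it's worth, the usual fix for that specific issue is to widen one operand before the addition so the arithmetic itself happens in the wider type (a sketch; as the sibling comment notes, it still assumes int is narrower than int64_t):

#include <stdint.h>

int64_t wide_sum(int a, int b) {
    /* b is converted to int64_t by the usual arithmetic conversions,
       so the addition is performed in 64 bits */
    return (int64_t)a + b;
}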
Your first suggestion is similar to his checked_add_1(). The Emacs macros are similar to his checked_add_4(). Presumably builtins are optimal, performance-wise, for a given compiler.
I like the idea of your first try, but it's not guaranteed to work on all implementations, even with the type promotion fixed. The problem is that the C standard doesn't specify a maximum size for int. It could be a 128-bit type.
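A pre-check written in plain C, with no builtins and no assumptions about the width of int, would look something like this (a sketch):

#include <limits.h>

/* Returns 0 and stores a + b in *sum if it fits in an int, -1 otherwise. */
int checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return -1;
    *sum = a + b;
    return 0;
}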
The problem is that UB is invoked at runtime, only for particular inputs... and from that point on it silently makes the program impossible to reason about.
That is not good. It's hard to reason about such a program; in other words, can you trust its results? How do you know whether your input triggered UB at some point or not?
The burden of sanitizing the input is on the user (either programmer or the data provider).
Checking this comes with a performance cost, which is why it was decided not to do it by default.
That's a documentation/sanitization problem, rather than one involving the question of whether this program is well-formed. My response was that it is, given that the input obeys the rules you've stated. How you enforce that is a different concern (or whether you dislike the preconditions and think this code could be more robust and have a wider domain, because it certainly could).
I'll have to disagree with you there: you cannot fully control the range of inputs that you are provided at all times, or perform verification of input. But you can clearly document what conditions the input must satisfy for the program to be well-defined.
The compiler will not break this example, because it cannot put restrictions on the values of a and b. The only thing it can do is assume that a + b does not overflow, in which case the program will work as expected.
No. It can check whether the operands will cause overflow and issue an error instead of silently giving a bogus result.
It's a performance hit though.
I don't think you'd want a banking app written with the assumption that you'll never have more than MAXINT dollars, where adding a few more silently rolls you back to 0 or even puts you in debt... because, well, that's how it works, and the designer didn't think you'd ever hit the limit. If that ever happened you'd want a siren to go off.
> It can check whether the operands will cause overflow and issue an error instead of silently giving a bogus result.
Yes, and it could also decide to wipe your hard drive because of the latitude given to it by the C standard. Many compilers have an option to enable some sort of special behavior when a signed integer overflows, but such extensions are non-standard.
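Concretely (GCC option spellings; prog.c is a placeholder, and none of this is required by the standard):

gcc -fwrapv prog.c                             # define signed overflow to wrap around
gcc -ftrapv prog.c                             # abort at runtime on signed overflow
gcc -fsanitize=signed-integer-overflow prog.c  # have UBSan report the overflow at runtime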
Unless the undefined behaviour is hidden in printf or scanf, this code exhibits none.
Variadic functions are reasonably well defined and no type aliasing takes place.
Sure, but the program is well-defined on inputs that do not cause it to overflow. It does not perform undefined behavior in those cases, so it's correct given those preconditions.
You can also take the same route as seL4. They have a formal, machine-readable specification of what their C code is supposed to do, and then verify that the output of the C compiler matches that specification. It's a nice 'have your cake and eat it too' situation where you can have formally verified, high-assurance binaries that were compiled with gcc -O2, but with gcc removed from the trusted computing base.
seL4 is also the work of a very small handful of people, with a lot of the verification code hand-built. But a huge chunk of that work needn't be redone, just like you wouldn't need to rewrite a C compiler for every C project.
So yes, they kept the verified code small, so that they could focus on all of the tooling. I'm unconvinced that formal methods intrinsically can't scale.
Unfortunately, I'm also not sure whether formal methods can scale. In large projects the area is almost completely untouched. From what I can tell, those who are using it in large projects aren't talking (safety-critical and military applications both tend to be places where you don't talk about what you have done). I'm looking into proposing to management that it is a solution to our quality problems, but I don't want to look like a head-in-the-clouds guy with no business sense. (We already have a good automated test culture, but tests can only prove that everything we thought of works; we have lots of issues with situations we didn't think about.)
I'd say that the generic tooling isn't quite there yet for most business needs. The seL4 guys have a lot of infrastructure that would need to be decoupled a little from their use case to be reusable.
Also, I'd add that when you say
> We already have a good automated test culture, but tests can only prove that everything we thought of works; we have lots of issues with situations we didn't think about
in a lot of ways formal verification only helps you with situations you know about, specifically problem classes that you know about. For instance, seL4 was susceptible to Meltdown.
Interesting and useful point of view. One of our most common problems right now is importing bad data. I'm hoping a formal method could "somehow" force us to find all the places where we act on data and ensure we validated it first. This is a known class of problem, but it is easy to overlook that some file might be mostly valid but have one field that is incorrect.
LOL. Please, tell me what the alternatives are. I’ve only been doing commercial embedded product design for 15 years. Tell me all about how one time you saw a got for “embedded rust” or something.
Those are reserved for people who value speed over correctness.