False. As with any language, C and C++ programmers (should) place a very high value on correctness. Incorrect code can be written in safe languages too, but C/C++ clearly make it much easier to kill your program spectacularly. Discipline, knowledge, and tools are required to work in C/C++.
No, just have some basic knowledge about how your compiler works and make decisions accordingly. Every major compiler lets you specify exactly the tradeoffs you want. C/C++ simply don't hold your hand by default, and let you get the best (reasonably possible) performance if you so desire. That's exactly the way it should be.
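For example (a rough sketch using GCC option spellings; app.c is just a placeholder, so check your own compiler's manual for the exact names and semantics):

gcc -O2 -fno-strict-aliasing app.c              # opt out of type-based alias analysis
gcc -O2 -fno-delete-null-pointer-checks app.c   # keep null checks the optimizer could otherwise remove

Clang has equivalents for most of these; its documentation lists them.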
No, just use a compiler that doesn't actively attempt to maliciously sabotage you. C has perfectly well-defined semantics for (e.g.) null pointer dereference: read or write memory location zero, and consequently probably crash. Not "silently rewrite the pointer to nonzero-but-still-pointing-at-address-zero-somehow and proceed to bypass subsequent sanity checks for no damn reason".
Perfect example of the mismatch between (some) programmers' expectations and what language standards and/or compilers require/implement.
> C has perfectly well-defined semantics for (e.g.) null pointer dereference: read or write memory location zero, and consequently probably crash.
C does have "perfectly well-defined semantics" for null pointer dereference: it's undefined behavior. Sure, the null pointer happens to be 0 on the vast majority of architectures most programmers work with, but apparently non-zero null pointers are still in use these days (at least in the amdgcn LLVM target, from what I understand after a quick glance at the linked page), so it's not even an "all sane modern machines" thing [0].
In any case, I'd love to see some good benchmarks showing the effect of the more "controversial" optimizations. Compiler writers say they're important, people who don't like these optimizations tend to say that they don't make a significant difference. I don't really feel comfortable taking a side until I at least see what these optimizations enable (or fail to enable). I lean towards agreeing with the compiler writers, but perhaps that's because I haven't been bitten by one of these bugs yet...
The compiler writers are right in the sense that for every one of those optimizations you can have a microbenchmark that will benefit hugely. The opponents are right too, in the sense that for most codebases the more niche optimizations don't matter at all. The right answer, as always, is to take a particular case you actually care about and benchmark yourself. There are no benchmarks that would be generally representative, there is no meaningful "average case".
Personally, I mostly use C/C++ for fairly high performance numerical code and happen to benefit greatly from all the "unsafe" stuff, including optimizations only enabled by compilers taking advantage of undefined behavior. I'm therefore naturally inclined to strongly oppose any attempts to eliminate undefined behavior from the language standards. At the same time, however, I fully recognize that most people would probably benefit from a safer set of compiler defaults (as well as actually reading the compiler manuals once in a while) or even using languages with more restrictive semantics. Ultimately, there is no free lunch and performance doesn't come without its costs.
No, C explicitly does not define the semantics; the program still has them, because the underlying hardware and ABI provide them, whether the C standard likes it or not. That's the point: if the compiler isn't actively sabotaging you, you end up with the semantics of the machine code you're compiling to, and can benefit from obvious sanity checks like not mapping anything at address zero.
There is a difference between implementation-defined and undefined. Dereferencing a null pointer is undefined and cannot be performed in a well-formed program. Implementation-defined behavior (e.g. sizeof(void *)) is dependent on the hardware and ABI.
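A minimal sketch of the distinction (my own illustrative example, not from the standard text):

#include <stdio.h>

int main(void) {
    /* Implementation-defined: the value differs between platforms/ABIs,
       but the implementation must document it and behave consistently. */
    printf("sizeof(void *) = %zu\n", sizeof(void *));

    /* Undefined: the standard places no requirements at all on what would
       happen if the next line were uncommented, on any implementation. */
    int *p = NULL;
    /* *p = 42; */
    (void)p;
    return 0;
}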
If you are dereferencing a null pointer, you're doing something undefined. Your contract with the compiler was that it would produce valid code as long as you stayed within the confines of C. Thus you cannot blame the compiler for automatically assuming that you're not doing anything wrong, because that's explicitly the instructions it was given.
You can view it as malicious sabotage, or you can view it as a potentially useful optimization (with obvious security caveats when used incorrectly). Either way, my point is that every major compiler lets you make the tradeoffs that are right for your particular use case. That's a good thing.
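To make the "optimization" concrete, here's the kind of transformation in question (an illustrative sketch, not from any real codebase): since dereferencing a null pointer is undefined, the compiler may assume the pointer is non-null from the dereference onward and delete a later check.

#include <stdio.h>

int read_length(const int *len) {
    int n = *len;          /* dereference: the compiler may now assume len != NULL */
    if (len == NULL) {     /* ...so this later check, and the error path it guards, */
        fprintf(stderr, "null length pointer\n");
        return -1;         /* can legally be removed */
    }
    return n;
}

Moving the check above the dereference keeps both the safety net and the optimization.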
I have a small program which invokes undefined behavior. How would you rewrite this so as to not do so, but maintain all existing defined behaviors?
#include <stdio.h>

int main() {
    int a, b, c;
    printf("Please enter two numbers: ");
    scanf("%d %d", &a, &b);
    c = a + b;
    printf("The sum of the numbers you entered is %d\n", c);
    return 0;
}
This is the problem. There is no non-trivial C code that doesn't use some undefined behavior somewhere. And it works just fine, right now. But who says it will on the next version of the compiler?
Nobody has mentioned that scanf has undefined behavior on numeric overflow. If INT_MAX is 2147483647 and the user enters 2147483648, the behavior is undefined. (That doesn't just mean you can get an arbitrary value for a and/or b. It means that the standard says nothing at all about how the program behaves).
Reference: N1570 (C11) 7.21.6.2 paragraph 10: "If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined."
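For completeness, one way around that particular trap (a sketch, not the only option) is to read the input as text and convert it with strtol, which reports out-of-range values through errno instead of invoking undefined behavior; parse_int here is my own hypothetical helper:

#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 0 and stores the value in *out on success, -1 on parse or range failure. */
int parse_int(const char *s, int *out) {
    char *end;
    long v;
    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

Pair it with fgets for the actual input and the scanf overflow case disappears (the a + b overflow discussed elsewhere in the thread is a separate problem).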
> I have a small program which invokes undefined behavior. How would you rewrite this so as to not do so, but maintain all existing defined behaviors?
In this precise case (in C++, because I can't be arsed to look up the printf length modifier for int64), fairly easily:
#include <fmt/core.h>
#include <cstdio>
#include <cstdint>

int main() {
    int a{}, b{};
    int64_t c{};
    fmt::print("Please enter two numbers: ");
    scanf("%d %d", &a, &b);
    c = a + b;
    fmt::print("The sum of the numbers you entered is {}\n", c);
    return 0;
}
In the case where a and b were also of the largest integer type available, you could do:
#include <stdio.h>
#include <unistd.h>

int main() {
    int a, b, c;
    printf("Please enter two numbers: ");
    scanf("%d %d", &a, &b);
    /* __builtin_add_overflow returns true when the sum does not fit in c */
    if (!__builtin_add_overflow(a, b, &c))
    {
        printf("The sum of the numbers you entered is %d\n", c);
    }
    else
    {
        printf("You are in a twisty maze of compiler features, all different\n");
        execl("/usr/bin/0ad", "0ad", (char *)NULL);
    }
    return 0;
}
With regard to your first version: I had this vague feeling that the usual arithmetic conversions state that if the operands are of the same type, then no conversion is performed, and apparently this is so. If so, then would not the expression a + b potentially overflow, regardless of what it is being assigned to?
FWIW, I tried this example:
#include <stdio.h>
#include <limits.h>
#include <stdint.h>

int main() {
    int64_t x;
    int a = INT_MAX, b = 1;
    x = a + b;
    printf("%zu %zu %d %lld\n", sizeof(x), sizeof(a), a, (long long)x);
    return 0;
}
After compiling with gcc -std=c11 -fsanitize=undefined -Wall -Wpedantic -Wextra (and getting no warnings), on running I get:
c-undef.c:8:9: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
8 4 2147483647 -2147483648
Also with -O0 and -O3.
At least according to the answer in [1], signed (but not unsigned) integer overflow is undefined behavior (unless C and C++ have diverged on this matter, but I get the same result when the same source code is compiled as C++17.)
This may seem to be pedantic (if it is not just wrong), but the point is that you have to be unremittingly pedantic to avoid undefined behavior.
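For what it's worth, the usual fix for that specific issue is to widen one operand before the addition so the arithmetic itself happens in the wider type (a sketch; as the sibling comment notes, it still assumes int is narrower than int64_t):

#include <stdint.h>

int64_t wide_sum(int a, int b) {
    /* b is converted to int64_t by the usual arithmetic conversions,
       so the addition is performed in 64 bits */
    return (int64_t)a + b;
}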
Your first suggestion is similar to his checked_add_1(). The Emacs macros are similar to his checked_add_4(). Presumably builtins are optimal, performance-wise, for a given compiler.
I like the idea of your first try, but it's not guaranteed to work on all implementations, even with the type promotion fixed. The problem is that the C standard doesn't specify a maximum size for int. It could be a 128-bit type.
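A pre-check written in plain C, with no builtins and no assumptions about the width of int, would look something like this (a sketch):

#include <limits.h>

/* Returns 0 and stores a + b in *sum if it fits in an int, -1 otherwise. */
int checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return -1;
    *sum = a + b;
    return 0;
}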
The problem is that UB is invoked at runtime, only for particular inputs... and from that point on it silently makes the program impossible to reason about.
That is not good. It's hard to reason about such a program; in other words, can you trust its results? How do you know whether your input triggered UB at some point or not?
The burden of sanitizing the input is on the user (either programmer or the data provider).
Checking this comes with a performance cost, which is why it was decided not to do it by default.
That's a documentation/sanitization problem, rather than one involving the question of whether this program is well-formed. My response was that it is, given that the input obeys the rules you've stated. How you enforce that is a different concern (or whether you dislike the preconditions and think this code could be more robust and have a wider domain, because it certainly could).
I'll have to disagree with you there: you cannot fully control the range of inputs that you are provided at all times, or perform verification of input. But you can clearly document what conditions the input must satisfy for the program to be well-defined.
The compiler will not break this example, because it cannot put restrictions on the values of a and b. The only thing it can do is assume that a + b does not overflow, in which case the program will work as expected.
No. It can check whether the operands will cause overflow and issue an error instead of silently giving a bogus result.
It's a performance hit though.
I don't think you'd want a banking app written with the assumption that you'll never have more than MAXINT dollars, where adding a few more silently rolls you back to 0 or even puts you in debt... because, well, that's how it works, and the designer didn't think you'd ever hit the limit. If that ever happened you'd want a siren to go off.
> It can check whether the operands will cause overflow and issue an error instead of silently giving a bogus result.
Yes, and it could also decide to wipe your hard drive because of the latitude given to it by the C standard. Many compilers have an option to enable some sort of special behavior when a signed integer overflows, but such extensions are non-standard.
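Concretely (GCC option spellings; prog.c is a placeholder, and none of this is required by the standard):

gcc -fwrapv prog.c                             # define signed overflow to wrap around
gcc -ftrapv prog.c                             # abort at runtime on signed overflow
gcc -fsanitize=signed-integer-overflow prog.c  # have UBSan report the overflow at runtime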
Unless the undefined behaviour is hidden in printf or scanf, this code exhibits none.
Variadic functions are reasonably well defined and no type aliasing takes place.
Sure, but the program is well-defined on inputs that do not cause it to overflow. It does not perform undefined behavior in those cases, so it's correct given those preconditions.
You can also take the same route as seL4. They have a formal, machine-readable specification of what their C code is supposed to do, and then verify that the output of the C compiler matches that specification. It's a nice 'have your cake and eat it too' situation where you can have formally verified, high-assurance binaries that were compiled with gcc -O2, but with gcc removed from the trusted computing base.
seL4 is also the work of a very small handful of people, with a lot of the verification code hand-built. But a huge chunk of that work needn't be redone, just like you wouldn't need to rewrite a C compiler for every C project.
So yes, they kept the verified code small, so that they could focus on all of the tooling. I'm unconvinced that formal methods intrinsically can't scale.
Unfortunately, I'm also not sure whether formal methods can scale. In large projects the area is almost completely untouched. From what I can tell, those who are using it in large projects aren't talking (safety-critical and military applications both tend to be places where you don't talk about what you have done). I'm looking into proposing to management that it is a solution to our quality problems, but I don't want to look like a head-in-the-clouds guy with no business sense. (We already have a good automated test culture, but tests can only prove that everything we thought of works; we have lots of issues with situations we didn't think about.)
I'd say that the generic tooling isn't quite there yet for most business needs. The seL4 guys have a lot of infrastructure that would need to be decoupled a little from their use case to be reusable.
Also, I'd add that when you say
> We already have a good automated test culture, but tests can only prove that everything we thought of works; we have lots of issues with situations we didn't think about
in a lot of ways formal verification only helps you with situations you know about, specifically problem classes that you know about. For instance, seL4 was susceptible to Meltdown.
Interesting and useful point of view. One of our most common problems right now is importing bad data. I'm hoping a formal method could "somehow" force us to find all the places where we act on data and ensure we validated it first. This is a known class of problem, but it is easy to overlook that some file might be mostly valid but have one field that is incorrect.
LOL. Please, tell me what the alternatives are. I’ve only been doing commercial embedded product design for 15 years. Tell me all about how one time you saw a got for “embedded rust” or something.
Those are reserved for people who value speed over correctness.