
I guess, if you want to know what your code will be doing when executed, don't use C (and don't even dream of C++).

Those are reserved for people who value speed over correctness.



False. As with any language, C and C++ programmers (should) put a very high value on correctness. Incorrect code can be written in safe languages too, but clearly C/C++ make it much easier to kill your program spectacularly. Discipline, knowledge, and tools are required to work in C/C++.


Not just kill your program spectacularly - in memory-unsafe languages, simple logic errors turn into remote code execution.


The usual statement.

First time I heard it was about 1993.


That doesn't mean it's not true though? People have been claiming the earth is round since 500 BC, yet it's still as true as ever.


In this case the history recorded in the CVE database proves otherwise.


No, just have some basic knowledge about how your compiler works and make decisions accordingly. Every major compiler lets the programmer specify exactly the tradeoffs you want. C/C++ simply don't hold your hand by default and let you get the best (reasonably possible) performance if you so desire. That's exactly the way it should be.


No, just use a compiler that doesn't actively attempt to maliciously sabotage you. C has perfectly well defined semantics for (eg) null pointer dereference: read or write memory location zero, and consequently probably crash. Not "silently rewrite the pointer to nonzero-but-still-pointing-at-address-zero-somehow and proceed to bypass subsequent sanity checks for no damn reason".


Perfect example of the mismatch between (some) programmers' expectations and what language standards and/or compilers require/implement.

> C has perfectly well defined semantics for (eg) null pointer dereference: read or write memory location zero, and consequently probably crash.

C's "perfectly well-defined semantics" for null pointer dereference is precisely that the behavior is undefined. Sure, the null pointer happens to be 0 on the vast majority of architectures most programmers work with, but apparently non-zero null pointers are still used these days (at least in the amdgcn LLVM target, from what I understand after a quick glance at the linked page), so it's not even an "all sane modern machines" thing [0].
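To make the dispute concrete, here is a minimal sketch of the kind of transformation being debated (the function name is mine, and whether the check is actually deleted depends on the compiler and optimization level):

```c
#include <stddef.h>

/* Because *p is undefined behavior when p is NULL, an optimizing compiler
 * is allowed to assume p != NULL after the dereference, and may delete the
 * "sanity check" below as dead code. */
int first_element(int *p) {
    int v = *p;            /* UB if p == NULL */
    if (p == NULL)         /* compiler may treat this branch as unreachable */
        return -1;
    return v;
}
```

For valid pointers the function behaves as everyone expects; the whole controversy is only about what the compiler is allowed to do on the UB path.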

In any case, I'd love to see some good benchmarks showing the effect of the more "controversial" optimizations. Compiler writers say they're important, people who don't like these optimizations tend to say that they don't make a significant difference. I don't really feel comfortable taking a side until I at least see what these optimizations enable (or fail to enable). I lean towards agreeing with the compiler writers, but perhaps that's because I haven't been bitten by one of these bugs yet...

    [0]: https://reviews.llvm.org/D26196


The compiler writers are right in the sense that for every one of those optimizations you can have a microbenchmark that will benefit hugely. The opponents are right too, in the sense that for most codebases the more niche optimizations don't matter at all. The right answer, as always, is to take a particular case you actually care about and benchmark yourself. There are no benchmarks that would be generally representative, there is no meaningful "average case".

Personally, I mostly use C/C++ for fairly high performance numerical code and happen to benefit greatly from all the "unsafe" stuff, including optimizations only enabled by compilers taking advantage of undefined behavior. I'm therefore naturally inclined to strongly oppose any attempts to eliminate undefined behavior from the language standards. At the same time, however, I fully recognize that most people would probably benefit from a safer set of compiler defaults (as well as actually reading the compiler manuals once in a while) or even using languages with more restrictive semantics. Ultimately, there is no free lunch and performance doesn't come without its costs.


> C has perfectly well defined semantics for (eg) null pointer dereference: read or write memory location zero, and consequently probably crash.

... no, C explicitly has no well-defined semantics here, since it's undefined behaviour. You may believe it does out of habit, but that's not the case.


No, C explicitly does not define semantics; it still has them, because the underlying hardware and ABI provide them, whether the C standard likes it or not. That's the point: if the compiler isn't actively sabotaging you, you end up with the semantics of the machine code you're compiling to, and can benefit from obvious sanity checks like not mapping anything at address zero.


There is a difference between implementation defined and undefined. Dereferencing a null pointer is undefined and cannot be performed in a well-formed program. Implementation specified behavior (e.g. sizeof(void *)) is dependent on the hardware and ABI.


That is, those semantics you state as defined don't translate into portable C code, not even between compiler versions on the same platform.


> it still has them, because the underlying hardware and ABI provides them,

When you program in C, you program against the C abstract machine, not against particular hardware.


If you are dereferencing a null pointer, you're doing something undefined. Your contract with the compiler was that it would produce valid code as long as you stayed within the confines of C. Thus you cannot blame the compiler for automatically assuming that you're not doing anything wrong, because that's explicitly the instructions it was given.


You can view it as malicious sabotage, or you can view it as a potentially useful optimization (with obvious security caveats when used incorrectly). Either way, my point is that every major compiler lets you make the tradeoffs that are right for your particular use case. That's a good thing.


I think the answer is probably to try to conform to the standard, instead.


I have a small program which uses an undefined behavior. How would you rewrite this so as to not do so, but maintain all existing defined behaviors?

    #include <stdio.h>

    int main() {
        int a, b, c;
        printf("Please enter two numbers: ");
        scanf("%d %d", &a, &b);
        c = a + b;
        printf("The sum of the numbers you entered is %d\n", c);
        return 0;
    }
This is the problem. There is no non-trivial C code that doesn't use some undefined behavior somewhere. And it works just fine, right now. But who says it will on the next version of the compiler?


Nobody has mentioned that scanf has undefined behavior on numeric overflow. If INT_MAX is 2147483647 and the user enters 2147483648, the behavior is undefined. (That doesn't just mean you can get an arbitrary value for a and/or b. It means that the standard says nothing at all about how the program behaves).

Reference: N1570 (C11) 7.21.6.2 paragraph 10: "If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined."

strtol() avoids this problem.
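A sketch of overflow-safe parsing with strtol (the helper name parse_int is mine; error handling kept minimal):

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse an int from a string, reporting overflow instead of invoking UB.
 * Returns 1 on success, 0 on failure. */
int parse_int(const char *s, int *out) {
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s)                      /* no digits consumed */
        return 0;
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                      /* out of range, caught safely */
    *out = (int)v;
    return 1;
}
```

Unlike scanf's %d, an out-of-range input here produces a detectable error rather than undefined behavior.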


> I have a small program which uses an undefined behavior. How would you rewrite this so as to not do so, but maintain all existing defined behaviors?

in this precise case (in C++, because I can't be arsed to search for the modifier for int64), fairly easily:

    #include <fmt/core.h>
    #include <cstdio>
    #include <cstdint>

    int main() {
        int a{}, b{};
        int64_t c{};
        fmt::print("Please enter two numbers: ");
        scanf("%d %d", &a, &b);
        c = a + b;
        fmt::print("The sum of the numbers you entered is {}\n", c);
        return 0;
    }
in the case where a and b were also of the largest int type available, you could do:

    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        int a, b, c;
        printf("Please enter two numbers: ");
        scanf("%d %d", &a, &b);
        if(!__builtin_add_overflow(a, b, &c))
        {
          printf("The sum of the numbers you entered is %d\n", c);
        }
        else
        {
          printf("You are in a twisty maze of compiler features, all different");
          system("/usr/bin/0ad");
        }
        return 0;
    }
but you could also use the more portable macros provided in emacs (100% independent of any other code): https://github.com/jwiegley/emacs-release/blob/adfd5933358fd...


With regard to your first version: I had this vague feeling that the usual arithmetic conversions state that if the operands are of the same type, then no conversion is performed, and apparently this is so. If so, then would not the expression a + b potentially overflow, regardless of what it is being assigned to?

FWIW, I tried this example:

  #include <stdio.h>
  #include <limits.h>
  #include <stdint.h>

  int main () {
    int64_t  x;
    int a = INT_MAX, b = 1;
    x = a + b;
    printf( "%lu %lu %d %lld\n", sizeof(x), sizeof(a), a, x);
    return 0;
  }
After compiling with gcc -std=c11 -fsanitize=undefined -Wall -Wpedantic -Wextra (getting no warnings), on running I get

  c-undef.c:8:9: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
  8 4 2147483647 -2147483648
Also with -O0 and -O3.

At least according to the answer in [1], signed (but not unsigned) integer overflow is undefined behavior (unless C and C++ have diverged on this matter, but I get the same result when the same source code is compiled as C++17.)

This may seem to be pedantic (if it is not just wrong), but the point is that you have to be unremittingly pedantic to avoid undefined behavior.

[1]https://stackoverflow.com/questions/16188263/is-signed-integ...
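For contrast, unsigned overflow is well-defined (it wraps modulo 2^N), which is why only the signed case trips the sanitizer. A tiny sketch (the function name is mine):

```c
#include <limits.h>

/* Unsigned arithmetic is defined to wrap around; there is no UB here.
 * UINT_MAX + 1 is guaranteed to yield 0. */
unsigned int wrap_inc(unsigned int u) {
    return u + 1u;
}
```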


> I had this vague feeling that the usual arithmetic conversions state that if the operands are of the same type, then no conversion is performed

that's 100% correct, nice catch.


John Regehr wrote a post about checking for signed overflow a few years ago:

https://blog.regehr.org/archives/1139

HN discussion:

https://news.ycombinator.com/item?id=7665254

Your first suggestion is similar to his checked_add_1(). The Emacs macros are similar to his checked_add_4(). Presumably builtins are optimal, performance wise, for a given compiler.
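For reference, the precondition-test style of check can also be written in portable standard C with no builtins at all - a sketch roughly in the spirit of Regehr's checked adds (the name checked_add is mine, not necessarily matching his code exactly): test whether the addition is representable before performing it, so no signed overflow ever executes.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a + b in *sum if the result is representable
 * as an int; returns false otherwise. No UB on any input. */
bool checked_add(int a, int b, int *sum) {
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;   /* would overflow */
    *sum = a + b;
    return true;
}
```

The builtin version typically compiles to a single add plus an overflow-flag check, so it is likely faster; this one's advantage is that it works on any conforming compiler.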


I like the idea of your first try, but it's not guaranteed to work on all implementations, even with the type promotion fixed. The problem is that the C standard doesn't specify a maximum size for int. It could be a 128 bit type.

"Portable C" doesn't exist.


> I like the idea of your first try, but it's not guaranteed to work on all implementations,

in C++ you'd just add

    static_assert(sizeof(int) < sizeof(int64_t), "fix your types");
to be safe about this


What's the UB here?


consider what happens if the input numbers are INT_MAX and 1


That doesn't make the program incorrect, it just means that you can't provide that particular input to the program.


The problem is that UB is invoked at runtime for particular inputs... and it silently makes the program impossible to reason about after that point.

That is not good. It's hard to reason about such a program; in other words, can you trust the results of such a program? How do you know whether your input caused UB at some point or not?

The burden of sanitizing the input is on the user (either programmer or the data provider).

Checking this is usually a performance tradeoff, so it was decided not to be done by default.


That's a documentation/sanitization problem, rather than one involving the question of whether this program is well-formed. My response was that it is, given that the input obeys the rules you've stated. How you enforce that is a different concern (as is whether you dislike the preconditions and think this code could be more robust and have a wider domain, because it certainly could).


a program which has undefined behaviour on some inputs is an incorrect program by definition


I'll have to disagree with you there: you cannot fully control the range of inputs that you are provided at all times, or perform verification of input. But you can clearly document what conditions the input must satisfy for the program to be well-defined.


Yes, but the "can't" really is a "shouldn't".


Point taken; unfortunately I can't edit anymore…


The compiler will not break this example, because it cannot put restrictions on the values of a and b. The only thing it can do is assume that a + b does not overflow, in which case the program will work as expected.


No. It can check whether the operands will cause overflow and issue an error instead of silently giving a bogus result.

It's a performance hit though.

I don't think you'd want a banking app written with the assumption that you'll never have more than MAXINT dollars, where adding a few more will roll you back to 0 or even put you in debt... silently... because, well, it works this way, and the designer didn't think you'd hit the limit. If it ever happened you'd want a siren to go off.


> It can check whether the operands will cause overflow and issue an error instead of silently giving a bogus result.

Yes, and it could also decide to wipe your hard drive because of the latitude given to it by the C standard. Many compilers have an option to enable some sort of special behavior when a signed integer overflows, but such extensions are non-standard.


Unless the undefined behaviour is hidden in printf or scanf, this code exhibits none. Variadic functions are reasonably well defined and no type aliasing takes place.

(And you're missing a few error checks.)


Can the people downvoting this comment provide a reason why? It seems perfectly reasonable to me.


Integer overflow is undefined behaviour so an input for instance of 2147483647 and 1 makes the program invalid.


Sure, but the program is well-defined on inputs that do not cause it to overflow. It does not perform undefined behavior in those cases, so it's correct given those preconditions.


You can also take the same route as sel4. They have a formal, machine readable specification of what their C code is supposed to do, and then verify that the output of the C compiler matches that specification. It's a nice 'have your cake and eat it too' situation where you can have formally verified, high assurance binaries that were compiled with gcc -O2, but with gcc removed from the trusted computing base.


sel4 is also absolutely tiny on purpose. Imagine the man hours needed to verify Firefox.


sel4 is also the work of a very small handful of people, with a lot of the verification code hand built. A huge chunk of the work needn't be redone, just like you wouldn't need to rewrite a c compiler for every c project.

So yes, they kept the verified code small, so that they could focus on all of the tooling. I'm unconvinced that formal methods intrinsically can't scale.


Unfortunately I'm also not sure if formal methods can scale. In large projects the area is almost completely untouched, and from what I can tell, those who are using it in large projects aren't talking (safety-critical and military applications both tend to be places where you don't talk about what you have done). I'm looking into proposing to management that it is a solution to our quality problems, but I don't want to look like a head-in-the-clouds guy with no business sense. (We already have a good automated test culture, but tests can only prove that everything we thought of works; we have lots of issues with situations we didn't think about.)


I'd say that the generic tooling isn't quite there yet for most business needs. The sel4 guys have a lot of infrastructure that'd need to be decoupled a little from their use case to be reusable.

Also, I'd add that when you say

> We already have a good automated test culture, but they can only prove everything we thought of works, we have lots of issues with situations we didn't think about

in a lot of ways formal verification only helps you with situations you know about, specifically problem classes that you know about. For instance sel4 was susceptible to Meltdown.


Interesting and useful point of view. One of our most common problems right now is importing bad data. I'm hoping a formal method could "somehow" force us to find all the places where we act on data and ensure we validated it first. This is a known class of problem, but it is easy to overlook that some file might be mostly valid but have one field that is incorrect.


The corollary to "not sure" is to try a few times.

Almost nobody does because formal code writing forces a constraint / logic / contract thinking which is alien to many alleged programmers.

It could be taught. Afterwards, compare performance and quality of the result. Could be a big competitive advantage...


> Could be a big competitive advantage...

The fact that after four decades it has not proven to be so is evidence that either it is very difficult to put into use or not that effective.


Yes, I too believe C is "only for speed"... I suggest we both get rid of any device that uses C code and move to the most remote places on earth.

"C is for speed only"... Do you not realize there is an entire embedded world that runs on C?


There are alternatives to C in the embedded world; it is a matter of actually being open-minded.


LOL. Please, tell me what the alternatives are. I’ve only been doing commercial embedded product design for 15 years. Tell me all about how one time you saw a post about “embedded rust” or something.


Apparently you need to broaden your horizons.

https://www.ptc.com/en/products/developer-tools/objectada

https://www.ghs.com/products/AdaMULTI_IDE.html

https://www.mikroe.com

http://www.astrobe.com/default.htm

http://www.microej.com

https://www.aicas.com/cms/

Just a couple of examples from many more, after all I don't want to overflow you with information.


Yea, I can tell you don't work in embedded and have never made anything for production.


There are many companies delivering embedded products successfully with those technologies.

I call that production deployment.

As for myself, feel free to believe whatever makes you feel better.



