
The author has WAY more knowledge/experience than me on this, so I wonder how he would solve the following issues:

Evaluating Constant Expressions

- This seems really complicated... if you're working within a translation unit, that's much simpler, but then you're much more limited in what you can do without repeating a lot of code. I wonder how the author solves this.

Compile Time Unit Tests

- This is already somewhat possible if you can express your test as a macro, and if you add in the first point, this becomes trivial.

Forward Referencing of Declarations

- I think there may be a lot of backlash to this one. The main argument against it is that it changes the compiler from a one-pass to a two-pass compiler, which has its own performance implications. Given the number of people who are trying to compile massive codebases and go as far as parallelizing compilation of translation units, this may be a tough pill for them to swallow. (Evaluating constant expressions probably comes with a similar/worse performance caveat, depending on how it's done.)

Importing Declarations

- This is a breaking change... one of the ways I have kind of implemented templating in C is by defining a variable, importing a C file, changing the variable, and then re-importing the same C file. Another thing I've done is define a bunch of things and then import the SQLite C Amalgamation and then add another function (I do this to expose a SQLite internal which isn't exposed via its headers). All of these use cases would break with this change.

Are there any thoughts about these issues? Any ways to solve them perhaps?



> if you're working within a translation unit, that's much simpler, but then you're much more limited in what you can do without repeating a lot of code. I wonder how the author solves this.

You are correct in that the source code to the function being evaluated must be available to the compiler. This can be done with #include. I do it in D with importing the modules with the needed code.

> This is already somewhat possible if you can express your test as a macro, and if you add in the first point, this becomes trivial.

Expressing the test as a macro doesn't work when you want to test the function. The example I gave was trivial to make it easy to understand. Actual use can be far more complex.

> Performance

D is faster at compiling than C compilers, mainly because:

1. the C preprocessor is a hopeless pig with its required multiple passes. I know, I implemented it from scratch multiple times. The C preprocessor was an excellent design choice when it was invented. Today it is a fossil. I'm still in awe of why C++ has never gotten around to deprecating it.

2. D uses import rather than #include. This is just way, way faster, as the .h files don't need to be compiled over and over and over and over and over ...

D's strategy is to separate the parse from the semantic analysis. I suppose it is a hair slower, but it also doesn't have to recompile the duplicate declarations and fold them into one.

Compile time function execution can be a bottleneck, sure, but that (of course) depends on how heavily it is used. I tend to use it with a light touch and the performance is fine. If you implement a compiler using it (as people have done!) it can be slow.

> one of the ways I have kind of implemented templating in C is by defining a variable, importing a C file, changing the variable, and then re-importing the same C file. Another thing I've done is define a bunch of things and then import the SQLite C Amalgamation and then add another function (I do this to expose a SQLite internal which isn't exposed via its headers). All of these use cases would break with this change.

I am not suggesting removing #include for C. The import thing would be additive.

> Are there any thoughts about these issues?

If you're using hacks to do templating in C, you've outgrown the language and need a more powerful one. D has top shelf metaprogramming - and as usual, other template languages are following in D's path.


Thanks for taking the time to respond! I have a few follow-up questions if that's OK:

> You are correct in that the source code to the function being evaluated must be available to the compiler. This can be done with #include. I do it in D with importing the modules with the needed code.

> D's strategy is to separate the parse from the semantic analysis. I suppose it is a hair slower, but it also doesn't have to recompile the duplicate declarations and fold them into one.

I don't quite follow all the implications of these statements. Does the compiler have a different way of handling a translation unit?

- Is a translation unit the same as in C? Since you're #including the file, you would expect multiple compilations of a re-included C file. Wouldn't this bloat the resulting executable (or bundle, in the case of a library)?

- Are multiple translation units compiled at a time? Wouldn't this mean that the entire translation dependency graph would need to be simultaneously recompiled? Wouldn't this inhibit parallelization? How would it handle recompilation? What happens if a dependency is already compiled? Would it recompile it?

> Performance

I think a lot of this is tied to my question about compilation/translation units above. From my past experience, we have "header hygiene" rules that force us to use headers in a specific way, and if we follow them we actually get really good preprocessor performance (a simple example being: don't use #include in a header). How would you compare performance in these kinds of situations vs. a compiler without headers (i.e. one that either recompiles a full source file or looks up definitions from a compiled source)?

> If you're using hacks to do templating in C, you've outgrown the language and need a more powerful one. D has top shelf metaprogramming - and as usual, other template languages are following in D's path.

Yes, as the performance question also demonstrates, we do a lot to work within the confines of what we have when other tools would handle much more of the lifting for us. That's a fair criticism, but on the flip side, I don't have the power to make large decisions on an existing codebase like "let's switch languages" (even for a source file or two... I've tried), as much as I wish I could, so I have to work with what I have.


> I don't have the power to make large decisions on an existing codebase like "let's switch languages"

We struggled with that for a long time with D. And finally found a solution. D can compile Standard C source files and make all the C declarations available to the D code. When I proposed it, there was a lot of skepticism that this could ever work. But once it was implemented and debugged, it turned out to be a huge win for D.

> Performance

With D you can put all your source files on one command line invocation. That means imports are only read once, no matter how many times they are imported. This works so well that D users have generally abandoned the C approach of compiling each file individually and then linking them together. A vast amount of time is lost in C/C++ compilation simply from reading the .h files thousands of times.

Modules/imports are a gigantic productivity booster. They're not hard to implement, either. Except for the way C++ did it.

> Are multiple translation units compiled at a time? Wouldn't this mean that the entire translation dependency graph would need to be simultaneously recompiled? Wouldn't this inhibit parallelization? How would it handle recompilation? What happens if a dependency is already compiled? Would it recompile it?

Yes, yes, yes, yes. And yet, it still compiles faster! See what I wrote above about not needing to read the .h files thousands of times. Oh, and building one large object file is faster than building a hundred and having to link them together.


I know that in other languages, one obstacle for "just compile the C files" is that the target language might not have pointers and thus have difficulty representing things such as return-by-pointer.

I suppose in D this was less of an issue because D has pointers?


I'm not sure what you mean.


A foreign function interface that's based on parsing C files must translate C types and interfaces into types and interfaces of the target language. I suppose it helped that D's type system has many similarities with C, including support for pointers.

(The issue with return-by-pointer is that in C it's common to use the return value for an error code and use pointer arguments to pass data back to the caller. These are awkward to map to a target language that doesn't have pointers)


> Is a translation unit the same as in C? Since you're #including the file, you would expect multiple compilations of a re-included C file. Wouldn't this bloat the resulting executable (or bundle, in the case of a library)?

I think the idea is that compiling a translation unit produces two outputs, the object code (as it currently does), and an intermediate representation of the exported declarations, that could be basically a generated .h file, but it would probably be more efficient to use a different format. Then dependent translation units use those declaration files.

With this, you can still compile in parallel. You are constrained by the order of dependencies, but that is already kind of the case.

One complication is that ideally, if the signature doesn't change, but the implementation does, you don't need to re-compile dependent translation units. This is trivial if your build system detects changes based on content (like, say, bazel), but if it uses timestamps (like make) then the compiler needs to ensure the timestamp isn't updated when the declarations don't change.

But this really isn't a new concept. Basically every modern compiled language works fine without needing separate header files.


> This is trivial if your build system detects changes based on content (like, say, bazel), but if it uses timestamps (like make) then the compiler needs to ensure the timestamp isn't updated when the declarations don't change.

This is where the traditional distinction of "compiler vs Make" makes things harder; you want dependencies tracked at the "declaration" level, rather than the file level. If the timestamp _and_ content of the exported declarations file change, but none of the _used_ declarations changed, then there's no more compilation to be done. At best with file level tracking your build system will invoke the compiler for every downstream dependency, and they can decide if there's any more work to be done.

The build system would need to know which declarations are used (and what a declaration is) to do better.


The D compiler has an option to generate a "header file" from D modules. It's called a .di file. It's useful if you want to hide the implementation from a compiler, as you would with libraries.

As it turned out, though, people found it too convenient to just import the .d file.

But as a very unexpected dividend, it was discovered that the D compiler could generate .di files from compiling .c files, meaning D had an inherent ability to translate C code to D code! This has become rather popular.


Nice explanation. Modules are the way forward. It looks like they always have been. I don't understand the resistance, when the advantages are clear.


I do understand the resistance. C is a simple, comfortable language, and its adherents want it to stay that way, warts and all.

But in the context of that, what baffles me is the additions to the C Standard, such as useless (but complicated!) things like normalized Unicode identifiers, things with very marginal utility like generic functions, etc. Why those and not forward declarations?


Can't you use precompiled headers?


Interesting you brought that up. I implemented them for Symantec C and C++ back in the 90s.

I never want to do that again!

They are brittle and a maintenance nightmare. They did speed up compilations, though, but did not provide any semantic advantage.

With D I focused on fast compilation so much that precompiled headers didn't offer enough speedup to make them worth the agony.


>They are brittle and a maintenance nightmare

I happened to be reading the DMC source this week; that hydrate/dehydrate stuff really is everywhere (which I assume is used solely for precompiled headers?)


Yup. I spent a crazy amount of time debugging that. The tiniest mistake was a big problem to find.


I had an intern try to use precompiled headers for the Linux kernel. The roadblock they found was that the command line parameters used to compile the header must exactly match across all translation units in which it is used. This is not the case for the Linux kernel. We could compile the header multiple times, but the build complexity was not something we could overcome during the course of one internship.


> must exactly match

Yup. My compiler kept a list of which switches would perturb compilation and so would invalidate the precompiled header, and which did not.

Precompiled headers are an awful, desperate feature. Good riddance.


I personally don’t like forward referencing because it makes code harder to read. You can no longer rely on the dependency graph being in topological order.


As the article notes, that forces the private leaf functions to be at the top, with the public interface at the end of the file. The normal way is the public interface at the top, and the implementation "below the fold", so to speak.

> topological order

You are correct. But it's the reverse topological order, which is not the most readable ordering. One doesn't read a newspaper article starting at the bottom.


Maybe it’s because I’m primarily a mathematician, but I like building complex stuff up from primitives and having the most important results at the end.


That's not how I do things in math. I always need motivation first. So I start with the theorem, look at a couple of examples to see why this theorem is interesting, and then the various lemmas leading into the proof. So that means I really like declaring but not defining the public interface first, and then define the private helper functions, and finally definitions for the public interface.


Vive la différence - and you'll still be able to do it your way!


Perhaps the difference is having the algorithm in your head and just putting it into code, versus only knowing the top-level work to be done and implementing the needed operations later.

If I am writing some kind of service, I would write the main public functions first, using undefined functions in their bodies as needed. Then I would implement those functions below.


I think you mean it in the context of proofs, right? Proofs are indeed often best written in topological order: a series of true statements, where every reference refers backwards.

You don't often see

    Answer = A + B, 
    where 
    
    A = ...
    ...
    B = ...
albeit you sometimes see it, and it is totally valid. For proofreading, it makes a big difference: if things are in topological order, you can simulate a constant-memory finite machine. If they are not, well, you're probably better off just rewriting it (or at least I am).

For most other things, I usually prefer the bird's-eye view first, whether I am doing my own math or reading someone else's.

Funnily enough, Haskell, which operates on definitions, is very order-independent; it even allows circular definitions. I like it for leetcode and such.


People learn the ordering. If that is their biggest hurdle in learning C, they have a blessed life.


Every other language seems to not require header files/forward declarations. I don't understand the backlash against that.

Are modern C compilers actually still single pass?


> Are modern C compilers actually still single pass?

All except ImportC, which effortlessly handles forward references. (Mainly because ImportC hijacks the D front end to do the semantics.)


A bit of an aside, but I was poking around in the SPIR-V spec yesterday, and they can do forward references because the call site contains all the information needed to determine the function parameter types. Just thought it was interesting; not really something I had thought about before.



