It’s kind of annoying how static compilers insist on figuring out everything from scratch. Even in an LTO build, the amount of code that changes between two consecutive commits is typically minimal. Rather than the very coarse ccache-style approach, it would be nice if compilers had a way to cache state across executions, so that unchanged code constructs could just emit the same assembly as last time without needing to do anything else.
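For illustration, here is roughly what that finer-grained caching could look like: key the cached output on a content hash of each function's IR, so an unchanged function skips codegen entirely. All names below are hypothetical; a real compiler would also have to fold optimization flags, the target triple, and cross-function context into the key.

```python
import hashlib

# Toy cache: maps a hash of a function's (post-frontend) IR to the
# machine code emitted for it last time. Hypothetical sketch, not any
# real compiler's design.
codegen_cache = {}

def compile_function(name, ir_text):
    key = hashlib.sha256(ir_text.encode()).hexdigest()
    if key in codegen_cache:
        return codegen_cache[key]      # unchanged since last build: reuse
    asm = slow_codegen(name, ir_text)  # the expensive part
    codegen_cache[key] = asm
    return asm

def slow_codegen(name, ir_text):
    # Stand-in for instruction selection, scheduling, regalloc, etc.
    return f"{name}: ; {len(ir_text)} bytes of IR lowered here"
```

The whole point is that the second compilation of identical IR is a dictionary lookup rather than a full backend run.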
The first objection that springs to my mind is that this approach gives up a lot of chances for inlining (or de-inlining, for that matter). The second is that it also heavily interferes with any code-motion transformations, including instruction scheduling, loop unrolling, and auto-vectorization. Granted, this second objection can probably be dealt with by some clever fine-grained techniques... but at that point I'd wager they'd a) be about as slow as doing all the work from a clean slate, and b) have an incredibly large surface for obscure bugs, and those passes are already among the most finicky.
And back in the 70s, when microcomputers and interactive computing were becoming the norm and people really cared about compilation times, still nobody bothered to implement that kind of caching, AFAIK, even though compilers performed far less complex optimizations than they do today (heck, even register allocation wasn't done as graph colouring back then).
> The first objection that springs to my mind is that this approach gives up a lot of chances for inlining (or de-inlining, for that matter). The second is that it also heavily interferes with any code-motion transformations, including instruction scheduling, loop unrolling, and auto-vectorization.
I feel like a lot of these sorts of optimizations are best done at the language level, before the code gets down to the instruction level; e.g. fusion, supercompilation, and partial evaluation are all more general and far more effective than these low-level rewrites. If you've already done those optimizations, then you just want to spit out code as fast as possible.
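As a toy illustration of why language-level optimization leaves little for the backend to do, here is partial evaluation of a power function with a known exponent: the residual code is straight-line, with no loop left to unroll or vectorize. This is a sketch of the general idea, not any particular compiler.

```python
# Toy partial evaluation: specialize pow(x, n) for a known n "at
# compile time", producing a residual function that is pure
# straight-line multiplication. Names are illustrative.
def specialize_pow(n):
    body = "1"
    for _ in range(n):
        body = f"({body}) * x"
    src = f"def pow_{n}(x): return {body}"
    scope = {}
    exec(src, scope)          # "compile" the residual program
    return scope[f"pow_{n}"]

pow_5 = specialize_pow(5)     # equivalent to x*x*x*x*x, no loop
```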
The low-level transformation that matters the most is probably register allocation. Good register allocation is responsible for a lot of OCaml's speed. I'm not clear on whether that would be significantly impacted by copy-and-patch.
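For reference, the textbook formulation is graph colouring over an interference graph: variables are nodes, an edge means two variables are live at the same time, and colours are registers. A deliberately naive greedy sketch of that idea (not OCaml's actual allocator):

```python
# Greedy graph-colouring register allocation sketch. `interference`
# maps each variable to the variables it is simultaneously live with;
# `registers` is the available colour set. Highest-degree nodes are
# coloured first; a node with no free colour is spilled to memory.
def allocate(interference, registers):
    assignment = {}
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in registers if r not in taken]
        assignment[var] = free[0] if free else "SPILL"
    return assignment
```

With three mutually interfering variables and two registers, one of them necessarily spills, which is exactly the cost a good allocator works hard to avoid.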
No matter how you slice it, almost all the time spent in a compiler is wasted recreating optimizations for things that haven't changed. Back in the 70s, disk I/O was slow and memory was scarce. Compilers couldn't cache much for that reason. Compilers today are designed for constraints that no longer exist.
> Back in the 70s, disk I/O was slow and memory was scarce. Compilers couldn't cache much for that reason.
That's... not exactly true? Yes, memory was scarce but that's precisely why the compilers in the 70s used disk I/O for temporary storage way more extensively than the modern ones do. Let's take Unix V6, for example: the C compiler used two temporary disk files to keep data between its three passes, the assembler used three temporary files to keep data between its two passes, and I can't quite wrap my head around how many temporaries its linker used but it's at least 4. Nobody bothered to try to keep and reuse those temporaries, however.
Compilers today, instead, keep almost everything in memory and don't generally bother with temporary files at all because why would they? The growth of the computing speed has outpaced the memory/disk speeds anyhow.
If you’re going for very fast codegen, sure. But in a production build you typically have problems like “this shifted by one instruction, so every non-relative branch beyond it needs to be patched”.
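A toy model of that patching problem, assuming branches stored as plain absolute addresses (the representation here is invented for the sketch): once code grows at some point, every address at or beyond that point moves, so even untouched functions need their branches rewritten.

```python
# Toy branch patching: `branches` is a list of (source_addr,
# target_addr) pairs with absolute addresses. If code at address `at`
# grows by `delta` bytes, every address at or past the insertion point
# shifts, even in code that did not otherwise change.
def patch_branches(branches, at, delta):
    patched = []
    for src, tgt in branches:
        new_src = src + delta if src >= at else src
        new_tgt = tgt + delta if tgt >= at else tgt
        patched.append((new_src, new_tgt))
    return patched
```

This is why cached per-function machine code alone doesn't finish the job: the link/layout step still has to touch everything downstream of the change.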
Rust's incremental compilation does this. Unfortunately, it doesn't help with linking, but it speeds up compilation by reusing everything that didn't change.
I could be wrong but I think Rust’s incremental compilation is still pretty coarse. LLVM is where this would need to be implemented and that’s a tall ask (i.e. skipping expensive computation when the AST is the same or close enough to a previously computed result). And Rust’s incremental compilation doesn’t apply to LTO. There’s only so much you can do from outside the compiler toolchain.