It’s kind of annoying how static compilers insist on figuring out everything from scratch. Even in an LTO build, the amount of code that changes between two consecutive commits is typically minimal. Rather than the very coarse ccache-style approach, it would be nice if compilers had a way to cache state across executions, so that unchanged code constructs could just emit the same assembly as last time without needing to do anything else.
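For illustration, here is roughly what that finer-grained caching could look like: key the cached output on a content hash of each function's IR, so an unchanged function skips codegen entirely. All names below are hypothetical; a real compiler would also have to fold optimization flags, the target triple, and cross-function context into the key.

```python
import hashlib

# Toy cache: maps a hash of a function's (post-frontend) IR to the
# machine code emitted for it last time. Hypothetical sketch, not any
# real compiler's design.
codegen_cache = {}

def compile_function(name, ir_text):
    key = hashlib.sha256(ir_text.encode()).hexdigest()
    if key in codegen_cache:
        return codegen_cache[key]      # unchanged since last build: reuse
    asm = slow_codegen(name, ir_text)  # the expensive part
    codegen_cache[key] = asm
    return asm

def slow_codegen(name, ir_text):
    # Stand-in for instruction selection, scheduling, regalloc, etc.
    return f"{name}: ; {len(ir_text)} bytes of IR lowered here"
```

The whole point is that the second compilation of identical IR is a dictionary lookup rather than a full backend run.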
The first objection that springs to my mind is that this approach gives up a lot of chances for inlining (or de-inlining, for that matter). The second is that it also heavily interferes with any code-motion transformations, including instruction scheduling, loop unrolling, and auto-vectorization. Granted, this second objection can probably be dealt with by some clever fine-grained techniques... but at that point I'd wager they'd a) be about as slow as doing all the work from a clean slate, and b) have an incredibly large surface for obscure bugs, and those passes are already among the most finicky.
And back in the 70s, when microcomputers and interactive computing were becoming the norm and people really cared about compilation times, still nobody bothered to implement that kind of caching, AFAIK, even though compilers performed far less complex optimizations than they do today (heck, even register allocation wasn't done as graph colouring back then).
> The first objection that springs to my mind is that this approach gives up a lot of chances for inlining (or de-inlining, for that matter). The second is that it also heavily interferes with any code-motion transformations, including instruction scheduling, loop unrolling, and auto-vectorization.
I feel like a lot of these sorts of optimizations are best done at the language level, before the code gets down to the instruction level; e.g. fusion, supercompilation, and partial evaluation are all more general and far more effective than these low-level rewrites. If you've already done those optimizations, then you just want to spit out code as fast as possible.
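As a toy illustration of why language-level optimization leaves little for the backend to do, here is partial evaluation of a power function with a known exponent: the residual code is straight-line, with no loop left to unroll or vectorize. This is a sketch of the general idea, not any particular compiler.

```python
# Toy partial evaluation: specialize pow(x, n) for a known n "at
# compile time", producing a residual function that is pure
# straight-line multiplication. Names are illustrative.
def specialize_pow(n):
    body = "1"
    for _ in range(n):
        body = f"({body}) * x"
    src = f"def pow_{n}(x): return {body}"
    scope = {}
    exec(src, scope)          # "compile" the residual program
    return scope[f"pow_{n}"]

pow_5 = specialize_pow(5)     # equivalent to x*x*x*x*x, no loop
```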
The low-level transformation that matters the most is probably register allocation. Good register allocation is responsible for a lot of OCaml's speed. I'm not clear on whether that would be significantly impacted by copy-and-patch.
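For reference, the textbook formulation is graph colouring over an interference graph: variables are nodes, an edge means two variables are live at the same time, and colours are registers. A deliberately naive greedy sketch of that idea (not OCaml's actual allocator):

```python
# Greedy graph-colouring register allocation sketch. `interference`
# maps each variable to the variables it is simultaneously live with;
# `registers` is the available colour set. Highest-degree nodes are
# coloured first; a node with no free colour is spilled to memory.
def allocate(interference, registers):
    assignment = {}
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in registers if r not in taken]
        assignment[var] = free[0] if free else "SPILL"
    return assignment
```

With three mutually interfering variables and two registers, one of them necessarily spills, which is exactly the cost a good allocator works hard to avoid.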
No matter how you slice it, almost all the time spent in a compiler is wasted recreating optimizations for things that haven't changed. Back in the 70s, disk I/O was slow and memory was scarce. Compilers couldn't cache much for that reason. Compilers today are designed for constraints that no longer exist.
> Back in the 70s, disk I/O was slow and memory was scarce. Compilers couldn't cache much for that reason.
That's... not exactly true? Yes, memory was scarce but that's precisely why the compilers in the 70s used disk I/O for temporary storage way more extensively than the modern ones do. Let's take Unix V6, for example: the C compiler used two temporary disk files to keep data between its three passes, the assembler used three temporary files to keep data between its two passes, and I can't quite wrap my head around how many temporaries its linker used but it's at least 4. Nobody bothered to try to keep and reuse those temporaries, however.
Compilers today, instead, keep almost everything in memory and don't generally bother with temporary files at all because why would they? The growth of the computing speed has outpaced the memory/disk speeds anyhow.
If you’re going for very fast codegen, sure. But in a production build you typically have problems like “this shifted by one instruction, so every non-relative branch beyond it needs to be patched”.
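A toy model of that patching problem, assuming branches stored as plain absolute addresses (the representation here is invented for the sketch): once code grows at some point, every address at or beyond that point moves, so even untouched functions need their branches rewritten.

```python
# Toy branch patching: `branches` is a list of (source_addr,
# target_addr) pairs with absolute addresses. If code at address `at`
# grows by `delta` bytes, every address at or past the insertion point
# shifts, even in code that did not otherwise change.
def patch_branches(branches, at, delta):
    patched = []
    for src, tgt in branches:
        new_src = src + delta if src >= at else src
        new_tgt = tgt + delta if tgt >= at else tgt
        patched.append((new_src, new_tgt))
    return patched
```

This is why cached per-function machine code alone doesn't finish the job: the link/layout step still has to touch everything downstream of the change.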
Rust's incremental compilation does this. Unfortunately, it doesn't help with linking, but it speeds up compilation by reusing everything that didn't change.
I could be wrong but I think Rust’s incremental compilation is still pretty coarse. LLVM is where this would need to be implemented and that’s a tall ask (i.e. skipping expensive computation when the AST is the same or close enough to a previously computed result). And Rust’s incremental compilation doesn’t apply to LTO. There’s only so much you can do from outside the compiler toolchain.