
JIT compilers are able to take advantage of them, because you don't get a binary set in stone that has to run everywhere.

This is the main reason why Apple is now pushing LLVM bitcode, Android still uses dex even when AOT compiling, and WP uses MDIL with AOT compilation at the store.

So regardless of what an OEM decides for their mobile device, in theory, it is possible to make the best use of the chosen CPU.

This is actually quite common in the mainframes, with AS/400 (now IBM i) being one of the most well known ones.



The AS/400 is more like an AOT compiler than a JIT. When I hear JIT I think of opportunistically compiling portions of a program while falling back to an interpreter.

The way AS/400 works, IIUC, is that the compiler compiles to an intermediate byte code, which has remained stable for decades. When the program is first loaded, the entire program is compiled to the native architecture, cached, and then executed like any other binary.

The reason why JIT environments generally aren't competitive with AOT compiling is all the instrumentation necessary. A JIT environment is usually composed of an interpreter[1] which, using various heuristics, decides to compile segments of code to native code. But the logic for deciding which segments to compile, when to reuse a cached chunk, etc., is complex, especially the pieces that keep track of data dependencies. Also, each chunk requires instrumentation code for transitioning from and back into the interpreter.[2] For this and other reasons JIT'd environments aren't competitive with AOT environments except for simple programs or programs with very high code and data locality (i.e. spending most of the time inside a single compiled chunk, such as a small loop).

Large or complex programs don't JIT very well. Even if the vast majority of the execution time is spent within a very small portion of the program, if data dependencies or executions paths are complex (which is usually the case) all the work spent managing the JIT'd segments can quickly add up. Programs that would JIT well also tend to be programs that vectorize well, and if they vectorize well AOT compilers also benefit and so JIT compilers are still playing catch-up. (One benefit (albeit only short term) for JIT compilers is that some performance optimizations are easier to add to JIT compilers because you don't have to worry about ABIs and other baggage; you iterate the compiler implementation faster.)

I increasingly hear the term JIT used in the context of GPU programming, where programs are generated and compiled dynamically for execution on SIMD cores. But that's much more like AOT compilation. The implementation stack and code generation rules are basically identical to a traditional AOT compiler and very little like the JIT environments for Java or JavaScript. The only similarity is that compilation happens at run-time, but you can analogize that with, for example, dynamically generating, compiling, and invoking C code. Which, actually, isn't uncommon. It's how Perl's Inline::C works, and how TinyCC is often used.

[1] Mike Pall has said that the secret to a fast JIT environment is a fast interpreter.

[2] So, for example, calling into a module using the Lua C API is faster in PUC Lua than LuaJIT. LuaJIT has to spill more complex state, whereas the PUC Lua interpreter is just calling a C function pointer--the spilling is much simpler and has already been AOT compiled.


I would disagree with almost everything you said here.

JIT compilers can beat equivalent AOT compilers by about 20%, or at least, that's the kind of loss you get in HotSpot from not doing profile-guided compilation and doing it all AOT instead. So that point seems wrong. If you're comparing Java and C++, well, that is affected by many things, and you can quite easily construct microbenchmarks where Java beats C++. In real programs, though, it's usually either a tie or the other way around, mostly because Java has a quite low-density memory layout (currently).

It turns out that the "hot spot" theory is mostly correct, in that most programs do spend most of their time in small parts of the program, and large quantities of code may only ever be executed once or twice. That's why you can write Java programs that are competitive with or beat C++ programs (see the TechEmpower server benchmarks for some examples of this happening) despite the fact that not all the code is compiled.

JIT compilers tend to be better at unguided vectorisation than AOT compilers because they know what CPU features are available to them at compile time. Again, you may be confusing multiple unrelated factors together: in practice, most C++ programs will probably use vectorisation more heavily than Java programs do because C++ has features that let the programmer explicitly invoke them so the compiler doesn't have to figure out where the instructions can be used by itself. But that is unrelated to when the compiler runs: you can add vector intrinsics to Java and there's a project doing exactly that.


> I would disagree with almost everything you said here.

Reality seems to agree with the parent. Case in point: there are no production-level JIT compilers for C/C++.

The fact that a Java AOT compiler is not competitive with HotSpot might have more to do with the maturity of HotSpot and the amenability of Java to AOT compilation.

> JIT compilers tend to be better at unguided vectorisation than AOT compilers because they know what CPU features are available to them at compile time.

Runtime dispatching to specialized functions makes this a moot point. I'm not aware of the state of the art, but last I heard HotSpot wasn't particularly great at vectorization.


There is a JIT compiler for C/C++, it's called Sulong. However, it's a research project indeed.

The 20% comparison is HotSpot in AOT mode (it's being developed) vs HotSpot in JIT mode. So I think it's one of the best figures you're going to get. Speculative, profile-guided optimisations aren't going to double your speed or anything like that, but a 20% win is big enough to matter: it's like getting a free additional core on contemporary machines.

Yes you can do runtime dispatching to different functions, but how many apps actually do so? I've seen a lot of software that doesn't bother, or only has a "plain vanilla" and a "low-rev SSE" version.

HotSpot has got steadily better at auto-vectorisation with time. The unreleased Java 9 version has had a lot of patches from Intel go in that make it better at using the latest vector units more frequently.


> yes you can do runtime dispatching to different functions, but how many apps actually do so? I've seen a lot of software that doesn't bother, or only has a "plain vanilla" and a "low-rev SSE" version.

Those that do care, I guess? Games, video encoders/decoders. Most custom HPC applications are simply compiled for whatever architecture is running on the cluster and don't bother with anything else.


I used the term JIT because many tend to associate AOT with native compilation when it happens before the binary is shipped to the customers.

Regarding AS/400, doesn't the documentation refer to it as kernel JIT?

Thanks for the explanation.



