RV32E is a big part of it, since it effectively uses only 13 registers, which makes it possible to write a recompiler that doesn't have to spill any of them to memory on AMD64.
But there are a few other aspects of RISC-V which are suboptimal for a VM like mine, which I handle by postprocessing the standard ELF binaries with a custom linker. Some of the major things are:
- The instruction encoding format. While it's great for hardware execution, for a VM it can be both simplified and made more compact. For example, I encode branches as: a byte with an opcode + a byte with two 4-bit registers packed + a varint with the absolute address to jump to. This is more space efficient, naturally supports fully sized immediates and doesn't require the crazy bit shuffling to extract them.
- The limited range of immediates. Due to the use of varints I support full length immediates, and I macro-fuse LUI + ADDI etc. in my linker into a single ADDI.
- The zero register. This is only a problem because I must support AMD64 as a target, and there just isn't a free register there where I could park a zero in. This can be special-cased in the VM, but it is kinda tricky and annoying (and a lot of these cases are never emitted by the compiler, so you have a bunch of complexity and effectively useless dead code). So what I do is that I disallow the zero register in the bytecode, and when postprocessing the code in my linker I either just replace those instructions (e.g. if there's a MUL which uses the zero register as one of the source registers I compute the result and put an immediate load there instead), get rid of them (e.g. if the instruction uses it as a destination then it's a NOP) or convert them into new instructions which take any immediate and not only a zero.
- The alignment requirements of code addresses. I don't load any of the code into guest visible memory (it's effectively a Harvard architecture VM), and I virtualize code addresses so that they don't address individual instructions, but basic blocks (so e.g. an address of "4" points to the 1st basic block, "8" points to the 2nd, etc.). This has a couple of benefits: it reduces the attack surface (you can't ROP your way around anywhere, only to the start of basic blocks), is more compact to encode (since there are fewer basic blocks than instructions the addresses are smaller and you need fewer bits to encode them as varints), makes it possible to efficiently emit code for basic block based gas metering in a single pass without having to analyze the control flow, etc. Unfortunately I still need to handle the case of "what if someone dynamically jumps to address 3", and there's no way to work around this issue in my linker, so at run time I have to create a jump table (for translating guest addresses into host addresses) that's 4 times bigger than necessary. This wastes memory and introduces extra cache misses. (I can later drop this down to 2 if I add support for the C extension, but that will still make it twice as big as necessary.)
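To make the branch encoding from the first point concrete, here's a rough sketch in Python. It is not the actual VM code: the opcode value and function names are made up for illustration, and the varint flavor is assumed to be unsigned LEB128 since the exact format isn't specified above. For contrast it also includes the bit shuffling needed to pull the immediate out of a standard RISC-V B-type branch.

```python
from typing import Tuple

OP_BRANCH_EQ = 0x10  # hypothetical opcode number, chosen only for this example


def encode_varint(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last byte.
    (Assumed varint flavor; the real bytecode may use a different one.)"""
    if value < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Returns (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def encode_branch(opcode: int, rs1: int, rs2: int, target: int) -> bytes:
    """Opcode byte + one byte with two packed 4-bit registers + varint absolute target."""
    if not (0 <= rs1 < 16 and 0 <= rs2 < 16):
        raise ValueError("registers must fit in 4 bits")
    return bytes([opcode, (rs1 << 4) | rs2]) + encode_varint(target)


def decode_branch(buf: bytes, pos: int = 0) -> Tuple[int, int, int, int, int]:
    """Returns (opcode, rs1, rs2, target, position of the next instruction)."""
    opcode = buf[pos]
    regs = buf[pos + 1]
    target, pos = decode_varint(buf, pos + 2)
    return opcode, regs >> 4, regs & 0xF, target, pos


def riscv_b_imm(insn: int) -> int:
    """For comparison: the branch offset of a standard 32-bit RISC-V B-type instruction."""
    imm = (
        (((insn >> 31) & 0x1) << 12)    # imm[12]
        | (((insn >> 7) & 0x1) << 11)   # imm[11]
        | (((insn >> 25) & 0x3F) << 5)  # imm[10:5]
        | (((insn >> 8) & 0xF) << 1)    # imm[4:1]
    )
    # Sign-extend the 13-bit result.
    return imm - (1 << 13) if imm & (1 << 12) else imm
```

With this layout a branch to any target below 128 takes 3 bytes, and larger targets just grow the varint instead of needing a second instruction to build the address.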
---------
So my number one wish for RISC-V that'd make it nicer for VMs (and wouldn't be VM specific) would probably be an extension which supports full length immediates and forces the minimum alignment of code addresses to be 1.
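To put a number on the alignment half of that wish, here's a small sketch of the jump table problem in Python. It assumes the table is a flat array indexed directly by guest address, with address 0 reserved as invalid; that's one plausible reading of the description, not necessarily how the real VM lays it out, and all the names are made up.

```python
def build_jump_table(block_host_addrs, alignment=4):
    """Maps guest address alignment * (i + 1) to the host address of basic block i.
    Every other slot stays None, i.e. an invalid jump target."""
    table = [None] * (alignment * len(block_host_addrs) + 1)
    for i, host_addr in enumerate(block_host_addrs):
        table[alignment * (i + 1)] = host_addr
    return table


def dynamic_jump(table, guest_addr):
    """Translates a guest code address into a host address, trapping on bad targets."""
    if 0 <= guest_addr < len(table) and table[guest_addr] is not None:
        return table[guest_addr]
    raise RuntimeError(f"trap: invalid jump target {guest_addr}")


# Three basic blocks at made-up host addresses.
blocks = [0x1000, 0x1040, 0x10A0]
aligned_4 = build_jump_table(blocks, alignment=4)  # 13 slots, 10 of them unused
aligned_1 = build_jump_table(blocks, alignment=1)  # 4 slots, only slot 0 unused
```

A jump to guest address 3 has to land somewhere in the table so it can trap, which is why the padding slots can't simply be dropped; with a minimum alignment of 1 they wouldn't exist in the first place.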