Well, Lustre RPC doesn't use on-disk data structures on the wire, though that is indeed an interesting idea.
In Lustre RPC, _control_ messages go over one channel, and each one is one or more C structures with sender encoding hints so the receiver can make it right. Any variable-length payloads go in separate chunks trailing the C structures.
Bulk _data_, on the other hand, is done with RDMA, and there are no C structures in sight for that.
Capnp sounds about right for encoding rules. The way I'd do it:
- target 64-bit architectures
  (32-bit senders have to do work
  to encode, but 64-bit senders
  don't)
- assume C-style struct packing
  rules in the host language
  (but not #pragma packed)
- use an arena allocator
- transmit {archflags, base pointer, data}
- receiver makes right:
  - swab if necessary
  - fix interior pointers
  - fail if there are pointers
    to anything outside the
    received data
  - convert to 32-bit if the
    receiver is 32-bit
(That's roughly what Lustre RPC does.)
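To make the "receiver makes right" steps concrete, here is a minimal sketch in C. It is not Lustre's code. The `ARCH_BIG_ENDIAN` flag, the `wire_node` layout, and `fix_list` are all hypothetical names made up for illustration, and the sketch assumes the root node sits at offset 0 of the received data and that the payload is a plain singly linked list (cycles are rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARCH_BIG_ENDIAN 0x1u  /* hypothetical archflags bit: sender is big-endian */

/* Example payload: a list node in 64-bit layout with natural alignment. */
struct wire_node {
    uint64_t next;   /* sender's address of the next node, 0 = end of list */
    uint32_t value;
    uint32_t pad;    /* explicit padding, keeps sizeof == 16 */
};

static uint32_t swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint64_t swab64(uint64_t v)
{
    return ((uint64_t)swab32((uint32_t)v) << 32) | swab32((uint32_t)(v >> 32));
}

static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 0;
}

/*
 * Receiver side of {archflags, base pointer, data}: swab if the sender's
 * byte order differs, rewrite each interior pointer from the sender's
 * address space into ours, and fail on any pointer that leaves the data.
 * Returns 0 on success, -1 on a bad pointer or a cycle.
 */
static int fix_list(uint32_t archflags, uint64_t sender_base,
                    void *data, size_t len)
{
    int swap = ((archflags & ARCH_BIG_ENDIAN) != 0) != host_is_big_endian();
    unsigned char *local = data;
    size_t max_nodes = len / sizeof(struct wire_node);
    size_t off = 0;

    assert(data != NULL);
    if (len < sizeof(struct wire_node))
        return -1;

    for (size_t i = 0; i < max_nodes; i++) {
        struct wire_node *n = (struct wire_node *)(local + off);
        uint64_t next_off;

        if (swap) {
            n->next = swab64(n->next);
            n->value = swab32(n->value);
        }
        if (n->next == 0)
            return 0;                      /* end of list, all pointers fixed */
        if (n->next < sender_base)
            return -1;                     /* points below the arena */
        next_off = n->next - sender_base;
        if (next_off % _Alignof(struct wire_node) != 0 ||
            next_off > len - sizeof(struct wire_node))
            return -1;                     /* misaligned or outside the data */
        n->next = (uint64_t)(uintptr_t)(local + next_off);
        off = (size_t)next_off;
    }
    return -1;                             /* more hops than nodes: a cycle */
}
```

After `fix_list` returns 0, each `next` field holds a valid local address and can be cast back to `struct wire_node *`. A real implementation would be driven by the schema rather than hard-coded for one struct, and the "convert to 32-bit" step is left out here.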
As for syntax, I'd build an "ASN.2" that has a syntax that's parseable with LALR(1), dammit, and which is more like what today's devs are used to, but which is otherwise 100% equivalent to ASN.1.
Out of curiosity, why not use offsets instead of pointers? That's what capnp does. I assume offset calculation is going to be efficient on most platforms. This removes the need for fixing up pointers; instead you just need to check bounds.
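For comparison, a minimal sketch of the offset approach in C. This is not Cap'n Proto's actual encoding: it assumes plain byte offsets from the start of the buffer, with 0 meaning "no next node" (the root lives at offset 0, so nothing can point to it). The `off_node`, `node_at`, and `sum_list` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* List node whose link is a byte offset from the buffer start, 0 = end. */
struct off_node {
    uint32_t next_off;
    uint32_t value;
};

/* Bounds-checked dereference: NULL if the offset is misaligned or
 * the node would not fit entirely inside the buffer. */
static const struct off_node *node_at(const void *buf, size_t len, uint32_t off)
{
    if (off % _Alignof(struct off_node) != 0)
        return NULL;
    if (len < sizeof(struct off_node) || off > len - sizeof(struct off_node))
        return NULL;
    return (const struct off_node *)((const unsigned char *)buf + off);
}

/* Walk the list from offset 0 and add up the values. Nothing in the
 * buffer is rewritten; every hop is checked instead.
 * Returns 0 on success, -1 on a bad offset or a cycle. */
static int sum_list(const void *buf, size_t len, uint64_t *sum)
{
    size_t max_hops = len / sizeof(struct off_node);
    uint32_t off = 0;

    assert(sum != NULL);
    *sum = 0;
    for (size_t i = 0; i < max_hops; i++) {
        const struct off_node *n = node_at(buf, len, off);

        if (n == NULL)
            return -1;
        *sum += n->value;
        if (n->next_off == 0)
            return 0;
        off = n->next_off;
    }
    return -1;   /* more hops than nodes can exist: a cycle */
}
```

The buffer stays read-only, but every traversal has to carry `buf` and `len` along, which is the bookkeeping cost of this design.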
It's more work for the sender, but the receiver still has to do the same amount of work as before to get back to actual pointers. So it seems like pointless work.
Having actual interior pointers means not having to deal with pointers as offsets when using these objects. The programming language could hide those details, but that means knowing or keeping track of the root object whenever traversing interior pointers, which could be annoying. The alternative is to encode each interior pointer as a pair: an offset to the root and an offset to the pointed-to item. That would be OK, and then the programming language can totally hide the fact that interior pointers are offset pairs.
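A rough sketch of the offset-pair idea in C, just to show that dereferencing needs nothing but the pointer field itself. The `ipair` type and its helpers are hypothetical, 0 is assumed to mean "null" (so the root itself cannot be a target), and the sketch assumes the pairs were already validated against the buffer length on receipt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* An interior pointer stored as two offsets: how far back the root is
 * from this field, and where the target sits relative to the root. */
struct ipair {
    uint32_t to_root;   /* bytes from the root up to this field */
    uint32_t to_item;   /* bytes from the root to the target, 0 = null */
};

struct item {
    struct ipair next;
    uint32_t value;
    uint32_t pad;
};

/* Sender side: record where `item` lives relative to `root`. */
static void ipair_set(struct ipair *p, const void *root, const void *item)
{
    assert((const unsigned char *)root <= (const unsigned char *)p);
    p->to_root = (uint32_t)((const unsigned char *)p -
                            (const unsigned char *)root);
    p->to_item = item == NULL ? 0 :
                 (uint32_t)((const unsigned char *)item -
                            (const unsigned char *)root);
}

/* Receiver side: recover the root from the field's own address,
 * then the target from the root. No root argument is needed. */
static void *ipair_deref(struct ipair *p)
{
    unsigned char *root;

    if (p->to_item == 0)
        return NULL;
    root = (unsigned char *)p - p->to_root;
    return root + p->to_item;
}
```

Because both offsets are relative, the whole arena can be copied or received at a different address and the pairs still resolve without any fix-up pass.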
I've a feeling that fixing up pointers is the more interoperable approach, but it's true that it does more memory writes. In any case all interior pointers have to be validated on receiving -- I don't see how to avoid that (bummer).