like what, a few microseconds? What are you doing where you're awaiting for things in parallel where that matters? HPC? We're dispatching things that take on the order of minutes. Typically a local network request has 10-20 milliseconds of latency on our office LAN, so whatever. Clean and comprehensible code with very little boilerplate is more important when I'm reviewing my junior's code.
I think if you're striving for that, then a bit of complexity is warranted. Not everything has to be simple, and async is hard to do correctly without the correct abstractions. Honestly, though I was hoping Rust would go with the Actix way of doing things, but that's fine. You don't have to use Rust's async.