The network isn't the only resource in play. The puller is hypothetically more aware of the size of its buffers, processing capacity, internet connection speed, etc. But again, to me the primary advantage is the mental model. For omnistreams the implementation ended up being almost the same as the ACK-based system I started with, but shifting the names around and inverting the model in my head made it much easier to work with.
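To illustrate the pull-based mental model: the receiver explicitly grants the sender permission to send, so the "window" is just the credit the puller has handed out. This is a minimal self-contained sketch, not the omnistreams implementation; the semaphore-as-credit design and all names here are my own illustration.

```python
import asyncio

async def demo():
    credit = asyncio.Semaphore(0)  # permission to send; the puller's "window"
    queue = asyncio.Queue()
    received = []

    async def sender():
        for i in range(10):
            await credit.acquire()  # block until the puller asks for more
            await queue.put(i)
        await queue.put(None)       # end-of-stream sentinel

    async def puller():
        credit.release()            # pull one item to start
        while (item := await queue.get()) is not None:
            received.append(item)   # "process" the item at our own pace
            credit.release()        # one slot freed -> request one more
        return received

    results = await asyncio.gather(sender(), puller())
    return results[1]

print(asyncio.run(demo()))
```

Mechanically this is the same as ACKs flowing back to the sender; the inversion is purely in who you think of as driving the transfer.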
FWIW, Cap'n Proto's approach provides application-level backpressure as well. The application returns from the RPC only when it's done processing the message (or, more precisely, when it's ready for the next message). The window is computed based on application-level replies, not on socket buffer availability.
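A sketch of that idea in miniature: the sender keeps a fixed window of in-flight messages, and a slot only reopens when the application handler returns, so slow processing throttles the sender directly. This is not Cap'n Proto's actual API, just an assumed shape for illustration; `WINDOW`, `handle`, and `deliver` are hypothetical names.

```python
import asyncio

async def demo():
    WINDOW = 3
    in_flight = asyncio.Semaphore(WINDOW)  # window counted in messages
    processed = []

    async def handle(msg):
        # Application-level handler: returning means "ready for the next one".
        await asyncio.sleep(0)             # stand-in for real processing
        processed.append(msg)

    async def deliver(msg):
        try:
            await handle(msg)
        finally:
            in_flight.release()            # reply completed -> window reopens

    for i in range(10):
        await in_flight.acquire()          # wait for window space
        asyncio.create_task(deliver(i))

    # drain: wait until every in-flight handler has returned
    for _ in range(WINDOW):
        await in_flight.acquire()
    return processed

print(asyncio.run(demo()))
```

The key property is that the window is gated on the handler's return, not on socket buffer occupancy, so backpressure reflects what the application can actually absorb.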
My experience was that in practice, most streaming apps I'd seen were doing this already (returning when they wanted the next message), so turning that into the basis for built-in flow control made a lot of sense. E.g. I can go back and convert Sandstorm to use streaming without introducing any backwards-incompatible protocol changes.
Ah, I think I misread the announcement to mean you were using the OS buffer level information. But if I understand correctly, you're just using the buffer size as a heuristic for the window size, then doing all the logic at the application level?
If that's the case, then implementation-wise these approaches are probably very similar; window/ACK is the normal way of doing this, and also the pragmatic approach in your case.