> We don't need Erlang clusters for redundancy any more. You just run multiple copies of a service against the message bus, on multiple systems for redundancy.
The thing about Erlang is that you never needed clusters at all. The redundancy was built into each instance by the runtime. When you build that way, everything naturally scales out horizontally with additional processors and/or physical nodes.
You can't do that in any other language without building the entire system for it from the ground up.
Using multiple systems for redundancy means planning for an entire system to go down. The Erlang way isolates the impact of a failure to one of potentially millions of parallel actions on the system itself; with system-level redundancy, the other million actions in progress on that server go down with it. The difference in the level of redundancy is significant.
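A rough analogy of that isolation, sketched in Python rather than on the BEAM (the worker/supervisor names and the restart loop here are made up for illustration): a supervisor restarts the one crashed worker while the other workers keep running, so one failure never takes down the other actions in flight.

```python
import threading

# Hypothetical sketch: each "worker" is an isolated unit of work.
# On the BEAM this would be a lightweight process under a supervisor;
# here we approximate with threads plus a catch-and-restart loop.

results = {}

def worker(worker_id, should_crash=False):
    if should_crash:
        raise RuntimeError(f"worker {worker_id} crashed")
    results[worker_id] = "done"

def supervise(worker_id, should_crash):
    # Roughly a one_for_one restart strategy: restart only the failed child.
    try:
        worker(worker_id, should_crash)
    except RuntimeError:
        # Restart the crashed worker once, without the crashing input.
        worker(worker_id, should_crash=False)

threads = [
    threading.Thread(target=supervise, args=(i, i == 0))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All five workers completed; the crash of worker 0 never touched the others.
print(sorted(results))  # → [0, 1, 2, 3, 4]
```

On the BEAM the equivalent isolation is built in: each process has its own heap and crashes alone, and a supervisor decides whether and how to restart it.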
But I do agree with you that we don't need it for most systems, because most systems simply aren't that complex. The benefit of the BEAM comes from simplifying complexity, which tends to grow over time. Elixir, Phoenix, and LiveView will likely lead to earlier adoption of the BEAM in projects, before the complexity ramps up, which will show a long-term benefit.
"The thing about Erlang is that you never needed clusters at all."
IIRC, if you read the original thesis, the reason for clusters is just that there's always that chance an entire machine will go down, so if you want high reliability, you have no choice but to have a second one.
The OP is correct in that the key to understanding every design decision in Erlang is to look at it through the lens of reliability. It also helps to think about it in terms of phone switches, where the time horizon for reliability is in milliseconds. I am responsible for many systems with a high need for reliability, but not quite at that granularity. A few seconds of pause, or the need for a client to re-issue a request, is not as critical as dropping milliseconds of a phone call.
That's true. You do always need to plan for machine redundancy, but hopefully machines don't completely fail that often. I can't remember the last time I experienced an instance failure that wasn't a data-center-wide impact.
It impacts how you architect certain solutions though. For example, if you've got users connected with websockets you're suddenly able to maintain their state right there with the connection.
In a situation where you can't rely on state on the server itself, every websocket connection has to relay to some backend system like Redis/DB, etc since the state can't be counted on at the connection layer.
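A minimal sketch of that second architecture in Python (the store and handler names are hypothetical, with a plain dict standing in for Redis/DB): because the connection layer can't be trusted to hold state, every message round-trips through an external store, so any replica can pick up the session if this node dies.

```python
# Hypothetical sketch: connection handlers keep no local state.
# Every read and write goes through an external store (a stand-in
# for Redis or a database), so the state survives the node.

external_store = {}  # stand-in for Redis/DB

def handle_message(session_id, message):
    # Load session state from the backend on every message...
    state = external_store.get(session_id, {"messages": []})
    state["messages"].append(message)
    # ...and write it back immediately; nothing survives locally.
    external_store[session_id] = state
    return len(state["messages"])

# Two messages for one session, possibly handled by different nodes:
handle_message("s1", "hello")
count = handle_message("s1", "world")
print(count)  # → 2
```

With BEAM-style processes, that per-session state could instead live in the process holding the websocket connection itself, with the backend round-trip reserved for durability rather than every message.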