> We don't need Erlang clusters for redundancy any more. You just run multiple copies of a service against the message bus, on multiple systems for redundancy.
The thing about Erlang is that you never needed clusters at all. The redundancy was built into each instance by the runtime. When you build that way, everything naturally scales out horizontally with additional processors and/or physical nodes.
You can't do that in any other language without building the entire system for it from the ground up.
Using multiple systems for redundancy means planning for an entire system to go down. The Erlang way isolates the impact of a failure to one of potentially millions of parallel actions on the system itself; with system-level redundancy, the other million actions in progress on that server go down with it. The difference in the level of redundancy is significant.
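A rough analogy of that isolation, sketched in Python rather than on the BEAM (the worker/supervisor names and the restart loop here are made up for illustration): a supervisor restarts the one crashed worker while the other workers keep running, so one failure never takes down the other actions in flight.

```python
import threading

# Hypothetical sketch: each "worker" is an isolated unit of work.
# On the BEAM this would be a lightweight process under a supervisor;
# here we approximate with threads plus a catch-and-restart loop.

results = {}

def worker(worker_id, should_crash=False):
    if should_crash:
        raise RuntimeError(f"worker {worker_id} crashed")
    results[worker_id] = "done"

def supervise(worker_id, should_crash):
    # Roughly a one_for_one restart strategy: restart only the failed child.
    try:
        worker(worker_id, should_crash)
    except RuntimeError:
        # Restart the crashed worker once, without the crashing input.
        worker(worker_id, should_crash=False)

threads = [
    threading.Thread(target=supervise, args=(i, i == 0))
    for i in range(5)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All five workers completed; the crash of worker 0 never touched the others.
print(sorted(results))  # → [0, 1, 2, 3, 4]
```

On the BEAM the equivalent isolation is built in: each process has its own heap and crashes alone, and a supervisor decides whether and how to restart it.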
But I do agree with you that we don't need it for most systems, because most systems simply aren't that complex. The benefit of the BEAM comes from simplifying complexity, which tends to grow over time. Elixir, Phoenix, and LiveView will likely lead to earlier adoption of the BEAM in projects, before the complexity ramps up, which will show a long-term benefit.
"The thing about Erlang is that you never needed clusters at all."
IIRC, if you read the original thesis, the reason for clusters is just that there's always that chance an entire machine will go down, so if you want high reliability, you have no choice but to have a second one.
The OP is correct in that the key to understanding every design decision in Erlang is to look at it through the lens of reliability. It also helps to think about it in terms of phone switches, where the time horizon for reliability is in milliseconds. I am responsible for many systems with a high need for reliability, but not quite at that granularity. A few seconds of pause, or the need for a client to re-issue a request, is not as critical as dropping milliseconds of a phone call.
That's true. You do always need to plan for machine redundancy, but hopefully machines don't completely fail that often. I can't remember the last time I experienced an instance failure that wasn't a data-center-wide impact.
It impacts how you architect certain solutions though. For example, if you've got users connected with websockets you're suddenly able to maintain their state right there with the connection.
In a situation where you can't rely on state on the server itself, every websocket connection has to relay to some backend system like Redis/DB, etc since the state can't be counted on at the connection layer.
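A minimal sketch of that second architecture in Python (the store and handler names are hypothetical, with a plain dict standing in for Redis/DB): because the connection layer can't be trusted to hold state, every message round-trips through an external store, so any replica can pick up the session if this node dies.

```python
# Hypothetical sketch: connection handlers keep no local state.
# Every read and write goes through an external store (a stand-in
# for Redis or a database), so the state survives the node.

external_store = {}  # stand-in for Redis/DB

def handle_message(session_id, message):
    # Load session state from the backend on every message...
    state = external_store.get(session_id, {"messages": []})
    state["messages"].append(message)
    # ...and write it back immediately; nothing survives locally.
    external_store[session_id] = state
    return len(state["messages"])

# Two messages for one session, possibly handled by different nodes:
handle_message("s1", "hello")
count = handle_message("s1", "world")
print(count)  # → 2
```

With BEAM-style processes, that per-session state could instead live in the process holding the websocket connection itself, with the backend round-trip reserved for durability rather than every message.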