
The main insight is in Joe Armstrong's thesis. "Making reliable distributed systems in the presence of software errors" https://erlang.org/download/armstrong_thesis_2003.pdf (I think the original title was "In the presence of hardware errors", emphasis mine)

All the other things flow from that thesis and understanding. You can recreate behaviours described in the repo doc using Erlang primitives very easily, and they are very hard to recreate in pretty much any other language.

Because Erlang is very much literally about lightweight processes and message passing. Only:

- every part of the system knows about lightweight processes and messages

- every part of the system is engineered around them

- the underlying VM is not only re-entrant in almost every single function, it's also extremely resilient, and can almost guarantee that when something dies, only that process dies, and the rest of the system receives a guaranteed notification that this process died

There are more, but without at least that you can't really recreate Erlang's standard behaviours. For example, you can't recreate this in Go or Rust because `panic` will kill your entire program unconditionally.



18 or so years ago, I was a phone monkey for a government department who was initially learning to code for fun, but quickly realised programming could be a great way to earn a living.

I read about Erlang on /r/programming, which was having a new fad for a new shiny language, as was the custom back then.

And I desperately followed all the /r/programming fads because I was worried that I'd end up irrelevant if I wasn't skilled up in the latest Haskell web framework.

But one of those fads was Erlang, and it intrigued me so much, that I ended up printing Mr Armstrong's thesis, stuck it in a manila folder, and read it, slowly, cover to cover while waiting for, or riding on, my bus to my contact centre job, and I've still got it in my bookcase.

His thinking on resiliency in the face of inevitable failures, and on safe concurrency, has shaped my own, has proved invaluable repeatedly, and is still very relevant today. It's like their phone switches were distributed microservices running in a container orchestrator, before it was cool.


> But one of those fads was Erlang, and it intrigued me so much, that I ended up printing Mr Armstrong's thesis, stuck it in a manila folder, and read it

I am now inspired to do the same, thank you!


Most modern async runtimes let you do that.

A (worker) thread dying isn't an issue in Rust, Go, C# and etc. Sure, each goes about error handling in a slightly different way, whether opt-in, opt-out, or enforced (as when unwrapping Result<T, E>), but other than that the advantages of Erlang/Elixir have faded over time because the industry has caught up.

p.s.: C# has not one but two re-entrancy syntax options - 'async/await' and 'IEnumerable<T>/yield return'. You can use the latter to conveniently implement state machines.


> but other than that the advantages of Erlang/Elixir have faded over time because the industry has caught up.

It really hasn't. People fixate on the idea of just running some processes, and just catching some errors.

And yet, none of the languages that "solved this" can give you Erlang's supervision trees, which are built on the basic functionality of Erlang and the Erlang VM. Well, Akka tried, and re-implemented half of Erlang in the process :)

But other advantages did fade: multi-machine configurations are solved by Kubernetes. And it no longer matters that you can orchestrate multiple processes doing something when even CI/CD now looks like "download a hundred Docker containers for even the smallest of tasks and execute those".

> p.s.: C# has not one but two re-entrancy syntax options - 'async/await' and 'IEnumerable<T>/yield return'.

What I meant by re-entrancy in the VM is this:

Every process in Erlang gets a certain number of reductions, where a reduction is a function call or a message passed. Every time a function is called or a message is passed, the reduction count for that process is reduced by one. Once it reaches zero, the process is paused, and a different process running on the same scheduler is executed. Once that one reaches zero, the next one is executed, and so on.

Once all processes have had their turn, the reduction counter is reset, and the original process is woken up and resumed.

On top of that, if a process waits for a message, it can be paused indefinitely, and resumed only when the message it waits for arrives.

So, all functions that this process executes have to be re-entrant. And it doesn't mean just the functions written in Erlang itself. It means all functions, including the ones in the VM: I/O (disk, network), error handling, date handling, regexps... You name it.
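
To make the reduction idea concrete, here is a toy sketch in Python rather than Erlang. It is only an illustration, not how the BEAM is implemented: the names `worker`, `run`, and `REDUCTIONS_PER_SLICE` are made up for the example, the budget of 3 is arbitrary (the real one is far larger), and each `yield` stands in for a function call.

```python
from collections import deque

REDUCTIONS_PER_SLICE = 3  # hypothetical budget, chosen small so the effect is visible


def worker(name, steps, log):
    """A toy 'process': each yield stands in for one function call (one reduction)."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # potential pause point, like a function call on the BEAM


def run(processes):
    """Round-robin over processes, pausing each once its reduction budget hits zero."""
    queue = deque(processes)
    while queue:
        proc = queue.popleft()
        budget = REDUCTIONS_PER_SLICE  # fresh budget every time the process is scheduled
        try:
            while budget > 0:
                next(proc)   # run until the next yield point
                budget -= 1  # one reduction spent
        except StopIteration:
            continue         # process finished; drop it from the run queue
        queue.append(proc)   # budget exhausted: go to the back of the run queue


log = []
run([worker("a", 5, log), worker("b", 2, log)])
print(log)
```

Process "a" is paused after three steps even though it never asked to be, "b" runs to completion, and then "a" is resumed where it left off.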


Sometimes the advantage is not just the ability to do something -- I mean, by that logic, only bare-metal systems languages like Rust, C, Zig etc could have advantages over other languages.

Erlang (and by extension Elixir) has the advantage that the programmer can think at a higher level about their system. You don't have to write or configure a scheduler. You don't have to invent supervision trees. You can be sure that the concurrently-running parts of your system cannot possibly affect each other's memory footprint (though Rust gives a robust answer to this problem as well).

It doesn't make a perfect fit for every problem, but there is still a decent-sized space of problems -- I'd say "highly concurrent, but not highly parallel" -- where Erlang gives the programmer a headstart.


The thing is, we mostly try to keep applications stateless, unless it's necessary to keep state for performance (think realtime online games, HFT, ...).

And in those cases there is simply no need for e.g. supervision trees because there is nothing to restart. You still need stuff like retrying, but this is supported by all major concurrency libraries / effect systems. In fact, Erlang has fallen behind here in terms of what the language offers (not what the BEAM offers, the BEAM is still top notch imho)

State is mostly moved to either the database or message queues or similar, which is pretty good.


We try to keep applications stateless because state handling in most programming languages is pathological. Erlang presents a different solution to the underlying problem of state management, rather than trying to solve the higher-level problem of "how best to support this one particular solution to the problem of state management".

Namely: The message queues are part of the language. They're built right in, for ease of use. Your caching layer is built into the language, for ease of use and faster performance.


> We try to keep applications stateless because state handling in most programming languages is pathological.

It's true that handling state is hard in most programming languages. But that's neither the only nor the most important reason why people try to keep applications stateless.


C#'s weakness here is that those two patterns are cooperative multitasking only. Under the hood they retain control of a thread until they yield execution. By default resource management is something that needs to be considered and the default thread pool is not an uncontested resource.

I don't use Erlang but my understanding is that while it is not exactly fully pre-emptive, there are safeguards in place to ensure process fairness without developer foresight.
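
The cooperative-only point can be sketched with Python's asyncio standing in for C# async/await (an analogy, not a claim about the .NET threadpool). The coroutine names `hog` and `other` are invented for the example. A task that never awaits keeps the single-threaded event loop to itself until it finishes.

```python
import asyncio


async def hog(log, polite):
    for i in range(3):
        log.append(f"hog:{i}")
        if polite:
            await asyncio.sleep(0)  # explicit yield back to the event loop


async def other(log):
    for i in range(3):
        log.append(f"other:{i}")
        await asyncio.sleep(0)


async def demo(polite):
    log = []
    await asyncio.gather(hog(log, polite), other(log))
    return log


greedy_log = asyncio.run(demo(polite=False))  # hog never yields
polite_log = asyncio.run(demo(polite=True))   # hog yields after every step
print(greedy_log)
print(polite_log)
```

In the greedy run all of the hog's steps come first; in the polite run the two tasks interleave. Fairness here depends entirely on the developer remembering to yield.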


C#'s async runtime is mixed mode: the threadpool will try to optimize the thread count so that all tasks can advance fairly-ish. This means spawning more worker threads than physical cores and relying on the operating system's thread pre-emption to shuffle them for work.

That's why synchronously blocking a thread is not a complete loss of throughput. It used to be worse, but starting from .NET 6 the threadpool was rewritten in C# and can actively detect blocked threads and inject more to deal with the issue.

Additionally, another commenter above mistakenly called Rust "bare metal", which it is not, because for async it is usually paired with tokio or async-std, which (by default, configurable) spawn one worker thread per CPU hardware thread and actively manage those too.

p.s.: the goal of cooperative multitasking is precisely to alleviate the issues that come with the pre-emptive kind. I think Java's Project Loom approach is a mistake: it made sense 10 years ago but not today, with every modern language adopting async/await semantics.


Hey, I also prefer C# and async. Alternatives have yet to prove they can handle GUI patterns where main threads matter.

...but the problems stated are real. I'm excited to hear that this might be fixed in .NET 6 but it'll be a while before that rolls out to most deployments.


Apologies, but it seems you have gotten the wrong impression (or maybe I did a poor job of explaining).

It has never been a big issue in the first place, because by now everyone knows not to use 'Thread.Sleep(500)' or 'File.ReadAllBytes' in methods that can be executed by the threadpool and to use 'await Task.Delay(500)' or 'await File.ReadAllBytesAsync' instead. And even then you would run into threadpool starvation only under load and when quickly exhausting newly spawned threads. It is a relatively niche problem, not the main cornerstone of runtime design some make it out to be.

Also, .NET 6 is old news: it was released on Nov 8, 2021. It is the deployment target for many enterprise projects nowadays.


"Everyone knows to do it right" is no protection at all. And honestly, I would push back on this in general because no, it's not well known at all. A fresh grad will not intuitively know to look for a WhateverAsync API in case one exists, and veterans will miss this as well.

Knowing that file IO is too heavy and has *Async counterpart methods is somewhat obvious to a veteran, but other long-running methods are not so obvious. In this case you would need to profile your use case to understand that certain calculations/methods might be best farmed off to a different threadpool.

Unity still uses Mono and has a very low max thread pool size, for example. The thread pool is easily starved in the latest version of that engine and I'm sure it's more common than you think.

Relatively niche, perhaps, but a critical problem when stumbled upon nonetheless. Again, I like async/await but there are certainly footguns left to remove.


Unity is special and has its own API and popular patterns. If you block the main/render thread it will explode, regardless of the language of choice, and Erlang/Elixir performance is not acceptable for gamedev and will likely stumble upon similar issues.

Again, and I cannot stress this enough, we're discussing a somewhat niche feature. You have to take into account that even the standard library still has a lot of semi-blocking code, simply due to the nature of certain system calls or networking code. From the runtime's standpoint there is no difference between blocking and computationally heavy logic: it will scale the number of threads to account for fairness automatically. It's just that blocking has extra cost due to being "better" at holding threads (you don't have to think about it). .NET 6 is just comparatively better at dealing with such scenarios, but your app would work fine in PROD 9 times out of 10 with invalid code before or after that. The difference is that running 'Task.Run(() => /* use up thread for no reason for seconds / minutes */)' in a loop of hundreds of iterations goes from terrible to very bad.

It's pointless to "fight against words". Just trust the runtime to do its thing right. That's why its baseline cost is somewhat higher than that of Golang or Rust/Tokio - you pay more upfront to get a foolproof solution that has really good multi-threaded scaling.

If you don't want to believe the above, just look at average C# solutions on GitHub. There is no "special magic to learn", that's just how people write code, whether new to the language or not.

p.s.: This situation reminds me of one of my colleagues who would always come up with an excuse for his point regardless of context. It's counter-productive and self-defeating.


> A (worker) thread dying isn't an issue in Rust, Go, C# and etc

If your Rust thread panics while it holds a Mutex, you've got a bit of a mess, especially if it was halfway through updating shared mutable state. It's probably similar in Go or C#, but I haven't used Go and only did cargo cult programming in C#; I didn't read any sources or see warnings about crashing in threads or async/await.
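
A small sketch of that mess, in Python rather than Rust, so treat it as an analogy: the `accounts` dict, its "sums to 100" invariant, and the function names are invented for the example. (As far as I know, Rust's std `Mutex` at least marks itself poisoned in this situation; Python's `Lock` gives no such signal.)

```python
import threading

lock = threading.Lock()
accounts = {"a": 100, "b": 0}  # invariant: the two balances always sum to 100


def transfer(amount, fail_midway):
    with lock:
        accounts["a"] -= amount
        if fail_midway:
            raise RuntimeError("crashed halfway through the update")
        accounts["b"] += amount


def run_transfer():
    try:
        transfer(30, fail_midway=True)
    except RuntimeError:
        pass  # the worker "dies" here; nothing repairs the shared state


t = threading.Thread(target=run_transfer)
t.start()
t.join()

# The lock was released on the way out, so other threads carry on happily...
lock_is_free = lock.acquire(blocking=False)
if lock_is_free:
    lock.release()

# ...but they now see a half-applied update.
total = sum(accounts.values())
print(lock_is_free, total)
```

The lock is free again, yet the invariant is silently broken, which is exactly the state a restarted worker would inherit.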


> A (worker) thread dying isn't an issue in Rust, Go, C# and etc.

Go channels are nice but they don’t come close to Erlang message passing. In Go you can’t just ignore whether the channel is bounded or unbounded, open or closed. Writing to a closed channel will blow you up. It takes some time to learn it. Messages in Erlang are easy: FIFO, serial execution.


> For example, you can't recreate this in Go or Rust

If you're willing to make your "lightweight processes" OS threads you could kind of make it work. E.g. Rust gives you both panic hooks (to notify everyone else that you died) and catch_unwind to contain the effect of a panic (which generally stops at a thread boundary anyways). But of course that only scales to a couple hundred or thousand threads, so you probably have to sacrifice a lot of granularity.

And any library that links to C/C++ code has the potential to bring the whole process down (unless you make your "lightweight processes" just "OS processes", but that just makes the scaling problems worse)
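
The "OS threads plus a death notification" idea might look roughly like this. It is sketched in Python instead of Rust, so the `try`/`except` at the thread boundary plays the role of `catch_unwind` and the queue plays the role of the panic hook's notification. `spawn_monitored` and the `("DOWN", name, reason)` message shape are made up for the example.

```python
import queue
import threading


def spawn_monitored(name, fn, mailbox):
    """Run fn in an OS thread and always report how it ended to the monitor's mailbox."""
    def body():
        try:
            fn()
        except Exception as exc:  # contain the failure at the thread boundary
            mailbox.put(("DOWN", name, repr(exc)))
        else:
            mailbox.put(("DOWN", name, "normal"))

    thread = threading.Thread(target=body, name=name)
    thread.start()
    return thread


def crashes():
    raise ValueError("boom")


mailbox = queue.Queue()
threads = [
    spawn_monitored("w1", crashes, mailbox),
    spawn_monitored("w2", lambda: None, mailbox),
]
for t in threads:
    t.join()

reports = sorted(mailbox.get() for _ in threads)
print(reports)
```

This covers the notification half only. It does nothing about shared state the dead thread left behind, or about native code taking the whole process down.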


Lightweight processes are explicitly not OS threads in at least two senses: a smaller footprint in memory, and no system call for every context switch.

It’s explained in many documents about lightweight processes, of course for Elixir/Erlang/BEAM but also for Go and Crystal, and even going back to Solaris Internals and the modern Project Loom for upcoming JVM situations.


Erlang's BEAM processes are undoubtedly awesome. But if we start with the premise of "achieving the same goals without Erlang" I think it's entirely valid to start with "what does our process primitive look like". With just 16GB of RAM my laptop is quite memory constrained, but according to task manager I'm still running 5200 threads across 350 processes right now. Many use cases that required light threads/processes 37 years ago or even 20 years ago would work with OS threads by now. Of course many others don't, which is where the Erlang popularity comes from.


It takes on the order of microseconds to start an Erlang process. They are also extremely lightweight memory-wise. And the Erlang VM tries to keep context switching between CPU cores to a minimum. And all processes get a more-or-less equal share, so it's hard to get a process consuming all of the CPU and never yielding back to other processes.

So firing off and monitoring processes becomes second nature easily. Rarely so in other languages.


OS threads would defeat the purpose though. You can run millions of BEAM processes. Threads are a lot more expensive and still run the risk of taking over your system with an infinite loop.


OS threads are actually preemptive, so an infinite loop is really less of a problem there than in Erlang. Erlang is not really preemptive; it's just that function calls are automatically (potential) yield points, and you can't loop without function calls because of the construction of the language. If you somehow did manage to get a real loop into a process, that scheduler would be stuck forever, and you'd end up with a broken BEAM when you triggered something that requires coordination between schedulers. I forget what sorts of things do that, but code purging can't finish without each scheduler doing housekeeping, etc.

Meanwhile, your OS thread will just keep eating CPU, but everything else is fine.

You're right about the number of threads though. It takes a lot more memory to run an OS thread than a BEAM process, and I'm not sure OS schedulers will manage that many threads even if the memory is there (but I could be wrong... getting things to work nicely at 50k threads may be sufficient for millions)


Unless code is exiting the BEAM itself, an infinite loop that bypasses yield points shouldn't be possible.

Doing this at such a granular level is also one of the reasons that you can run a database that thousands of processes are accessing, within the same runtime, without a performance impact to the overall system.


> For example, you can't recreate this in Go or Rust because `panic` will kill your entire program unconditionally.

You can recover() panic() just fine in Go.


Yeah, OP took a shortcut here :)

The nuance is that most languages let you build reliable programs _if your code is correct_ - if you're using defers, context handlers, finalizers, cleaning up state in shared data structures, etc.

Erlang's goal is to be reliable "in the presence of software errors", that is, even if your code is buggy. If a request handler process dies inside an Erlang web app, whatever file or socket it opened, memory it allocated, shared resource it acquired (e.g. a DB connection) will be reclaimed. This is true without having to write any error handling in your code.

The way it's done is that the VM handles the basic stuff like memory and files, and it provides signals that libraries (like a DB connection pool for instance) can use to know if a process fails and clean up as needed. In other words the process that fails is not responsible for its own clean up.

At some point some code must of course be correct for this to work. Like, if the DB connection pool library doesn't monitor processes that borrow connections, it could leak the connection when such a process dies. But the point is that this part (the "error kernel") can be a small, well-tested part of the overall system; whereas in a classic program, the entire codebase has to handle errors correctly to guarantee overall stability.
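
Here is a rough sketch of that split, in Python and heavily simplified. `Pool`, `on_down`, and `run_worker` are names invented for the example; `run_worker` stands in for the VM delivering a "process died" signal and is not a real Erlang or library API. The handler is buggy and has no cleanup code, and the connection still comes back.

```python
class Pool:
    """Toy connection pool that tracks which worker borrowed which connection."""

    def __init__(self, connections):
        self.free = list(connections)
        self.borrowed = {}  # worker name -> connection

    def checkout(self, worker):
        conn = self.free.pop()
        self.borrowed[worker] = conn
        return conn

    def checkin(self, worker):
        self.free.append(self.borrowed.pop(worker))

    def on_down(self, worker):
        """Called when a worker ends, for any reason; reclaims what it still holds."""
        if worker in self.borrowed:
            self.checkin(worker)


def run_worker(name, handler, pool):
    """Stand-in for the VM: runs a handler and always notifies the pool afterwards."""
    try:
        handler(name, pool)
        return "ok"
    except Exception:
        return "crashed"
    finally:
        pool.on_down(name)


def buggy_handler(name, pool):
    pool.checkout(name)
    raise KeyError("bug in request handler")  # no cleanup code here at all


pool = Pool(["conn-1"])
outcome = run_worker("req-1", buggy_handler, pool)
print(outcome, pool.free)
```

Only `Pool` and `run_worker` have to be correct for the connection not to leak; that pair is the "error kernel" in this toy version.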


These are like sledgehammers when you need a screwdriver.

It's very hard to build Erlang-like versions of supervision trees using those tools.


> These are like sledgehammers when you need a screwdriver.

Also by the very nature of Go your application state is as likely as not to be fucked, so even if you did handroll monitors and links, and thus could build supervision trees, they wouldn’t be of much use.


> You can recover() panic() just fine in Go.

You can catch_unwind() panic!() in Rust too :-).


I agree up to the last point. You can catch panics at the thread level right? I mean, more generally, what is Erlang implemented in?


> I mean, more generally, what is Erlang implemented in?

"Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang." - Virding's Law ;)

So first you have to implement all those things. And then use them.


You can in C/C++. The point, though, is that Erlang has a strategy for cleaning up and recovering from such errors (some data can be lost, and in rare cases some errors can crash the whole system).

I mean, you can write the Erlang way in almost any language; it's just that in this case you need to adapt all the libraries to follow the same principles.

Some languages implement a similar error handling strategy by just creating a separate process per request (hello, PHP). We know how to clean up after a worker dies (just let the worker die). It's just that in that case the supervisor strategy is very simple.
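
The process-per-request strategy can be sketched in a few lines of Python (an illustration only, with a made-up `handle_in_fresh_process` helper and toy "requests" that are just snippets of code). Each job runs in its own OS process, so a crash takes down that one job and the OS reclaims everything it held.

```python
import subprocess
import sys


def handle_in_fresh_process(code):
    """Run one 'request' in its own interpreter process and report how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


jobs = [
    "print('ok')",
    "1/0",          # this request crashes with an uncaught exception
    "print('ok')",
]
results = [handle_in_fresh_process(code) for code in jobs]
print(results)
```

The parent loop is the whole "supervisor" here: it notes the non-zero exit code and moves on to the next request.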


Of course you could implement a language like Erlang in Rust, but I think the point is that you would have to do exactly that in order to do in Rust what Erlang does at the language level.


Erlang is implemented on top of BEAM which has its own scheduler and schedules its own idea of lightweight processes.


It looks like features that the Go runtime provides.


In C. You don’t operate on the thread level directly; threads are just executors of processes managed by the VM, usually as many as there are vCPU cores.


>> I mean, more generally, what is Erlang implemented in?

Pure magic, that can't be recreated in any regular language :)



