I want to address a controversy that has gripped the Rust community for the past year or so: the choice by the prominent async “runtimes” to default to multi-threaded executors that perform work-stealing to balance work dynamically among their many tasks. Some Rust users are unhappy with this decision, so unhappy that they use language I would characterize as melodramatic.
[…]What these people advocate instead is an alternative architecture that they call “thread-per-core.” They promise that this architecture will be simultaneously more performant and easier to implement. In my view, the truth is that it may be one or the other, but not both.
A very academic discussion.
This is the same discussion that happens in other languages like Swift or C#, and the one we recently had about kernel fibers in different operating systems:
https://www.osnews.com/story/137300/sunos-multi-thread-architecture/#comment-10433232
And, yes, a good async design can be more performant and easier at the same time. It might not be easier to learn, or to implement a program in. But it would be much easier to get correct, and it would prevent multi-threading issues like deadlocks, priority inversions, race conditions, and so on.
Basically, a thread is usually the wrong abstraction for a unit of work; it is a unit of execution instead.
Meaning, I might have a long-running task to respond to an HTTP request and return data from the disk. This has many disconnected pieces that will wait on I/O, and if we are using modern APIs, like "zero-copy" transfers, it uses almost no CPU time compared to the total "wall-clock" execution time.
Hence one thread = one client is precisely the wrong abstraction here (or, by extension, one process/fork = one client in older Apache httpd).
Instead, each thread should always have some task scheduled on it. Which means an async task pool, and a robust way to weave those fibers onto the threads. (Hence the discussion here.)
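A minimal sketch of that idea, using only the Rust standard library: many small units of work are multiplexed onto a fixed pool of worker threads pulling from a shared queue. (The pool size, the `Task` type, and the shape of the work are all illustrative; this is not any particular runtime's API.)

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// A boxed unit of work: the "task", decoupled from any particular thread.
type Task = Box<dyn FnOnce() + Send + 'static>;

// Run `n_tasks` tiny tasks on `n_workers` threads and sum their results.
fn run_pool(n_workers: usize, n_tasks: i64) -> i64 {
    let (tx, rx) = mpsc::channel::<Task>();
    // mpsc receivers are single-consumer, so share one behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));

    // A small pool of "units of execution": each thread loops,
    // picking up whichever task is ready next.
    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let rx = Arc::clone(&rx);
            thread::spawn(move || loop {
                let task = match rx.lock().unwrap().recv() {
                    Ok(t) => t,
                    Err(_) => break, // queue closed: no more work
                };
                task();
            })
        })
        .collect();

    // Many "units of work" multiplexed onto few threads.
    let (done_tx, done_rx) = mpsc::channel();
    for i in 0..n_tasks {
        let done_tx = done_tx.clone();
        tx.send(Box::new(move || {
            done_tx.send(i * 2).unwrap();
        }))
        .unwrap();
    }
    drop(tx); // close the queue so workers exit their loops
    drop(done_tx);

    let total: i64 = done_rx.iter().sum();
    for w in workers {
        w.join().unwrap();
    }
    total
}

fn main() {
    // 100 tasks on 4 threads; each task i contributes 2*i.
    println!("{}", run_pool(4, 100));
}
```

Note that this toy pool has no I/O awareness at all: a real async executor additionally parks tasks that are waiting on I/O and wakes them when the kernel says the data is ready, so the threads stay busy with runnable work.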
(One reasonable use for direct multi-threading is CPU-intensive compute tasks, where we want 100% load on one or more cores for the entire duration: things like compression, encoding, machine learning, etc.)
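For that CPU-bound case, a plain one-thread-per-chunk split is the natural fit, since each thread does pure computation with no waiting. A standard-library sketch (the prime-counting workload and chunking scheme are just illustrative stand-ins for compression/encoding-style work):

```rust
use std::thread;

// CPU-bound work: count primes in a range by trial division
// (deliberately naive and heavy, to keep a core fully busy).
fn count_primes(range: std::ops::Range<u64>) -> usize {
    range
        .filter(|&n| n >= 2 && (2..n).take_while(|d| d * d <= n).all(|d| n % d != 0))
        .count()
}

// Split [0, limit) into disjoint chunks, one OS thread per chunk.
fn parallel_count(limit: u64, threads: u64) -> usize {
    let chunk = limit / threads + 1;
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let start = (t * chunk).min(limit);
            let end = (start + chunk).min(limit);
            // Here the thread really is the right unit of work:
            // each one computes at 100% with nothing to wait on.
            thread::spawn(move || count_primes(start..end))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("{}", parallel_count(10_000, 4)); // 1229 primes below 10000
}
```

In practice one would size the pool with `std::thread::available_parallelism()` rather than a hard-coded count.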
I remember that Linux's original POSIX threads implementation had problems, so IBM developed a library (NGPT) with an M:N design (user-space threads multiplexed onto kernel threads, similar to FreeBSD at the time), but the current design (NPTL), with all threads in kernel space (1:1), proved more performant.