We describe a model for multiple threads of control within a single UNIX process. The main goals are to provide extremely lightweight threads and to rationalize and extend the UNIX Application Programming Interface for a multi-threaded environment. The threads are intended to be sufficiently lightweight so that there can be thousands present and that synchronization and context switching can be accomplished rapidly without entering the kernel. These goals are achieved by providing lightweight user-level threads that are multiplexed on top of kernel-supported threads of control. This architecture allows the programmer to separate logical (program) concurrency from the required real concurrency, which is relatively costly, and to control both within a single programming model.
The introduction to a 1991 USENIX paper about SunOS's multithread architecture. Just the kind of light reading material for an autumn weekend.
I think the addition of "cooperative" threading on top of low-level "preemptive" threading is now very common in many environments.
Windows has Fibers (CreateFiber/SwitchToFiber), scheduled entirely in user mode.
Linux has something similar in user space: the ucontext family (makecontext/swapcontext).
All* modern languages do the same with "async" constructs.
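To make the idea concrete before getting to I/O: a toy cooperative scheduler fits in a few lines of Python, with generators standing in for fibers (all the names here are made up, not from any particular runtime). Each generator yields when it is willing to give up control, and a round-robin loop resumes whoever is next:

from collections import deque

def fiber(name, steps):
    for i in range(steps):
        print(f"{name}: step {i}")
        yield  # voluntarily hand control back to the scheduler

def run_all(fibers):
    # Round-robin scheduler: resume each fiber until it finishes.
    ready = deque(fibers)
    while ready:
        f = ready.popleft()
        try:
            next(f)           # resume until the fiber's next yield
            ready.append(f)   # still alive, requeue it for another turn
        except StopIteration:
            pass              # fiber finished, drop it

run_all([fiber("a", 3), fiber("b", 2)])

Real runtimes do the same dance, except an event loop decides who is ready to run instead of blindly going round-robin.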
Basically, when you are I/O-bound, it does not make sense to have the CPU spin waiting on the response. So busy-waiting like this
while (!socket.ready()) { spin_loop(); }
data = socket.read()
no longer makes sense. Instead, you'd have something like
data = await socket.read()
which internally becomes
_future = socket.read()
_future.yield_until_ready()
data = _future.value()
And then either the OS or a compiler/runtime framework will handle stitching those fibers onto threads, possibly handling synchronization and deadlock avoidance along the way.
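In Python's asyncio, for instance, the pattern looks like this (just a sketch; the hosts and byte counts are placeholders). The event loop parks each coroutine at its await and resumes it when the socket is readable, multiplexing many logical tasks onto one OS thread:

import asyncio

async def fetch(host):
    # await parks this coroutine until the I/O completes;
    # the event loop runs other coroutines in the meantime.
    reader, writer = await asyncio.open_connection(host, 80)
    writer.write(b"HEAD / HTTP/1.0\r\nHost: " + host.encode() + b"\r\n\r\n")
    await writer.drain()
    data = await reader.read(1024)  # the "data = await socket.read()" step
    writer.close()
    await writer.wait_closed()
    return data

async def main():
    # Thousands of these can be in flight on a single OS thread.
    pages = await asyncio.gather(fetch("example.com"), fetch("example.org"))
    print([len(p) for p in pages])

asyncio.run(main())

The difference from the spin loop is that a parked coroutine costs essentially nothing while it waits.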
(And we need the edit button back; also, please add some way of formatting source code.)
Cooperative multithreading and fibers, as you bring up, can be good at eliminating much of the synchronization overhead that kills performance in conventional multithreaded programs. I've been programming everything with async constructs rather than threads these days. It works very well for I/O without the usual MT overhead. If I needed to scale up, I'd use one thread (or process) per core but still use the async model on each core.

I don't like the traditional approach of giving each client its own thread and stack. Even if the OS minimizes multitasking overhead, all the stacks end up requiring more memory and causing a lot more cache evictions than an asynchronous model. Besides, the primary benefit of many CPU cores is not to handle short I/O events but rather to handle intense computation. CPU-intensive workloads are the main case where I see a real benefit in spawning threads, while minimizing the need for costly synchronization.
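A sketch of that per-core layout in Python (handle_client, the port, and the byte counts are my placeholders; reuse_port relies on Linux's SO_REUSEPORT and raises an error on platforms without it):

import asyncio
import multiprocessing
import os

async def handle_client(reader, writer):
    # Placeholder handler: echo one chunk back to the client.
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

def worker():
    # One event loop per process: async handles the I/O concurrency
    # inside, and nothing is shared across cores.
    async def serve():
        server = await asyncio.start_server(
            handle_client, port=8080, reuse_port=True)
        async with server:
            await server.serve_forever()
    asyncio.run(serve())

if __name__ == "__main__":
    # One process per core; SO_REUSEPORT lets the kernel spread
    # incoming connections across the listeners.
    workers = [multiprocessing.Process(target=worker)
               for _ in range(os.cpu_count())]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Each worker stays single-threaded internally, so the handlers need no locks; the only thing shared across cores is the listening port.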