Protothreads are a type of extremely lightweight thread – each protothread requires only two bytes of memory – usually used in embedded firmware programming, where memory is at a premium. Protothreads combine the low overhead of event-driven programming with the algorithmic clarity of threaded programming.
Larry Ruane from LeftHand Networks Inc. has written a protothreads library for Unix systems that, unlike Adam Dunkels’ original library, contains a complete scheduler that allows several protothreads to run inside a regular thread. Ruane’s protothreads are implemented in less than 400 lines of C code using gcc’s labels-as-values feature. The project wiki has a thorough explanation of how protothreads work and how they are intended to be used. The open source code can be downloaded from the project’s SourceForge page.
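To get a feel for how the labels-as-values trick makes resumable functions possible, here is a minimal sketch of the general technique – the coro_t type and counter() function are invented for illustration and are not part of Ruane’s library. The function saves the address of a label into its context structure and returns normally; the next call jumps straight back to that label.

/* Minimal sketch of a resumable function via gcc's labels-as-values
   extension (&&label and goto *ptr). Illustrative only; not Ruane's
   actual API. */
#include <stdio.h>

typedef struct {
    void *resume;   /* saved label address, NULL on first call */
    int i;          /* "local" variable, kept here because the real
                       stack is discarded on every return */
} coro_t;

static int counter(coro_t *c) {
    if (c->resume)
        goto *c->resume;             /* resume where we left off */
    for (c->i = 0; c->i < 3; c->i++) {
        c->resume = &&yield_point;   /* remember the resume point */
        return c->i;                 /* "context switch" is a plain return */
yield_point:;
    }
    return -1;                       /* finished */
}

int main(void) {
    coro_t c = { 0 };
    for (int v; (v = counter(&c)) >= 0; )
        printf("yielded %d\n", v);
    return 0;
}

Note that the local variable has to live in the context structure, since nothing on the real stack survives the return – this is exactly the trade-off discussed in the comments below.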
And my first guess is embedded systems on top of the Linux kernel? I mean – come on, ordinary threads are lightweight enough for modern hardware. But threads on my phone definitely could be lighter.
Meh. Android runs tons of threads, inside a JVM no less, on a pretty constrained hardware environment, and it all seems pretty snappy.
Ordinary threads are pretty heavyweight if there are thousands of them. E.g. think of running a simulation of 10,000 objects, each object in its own thread. That would hog the kernel pretty badly – while having all threads within one kernel thread neatly compartmentalizes the scheduling to that one thread.
Of course this is not a case where you would typically use threads, but at least it’s possible. Though the applications are more often of scientific than practical interest.
Threads which are blocked should not consume any computational resources… only memory, so if you use threads only as a means of specifying your algorithm and you don’t have too many running concurrently, 10,000 threads should only increase your memory footprint and startup time without affecting runtime performance.
But consider the situations where the threads are NOT blocked. If you have 5000 threads that are unblocked, you have 5000 context switches to do (and context switches are not free).
Yeah well I’d use them just so I can say I use “proto threads”. The name is pretty elite.
I don’t have much experience in thread programming and none in protothreads, and the Wikipedia page wasn’t very helpful.
For less memory overhead there is probably some disadvantage, so can anybody tell me what the tradeoff here is?
Normal threads store function variables on their own stack, which is preserved by the kernel. With the way protothreads work, a context switch is simply marking where you were in the function and then returning all the way to the scheduler, which resumes the next thread through a combination of function calls and goto statements. This is all hidden behind macros. In the process, however, the values of the local function variables are lost. To retain them, you need a pointer to a block of state data, which must be passed as an argument to every function that uses the protothreads API. From the linked article:
static pt_t producer_thr(void *const env) {
    pc_thread_context_t *const c = env;
    /* Do stuff */
    *c->mailbox = c->i;
    return PT_DONE;
}
given that
typedef struct {
    pt_thread_t pt_thread;   /* Required by protothreads */
    pt_func_t pt_func;       /* Required by protothreads */
    int i;                   /* Function variable */
    int *mailbox;            /* Function variable */
} pc_thread_context_t;
where normally you would have used:
<return-type> producer_thr() {
    int i;
    int *mailbox;
    /* Do stuff */
    *mailbox = i;
}
This requires some more effort on the part of the programmer. Note that recursion is non-trivial, because state lives in a data structure rather than on the stack. Also, it should be noted that the protothreads API is heavily dependent on macros, goto statements, and gcc’s labels-as-values extension (non-standard C). Protothreads are useful on systems whose kernel/OS does not support “normal” threads.
If a language can be extended such that the protothread API is part of the language itself, then I believe the benefit of protothreads could be obtained without requiring the user to do anything extra (compared to normal threads).
Also, the article makes some claims about performance. It should be noted that context-switch cost for protothreads depends on the depth of nested function calls, while this is not a factor for the context switching of “normal” threads. Thus comparing the performance of the two types of threading is a little like comparing apples and oranges: protothreads that yield from very deep call chains will at some point become slower than regular threads.
Nevertheless, a cool invention and a nice implementation right there.
Seems like a rather long way around to do something you can already do with SysV ucontext manipulation. This appears to be a bit of a case of a solution in search of a problem, sadly.
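For reference, the core of the ucontext approach looks something like the sketch below (using the standard getcontext/makecontext/swapcontext calls; the co_body function and the 64 KB stack size are illustrative choices, not anything from the protothreads code):

/* Minimal coroutine sketch using SysV ucontext. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];   /* arbitrary stack size for the demo */

static void co_body(void) {
    puts("coroutine: step 1");
    swapcontext(&co_ctx, &main_ctx);   /* yield back to main */
    puts("coroutine: step 2");
    /* falling off the end resumes uc_link, i.e. main_ctx */
}

int main(void) {
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, co_body, 0);

    swapcontext(&main_ctx, &co_ctx);   /* run until the first yield */
    puts("main: between yields");
    swapcontext(&main_ctx, &co_ctx);   /* resume to completion */
    puts("main: done");
    return 0;
}

Unlike protothreads, each ucontext coroutine keeps its own full stack, so locals survive a switch – but you pay for that stack memory, which is the two-byte-per-thread advantage protothreads are selling.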
These are application level (user-space) threads, as opposed to kernel level threads, so yes there are disadvantages.
Using your own scheduler means ALL of your threads are at the mercy of the kernel scheduler as if they were only one thread. Suppose the kernel scheduler divided time evenly without priorities and didn’t context switch on I/O, and you had 4 user threads hosted by 1 of 8 kernel threads: your application would get 1/8 (12.5%) of the CPU time. With kernel-level threads, your 4 workers plus your original thread would be 5 of 12 kernel threads, i.e. 5/12 (about 41.7%) of the CPU. Of course the kernel scheduler doesn’t actually work that way…
That supposed 400x performance increase referenced on the wiki is most likely for something that wasn’t processor intensive; it was probably just spawning off a bunch of threads. Spawning kernel threads can be expensive (especially if your goal is just to spawn a bunch of them and not really do anything else).
Another disadvantage is that they can’t use multiple cores/processors – you are stuck with just the one you started on.
Now, in an embedded system these may not be big issues. For performance, you probably don’t have many other threads running, and yours should get priority if set up properly. And unless you are using some sort of embedded Cell, you probably only have one single-core processor. If you do have two cores, though, this could work well if your thread is running on a different core than the rest of your threads.
Why would any scheduler that makes an attempt at fairness across processes do that? In a multiprogramming context, if a process were weighted by the number of threads it had, you could cheat the system into giving you a higher priority just by spawning more threads, violating performance isolation among processes. The most obvious solution is to allocate a time slice per process and multiplex that time slice between its threads.
You can easily utilize multiple cores and processors so long as you have some abstraction for an execution context (i.e. kernel threads). Simply have a kernel thread for each CPU, dispatch user threads to the kernel threads, and you’re good to go (this is known as n:m threading).
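In its simplest form, the dispatch side looks something like the following sketch. This shows only the dispatching – run-to-completion tasks stand in for resumable user threads, and task_t, the queue, and NWORKERS are all invented for illustration; a real n:m system would also handle yielding, blocking, and migration.

/* Rough sketch of n:m dispatch: one pthread per CPU, each draining
   a shared queue of user-level tasks. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4                 /* pretend we have 4 CPUs */
#define NTASKS   16

typedef struct { void (*fn)(int); int arg; } task_t;

static task_t queue[NTASKS];
static int next_task = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void say_hello(int id) { printf("user thread %d ran\n", id); }

static void *worker(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        if (next_task == NTASKS) {          /* queue drained */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        task_t t = queue[next_task++];
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                        /* run one user thread */
    }
}

int main(void) {
    pthread_t workers[NWORKERS];
    for (int i = 0; i < NTASKS; i++)
        queue[i] = (task_t){ say_hello, i };
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&workers[i], NULL, worker, NULL);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(workers[i], NULL);
    return 0;
}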
User-level threading’s disadvantage lies in the fact that a thread may block on a system call while there are more user threads on the host kernel thread that could be executed. Along similar lines, the kernel may try to infer workload properties from a thread’s behavior; with user-level threads you end up with the least common denominator (e.g. an interactive user thread being masked by a CPU-intensive one). Essentially this is an area where the end-to-end principle isn’t being applied. Some solutions have been proposed but have ultimately been rejected as being too slow, too intrusive, or too difficult to implement and maintain.
The stackless design means that local context is not preserved across thread switches, so the solution is not equivalent to real threads.
Hmm, reminds me of a similar design in Stackless Python:
http://zope.stackless.com/about/sdocument_view
Protothreads are really useful on small embedded systems using 8/16-bit MCUs with only a few KB of RAM and no real OS or scheduler to speak of, and I have used them in a few AVR-based projects.
But I don’t see the performance benefits of using them on big hardware with modern OSes. Protothreads can’t be scheduled concurrently, so they provide no performance enhancement on multi-CPU systems. They probably provide a little boost on single-CPU systems compared to using “real” threads.
On the other hand, they can be used to simplify event-driven programming.
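For example, with Adam Dunkels’ original pt.h API, an event-driven LED blinker reads like straight-line code. (This is a sketch: timer_expired(), toggle_led(), and reset_timer() are hypothetical hooks standing in for whatever the target hardware provides.)

#include "pt.h"   /* Adam Dunkels' protothreads */

/* hypothetical hardware hooks – not part of pt.h */
extern int  timer_expired(void);
extern void toggle_led(void);
extern void reset_timer(void);

static struct pt blink_pt;

static PT_THREAD(blink(struct pt *pt)) {
    PT_BEGIN(pt);
    for (;;) {
        PT_WAIT_UNTIL(pt, timer_expired());  /* "blocks" by returning */
        toggle_led();
        reset_timer();
    }
    PT_END(pt);
}

/* In main: PT_INIT(&blink_pt); then call blink(&blink_pt) from the
   main loop; each call runs until the wait condition is false. */

The state machine you would otherwise write by hand is generated by the macros, which is where the “algorithmic clarity” claim comes from.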
I published a new version of protothreads you may be interested in, linked from the wikipedia page, and also now on Google Code:
http://code.google.com/p/protothread/
We (I work for HP) use it on a system that’s much bigger than a small embedded system (a storage appliance running Linux with many GB of memory), and the reason is, as you said, that the programming is simpler than event-driven style. It has the same performance as event-driven code, and it is much faster than regular threads (at least the Linux POSIX pthread library): I measured context switching to be 135 times as fast, and thread create/destroy to be 643 times as fast.
You are right that protothreads (including my version) can’t take advantage of multiple-CPUs.
Another very important advantage of protothreads (over POSIX threads) for us is that we can write deterministic pseudo-random system-level tests that are reproducible. (The documentation discusses that a little.)
Sorry, I hadn’t seen the rest of the comment thread before I posted the previous comment… I didn’t realize you were already talking about the protothread implementation I recently published. (The Google Code and SourceForge code is identical; I’ll probably make SourceForge refer to Google Code soon.)
Larry Ruane