“When I was but a wee computer science student at New Mexico Tech, a graduate student in OS handed me an inch-thick print-out and told me that if I was really interested in operating systems, I had to read this. It was something about a completely lock-free operating system optimized using run-time code generation, written from scratch in assembly running on a homemade two-CPU SMP with a two-word compare-and-swap instruction – you know, nothing fancy. The print-out I was holding was Alexia (formerly Henry) Massalin’s PhD thesis, Synthesis: An Efficient Implementation of Fundamental Operating Systems Services (html version here). Dutifully, I read the entire 158 pages. At the end, I realized that I understood not a word of it, right up to and including the cartoon of a koala saying ‘QUA!’ at the end. Okay, I exaggerate – lock-free algorithms had been a hobby of mine for the previous few months – but the main point I came away with was that there was a lot of cool stuff in operating systems that I had yet to learn.”
It seems to me like this design results in the duplication of the same code several times. I’m not sure how this is an advantage over a more traditional function-pointer based object system (in which you can still often pull the same runtime specialization tricks through vptr jamming).
I have done some coding myself using inline assembler and the CMPXCHG instruction, since the Microsoft C++ x86 intrinsic does not seem to directly expose the actual compare-and-swap power of CMPXCHG. The Google query (( _InterlockedCompareExchange CMPXCHG )) finds as the 8th hit an “HP OpenVMS systems” article explaining “But there are also new builtins with names and signatures matching those provided by Intel’s IA64 compiler, _InterlockedCompareExchange*. Instead of returning a status value, these return the value fetched. If it matches the comparison value passed in, then the new value was stored; otherwise the store did not take place”
So in other words with JUST the x86 to worry about, the ZF already says if the CMPXCHG worked or not. Evidently InterlockedCompareExchange() was designed to be compatible with other architectures, perhaps the Alpha works that way, I don’t know. But it’s a plausible explanation of why if you call InterlockedCompareExchange() primitive rather than just using the CMPXCHG directly, you then need the additional comparison to know if it worked or not – those other architectures might not set a flag, and Wintel in the days of NT wanted to be compatible with all of them.
Or if there is some other explanation, it would be worth the embarrassment to know the real story !
As of Right Now, the Google query (( _InterlockedCompareExchange CMPXCHG )) returns the parent post as the #1 hit – I was checking to see if I had any typos. Either OSnews is getting special treatment or Google’s web-crawlers operate way faster than I had intuitively thought !
I’d expect the compiler to do the optimization you describe when the intrinsic version is in use in the way you describe. I’m going to find out why this is not happening at some point.
The API is written like it is because many times people are interested in the old value even when the exchange fails (sometimes this old value is useful when recomputing and retrying the cmpxchg).
>> The API is written like it is because many times people are interested in the old value even when the exchange fail <<
Enlightening, thanks !
Impressive results and very smart OS implementation. I had a lot of pleasure reading it. It would be nice to have it working on Amiga computers but sadly CAS/CAS2/TAS instructions are not supported due to fixed cycle bus sharing with hardware chipset. Maybe it could run on Atari hardware ?