M:N threading, in which a single kernel thread is multiplexed to run multiple logical user-mode threads, has long been a feature of some Unix systems (Solaris and FreeBSD had it for years). Even Windows NT has had “Fibers” for several releases, though they suffered from the same problems as other M:N schemes and were incompatible with many Win32 APIs. Join Windows Kernel Architect Dave Probert for a discussion on the new User Mode Scheduling feature, which solves these problems while allowing applications fine-grained control over their threads.
To give you somewhat of an idea of what this is all about:
Dave and team, working very closely with the Parallel Computing Platform People, have created a very compelling new user mode thread scheduling/management system in Windows 7. In a nutshell, the User Mode Scheduler provides a new model for high-performance applications to control the execution of threads by allowing applications to schedule, throttle and control the overhead due to blocking system calls. In other words, applications can switch user threads completely in user mode without going through the kernel level scheduler. This frees up the kernel thread scheduler from having to block unnecessarily, which is a very good thing as we move into the age of Many-Core.
The best way to make use of this new feature is via ConcRT. “ConcRT is built on top of UMS and is the best way to most effectively utilize this new user mode thread scheduling model in Windows 7.”
The videos are obviously quite technical, so the faint-hearted might want to do something else.
I am far from an expert at Windows programming, but I thought the way it worked was that processes were big, heavyweight things that contained multiple threads, as opposed to Unix, which just had processes and would fork them.
In Windows, processes are heavy and kernel threads are much lighter. In Linux, processes are much, much lighter than in Windows – kernel threads are lighter still, but the difference isn’t as extreme. Essentially it boils down to Linux process management structures being very heavily performance-optimized (mostly because threads were not attractive to most Linux programmers early on, so a whole lot of work went into making fork() cheap).
So processes in Linux tend to be cheap enough that using kernel threads is often not worth the complexity it adds, and parallel programming is often done just using simple fork() calls. On Windows, forking processes to achieve parallel execution is MUCH slower than using threads – so much so that it is seldom done outside of special-case scenarios.
M:N threading is yet another layer of abstraction where essentially the OS only knows or cares about scheduling the M side – the N side is scheduled using a userland library. The M side can be a process or a thread depending upon how the library is implemented.
That makes sense, but I figured that part of the overhead of Windows processes had to do with scheduling. Guess I was wrong.
Thanks for the info
Actually you got M and N reversed .
The kernel cares about N, and the M side is handled in user mode.
Doesn’t an M:N model imply that a number of user space threads can be mapped onto not just one, but a number of kernel space threads?
That seems to be what wikipedia has to say…
http://en.wikipedia.org/wiki/Thread_(computer_science)#N:M
…in plain English, what does this mean? Faster Folding@Home performance from my Quad Core CPU in Windows 7?
It depends on how folding@home processes items. If it has a set of threads that rarely interact but keep chugging away at their own immutable portion of the problem, then this won’t help (the program should just create 4 threads and run, run run!). If folding@home has to coordinate who can process which data, or if some threads create data for other threads to consume, then this should make that operate faster.
Unlikely, BOINC (seti@home, folding@home, etc) doesn’t use threads for processing. (Multiple processes are used instead.)
– Gilboa
I generally listen to MSDN videos by downloading the WMV so that I can spend 25 minutes instead of 50 – with Dave Probert being one of those slow-speaking types, this would be ideal. But tonight the connection is slow; it would have taken two hours to download.
“Silverlight Configuration” did not seem to contain a “Play Speed” option, can this really be missing ?
Since these threads are user-mode threads (of course mapped onto one or more kernel threads) they can be managed from user-mode (i.e. your own program). Kernel mode or special API support is not needed. Anyone who would have wanted this could have written M:N threading for their own application already.
Yup – at least I did :)
My setup is simple, but VERY effective:
pseudo-code:
A ThreadManager doing the job of the kernel:
template<class ThreadPolicy = MNRoundRobin>
class ThreadManager : public Manager, protected ThreadingObject;
And a Thread class:
class Thread;
Yup, that is all it takes to do it!
Each Thread can itself be split onto any number of system threads, or it can share a system thread with other threads. The ThreadManager owns ALL threads.
I also created another object type to simplify async function calls: AsyncExec. In this way any object wishing to provide asynchronous calls or to issue async callbacks can do so rather easily (though I am still trying to find ways to reduce the required code mass for this – duplicating every function call certainly ain’t a great way, even if it is automated through a build tool…).
Oh well, once again Microsoft does what everyone else is doing, does it worse, and calls it innovation! Heck, they probably patented it as well, which will be fun considering my code has been done for.. well.. a long time :)
–The loon
What do you do when one of your kernel threads blocks, either to wait on some synchronous I/O or on a page fault, over which you have no control? Even if you could somehow know that you need to start running a new thread (there’s a way to do this on Windows via I/O Completion Ports), how do you ensure that the two threads which are now active are not thrashing in competition for CPU time?
You can’t implement this without kernel support.
FreeBSD and Solaris used to do M:N, but they switched to 1:1 (in Solaris 10 and FreeBSD 7) when they realized that their M:N couldn’t match the Linux 1:1…
True, but not for those reasons. In theory M:N should be superior to 1:1, but the complexity required to get that performance is so high that one has to question whether it pays off in the end. Linux almost got M:N in the form of NGPT (Next Generation POSIX Threads), which was under development by IBM, but it was instead superseded by NPTL.
The only operating system I know of that had really good M:N threading performance was Tru64 – but the complexity of achieving the same performance in Solaris and Linux just doesn’t pay off in the end.
It’s one of those ‘in theory’ arguments when it comes to engineering. In theory M:N should be superior; in theory the Itanium should be the performance king – but when the rubber hits the road and all the ugly real-world code is thrown at it, all the synthetic benchmarks in the world aren’t going to change it from being a ‘nice in theory’ idea.
Do you have to program specifically for M:N, or does the operating system just take your thread-creation requests and treat them as these fibers? I would think that M:N would only make sense if the application were creating a large number of threads in multiple programs. Otherwise there are two levels of scheduling overhead. If the server is single-purposed, it should only have one of those. Of course on desktops and servers that have a more varied load, it might make some sense.