Robert Love, a famous Linux kernel hacker, has written a blog entry with his thoughts on the recently reported Vista network slowdown issue and the explanation given by Mark Russinovich: “Unlike DPCs, however, the Linux parallel does not consume nearly half of your CPU. There is no excusable reason why processing IP packets should so damagingly affect the system. Thus, this absolutely abysmal networking performance should be an issue in and of itself, but the Windows developers decided to focus on a secondary effect.”
… [While Windows has]
…
This very specific difference between GNU/Linux and Windows only confirms the gut feelings I have and makes me appreciate all the more the skill and devotion of all those that have made it possible for me to enjoy using my favorite OS.
In the long run, I think you are right. Living through the short-term can be a bit aggravating, though. 😉
This is a network stack redesigned from the ground up? I know complex TCP/IP routing can be processor intensive, but seriously…
Mark goes on to show that copying a file from one machine to another consumes a staggering 41% of the available processor.
I can’t imagine what causes this sort of load. This screams denial of service to me; if it’s this complicated they should drop the DPC to a lower priority as soon as they can, rather than hanging all other threads on the system.
If something virtually blocks all other threads on the computer, you really want to make sure you keep its execution time down to a minimum. Routing TCP/IP to the correct application shouldn’t be extremely hard unless you’ve got a very complicated packet filter in place, but that should only really happen on dedicated router/firewall boxes. 300 MHz P2 boxes seem to handle 1gbps home routing and forwarding on Linux without problems in my experience, and can even function as a streaming media player.
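To put something concrete behind that claim: in the common case, delivering an inbound TCP segment to the right application is basically a hash of the 4-tuple and a table lookup. A rough sketch of the idea in C (purely illustrative – the struct and hash here aren’t any real kernel’s code):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative connection lookup: real stacks (Linux, the BSDs) do
 * roughly this per inbound segment - hash the 4-tuple, walk a short
 * bucket chain, hand the payload to the matching socket. */
struct sock {
    uint32_t saddr, daddr;   /* remote and local IPv4 addresses */
    uint16_t sport, dport;   /* remote and local ports */
    struct sock *next;       /* hash-bucket chain */
};

#define HASH_BUCKETS 4096

static struct sock *tcp_hash[HASH_BUCKETS];

static unsigned int tuple_hash(uint32_t saddr, uint32_t daddr,
                               uint16_t sport, uint16_t dport)
{
    /* Cheap mixing; the point is O(1) average lookup cost. */
    uint32_t h = saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
    h ^= h >> 16;
    return h & (HASH_BUCKETS - 1);
}

struct sock *lookup_established(uint32_t saddr, uint32_t daddr,
                                uint16_t sport, uint16_t dport)
{
    struct sock *sk = tcp_hash[tuple_hash(saddr, daddr, sport, dport)];
    for (; sk != NULL; sk = sk->next)
        if (sk->saddr == saddr && sk->daddr == daddr &&
            sk->sport == sport && sk->dport == dport)
            return sk;   /* deliver the segment to this socket */
    return NULL;         /* no match: check listeners, else drop/RST */
}
```

A complicated packet filter adds per-packet rule evaluation on top of that, which is where the cost can start to climb.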
Even weirder is that, rather than looking at virtually every other OS and wondering how they get around this problem without taking 40% CPU, they just create a new sub-scheduler. If all other OSes manage it, why can’t Windows… hell, they could even take a quick peek at the latest FreeBSD networking code to see what optimisations and algorithms they use. A programmer who doesn’t take advantage of the tools available to him is a bad programmer.
Incidentally, is this MMCSS thing just MS-speak for “when media is playing it lowers network speed,” or is it some scheduler extension? If it is a scheduler extension, why can’t these features be available to other non-media-related realtime tasks, rather than being media-centric? In fact, why didn’t they just make it part of the scheduler along with the other priority settings? (I’m curious – not just being rhetorical)
Wow that 300mhz P2 must have a hell of a system bus. I have a 3.8ghz P4 at work that couldn’t get above 40 megabytes per second in throughput. Still very fast, but not very close to maxing out gigabit ethernet.
“Wow that 300mhz P2 must have a hell of a system bus. ”
Depends almost entirely on the number of packets per second and not on the bits per second. More packets, more interrupts, more CPU load. This is why gigabit NICs have interrupt mitigation and large buffers. If you use large packets you can get very respectable bps on low-end hardware.
Still, 1gbps on a 300mhz p2 is suspiciously high.
“I have a 3.8ghz P4 at work that couldn’t get above 40 megabytes per second in throughput.”
This, on the other hand, is suspiciously low.
That’s definitely not normal. Even my disk I/O is better than that on a 5-year-old P4 Linux system (around 50MB/s to a plain old PATA drive). PCI should be capable of 133MB/s and GigE tops out around 125MB/s.
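Some rough back-of-the-envelope numbers make both points concrete – gigabit Ethernet is about 125 MB/s of payload at best, and the interrupt rate depends far more on frame size and coalescing than on raw bit rate. A small sketch (the frame sizes and the coalescing factor are just illustrative assumptions):

```c
#include <stdio.h>

int main(void)
{
    const double link_bps = 1e9;   /* gigabit Ethernet line rate */

    /* 1 Gbit/s is at most ~125 MB/s (decimal megabytes). */
    printf("max throughput: %.0f MB/s\n", link_bps / 8 / 1e6);

    /* Frames per second at a standard vs jumbo MTU, and the effect of
     * interrupt coalescing (assume one interrupt per 64 frames). */
    const double frame_bytes[] = { 1514, 9014 };   /* incl. Ethernet header */
    const double coalesce = 64;                    /* assumed NIC setting */

    for (int i = 0; i < 2; i++) {
        double fps = link_bps / 8 / frame_bytes[i];
        printf("%5.0f-byte frames: %8.0f frames/s, ~%6.0f interrupts/s "
               "with coalescing\n", frame_bytes[i], fps, fps / coalesce);
    }
    return 0;
}
```

At a 1500-byte MTU that is on the order of 80,000 frames per second, which coalescing cuts to roughly a thousand interrupts per second – within reach of old hardware. A flood of minimum-size packets is a very different story.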
The point is, regardless of throughput speed, the processor time consumption of Linux network code seems to be far lower than that of Vista.
As a test when building a Mythbox from a 450MHz P3 (with an Atheros wifi card), I turned on VLC and streamed live TV from it to VLC running on a Mac (wired directly to the wifi router).
Gapless and in sync.
The Multimedia Class Scheduler Service is built into the scheduler. Any task in Vista can register itself with it to ensure that task is not interrupted.
I’d imagine that on a Vista-based server (if such an animal exists — I’d imagine most servers are still on Windows 200X), the process that did the actual serving would be using this class to make sure that it never dropped, while non-critical and unimportant tasks (the GUI, background maintenance tests) were executed as secondary tasks.
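For what it’s worth, the documented way a thread opts into MMCSS is the AvRt API exported by avrt.dll; a minimal sketch (“Audio” is one of the stock task profiles defined in the registry under the Multimedia\SystemProfile key):

```c
/* Minimal sketch: a thread registering with MMCSS on Vista via the
 * documented AvRt API (link with avrt.lib). */
#include <windows.h>
#include <avrt.h>

DWORD WINAPI media_thread(LPVOID arg)
{
    DWORD task_index = 0;
    HANDLE mmcss;

    (void)arg;

    /* "Audio" is one of the stock task profiles; MMCSS boosts the
     * thread's priority according to that profile's settings. */
    mmcss = AvSetMmThreadCharacteristicsW(L"Audio", &task_index);
    if (mmcss == NULL)
        return GetLastError();   /* e.g. the MMCSS service isn't running */

    /* ... do the latency-sensitive work at the boosted priority ... */

    AvRevertMmThreadCharacteristics(mmcss);
    return 0;
}
```

Whether that boosting actually happens inside the kernel scheduler or in a user-mode service is the contentious part.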
MMCSS is not integrated with the scheduler. It’s a usermode hack. It’s a service that runs with high privilege that adjusts the priorities of other services. It is not useful for a server and likely won’t exist on WS2008.
MMCSS is not integrated with the scheduler. It’s a usermode hack. It’s a service that runs with high privilege that adjusts the priorities of other services. It is not useful for a server and likely won’t exist on WS2008.
You miss the point. The problem here is with Vista’s networking stack somewhere. The MMCSS hack was implemented as a response to it, but all processes can be affected, not just multimedia ones.
Microsoft programmers are not allowed to look at any external source code at all, not even BSD-licensed code. There are exceptions of course, for instance zlib is available internally, but those exceptions are only granted by Legal and Corporate Affairs (LCA) after extensive IP vetting. The problem for Microsoft, as always, is liability and IP taint; they are scared that if they take ideas and/or code from BSD etc., someone might claim it’s some kind of patent infringement or similar. They deploy this policy world-wide because even though far from all countries recognize the patentability of software, Microsoft still needs to distribute its software globally, which includes the US market with its insane legal climate.
I suspect that the Linux vs. Windows difference is more in the drivers.
Looking at my Vista nForce controller properties, I have a ton of options, including flow control, interrupt moderation, 4 types of checksum offload, and 3 types of sender offload.
In Linux, everything that works is enabled. If it doesn’t work, someone tries to find out why.
It appears to me though, that Windows drivers default to a play-it-safe mode, because most of those options were *disabled* on my controller.
It would be interesting to find out Mark’s network controller, driver, and driver option settings. I believe Windows could do much better than 40% if most of the work was done on the network controller, as it should be.
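For comparison, on Linux you can check whether those offloads are actually enabled for an interface via the ethtool ioctl (this is roughly what “ethtool -k eth0” reports); a rough sketch, with “eth0” just an example interface name:

```c
/* Query checksum and TCP segmentation offload state via SIOCETHTOOL -
 * roughly what `ethtool -k eth0` prints. "eth0" is only an example. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

static int get_flag(int fd, const char *ifname, __u32 cmd)
{
    struct ethtool_value eval = { .cmd = cmd };
    struct ifreq ifr;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&eval;

    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        return -1;                   /* not supported or no such device */
    return eval.data ? 1 : 0;
}

int main(void)
{
    const char *ifname = "eth0";
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return 1;

    printf("rx checksum offload:      %d\n", get_flag(fd, ifname, ETHTOOL_GRXCSUM));
    printf("tx checksum offload:      %d\n", get_flag(fd, ifname, ETHTOOL_GTXCSUM));
    printf("tcp segmentation offload: %d\n", get_flag(fd, ifname, ETHTOOL_GTSO));

    close(fd);
    return 0;
}
```

If these come back disabled by default on a Windows driver while the Linux driver enables them, that alone could explain a good chunk of the CPU-usage gap.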
Windows users are mostly the “it should work out of the box” type. Nothing wrong with that. Why should they have to browse sites such as OSNews and the like to get their answers? And don’t forget to count the ugly Genuine Advantage debacle. An ugly situation indeed. And they dare to ask a lot of money.
I suspect that the Linux vs. Windows difference is more in the drivers.
Microsoft tried to do the same thing with NT 4 on a lot of occasions – try to blame third-party drivers.
It would be interesting to find out Mark’s network controller, driver, and driver option settings. I believe Windows could do much better than 40% if most of the work was done on the network controller, as it should be.
That’s exactly what Vista’s network stack is supposed to do – offload to a TOE (TCP Offload Engine). However, all it’s done is increase the complexity of the networking stack and drivers, and the number of things that can go wrong.
The Linux guys quite rightly rejected TOE because there was a lot of pain with zero benefit to anyone.
Based on what evidence? TOE is beneficial only for situations where the bandwidth utilisation is so high there are major performance penalties associated with processing huge amounts of TCP/IP information coming in. Heck, right now, for example, talk to anyone running incredibly large systems about the CPU utilisation they see just from the volume of traffic.
Using the rationale of the ‘Linux experts’ – encryption acceleration and XML parsing acceleration are all a waste of time. Waste of time, or more like a complex problem that requires a design from the ground up for it to work properly. Something demonstrated in the number of re-writes of Linux subsystems because of inadequate design before writing the code.
Based on what evidence? TOE is beneficial only for situations where the bandwidth utilisation is so high there are major performance penalties associated with processing huge amounts of TCP/IP information coming in.
Look at the history of networking. People have been talking about TOE for years, even with 10 and 100 networks, and it has never come about because CPUs and hardware in general have kept pace. TOE also dramatically increases the cost and complexity of network cards, drivers and stacks for no benefit. Additionally, in some cases it may not even increase performance at all, because of extra communication between the network card and the rest of the system.
Networking is supposed to just work, not get reinvented. It is no reason to completely redesign and rewrite a network stack and all the associated drivers. TOE has not been proven to offer any benefit whatsoever.
Mind you, from what Mark Russinovich showed, it isn’t doing a good job of keeping CPU usage down!
Using the rationale of the ‘Linux experts’ – encryption acceleration and XML parsing acceleration are all a waste of time.
Depends on what the cost/benefit is.
Something demonstrated in the number of re-writes of Linux subsystems because of inadequate design before writing the code.
You misunderstand. That’s called iterative improvement. People ask themselves if there is a pay-off in the long run and it is usually done without affecting anything else or rewriting ten dozen drivers.
It’s how Linux keeps improving, and how, quite frankly, systems like Vista and Solaris don’t.
One could apply that rationale to any number of other devices, and yet they’re still used.
One has to ask, if they’ve been ‘trying’ to do something for 10 years, one would assume that maybe there is a rationale behind it – one that does make sense.
I mean, I assume that these decisions are being made by people who are far more intelligent than me, and thus probably know a lot more about the greater complexities that exist. For someone (Robert Love) to jump out of the woodwork and slam something with an off-the-cuff remark smacks of ignorance more than anything else.
Oh, and if Robert Love wants a tip – could he please fix the networking stack in Linux; every time I tried to do something like ripping a CD I would find my wireless network performance plummet, and in some cases websites wouldn’t even load! I find it rich that he is complaining about Windows and yet ignores the elephant in his own room.
It isn’t ‘iterative’ – iterative means that you take an existing idea and build upon it. For example, I grab a sponge cake, put icing and other stuff on it and turn it into a gâteau.
One grabs the existing networking code and then builds on top of it a wireless stack which implements the required wireless features by extending the existing networking stack.
That is what iterative is. Linux doesn’t iterate, it throws out and starts again. Fixing old problems and introducing new ones. It wastes time, breaks compatibility and quite frankly, no one learns anything through the wasted exercise of re-inventing the wheel over and over again.
Why you slag off Solaris, god only knows, because unless you’ve been living in a cave, Solaris improves without re-inventing the wheel. Features are being added all the time, and features that were added 4 years ago *shock* aren’t so badly designed that they actually need to be replaced and break compatibility.
Based on what evidence?
The TOE issue has been discussed to death; search the archives. This is a design decision that has not been taken randomly – there are strong reasons to argue that implementing TOE is stupid and a waste of time.
“The Linux guys quite rightly rejected TOE because there was a lot of pain with zero benefit to anyone.”
There’s no evidence that TOE is the cause of this problem.
There’s no evidence that TOE is the cause of this problem.
I never said it was (it was a specific reply) – but something in the network stack is definitely wrong because this didn’t happen in XP.
Wow that 300mhz P2 must have a hell of a system bus. I have a 3.8ghz P4 at work that couldn’t get above 40 megabytes per second in throughput. Still very fast, but not very close to maxing out gigabit ethernet.
Are you sure you weren’t measuring the throughput of your disk? I would have expected a 100-133MB/s PCI bus to handle better than that.
When Adrian Kingsley-Hughes measured the transfer (tracing back blog links leads to http://blogs.zdnet.com/hardware/?p=702 ), he was using Windows Task Manager. However, as Mark Russinovich showed at TechEd, Windows Task Manager isn’t really that good a tool for measuring these things. While the audience watched, Mark ran a demo using a stress-testing program on Vista. Windows Task Manager showed that task as using nearly no memory and processor time, but his own tool revealed it was nearly completely locking up the system (as did the horribly jerky controls while the stress test was running).
I’m not saying the problem’s not there — for all we know, it could be much worse — but I’d like to see the results tried out on something a little better.
It wasn’t task manager. It was his own tool – Process Monitor.
I respect Mark, when it comes to Windows he is def ‘the man’. I enjoy his explanations of strange happenings in Windows and I’ve learned a lot reading his blog and other material.
However I am really beginning to wonder just WTF MS is smoking up there in Redmond.
They seem to pride themselves on over-complicated band-aids and a never-ending string of ‘helper’ services that try to make up for design flaws, imho.
To me this sounds like decisions from which developers were excluded.
I installed a recent update tonight – performance has improved, but I think that zlynx has a good point. I wonder how much of this is related to poor setup rather than being solely a Microsoft issue.
For me, my computer had Windows Vista by default with all the latest drivers included with the restoration cd. I haven’t seen any performance issues, then again, I have a very souped up machine with 2gigs of RAM.
Well, 41% of CPU time for copying a network file, and still getting the dreaded “Calculating time remaining”…
Reading the posts on 2CPU you find that several of the people complaining about poor performance (including dloneranger) use motherboards with integrated Gigabit Ethernet controllers. And while the integrated controllers are convenient and save precious PCI slots, most of them are not in the high performance category.
One thing that I know from experience is that drivers and driver settings greatly affect the performance of a NIC, whether it be integrated or a separate card. Zlynx is right in saying that the driver could be the culprit. We had a problem where database exports between a SunFire 4800 and Windows 2000 Server and Windows 2003 Server machines (HP/Compaq DL380s) would take in excess of 30 minutes to complete (and the Gigabit controllers on a DL380 are a little more powerful than those on your typical PC motherboard). We upgraded the NIC drivers and beefed up the TCP Transmit Descriptors, and the same exports completed in 5 minutes. The one thing we noticed is that the updated driver had more tuning options than the older version, particularly the Transmit and Receive Descriptors.
And while there are plenty of people who are trying to paint this as a Microsoft problem, when you are dealing with drivers, my experience has been to update and tweak to see if the performance improves before you start complaining.
It would also be interesting to see whether this performance issue shows up using a quality PCI/PCI-X/PCI Express NIC or whether it is limited to motherboard-integrated NICs.
This reminds me of a Microsoft paper called:
“The Problems You’re Having May Not Be the Problems You Think You’re Having”
It was written in 1998 and was a study of the problems with handling low latency audio, video etc tasks on NT kernel based operating systems.
It has an interesting paragraph:
“Another commonly held view is that Ethernet input packet processing is a problem. Yet we tested many of the most popular 10/100 Ethernet cards receiving full rate 100Mbit point-to-point TCP traffic up to user space. The cards we tested were the Intel EtherExpress Pro 100b, the SMC EtherPower II 10/100, the Compaq Netelligent 10/100 Tx, and the DEC dc21x4 Fast 10/100 Ethernet. The longest observed individual DPC execution we observed was only 600 µs, and the longest cumulative delay of user-space threads was approximately 2ms. Ethernet receive processing may have been a problem for dumb ISA cards on 386/20s, but it’s no longer a problem for modern cards and machines.”
So why did Microsoft need to fix something that was not broken?
You’re assuming that those results are still relevant. Windows’ architecture has changed significantly in 9 years, as have NICs and NIC drivers (and the care taken in writing them). More processing is done in software on the low end because it’s more cost effective, not unlike with most modems today vs. their more expensive hardware counterparts. Greater driver dependence leads to greater variance between driver versions. The driver could easily harm performance with long DPC latencies, whereas a different driver improves performance by shortening such latencies. Likewise, the driver’s configuration can also impact performance.
Thank you for that interesting 1998 quotation about 100 Mb Ethernet. Today, gigabit Ethernet transfers in XP on a modern machine consume nearly half of the CPU, but running MP3 audio concurrently does NOT produce glitches in the audio.
The article seems kind of ironic to me. Saying the networking guys and the media guys aren’t working together nicely in the Windows world…
yet..
Linux has had a >3 year quest to provide glitch-free media playback… and now the scheduler guys are finally listening to the media guys and changing the scheduler…
And one other thing…
It would be nice if the Linux kernel “wasn’t so responsive”. My box could be long-and-gone dead, out of memory and swap, yet the box happily accepts new TCP connections.
Makes my load-balancing appliance not as useful.
http://en.wikipedia.org/wiki/Technical_features_new_to_Windows_Vist…