Developing a Chrome-like testing infrastructure for something as complicated and sprawling as Windows would be a huge undertaking. While some parts of Windows can likely be extensively tested as isolated, standalone components, many parts can only be usefully tested when treated as integrated parts of a complete system. Some of them, such as the OneDrive file syncing feature, even depend on external network services to operate. It’s not a trivial exercise at all.
Adopting the principle that the Windows code should always be shipping quality – not “after a few months of fixing” but “right now, at any moment” – would be an enormous change. But it’s a necessary one. Microsoft needs to be in a position where each new update is production quality from day one; a world where updating to the latest and greatest release is a no-brainer, a choice that can be confidently taken. Feature updates should be non-events, barely noticed by users. Cutting back to one release a year, or one release every three years, doesn’t do that, and it never did. It’s the process itself that needs to change: not the timescale.
The latest Windows feature update had to be pulled due to a serious data deletion bug, so it makes sense to take a good look at the development process of Windows, and what can be changed to prevent such problems from appearing again.
If it’s just not feasible from a manpower or cost perspective to QA all the major bugs out of each release, then they should definitely ease up on the six-month updates and maybe release them every year or so.
There is a curse going through the software industry at the moment: the idea that all desktop software should behave like a website. That is, the product is invisibly kept up to date with the latest version of the code, which itself is updated all the time.
The poster child of this behavior is Chrome.
With Windows 10 they’ve been trying to get the OS to update like this, and now that they’ve fucked up they are doubling down by saying it’s because they still aren’t Chrome-ish enough in their development.
I say this is exactly the kind of behavior you’ll have to expect more of. The thing is, no amount of fancy in-house testing will ever get rid of all the bugs. The deal used to be pretty simple for Windows: for each OS release the product would become increasingly stable, as each service pack included only bug fixes and minor features. You as a user had the ability to choose at which stage to install – earlier access vs. stability.
In the new model all that power has been taken away from you. Instead, it is solely the product owners that decide. And their interests rarely align with yours.
You make a good point about the development process potentially leaking into release space (rather than the other way around). However, it doesn’t have to.
Continuously developing internally on the ‘head’, so that everybody is working on the same version, isn’t necessarily the same as releasing all features continuously.
Hence the Chrome stable and dev release channels.
How does that work on a single code base, you ask?
One way is to develop new features behind ‘feature toggles’, i.e. they are not on by default, but can be switched on via a startup option.
This means everyone is still working on the same version of the code, but you don’t have to ship all the features to users.
Developing behind a toggle also helps the developers build the feature in a way that it can be turned on and off easily, which generally results in a more modular code structure.
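For illustration only, here’s a minimal sketch of what such a toggle might look like – the flag name and functions are hypothetical, not how Chrome or Windows actually implement it:

```python
import argparse

def parse_args(argv=None):
    # Hypothetical startup option: the new code ships in every build,
    # but stays off unless explicitly enabled.
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-new-sync-engine", action="store_true",
                        help="turn on the experimental sync engine (off by default)")
    return parser.parse_args(argv)

def legacy_sync_engine():
    return "syncing with the legacy engine"   # the code path everyone ships today

def new_sync_engine():
    return "syncing with the new engine"      # the feature being developed on trunk

def sync_files(args):
    # The toggle decides which code path runs; both live in the same tree.
    if args.enable_new_sync_engine:
        return new_sync_engine()
    return legacy_sync_engine()

if __name__ == "__main__":
    print(sync_files(parse_args()))
```

Everyone builds and tests the same trunk; the stable channel simply ships with the toggle off, while the dev channel (or curious users) can switch it on at startup.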
The old Windows mode of development directly led to the disaster that was Windows Vista (late and bad).
That resulted in many companies skipping Vista entirely – an 8-year wait between XP and Windows 7.
While the new process isn’t going to result in bug-free code, overall it would be hard pressed to do worse.
A choice they had. Now they don’t. When Windows 10 1908 lands the next Vista or Metro disaster – well, enjoy getting it the morning you wake up for work, whether you want it or not.
*) Actually, technically there’s a somewhat hidden, indirect way you can get this with Windows 10. It involves changing the update policy away from the ‘targeted’ ring and deferring updates by a week or two. The trick here is that when Microsoft breaks someone else’s computer and pulls the update, it won’t have reached your computer yet, so you’ve dodged the bullet. But even that approach won’t get you as close as the old ways did.
So your complaint is that there is no choice of stable branches that are patched over time – just one, the trunk. I can see that – but that’s not a feature of the development practice, it’s a choice by MS to reduce costs, not a result of the choice of development style.
The key thing in trunk-based development is not to dispense with release branches, but not to ‘develop’ on the branch – you develop on the trunk and backport to the branch, not the other way around.
> It is funny you bring Vista into this, because Vista’s problem was that they made some fundamental changes that simply cannot be hidden behind ‘feature toggles’. Specifically, they changed the entire display driver model, which is not something you can make optional.
Perhaps if they had had the discipline of having to evolve the software, the first change would have been to make it evolvable with the same behaviour, and only then add the new behaviour. The end result would have been a better code base, one much easier to test in the context of the rest of the system, making it easier to spot regressions due to that particular change.
In general you are correct. The industry has completely moved to “everyone on the latest version of everything unless…”, with the ‘unless’ part often being unrealistic options for most people (the Long-Term Servicing Channel, for example).
However, I completely understand that this is the way things have evolved. It doesn’t matter so much that you and I are using different versions of the same program if we don’t exchange data, but in 2018 almost everything is exchanging data, and that is incredibly hard if we cannot update both servers and clients automatically. In the past we often used a “greatest common denominator” approach where we would downgrade communication to older protocols so we could communicate, but that is just too insecure.
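As a rough illustration of that “greatest common denominator” idea, here is a minimal sketch (the function and version numbers are hypothetical, not any real protocol): each side advertises the versions it supports, and the connection silently falls back to the newest one both understand – which may be an old, insecure one.

```python
def negotiate_version(our_versions: set, peer_versions: set) -> int:
    """Pick the highest protocol version supported by both sides."""
    common = our_versions & peer_versions
    if not common:
        raise ConnectionError("no mutually supported protocol version")
    return max(common)

# An up-to-date client meeting a server stuck on older software gets
# silently downgraded to the older (possibly insecure) protocol:
print(negotiate_version({1, 2, 3}, {1, 2}))  # -> 2
```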
In general, IT is moving too fast and things are released too early and broken, with a fix “coming soon”. The “release early, release often” mantra has been taken to extremes, and software is reaching the global population too early. But going back to “there is a new release every few years and most people will just skip a few releases” is not a good solution anymore either. There is a balance every developer has to strike, and the scale has tipped a bit too far. Time for a correction, but not an overcorrection.
The thing is, this really isn’t a bad deployment model if you actually develop things sanely and test properly. That last bit is the entirety of the issue: nobody wants to hire dedicated testers, so all the testing work gets pushed off onto the original developers and any users who opt in to getting ‘beta’ versions.
The person who developed a piece of code should not be the sole tester before it gets pushed to users, and no amount of ‘testing’ by volunteer users can cover what an actual dedicated tester would.
The net result of all of this is that what’s actually getting pushed to those ‘beta’ users is alpha (or sometimes even pre-alpha) quality software, and what regular users who didn’t opt in to the ‘beta’ are getting is quite often beta quality at best.
Required introduction for all developers:
Pre-Alpha: no testing has been done.
Alpha: black-box testing, by another team, has started.
Beta: feature complete, but might still contain both known and unknown bugs. Normally the first time someone outside of the developing organization receives a release.
Release Candidate: a beta release with no known showstopper-class bugs, tested at outside locations by selected users; only bug fixes should be added to the code.
https://en.wikipedia.org/wiki/Software_release_life_cycle#Pre-alpha
I don’t mind if they make feature releases reasonably often, but marking these as critical security updates is a problem.
You can stop most updates by setting your connection(s) to metered.
Critical security updates ignore this, though. Fair enough, since they should be small.
Twice my father has had his entire cap (expensive remote satellite connection) eaten because Microsoft marked a 3GB+ feature update as ‘critical’.
At 3GB+, it seems that these feature updates are the full ISO one would download for a fresh install.
How about a household with two, three or more devices running Windows 10? A sensible approach would be to offer a choice between picking up this ISO at the nearest Microsoft Store, having it delivered by snail mail, or downloading it once for multiple installs.
If you configure Delivery Optimization correctly on multiple systems, you should only end up downloading the update once. The problem is that MS doesn’t push out feature updates to everybody at once, even if they have the same physical hardware, and as a result it’s not unusual for each system in a given household to get the update at different times, even if they have the same update settings.
Windows 10 shares the updates with PCs on the LAN (unless you turn it off), so it need not download the 3GB+ per machine in a household.
Except for the fact that you have to deal with Microsoft’s staggered deployment methodology. In my own experience, it’s unusual to see two Windows 10 systems receive feature updates at the same time, even if they have identical update configurations and hardware.