How do you fit a 250kB dictionary in 64kB of RAM and still perform fast lookups? For reference, even with modern compression techniques like gzip -9, you can’t compress this file below 85kB.
In the 1970s, Douglas McIlroy faced this exact challenge while implementing the spell checker for Unix at AT&T. The constraints of the PDP-11 computer meant the entire dictionary needed to fit in just 64kB of RAM. A seemingly impossible task.
↫ Abhinav Upadhyay
They still managed to do it, but had to employ some incredibly clever tricks to make it work, and make it work fast. Such skillful engineers, interested in optimising and eking the most possible performance out of underpowered hardware, still exist today, but they’re not in any position to make lasting changes at any of the companies defining our technology today. Why spend money on skilled engineers, when you can just throw cheap hardware at the problem?
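To give a flavour of the kind of trick involved: one approach the article reportedly covers is hashing every dictionary word into a compact probabilistic filter, so that a lookup only has to test a handful of bits instead of searching the word list. Below is a minimal sketch of that idea in C, assuming a Bloom-filter-style membership test; the sizes, hash function, and parameters are illustrative placeholders, not McIlroy’s actual implementation.

```c
/* Minimal, illustrative Bloom-filter-style dictionary lookup.
 * Sizes and hash choices are placeholders, not the real spell(1) code. */
#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS (64u * 1024u * 8u)  /* pretend budget: 64 kB of RAM */
#define NUM_HASHES  4u

static uint8_t filter[FILTER_BITS / 8];

/* FNV-1a with a per-hash seed; any decent string hash would do here. */
static uint32_t hash_word(const char *word, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    while (*word) {
        h ^= (uint8_t)*word++;
        h *= 16777619u;
    }
    return h % FILTER_BITS;
}

/* At build time, set NUM_HASHES bits for each dictionary word. */
static void add_word(const char *word)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = hash_word(word, i);
        filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 1 if the word is *probably* in the dictionary, 0 if definitely not.
 * The small false-positive rate is the price of the compression. */
static int probably_present(const char *word)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = hash_word(word, i);
        if (!(filter[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}

int main(void)
{
    add_word("hello");
    add_word("world");
    printf("hello -> %d\n", probably_present("hello")); /* 1 */
    printf("helo  -> %d\n", probably_present("helo"));  /* almost certainly 0 */
    return 0;
}
```

The trade-off is a small false-positive rate: the filter can occasionally wave a misspelling through, but it never rejects a word that really is in the dictionary, and the whole structure fits in a fixed, pre-chosen bit budget.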
I wonder just how many resources the spellchecking feature in Word or LibreOffice Writer takes up.
Thom Holwerda,
Haha, this is definitely where modern software development is at. I wonder how many clever optimizations are getting lost to time because of the excess of hardware. I vaguely remember writing one of these years ago as a student project. CPU and memory weren’t constrained, so naive algorithms did the trick. These days a neural network might be used.
Ironically, I find the Firefox spell checker lacking. Partly because the dictionary is incomplete, but also because it frequently fails to suggest the correct word. I find that DuckDuckGo usually gets it right, so whatever algorithm they are using works better than Firefox’s spell checker.
The original Unix Spell and modern spell checkers serve fundamentally different use cases. One was a tiny dictionary utility that did rudimentary spell checking offline; modern spell checkers run in real time and can provide context-aware predictions and corrections.
Sadly, a lot of people tend to confuse “lack of functionality” with “optimization” or “efficiency”.
Xanady Asem,
Nobody’s confusing anything. Naturally, we expect to do more with modern software and hardware, but it’s simultaneously true that we don’t optimize the way we used to, because we’re spoiled by hardware that offsets the inefficiency. The article said it best:
“Why spend money on skilled engineers, when you can just throw cheap hardware at the problem?”
When people wax poetic about how much more “efficient” things were in the past, the “efficiency” and/or “optimization” in question is almost always a purely subjective, qualitative, nebulous rant about modern-day bloat and whatnot.
To wit: the original Unix Spell ran on a fridge-sized PDP minicomputer that consumed lots of power, all to perform rudimentary, partial batch lookups against a tiny dictionary. Whereas we now run full-blown real-time spelling and grammar correction, prediction, and even translation between multiple languages on tiny devices that fit in our pocket and run off a battery.
So if we compare the efficiency/optimization of both systems using actual quantitative metrics, like performance per watt, we get a more complete and reasonable picture, one which helps us understand that this field has in fact advanced a hell of a lot in terms of efficiency and optimization.
Something from the past can be impressive in terms of what the engineers were able to achieve with the resources they had at hand, but that doesn’t automatically make it efficient or optimal. It is just what was possible at the time, with the knowledge and resources they had. That’s it.
Xanady Asem,
We really do see unoptimized bloat everywhere today, though. In the past, hardcore software optimization was an unavoidable necessity due to hardware constraints, but now a lot of companies tend to skip it and resource utilization blows up. This is why hardware can improve by orders of magnitude while software doesn’t deliver a proportional benefit.
This line is a very accurate reflection of the industry today:
“Why spend money on skilled engineers, when you can just throw cheap hardware at the problem?”
For better or worse this is just the way it goes.
Thanks for proving the point. The whole “unoptimized bloat” stuff is purely subjective, qualitative terminology. It confuses “lack of functionality” with “optimization”.
Similarly, using a bunch of engineers to write code by hand, when you can just use hardware to get better performance, cheaper and in less time… is ironically a very inefficient, far-from-optimal approach, which is why things go the way they go.
Would rather read comments on experiences and anecdotes, but that’s just me.