How do you fit a 250kB dictionary in 64kB of RAM and still perform fast lookups? For reference, even with modern compression techniques like gzip -9, you can’t compress this file below 85kB.
In the 1970s, Douglas McIlroy faced this exact challenge while implementing the spell checker for Unix at AT&T. The constraints of the PDP-11 computer meant the entire dictionary needed to fit in just 64kB of RAM. A seemingly impossible task.
↫ Abhinav Upadhyay
They still managed to do it, but had to employ some incredibly clever tricks to make it work, and make it work fast. Such skillful engineers, interested in optimising and eking the most possible performance out of underpowered hardware, still exist today, but they’re not in any position to make lasting changes at any of the companies defining our technology today. Why spend money on skilled engineers, when you can just throw cheap hardware at the problem?
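To give a flavour of the kind of trick involved: one approach the article reportedly covers is hashing every dictionary word into a compact probabilistic filter, so that a lookup only has to test a handful of bits instead of searching the word list. Below is a minimal sketch of that idea in C, assuming a Bloom-filter-style membership test; the sizes, hash function, and parameters are illustrative placeholders, not McIlroy’s actual implementation.

```c
/* Minimal, illustrative Bloom-filter-style dictionary lookup.
 * Sizes and hash choices are placeholders, not the real spell(1) code. */
#include <stdint.h>
#include <stdio.h>

#define FILTER_BITS (64u * 1024u * 8u)  /* pretend budget: 64 kB of RAM */
#define NUM_HASHES  4u

static uint8_t filter[FILTER_BITS / 8];

/* FNV-1a with a per-hash seed; any decent string hash would do here. */
static uint32_t hash_word(const char *word, uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed;
    while (*word) {
        h ^= (uint8_t)*word++;
        h *= 16777619u;
    }
    return h % FILTER_BITS;
}

/* At build time, set NUM_HASHES bits for each dictionary word. */
static void add_word(const char *word)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = hash_word(word, i);
        filter[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 1 if the word is *probably* in the dictionary, 0 if definitely not.
 * The small false-positive rate is the price of the compression. */
static int probably_present(const char *word)
{
    for (uint32_t i = 0; i < NUM_HASHES; i++) {
        uint32_t bit = hash_word(word, i);
        if (!(filter[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}

int main(void)
{
    add_word("hello");
    add_word("world");
    printf("hello -> %d\n", probably_present("hello")); /* 1 */
    printf("helo  -> %d\n", probably_present("helo"));  /* almost certainly 0 */
    return 0;
}
```

The trade-off is a small false-positive rate: the filter can occasionally wave a misspelling through, but it never rejects a word that really is in the dictionary, and the whole structure fits in a fixed, pre-chosen bit budget.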
I wonder just how many resources the spellchecking feature in Word or LibreOffice Writer takes up.
Thom Holwerda,
Haha, this is definitely where modern software development is at. I wonder how many clever optimizations are getting lost to time because of the excess of hardware. I vaguely remember writing one of these years ago as a student project. CPU and memory weren’t constrained, so naive algorithms did the trick. These days a neural network might be used.
Ironically, I find the Firefox spell checker lacking. Partly because the dictionary is incomplete, but also because it frequently fails to suggest the correct word. I find that DuckDuckGo usually gets it right, so whatever algorithm they are using works better than Firefox’s spell checker.
The original Unix Spell and modern spell checkers serve fundamentally different use cases. One was a tiny dictionary utility that did rudimentary spell checking offline; modern spell checkers run in real time and can provide context-aware predictions and corrections.
Sadly, a lot of people tend to confuse “lack of functionality” with “optimization” or “efficiency”.
Xanady Asem,
Nobody’s confusing anything. Naturally, we expect to do more with modern software and hardware, but it’s simultaneously true that we don’t optimize the way we used to, because we’re spoiled by hardware that offsets the inefficiency. The article said it best:
“Why spend money on skilled engineers, when you can just throw cheap hardware at the problem?”
When people wax poetic about how much more “efficient” things were in the past, the “efficiency” and/or “optimization” in question is almost always a purely subjective, qualitative, nebulous rant about modern-day bloat and whatnot.
To wit: the original Unix Spell ran on a fridge-sized PDP minicomputer that consumed lots of power, all to perform rudimentary, partial batch lookups against a tiny dictionary. Whereas we now run full-blown real-time spelling and grammar correction, prediction, and even translation between multiple languages on tiny devices that fit in our pocket and run off a battery.
So if we compare the efficiency/optimization of both systems using actual quantitative metrics, like performance per watt, we get a more complete and reasonable picture, one which helps us understand that this field has in fact advanced a hell of a lot in terms of efficiency and optimization.
Something from the past can be impressive in terms of what the engineers were able to achieve with the resources they had at hand, but that doesn’t automatically make it efficient or optimal. It is just what was possible at the time, with the knowledge and resources they had. That’s it.
Xanady Asem,
We really do see unoptimized bloat everywhere today, though. In the past, hardcore software optimization was an unavoidable necessity due to hardware constraints, but now a lot of companies tend to skip it and resource utilization blows up. This is why hardware can improve by orders of magnitude while software doesn’t deliver a proportional benefit.
This line is a very accurate reflection of the industry today:
“Why spend money on skilled engineers, when you can just throw cheap hardware at the problem?”
For better or worse this is just the way it goes.
Thanks for proving the point. The whole “unoptimized bloat” stuff is purely subjective, qualitative terminology. It confuses “lack of functionality” with “optimization”.
Similarly, using a bunch of engineers to write code by hand, when you can just use hardware to get better performance, cheaper and in less time… is ironically a very inefficient, far-from-optimal approach, which is why things go the way they go.
Would rather read comments on experiences and anecdotes, but that’s just me.