Besides the architectural progress, the CDC6600 was impressive for its clock speed of 10 MHz. This may not sound like much, but consider that this was a physically very large machine built entirely from discrete resistors and transistors in the early 1960s. Not a single integrated circuit was involved. For comparison, the PDP-8, released in 1965 and also based on discrete logic, had a clock speed of 1.5 MHz. The first IBM PC, released 20 years later, was clocked at less than half the speed of the CDC6600 despite being built from integrated circuits. The high clock rate is even more impressive when compared to more recent (hobbyist) attempts to design CPUs from discrete components, such as the MT15, the Megaprocessor or the Monster6502. Although these are comparatively small designs based on modern components, none of them reaches even a tenth of the CDC6600’s clock speed.
Detailed look at the speed of the CDC6600.
Interesting article. I don’t ever recall hearing of this computer. Oh well!
The difference is obvious… it used 60kW to switch all those transistors that fast. It was also one of the first architectures to use hardware parallelism.
ECL logic by the mid ’80s was up toward 100 MHz in Cray supercomputers.
The reason the PDP-8 and other mini and micro computers had lower clocks was that their designs were optimized for relatively low power and cost. Because of those optimizations, almost all of the techniques that made the CDC fast were left out. The CDC6600 even had a scoreboard for dynamic scheduling… something not seen in PCs until nearly the ’90s.
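For readers who haven’t met the term: a scoreboard lets instructions issue to independent functional units while earlier ones are still executing, stalling only on real hazards. Here’s a minimal Python sketch of the issue-stage bookkeeping, with made-up unit and register names for illustration (the real 6600 logic was far more involved):

```python
# Minimal sketch of scoreboard-style issue logic (illustrative only;
# unit and register names are hypothetical, not the 6600's real layout).

class Scoreboard:
    def __init__(self, units):
        self.busy = {u: False for u in units}  # which functional units are occupied
        self.pending_write = {}                # dest register -> unit producing it

    def can_issue(self, unit, dest):
        # Stall on a structural hazard (unit busy) or a WAW hazard
        # (another in-flight instruction writes the same register).
        return not self.busy[unit] and dest not in self.pending_write

    def issue(self, unit, dest):
        self.busy[unit] = True
        self.pending_write[dest] = unit

    def complete(self, unit, dest):
        self.busy[unit] = False
        del self.pending_write[dest]

sb = Scoreboard(["adder", "multiplier", "divider"])
sb.issue("multiplier", "X6")             # long-latency multiply in flight
print(sb.can_issue("adder", "X7"))       # True: the add can proceed in parallel
print(sb.can_issue("multiplier", "X7"))  # False: the unit is busy
```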
I made a 4-bit CPU from TTLs and EPROMs for the microcode back in the mid ’80s that could hit 10 MHz. It’s not really that hard. It wasn’t even on PCBs – it was a group of prototyping boards from Radio Shack. It had only a single “modern” chip – a PEEL I used to synchronize the reset with the pipe clock (it had a two-stage pipe to make reading/writing the internal registers easier).
JLF65,
I am impressed. I’ve only designed CPUs on paper and never actually built one out of transistors/gates. Not to detract from your achievement, but what makes the CDC6600 stand out isn’t merely the 10 MHz clock, but that it did so with 60-bit floating point registers. Large registers require deeper cascading logic, which potentially requires faster transistors to reliably reach a given clock frequency. The CDC6600 mantissa was 48 bits, so with some hand waving and all else being equal, a 4-bit computer might need to operate at around 120 MHz (48 / 4 = 12 times the clock) to match the performance of the CDC6600 at 10 MHz.
https://en.wikipedia.org/wiki/CDC_6600#Peripheral_processors_(characteristics)
I’m glad you posted this article, Thom; this computer is new to me.
A 4-bit integer datapath is significantly simpler than the 60-bit out-of-order CPU of the CDC 6600.
The notable thing about the CDC 6600 is not just how fast its implementation was for the time, but how sophisticated its architecture was.
I wasn’t trying to compare my 4-bit processor to the CDC, I was pointing out that the article seemed to think that no one could hit 10MHz with hobbyist processors, which is silly. There are plenty of hobby processors that go that fast or even faster, but they were ignored by the article.
I was basically saying, “even my college class project hit 10MHz, it’s not that big a deal for hobbyists.” Like others, I’m impressed by the CDC architecture – that’s where the magic was.
Oh, I see.
JLF65,
It could have been clarified in the article, but I don’t think it was intentionally misleading. I expect you would agree that comparing the clock rates of a 4-bit circuit and a >>4-bit circuit is not an apples-to-apples comparison. Max clock rate is a function of the sum of all cumulative logic gate latencies, which is itself a function of how many cascading bits you are computing. Saying something like “my 10 MHz transistors are the same as your 10 MHz transistors” would be an oversimplification. If you had a 48-bit adder that could run at 10 MHz, those same physical transistors would already be equivalent to 120 MHz in a 4-bit adder arrangement, just by chopping off the high bits. The transistors were already inherently fast enough without requiring higher voltages or better transistors. You’d just need to dissipate more heat as a result of the higher duty cycle.
So with all this in mind, I doubt many college/hobby processors were achieving similar speeds with homemade logic gates at the time. Reaching 10 MHz may have been feasible, but it was solving a simpler problem.
Logic doesn’t work that way. An 8-bit add only takes twice as long as a 4-bit add if you use a 4-bit ALU and spend two cycles adding the 8-bit values. My 4-bit processor could have easily been 8 or 12 or 16 or 128 bits wide and still run at the same speed. The only reason it was 4 bits instead of 8 or something else was that the class was assigned a 4-bit processor with set instructions (just what the instructions did, not how they were encoded or how they should be implemented – everyone took different paths on this same specification).
Ripple carry adders can take longer for wider adds, but not twice as long, and who would use a ripple carry architecture in that case? You use it when speed doesn’t matter, and use carry look-ahead when speed matters.
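For concreteness, here’s a toy functional model of a 4-bit carry-lookahead stage in Python (a behavioral sketch, not a gate-level netlist): every carry is a two-level AND-OR function of the generate/propagate signals, which is why the logic depth doesn’t grow per bit the way a ripple chain does.

```python
# Toy 4-bit carry-lookahead adder model (behavioral sketch only).

def cla_4bit(a, b, cin=0):
    g = [(a >> i & 1) & (b >> i & 1) for i in range(4)]  # generate signals
    p = [(a >> i & 1) | (b >> i & 1) for i in range(4)]  # propagate signals
    c = [cin]
    for i in range(4):
        # c[i+1] = g[i] OR (p[i] AND c[i]); expanding this recursively gives
        # a two-level AND-OR network in hardware, so within a group the
        # depth stays fixed instead of rippling bit by bit.
        c.append(g[i] | (p[i] & c[i]))
    s = [(a >> i & 1) ^ (b >> i & 1) ^ c[i] for i in range(4)]
    return sum(bit << i for i, bit in enumerate(s)), c[4]

print(cla_4bit(0b1011, 0b0110))  # (1, 1): 11 + 6 = 17 -> sum 0001, carry out 1
```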
JLF65,
Ripple carry propagation delay is proportional to the number of bits. You make a valid point about look-ahead carry, but even there you can’t ignore the propagation delays of the additional logic gates; latency still grows as you add bits. I found a latency calculator for it here:
http://www.ecs.umass.edu/ece/koren/arith/simulator/Add/lookahead/
With the given gate characteristics and a group size of 4, 4 bits incur 7.2 units of latency whereas 128 bits incur 31.2 units. That’s better than ripple carry, but if you were already pushing your transistors near their max performance to begin with, then adding more bits does mean you’ll have to dial back your clock rate so that all those bits settle inside the clock window.
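As a rough illustration of that scaling, here’s a toy delay model with assumed, normalized gate delays (it is not calibrated to the simulator above; it only shows the trend):

```python
# Toy adder-delay model with assumed, normalized gate delays.
# Not the UMass simulator's parameters; it only illustrates the scaling.

def ripple_carry_delay(bits, gate=1.0):
    # The carry passes through ~2 gates per bit position, in sequence.
    return 2 * bits * gate

def lookahead_delay(bits, group=4, gate=1.0):
    # Single-level lookahead: generate/propagate (~2 gates), carries
    # rippling between groups (~2 gates per group), final sum (~2 gates).
    groups = -(-bits // group)  # ceiling division
    return (2 + 2 * groups + 2) * gate

for n in (4, 8, 48, 128):
    print(f"{n:3d} bits: ripple={ripple_carry_delay(n):5.1f}  "
          f"lookahead={lookahead_delay(n):5.1f}")
# A multi-level lookahead tree (as real designs use) grows roughly
# logarithmically instead, but delay still increases with width.
```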
I’m not able to independently confirm the article’s claims, but the point about their transistors making the difference needed to achieve that clock rate for their number of bits still sounds plausible to me even after the discussion we’ve had so far. Still disagree?
The latencies you mention would be an issue if you were pushing the limits of the tech. At 10 MHz, you’re nowhere near the limits unless you’re trying to use some REALLY old chips and wired protoboards. 🙂 But yes, technically you have a point. It just doesn’t matter for the article. You could also work around those latencies on old/slow stuff using pipelines. It means a more complicated design, but it’s still doable.
JLF65,
Well, considering it was 1960s tech that not only bested other discrete computers at the time but even the first IBM PC built with ICs 20 years later, it still seems significant. I don’t really know why the IBM PC couldn’t be clocked as high (and with fewer bits at that). Do you believe the IBM PC should/could have been clocked 100+% faster and they just didn’t because they were extremely conservative on clock speeds for some reason? I did a search, and although some 8088 clones eventually reached 10 MHz, I couldn’t find any record of the original 8088 itself being successfully clocked that high. So I really don’t have an explanation, other than the one offered by the article regarding novel transistors, as to why they had a commercial clock advantage for so long.
I concede you have more experience with transistor circuits than I do, but it still doesn’t make sense to me that, if increasing the clock speed was so easy using ordinary transistors, the industry wouldn’t have done so much sooner to gain an easy competitive advantage. Something doesn’t seem to add up. Oh well, it’s not important, just curious.
Making a faster home CPU from TTLs and transistors and the like was easy, as these were all smaller parts that were fairly well established by that point – mature and getting faster all the time. Bigger packages like “modern” CPUs and memories would need more time to reach lower power and higher clock rates without costing a fortune. My 11 MHz 4-bit processor had 256 bytes of SRAM – it was easy to get higher-speed parts for it. Making a full 8- or 16-bit processor in a single chip with more than 1 KB would mean compromises in power and/or speed to make high-volume production at a reasonable cost viable.
JLF65,
OK, you keep saying it’s easy to make faster computers at home, but can you provide a link to a homemade computer built from discrete components that actually reached a 10 MHz clock, ideally with a decent 8+ bit word size? If you want to criticize the article over this, I don’t have a problem with it per se, but it’s not like the author used a hand-wavy argument and failed to provide evidence for how impressive this was in the 1960s and beyond. He even specifically cited two modern homemade computers built out of discrete components today: the 16-bit Megaprocessor (20 kHz) and the Monster6502 (50 kHz), neither of which can reach those speeds. Real-world examples provide much stronger evidence than hand-waving, so IMHO your argument that this “is easy” would benefit greatly from a concrete counterexample.
Examples were easier to find years ago when this was all the rage. I made mine in ’85. These days it’s all FPGAs, and custom CPUs are everywhere and run up to 100 MHz. I won’t count those since they’re not TTL or similar. Let’s look for what TTL CPUs CAN be found today…
Here’s one that’s 8MHz: http://www.mycpu.eu/
Here’s one that’s >6MHz: https://geeks-world.github.io/articles/465805/index.html
There’s a few others easily found that are ~6MHz.
Here’s one that’s ~4MHz: http://www.homebrewcpu.com/
This one is in progress and aiming for over 10MHz; its predecessor ran at 6.25MHz: https://hackaday.io/project/167605-kobold-k2-risc-ttl-computer
Here’s one that’s breadboarded and runs at 5MHz: https://github.com/Pconst167/dreamcatcher
Here’s one that runs at 3.58MHz: https://github.com/DoctorWkt/CSCvon8
Hmm – seems hobbyists today who work in TTL aren’t as ambitious as they used to be. Those folks are probably all working with FPGAs now, since that’s where the real speed and flexibility for designing your own CPU can be found at a reasonable price. Why kill yourself making a 5 to 10 MHz TTL 8-bit CPU that won’t get you any praise when you can make a homebrew 32-bit CPU that runs at 50 to 100 MHz using an FPGA instead? Given that FPGAs have been readily available since 1985 (half my class used Xilinx for their design, but at the time I felt that was “cheating” at making your own CPU), I can understand why most of the work today is all FPGA. Hell, I want to do some FPGA projects. 🙂
There is an operational CDC 6500 at the Living Computers Museum in Seattle. It is a dual-CPU machine built from two CDC 6400 processors. The CDC 6400 shares its architecture with the CDC 6600 but appears to have a single unified arithmetic unit instead of the 6600’s 10 functional units.
You can request online access. Here are the instructions for when you get it: https://wiki.livingcomputers.org/doku.php?id=cdc6500_survival_guide
Everybody knows that the “HAL” in HAL 9000 is a one-letter decrement of IBM. Well, it seems the 9000 came from extrapolating the CDC 6000 and 7000 series to a 9000 series by the ’90s. Supposedly the logo is visible in the movie, but I can’t find an image.