The 8-bit Z-80 processor is famed for use in many early personal computers such as the Osborne 1, TRS-80, and Sinclair ZX Spectrum, and it is still used in embedded systems and TI graphing calculators. I had always assumed that the ALU (arithmetic-logic unit) in the Z-80 was 8 bits wide, like just about every other 8-bit processor. But while reverse-engineering the Z-80, I was shocked to discover the ALU is only 4 bits wide! The founders of Zilog mentioned the 4-bit ALU in a very interesting discussion at the Computer History Museum, so it’s not exactly a secret, but it’s not well-known either.
I have been reverse-engineering the Z-80 processor using images from the Visual 6502 team. The image below shows the overall structure of the Z-80 chip and the location of the ALU. The remainder of this article dives into the details of the ALU: its architecture, how it works, and exactly how it is implemented.
Ken Shirriff’s blog is an absolute must for fans of ultra-low-level hardware stuff. This goes way over my head, but it’s interesting nonetheless.
Back in the day, 4-bit ALUs were pretty common.
The first microprocessor I got my hands on was an IMP-16p (National Semiconductor). It was made up of four 4-bit ALUs chained together. I found this very strange at the time, as I was coming from a PDP-11/40.
Later on, I got involved in developing interfaces for DEC. Our CPU of choice for many of these was the 2901, another 4-bit slice device. On the TSU05 tape controller we had four of these connected together, giving us basically a 16-bit-word CPU.
It doesn’t surprise me one little bit that the Z80 has a 4-bit ALU.
This isn’t talking about chained 4-bit units – it’s a single 4-bit unit, where the slices are processed serially, not in parallel.
Actually, this probably helps to explain the high number of clock cycles the Z-80 used to perform operations. I’m not sure if this was a good design trade-off or not.
(I always admired the rich instruction set of the Z-80 in comparison to the 6502 I was working with on the VIC-20 / C-64. But then I looked at the timing data and wasn’t quite so jealous. ;^) )
P.S. I remember reading about those bit-slice processors in data books as kid. (Yes, I was a nerdy kid.) I think that was an elegant solution at the time.
We were all nerdy back then.
I got my ZX Spectrum compatible (Timex 2068) at the age of 10 and started coding around the age of 12.
I did say that 4-bit ALUs were common in those days.
The IC technology available then was, by today’s standards, pretty crude, so making 8-, 12-, or even 16-bit ALUs was for a while impossible.
I went on to give some examples of other uses for them.
The 2901 could be used on its own; it didn’t have to be combined with others.
A lot of engineers quickly realised that 4 bits was very limiting, especially as many of the other (non-microprocessor) CPUs around in those days had far longer word lengths. Could this be why we chained 4-bit devices together?
Intel realised this as well. How long did the 4004 last before they came out with the 8008?
Doesn’t that make the Z80 count as a 4-bit CPU?
No, plenty of CPUs have used serial computation and had reasonable performance; the TI 9900 did 16 bits in 18 clocks or so, and the approach allowed the clock to run faster to make up for it. The architecture defines the width of a processor, not the internal design. The Pentium 4 also used a double-pumped 16-bit ALU, and it was still a 32-bit processor.
I always thought the internal design was part of the architecture, and that the meaning of 8/16/32/64-bitness changed over the years with a little help from marketing. So the width of the processor is defined by the instruction set, not by the data path or registers or …?
It’s defined by the instruction set, not the actual implementation.
The Z80 is an 8-bit architecture because you add two 8-bit registers to get a result. The fact that, behind the scenes, it breaks that down into multiple 4-bit adds is inconsequential. They could change it at a later point to give it a true 8-bit ALU and nobody would know the difference.
The same goes for data buses. The 8086 had a 16-bit system bus, while the 8088 had an 8-bit bus. That didn’t make the 8088 an 8-bit chip, since you were still doing 16-bit math in 16-bit registers.
There was a time when it was reasonable to assume that 8-bit chips had 8-bit buses and 16-bit chips had 16-bit buses, but as time progressed the ISA became further and further divorced from the actual implementation.
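To make the point concrete, here’s a minimal C sketch of the idea: an 8-bit add performed as two passes through a 4-bit adder, with the low nibble’s carry fed into the high pass. The function names are mine, and nothing here is based on Zilog’s actual circuit; it just illustrates why the split is invisible at the instruction-set level.

```c
#include <stdint.h>

/* One pass through a hypothetical 4-bit ALU: add two nibbles plus carry-in,
 * report carry-out. This is an illustration, not Zilog's circuit. */
static uint8_t add4(uint8_t a, uint8_t b, int cin, int *cout)
{
    unsigned sum = (a & 0x0F) + (b & 0x0F) + (cin ? 1 : 0);
    *cout = sum > 0x0F;
    return (uint8_t)(sum & 0x0F);
}

/* An 8-bit add built from two 4-bit passes: low nibble first, then the
 * high nibble with the carry from the first pass. */
static uint8_t add8_via_4bit_alu(uint8_t a, uint8_t b, int *carry)
{
    int c;
    uint8_t lo = add4(a, b, 0, &c);              /* first ALU pass  */
    uint8_t hi = add4(a >> 4, b >> 4, c, carry); /* second ALU pass */
    return (uint8_t)((hi << 4) | lo);
}
```

The result and final carry are bit-for-bit identical to a single 8-bit add, which is the commenter’s point: software can’t tell the difference.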
Just playing devil’s advocate, but the Z-80 could also do 16-bit adds and subtracts with the HL/BC/DE register pairs. Wouldn’t that make it a 16-bit CPU by your definition (width of operands)?
bartgrantham,
I agree. Consider that one could hypothetically implement/emulate a 64-bit ISA CPU on top of 16-bit components, but personally I still think it makes sense to call it a 16-bit CPU if it can only ever physically handle 16 bits concurrently.
Admittedly, it’s rather ambiguous when different components (and operations) have different bit widths (register width, CPU/cache bus width, memory/device bus width, ALU, FPU, …). Maybe in such circumstances it makes the most sense to call the Z80 a 4/8-bit hybrid rather than either 4-bit or 8-bit.
Perhaps, but considering how limited using register pairs is compared to the rest of the architecture, I’d still say it’s 8-bit. I mean, I wouldn’t call the Pentium MMX a 64-bit chip simply because it can do 64-bit integer math – the conditions imposed to adding large numbers is quite extensive.
Also, if you look at the bitwise and logical operators, they are only capable of operating on one register at a time, with the exception of the HL pair.
Of course, I don’t have any actual experience programming a Z-80, but everybody calls it 8-bit, and the instructions listed at http://bit.ly/14z9vLR show it’s almost purely 8-bit instructions, with a couple of special 16-bit instructions.
I do know the Nintendo Gameboy used a variation of the Z-80, and that was considered an 8-bit system by Nintendo.
(According to Wikipedia, the chip in the Gameboy was somewhere between the 8080 and the Z-80, with none of the extra registers of the Z-80, but many of the extra instructions. It’s not pure Z-80, but most sources I’ve seen consider it one)
Drumhellar,
“I wouldn’t call the Pentium MMX a 64-bit chip simply because it can do 64-bit integer math – the conditions imposed to adding large numbers is quite extensive.”
This is somewhat tangential, but it may interest you anyway… MMX wasn’t actually capable of 64-bit integer math. Its 64-bit registers were actually split up into multiple packed operands of 32 bits or less. At 32 bits it was limited to addition and subtraction; it could only do 16-bit multiplications (_mm_mulhi_pi16). I remember back in the day learning MMX and being very disappointed in what it offered. Complete 32-bit SIMD support wasn’t available until some time later with SSE. IMHO SSE2 was the first truly useful (Intel) SIMD extension for x86, not only because it offered 128-bit XMM registers (64-bit operations), but because it finally stopped clobbering the FPU registers, which was a hack.
http://www.plantation-productions.com/Webster/www.artofasm.com/Wind…
“Despite the presence of 64-bit registers, the MMX instruction set does not extend the 32-bit Pentium processor to 64-bits. Instead, after careful study Intel added only those 64-bit instructions that were useful for multimedia operations. For example, you cannot add or subtract two 64-bit integers with the MMX instruction set. In fact, only the logical and shift operations directly manipulate 64 bits.”
https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
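For the curious, the packed-lane behaviour described above can be sketched in plain C. This models what an MMX packed 16-bit add (PADDW) does: four independent 16-bit lanes inside one 64-bit value, with carries never crossing lane boundaries, which is exactly why it isn’t 64-bit math. The helper name is mine, not Intel’s.

```c
#include <stdint.h>

/* Emulates a packed 16-bit add (PADDW-style semantics) in plain C:
 * each 16-bit lane wraps independently; no carry propagates between
 * lanes, so this is four 16-bit adds, not one 64-bit add. */
static uint64_t paddw_like(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (lane * 16));
        uint16_t y = (uint16_t)(b >> (lane * 16));
        r |= (uint64_t)(uint16_t)(x + y) << (lane * 16);
    }
    return r;
}
```

Note how an overflow in the low lane simply wraps instead of carrying into the next lane, unlike a true 64-bit addition.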
You are correct. The Wikipedia article didn’t make that clear. Oddly enough, the second article you mentioned was the second one I found when I looked again.
So, then, SSE2, which can add 64-bit integers, didn’t turn the Pentium 4 into a 64-bit architecture.
I love tangential comments
IMHO, neither the bus width nor the ALU width was used to “define” the bitness of a CPU. It was the register width.
The 68k was seen as a 32-bit CPU, but then people called it a 32/16-bit CPU because the data-bus width was only 16 bits (or even 8 on the 68008, as used in the Sinclair QL).
But today this definition is also not that easy to use. For example, the e200 PowerPC cores have 64-bit registers, but only for SIMD (SPE, as Freescale calls it), so no real 64-bit add is possible. So it is a 32-bit CPU.
So IMHO, today, the bitness is defined by the width of the general-purpose registers.
To software engineers, the “bitness” is defined by the ISA, and is nearly always the width of operational data registers (registers that can be added, multiplied, etc).
To hardware engineers, the bitness is defined by the width of the ALU. The 68000 was the classical case – it had a 32-bit architecture, but the ALU was only 16 bits wide. It was commonly noted as a 16/32 bit processor. At the time, hardware engineers had more sway in computers, so the 68000 was in most books as a 16 bit processor, along with the 8086.
This new info on the Z80 would have had most engineers of the time calling it a 4-bit CPU, so it’s no wonder Zilog kept this quiet. They were competing in the 8-bit CPU market, and being called a 4-bit processor, or at best a 4/8 bit processor would have meant death in the marketplace.
The best explanation I read/heard so far.
It’s a fuzzy definition, really. The bit-ness of a technology can refer to the size of the address bus, the size of the word operated on by the arithmetic instructions, or actual architectural details… When all this was fluctuating fast from generation to generation in the 1980s and 1990s, there was some creative labeling for marketing reasons. That’s why we had, for example, “16/32-bit” processors like the Motorola 68000, or why claims of “128-bit” video game systems emerged in the late 90s. Mainframe and HPC architectures were even more exotic. Let’s mention also the “Saturn” architecture of HP’s high-end calculators, which had 64-bit registers with sub-fields of various lengths aligned on 4-bit boundaries, 4-bit-addressable RAM (“nybles”), and 20-bit (five-nyble) addresses…
“This goes way over my head…” and yet you claim the blog is a must.
It’s actually pretty badly written.
Wow, tough crowd here. Anything specific you’d like improved in the article?
Great article, Ken. But I’m just a programmer, not a literary critic 🙂
I did a lot of Z80 coding and this discovery is pretty exciting for me.
Well, I would have liked to know why they did it like that. It seems to me that the additional logic is more than what would have been needed for a full 8-bit ALU. Maybe I’m wrong, but the drawback is quite significant – you get half the performance. Granted, the Z80 was running at frequencies quite a bit higher than the original 6502, but it seems the 4-bit ALU eats most of that.
Interesting article though – I like reading about such things. I never did any assembly programming on the Z80; I never had access to one. In the early 80s in Bulgaria there were mostly locally produced Apple II compatibles. We did make one – the Правец-8М – which incorporated a Z80 extension card on the mainboard. I’m not aware of anyone else doing that. But these were rare.
Wildly guessing, I would assume that carry propagation is the critical timing path in an otherwise simple CPU. For the possible clock rate of the rest of the chip, that would mean: either run everything at half the clock rate, or use the full clock rate, where only a half-width ALU runs for 2 clocks while the rest of the chip runs at full speed.
Carry lookahead could help here, but there might be reasons not to use it (patents?).
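For reference, here’s a small C sketch of the carry-lookahead idea mentioned above: compute per-bit generate and propagate signals, then derive the carries from them instead of letting each bit wait on the previous one. This is just an illustration of the logic, not a model of any particular chip.

```c
#include <stdint.h>

/* 4-bit add using generate/propagate carry logic. In hardware the
 * c[i+1] = g[i] | (p[i] & c[i]) terms are expanded into flat two-level
 * gates so all carries resolve at once; this loop computes the same
 * values sequentially for clarity. */
static uint8_t add4_lookahead(uint8_t a, uint8_t b, int cin, int *cout)
{
    uint8_t g = a & b;        /* bit i generates a carry out        */
    uint8_t p = a ^ b;        /* bit i propagates a carry through   */
    uint8_t c = cin ? 1 : 0;  /* carry into bit 0                   */
    for (int i = 0; i < 4; i++)
        c |= (uint8_t)((((g >> i) | ((p >> i) & (c >> i))) & 1) << (i + 1));
    *cout = (c >> 4) & 1;
    return (uint8_t)((p ^ c) & 0x0F); /* per-bit sum = p XOR carry-in */
}
```

The payoff is that the carry into every bit depends only on g, p, and the original carry-in, not on the ripple through the lower bits.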
It is interesting that the P4 has a “double pumped” ALU too, but IIRC that runs at double the chip speed.
The minimal Z80 instruction execution time is 4 clock cycles = just an “M1 cycle” (basically opcode fetch time).
The 4-bit ALU doesn’t halve the performance of a Z80, because it’s somewhat pipelined as Masatoshi Shima explains in the Z80 Oral History.
The high clocks/instruction occurs because of the Z80’s bus logic: a minimal instruction fetch requires 4 cycles: 2 to fetch the op-code itself [Address setup, then Read data] and another 2 for DRAM refresh (while the instruction is executed).
Compare it with the 8080, which took 5 cycles to execute an 8-bit ALU operation even though it had a full 8-bit ALU. Or compare it with the RCA 1802 (12 clocks/instruction) or the Nat Semi SC/MP (7–20+ clocks/instruction). The Z80’s designers did pretty well for that era.
Also, the Z80 didn’t have control logic as simple and direct as a 6502, which made the 6502’s instruction execution more efficient. But that didn’t mean the 6502’s arithmetic was always faster*; a 16-bit Zero-page Add on a 6502 would take 20 cycles and 14 bytes (CLC;LDA;ADC;STA;LDA;ADC;STA) vs a Z80’s 11 cycles / 1 byte (add hl,rr).
-cheers.
[*not to trash the 6502, it’s an amazing 8-bit CPU in many respects]
Snial,
“Also, the Z80 didn’t have control logic as simple and direct as a 6502, which made the 6502’s instruction execution more efficient. But that didn’t mean the 6502’s arithmetic was always faster*; a 16-bit Zero-page Add on a 6502 would take 20 cycles and 14 bytes (CLC;LDA;ADC;STA;LDA;ADC;STA) vs a Z80’s 11 cycles / 1 byte (add hl,rr).”
It’s the age-old debate between RISC and CISC. From the looks of it here, the 6502 was a bit too RISC for such a basic operation, making simple operations cost more than they should. On the other hand, the Z80 (and x86 afterwards) were too CISC with regard to multiple instruction-size encodings and memory addressing, which demanded more complexity in the decode stage. Getting an optimal combination requires compromises between the two.
What did you expect? It’s a blog for goodness sakes, not a novel or hardback book that you purchased at the bookstore for $50. Technically speaking it’s a darn good blog. If you want perfection go buy a James Joyce novel.
Ah yes, there’s a great section on the 6502 instruction set tucked into the middle of Dubliners, and I totally forgot about the Z80 primer in the last chapter of Ulysses! Thanks for your extremely helpful comment!
I reverse engineered a dozen of these 1979-era NMOS processors to learn how the ALUs, register files, ROMs, RAMs, PLAs, clocks, and back-bias circuits worked. I had forgotten how the Z80 worked, but from the circuit given here there is a good reason why it was done this way.
In the 6800, 8080, and 6502, the carry path was usually a pass gate with a minimal delay per bit. Four carry cells in series would look like a distributed 4-bit tree-like NAND gate, with extra devices to steer 1s and 0s or to bypass to the output and then invert. This is a very slow gate with all the associated capacitance, but it does cover 4 bits in one go with 2 logic delays, slooow + fast. So an 8-bit adder delay would look like a 1-bit adder plus 4 extra carry gates of delay. That’s about the limit of the clock cycle.
The Z80 instead used 2 faster gates per bit, so 4 bit cells add up to 8 gate delays on top of the basic adder cell. That’s about the limit of the clock cycle.
In another design, some designers paired off odd/even slices to achieve 2 bit carry in 2 gates of logic, allowing 8 bit design with 8 gate delay carry.
It’s a swings-vs-roundabouts issue. By the time the Z80 was built, the trend was moving away from ripple pass dynamic gates to fully static logic, and in the Z8000 the entire 16-bit path had a full custom design with fully static carry-lookahead asymmetric logic, all done in about 8–12 fast gates IIRC, with no precharging logic needed.
The 68000 and 8086 retained the ripple pass gate but enhanced it by using bootstrapping on a floating pass gate with a precharge. In these schemes a clock would be rammed through 8 pass gates in short order and then buffered for the next 8-bit block. Bootstrapping allowed 16-bit addition in a reasonable time but required a clock for precharge.
So in the dynamic style, 1 clock precharges and the next conditionally discharges or computes the gate values. In the static style, both clock phases can do useful work.
Even the Pentium 4, a 32-bit processor, uses a double-pumped 16-bit ALU too.