An Unsanity developer writes in his blog that Mac OS X, by using the CISC-optimized Mach-O ABI inherited directly from NeXT, can see a speed decrease of up to 12%. Rewriting that ABI would break applications and binary compatibility.
I know that Eugenia’s been having bandwidth issues, so I thought that I’d help people out with a little redirection. Check out the comments over at MacSlash [ http://www.macslash.org/ ] for some great background from some old-time NeXTers that explain this away. This seems a lot like the whole Carbon/Cocoa issue. (You can read Unsanity’s actually well-done take on that issue, in contrast to RealBasic’s, which was done merely to promote their product, as it’s in Carbon, while Unsanity’s take comes from an architecturally sound viewpoint.)
— Rob
We do not have bandwidth issues anymore, thanks to some very kind OSNews readers. And the bandwidth issues were not from… pure text (like this story), but from screenshots.
earlier today. Yes, it is highly recommended that you do so before you flame Apple.
It’s a total misunderstanding of the situation, except for the last post, which points this out. The actual situation here is this:
Code compiled on a processor has two levels of binary requirements: the instruction set, and the ABI (Application Binary Interface). The processor dictates the instruction set, but the OS and compiler dictate the ABI. For example, most OSs (except for Windows, of course) on x86 use the System V ABI for C code, and the GCC-ABI-of-the-day for C++ code (or the CodeSourcery IA-32/IA-64 C++ ABI if they’re new enough). The ABI defines aspects of the binary interface unrelated to instructions. A simple example helps:
int func(float x, double y)
{
    /* ... */
}
The System V ABI says several things about how this function should behave. It states that the ‘int’ returned by the function will be returned in the eax processor register. It states that the ‘float’ will be aligned on a 4 byte boundary on the stack, while the double will be aligned on a 16-byte boundary (even though it’s a 10-byte variable) on the stack. It also specifies the order the variables are pushed on the stack, and what registers the function can use as “scratch” registers, and which ones it has to preserve.
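To make that concrete, here is a minimal sketch (my own illustration, not Unsanity’s) of the function above plus a caller, with comments noting what the ABI described here decides behind the scenes:

int func(float x, double y)
{
    /* Per the System V ABI as described above, this 'int' travels back
       to the caller in the eax register. */
    return (int)(x + y);
}

int main(void)
{
    /* Nothing in the C source says how the arguments reach func(): the
       ABI decides the order they are pushed on the stack, how they are
       aligned there, which registers the callee may clobber, and that
       the result comes back in eax. Compile the same source against a
       different ABI and you get an incompatible binary. */
    return func(1.5f, 2.25);
}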
Now, the ABI also specifies how global data (such as function addresses, in this case) is accessed. On x86 processors (like other CISC processors) functions are almost always accessed relative to the program counter (the pointer to the current instruction). If a function at address 0x80004204 in memory is called from an instruction at address 0x80004000, then the instruction will be coded “jump +0x204”, meaning “jump to the address 0x204 bytes past the current instruction.” In a native PowerPC ABI, it might be coded as “jump 0x80004204” instead (actually it wouldn’t, since no RISC chip has space for a full 32-bit address in its instruction format; it would be something similar). Apparently, in OS X, the ABI doesn’t use the PPC processor optimally, and instead tries to fake the program counter-relative format for function calls, which results in a 10-12% slowdown for programs.
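Back-of-the-envelope, using the addresses from the example (again my own sketch, nothing to do with Apple’s actual code generation):

#include <stdio.h>

int main(void)
{
    /* Addresses from the example above. */
    unsigned long call_site = 0x80004000UL;  /* where the call instruction sits */
    unsigned long target    = 0x80004204UL;  /* where the called function lives  */

    /* A PC-relative call only has to encode the small displacement. */
    printf("PC-relative: jump +0x%lx\n", target - call_site);

    /* An absolute call would have to fit the whole 32-bit address into
       the instruction, which a fixed-width RISC instruction word cannot
       do directly. */
    printf("absolute:    jump 0x%lx\n", target);
    return 0;
}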
Personally, I don’t really like it. Apple is bleeding performance all over the place. You say, “hey, it’s only 10-12%” but you have to note:
1) Object-oriented code makes more function calls than C-like code. The more object-oriented your code is, the more this will hurt.
2) 10% here, 12% there, 10% somewhere else, eventually, it starts cutting in!
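(Roughly speaking, and assuming those hits are independent, they compound multiplicatively rather than just adding: 0.90 × 0.88 × 0.90 ≈ 0.71, so three “small” slowdowns already cost you close to 30% overall.)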
A couple of observations:
1) Undoubtedly, Apple knows this and has chosen to put it on the back burner for other priorities. However, I would be pretty amazed if keeping this was not a conscious choice by Apple. They know it’s there, they KNEW it was there, but they kept it anyway. I doubt it’s there due to sloth or sloppiness.
2) I doubt anyone who isn’t buying a Mac or OS X today would turn around and buy it if it were 10-12% faster. The users would appreciate it, of course, who wouldn’t, but I don’t think it’s a big deal to the architects and product folks at Apple.
10-12% is expensive, but not “noticeable”, not seat-of-the-pants “feels faster” noticeable. When folks say “it’s too slow”, they want a 200-300% increase, they want “Wow! That’s much better!”. 10-12% isn’t there.
By the time Apple comes up with a fix and deals with all of the backward-compat issues, the machines will have more than overtaken the difference in performance. They’d rather spend that time working on the next iApp.
Why they didn’t choose to eliminate it in the beginning is a real question. Perhaps it was a portability thing, maybe they would have broken compatibility with older Rhapsody code, maybe they really didn’t notice it until it was Too Late.
So, anyway, it’s a novel observation, and it’s a shame it’s there, but I think that Mac owners are pretty much stuck with it for the time being.
Heh, that’s always funny. Sometimes it is true, but when there’s competition involved things start to make more of a difference.
One of the refreshing things about the open source based operating systems is that they’ve historically been dedicated to *improving* performance with every release in addition to adding more and more features.
It seems like the commercial OS’s (MS in particular) pay more attention to getting features out the door and leave performance/design considerations back in the trash bin.
One positive thing for Apple…if they fix it they can use it for marketing and to sell another major version. (I’m not sure that’s a good thing for the users though).
Funny how the Mach-O builds of Mozilla are so much faster than the CFM builds…
Is the only thing changing between the Mach-O and CFM builds the ABI? I highly doubt that. More likely there are a lot of differences between the two builds, and the CFM build is most likely not very optimized (because OS 9 has officially been deprecated by its creator). As for the 10-12% performance difference, it’s true it’s not a lot, but like I said, OS X seems to have a lot of stuff like this…
Hooray for those generous people!
Well, each version of OS X is faster than the last, but it is obvious Apple isn’t pre-occupied with speed.
I have a question: as I read the article, it said that correcting the problem would in fact require all apps to be recompiled? Well, would that be a bad thing? Like I read, 10% here, 5% there, it all adds up.
Could it be that Apple knowingly kept this (actually, I am pretty sure they knew about it) because of a possible move to a CISC architecture, maybe AMD or Intel?
Jay : “but it is obvious Apple isn’t pre-occupied with speed.”
They weren’t before 10.0; then they discovered angry users saying that OS X was too slow, and since then they have worked a lot on speed.
Sergio : “they kept this because of a possible move to a CISC architecture, maybe AMD or Intel.”
No, they kept it because their tool chain has done Mach-O for a long time, and it takes time to upgrade a tool chain correctly. Apple has almost caught up with gcc now and they are developing their own extensions; they may have a switch in mind. I mean, it’s not impossible to have two binary formats at the same time: remember Linux when 2.0.0 came out, most binaries were switching from a.out to ELF, and that took time.
Sean: “Funny how the Mach-O builds of Mozilla are so much faster than the CFM builds”
Take the time to read http://www.mozilla.org/ports/fizzilla/ . They explain that the Mach-O build uses the native necko engine and calls the BSD socket interface directly, without needing the OpenTransport API; that’s the main reason the Mach-O build is faster.
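For anyone who has only ever used OpenTransport, “the BSD socket interface” just means the standard socket()/connect() calls that any BSD-derived system, including OS X’s Darwin layer, provides. A bare-bones sketch (mine, nothing Mozilla-specific):

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Open a TCP socket and connect to 127.0.0.1:80, the same calls a
       Mach-O build can make directly, with no OpenTransport glue layer. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    addr.sin_addr.s_addr = inet_addr("127.0.0.1");

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("connect");

    close(fd);
    return 0;
}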
—
http://islande.hirlimann.net
Can someone point me to, or explain, the main differences between ELF, XCOFF, a.out and the other binary formats?
—
http://islande.hirlimann.net
ELF = the binary format now used by all modern UNIXes (it comes in 32-bit and 64-bit flavours).
a.out = the older UNIX binary format, used by older UNIXes and other operating systems before they standardised on ELF.
COFF = the original System V binary format (the UnixWare/Xenix lineage), later replaced by ELF.
PE = used by Microsoft Windows (it’s a COFF descendant), and yes, Microsoft do have to be different to what everyone else is using.
for 64-bit computing?
—
http://islande.hirlimann.net
ELF would be good for 64-bit.
It’s not only a question of how many bits.
It’s more a question of how many sections are in your code, how they are placed in the header, and how they are translated by the loader when the file is loaded into memory.
A small history of Microsoft formats:
In the old DOS days there were COM files, which were practically raw binary code with no address translation, so they were limited to a 64K code segment. As code bloated, EXE files came to the rescue (the old MZ format; Windows later brought in the PE format). These have headers, sections, and entry points. DLLs use the same PE format as Windows EXEs – that’s why on a Windows box you have the “rundll32” executable, which lets you call a function exported by a DLL much as you would run a regular EXE.
Unix systems went through a similar process of different binary formats. ELF is the result of that evolution; don’t try to re-invent the wheel.
If you compile a simple executable, statically linked for every call, it’s not a big deal which format you’re using.
The problem is with shared libraries – you have to use the ELF format if you’re linking against ELF libraries.
ELF is the standard format in the System V ABI.
If you have a UNIX box around you should be able to check – the man pages and header files should be there.
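Since the header files are right there, here’s a rough sketch (assuming a Linux-style <elf.h>, 32-bit files only, minimal error handling) that prints the parts of the ELF header the loader cares about, namely the entry point and the section table mentioned above:

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    Elf32_Ehdr hdr;
    FILE *f;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <elf-file>\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (!f || fread(&hdr, sizeof(hdr), 1, f) != 1) {
        perror(argv[1]);
        return 1;
    }

    /* The first bytes are the magic number: 0x7f 'E' 'L' 'F'. */
    if (hdr.e_ident[EI_MAG0] != ELFMAG0 || hdr.e_ident[EI_MAG1] != ELFMAG1 ||
        hdr.e_ident[EI_MAG2] != ELFMAG2 || hdr.e_ident[EI_MAG3] != ELFMAG3) {
        fprintf(stderr, "%s: not an ELF file\n", argv[1]);
        return 1;
    }

    /* These fields are what the loader reads before it ever looks at your
       code: file type, target machine, where execution starts, and where
       the section table lives. */
    printf("type:          %u\n", (unsigned)hdr.e_type);
    printf("machine:       %u\n", (unsigned)hdr.e_machine);
    printf("entry point:   0x%lx\n", (unsigned long)hdr.e_entry);
    printf("section count: %u (table at offset 0x%lx)\n",
           (unsigned)hdr.e_shnum, (unsigned long)hdr.e_shoff);

    fclose(f);
    return 0;
}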