“Can Macintosh software run on an Intel machine at speeds that approach the same performance as on Apple hardware, or vice versa? Not yet, but that day may be closer if a Los Gatos start-up gets its product off the ground, and it could have implications for chip makers.” Read the rest at siliconvalley.com.
How does this emulation (which I assume is generic) compare to specialized platform support at the language level, such as Java?
In other words, why would a generic chip emulator outperform an optimized runtime that can be built specifically for a language?
Just a question.
But this isn’t exactly new. The concept has been around, well, since the early Alpha days; heck, I think there was something before that to help in the PDP-to-VAX transition.
What will be interesting is whether one can run Windows applications on Apple using wine + the morphing/emulation/code converting software. IIRC, there was something similar to that but it emulated the x86 processor rather than converting the code from x86 to PPC.
The move should be towards a situation where write once, run anywhere doesn’t necessarily set the world alight in terms of speed, but provides reasonable speed whilst maintaining its WORA. Java 1.4.2 shows it can be possible, and I am sure that within a year or so managed code and its runtime environments will become so efficient that the reason for using native code for creating general applications will be moot.
What Sun should do is team up with software vendors to get Java performance close to comparable with native code, and pay vendors to migrate, or at least provide a Java version of their applications.
For example, let’s say Java performance right now is around 90% of the speed of native code. Could you imagine the result if Photoshop, Dreamweaver and other major software titles were available, and the net effect that would have on the operating system market? The reasons for using one operating system over another would shrink to a VERY short list, and the reasons for a particular person not to migrate to another solution would be a thing of the past.
“I am sure that within a year or so managed code and its runtime environments will become so efficient that the reason for using native code for creating general applications will be moot.”
Except in areas where speed is critical. 3D games, to name the most prominent example in the home computing market.
Isn’t this what Transmeta’s code morphing does? Their chip architecture combined with code morphing allows it to run any instruction set (although they’re only using it for x86 right now). I heard HP had been doing similar work too. This is by no means innovative. They even cited FX!32, which was a translator + JIT, but didn’t mention how their product is really any different.
Wow, dynamic translation. When we all have 10GHz machines we won’t know we are emulating anything.
A nice concept. IIRC, some company already holds several patents on dynamic translation of code…
If I recall correctly, Transitive’s technology for PPC on Intel emulates a G3 running at about 100MHz.
I very much doubt it’ll ever be practical to emulate a PPC with any degree of usability, especially now they’re going 64-bit.
Several firms have promised PPC Mac emulation in recent years (Emulators Inc and Microcode Solutions); it’s become very clear that the claims of those two firms are nothing more than outright lies, as each has consistently failed to meet its own deadlines for supplying a product.
Emulators Inc is particularly bad, having claimed to have demonstrated PPC emulation at a Macworld, but research showed that, actually, they hadn’t.
DEC did something similar on Windows NT4.
They had a code converter which allowed running x86 Windows applications on Alpha.
DEC did something similar on Windows NT4.
It’s mentioned in the article; according to them it was called FX!32.
IIRC, some company already holds several patents on dynamic translation of code
It could be the company in the article, it says they have been working on it since 1995. However, I don’t see how any such patent could be valid, since the concept of code emulation has been around as long as computers have.
Not exactly original, and it has been done before, but making it work well is a whole different ball game. If they can do that, they may have a market.
I prefer Tao’s solution of writing for a common “assembly” which can be translated for a specific architecture at load time. Much simpler, but it does require that you write for it, so it’s no use with existing code unless you have access to the source.
Photoshop will never be ported to Java, since Java does not support unsigned integers and does not support structs, so it is impossible to represent a raw color value efficiently in Java.
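For illustration, this is roughly what you end up with: the usual workaround is to pack the four 8-bit channels into one signed int (as java.awt.image.BufferedImage does) and mask on every read. A minimal sketch, with class and method names invented purely for illustration:

public final class Argb {
    // Pack four 0-255 channel values into one signed int.
    static int pack(int a, int r, int g, int b) {
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    // Every read needs a shift plus a mask, because Java has no unsigned types.
    static int alpha(int argb) { return (argb >>> 24) & 0xFF; }
    static int red(int argb)   { return (argb >>> 16) & 0xFF; }
    static int green(int argb) { return (argb >>> 8) & 0xFF; }
    static int blue(int argb)  { return argb & 0xFF; }
}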
It would be easy to port Photoshop to C#, but then you would have to decide which GUI toolkit to use to preserve platform independence….
regards,
tuttle
From what I read, it is not the OS but the chip that becomes less important. This would allow a computer company to change to a different chip and keep shipping the same OS. Imagine purchasing a new Windows XP computer and finding that it is running on a PPC G5 processor.
It is not the consumer that has the choice, but the computer manufacturer.
Apple on my PC? Ha! We PC bottom dwellers won’t see that code unless we rip it from Steve Jobs’ cold bankrupt hands.
FreeBSD for the free, Windows for the games, OS X for the rich, and Linux as a plea to God to avoid purgatory (“But I used Linux, I’ve already been in Hell!!”).
“I am sure that within a year or so managed code and its runtime environments will become so efficient that the reason for using native code for creating general applications will be moot.”
Except in areas where speed is critical. 3D games, to name the most prominent example in the home computing market.
You mean, to name the only example in the home computing market. Granted, there are innumerable examples when you move away from the home desktop, but even those could possibly be integrated into frameworks like .NET, either by integrating time-critical segments into the code directly in an unmanaged fashion, or simply calling the existing unmanaged code from a managed environment. Graphics-intensive games are a very valid point, but I would presume that they are one of the few (if not the only) software categories which will not be satisfied by a highly-optimized managed runtime environment for the home user.
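For what it’s worth, the “call the existing unmanaged code from a managed environment” option already has a standard shape on the Java side too. A minimal JNI sketch (the library name and function are invented for illustration, not taken from any real product):

final class PhysicsBridge {
    static {
        // Loads libfastphysics.so / fastphysics.dll, built separately from C/C++.
        System.loadLibrary("fastphysics");
    }

    // Implemented in unmanaged code; the managed game loop calls it for the
    // time-critical step.
    static native void stepSimulation(float dtSeconds);
}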
There have been many attempts at dynamic binary translation, and static translation is also of some interest (in terms of the emulation techniques used). A list of DBTs and similar products/projects mentioned in my PhD thesis: OCT+HP3000, Flashport, XDOS, Bedichek, Accelerator, VEST & TIE, mx & mxr, MAE, Wabi, Atom, Shade, Executor & Syn68k, TIBBIT, SoftWindows & RealPC, Virtual PC, FreePort Express, SimOS, Embra, Morph, FX!32, DAISY, Bochs, Crusoe, Dynamo, VMware, MOJO, Plex86, UQDBT, Aries, Vulcan.

A more recent and impressive open-source DBT is QEMU, which enables the emulation of x86 or ARM code on a variety of processors including PPC. QEMU already allows running Wine on PPC Linux, and a MacOS port is very possible. People interested in DBTs may also want to check out the old WBT and FDDO conferences and the new CGO conference.

The sophistication of DBTs tends to be at the level of quick compilers; they aren’t as sophisticated as Java JIT compilers, which can perform optimisations equivalent to those of static compilers. Runtime information also allows Java adaptive compilers to outperform static compilers.
During development of The Sims Online, we tried to use WINE to speed the porting of code from Win32 to Linux. Our native port effort quickly overtook our WINE efforts, because WINE put severe restrictions on how you can write code: no static initialization was probably the worst. If it provided any benefit at all, it was in providing header files for Win32 that had already been tweaked to work with gcc, so that we could adapt the code for the compiler differences first and worry about the platform differences later.
Also, the article implied that Apple had a long and difficult transition to PowerPC, which couldn’t be further from the truth. I had a 6100/60 (the low end of the first round of PowerPC machines), and its performance running 680x0 code was generally on par with the fastest 68040 machines of the time. Of course, Mac applications spent a lot of time in OS calls, and the most common of those had native PowerPC implementations, so this wasn’t strictly due to the speed of the 680x0 emulation, but even large CPU-intensive operations performed reasonably well. I would have to say it was the smoothest processor transition I’ve ever heard of.
1. Sounds like a JIT, smells like a JIT.
2. Their biggest mistake, as far as I can see, is not using platform-native tools. FX!32 and em86 have done all of this already (rough sketch below):
– load the code into an emulator
– recompile it on the fly to the new code base (Alpha)
– redirect all library calls to native libraries
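A toy sketch of that translate-and-cache loop, just to make the steps concrete (the Block/Translation types and the three lambdas are invented for illustration; a real translator works on raw machine code and host registers, not objects):

import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

final class DbtSketch {
    record Block(long guestAddress, boolean callsKnownLibrary) {}
    interface Translation { void execute(); }

    private final Map<Long, Translation> cache = new HashMap<>();
    private final LongFunction<Block> fetchBlock;          // decode guest code
    private final LongFunction<Translation> recompile;     // emit host code
    private final LongFunction<Translation> nativeLibStub; // jump into a host library

    DbtSketch(LongFunction<Block> fetchBlock,
              LongFunction<Translation> recompile,
              LongFunction<Translation> nativeLibStub) {
        this.fetchBlock = fetchBlock;
        this.recompile = recompile;
        this.nativeLibStub = nativeLibStub;
    }

    Translation lookup(long guestPc) {
        return cache.computeIfAbsent(guestPc, pc -> {
            Block block = fetchBlock.apply(pc);
            // Library calls are redirected to the host's native libraries;
            // everything else is recompiled on the fly and cached.
            return block.callsKnownLibrary()
                    ? nativeLibStub.apply(pc)
                    : recompile.apply(pc);
        });
    }
}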
Simple and easy. The only thing new here is that they may be using HP’s or IBM’s recompiler. It takes the binary and recompiles it after linking with the dynamic libraries; they could translate the binary into asm code for the target chip and then let the assembler recompile the code. Doing this on the same platform (IA32 to IA32) has shown up to a 10 percent decrease in CPU usage (it gets faster); the fact that in their case the high-level optimizations would be damaged could explain the 70% rating.
Final Note:
At least two companies have tried to “patent this”; I think there was an article in EE Times about it. I think all of the patents could be overturned by prior art. Sun, NeXT, IBM, DEC and Amiga have all done this, shrug.
Donaldson
Yes, WINE compiles on PPC Linux… but since the WINE project doesn’t include emulation of an Intel or compatible processor, you’d have to include the code to emulate a physical PC as well as emulate the Windows calls.
I understand a project to do this is underway, but it’ll probably be a long time before a symbiosis of a platform emulator and an OS emulator is stable enough to run applications on a “foreign” processor.
Transitive has been working with TransGaming and Alchemy (AMD’s embedded division, to port ARM to MIPS) for a while.
HP also has Dynamo. More here: http://www.codeonthefly.com/corporate.html
Runtime information also allows Java adaptive compilers to outperform static compilers.
Yes, and these tests tend to contain code *horribly* skewed in Java’s favor, usually calling a trivial virtual method from within a loop. Java’s inlining of virtual methods at runtime will lead to an obvious performance advantage, as C++ will keep doing lookups in the vtable.
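For reference, the shape of benchmark I mean looks roughly like this (a made-up example, not any specific published test); after warm-up, a JVM that has only seen one implementation can devirtualise and inline getValue(), while an ahead-of-time C++ build keeps the indirect call unless its optimiser can prove the dynamic type:

interface Source { int getValue(); }

final class Constant implements Source {
    public int getValue() { return 42; }
}

final class Bench {
    static long sum(Source s, int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            total += s.getValue();   // virtual call, inlined after JIT warm-up
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new Constant(), 100_000_000));
    }
}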
But this is stupid anyway; any first-year C++ student knows virtual methods are slow. Either declare the method final or use templates to eliminate inheritance. It’s that simple.
Compiled code can also be optimized at run time using Profile-Guided Optimization. Those optimizations are permanent, whereas Java must re-optimize all code from scratch each time a given program is run.
Let’s get back to reality here… Java does not outperform native code unless the programmer writing the native code is a moron.
That is my question. It could be anyone: Sun, SGI, HP, IBM, even Sony for their PlayStation. It’s not Apple, though if it gave a boost in running Windows on PowerPCs it might be attractive.
I thought Transmeta was doing this on board their chips. Anyone know for sure?
Hey, get IBM’s CHRP spec and build a board.
If I can assume that most people are using high-level languages (I don’t want to concern myself with the pros and cons of writing low-level native code vs. compilers), there are some points I wanted to add following the feedback:
– Wine on Linux & PPC – see QEMU – http://fabrice.bellard.free.fr/qemu/ – I highly recommend this project to people and feel it is a shame you see less mention of it on news portals
– JIT vs. caching the translation – caching the translation can be viewed as an optimisation of a JIT system. It can be beneficial. It can also cause high initial latencies when the cached translation is loaded off disk and the data structures necessary to know about the translation are rebuilt. JVMs such as the Jikes RVM initially translate for the purpose of building a disk image and then use that disk image. Although technically different from what you’re talking about, the potential to do this in any JVM is there. The reason it hasn’t been done is that it must fail a cost-benefit analysis or still be a research issue.
– Runtime vs. static optimisation – I believe that it is possible to do all runtime optimisation statically. However, I don’t think it’s fair to say that all programmers do all the optimisation they can on their code (and on code called across library boundaries). One reason for this is to keep the code easy for other programmers to understand. For example, if you needed to initialise a 2-dimensional array you might write code of the form:
for (int x = 0; x < 100; x++) {
    for (int y = 0; y < 100; y++) {
        a[x][y] = -1;
    }
}
It probably wouldn’t even cross your mind to try to optimise this code further, but an adaptive compiler which sees this code being executed a lot may choose to optimise it. One such optimisation is to parallelise the loop (an optimisation that will benefit CMP chips); a hand-written equivalent is sketched below. The dynamic compiler can also see the load on the CPUs at that moment and use that to calculate the degree of parallelism desirable. You can write equivalent code statically, but it will be quite horrendous to understand.

Other runtime optimisations, such as value-specific optimisations (VSO – the superset of compiler optimisations such as specialisation, used to remove virtual method call overhead), become practical in a dynamic environment. Think of tables that are initialised once (e.g. Huffman compression, virtual method tables) and then remain invariant. It is conceivable that your static code could be specialised for all table values, but I think most programmers wouldn’t expend that effort. VSO is even performed in hardware, initially as branch prediction logic and in the future as value prediction in situations such as cache misses; simplifying hardware by putting the burden onto compilers is an idea that RISC processors pioneered.

Programmers also don’t expend all the optimisation effort they can because computer architectures now differ from how they will be in the future. DBT is sometimes seen as a system tool for providing legacy compatibility (a glorified emulator); personally I feel there’s a future for runtime optimisation in its own right (as researched by companies such as Sun, IBM and Microsoft), and so I disagree with Bascule. I hope they don’t find me a moron and can see why DBTs are more like quick compilers than JVMs are (i.e. JVMs have a lot more information about a program available to them, such as information on what’s an array, method boundaries, etc.).
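Written out by hand (purely for illustration; the point is that an adaptive compiler could produce the equivalent without the programmer changing the source), the parallelised initialisation might look like:

import java.util.Arrays;
import java.util.stream.IntStream;

final class ParallelInit {
    static void init(int[][] a) {
        // Each row is filled on whatever worker thread the common pool picks;
        // the rows are independent, so no synchronisation is needed.
        IntStream.range(0, a.length)
                 .parallel()
                 .forEach(x -> Arrays.fill(a[x], -1));
    }
}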
If people would like references to work in these areas then contact me directly and I can tell you where to look. Thanks to Zasteva’s comments, which (for me) were interesting. It’s a shame to hear people’s eristical arguments against DBTs; I hope my comments have shed some light on this area.
For example, if you needed to initialise a 2-dimensional array you might write code of the form:
for (int x = 0; x < 100; x++) {
    for (int y = 0; y < 100; y++) {
        a[x][y] = -1;
    }
}
It probably wouldn’t even cross your mind to try to optimise this code further, but an adaptive compiler which sees this code being executed a lot may choose to optimise it.
This code will be easily optimized by virtually any compiler in existence. The most obvious and straightforward optimization is to unroll the loop. This can’t be done effectively by the programmer (using something like Duff’s Device) because it requires knowing how large the resulting code will be and whether or not it will fit in the cache. However, virtually every compiler in existence can calculate these details for you and will automatically unroll the loop…
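(To be concrete about what unrolling means here, a 4-way hand-unrolled version of the inner loop would look like the sketch below; a compiler that knows the trip count and the target’s cache size can pick the unroll factor itself. The class name is just for illustration, and 100 happens to divide evenly by 4 so no remainder loop is needed.)

final class UnrolledInit {
    static void init(int[][] a) {
        for (int x = 0; x < 100; x++) {
            // Inner loop unrolled by a factor of 4.
            for (int y = 0; y < 100; y += 4) {
                a[x][y]     = -1;
                a[x][y + 1] = -1;
                a[x][y + 2] = -1;
                a[x][y + 3] = -1;
            }
        }
    }
}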
Except that it supposedly runs Real Fast? 70% is impressive and I want it, but 100% is likely to be physically impossible (allowing for differences in processor capabilities), and I suspect 70% might not happen in every case either.
A GUI program on one chip is not going to run on another without a complete runtime system + libraries, either emulated or ported. If one has to buy an OS anyway….
During the Mac-to-PowerPC transition there was a British (I think) company that offered a translation from 680x0 machine code to PowerPC, generating a native PPC executable rather than one that had to be emulated at run time. That approach makes more sense from the standpoint of software makers: resurrect the NeXT multi-platform binary scheme (multiple binaries bundled together, with the proper one chosen at runtime) and leverage cheap hard drives instead of CPUs. It’s almost certain to have fewer problems and allow for more optimization than something that has to do its translation at runtime.
This code will be easily optimized by virtually any compiler in existence. The most obvious and straightforward optimization is to unroll the loop. This can’t be done effectively by the programmer (using something like Duff’s Device) because it requires knowing how large the resulting code will be and whether or not it will fit in the cache. However, virtually every compiler in existence can calculate these details for you and will automatically unroll the loop…
I just wonder, Bascule, if you think that an unrolled loop runs faster than a loop executed in parallel? Not all optimisations are equal, and runtime optimisations are a class of optimisations that static compilers struggle to perform without introducing overheads such as code explosion and guard checks.