Yesterday, on pearpc.net, it was announced that PearPC has experimental AltiVec support. Among other things, AltiVec can in theory improve OS and application speed because it processes several integer or floating point values with a single vector instruction. The builds can be downloaded here.
I would think it would be more efficient to use SSE2 underneath to process the Altivec code than it would for the integer registers in the CPU.
Altivec is a 128 bit vector processing unit, next to the 64bit integer and floating point units of a G4 and G5 processor. Vectors enable very fast transformations, making it possible to accelerate certain math algorithms up to 10 times compared to writing code for a floating point unit to get the same results.
That is why certain algorithms can run so much faster on a G4/G5 compared to Intel/AMD chips, and that is also why you need a specialized compiler and knowledge of math to generate code for that Altivec part of the processor, it is not normal floating point stuff.
call me stupid, but how can it support altivec on CPUs that don’t HAVE altivec? Isn’t this a PPC emulator for x86? So if it’s got altivec support, it’s just emulated, so it’s not that special anyway, because it’s not in hardware, where the real “WOW” of altivec even matters.
Even if the apps that use altivec can use PearPC’s altivec support, will it even matter? Apps that use AltiVec are usually cpu intensive already and will crush PearPC. But i guess this is one step in a direction that seems to be fast moving. They seem to have a lot of support for the app.
Welcome to the wonderful world of emulation, dude.
i guess it’s along the same lines as Mode7 emulation for SNES emulators. It’s just odd because altivec always gets a WOW from people because of its extreme efficiency with vectors using hardware and not software. And this is just software.. but i guess it totally helps with support for other apps. Did it actually SPEED up anything?
anyone using it yet?
Anyone with real benchmarks of PearPC compared with a Mac G3/G4?
I don’t know about the Altivec support but just in general the latest CVS builds have been up to 3 times as fast as the 0.2 release for me.
Altivec is a 128-bit SIMD unit. The G4 had 32bit registers for integer, and the G5 has 64bit registers. I believe both had 64bit floating point.
Vector units can make certain operations very fast. Consider pixel math. 8bpp, into a 128bit register, means you can do 16 additions at a time. Floating point speed up will generally be 4 or 2 (32bit single precision or 64bit double precision). Unless vector ops take a different number of clock cycles to complete than floating point unit ops, you will get (# of chunks in vector unit)x speed up over just integer or floating point.
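For anyone who wants the lane arithmetic spelled out, here is a rough sketch in Python. Plain lists stand in for a 128-bit register, and the function names are made up for illustration; it models the idea only, not how any real SIMD unit is programmed.

```python
REGISTER_BITS = 128  # width of one AltiVec or SSE register


def lanes(element_bits, register_bits=REGISTER_BITS):
    """How many elements of the given width fit in one vector register."""
    return register_bits // element_bits


def vector_add_u8(a, b):
    """Model of one 16-lane unsigned byte add with wraparound (modulo 256)."""
    assert len(a) == len(b) == lanes(8)
    return [(x + y) & 0xFF for x, y in zip(a, b)]


# 16 hypothetical 8bpp pixel values, all brightened in one modelled operation.
pixels = list(range(0, 160, 10))
brighten = [100] * 16

print(lanes(8), lanes(32), lanes(64))   # lane counts for 8-, 32- and 64-bit elements
print(vector_add_u8(pixels, brighten))
```

The lane count is the best-case speed-up from the post: it only holds if the vector op costs about the same number of cycles as the scalar one.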
You’re right in that you need a special compiler to take advantage of it (unless you hand code the assembly, like many people do with Intel MMX; look at most game emulators or movie players), but Intel has had SIMD for years. Granted, altivec got it right the first time, where Intel needed MMX (64bit integer only SIMD that overlaid the floating point registers, making mixed code a pain), SSE (128bit SIMD in a separate register set, but lacked double precision fp operations) and SSE2 (finally a full SIMD implementation).
The question I have is why would anyone want an emulator such as PearPC that makes the client OS as well as applications run at a poor speed? This is very counter productive to use such a product.
Quote from PearPC:
“While the CPU emulation may be slow (1/500th or 1/40th, see above), the speed of emulated hardware is hardly impacted by the emulation; the emulated hard-drive and CDROM e.g. are very fast, especially with OS that support bus-mastering (Linux, Darwin, Mac OS X do)”
One of the reasons I switched to Linux is because of the highend software I use and performance gained compared to that of using Windows. If I want to use Maya and Shake on OSX then I’ll purchase a G5 system. Until that time many studios are satisfied running software such as Maya and Shake on Linux. After all studios have used Linux and open source technology for years. Such software doesn’t require emulation or require purchasing a G5 because the software has been ported to Linux. The software also runs on current hardware in studios which saves even more money instead of purchasing turnkey systems from Apple.
In a business where time impacts the dollar (ie: animation, rendering, compositing, editing) using an emulator which slows the work being done really saves you nothing. Until the PearPC developers can produce an emulator that doesn’t hinder performance I would advise not to waste your time on this emulator software.
I think this might be a good way of ‘test driving’ OSX before plunking down $1,000+ on a brand new Mac.
There exists no emulator that doesn’t hinder performance (rare exceptions aside, that I know none of). It’s common sense, since a CPU with special features will require more cycles in a CPU that is emulating those features.
Think about the SNES emulators. They required a Pentium 100 for games to be playable with sound (and I mean highly optimized ones, like zsnes on ms-dos).
So why make an emulator? Obviously, because you can’t get the original hardware but you need it. Like in the case you can’t or don’t want to buy a G5, but you want to make sure your application will work on that architecture (even if it feels slow on the emu).
I seriously doubt anyone would use PearPC to use Photoshop on Mac OS X. They would be better off buying the PC version.
We “think this might be a good way of ‘test driving’ [MorphOS] before plunking down <$1,000+ on a brand new [Pegasos].” 😉
R&B
http://www.pegasosppc.com
If anyone feels the need to test drive OSX or any other Apple software they can simply do that in the Apple store with the sales rep to answer any questions. If it’s specific software such as Alias Maya or Apple Shake there are resellers that sell the G5 with Maya and Shake installed and can demo it for you. I tried Maya on a G5 system which seemed okay but after seeing performance tests results I wasn’t really impressed (ie: Maya benchmarks at http://zoorender.com/ ). This may be resolved when Apple releases OSX Tiger which will be better able to use the G5 64-bit architecture.
If anyone feels the need to test drive OSX or any other Apple software they can simply do that in the Apple store with the sales rep to answer any questions.
…. which is pretty hard in the countries which don’t have Apple Stores. 🙂
Anyway it’s fun to see PearPC develop. In a few years it could be the next Basilisk II.
You people are forgetting the geek satisfaction factor involved in running an OS which was officially stated that would NEVER run on x86 hardware. Also it is pretty fun to play around with other os’s, not to use them for “productive” work.
The main reason to use an emulator is to test compatibility. While PearPC gets press for being able to run MacOS X on a PC, it’s more useful, like most emulators, for testing software on a platform you don’t have physical access to. Want to test NetBSD on a PPC? Try it out on your PC using PearPC. The fact that it can run OS X is only noteworthy because there’s no other software (that I know of, at least) capable of doing that at all. Not that PearPC can do it well.
I would certainly like a PowerPC-emulating “sandbox” for my mac. I already run Virtual PC, which is great (even under microsoft supervision) but won’t allow me to sandbox Linux/PPC, MorphOS or OS X itself.
Does anyone know of plans or projects in this direction??
-ak
If you are just talking about Linux, a port of CoLinux would probably be better. But yes, such has been mentioned in the forums.
By the way, PearPC.net tends to be a little behind at getting PearPC information. Try the Emaculation.com forums instead. The Altivec info was posted on the 19th there.
For me, it seems like a good way of finally being able to offer OSX builds of my programs.
Is it just me or is their site down?
No one is marketing PearPC as a business tool. Why are you challenging this as some alternative to native execution on Apple hardware? That is not a challenge the developers are attempting to meet.
There is a port of Mac on Linux to OS X that is going to make that possible, and very fast. It is currently only in alpha release though.
Of course, Mac-on-Linux. I forgot, that is the same idea as CoLinux, and already running on Unix-like PPC. Silly me
Just some examples of what’s the point:
– I’m a webdesigner. How does my site run on Safari?
– I’m a programmer. I wanna compile my application for OSX (talking about small projects of course. Big projects can afford buying real hardware )
I wanna buy a Powerbook but is OSX suitable for me? I work mainly in Windows (Office + Project + Visio). Now I can test drive OSX before getting my wallet lightened in about 2900? which is the price for a 15″ Powerbook with superdrive here in Portugal.
Do Altivec builds improve speed? On my AMD XP 1600+ I went from 540MHz G3 to 620MHz G4 in “About this MAC” from a build without Altivec to an Altivec ‘powered’ one. I upgraded from 10.3.3 to 10.3.4 and it was fast. System is more responsive 8)
2900? should read 2900 Euros.
“I wanna buy a Powerbook but is OSX suitable for me? I work mainly in Windows (Office + Project + Visio). Now I can test drive OSX before getting my wallet lightened in about 2900? which is the price for a 15″ Powerbook with superdrive here in Portugal.”
That makes me cringe. I see your point, but it really makes me cringe to hear about people trying things and only getting such a small portion of the real experience.
There is one such project- Mac-On-Linux. Right now, you can use it to boot up various Mac OSes, and Linux inside of Linux, but their Darwin port is making progress. (You may still have to use X11, but that’s a minor detail.) G5 compatibility is unknown.
“Granted, altivec got it right the first time, where Intel needed MMX (64bit integer only SIMD that overlaid the floating point registers, making mixed code a pain), SSE (128bit SIMD in a separate register set, but lacked double precision fp operations) and SSE2 (finally a full SIMD implementation).”
By your definition, Altivec isn’t a “full SIMD implementation” , since it doesn’t support vector DP operations.
A7V wrote: “I wanna buy a Powerbook but is OSX suitable for me? I work mainly in Windows (Office + Project + Visio). Now I can test drive OSX before getting my wallet lightened in about 2900? which is the price for a 15″ Powerbook with superdrive here in Portugal.”
PantherPPC wrote: “That makes me cringe. I see your point, but it really makes me cringe to hear about people trying things and only getting such a small portion of the real experience.”
I can only agree with you on the real experience case, but my problem is being able to do at least the same things I do @work and that I can test/try. And I can always take my boss’s 12″ Powerbook for a ride
“Such software doesn’t require emulation or require purchasing a G5 because the software has been ported to Linux. The software also runs on current hardware in studios which saves even more money instead of purchasing turnkey systems from Apple.”
Yeah, they run Linux on more expensive equipment (you mean SGI, right?) than Apple equipment. I’m not an expert, but it seems Apple has priced their systems really well compared to Boxx and other high end machines.
A couple of corrections to everyone.
AltiVec is hundreds of times better than MMX/SSE/SSE2. What mapping that can be done, will eventually be done. Until then, the speed will suffer, but this is an initial “it works” release, not an optimized release build.
Although, because AltiVec instructions generate no exceptions and set no condition flags (except saturate), the cost of emulation is significantly lower.
AltiVec is also used extensively by the OSX operating system for drawing, and copying memory. Even scalar emulation of the AltiVec core was showing some limited performance gains.
PearPC is a PowerPC architecture emulator, while MacOnLinux is a Macintosh virtualizer. While MoL will only run on a PowerPC processor, PearPC will eventually run on any processor, and host any OS. PearPC will never use “hacks” that speed up MacOSX at the cost of dropping support for even the most hobbyist of OSes.
PearPC can be a debugging tool. If you compile it with the generic core, you will eventually be able to run it on any architecture, and single step through instructions.
Oh, and the current cvs took out everything that was making it go faster than without altivec, as backbone is being written in.
“AltiVec is hundreds of times better than MMX/SSE/SSE2”
Without even vector DP support, how on Earth can it be “hundreds of times better”? It’s useless for the code I write, for instance.
The point of a PowerPC emulator is obviously to play Marathon on a Basilisk-emulated 68k Mac, running on Windows 2k in a virtual PC, running in OSX on Mac-on-Linux, running on PearPC, running on Windows XP under VMware, running in FreeBSD’s Linux-emulation mode!
MMX was the first useful SIMD in a personal computer. The forced nature of MMX was to keep the transistor count down and the clock speed high. Something Apple has forgotten.. hey 3 GHz G5 .. where are u now??
Don’t believe the hype.. Intel has had the fastest CPUs for application benchmarks and THAT’S WHAT COUNTS. Nobody cares if the G5 can do 1 hardware instruction the Intel can’t when that instruction is used so rarely that in any real benchmark the Intel still wins.
This has been happening all along.. wanky Apple marketing is not convincing.
Follow the benchmarks and the PC has almost always been ahead.. even with “altivec” vs “SSE” code
Intel made the RIGHT decisions with SIMD unlike Apple.. which is why u can’t clock the G5 any higher.
Some of u ppl should go to CPU school and learn how to design a CPU .. there’s more to it than whacking in as many instructions in hardware as u can. After all that’s CISC and the G5 isn’t CISC, is it? oh wait it is now hahah
“Without even vector DP support, how on Earth can it be ‘hundreds of times better’? It’s useless for the code I write, for instance.”
Ok, take the set of numbers { 1, 2, 3, 4 }, and now the set of numbers { 2 … 401 }
Despite the second set not including every number of the first set, it _STILL_ has a cardinality that is 100 times larger.
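The set arithmetic here is easy to check. A quick sketch in Python (the variable names are just for illustration):

```python
first = {1, 2, 3, 4}
second = set(range(2, 402))    # 2 ... 401 inclusive

missing = first - second       # members of the first set absent from the second
ratio = len(second) // len(first)

print(missing, len(first), len(second), ratio)
```

So the second set lacks one member of the first and is still 100 times its size, which is the whole of the analogy; it says nothing by itself about whether the missing member is the one you need.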
Despite AltiVec not having Double Precision support, the number of supported opcodes, and the capabilities of it, still far outweigh the pain-in-the-butt, which are the MMX/SSE/SSE2 hacks.
Just for completeness… a Double-Precision vector in a 128-bit vector would contain a “measly” two values. But, honestly, since the PowerPC FPU does everything in 64-bit, and is well designed… I imagine there wouldn’t be all that much of a speed boost in doing it with AltiVec, as especially the overhead of transparent auto-aligning, and un-aligned access fix-ups, would make it effectively almost useless.
But, I’m sorry that AltiVec doesn’t support your precious Double-Precision… but that doesn’t change the fact that AltiVec is hundreds of times more thought out, and completed (within the scope it was done), than is the mess Intel and AMD have put out.
The G5 processes two DP FP numbers simultaneously via its dual FP unit architecture, so IBM saw no need to add that to the AltiVec implementation. Being able to process two independent FP numbers simultaneously is more flexible than doing the same operation on two dependent FP numbers in any case. You can simulate DP AltiVec by just doing the same FP operation on two consecutive DP FP numbers in memory. You wouldn’t see any difference from an operational or speed point of view.
This does make a difference for G4 people. So the best you can say is the G4 is not quite a full SIMD implementation. The G5 is.
“Despite AltiVec not having Double Precision support, the number of supported opcodes, and the capabilities of it, still far outweigh the pain-in-the-butt, which are the MMX/SSE/SSE2 hacks.”
*hacks*? Please.
[QUOTE]
Just for completeness… a Double-Precision vector in a 128-bit vector would contain a “measly” two values. But, honestly, since the PowerPC FPU does everything in 64-bit, and is well designed… I imagine there wouldn’t be all that
[/QUOTE]
The x86 FPU is 80 bit – just for “completeness”. A vector op of length 2 is still better than 2 operations.
[QUOTE]
much of a speed boost in doing it with AltiVec, as especially the overhead of transparent auto-aligning, and un-aligned access fix-ups, would make it effectively almost useless.[/QUOTE]
Actually, no. That’s the “Apple didn’t provide something so it’s better just not to have it” argument. DP vector operations aren’t as useful for applications like *Photoshop*, therefore they didn’t provide them. Just like MMX was integer only – it was intended for “multimedia”.
[QUOTE]
But, I’m sorry that AltiVec doesn’t support your precious Double-Precision… but that doesn’t change the fact that
[/QUOTE]
*My* “precious double precision”?!? For anyone using a computer for technical purposes, it’s pretty important.
[QUOTE]
AltiVec is hundreds of times more thought out, and completed (within the scope it was done), than is the mess Intel and AMD have put out.[/QUOTE]
What it most certainly does point out is that this is just your opinion based on your very limited criteria.
“The G5 processes two DP FP numbers simultaneously via its dual FP unit architecture, so IBM saw no need to add that to the AltiVec implementation.”
I think you’ll find that you will rarely – if ever – operate on two DP numbers at once using two different FPUs.
” You can simulate DP AltiVec by just doing the same FP operation on two consecutive DP FP numbers in memory. ”
I’m afraid that doesn’t work. While you *will* be performing two DP operations, they won’t be at the same time.
“You wouldn’t see any difference from an operational or speed point of view.”
I’m afraid you would. Ever wonder why the G5 doesn’t perform anywhere near as well as its theoretical values suggest?
“So the best you can say is the G4 is not quite a full SIMD implementation. The G5 is.”
Having 2 FPU units is NOT SIMD.
Just wanted to thank Daniel Foesch and all the others who made PearPC a reality.
Oh and for you who missed it, Daniel Foesch is the main reason for this news on AltiVec;)
“I’m afraid that doesn’t work. While you *will* be performing two DP operations, they won’t be at the same time.”
Yes, you will, if they both operate on entirely different registers. For example:
fadd r0, r2, r4
fadd r1, r3, r5
This will execute IN PARALLEL on a G5. If you don’t understand how this works, then learn about superscalar designs.
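The condition being relied on is that the two instructions share no registers in a way that forces an order. A minimal sketch of that check in Python, using made-up tuples of (destination, source, source) to stand for the two fadds; whether the hardware really issues both in the same cycle depends on the chip and is not modelled here.

```python
def independent(first, second):
    """True if neither instruction reads or overwrites the other's destination."""
    d1, *s1 = first
    d2, *s2 = second
    return d1 != d2 and d1 not in s2 and d2 not in s1


# The pair from the post: no shared registers, so nothing forces an order.
pair_ok = independent(("r0", "r2", "r4"), ("r1", "r3", "r5"))

# A hypothetical dependent pair: the second add consumes the first one's result.
pair_blocked = independent(("r0", "r2", "r4"), ("r1", "r0", "r5"))

print(pair_ok, pair_blocked)
```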
“Having 2 FPU units is NOT SIMD.”
No, it’s not. You have two instructions on two sets of data. But, due to superscalar designs, they will run parallel on a G5.
“Yes, you will, if they both operate on entirely different registers. For example:
fadd r0, r2, r4
fadd r1, r3, r5”
Provided they do. If you write software using a higher-level language, you can’t guarantee it. You do NOT get real world performance on the G5 anywhere close to the *theoretical* performance of the chip.
“This will execute IN PARALLEL on a G5. If you don’t understand how this works, then learn about superscalar designs.”
Not relevant, since you’d have to hand code it in assembler.
Are the load/stores in parallel too? Quite a bit more work than just using the dp operation intrinsics in Fortran on a SSE2 capable machine. Oh, yes – *SSE2* is the hack here.
You think double precision math isn’t important (witness the “your precious” bit). Your opinion really doesn’t matter to me.
“Despite AltiVec not having Double Precision support, the number of supported opcodes, and the capabilities of it, still far outweigh the pain-in-the-butt, which are the MMX/SSE/SSE2 hacks.”
“*hacks*? Please.”
Yes, HACKS. Seriously, look at the whole x86 SIMD feature set. By offering a MOVUPS (unaligned vector load) they eliminate the need to permute data to give you an unaligned vector. So all their *SHUF* commands only take integers. Sometimes, you want them to operate on only pieces. Also, there’s no way to “shuffle” a SSE register by bytes. The best you’ve got is by words. (and that’s only half of the SSE register at a time)
I’m sitting here with the Documentation References for both AltiVec and MMX/SSE/SSE2, and I’m sitting here shaking my head. Honestly, the MMX/SSE/SSE2 feature set just doesn’t stack up against AltiVec.
“The x86 FPU is 80 bit – just for ‘completeness’. A vector op of length 2 is still better than 2 operations. ”
For “completeness” it’s either 64-bit or 80-bit, there’s an internal flag that controls which mode it’s operating in. But that doesn’t change the fact that, except for being longer, two operations _CAN_ execute simultaneously, unless the result of one feeds into the operands of the other.
“Actually, no. That’s the ‘Apple didn’t provide something so it’s better just not to have it’ argument. DP vector operations aren’t as useful for applications like *Photoshop*, therefore they didn’t provide them. Just like MMX was integer only – it was intended for ‘multimedia’.”
Actually, no. It’s that the overhead of how AltiVec requires you to deal with vector data means that you would lose more than you gain. EXAMPLE:
Unaligned hypothetical dp altivec. Load two vectors from memory, and add them:
addi rX, r0, 16
lvx vr0, r0, rB
lvx vr1, rX, rB
lvsl vrP, r0, rB
vperm vr0, vr0, vr1, vrP
lvx vr1, r0, rC
lvx vr2, rX, rC
lvsl vrP, r0, rC
vperm vr1, vr1, vr2, vrP
vadddp vr2, vr0, vr1
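To make the lvx / lvsl / vperm sequence above easier to follow, here is a rough byte-level model in Python. The function names mirror the instructions but are only a sketch: memory is a plain list of byte values, and nothing here touches real AltiVec hardware.

```python
def lvx(mem, addr):
    """Aligned 16-byte load: the low four address bits are ignored."""
    base = addr & ~0xF
    return mem[base:base + 16]


def lvsl(addr):
    """Permute control vector for a left shift by the address's low four bits."""
    sh = addr & 0xF
    return [sh + i for i in range(16)]


def vperm(a, b, control):
    """Pick each result byte from the 32-byte concatenation of a and b."""
    both = a + b
    return [both[c & 0x1F] for c in control]


def load_unaligned(mem, addr):
    """Two aligned loads plus one permute, as in the listing above."""
    hi = lvx(mem, addr)
    lo = lvx(mem, addr + 16)
    return vperm(hi, lo, lvsl(addr))


mem = list(range(64))             # hypothetical memory: byte i holds the value i
print(load_unaligned(mem, 5))     # the 16 bytes starting at address 5
```

Each unaligned vector costs two loads and a permute, which is where the extra instructions in the listing come from.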
Aligned hypothetical dp altivec. Load two vectors from memory, and add them:
lvx vr0, r0, rB
lvx vr1, r0, rC
vadddp vr2, vr0, vr1
Aligned & unaligned non-hypothetical dp FPU. Load two “vectors” from memory, and add them:
addi rX, r0, 8
lfdx fr0, r0, rB
lfdx fr1, rX, rB
lfdx fr2, r0, rC
lfdx fr3, rX, rC
fadd fr4, fr0, fr2
fadd fr5, fr1, fr3
The unaligned dp altivec would hit 16+16+16+16 = 64 bytes of memory, and takes 10 instructions. And takes 6 cycles assuming two loads can happen together.
The aligned dp altivec would hit 16+16 = 32 bytes of memory, and takes 3 instructions. And takes 2 cycles assuming two loads can happen together.
The unaligned & aligned dp FPU hits 8+8+8+8 = 32 bytes of memory, and takes 7 instructions. And takes 4 cycles assuming two loads can happen together.
So, as you can see for yourself, if you’re doing an unaligned access, you’re better off using the FPU core in every way shape and form.
Now, if you can ensure that everything you do is aligned (which is possible, and generally recommended for all vector data anyways), then there would be a total of 2 cycle gain, and 4 instructions.
I’m not denying that there would be a modest gain. But the gain entirely disappears for unaligned accesses. Whereas for all the other data types, there’s a gain on both.
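The bookkeeping behind those numbers can be laid out as a small table. This Python sketch only re-adds the figures stated in the post; the cycle counts are the poster's estimates (assuming two loads can issue together), not measurements, and the dictionary keys are names chosen for illustration.

```python
# Figures as stated in the post; the cycle counts are the poster's estimates.
variants = {
    "unaligned vector": {"loads": 4, "bytes_per_load": 16, "instructions": 10, "cycles": 6},
    "aligned vector":   {"loads": 2, "bytes_per_load": 16, "instructions": 3,  "cycles": 2},
    "scalar FPU":       {"loads": 4, "bytes_per_load": 8,  "instructions": 7,  "cycles": 4},
}


def bytes_touched(name):
    """Total memory read by one variant."""
    v = variants[name]
    return v["loads"] * v["bytes_per_load"]


def gain(baseline, candidate, key):
    """Positive when the candidate needs fewer of `key` than the baseline."""
    return variants[baseline][key] - variants[candidate][key]


for name in variants:
    print(name, bytes_touched(name), variants[name]["instructions"], variants[name]["cycles"])

print(gain("scalar FPU", "aligned vector", "cycles"),
      gain("scalar FPU", "aligned vector", "instructions"))
print(gain("scalar FPU", "unaligned vector", "cycles"))
```

A negative gain means the vector version is the slower one, which is the unaligned case being argued about.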
Rather than make a trade off, PowerPC decided to just not implement it. Because mixing FPU and AltiVec code with interdependent data is a pain (it has to write to memory each time).
“What it most certainly does point out is that is just your opinion based on your very limited criteria.”
My “limited” criteria has factual bases. There’s a significant drop in the ability of the AltiVec core handling un-aligned double precision vector point values.
Does it suck that the AltiVec core doesn’t have Double Precision floating point? Yeah, it sucks… It’d be pretty nice to see it be there. But the added complications of adding it in would produce dropping gains in enough situations to render it practically useless.
“Provided they do. If you write software using a higher-level language, you can’t guarantee it. You do NOT get real world performance on the G5 anywhere close to the *theoretical* performance of the chip. ”
double a = (double)c + (double)e;
double b = (double)d + (double)f;
These will get written out as two fadd’s following each other. And any compiler that is producing the code for a G5 that doesn’t, is improperly optimizing your code.
“Not relevant, since you’d have to hand code it in assembler.”
Not. A good optimizing compiler should know about this stuff, and generate code appropriately.
“Are the load/stores in parallel too? Quite a bit more work than just using the dp operation intrinsics in Fortran on a SSE2 capable machine. Oh, yes – *SSE2* is the hack here.”
What if your values are not aligned? You end up having to use the MOVUPS instruction, which is INCREDIBLY slow compared to MOVAPS.
So much so, that if you’re using exclusively MOVUPS, you’re actually slowing down your process. (I have run actual code to verify that this phenomenon exists)
And an optimizing Fortran compiler for the G5 should use its knowledge that the G5 can execute two double precision floating point instructions at the same time, and compile accordingly.
“You think double precision math isn’t important (witness the “your precious” bit). Your opinion really doesn’t matter to me.”
I didn’t say it wasn’t IMPORTANT. I was meaning your precious double floating point AltiVec.
Re: “Yeah, they run Linux on more expensive equipment (you mean SGI, right?) than Apple equipment. I’m not an expert, but it seems Apple has priced their systems really well compared to Boxx and other high end machines.”
Thanks for making me laugh. SGI has in the past looked a lot like Apple to me. They are changing their ways slowly by offering Linux systems but still seem too limiting when it comes to upgrading hardware.
Anyway, a lot of post-production studios switched from SGI to not just Boxx but also HP, IBM, etc. Some small studios will even custom build their systems. Such distributors have been known to negotiate leasing or financing costs for studios, unlike Apple. Comparing the cost of the new HP Workstations that have the new 64-bit processors with Hyperthreading to Apple G5 systems, HP wins not only on purchase price, but I would presume those systems would also win on price per composite. Think about it: Maya, XSI, Houdini or Shake running on dual 64-bit processors with Hyperthreading is like having 4 64-bit processors. Apple has nothing that competes with that. Add to that, Apple has no DCC (Digital Content Creation) graphics hardware (ie: FireGL, Quadro, Wildcat). At least on x86 systems you are assured you have support for such hardware no matter if you use Linux or Windows.
That’s why I say many studios and freelancers will continue to use Linux and highend software ported to Linux on x86 hardware instead of using emulation. Why emulate software when you can get that software ported to your platform? The logical answer is you would go with the port instead of slow emulation. Sure, someone in a far off country with no access to Apple may consider emulation software, but in reality PearPC as it is now really sucks for performance on real world applications. I wish them all the best in improving it but for now I’m not impressed at all.
“These will get written out as two fadd’s following each other. And any compiler that is producing the code for a G5 that doesn’t, is improperly optimizing your code.”
With out-of-order execution being the norm in a modern processor, you still can’t make this claim. Your *code* may specify those operations in parallel, but they might not execute that way.
“Not. A good optimizing compiler should know about this stuff, and generate code appropriately. ”
I suppose that means that there are no “good optimizing compilers” for the G5 then – judging from the poorer real world performance of the chip compared to its theoretical FLOPS. The only way it gets close is *with* Altivec vector operations.
“What if your values are not aligned?”
What if they are?
“I didn’t say it wasn’t IMPORTANT. I was meaning your precious double floating point AltiVec.”
Altivec is of no use or interest to me, since there *isn’t* a DP Altivec. One more thing the Macintosh platform lacks.
“I’m sitting here with the Documentation References for both AltiVec and MMX/SSE/SSE2, and I’m sitting here shaking my head. Honestly, the MMX/SSE/SSE2 feature set just doesn’t stack up against AltiVec.”
Unless you need something Altivec doesn’t have.
“Now, if you can ensure that everything you do is aligned (which is possible, and generally recommended for all vector data anyways), then there would be a total of 2 cycle gain, and 4 instructions.”
Which is generally what you use a SIMD set for in the first place.
“My “limited” criteria has factual bases. There’s a significant drop in the ability of the AltiVec core handling un-aligned double precision vector point values.”
Which doesn’t impact my application.
“But the added complications of adding it in would produce dropping gains in enough situations to render it practically useless.”
Or not – depending on what you use it for.
SSE2 does everything the FPU can do .. using SIMD! So no longer do u have to program FPU or SIMD code.. u program SSE2 and it’s IEEE 64 bit certified and runs in SIMD.
SSE2 is better than altivec (face it) .. and stop whinging about MMX which is so old we might as well be talking about 100MHz machines!
DP is vital.. if you’ve never used it you’ve never used a computer for maths, science.. or such. You’re obviously a gamer who thinks 32 bits is a lot.
U can hack DP into altivec.. but it’s a hack and ppl don’t generally use it. Why not just make it proper 64 bit in hardware like Intel and AMD?
SIMD is basically similar to out of order parallel execution. The king of this is AMD followed by Intel followed by IBM. Hence why, running an OS with lots of things going, AMD kicks butt.
AMD make the best CPUs today.. just face it.. STOP whinging about Apple and Intel because both are being bettered by AMD, in terms of technological development.
Just look at the SMP implementation of the Opteron. AMAZING
Let’s get to the point.. u want a 4 CPU box that kicks your friends’.. get a 4 CPU Opteron; a 4 CPU G5 or Intel box just won’t keep up because they don’t share memory properly or have onboard memory controllers.
Double precision is important
Fastest P4 beats the fastest G5 in altivec enabled code (that’s because the G5 is now CISC and can’t be clocked higher)
Opteron is the most advanced technological design (and best under all conditions)
“With out-of-order execution being the norm in a modern processor, you still can’t make this claim. Your *code* may specify those operations in parallel, but they might not execute that way.”
If this were the case (it splitting up a fadd from its following fadd) then it would STILL be executing two floating point operations in the same cycle period.
If anything, out-of-order execution would say it’s MORE LIKELY to be executing parallel floating point instructions.
“I suppose that means that there are no “good optimizing compilers” for the G5 then – judging from the poorer real world performance of the chip compared to its theoretical FLOPS. The only way it gets close is *with* Altivec vector operations.”
No, there are. What do you think IBM compiled the SPEC benchmarks with in order to give them the best speed possible? IBM’s compiler still produces the fastest and best code. Just like Intel’s compiler does the same for Pentium series processors.
“‘What if your values are not aligned?’
What if they are?”
Most of the time you can’t just assume that though.
“Altivec is of no use or interest to me, since there *isn’t* a DP Altivec. One more thing the Macintosh platform lacks.”
Right, and we’ve just spent the past 30 minutes trying to beat it into your head that IT WOULDN’T RUN SIGNIFICANTLY FASTER ANYWAYS.
What? You think IBM, Apple and Motorola got together to design AltiVec, and completely FORGOT that double-precision existed? No, of course not. They looked at the existing situation. And decided that the PowerPC has such a powerful double-precision FPU, that to implement a SIMD double-precision instruction set would do more harm than good.
On the other hand, Intel with SSE2 wants to replace the aged x87. That FPU design is one of the worst possible (but it was the best choice at the time) FPU implementations in existence. And with the Pentium 4, Intel has removed a lot of the things that made the FPU faster in the first place.
The reason why? With SSE2, they’ve obsoleted the x87. So, no code written to execute on the P4 should use the FPU, but rather only scalar SSE2.
“Unless you need something Altivec doesn’t have.”
Which the FPU of the PowerPC already provides in SIGNIFICANT abundance. Remember, the x86 world is still grappling with the horribly limiting design of their FPU. No kidding they’re interested in replacing it.
Honestly, I don’t know enough of the details regarding the full internals of how a hypothetical double precision AltiVec core would perform next to the existing FPU core. But I can guarantee you that IBM, Motorola, and Apple weren’t total idiots about implementing AltiVec, and that they surely considered the impact of double-precision floating point AltiVec.
Fact of the matter is that there’s far more to this whole deal than you’re putting out. You’re complaining about the lack of a feature in the AltiVec instruction set.
Guess what? The normal PowerPC instruction set doesn’t have a “not” instruction. OH MY GOD! What a horrible thing! Oh wait, they have a “nor” instruction, that when you use the same register for both operands, you get the not of that register.
Wait a second. You mean, you can have full support for something, without actually having it? Wow, that’s a miracle.
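The `nor` trick described above is easy to check for yourself. Here is a minimal sketch in Python that models a 32-bit register with a mask. The names `nor32` and `not_via_nor` are made up for this illustration; they are not PowerPC mnemonics or part of any library.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit register (an assumption of this sketch)


def nor32(a: int, b: int) -> int:
    """Bitwise NOR of two 32-bit values: NOT (a OR b)."""
    return ~(a | b) & MASK32


def not_via_nor(a: int) -> int:
    """Bitwise NOT built from NOR by passing the same value as both operands."""
    return nor32(a, a)


if __name__ == "__main__":
    x = 0x0000FFFF
    print(hex(not_via_nor(x)))  # prints 0xffff0000
```

Since `a OR a` is just `a`, the NOR of a value with itself is its complement, so a separate “not” instruction adds nothing.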
You’re missing the point that AltiVec doesn’t NEED double-precision floating point. I’ve already conceded the point that it doesn’t HAVE it, but you keep badgering that it “needs” it. But it doesn’t. The PowerPC FPU core is more than sufficient to take over the needs in this situation.
Um… You forgot something. DEC Alpha.
Alphas run faster off a 7 year old core design than x86 or PowerPC or anything but the Itanium.
The Itanium only beats it when it’s REALLY well tuned, which is extremely difficult.
The x86 architecture is a piece of crap. Ask any professor down at your local Electrical Engineering College. The hurdles that have been overcome make it incredibly impressive. And I’m not doubting that.
And yes, Athlons are far better than Pentiums.
But they’re not “the best of all time”. Hell, a DEC Alpha running at a measly 1GHz blows them both out of the water.
The only reason x86 runs faster than PowerPC is time and money. Both of which have been spent at great cost, even when it was just reaching the possibilities of the PowerPC.
I regret all the time I spent trying to inform you x86 apologists.
I’m sorry for any time that you lost trying futilely to explain to me how the x86 architecture is “better” than the PowerPC architecture.
But the fact remains that the PowerPC architecture and AltiVec instruction set on top of that, is significantly more well thought out than the spaghetti-code core, which you champion.
If you don’t like the PowerPC core, that’s fine. There are a lot of people who think it’s worse, because it’s slower. But for those of us in the know, we know that if even half the time and money which has been poured into the x86 world had instead been placed in the PowerPC world, then we would have a chip unparalleled by anything available today.
And if it had been put into the Alpha core, then it would be even more impressive.
PearPC on Athlon 1800+ @ 1533MHz:
Score = 4.71
Mac 2xG4 @ 800MHz:
Score = 99.87
=> So basically, this 1533MHz PC emulates a Mac with the speed of a 75MHz PC…
(Almost) complete XBench report here: http://nogfx.free.fr/pearpc_benchs.png
PS: The PearPC test was run without the Quartz or OpenGL tests, which would give even worse results!
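For anyone who wants to redo the arithmetic behind the “75MHz” figure, here is a rough sketch in Python. It uses only the two XBench scores and the host clock quoted above. It assumes the figure came from dividing the host clock by the score ratio, which is a guess at the method, and it assumes scores scale linearly with clock speed, which real benchmarks only roughly do. The function names are made up for illustration.

```python
PEARPC_SCORE = 4.71    # XBench score quoted above for PearPC on the Athlon
MAC_SCORE = 99.87      # XBench score quoted above for the dual G4 @ 800MHz
HOST_CLOCK_MHZ = 1533  # clock of the Athlon 1800+ host


def slowdown(native_score: float, emulated_score: float) -> float:
    """How many times slower the emulated run scored than the native one."""
    return native_score / emulated_score


def equivalent_clock_mhz(host_mhz: float, factor: float) -> float:
    """Host clock divided by the slowdown (assumes score scales with clock)."""
    return host_mhz / factor


if __name__ == "__main__":
    factor = slowdown(MAC_SCORE, PEARPC_SCORE)
    print(f"slowdown: about {factor:.1f}x")
    print(f"equivalent clock: about {equivalent_clock_mhz(HOST_CLOCK_MHZ, factor):.0f} MHz")
```

With these numbers the ratio works out to roughly 21x and the equivalent clock to a little over 72MHz, which is in the same ballpark as the figure given above.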
@Dark_Knight: “on dual 64-bit processors with Hyperthreading is like having 4 64-bit processors”
Hyperthreading is _not_ a dual processor nor a dual core. It is an interesting technique to squeeze more potential out of the Intel CPU. It has its own issues, and I think it’s more of a marketing trick.
Bochs, PearPC and the like are great tools and examples of how to virtualize pieces of hardware. They help with researching next-generation operating systems.
This benchmark isn’t really useful for making decisions about PearPC. The “real” Apple-PC has 2 processors and 10 times more RAM, so interpreting the results is pretty useless.
Maybe because i went to Uni and studied cpu design so put valid points?
Pfft to this discussion.
As far as i understand it Athlon creams G4 or G5 with DP .. thats why it matters..
I had heaps of valid points and everyone just ignored them pfft. Ive run windows xp on a 200mhz pc.. hows osx run on that .. ?? haha
OSX is slow bloatware.. G4 and G5 are good cpus but no PC beaters (get an average of a lot of SSE and altivec encoded apps and youll see)
You gotta be kidding. The cost of both a Maya and a Shake license is much more than what a dual G5 would cost you. PearPC is a nice toy, but that’s it. To be able to run OS X at decent speeds you’d need a much more expensive equivalent hardware, and you’d still be using a silly PC with a 1980-designed CPU. Just buy the real thing.
You obviously didn’t pay attention in CPU design 101. IA-32 is probably the worst architecture ever made, that’s why all modern processors (P4 and Athlon) are basically a RISC core emulating the old ISA. The PPC is a much better designed CPU. And sorry to burst your bubble, OS X is not bloated, and is still years ahead of what x86 solutions like Windows XP and GNU/Linux can offer.
>This benchmark isn’t really useful for making decisions about PearPC. The “real” Apple-PC has 2 processors and 10 times more RAM, so interpreting the results is pretty useless.
It’s still more accurate than the About window of MacOSX reporting “700Mhz” that most people take as right…
Just wanted to give an idea at how PearPC performs when compared with a real Mac… that is 40x slower than equivalent Mac.
I think PearPC will mostly be used with an illegal copy of OSX… That’s not a very big problem, as PearPC is nearly useless for running OSX, but it’s still illegal. And it would be a problem for people who say “now i can offer OSX builds” or “now i can make my site look good in safari” (which uses khtml, by the way).
” Why dont any of u read my comments? Maybe because i went to Uni and studied cpu design so put valid points?
Pfft to this discussion.”
First off, get over yourself. You haven’t said anything impressive yet.
“As far as i understand it Athlon creams G4 or G5 with DP .. thats why it matters..”
Athlons are great chips, and AMD has done a lot to overcome a bad architecture. But to say it ‘creams’ a G5 is a tad much. Athlons and G5s top each other depending on the software you are running.
“I had heaps of valid points and everyone just ignored them pfft.”
I think most people ignored you for two reasons, one, you are being very arrogant, and two, your debate was off topic from this thread. People came to read a discussion about PearPC, not the differences between G5s and Athlons.
“Ive run windows xp on a 200mhz pc.. hows osx run on that .. ?? haha”
Try running the newest Logic on XP. We could go on like this for weeks.
“OSX is slow bloatware.. G4 and G5 are good cpus but no PC beaters(get an average of a lot of SSE and altivec encoded apps and youll see)”
Sounds like you failed chip school. Good software on PPC runs much faster than good software on x86. Plain and simple. Your entire discussion seemed to miss out on that point. Second point…a lot of the software I need isn’t available on Windows or Linux. What then?
“Right, and we’ve just spent the past 30 minutes trying to beat it into your head that IT WOULDN’T RUN SIGNIFICANTLY FASTER ANYWAYS.”
Is this the imperial “we”? Perhaps it wouldn’t using Altivec, but it is faster with SSE2.
“What? You think IBM, Apple and Motorola got together to design AltiVec, and completely FORGOT that double-precision existed? No, of course not. They looked at the existing situation and decided that the PowerPC has such a powerful double-precision FPU that implementing a SIMD double-precision instruction set would do more harm than good.”
That’s amusing, considering the usual pitiful performance of the G4 on DP code.
“You’re missing the point that AltiVec doesn’t NEED double-precision floating point. I’ve already conceded the point that it doesn’t HAVE it, but you keep badgering that it “needs” it. But it doesn’t. The PowerPC FPU core is more than sufficient to take over the needs in this situation.”
When they add the instruction in the future, I’ll bet your opinion will change.
What you FAIL to recognize is that your opinions aren’t fact.
“I regret all the time I spent trying to inform you x86 apologists.”
What an arrogant thing to say.
“I’m sorry for any time that you lost trying futilely to explain to me how the x86 architecture is “better” than the PowerPC architecture.”
What you failed to do is convince people that your opinion is fact.
“But the fact remains that the PowerPC architecture and AltiVec instruction set on top of that, is significantly more well thought out than the spaghetti-code core, which you champion.”
“Spaghetti-code core”? Please.
“If you don’t like the PowerPC core, that’s fine. There are a lot of people who think it’s worse, because it’s slower.”
Performance of the chip IS what matters.
“But for those of us in the know, we know that if even half the time and money which has been poured into the x86 world had instead been placed in the PowerPC world, then we would have a chip unparalleled by anything available today.”
You’re just jealous that the PPC has not lived up to any of its hype. Still pining for the RISC/CISC wars?
“And if it had been put into the Alpha core, then it would be even more impressive”
Quite a bit hotter too.
“I think PearPC will mostly be used with an illegal copy of OSX… ”
Running a *purchased* copy of OS X on “Pear PC” would be a license violation. I see Apple lawyers in their future.
“Sounds like you failed chip school.”
You never went.
“Good software on PPC runs much faster than good software on x86. Plain and simple.”
No, it doesn’t.
“Your entire discussion seemed to miss out on that point. Second point…a lot of the software I need isn’t available on Windows or Linux. What then?”
A lot of software *I* need isn’t available on OS X. What then?
“I’m sorry for any time that you lost trying futilely to explain to me how the x86 architecture is “better” than the PowerPC architecture.”
Known officially as the PowerPC 970MP, the chip will feature two interconnected microprocessors on a single 13.225mm x 11.629mm die — a first for the 970 processor family. Each core will have its own 1MB L2 cache, sources said; the 970FX has only 512KB. L3 cache will not be supported.
The 970MP will feature a copper bus with 10 layers of metal; the dual cores will share a single Elastic Interface (EI) bus supporting a wide range of bus ratios and opening the door for higher bus speeds.
Antares(970MP) will be manufactured using IBM’s CMOS SOI10K process with Silicon on Insulator technology, sources said. The new chip will also support the VMX instruction set with Altivec-compatible Vector/SIMD units — one on each core.
A lot of software *I* need isn’t available on OS X. What then?
Care to elaborate on that? Perhaps it’s one obscure application only available on that niche OS, Windows? If it’s not available on OS X it’s probably useless. Or maybe it’s a game. Stupid gamers.
anymore all of these processors have more similarities than either side would care to admit.
since only very few users write asm code which would let them see the ISA (instruction set architecture) of the cpu, it comes down to the compiler for each side and the overall performance one can achieve for the applications that one runs.
as far as software being available on an os on one side and not on the other…. there is VERY little that doesn’t have at least some usable analog on another platform.
i do wish alpha stayed around though :. DEC pretty much designed all of the good parts of modern systems (pci, pcie, ht, etc…).
If X86 is so good then JCS, why is Microsoft using PPC as the processor for Xbox2?
…and yes, I too have heard that the P4 is the laughing stock at any university.
I can see you are trying to make a point of some kind, but you come across as being very childish. Please make a few on topic comments once in a while and perhaps try to take in what everyone else is saying. Then you can draw your own conclusions from what is said. If you still think its all BS then you are entitled to your opinion.
AX.
sorry for typo, there is no message edit button on this forum after posting.
I use both a G3 and an x86 processor (AMD 2800+). To run programs that are Mac-only, you really need a Mac, and the same is true of Windows. I’m not a fan of virtual Windows. I would rather have the real thing.
Would I be right in assuming it would be a lot faster on my nice new Athlon64-3200 with 1GB RAM if compiled as 64bit code, than compiled as 32bit?
“Would I be right in assuming it would be a lot faster on my nice new Athlon64-3200 with 1GB RAM if compiled as 64bit code, than compiled as 32bit?”
I believe there are people, who have compiled it, but they don’t have it working. You need to compile it as a 32-bit app in order for it to work.
I’m glad to see so many people excited about PearPC.
But you know what would be better than running the best OS available in emulation? To satisfy your curiosity and buy OSX.
OSX is a beast of an operating system- don’t cage it in lads.
That makes me cringe. I see your point, but it really makes me cringe to hear about people trying things and only getting such a small portion of the real experience.
well it’s probably the same as macheads commenting about pc performance based on running vpc. 🙂
PdC,
Re: “Hyperthreading is _not_ a dual processor nor a dual core. It is an interesting technique to squeeze more potential out of the Intel CPU. It has its own issues, and I think it’s more of a marketing trick.”
Well explain why not only my OS but also Maya and Mental Ray detect my workstations that have Hyperthreading as having 2 processors per machine? I noticed a performance drop if Hyperthreading is not enabled when rendering frames. This link may help to explain it better to you ( http://arstechnica.com/paedia/h/hyperthreading/hyperthreading-1.htm… ).
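On the detection point: an operating system counts logical processors, and a Hyper-Threading CPU presents each physical core as two logical ones, which is why the OS and applications report two processors. A minimal sketch in Python of how a program sees that count is below. `os.cpu_count()` is the standard library call for the number of logical CPUs; the wrapper name `logical_cpu_count` is made up here, and the fallback to 1 is a choice of this sketch.

```python
import os


def logical_cpu_count() -> int:
    """Number of logical CPUs the OS reports (hardware threads, not packages)."""
    count = os.cpu_count()  # may return None if the count cannot be determined
    return count if count is not None else 1


if __name__ == "__main__":
    print(f"The OS reports {logical_cpu_count()} logical processor(s)")
```

On a single Hyper-Threading Pentium 4 this kind of query would be expected to report 2 even though there is one physical core, so both sides of the argument above are describing the same thing from different angles.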
Perez-Gilaberte,
Re: “You gotta be kidding. The cost of both a Maya and a Shake license is much more than what a dual G5 would cost you. PearPC is a nice toy, but that’s it. To be able to run OS X at decent speeds you’d need a much more expensive equivalent hardware, and you’d still be using a silly PC with a 1980-designed CPU. Just buy the real thing.”
Like I said, as it is right now many artists and studio owners wouldn’t seriously consider using PearPC because of the performance hit when emulating the client OS. I prefer to run high-end software such as Maya, Shake, etc. on the OS it’s ported for instead of running it on an emulator.
As for the cost difference between Shake for Linux and Shake for OSX, you seem to forget that not only has Apple changed their pricing, but third party Apple resellers are also willing to compete with competitors. For example, there is equal pricing for Shake 3.5 on OSX and Linux from distributors such as CDW ( http://www.cdw.com/shop/products/default.aspx?edc=636254 ).
Studios also take into consideration how well software runs on one OS and architecture compared to another. As it is now, high-end software such as Maya, Mental Ray and Shake runs faster on Linux than on OSX. Completing tasks faster means lower cost.
Benchmarks:
Maya/Mental Ray render benchmark: http://zoorender.com/
Shake benchmark: http://homepage.mac.com/breadboi/shake/
“Just some examples of what’s the point:
– I’m a webdesigner. How does my site run on Safari?”
1) You try out Konqueror/x86 which has the same core (KHTML)
2) You buy a (cheap) Mac to save yourself all this horror. Because the installation, the running of the OS, is horribly slow.
Not to say PearPC isn’t a valuable solution or the project goals itself aren’t neat — personally, i like them.
“Yeah they run Linux on more expensive equipment(you mean right, SGI?)than Apple equipment.”
No, they run such software on SGI’s IRIX (a UNIX) or on Apple Mac or on Linux/x86 or on Windows/x86. Those are the 4 common and usable options. Linux/MIPS isn’t; it is mostly a hobbyist project, though I do see several commercial or professional possibilities with it eventually, in the future. For example, consider video cards not being supported. So if you had that Onyx with RE2 or Fuel/Octane(2) with V6/8/10/12 you couldn’t use hardware rendering. Pain, huh? Proprietary software built for Linux/x86, Apple/ppc, Windows/x86 and SGI/MIPS doesn’t run on Linux/MIPS. Yeah, there are several emulators for MIPS, but that would be so counter-productive (same as with using PearPC to run OSX apps which are ported to the native OS). As a side point, rendering is of course done on different computers than the above. Why buy yourself an Origin if you can do the same faster and cheaper on other hardware?
But they’re not “the best of all time”. Hell, a DEC Alpha running at a measly 1GHz blows them both out of the water.
You know as well as I do that that’s because of the so-called “MHz hype”, while MHz just doesn’t say a f*ck when comparing different architectures. Heck, not even when comparing different x86’s themselves. Therefore one who says “my processor runs at X MHz” isn’t making an informative statement. There are several reasons for this, in particular the whole RISC vs CISC argument.
POWER, MIPS, Alpha and SPARC all lose because of misinformation like this while in reality its a moot point.
Afaik SpecINT and SpecFP are better tools to analyze performance between different architectures, but that only works on a case-by-case compare (application by application).
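The point about MHz above can be made concrete with a back-of-the-envelope model: instruction throughput is roughly clock rate times instructions completed per cycle (IPC). A minimal sketch in Python follows. The two “chips” and their IPC values are entirely hypothetical, invented only to show the shape of the argument; they are not measurements of any real Alpha, Athlon or Pentium.

```python
def instructions_per_second(clock_mhz: float, ipc: float) -> float:
    """Rough throughput model: clock rate (Hz) times instructions per cycle."""
    return clock_mhz * 1e6 * ipc


# Hypothetical numbers, for illustration only.
chip_a = instructions_per_second(clock_mhz=1000, ipc=2.0)  # lower clock, wider core
chip_b = instructions_per_second(clock_mhz=1500, ipc=1.0)  # higher clock, narrower core

if __name__ == "__main__":
    print(f"chip A: {chip_a:.2e} instructions/s")
    print(f"chip B: {chip_b:.2e} instructions/s")
    print("lower-clocked chip is faster:", chip_a > chip_b)
```

Even this model is too simple, since an instruction on one architecture does not do the same amount of work as an instruction on another, which is part of why application-level benchmarks like SPEC are the better comparison.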
“Care to elaborate on that? Perhaps it’s one obscure application only available on that niche OS, Windows? If it’s not ”
Pretty much every major engineering application isn’t available on the Mac.
“available on OS X is probably useless.”
That’s the usual Mac advocate argument. It’s also wrong.
“Or maybe it’s a game. Stupid gamers.”
Stupid Photoshoppers.
“If X86 is so good then JCS, why is Microsoft using PPC as the processor for Xbox2?”
Please point to the statement by Microsoft to this effect.
You can’t.
“…and yes, I too have heard that the P4 is the laughing stock at any university.”
Which university? The ones that buy them?
“I can see you are trying to make a point of some kind, but you come across as being very childish. Please make a few on”
Pot, meet Mr. Kettle.
“topic comments once in a while and perhaps try to take in ”
I was making “on topic comments”. You aren’t.