Excellent article, this is what osnews needs, not the blaring fanatical opinions of ogres and trolls, just simple, factual text.
Rather than compare which processor scores better on which benchmarks, it's good to read about the actual architectures of the different processors. That's the good stuff.
Good man!
I just finished a 600-level class in computer architecture and that was over my head. What a great article--truly an exemplar for other submitters!
Another article for the armchair computer enthusiast that has no real information. Sorry, but you either need to explain more about what you are talking about or use more technical verbiage to bring the article to a higher level. You assume that the reader knows about registers, but don't talk about IPC (using that term). Your article also makes bold claims that are unfounded and unsupported within your article or by your references.
SMT will be twice as good as HT? Where is your reference?
Also at the end of your article you predict (as many others have in the past) that Intel will hit the 'heat wall' and that the future looks bright for RISC technologies but BAD for CISC. I can't wait to see you proven wrong again as CISC technologies continue to work as well or better than RISC. For all of RISC's 'advantages' (many of which you state in this article) CISC still seems to come out on top.
This is how far I read the article until noticing a grave mistake: The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
If you don't know shit about different CPU architectures, why do you feel need to write about them?
For those interested in RISC vs CISC (or especially why there is no RISC vs CISC any more), I highly recommend these articles at Ars-Technica:
http://arstechnica.com/cpu/index.html
I particularly liked the reference to changing programmers.
Yes, that has been known to result in faster programs!
This is how far I read the article until noticing a grave mistake: The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
If you don't know shit about different CPU architectures, why do you feel need to write about them?
"The Intel 8086, a new microcomputer, extends the midrange 8080 family into the 16-bit arena."
Intel Corporation, February, 1979
SMT will be twice as good as HT? Where is your reference?
IBM & DEC
Both IBM and the Alpha team announced the addition of Multithreading support was expected to give a 100% boost in performance.
When HT arrived it gave only 20% - 30%, I believe it is to be enhanced soon.
Also at the end of your article you predict (as many others have in the past) that Intel will hit the 'heat wall'
Microprocessor Report said that, not me.
This is how far I read the article until noticing a grave mistake: The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
If you don't know shit about different CPU architectures, why do you feel need to write about them?
If you don't know shit about different CPU architectures, why POST about them?
DUMBASS
Kudos to you, Nicholas, on a well written article. It was a well researched and factual piece, a welcome change from the opinion articles that have been becoming more common here. (Not that that is a bad thing, but it's nice to have a change once in awhile.)
instead of insults, could you provide pointers contradicting the article, and specifically that grave mistake you've noticed ?
I second! Thanks for writing this article, Nicholas Blachford!
I believe there is a great potential future for the PPC platform. Compared to x86 CPUs it has a generally cleaner design, efficient power consumption and very well performing vector units.
With the PPC platform moving towards a solid 64-bit architecture and good multi-processing capabilities, IBM may well have a winner (hopefully it won't take long until there are good 64-bit OS solutions available). Sadly consumers aren't well informed about the MHz myth despite Apple's efforts. Many people still buy a Celeron mainly for the higher clockrate (instead of performance). Most people simply don't understand that a 50 MHz 68030 isn't twice as fast as a 25 MHz 68040, but that it's rather the other way around. IMO emphasizing clockrates only misleads the general consumer and 3rd party SPEC stats would be a far more reliable source, but of course still not ideal. Hardware companies will most likely continue to try to use tricks to mislead benchmarking software and so artificially produce higher figures, or will do everything to dispute results when they are not in their favour, further confusing the general public.
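The clock-rate point can be put in numbers with a back-of-the-envelope model in which throughput is roughly clock rate times instructions completed per clock (IPC). This is only a sketch: the function name and the IPC values below are made up for illustration and are not measured figures for the 68030, the 68040 or any other chip.

```python
def throughput_mips(clock_mhz, ipc):
    """Rough throughput in millions of instructions per second: clock * IPC."""
    return clock_mhz * ipc


# Hypothetical IPC values, chosen only to illustrate the argument.
fast_clock_low_ipc = throughput_mips(clock_mhz=50, ipc=0.25)
slow_clock_high_ipc = throughput_mips(clock_mhz=25, ipc=1.0)

# The chip with half the clock rate still comes out ahead in this model.
print(fast_clock_low_ipc, slow_clock_high_ipc)
```

Under these assumed values the slower-clocked part does twice the work per second, which is why a clock rate on its own says little about performance.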
Anyway thanks to Nicholas for his IMO well researched article. 
Really great article. We do need more of these on OS News!
First off, HT (Hyper Threading) is a form of SMT (Simultaneous MultiThreading), so stop all this nonsense of HT vs SMT!
Second, neither IBM nor DEC (with their EV8) ever claimed a 100% performance increase for an SMT implementation vs. non-SMT of the same architecture.
I also have a few bones to pick with the author, since he makes a lot of false claims, for example:
"The amount of voltage the CPU can use restricts the power available and this effects the speed the clock can run at, x86 CPUs use relatively high voltages to allow higher clock rates, "
This statement is so wrong that I do not know where to begin with the nitpicking! phew...
Not sure I buy his ideas about this and don't think the references were good enough to prove it.
Perhaps it would have been better if it were a comparison of the two architectures and not a drift off into a poorly formatted discussion of many chip architectures.
It is like trying to decide what is the best engine design for any application. If there was a best for "any" it surely would not be the best for each specific application.
It also does not consider the glue. So much of how well a computer works depends on the task and the parts surrounding the cpu and the tools to do the work. There are many examples of crappy cpus being very effective because the surround kit and code solve the problem better.
Process technology and price are important when you talk about the desktop market. But so are the artificial benchmarks.
I also question how much Linux is really cross-platform. Having used both the Itanium and the Alpha versions, it becomes pretty clear that it is an x86 OS with ports that are less than optimized and stable.
Complaints aside, it is too bad they put AltiVec in and killed the double-precision multiply-add instructions. For SPEC this counts as 2 ops. So if you have two FPUs that can do double-precision multiply-add you get 4 flops per clock. Power4 and Itanium both have this and it is how they win the flop performance benchmarks and marketeering.
If they had left it in, the PPC 970 would have been the flop leader above the Power4 and everyone else. IBM probably did not want that... But Apple would have gotten a lot of HPTC customers. Again IBM would not like that. Instead they have a thing with a poorly optimized compiler that does low-precision floating point OK. They probably should have gotten IBM to port all the compilers to OS X at the same time. That would have helped too.
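The flops-per-clock arithmetic in this comment (a multiply-add counted as 2 operations, times the number of FPUs) can be written out as a small sketch. The function names and the 2.0 GHz clock are assumptions for illustration only; nothing here is a claim about what any particular processor actually supports.

```python
def peak_flops_per_clock(fpu_count, flops_per_instruction):
    """Peak floating point operations per clock across all FPUs."""
    return fpu_count * flops_per_instruction


def peak_gflops(clock_ghz, fpu_count, flops_per_instruction):
    """Peak GFLOPS at a given clock rate (theoretical, never sustained)."""
    return clock_ghz * peak_flops_per_clock(fpu_count, flops_per_instruction)


# A fused multiply-add is counted as 2 flops, a plain add or multiply as 1.
with_multiply_add = peak_flops_per_clock(fpu_count=2, flops_per_instruction=2)
without_multiply_add = peak_flops_per_clock(fpu_count=2, flops_per_instruction=1)

# Hypothetical 2.0 GHz clock, used only to show the scaling.
print(with_multiply_add, without_multiply_add, peak_gflops(2.0, 2, 2))
```

The point of the sketch is only that dropping multiply-add halves the theoretical peak, which is the figure that ends up in benchmark marketing.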
"Most people simply don't undestand that a 50 MHz 68030 isn't twice as fast as a 25 MHz 68040"
Well, the 040 was "double clocked" internally, i.e. the 25MHz 040 was indeed running at 50MHz internally (much like the R4000 for example)
True, he may have gotten it wrong - the x86 architecture actually goes all the way back to the 4004. The truth is when the PC chose the 8088, it was already somewhat handicapped by its ties to the past.
I am an armchair computer enthusiast. This is the first time I have read anything that even remotely understood the differences in the two different cpus. I may not have understood everything but I got enough.
I just read my own sentence and saw I made a mistake in my post. What I mean is that this is the first time I have read something that explained the differences in a way that made some sense to me. Most of the time it is more like listening to two preachers going at it over their own particular beliefs.
Thanks for the informative article. A very concise summary of a difficult topic.
Also at the end of your article you predict (as many others have in the past) that Intel will hit the 'heat wall'
Microprocessor Report said that, not me.
Problem is that they assume that Intel will not change some aspect of their technology. Yes, Intel will have to change some part of their physical design or logical layout in order to compensate for the laws of physics. However, that doesn't mean that x86 code can not scale to higher and higher speeds. Just that the way it is executed will have to change. As it has already done multiple times over the x86 lifetime.
Both IBM and the Alpha team announced the addition of Multithreading support was expected to give a 100% boost in performance.
When HT arrived it gave only 20% - 30%, I believe it is to be enhanced soon.
http://meseec.ce.rit.edu/eecc722-fall2002/722-9-16-2002.pdf
This class lecture (brought to you via google with "SMT DEC alpha speedup") proves that yes, in certain cases you can get a 100% speedup using SMT. Problem is that not all applications are going to be able to achieve that speedup (YMMV) and will have to be recoded (or at least recompiled) for SMT, as it requires the code to be processor aware.
After having finished the article, it does seem to miss some points, but still overall, the article is good and one of the better reads I've had on Osnews in a long, long time.
Thinking about the x86 strategy in terms of marketing is a pure wonder--however, if Intel had actually focused on creating a better architecture rather than one that had many parameters to tweak such as mhz, cache size, bus speed, hyperthreading, etc where some marketing guru could overstate again and again, where would we be today?
Don't get me wrong, the x86 is a true piece of engineering excellence, taking something that's not that great and inefficient and making it good enough to satisfy the current user base to fanatical points where they berate powerpc users on a common basis. But what if intel was less marketing driven, could they have come up with something better than x86. I guess that's where the alpha and epic architectures fall in. Makes me wonder about buying anything x86 in the future (i.e. x86-64).
Somebody please do an architecture overview on madison.
They are neither binary nor source compatible. Their interrupt, memory segmentation/banking, I/O modes are completely unrelated. 8080 is completely hand coded while 8086 uses microcode. 8080 has no complex instructions, 8086 has plenty of them. 8086 is way more complicated than 8080 with 16 bit additions (8080 can use 16 bit addressing with BC, DE or HL register couples, btw.)
IOW, 8086's only relation to 8080 is that both were designed and produced by Intel. That is it. Intel might have said 8086 extends their midrange to 16 bit, which was established by 8080, but technically they are completely unrelated CPUs, even designed by different people (original 8080 designers left to found Zilog) and with a different philosophy.
Who is dumbass now?
>Who is dumbass now?
I suggest you all calm down in the way you talk over here, or I will mod all the rude comments down.
B,C,D,E,H and L are register names used in Z80 version of the 8080 asm. I can't recall the original, 8080 names right now.
because it was making my head hurt. Can you say proofreading? spellcheck? A second grader writes with better grammar. Perhaps you need to put the pipe down a bit sooner before writing your next article.
Let me suggest looking two words up in the dictionary: effect and affect.
I have to agree that this guy doesn't really know what he is talking about. He seems to start with a conclusion and then look for ways of justifying it. The ArsTechnica article is VERY good, but it pretty much requires extensive knowledge of computer/processor architecture.
The author seems to enjoy making broad statements without providing real proof. The Power5 SMT vs. Pentium4 HT is particularly blatent (though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT, I doubt it will double performance and even then it will only improve parallel stuff - much more important for servers than desktops).
The benchmarking section was also given a cursory treatment. He uses an OSNews post as justification for throwing out ICC results in favor of GCC even though the post doesn't even address that. The part of the post the author refers to correctly points out that SPEC FP performance is NOT indicative of overall system performance because most applications use mainly integer code. This does not invalidate the ICC SPEC FP results or justify Apple's use of GCC. I have read in other locations (Ace's Hardware forums) that ICC does drastically improve performance on real-world floating point intensive code.
I would dismiss this article as blatant fanboyism, but the author seems to believe everything he wrote. Guess he stepped too close to the reality distortion field. Please, throw out this article and look elsewhere (Ace's Hardware and ArsTechnica are both VERY good sites for this type of stuff).
He builds powerpc systems. I wonder what his conclusions will be....hmmm..
Glad to see this, good job Nicholas.
Now the world may end, Bouma and I actually agreed on something. 8)
Anyway, I think this article is best viewed as a brief summary (and a pretty decent one), not as proof of any hypothesis or as an argument.
There was one statement that stood out as being particularly wacky to me, though: The comparison of programming language, programmer, and CPU in their relative importance for the resulting execution.
That's like asking: What's more important for a good driving experience, the steering wheel, the pedals, or the engine?
The obvious answer: All are critical for good performance, and a deficiency in any can bring down the whole system.
While i liked the article, and i'm a die-hard anti-x86 guy, I have problems taking the article as a whole very seriously. I liked what he had to say, and i dont think he made many SERIOUS mistakes, but it was obvious from the beginning that he was going to make this a PPC 0wNz x86 article.
I liked the bit about the Alpha outgunning the P4 though, god i want an EV7 box!
Re: Roy
Yes, the Ars articles actually provide content instead of fanboyism. They are well researched and good reads.
The author seems to enjoy making broad statements without providing real proof. The Power5 SMT vs. Pentium4 HT is particularly blatent (though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT, I doubt it will double performance and even then it will only improve parallel stuff - much more important for servers than desktops).
IMO, SMT will not speed up servers (file, web, DB, etc) that much. The tasks that most 'servers' do is more single threaded and not prone to parraleziation in the same way that will reap the benefits of SMT. Not to say that increased resource sharing that SMT allows wills not be goot, but the 100% speedup (or more) that is possiable with SMT in certain applications will not be achieved. Video games have more potential for improvement via parrallel algorithims (graphics rendering can be highly parrallized) (sorry about the spelling).
Web serving doesn't require much CPU and the output can't normally be generated in parrallel for example. Even dynamic content can't be generated in parrallel, most of the time Google is the shining example of a parrallel algorithim though, while BBS systems like this page here really can't be generated in parrallel as you can't generate a part of the page before another, at the end of the day it's a long string of html. Also don't forget that dynamic content mostly uses integer math, and the floating point units are left doing nothing.
well actually this article tells the reader, little if anything about the PPC. its all about the x86.
as for ibm/alpha 100% SMT increase vs intels 30%.
i gotta laugh sooo hard here. to have a 100% increase, that would indicate the ppc architecture is so far below the x86 for parallelising instructions and filling the pipe with uops that its beyond a joke. i'd bet real-world performance would be in a similar ballpark to intels 30%.
x86 is built on a 1979 legacy, ppc 1993, so much has been learned about processor design from 79<>93. so i would expect ppc to best x86 for its a new clean design.
so too would i expect ia64 to best ppc, since its again a new clean design with no cruft, postdating ppc.
and so on.
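The argument about a 100% versus a 30% gain can be framed with a toy model: the less of the core a single thread keeps busy, the more a second thread can add. The sketch below assumes an idealised core where a second, identical thread fills idle issue slots perfectly, with no cache or resource contention; the function name and the utilisation figures are hypothetical and do not describe the Pentium 4, the Power5 or the Alpha.

```python
def smt_speedup(single_thread_utilisation):
    """Idealised two-thread SMT speedup for a core that one thread keeps
    busy for the given fraction of its issue slots (0 < fraction <= 1)."""
    if not 0.0 < single_thread_utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    # Two threads can never use more than 100% of the issue slots.
    combined = min(1.0, 2.0 * single_thread_utilisation)
    return combined / single_thread_utilisation


# Hypothetical utilisation figures, for illustration only.
for utilisation in (0.5, 0.8, 1.0):
    print(utilisation, smt_speedup(utilisation))
```

In this model a 100% gain only appears when one thread leaves at least half the core idle, which is the objection being raised here; real gains would sit below these idealised ceilings.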
Yeah, the Alpha rocks. Too bad it is now a dead end design. DEC just didn't know how to market it and Compaq didn't care about it. The Alpha (especially 21064 - is this right?) really was the furthest evolution of RISC.
The Power5 SMT vs. Pentium4 HT is particularly blatent (though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT, I doubt it will double performance and even then it will only improve parallel stuff - much more important for servers than desktops).
HyperThreading is a hack designed to utilize execution units of the P4 which sit idle as a tainted trace cache is cleared and its pipeline is repopulated following a mispredicted branch. If you work the numbers on the Pentium 4, you'll find that the percentage of time its execution units sit idle is approximately equal to the percentage of branch instructions in the code it is executing.
SMT in the Power5, on the other hand, is designed to leverage the full power of a dual core processor by allowing the pipelines to pick and choose which execution units to send decoded instructions to, with the assumption that the entire pool of execution units on both cores won't be completely used at a given time when they are being fed by only two pipelines.
Why is it that whenever we get an article on this site that praises PPC, Apple, Mac OS etc., that there are several which respond saying that its just a mac fanboy article... It seems that the signal to noise ratio on these boards gets worse by the day.
The most accurate and sincere attempt to lay out the facts in an environment that is filled with so much fear, uncertainty and doubt. I've been following the x86 processor family since the first PC was released. My first PC was a Radio Shack system, I learned to program using an IBM PC using BSD Fortran 77, then Pascal and of course I taught myself basic as well. I bought a 1984 Mac and marveled at the 68000 processor. I marveled at the 80286 and remember my excitement when I got one of the first 20 MHz PCs, it made my dBase code smoke. I also had the privilege of working on the S/38 which eventually became the AS/400 and I marvelled as IBM converted it over to the Power platform. The 80486 was most excellent with its virtual 86 capability and of course I was thrilled when Intel finally got the Pentium done right with the Pentium III. I was disappointed with the Pentium 4 and still am because I felt that Intel sold themselves to the marketing side. We all know that the Pentium 4 was a bad deal compared to the Pentium III till it broke 2 GHz, AMD taught Intel a lesson for that blunder and took a major chunk of their marketshare with what now is the Athlon. The Itanium is also a big disappointment, and it appears that Opteron and Athlon 64 will once again get more attention than Intel. I believe this is because Intel's engineers strayed from their discipline when they compromised on the Pentium 4 and it has been a long road back to excellence.
In the meantime, Intel left the market wide open for IBM and their 970 processor is just amazing, it truly is one of the most exciting developments I have seen for some time in the desktop world. To think that we have a processor that is a superset of the Power4 core and even faster, makes me excited. I was also blown away with the G5's architecture, it really is a new generation of machine and not an incremental change.
I'm not surprised really that Intel's ICC compiler vectorizes Spec's FP intended instructions. It really is rigging to the nth degree. And, I am not surprised that journalists in general do not do their due diligence. But, you are starting to restore my trust that there are still those out there who are willing to do some research before writing an article. And, congrats to the OSnews editorial staff for having the courage to publish it. Great writing and looking forward to reading more from you.
Great article; I'd like to see more like it. In particular I'd like to know more about the less-common processors, and their operating systems and software.
Obviously readers can find flaws with anything; that's what the "Comments" area is for after all. But there's no reason to be rude.
Best Wishes,
Bob
The Alpha (especially 21064 - is this right?) really was the furthest evolution of RISC
the 21064 was a REAL dog. but saying that it was the first generation (EV4).
Man!
You Wintel guys have ZERO credibility anytime you let your blatant fanboyism for an inferior system get the best of you.
The article was accurate for the level of depth it put forward. Sorry if it doesn't jive with your revisionist methods of viewing the history of personal computers.
Along with Ars Technica columns, be sure to check out David K. Every's articles on the same subjects... http://www.igeek.com/articles/Hardware/Processors/
He's great at explaining how things work and why and which are better suited for specific applications.
By the way, there is no such thing as an unbiased opinion.
I guess that you get what you pay for, as this website is free, you can't really expect much from it. The comments have more true information than this article. Sigh.
This is the kind of Article I would love to read on OSNews all the time. Well written, referenced, and professionally done. Not like a lot of the fetid tripe that dares call itself a "review" that gets posted here (Eugenia's articles excluded of course).
i've always liked 3dnow! better because it can do two operations per clock, because games don't need 128bit vectors, and like Tim Sweeney says: "Since register-memory instructions are as fast as register-register instructions, I don't usually need to use more than 4 registers"
:-P
I agree that it is not fair to compare the 970 to the P4 or even the XEON. Intel simply does not have a modern processor to compare against the more advanced design of the 970. The real comparisons will happen against the Athlon 64. The way I see it the categories of comparison look like this.
High-End Server:
Opteron vs. Itanium vs Power4
High End Desktop/Workstations/SMB Servers:
Athlon 64 vs 970 (G5) vs ? (nothing from Intel yet)
Legacy Desktops, legacy servers, current notebooks:
Pentium III vs. G3 vs. P4 vs. PM vs. Athlon vs. Xeon vs. Athlon MP
This is a generational shift and right now only the Athlon 64 and 970 are in play for the next generation desktop.
Nicholas Blachford,
Good article. I enjoyed reading it.
- Mark
Yup, you are probably right. My main point was that the guy was pulling numbers out of his arse. Like I said in my original statement...
"though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT"
The 100% improvement just sounded inflated to me. There are certainly cases where SMT will provide large performance increases, but we aren't talking about a 100% improvement in most cases.
RE SteveToth: Yup, you are right too. I never meant web-servers, but it looks like SMT/HT helps more for heavy computation tasks (scientific, multimedia editing, games possibly someday). Servers (web/database) in general are more I/O bound.
From a March 2003 article, it appears Apple beat even his optimistic forecast.
http://www.igeek.com/articles/Hardware/Processors/x86-64vPPC-64.txt
If you care about 64 bit, we're probably going to see it significantly effecting the Mac market around 2004-2005, and in the PC market around 2008-2009. Not because of technological issues (though there are some of those), but mostly business and market issues. I'm a technology guy, I wish the technology was all there was. (Technology is much more clean and pure than politics and business markets). But if you don't understand business and markets in this industry, then you don't understand jack.
PC advocates will talk about how they had 64 bit first; but ignore that they did it poorly, it doesn't effect much, and will take forever to actually gain momentum. Mac users will likely be seeing any benefits from 64 bit computing, far sooner. In fact, the most likely way that you'll see 64 bit x86 adoption is if it comes from Apple in the form of OS X ported for AMD.
Don't get me wrong; I hope I'm wrong for the PC markets sake. I have no problems with AMD, and I like their x86-64 implementation. It would be great if this summer AMD was ruled the winner and the entire PC market adapted x86-64, and Intel licensed it, and there was no more war or world hunger, and dogs and cats could live together in peace; but I just don't see that happening.
Many of you talk about CISC pulling ahead of RISC, but many of you forget that Intel had to basically make their processors RISC-like to compete. These days RISC is more like CISC and CISC is more like RISC, we confuse the two a lot. Take a look at the instruction set of a PowerPC processor compared to that of an x86 processor and tell me that I'm wrong.
Right, cause the only people that make processors are IBM, Motorola, Intel and AMD.
It would be great if this summer AMD was ruled the winner and the entire PC market adapted x86-64, and Intel licensed it,
FYI, Intel doesn't have to licence anything to use x86-64. They can just go ahead and implement it and lose nothing but face. Those kinds of x86 extensions are already covered by an old cross licensing agreement between Intel and AMD.
The early generations of Alpha really took the RISC principles (read KISS) to the extreme. Maybe I'm thinking of the 21164 rather than the 21064, but I know that later generations (21364 and possibly 21264) started using things like instruction reordering that are less in line with the principles of RISC. The 21264 and 21364 were great designs, but they weren't as "pure" RISC as the earlier generations. EPIC takes the RISC ideas of letting the compiler do the work one step further (though I haven't seen any evidence that this is paying off yet).
simultaneous multi-threading (SMT) is designed to convert threading to instruction level parallelism (ILP). That is its main purpose. It does very little good on Processors that have a low degree of parallelism and whose OS's and their development frameworks do not promote asynchronous processing. Windows and COM+ are not very well threaded. Though the COM+ environment allowed better threading, it was difficult to program in the unmanaged VS 6 environment. Only with .NET has Microsoft started to emphasize delegating of threads and asynchronous programming, but it is a very large framework and will take a couple of more years to mature.
This is not the case with OS X, which is a highly threaded Unix based OS, and the Cocoa framework is very mature, being in development since NeXTSTEP in the late 80's as a truly Object Oriented Smalltalk type environment. This, coupled with the fact that both IBM and Apple have a long history of developing for multiprocessing systems, as well as providing a highly parallel processor in the 970 and future 980 designs, clearly shows that it is not only possible but more than likely that many operations will achieve close to 100% performance increase in IBM's implementation of SMT. SMT thrives on ILP and P4 greatly lacks ILP. That's just a fact, it is not meant as a personal insult so get your emotions out of it already. Intel may be forced to step up to the plate with a competing design, and wouldn't that be a good thing? If I were you I would be promoting competition, it's healthy and will benefit the Intel Zealots in the end as well.
By the way databases and transaction based systems thrive on multi-threading. It's games that currently prefer single-threading, but that is changing as well, take a look at Quake on an SMP Mac, it rocks.
As Nicholas pointed out in this article CISC commands are hard to decode, they are more complex, have different lengths ... But this also means that a CISC command carries more information from the memory to the processor than a RISC command. Nicholas also stated that the bottleneck is the processor <-> memory connection. So you can regard CISC commands as a kind of compression algorithm, so more information can be transported to the CPU, which has time to decode this information into something it can handle optimally.
I can't provide you a link, but IBM thinks about integrating a GZIP-unit at memory controller and at processor for its zSeries, so the data are compressed before transfer.
Anton
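The "CISC as compression" idea comes down to code density: fewer bytes crossing the memory bus for the same work. A minimal sketch of that comparison follows; the instruction lengths are invented for illustration and do not correspond to any real x86 or PowerPC instruction sequence.

```python
def fetched_bytes(instruction_lengths):
    """Total bytes that must cross the memory bus to fetch a code sequence."""
    return sum(instruction_lengths)


# Hypothetical encodings of the same piece of work (lengths in bytes).
fixed_width = [4, 4, 4, 4, 4, 4]   # six fixed 4-byte instructions
variable_width = [2, 3, 1, 6]      # four variable-length instructions

ratio = fetched_bytes(fixed_width) / fetched_bytes(variable_width)
print(fetched_bytes(fixed_width), fetched_bytes(variable_width), ratio)
```

With these made-up lengths the variable-length encoding needs half the fetch bandwidth; whether real code shows a gain of that size depends on the instruction mix and the compiler.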
Unless you have completely closed your eyes, OS X 10.2 added the GPU as another processor to offload some of its OS duties for the GUI in the form of Quartz Extreme. As it was, OpenGL was already hardware accelerated, but Quartz Extreme allows all the compositing to take place in the GPU, freeing up the CPU. In addition, since the AltiVec unit is truly orthogonal, shadows and pattern fills along with 6 other desktop drawing functions are being handled by the vector unit while the rest of the processor core is free to do what it needs.
Now, with the release of Panther, Apple has added windowing and scrolling to Quartz Extreme as well. It's never been faster or smoother and the CPU is even more free to handle the actual apps. Panther will greatly benefit all Macs with G4s and up. G3 Macs will also benefit from highly optimized scalar libraries that now outperform the very good, time-tested OS 9 libraries (Jaguar had previously achieved parity.) It should be a fun winter for hobbyists.
Intel pushed the 8088 as the "next" 8080 while the Z-80 was Zilog's (loaded with former Intel engineers) vision of what the next 8080 processor should have been. The 8086 was just an 8088 with a 16 bit data bus. I do not know if the binaries were compatible, and I know the mnemonics were extended, but the idea was to be able to use your 8088/8086. If there were such drastic differences, then it would be my guess that Intel missed the mark. But from what I have seen, the 80486 and 8080 appeared very similar at the assembly level (Sorry, I have not done much Intel assembly to have a real feel for it). I only encountered the 8080 code because we used a C compiler that generated 8080 assembly to run on our Z-80's 15 years ago.
But the bottom line is that Intel intended the 8088/8086 to be a 16 bit extension of their 8 bit 8080 which came from the 8008 that owed its start to the 4-bit 4004 processor used in early calculators.
They transitioned to micro-coded architecture with everyone else in the 80's. But the architecture was not improved by it. It just allowed the architecture to be extended one more time. The old single-accumulator design persists even in the P4. The x86 line is a 1970's architecture that has been tweaked into the future. The PPC is a 1990's architecture that is near the beginning of its life. Designed from day 1 as 64 bits (the 32 bit processors are implemented as a 64 bit processor with the extra bits removed). The x86 is an 8 bit architecture that has been extended to now, so the A register was extended to 16 bits by renaming the A as AL and adding an AH to form AX, then when they went to 32 bits, they attached another 16 bits and called it EAX.
You still have to shuffle the registers so that all math involves the AX register. You still have all the segment register nonsense to maintain compatibility with the 80186/80286 attempts at 32 bit operation. With the 80386 they added flat 32 bit memory.
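The way the accumulator's narrower names overlay the wider register can be shown with a few bit masks. On real x86 parts the 32-bit accumulator is named EAX, AX is its low 16 bits, and AH and AL are the high and low bytes of AX. The helper below is a hypothetical illustration in Python, not emulator code.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into the overlapping AX, AH and AL views."""
    eax &= 0xFFFFFFFF            # keep only 32 bits
    ax = eax & 0xFFFF            # low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
```

Writing AL or AH changes part of AX and therefore part of EAX, which is the backward-compatible layering being described.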
Do not get me wrong - Intel has done a wonderful job at keeping the platform going - I have been declaring it dead since the 80286 came out. But they keep tweaking the speeds up. But the Itanium is their concession to the eventual death of the architecture. AMD seems to want to keep it alive by providing for efficient operation of 32 bit code.
Bottom line - The x86 is like an old 60's muscle car. They have tweaked the engine so that it has the speed of a sleek new Porsche ... But the Porsche does it with an engine that is half the size and double the gas mileage.
The x86 is bigger, requires twice the clock speed, and generates 4 times the heat to do the same amount of work as the PPC. They may be about the same speed, but the PPC has a lot more room to grow.
Dude -- an apostrophe does not mean "watch out, here comes an 's' !!" Possessive pronouns, "its, hers, yours," do not have apostrophes. Use apostrophes when you are using a contraction, for instance "it's" means "it is" and the apostrophe stands for the (space and) vowel. And I wanted to read the article because I'm a big-time architecture geek and couldn't because of all these trivial errors --- whaaaaa! :-( Factually, you seem to understand x86 about as well as Hannibal over at Ars understands PPC, so this might make a good companion piece, but again I can't tell because of the frustration at de-skewing the apostrophe catastrophe --- whaaaa! :-(
I wonder why some basic features were not covered, like Out-of-Order execution and Branch Prediction, which seem to be the major items commonly found on current IA processors.
What about Pipelining, any ideas on that one?
Interested, I checked out the website of MorphOS, in a paper about MorphOS "in Detail" it said the below. I think this would have been a big point in the article but it was not mentioned. Is it true and how does it work that it is 10x faster? And, more importantly, is that fast enough to provide a speedy OS?!
Thanks for the good article.
Microkernel Vs Macro Kernel
A common problem encountered in the development of microkernel Operating Systems is speed. This is due to the CPU having to context switch back and forth between the kernel and user processes, context switching is expensive in terms of computing power. The consequence of this has been that many Operating Systems have switched from their original microkernel roots and become closer to a macrokernel by moving functionality into the kernel, i.e. Microsoft moved graphics into the Windows NT kernel, Be moved networking inside, Linux began as a macrokernel so includes everything. This technique provides a speed boost but at the cost of stability and security since different kernel tasks can potentially overwrite one another's memory.
Given the above, one might wonder why Q can be based on a microkernel (strictly speaking it's only "microkernel like") and still be expected to perform well. The answer to this lies in the fact that MorphOS runs on PowerPC and not x86 CPUs. It is a problem with the x86 architecture that causes context switches to be computationally expensive. Context switching on the PowerPC is in the region of 10 times faster, similar in speed to a subroutine call. This means PowerPC Operating Systems can use a microkernel architecture with all its advantages yet without the cost of slow context switches. There are no plans for an x86 version of MorphOS; if this changes there will no doubt be internal changes to accommodate the different processor architecture.
Why the heavy-handed treatment of the author by some here? If you disagree, do so in a reasonable manner.
Eugenia, you shouldn't even have to warn people about their tone and language with an article like this.
Lately it seems like Windows fans are worse than Mac fans in their worship!
A good article for the entry-level (me). I'd like to see a lot more of these informational articles.
A personal note: I come here a lot less these days. Why? The level of discourse on these boards has gone to hell. Most of you sound like second-graders, and that's being really generous of me.
This article isn't all that it looks to be. Check his sources; some are just a step above marketing speak. He has very few hard facts, and mostly opinions. You should really check out Arstechnica and Aceshardware, as others have suggested, if you want the real story. They have much more in-depth analyses with real facts, and even benchmarks to test certain subsystems to make sure they are right. Mr. Blachford attempts to speak with authority, yet he just doesn't seem credible, especially compared to all the better sources out there. An OSnews comment is hardly an authoritative source. This is more like a college freshman's lab report.
That doesn't mean that he is totally wrong. He is quite right that the x86 is highly inefficient, and should probably have died years ago, but it keeps getting more complex and faster.
Regarding ICC, yes it is somewhat biased; however, it really can auto-vectorize code, which means that its benchmarks have much higher believability than Apple's old Photoshop tests with the hand optimized assembly. I'm not saying that he's completely wrong, just that ICC CAN be that fast in real applications, and doesn't require hand coding assembly.
He seems blatantly biased towards the G3-G5 CPUs, but just because he's biased doesn't mean he's wrong. They are highly efficient, low power CPUs. The P4 really does seem more market driven than engineering driven.
Power consumption is a very complex field. There are more than one or two factors which determine why a processor consumes more or less power. Nicholas writes that Intel uses high speed transistors which consume more power. It is true that faster transistors can waste more energy. First, leakage current is higher, and second, you have to overload the base of the transistor with a higher voltage to make it switch faster (oversaturation). But on the other hand, to reach a higher clock rate you can make transistors smaller and reduce the voltage, because a smaller transistor needs fewer electrons injected into its base area to reach saturation.
A great power consumer is the clock tree. Alphas are very power hungry due to their clock tree, which is a mesh with a very high capacitance. So to make the clock tree switch fast, a lot of power has to be pumped in. I don't know exactly which clocking structure Intel uses on its chips.
All I want to say is that the reasons why x86s are power hungry have to be more diverse than just the fact that Intel probably uses high speed transistors.
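The trade-off between voltage, capacitance and clock rate can be sketched with the textbook CMOS switching-power estimate, P = activity x C x V^2 x f. The Python below is only a back-of-the-envelope sketch: the capacitance, voltage and frequency figures are invented for illustration and do not describe any real chip, and leakage current (mentioned above) is deliberately left out.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Textbook CMOS switching-power estimate: activity * C * V^2 * f (watts)."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical numbers, chosen only to show the shape of the trade-off.
baseline = dynamic_power(1e-9, 1.5, 1e9)        # 1 nF switched at 1.5 V, 1 GHz
doubled_clock = dynamic_power(1e-9, 1.5, 2e9)   # same chip, clock doubled
lower_voltage = dynamic_power(1e-9, 1.2, 2e9)   # clock doubled, voltage reduced

print(baseline, doubled_clock, lower_voltage)
```

Because voltage enters squared, a modest voltage reduction claws back much of the cost of a higher clock, and a high-capacitance structure such as a clock mesh scales the whole figure up linearly.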
Greetings from Anton
The article was okay, but still somewhat biased, especially in concluding that RISC processors have always been faster. In my experience over the past 10 years comparing scientific programs across different architectures, especially Suns and HPs, I've consistently seen average desktop x86 machines able to handle more than 2x the throughput of cutting edge RISC boxes more than 20x more expensive.
It all comes down to real competition. The funny part is that everyone always predicts that Linux will fragment. What Linux has been doing is defragmenting the hardware vendors.
Freedom from vendor lock-in to hardware! Down with MS! Down with Apple!
Man I wish DEC would have gotten a clue and tried to push the Alpha into the consumer arena. But in those years MS truly had a ton of lock-in...
As a fledgling computer engineer (computer systems and architecture), I enjoyed reading this article, despite the oddly colloquial writing style. Likewise, even though it's not directly OS-related, such an article is representative of what I'd like to see more often at OSnews instead of the non-informative, highly opinionated, unresearched drivel we unfortunately seem to get so much of. Kudos!
Honestly.... this entire conversation is stupid. Most of you know very little about processor design and the merits of different schools of thought.
For example, stingerman's conjecture that Quartz (Extreme)'s use of the GPU as a secondary processing engine is a great idea is frankly daft. To put it simply, you don't need your windows to warp, spin, etc. No matter which way you look at it, you are creating extra system load, and the idea of having an independent framebuffer per window in memory is insane and has predictable results.
Anyone who looks at PowerPC vs. x86 architectures will come to the conclusion that the RISC vs. CISC argument is a dead one. Effectively both architectures have reached a point where they rely on a RISC core with a translator and interesting caching and processing units to compensate.
Moreover, the heat output and speed of x86 and PPC architectures are much the same in mass-market products. The Pentium 4 is a high-clocked low-IPC architecture, and the Athlon and PPC head in the other direction. At the end of the day, however, the actual performance is inherently similar. Moreover, the heat outputs are substantially similar. Comparing the heatsink on my Athlon XP to that on my friend's G4 indicates similar levels of heat dissipation.
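The "high clock, low IPC" versus "low clock, high IPC" point can be put into numbers with the usual first-order model, throughput = IPC x clock rate. The figures below are hypothetical and are not measurements of any Pentium 4, Athlon or PowerPC part; the sketch only shows how two opposite design directions can land on the same result.

```python
def instructions_per_second(ipc, clock_hz):
    """First-order throughput model: instructions per cycle times cycles per second."""
    return ipc * clock_hz


# Hypothetical designs, not real chips.
high_clock_low_ipc = instructions_per_second(ipc=1.0, clock_hz=3.0e9)
low_clock_high_ipc = instructions_per_second(ipc=2.0, clock_hz=1.5e9)

print(high_clock_low_ipc, low_clock_high_ipc)
```

Real IPC varies heavily with the workload, cache behaviour and branch prediction, so a single number per chip is a simplification.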
At the end of the day, I do appreciate that the Mac users here (and indeed the majority of posters seem to be Mac users) would like to crow about the 970, but as the recent benchmarks and more in-depth analysis has shown, it runs about 90% the actual performance of the current Athlons/P4s. There is nothing wrong with that, but it is not a revolution of any kind.
Ultimately, you will always find that the PPC architecture will perform around 70-95% of current x86 architecture in the consumer market and this will remain the case, simply because processor design is admittedly complex and we've not seen massively revelatory new designs in recent years. Enhancements yield limited percentage improvements in speed, but ultimately, that is that. We haven't seen a consumer (<- I emphasise this word) development emerge in recent years which has come from nowhere and doubled the speed over current systems. It doesn't happen in design, and to be frank, it will only appear due to entire process changes to take advantage of new materials or migration to quantum computing or the like. PPC will never see a significant lead over x86 due simply to economies of scale. More x86 are being sold and more people are working on enhancements. That's life, and it may as well be dealt with.
To be frank, Mac users need to work out that their machines are more than ample for the tasks they put them to, regardless. Even migration to the 970 will yield them a limited benefit over a high-end G4, in fact, perhaps not even massively noticeable in many places. Software optimisations could easily be more worthwhile than the upgrade.
How much do you want him to cover in a short article? He hit on the salient points, and we all understood what he meant. Great article, easy for even the lay person to understand the gist of it and feel intellectually satisfied.
By the way my OS X is automatically spell checking everything I type in this form and actually allows me to context switch to the right spelling. It is so cool!
One more thing, Apple provides its developers a very nice vector library that will automatically downgrade to scalar if a vector unit is not present. It will even optimize between the different generations of vector units (it's called vecLib). But Apple chose not to use vecLib in the Spec test. Vector unit vs vector unit, Apple wins hands down, so get off justifying ICC's auto-vectorization. No one uses ICC anyway unless you're an Intel engineer or an obscure developer. We use VS or Borland in the Wintel world. ICC is mainly used for benchmarking.
Otherwise you wouldn't be able to make such blanket statements such as "the PPC architecture will perform around 70-95% of current x86 architecture" with a straight face, and with some facts behind you, no doubt!!!
So tell me, why did the G5 smoke x86 here:
http://www.luxology.net/company/wwdc03followup.aspx
Now, since you're a developer, you should easily be able to show why this windows-predominant shop wasn't able to correctly gauge the speed of the relevant processors.
I await your informed, technical reply with great anticipation!!!
Your Quartz Extreme observations are wrong. Offloading more and more processing to the GPU is state of the art in computer science circles, and much research is being done on it at the university level. The more work OS X directs to the GPU, the more the CPU is free to do other work, and Apple's implementation is not there to wave windows around (like in the Longhorn demos) but to actually speed up the whole system. And Quartz Extreme does.
You're referring to old PC tricks to speed up screen draws; Apple's Quartz Extreme is implementing university level research for the future of computing. One MIT study showed that using the GPU for indexing a database can increase performance by up to 30 times with current GPUs. Apple promised at the 2002 WWDC that they have only begun to exploit the GPU for the OS, and they are showing even more work in Panther. Why do you think Microsoft is so lauding similar technology for their future Windows 2007?
Facts: CISC vs. RISC doesn't matter. All conventional processors are moving towards the "heat wall".
That a 3GHz P4 consumes over 100W (peak) and a 1GHz G4 only consumes ~10W is not relevant as:
(1) the G4 is a low power embedded processor and the P4 is a high-end workstation processor
(2) the P4 is clocked 3 times higher than the G4, has a higher bandwidth interconnect, and has more cache.
Wow, that's news to me. I always thought it was a desktop processor. You must be thinking of the Xeon.
There seem to be two types of people here, the electrical engineer/computer engineer type, and the GUI/widgets/font/web designer type. The former may think that this author is simply wrong/incomplete. The latter don't know much detail about processor design. As an example, do you understand what pipelining is and why it is good?
Therefore the Mac folks (mostly the second type) think that the engineers are full of it and simply flaming the author (and some are), while many engineers are pointing out that he is just plain wrong in some of the things he says.
Sab: I haven't read the article past the quote I made. I like wasting my time proving people wrong, but I don't like wasting my time for absolutely nothing. The sentence was a sign of things to come, and I didn't even bother to read the rest, hence didn't even know it was slanted towards PPC until reading the comments. As such, my Windows fanboyness (Windows being something I don't use and am not a fan of) is a misunderstanding on your part.
Bob: 4004 and 8086 are not related (except in the lame MS joke that ends with "... 1 bit of competition") either. 4004 has no architectural descendants. 8088 and 8086 are similar (even same on software level) but 8080 is a different beast. Compare:
8080: interrupts are handled via a specialized function call (1 byte long; otherwise identical to CALL)
8086: interrupts are handled via special all-register stack dumping instructions
8080: Flat 16 bit addressing with 8 bit GP registers.
8086: Segmented 20 bit addressing with 16 bit registers. No real general purpose registers except accumulator
8080: No integer arithmetic except ADD and SUB, no loop, floating point, indexed or string handling instructions
8086: You know what 8086 has
etc.
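To make the addressing difference in the list above concrete, here is a minimal Python sketch. The function names are invented for illustration; the arithmetic follows the documented 8086 real-mode rule (16-bit segment shifted left by 4, plus a 16-bit offset, wrapped to 20 bits), while the 8080 simply uses a flat 16-bit address.

```python
def address_8080(addr):
    """8080: flat 16-bit address space, 64 KB total."""
    return addr & 0xFFFF


def physical_address_8086(segment, offset):
    """8086: 20-bit physical address = segment * 16 + offset, wrapping at 1 MB."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


# Two different segment:offset pairs that name the same physical byte.
print(hex(physical_address_8086(0x1234, 0x0010)))
print(hex(physical_address_8086(0x1235, 0x0000)))
```

The aliasing shown in the usage lines (many segment:offset pairs per physical address) is one reason 8080 code could not simply be reassembled without thinking about segments once a program outgrew 64 KB.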
Of course Intel didn't start over as if the 8086 was their first CPU; there are bound to be more similarities between the 8080 and 8086 than, say, between the 6502 and 8086. However you can't even trivially modify 8080 code to compile on 8086. The names 8080 and 8086 imply a stronger link that doesn't exist. See if you can read the following 8080 code (CP/M operating system manual, 1982 edition, page 212-213, lines 186-199)
(cpmspt, noovf, unatrk are labels)
inr m
mov a,m
cpi cpmspt
jc noovf
mvi m,0
lhld unatrk
inx h
shld unatrk
xra a
...
My next computer will be a Mac (G5 based if I can afford it at that time), not because of MacOS X, I couldn't care less about that, but because the CPU is low heat and high power - I'm going completely insane over the immense noise levels that my current 1600+ AthlonXP is producing.
I considered the C3 but it seems a tad underpowered for most of my tasks but not by much, I'm however concerned that the Eden platform is locked down so I can't replace say my gfx card should it be needed at some later stage.
I've always looked at the Macs and admired their clean solutions, and now I simply must own one...
Listen, Megol, I'm not the very best at math, and I certainly don't want to defend Motorola's clock speeds, but the fastest G4 right now is 1.42GHz.
Now to the math:
1.42GHz x 3 = 4.26GHz
Is there a 4.26 P4?
Is this "new math"?
So which type are these people?
http://www.luxology.net/company/wwdc03followup.aspx
The best G4 Motorola produces is 1GHz. Apple overclocks them. You can *probably* overclock a P4 to 4.26GHz too, if you can cherry pick which P4 to overclock. With exotic cooling methods much higher frequencies have been achieved.
Alternatively, you can disregard out-of-spec frequencies. Then, yes, there exist 3GHz P4s (which is 3 x the G4's 1GHz).
I did enjoy the article. But, the commentary has made my day.
Do you have any evidence for this, goo?
Apple has said that they don't, so maybe you have some reference to back up your claim?
And please show me where a non-overclocked P4 at 4.2GHz is.
Facts, not "rumor", please.
The PPC platform is so far different that this is a useless discussion. Some think of the PPC as only a Mac but IBM has been selling top of the line professional mission critical machines based on the PPC platform for many years. Even industrial machines are running PPC every day. This is a no contest. PPC can be one of the best computers for any task if so designed to do so. The x86 can never be designed for a mission critical task. It started as a toy and should remain that way. It has very basic design flaws.
Fact: IBM doesn't produce plain G4s.
Fact: Apple apparently sells >1GHz G4s
Fact: Except Motorola and IBM, nobody produces G4 CPUs.
Baseless speculation without evidence or reference, fanboy yada yada yada: Apple overclocks 1GHz Motorola G4s.
Feel free to draw a different conclusion from these facts.
Ok, facts are facts, right!
:-)
Hey, I think Moto has been sucking on the gas-pipe regardless of the "facts", and I am no fanboy of either platform.
The G5s, however, are a whole new ballgame, and the competition is good for everyone.
However you can't even trivially modify 8080 code to compile on 8086.
That's not what Intel said, perhaps I should have quoted the full release...
"The Intel 8086, a new microcomputer, extends the midrange 8080 family into the 16-bit arena. The chip has attributes of both 8- and 16-bit processors. By executing the full set of 8080A/8085 8-bit instructions plus a powerful new set of 16-bit instructions, it enables a system designer familiar with existing 8080 devices to boost performance by a factor of as much as 10 while using essentially the same 8080 software package and development tools."
-- Intel Corporation, February, 1979
The way I look at it, this is really an argument about which instruction set you like to program in: ARM, MIPS, Alpha, SPARC, PPC, or x86. All of these CPU architectures are capable of getting to the Nirvana CPU speed all CPU geeks seek. It is just a matter of a lot of money, 18 to 48 months of time, and getting good, experienced CPU micro-architects and Quantum Mechanics (Transistor Tweakers).
What we need to look at is that operating systems have become stable, if not boring: Windows 2000/XP and Mac OS X are based on research from about the late 80's. Also, the memory and code size limitations of the past are a distant memory in the desktop and server space, which drove feature-rich, similar OS cores. Plus, if you remember your history, Tru64 UNIX (Mach), Mac OS X (Mach), and Windows NT (Cutler design, ex-DEC) are all based on modified microkernels.
What this has done is move all CPU architectures off their polar positions, and they began rationalizing their ISAs to better meet the needs of software, which has evolved to a pretty stable foundation of common C/C++ based, multithreaded, multi-user operating systems. The result is almost a homogenization of features (core CPU instructions, floating point instructions (SP, DP, paired single), debugging instructions, DSP-like instructions (multiply-accumulate, etc.), and vector instructions), driven by the need to better support the market segments and applications they were moving into.
What this did is put more pressure on CPU micro-architects to innovate, since there was going to be less innovation coming from ISA extensions. So they had two choices: a fast-clock, narrow, super-pipelined architecture, or a wide, slower-clock, high-IPC micro-architecture. They had to look at innovative ways to deal with memory latencies (caches, larger register sets, instruction buffers, etc.), and also understand how best to deal with code control flow issues (branch prediction). Here is where the visionaries emerged, and Alpha was one of the greatest CPU experimenting environments of the last 10 years; they tried all the variations (in-order, out-of-order, dual issue, multi-issue, multithreading, on-chip memory controllers and more). The big issue today is that all of these innovations drive up gate count and chip complexity, which reduces our ability to make bigger innovations beyond waiting for the next process geometry.
When you compare and contrast the PIV and the 970, they both do something similar. If you want to crank up the clock on the CPU, the best way to do this is to go with a super-pipelined micro-architecture. And to do this at these new speeds you need to do something which DEC invented on the MicroVAX processor, and that is to crack the PPC or x86 instruction set into simpler instructions (micro-ops). What is interesting is that Intel has been doing this since the Pentium Pro. I would argue these micro-ops made Intel more RISC than the current classic ISA-level RISC processors. So now that IBM has made this leap in processor design, it is back to a race over who has the best process technology and the most innovative transistors, with minor micro-architecture tweaks. Also, with the announcement of the Power5/980 architecture, IBM and Intel are at feature parity again around SMT/HT. Here is some of the best research on the subject: http://www.cs.washington.edu/research/smt/
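For readers who have not seen instruction "cracking" before, here is a toy Python sketch of the idea: a memory-operand instruction is split into simple load, compute and store steps. Everything in it (the operand syntax, the micro-op names, the temporary register names) is invented for illustration; real decoders in the Pentium Pro, P4 or 970 are far more involved and their internal micro-op formats are not what is shown here.

```python
def is_memory(operand):
    """Treat anything written as [address] as a memory operand (toy syntax)."""
    return operand.startswith("[") and operand.endswith("]")


def crack(op, dst, src):
    """Split one two-operand instruction into simple register-only micro-ops."""
    uops = []
    if is_memory(src):
        # Memory source: load it into a temporary first.
        uops.append(("load", "tmp_src", src))
        src = "tmp_src"
    if is_memory(dst):
        # Memory destination: load, operate on a temporary, then store back.
        uops.append(("load", "tmp_dst", dst))
        uops.append((op, "tmp_dst", "tmp_dst", src))
        uops.append(("store", dst, "tmp_dst"))
    else:
        # Register destination: already a simple three-operand operation.
        uops.append((op, dst, dst, src))
    return uops


for uop in crack("add", "[0x1000]", "eax"):
    print(uop)
```

The register-to-register case passes through as a single micro-op, which is the sense in which the internal core ends up executing something RISC-like regardless of the ISA on the outside.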
On the power issue of the x86 core, look no further than the Pentium M, which is one incredible x86 CPU that matches the PPC G4's 10 watts at 1 GHz, with the bonus of an amazing branch prediction unit and 1 megabyte of onboard L2 cache. So this point is moot as well, since this is a micro-architecture and Quantum Mechanic (Transistor Tweaker) issue.
If you want to see innovation in CPU architecture, look at the following projects, since they are truly driving innovation again in CPU design, compiler research, and operating system and application design. The two MIT projects are based on a MIPS-like instruction set.
http://www.cs.utexas.edu/users/cart/trips/
http://www.cag.lcs.mit.edu/scale/overview.html
http://www.cag.lcs.mit.edu/raw/
(don't worry about it, "goo" doesn't know what he's talking about, but you have to give it to him, he talks a good game!!)
And hey, Nicholas, that is one of the better, factual, non-troll articles that have been here in a while.
You have to expect the heat (sorry, bad pun) from the x86's when you point out the facts to them.
:-)




