This article started life when I was asked to write a comparison of x86 and PowerPC CPUs for work. We produce PowerPC-based systems and are often asked why we use PowerPC CPUs instead of x86, so a comparison is rather useful. While I have had an interest in CPUs for quite some time, I had never explored this issue in any detail, so writing the document proved an interesting exercise. I thought my conclusions would be of interest to OSNews readers, so I've done more research and written this new, rather more detailed article. This article is concerned with the technical differences between the families, not the market differences.
History and Architectural Differences
The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
It was a 16-bit CISC (Complex Instruction Set Computing) processor.
In the following year the 8088 was introduced, and it was used in the original IBM PC. It is this computer which led to today's PCs, which are still compatible with the 8086 instruction set from 1978.
The PowerPC family began life with the PowerPC 601 in 1993, the result of a collaboration started in 1991 between Apple, IBM and Motorola.
The family was designed to be a low cost RISC (Reduced Instruction Set Computing) CPU. It was based on the existing IBM POWER CPU used in the RS/6000 workstations, so it would have an existing software base.
RISC Vs CISC
When microprocessors such as the x86 were first developed during the 1970s, memories had very low capacity and were highly expensive. Consequently keeping the size of software down was important, and the instruction sets in CPUs at the time reflected this.
The x86 instruction set is highly complex, with many instructions and addressing modes. It also shows its age in the small number and complex nature of the registers (internal stores) available to the programmer. The x86 has only 8 registers, and some of these are special purpose; the PowerPC has 32 general purpose registers.
RISC was originally developed at IBM by John Cocke in 1974 [1]. Commercial RISC microprocessors appeared in the mid 80s, first in workstations, later moving to the desktop in the Acorn Archimedes.
These use a simplified instruction set which allows the CPUs to be simpler and thus faster. They also included a number of architectural improvements such as pipelining, superscalar execution and out-of-order execution, which enabled them to perform significantly better than any CISC CPU.
CISC CPUs such as the 68040 and the Intel 80486 onwards picked up and used many of these architectural improvements.
In the mid 1990s a company called NexGen produced an x86 CPU which used a translator to convert x86 instructions to run within a RISC core. Pretty much all x86 CPUs have used this technique since. Even some RISC CPUs such as the POWER4 / PowerPC 970 use this technique for some instructions.
The high-level internal architecture of the vast majority of modern desktop CPUs is now strikingly similar, be they RISC or CISC.
Current State Of x86 And PowerPC CPUs
The current desktop PowerPC and x86 CPUs are the following:
x86
AMD Athlon XP
Intel Pentium 4
PowerPC
IBM 750xx (G3)
Motorola 74xx (G4)
IBM 970 (G5)
The current G4 CPUs run at significantly lower speeds than the x86 CPUs, which are now above 2GHz (the P4 is above 3GHz). The recently announced PowerPC 970 currently runs at up to 2GHz and delivers performance in line with the x86 CPUs.
CPUs break all operations down into stages, and these are performed in a pipeline. The stages can be big or small: the more an individual stage does, the fewer stages are needed to complete an operation, while simpler stages mean more of them are needed but each one completes more quickly. The clock speed of the CPU is limited by the time the slowest individual stage needs to complete, so a CPU with a greater number of simpler stages can operate at a higher frequency.
Both the Athlon and Pentium 4 use longer pipelines (long and thin) with simple stages, whereas the PowerPC G4s use shorter pipelines with more complex stages (short and fat). This is the essence of the so-called "megahertz myth": a CPU with a very high clock speed may not be any faster than a CPU with a lower clock speed. The Pentium 4 is now at 3.2GHz, yet a 1.25GHz Alpha can easily outgun it on floating point operations.
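To put rough numbers on this, here is a minimal sketch in C. The figures (10ns of logic per instruction, 7 versus 20 stages) are hypothetical, chosen only to show the shape of the trade-off:

#include <stdio.h>

int main(void)
{
    double work_ns = 10.0;  /* hypothetical total logic delay per instruction */
    int stages_short = 7;   /* "short and fat" pipeline, G4-style */
    int stages_long = 20;   /* "long and thin" pipeline, P4-style */

    /* Each stage must fit in one clock period, so the achievable
       frequency is stages / total delay (and 1/ns = GHz). */
    printf("short pipeline: %.2f GHz\n", stages_short / work_ns);
    printf("long pipeline:  %.2f GHz\n", stages_long / work_ns);
    return 0;
}

The long pipeline clocks almost three times higher, but each cycle does less work and a mispredicted branch costs far more cycles to refill, so the higher frequency does not translate directly into performance.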
The longer pipelines allow the x86 CPUs to attain these very high frequencies, whereas the PowerPC G4s are somewhat restricted because they use a smaller number of pipeline stages, and this limits the clock frequency.
The amount of voltage the CPU can use restricts the power available, and this affects the speed the clock can run at; x86 CPUs use relatively high voltages to allow higher clock rates, and to boost clock speeds further, power-hungry high-speed transistors are used. A long, thin pipeline is very fast but also very inefficient power-wise. All these things add up, so a 3GHz CPU may be fast but is also very power hungry, with maximum power consumption rates now approaching or even exceeding 100 Watts. Intel have in fact taken to using a much lower frequency part for laptop computers than the top-end Pentium 4. Yet, despite the fact it is only 1.6GHz, the Pentium M performs just as well as the 2.2GHz Pentium 4.
The Law Of Diminishing Returns (Aka Amdahl’s Law)
The law of diminishing returns is not exactly a new phenomenon; it was originally noticed in parallel computers by IBM engineer Gene Amdahl, one of the creators of the IBM System/360 architecture. The original describes the problem in parallel computing terms, however this simplified version pretty much describes the problem in terms of any modern computer system:
“Each component of a computer system contributes delay to the system
If you make a single component of the system infinitely fast…
…system throughput will still exhibit the combined delays of the other components.” [3]
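In formula form, if a fraction p of a task is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). Here is a minimal C sketch using made-up numbers: suppose the CPU is the bottleneck for only 60% of a task's run time (the rest spent waiting on memory and I/O) and you double its speed:

#include <stdio.h>

/* Amdahl's Law: overall speedup when a fraction p of the work
   is accelerated by a factor s. */
static double amdahl(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}

int main(void)
{
    printf("overall speedup: %.2fx\n", amdahl(0.6, 2.0)); /* 1.43x, not 2x */
    return 0;
}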
As clock speeds go up, the actual performance of the CPU does not scale exactly with the clock. A 2GHz CPU is unlikely to be twice the speed of a 1GHz CPU; indeed on everyday tasks people seem to have some difficulty telling the difference between these speeds.
The reason for the lack of scaling is the fact that memory performance has not scaled with the CPU, so the CPU sits doing nothing for much of its time (HP estimate this at 70% for server CPUs). Additionally the latency of memory has barely improved at all, so any program which requires the CPU to access memory frequently will be affected badly by memory latency, and the CPU will not get anywhere near its true potential. The CPU's memory cache can alleviate this sort of problem to a degree, but its effectiveness depends very much on the type of cache and the software algorithm used.
Many of the techniques used within x86 CPUs may only boost performance by a small amount, but they are used because of the need for AMD and Intel to outdo one another. As the clock speed climbs ever higher the scaling problem worsens, meaning that the additional effort has less and less effect on overall performance. Recent SPEC marks for two Dell workstations show that a greater than 50% increase in CPU speed plus the addition of hyper-threading results in only a 26% increase in SPEC marks [2]. Yet when the Itanium 2 CPU got an 11% clock speed boost and double the cache, the SPEC mark increased by around 50%.
Of course there are other factors which affect the performance of CPUs, such as the cache size and design, the memory interface, the compiler and its settings, the language a program is written in and the programmer who wrote it. Changing the language can in fact be shown to have a much greater effect than changing the CPU [4]. Changing the programmer can also have a very large effect [5].
Performance Differences Between The PowerPC And x86
Since AMD began competing effectively with Intel in the late 1990s, both Intel and AMD have been aggressively developing new, faster x86 CPUs. This has led them to become competitive with, and sometimes even exceed, the performance of RISC CPUs (if you believe the benchmarks; see below). However RISC vendors are now becoming aware of this threat and are responding by making faster CPUs. Ironically, if all CPUs were made at the same process geometry, the Alpha 21364 would be the fastest CPU going, yet it uses a 7 year old core design.
PowerPCs, although initially designed as desktop processors, are primarily used in embedded applications where power usage concerns outweigh raw processing power. Additionally, current G4 CPUs use a relatively slow single data rate bus which cannot match the faster double or quad data rate buses found on x86 CPUs.
The current (non-G5) PowerPC CPUs do not match up to the level of the top x86 CPUs; however, due to the effects of the law of diminishing returns, they are not massively behind in terms of CPU power. The x86 CPUs are faster, but not by as much as you might expect [6]. (Again, see the section on benchmarks below.)
Vector Processing Differences
Vector processing is also known as SIMD (Single Instruction Multiple Data), and it is used to accelerate certain types of processing. Where it applies, it speeds up operations many times over compared with the normal processing core.
Both x86 and PowerPC have added extensions to support vector instructions. x86 started with MMX and MMX2, then SSE and SSE2. These have eight 128-bit registers, but operations cannot generally be executed at the same time as floating point instructions. However, the x86 floating point unit is notoriously weak and SSE is now used for floating point operations. Intel has also invested in compiler technology which automatically uses the SSE2 unit even if the programmer hasn't specified it, boosting performance.
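As a minimal sketch of what such a vectorising compiler does (the arrays and values are purely illustrative): a 128-bit register holds four 32-bit floats, so the four scalar additions in the loop below can be collapsed into a single SIMD instruction:

#include <stdio.h>

int main(void)
{
    float a[4] = {1, 2, 3, 4};
    float b[4] = {5, 6, 7, 8};
    float c[4];
    int i;

    /* Scalar form: four separate add instructions. An auto-vectorising
       compiler can replace the whole loop with one 4-wide vector add. */
    for (i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    printf("%g %g %g %g\n", c[0], c[1], c[2], c[3]);
    return 0;
}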
The PowerPC gained vector processing in one go when Apple, IBM and Motorola revised the PowerPC instruction set and added the Altivec unit, which has 32 128-bit registers. This was added in the G4 CPUs but not the G3s, though these are now expected to get Altivec in a later revision. Altivec is also present in the 970.
Currently the bus interface of the G4 slows Altivec down, as it is very demanding of memory. However, Altivec has more registers than SSE, so it can operate without going to memory too often, which boosts performance over SSE. The Altivec unit can also operate independently from, and simultaneously with, the floating point unit.
Power Consumption Differences
One very big difference between PowerPC and x86 is in the area of power consumption. Because PowerPCs are designed for and used in the embedded sector, their power consumption is deliberately low. The x86 CPUs on the other hand have very high power consumption due to the old, inefficient architecture, as well as all the techniques used to raise the performance and clock speed. The difference in power consumption is greater than 10X for a 1GHz G4 (7447) compared with a 3GHz Pentium 4. The maximum rating for a G4 is less than 10 Watts, whereas Intel do not appear to give out figures for maximum power consumption, instead quoting a "thermal design rating" which is around 30 Watts lower than the maximum figure. The figure given for the design rating of a 3GHz P4 is 81.9 Watts, so the maximum is closer to, and may even exceed, 100 Watts.
A single 3GHz Pentium 4 CPU alone consumes more than 4 times the power of a Pegasos PowerPC motherboard, including its 1GHz G4.
Low Power x86s
There are a number of low power x86 designs from Intel, AMD, VIA and Transmeta.
It seems however that cutting power consumption in the x86 also means cutting performance, sometimes drastically. Intel still sell low power Pentium III CPUs right down at 650MHz. The Pentium 4-M can reduce its power consumption, but only by scaling down its clock frequency.
Transmeta use a completely different architecture and "code morphing" software to translate x86 instructions, but their CPUs have never exactly broken speed records.
VIA have managed to get power usage down even at 1GHz, but they too use a different architecture. The VIA C3 series is a very simple CPU based on an architecture which forgoes advanced features like instruction re-ordering and multiple execution units; the nearest equivalent is the 486, launched way back in 1989. This simplified approach produces something of a compromise however: at 800MHz it still requires a fan, and even at 1GHz the performance is abysmal: a 1.3GHz Celeron completely destroys it in multiple benchmarks [7].
Why The Difference?
PowerPCs seem to have no difficulty reaching 1GHz without compromising their performance or generating much heat – how?
CISC and RISC CPUs may use the same techniques and look the same at a high level but at a lower level things are very different. RISC CPUs are a great deal more efficient.
No need to convert CISC -> RISC ISA
x86 CPUs are still compatible with the large, complex x86 instruction set which started with the 8080 and has been growing ever since. In a modern x86 CPU this has to be decoded into simpler instructions which can be executed faster. The POWER4 and PPC 970 also do this with some instructions, but it is a relatively simple process compared with handling the multi-length instructions or the complex addressing modes found in the x86 instruction set.
Decoding the x86 instruction set is not going to be a simple operation, especially if you want to do it fast.
How, for instance, does a CPU know where the next instruction is if instructions are different lengths? The answer could be found by decoding the first instruction and getting its length, but this takes time and imposes a performance bottleneck. It could of course be done in parallel: guess where the instructions might be and decode all the possibilities, then once the first is decoded, pick the right one and drop the incorrect ones. This of course takes up silicon and consumes power. RISC CPUs on the other hand do not have multi-length instructions, so instruction decoding is vastly simpler.
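A toy illustration of that serial dependency, using a made-up two-length encoding rather than real x86:

#include <stdio.h>

/* Hypothetical toy ISA: opcodes with the high bit set are 3 bytes long,
   all others 1 byte. Real x86 instructions run from 1 to 15 bytes. */
static int insn_length(unsigned char opcode)
{
    return (opcode & 0x80) ? 3 : 1;
}

int main(void)
{
    unsigned char code[] = {0x10, 0x90, 0x01, 0x02, 0x20, 0x85, 0x03, 0x04};
    unsigned pc = 0;

    while (pc < sizeof code) {
        int len = insn_length(code[pc]);
        printf("instruction at byte %u is %d byte(s) long\n", pc, len);
        pc += len;  /* the next fetch address is unknown until
                       this decode completes */
    }
    return 0;
}

With a fixed-length ISA the next instruction is always 4 bytes along, so a RISC CPU can fetch and decode several instructions per cycle with no guesswork.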
Related to the above are addressing modes: an x86 has to figure out which addressing mode is used before it can fully decode an instruction. A similar parallel process to the above could be used. RISC CPUs again have a much simpler job, as they usually have only one or two addressing modes at most.
To RISC Or Not To RISC
Once you have the instructions in simpler “RISC like” format they should run just as fast – or should they?
Remember that the x86 has only 8 registers; this makes life complicated for the execution core in an x86 CPU. x86 execution cores use the same techniques as RISC CPUs, but the limited number of registers proves problematic. Consider a loop which uses 10 variables in an iteration, like the one sketched below. An x86 will need hardware assist just to perform a single iteration.
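Here is such a loop sketched in C; the function and the four-way unrolling are purely illustrative. It keeps more than ten values live at once, so with 32 registers everything stays in registers, while with 8 the compiler must spill values to memory or lean on the renaming hardware described below:

#include <stdio.h>

/* Illustrative only: a dot product unrolled four ways. Live values
   include four accumulators, i, n, two pointers and temporaries:
   comfortably over ten, fine for 32 registers, painful for 8. */
static float dot(const float *x, const float *y, int n)
{
    float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }
    return s0 + s1 + s2 + s3;
}

int main(void)
{
    float x[8] = {1, 1, 1, 1, 1, 1, 1, 1};
    float y[8] = {2, 2, 2, 2, 2, 2, 2, 2};
    printf("%g\n", dot(x, y, 8));  /* prints 16 */
    return 0;
}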
Now consider a RISC CPU, which generally has on the order of 32 registers. It can work across multiple iterations simultaneously, and the compiler can arrange this without any hardware assist.
The hardware assist in question is out-of-order execution, and the tools of this trade are called rename registers. Essentially the hardware fools the executing program into thinking there are more registers than there really are; in the example above this allows an iteration to be completed without the CPU needing to go to the cache for data, since the data needed will be in a rename register.
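A small sketch of the kind of false dependency renaming removes (again illustrative C): short of registers, the compiler reuses the same one for two unrelated calculations, and the rename hardware gives each reuse its own physical register so both can proceed in parallel.

#include <stdio.h>

int x, y;

static void f(int a, int b, int c, int d)
{
    int t = a + b;  /* suppose t is assigned to register eax */
    x = t;
    t = c + d;      /* eax reused: a false dependency on the lines above;
                       renaming maps this t to a different physical
                       register, so both sums can execute at once */
    y = t;
}

int main(void)
{
    f(1, 2, 3, 4);
    printf("%d %d\n", x, y);  /* prints 3 7 */
    return 0;
}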
OOO execution is mainly used to increase the performance of a CPU by executing multiple instructions simultaneously; the number of instructions per cycle increases and the CPU gets its work done faster.
However, when the x86 includes this kind of hardware, the 8 registers become a problem. In order to perform OOO execution, program flow has to be tracked ahead to find instructions which can be executed out of their normal order without messing up the logic of the program. In x86 this means the 8 registers may need to be renamed many times, and this requires complex tracking logic.
RISC wins out here again because of its larger number of registers: less renaming is necessary, so less hardware is required to track register usage. The Pentium 4 has 128 rename registers; the 970 has less than half that at 48, and the G4 has just 16.
Because of the sheer complexity of the x86 ISA and its limited number of architectural registers, a RISC processor requires less hardware to do the same work.
Despite not using the highly aggressive methodologies of the x86 CPUs, IBM have managed to match and even exceed the computing power of x86 CPUs with the PowerPC 970, at lower power consumption. They were able to do this because of the efficiency of RISC and the inefficiency of x86 CPUs. IBM have already managed to get this processor to run at 2.5GHz, and it should perform better than any x86 (with the possible exception of the Opteron).
The idea that x86 CPUs have RISC-like cores is a myth. They use the same techniques, but the cores of x86 CPUs require a great deal more hardware to deal with the complexities of the original instruction set and architecture.
PowerPC And x86 Get More Bits
Both families are in the process of transitioning to 64 bit.
AMD
Opteron
Athlon 64 (due September)
IBM
PowerPC 970
The AMD Opteron adds 64-bit addressing and 64-bit registers to the x86 line. There is already some support for this CPU in Linux and the BSDs, and a 64-bit version of Windows is also due.
The Opteron is designed as a server CPU and as such both the CPU and motherboards cost more than for normal desktop x86 CPUs. The Athlon 64 can be expected to arrive at rather lower prices.
Despite performing better than the best existing 32-bit Athlon, the Opteron has a slower clock speed (1.8GHz vs 2.2GHz).
AMD's x86-64 instruction set extensions give the architecture additional registers and an additional addressing mode, but at the same time remove some of the older modes and instructions. This should simplify things a bit and increase performance, but compatibility with the x86 instruction set will still hold back its potential performance.
The PowerPC 970, as predicted on OSNews [8], is a 64-bit PowerPC CPU based on the IBM POWER4 design but with a smaller cache and the addition of the Altivec unit found in the G4. It supports 32-bit software with little or no change, although some changes to the original 64-bit PowerPC architecture have been made in the form of a "64-bit bridge" to ease the porting of 32-bit operating systems [9]. This bridge will be removed in subsequent processors.
The hardware architecture of the 970 is similar to that of any advanced CPU; however, it does not have the aggressive hardware design of the x86 chips. IBM use automated design tools to do layout, whereas Intel does it by hand to boost performance.
The 970 has a long pipeline, but it is not run at a very high clock rate; unusually, the CPU does more per clock than other long-pipeline designs, so the 970 is expected to perform very well.
In addition to the new architecture, the 970 includes dual floating point units and a very high bandwidth bus which matches or exceeds anything in the x86 world. This will boost performance generally, and especially the Altivec unit's capabilities.
The IBM PPC 970 closes the performance gap between the PowerPC and x86 CPUs without consuming x86 levels of power (estimated at 20 Watts at 1.4GHz, 40 Watts at 1.8GHz). It has been announced in Apple Power Macintosh computers for August 2003, and with the pent-up demand I think we can expect Mac sales to increase significantly.
Benchmarks
There has been a great deal of controversy over the benchmarks that Apple has published when it announced the new PPC 970 based G5 [10].
The figures Apple gave for the Dell PC were a great deal lower than the figures presented on the SPEC website. Many have criticised Apple for this, but all they did was use a different compiler (GCC), and this gave the lower x86 results. GCC may not be the best x86 compiler, and it contains a scheduler for neither the P4 nor the PPC 970; however, it is considerably more mature on x86 than on PowerPC. In fact only very recently has PowerPC code generation begun to approach the quality of x86 code generation; GCC 3.2, for instance, produced incorrect code for some PowerPC applications.
However, this does lead to the question of why the SPEC scores produced by GCC are so different from those produced by Intel's ICC compiler, which Intel uses when submitting SPEC results. Is ICC really that much better than GCC? In a recent test [11] of x86 compilers most results turned out remarkably similar, but when SSE2 is activated ICC completely floors the competition. ICC is picking up the code and auto-vectorising it for the x86 SSE2 unit; the other compilers do not have this feature so don't get its benefit. I think it's fairly safe to assume this is at least in part the reason for the difference between the SPEC scores produced by Apple and Intel.
This was a set of artificial benchmarks, but does it translate into real life speed improvements? According to this comment [12] by an ICC user, the auto-vectorising for the most part doesn't make any difference, as most code cannot be auto-vectorised.
In the description of the SPEC CPU2000 benchmarks the following is stated:
“These benchmarks measure the performance of the processor, memory and compiler on the tested system.”
SPEC marks are generally used to compare the performance of CPUs; however, the above states explicitly that this is not all they measure: SPEC marks also test the compiler. There are no doubt real life areas where the auto-vectorisation works, but if these are only a small minority of applications, benchmarks that are affected by it become rather meaningless, since they do not reliably show how most applications are likely to perform.
Auto-vectorisation also works the other way: the PowerPC's Altivec unit is very powerful, and benchmarks which are vectorised for it can show a G4 outperforming a P4 by up to 3.5 times.
By using GCC, Apple removed the compiler from the factors affecting system speed and gave a more direct CPU-to-CPU comparison. This is a better comparison if you just want to compare CPUs, and it prevents the CPU vendor from getting inflated results due to the compiler.
x86 CPUs may use all the tricks in the book to improve performance, but for the reasons I explained above they remain inefficient and are not as fast as you may think, or as benchmarks appear to indicate. I'm not the only one to hold such an opinion:
“Intel’s chips perform disproportionately well on SPEC’s tests because Intel has optimised its compiler for such tests”[13]* – Peter Glaskowsky, editor-in-chief of Microprocessor Report.
I note that the term "chips" is used; I wonder, does the same apply to the Itanium? This architecture is also highly sensitive to the compiler, and this author has read (on more than one occasion) from Itanium users that its performance is not what the benchmarks suggest.
If SPEC marks are to be a useful measure of CPU performance they should use the same compiler. An open source compiler is ideal for this, as any optimisations added for one CPU will be in the source code and can thus be added for the other CPUs as well, keeping things rather more balanced.
People accuse Apple of fudging their benchmarks, but everybody in the industry does it, and SPEC marks are certainly not immune; it's called marketing.
Personally I liked the following comment from Slashdot which pretty much sums the situation up:
“The only benchmarks that matter is my impression of the system while using the apps I use. Everything else is opinion.” – FooGoo
The Future
x86 has the advantage of a massive marketplace and the domination of Microsoft. There is plenty of low cost hardware and tons of software to run on it; the same cannot be said for any other CPU architecture.
RISC may be technically better, but it is held in a niche by market forces which prefer the lower cost and plentiful software of x86. Market forces do not work on technical grounds and rarely choose the best solution.
Could that be about to change? There are changes afoot and these could have an unpredictable effect on the market:
1) Corporate adoption of Linux
Microsoft is now facing competition from Linux, and unlike Windows, Linux is not locked into x86. Linux runs across many different architectures; if you need more power or lower heat and noise, you can run it on systems which have those features. If you are adopting Linux you are no longer locked into x86.
2) Market saturation
The computer age as we know it is at an end. The massive growth of the computer market is ending as the market reaches saturation. Companies wishing to sell more computers will need to find reasons for people to upgrade, and unfortunately those reasons are beginning to run out.
3) No more need for speed
Computers are now so fast it’s getting difficult to tell the difference between CPUs even if their clock speeds are a GHz apart. What’s the point of upgrading your computer if you’re not going to notice any difference?
How many people really need a computer that's even over 1GHz? If your computer feels slow at that speed, it's because the OS has not been optimised for responsiveness; it's not the fault of the CPU. Just ask anyone using BeOS or MorphOS.
There have of course always been people who can use as much power as they can get their hands on but their numbers are small and getting smaller. Notably Apple’s software division has invested in exactly these sorts of applications.
4) Heat problems
What is going to be a hurdle for x86 systems is heat. x86 CPUs already get hot and require considerable cooling, and this is getting worse; eventually it will hit a wall. A report by the publishers of Microprocessor Report indicated that Intel is expected to start hitting the heat wall in 2004.
x86 CPUs generate a great deal of heat because they are pushed to give maximum performance, and because of their inefficient instruction set this takes a lot of energy.
In order to compete with one another, AMD and Intel will need to keep upping their clock rates and running their chips at the limit; their chips are going to get hotter and hotter.
You may not think heat is important, but once you put a number of computers together heat becomes a real problem, as does the cost of electricity. The x86's cost advantage becomes irrelevant when the cooling system costs many times as much as the computers.
RISC CPUs like the 970 are at a distinct advantage here, as they give competitive performance at significantly lower power consumption; they don't need to be pushed to their limit to perform. Once they get a die shrink to the next process generation, power consumption at the existing performance level will go down. This strategy looks set to continue in the next generation POWER5.
The POWER5 (of which there will be a "consumer version") will include Simultaneous Multi-Threading, which effectively doubles the performance of the processor, unlike Intel's Hyper-Threading, which only boosted performance by around 20% (although this looks set to improve). IBM are also adding hardware acceleration of common functions, such as communications and virtual memory handling, onto the CPU. Despite these additions the number of transistors is not expected to grow by any significant measure, so both manufacturing cost and heat dissipation will go down.
Conclusion
x86 is not what it's sold as. x86 benchmarks very well, but benchmarks can be and are twisted to the advantage of the manufacturer. RISC still has an advantage, as the RISC cores present in x86 CPUs are only a marketing myth. An instruction converter cannot remove the inherent complexity of the x86 instruction set; consequently x86 is large and inefficient, and is going to remain so. x86 is still outgunned at the high end and, perhaps surprisingly, also at the low end: you can't make an x86 fast and run cool. A lot of marketing goes into x86, and the market - technical people included - just laps it up.
x86 has the desktop market, and there are many large companies who depend on it. Indeed it has been speculated that, inefficient or not, the market momentum of x86 is such that even Intel, its creator, may not be able to drag us away from it [14]. The volume of x86 production makes the chips very low cost, and the amount of software available goes without saying. Microsoft and Intel's domination of the PC world has meant no RISC CPU has ever had success in this market aside from the PowerPCs in Apple systems, and their market share is hardly huge.
In the high end markets, RISC CPUs from HP, SGI, IBM and Sun still dominate. x86 has never been able to reach these performance levels, even though x86 CPUs are sometimes a process generation or two ahead.
RISC vendors will always be able to make faster, smaller CPUs. Intel, however, can make many more CPUs for less.
x86 CPUs have been getting faster and faster for the last few years, threatening even the server vendors. HP and SGI may have given up, but IBM has the POWER5 and POWER6 on the way, and Sun is set to launch CPUs which handle up to 32 threads. It looks like the server vendors are fighting back.
Things are changing: Linux and other operating systems are becoming increasingly popular, and these are not locked into x86 or any other platform. x86 is running into problems, and the PowerPC looks set to become a real, valid alternative to x86 CPUs, both matching and exceeding their performance without the increasingly important power consumption and heat issues.
Notes:
Both Amdahl's Law (of diminishing returns) and Moore's Law date from around the same time, but notably we hear a great deal more about Moore's Law. Moore's Law describes how things are getting better; Amdahl's Law says why they are not. There is a difference however: Moore's Law was an observation, Amdahl's Law is a law.
References:
[1] John Cocke, inventor of RISC (obituary)
http://www.guardian.co.uk/Print/0,3858,4469781,00.html
[2] SPEC benchmark results
http://www.spec.org/cpu2000/results/
[3] Amdahl’s Law Simplified – Richard Wiggins
http://www.ucalgary.ca/library/access97/wiggins/tsld027.htm
[4] Speed differences in different languages
http://www.kuro5hin.org/story/2002/6/25/122237/078
[5] Coding competition shows humans are better than compilers
http://www.realworldtech.com/page.cfm?ArticleID=RWT041603000942
[6] Combined CPU Benchmarks
http://www.cpuscorecard.com/
[7] C3 vs Celeron benchmarks
http://www.digit-life.com/articles2/roundupmobo/via-c3-nehemiah.html
[8] Speculation on the PowerPC G5
http://www.osnews.com/story.php?news_id=1357
[9] Details of the 64bit bridge can be found in the Software Reference Manual.
http://www-3.ibm.com/chips/techlib/techlib.nsf/products/PowerPC_970_Microprocessor
[10] Apple's G5 benchmarks
http://www.apple.com/powermac/performance/
[11] ICC's optimisations can greatly affect performance
http://www.aceshardware.com/read_news.jsp?id=75000387
[12] But [11] does not appear to continue into real life code
http://www.osnews.com/comment.php?news_id=3931&limit=no#117135
[13]* Article on G5 benchmarks
http://news.zdnet.co.uk/story/0,,t271-s2136537,00.html
*I do not know if this is an exact quote.
[14] Escape from planet x86 – Paul DeMone
http://www.realworldtech.com/page.cfm?ArticleID=RWT060503232439
Further Reading
Article covering the differences between RISC and CISC
http://www.realworldtech.com/page.cfm?articleid=RWT021300000000
Article on PowerPC 970
http://arstechnica.com/cpu/03q1/ppc970/ppc970-0.html
About the Author:
Nicholas Blachford has been interested in CPUs for many years and has written on the subject for OSNews before. He works for Genesi who produce the Pegasos G3 / G4 PowerPC based motherboard and the MorphOS Operating System.
Excellent article; this is what OSNews needs: not the blaring fanatical opinions of ogres and trolls, just simple, factual text.
Rather than compare which processor scores better on which benchmarks, it’s good to read about the actual architectures of the different processors. That’s the good stuff.
Good man!
I just finished a 600-level class in computer architecture and that was over my head. What a great article–truly an exemplar for other submitters!
Another article for the armchair computer enthusiast that has no real information. Sorry, but you either need to explain more about what you are talking about or use more technical verbiage to raise the article to a higher level. You assume that the reader knows about registers, but don't talk about IPC (using that term). Your article also makes bold claims that are unfounded and unsupported within your article or by your references.
SMT will be twice as good as HT? Where is your reference?
Also at the end of your article you predict (as many others have in the past) that Intel will hit the 'heat wall' and that the future looks bright for RISC technologies but BAD for CISC. I can't wait to see you proven wrong again as CISC technologies continue to work as well as or better than RISC. For all of RISC's 'advantages' (many of which you state in this article) CISC still seems to come out on top.
This is how far I read the article until noticing a grave mistake: The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
If you don’t know shit about different CPU architectures, why do you feel need to write about them?
For those interested in RISC vs CISC (or especially why there is no RISC vs CISC any more), I highly recommend these articles at Ars-Technica:
http://arstechnica.com/cpu/index.html
I particularly liked the reference to changing programmers.
Yes, that has been known to result in faster programs!
This is how far I read the article until noticing a grave mistake: The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
If you don’t know shit about different CPU architectures, why do you feel need to write about them?
“The Intel 8086, a new microcomputer, extends the midrange 8080 family into the 16-bit arena.”
Intel Corporation, February, 1979
SMT will be twice as good as HT? Where is your reference?
IBM & DEC
Both IBM and the Alpha team announced that the addition of multithreading support was expected to give a 100% boost in performance.
When HT arrived it gave only 20% - 30%; I believe it is to be enhanced soon.
Also at the end of your article you predict (as many others have in the past) that Intel will hit the ‘heat wall’
Microprocessor Report said that, not me.
This is how far I read the article until noticing a grave mistake: The x86 family of CPUs began life in 1978 as the 8086, an extension to the 8 bit 8080 CPU.
If you don’t know shit about different CPU architectures, why do you feel need to write about them?
If you don’t know shit about different CPU architectures, why POST about them?
DUMBASS
Kudos to you, Nicholas, on a well written article. It was a well researched and factual piece, a welcome change from the opinion articles that have been becoming more common here. (Not that that is a bad thing, but it’s nice to have a change once in awhile.)
Instead of insults, could you provide pointers contradicting the article, and specifically that grave mistake you've noticed?
I second! Thanks for writing this article, Nicholas Blachford!
I believe there is a great potential future for the PPC platform. Compared to x86 CPUs it has a generally cleaner design, efficient power consumption and very well performing vector units.
With the PPC platform moving towards a solid 64-bit architecture and good multi-processing capabilities, IBM may well have a winner (hopefully it won't take long until there are good 64-bit OS solutions available). Sadly consumers aren't well informed about the MHz myth despite Apple's efforts. Many people still buy a Celeron mainly for the higher clock rate (instead of performance). Most people simply don't understand that a 50 MHz 68030 isn't twice as fast as a 25 MHz 68040, but that it's rather the other way around. IMO emphasizing clock rates only misleads the general consumer, and 3rd party SPEC stats would be a far more reliable source, though of course still not ideal. Hardware companies will most likely continue to try to use tricks to mislead benchmarking software and so artificially produce higher figures, or will do everything to dispute results when they are not in their favour, further confusing the general public.
Anyway thanks to Nicholas for his IMO well researched article.
Really great article. We do need more of these on OS News!
First off, HT (Hyper Threading) is a form of SMT (Simultaneous MultiThreading), so stop all this nonsense of HT vs SMT!
Second, neither IBM nor DEC (with their EV8) ever claimed a 100% performance increase for an SMT implementation vs. a non-SMT version of the same architecture.
I also have a few bones to pick with the author, since he makes a lot of false claims, for example:
“The amount of voltage the CPU can use restricts the power available and this effects the speed the clock can run at, x86 CPUs use relatively high voltages to allow higher clock rates, ”
This statement is so wrong that I do not know where to begin with the nitpicking! Phew…
Not sure I buy his ideas about this, and I don't think the references were good enough to prove them.
Perhaps it would have been better if it were a comparison of the two architectures and not a drift into a poorly formatted discussion of many chip architectures.
It is like trying to decide what is the best engine design for any application. If there were a best for "any", it surely would not be the best for each specific application.
It also does not consider the glue. So much of how well a computer works depends on the task, the parts surrounding the CPU and the tools to do the work. There are many examples of crappy CPUs being very effective because the surrounding kit and code solve the problem better.
Process technology and price are important when you talk about the desktop market. But so are the artificial benchmarks.
I also question how much Linux is really cross-platform. Having used both the Itanium and the Alpha versions, it became pretty clear that it is an x86 OS with ports that are less than optimized and stable.
Complaints aside, it is too bad they put Altivec in and killed the double precision multiply-add instructions. For SPEC these count as 2 ops, so if you have two FPUs that can do double precision multiply-add you get 4 flops per clock. The POWER4 and Itanium both have this, and it is how they win the flop performance benchmarks and marketeering.
If they had left it in, the PPC 970 would have been the flop leader, above the POWER4 and everyone else. IBM probably did not want that… but Apple would have gotten a lot of HPTC customers. Again, IBM would not like that. Instead they have a thing with a poorly optimized compiler that does low precision floating point OK. They probably should have gotten IBM to port all the compilers to OS X at the same time. That would have helped too.
“Most people simply don’t undestand that a 50 MHz 68030 isn’t twice as fast as a 25 MHz 68040”
Well, the 040 was “double clocked” internally, i.e. the 25MHz 040 was indeed running at 50MHz internally (much like the R4000 for example)
This is a great article, thanks.
True, he may have gotten it wrong; the x86 architecture actually goes all the way back to the 4004. The truth is that when the PC chose the 8088, it was already somewhat handicapped by its ties to the past.
good article. Good references.
I am an armchair computer enthusiast. This is the first time I have read anything that even remotely understood the differences in the two different CPUs. I may not have understood everything but I got enough.
I just read my own sentence and saw I made a mistake in my post. What I meant is that this is the first time I have read something that explained the differences in a way that made some sense to me. Most of the time it is more like listening to two preachers going at it over their own particular beliefs.
Thanks for the informative article. A very concise summary of a difficult topic.
Also at the end of your article you predict (as many others have in the past) that Intel will hit the ‘heat wall’
Microprocessor Report said that, not me.
Problem is that they assume that Intel will not change some aspect of their technology. Yes, Intel will have to change some part of their physical design or logical layout in order to compensate for the laws of physics. However, that doesn't mean that x86 code cannot scale to higher and higher speeds, just that the way it is executed will have to change, as it has already done multiple times over the x86's lifetime.
Both IBM and the Alpha team announced the addition of Multithreading support was expected to give a 100% boost in performance.
When HT arrived it gave only 20% – 30%, I believe it is to be enhanced soon.
http://meseec.ce.rit.edu/eecc722-fall2002/722-9-16-2002.pdf
This class lecture (brought to you via Google with "SMT DEC alpha speedup") proves that yes, in certain cases you can get a 100% speedup using SMT. Problem is that not all applications are going to be able to achieve that speedup (YMMV), and they will have to be recoded (or at least recompiled) for SMT, as it requires the code to be processor aware.
After having finished the article, it does seem to miss some points, but still, overall the article is good and one of the better reads I've had on OSNews in a long, long time.
Thinking about the x86 strategy in terms of marketing is a pure wonder. However, if Intel had actually focused on creating a better architecture rather than one with many parameters to tweak, such as MHz, cache size, bus speed, hyperthreading, etc., which some marketing guru could overstate again and again, where would we be today?
Don't get me wrong, the x86 is a true piece of engineering excellence: taking something that's not that great, and inefficient, and making it good enough to satisfy the current user base, to fanatical points where they berate PowerPC users on a regular basis. But if Intel were less marketing driven, could they have come up with something better than x86? I guess that's where the Alpha and EPIC architectures fall in. Makes me wonder about buying anything x86 in the future (i.e. x86-64).
Somebody please do an architecture overview on Madison.
They are neither binary nor source compatible. Their interrupt, memory segmentation/banking and I/O modes are completely unrelated. The 8080 is completely hand coded while the 8086 uses microcode. The 8080 has no complex instructions; the 8086 has plenty of them. The 8086 is way more complicated than the 8080, with 16 bit additions (the 8080 can use 16 bit addressing with the BC, DE or HL register pairs, btw).
IOW, the 8086's only relation to the 8080 is that both were designed and produced by Intel. That is it. Intel might have said the 8086 extends their midrange to 16 bit, which was established by the 8080, but technically they are completely unrelated CPUs, designed by different people (the original 8080 designers left to found Zilog) with different philosophies.
Who is dumbass now?
>Who is dumbass now?
I suggest you all calm down in the way you talk over here, or I will mod all the rude comments down.
B, C, D, E, H and L are register names used in the Z80 version of 8080 asm. I can't recall the original 8080 names right now.
because it was making my head hurt. Can you say proofreading? Spellcheck? A second grader writes with better grammar. Perhaps you need to put the pipe down a bit sooner before writing your next article.
Let me suggest looking two words up in the dictionary: effect and affect.
I have to agree that this guy doesn’t really know what he is talking about. He seems to start with a conclusion and then look for ways of justifying it. The ArsTechnica article is VERY good, but it pretty much requires extensive knowledge of computer/processor architecture.
The author seems to enjoy making broad statements without providing real proof. The Power5 SMT vs. Pentium4 HT comparison is particularly blatant (though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT, I doubt it will double performance and even then it will only improve parallel stuff - much more important for servers than desktops).
The benchmarking section was also given a cursory treatment. He uses an OSNews post as justification for throwing out ICC results in favor of GCC even though the post doesn’t even address that. The part of the post the author refers to correctly points out that SPEC FP performance is NOT indicative of overall system performance because most applications use mainly integer code. This does not invalidate the ICC SPEC FP results or justify Apple’s use of GCC. I have read in other locations (Ace’s Hardware forums) that ICC does drastically improve performance on real-world floating point intensive code.
I would dismiss this article as blatant fanboyism, but the author seems to believe everything he wrote. Guess he stepped too close to the reality distortion field. Please, throw out this article and look elsewhere (Ace’s Hardware and ArsTechnica are both VERY good sites for this type of stuff).
He builds PowerPC systems. I wonder what his conclusions will be… hmmm…
Glad to see this, good job Nicholas.
Now the world may end, Bouma and I actually agreed on something. 8)
Anyway, I think this article is best viewed as a brief summary (and a pretty decent one), not as proof of any hypothesis or as an argument.
There was one statement that stood out as being particularly wacky to me, though: The comparison of programming language, programmer, and CPU in their relative importance for the resulting execution.
That’s like asking: What’s more important for a good driving experience, the steering wheel, the pedals, or the engine?
The obvious answer: All are critical for good performance, and a deficiency in any can bring down the whole system.
While I liked the article, and I'm a die-hard anti-x86 guy, I have problems taking the article as a whole very seriously. I liked what he had to say, and I don't think he made many SERIOUS mistakes, but it was obvious from the beginning that he was going to make this a PPC 0wNz x86 article.
I liked the bit about the Alpha outgunning the P4 though. God, I want an EV7 box!
Re: Roy
Yes, the Ars articles actually provide content instead of fanboyism. They are well researched and good reads.
The author seems to enjoy making broad statements without providing real proof. The Power5 SMT vs. Pentium4 HT is particularly blatent (though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT, I doubt it will double performance and even then it will only improve parallel stuff – much more important for servers than desktops).
IMO, SMT will not speed up servers (file, web, DB, etc) that much. The tasks that most 'servers' do are more single threaded and not prone to parallelization in the way that will reap the benefits of SMT. Not to say that the increased resource sharing SMT allows will not be good, but the 100% speedup (or more) that is possible with SMT in certain applications will not be achieved. Video games have more potential for improvement via parallel algorithms (graphics rendering can be highly parallelized).
Web serving doesn't require much CPU, and the output can't normally be generated in parallel, for example. Even dynamic content can't be generated in parallel most of the time. Google is the shining example of a parallel algorithm, but BBS systems like this page here really can't be generated in parallel, as you can't generate one part of the page before another; at the end of the day it's a long string of HTML. Also don't forget that dynamic content mostly uses integer math, and the floating point units are left doing nothing.
Well, actually this article tells the reader little, if anything, about the PPC. It's all about the x86.
as for the ibm/alpha 100% SMT increase vs intel's 30%:
i gotta laugh sooo hard here. to have a 100% increase, that would indicate the ppc architecture is so far below the x86 for parallelising instructions and filling the pipe with uops that it's beyond a joke. i'd bet real world performance would be in a similar ballpark to intel's 30%.
x86 is built on a 1979 legacy, ppc on a 1993 one, and so much was learned about processor design from '79 to '93 that i would expect ppc to best x86, for it's a new clean design.
so too would i expect ia64 to best ppc, since it's again a new clean design with no cruft, postdating ppc.
and so on.
Yeah, the Alpha rocks. Too bad it is now a dead-end design. DEC just didn't know how to market it and Compaq didn't care about it. The Alpha (especially the 21064 – is this right?) really was the furthest evolution of RISC.
The Power5 SMT vs. Pentium4 HT is particularly blatent (though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT, I doubt it will double performance and even then it will only improve parallel stuff – much more important for servers than desktops).
HyperThreading is a hack designed to utilize execution units of the P4 which sit idle as a tainted trace cache is cleared and its pipeline is repopulated following a mispredicted branch. If you work the numbers on the Pentium 4, you’ll find that the percentage of time its execution units sit idle is approximately equal to the percentage of branch instructions in the code it is executing.
SMT in the Power5, on the other hand, is designed to leverage the full power of a dual core processor by allowing the pipelines to pick and choose which execution units to send decoded instructions to, with the assumption that the entire pool of execution units on both cores won’t be completely used at a given time when they are being fed by only two pipelines.
Why is it that whenever we get an article on this site that praises PPC, Apple, Mac OS etc., there are several who respond saying that it's just a Mac fanboy article… It seems that the signal-to-noise ratio on these boards gets worse by the day.
The most accurate and sincere attempt to lay out the facts in an environment that is filled with so much fear, uncertainty and doubt. I've been following the x86 processor family since the first PC was released. My first PC was a Radio Shack system; I learned to program on an IBM PC using BSD Fortran 77, then Pascal, and of course I taught myself BASIC as well. I bought a 1984 Mac and marveled at the 68000 processor. I marveled at the 80286 and remember my excitement when I got one of the first 20 MHz PCs; it made my dBase code smoke. I also had the privilege of working on the S/38, which eventually became the AS/400, and I marveled as IBM converted it over to the POWER platform. The 80486 was most excellent with its virtual 86 capability, and of course I was thrilled when Intel finally got the Pentium done right with the Pentium III. I was disappointed with the Pentium 4 and still am, because I felt that Intel sold themselves to the marketing side. We all know that the Pentium 4 was a bad deal compared to the Pentium III till it broke 2GHz; AMD taught Intel a lesson for that blunder and took a major chunk of their marketshare with what is now the Athlon. The Itanium is also a big disappointment, and it appears that the Opteron and Athlon 64 will once again get more attention than Intel. I believe this is because Intel's engineers strayed from their discipline when they compromised on the Pentium 4, and it has been a long road back to excellence.
In the meantime, Intel left the market wide open for IBM, and their 970 processor is just amazing; it truly is one of the most exciting developments I have seen for some time in the desktop world. To think that we have a processor that is a superset of the POWER4 core and even faster makes me excited. I was also blown away by the G5's architecture; it really is a new generation of machine and not an incremental change.
I'm not surprised really that Intel's ICC compiler vectorizes SPEC's FP intended instructions. It really is rigging to the nth degree. And I am not surprised that journalists in general do not do their due diligence. But you are starting to restore my trust that there are still those out there who are willing to do some research before writing an article. And congrats to the OSNews editorial staff for having the courage to publish it. Great writing, and I look forward to reading more from you.
Great article; I’d like to see more like it. In particular I’d like to know more about the less-common processors, and their operating systems and software.
Obviously readers can find flaws with anything; that’s what the “Comments” area is for after all. But there’s no reason to be rude.
Best Wishes,
Bob
The Alpha (especially 21064 – is this right?) really was the furthest evolution of RISC
the 21064 was a REAL dog, but, that said, it was the first generation (EV4).
Man!
You Wintel guys have ZERO credibility anytime you let your blatant fanboyism for an inferior system get the best of you.
The article was accurate for the level of depth it put forward. Sorry if it doesn’t jive with your revisionist methods of viewing the history of personal computers.
Along with Ars Technica columns, be sure to check out David K. Every’s articles on the same subjects… http://www.igeek.com/articles/Hardware/Processors/
He’s great at explaining how things work and why and which are better suited for specific applications.
By the way, there is no such thing as an unbiased opinion.
I guess that you get what you pay for; as this website is free, you can't really expect much from it. The comments have more true information than this article. Sigh.
This is the kind of Article I would love to read on OSNews all the time. Well written, referenced, and professionally done. Not like a lot of the fetid tripe that dares call itself a “review” that gets posted here (Eugenia’s articles excluded of course).
i've always liked 3DNow! better because it can do two operations per clock, because games don't need 128-bit vectors, and like Tim Sweeney says: "Since register-memory instructions are as fast as register-register instructions, I don't usually need to use more than 4 registers"
😛
I agree that it is not fair to compare the 970 to the P4 or even the Xeon. Intel simply does not have a modern processor to compare against the more advanced design of the 970. The real comparisons will happen against the Athlon 64. The way I see it, the categories of comparison look like this:
High-End Server:
Opteron vs. Itanium vs. POWER4
High End Desktop/Workstations/SMB Servers:
Athlon 64 vs 970 (G5) vs ? (nothing from Intel yet)
Legacy Desktops, legacy servers, current notebooks:
Pentium III vs. G3 vs. P4 vs. PM vs. Athlon vs. Xeon vs. Athlon MP
This is a generational shift and right now only the Athlon 64 and 970 are in play for the next generation desktop.
Nicholas Blachford,
Good article. I enjoyed reading it.
– Mark
Yup, you are probably right. My main point was that the guy was pulling numbers out of his arse. Like I said in my original statement…
“though I have no doubt that Power5 SMT will provide more improvement than Pentium4 SMT”
The 100% improvement just sounded inflated to me. There are certainly cases where SMT will provide large performance increases, but we aren’t talking about a 100% improvement in most cases.
RE SteveToth: Yup, you are right too. I never meant web servers, but it looks like SMT/HT helps more for heavy computation tasks (scientific, multimedia editing, possibly games someday). Servers (web/database) in general are more I/O bound.
From a March 2003 article, it appears Apple beat even his optimistic forecast.
http://www.igeek.com/articles/Hardware/Processors/x86-64vPPC-64.txt
If you care about 64 bit, we’re probably going to see it significantly effecting the Mac market around 2004-2005, and in the PC market around 2008-2009. Not because of technological issues (though there are some of those), but mostly business and market issues. I’m a technology guy, I wish the technology was all there was. (Technology is much more clean and pure than politics and business markets). But if you don’t understand business and markets in this industry, then you don’t understand jack.
PC advocates will talk about how they had 64 bit first; but ignore that they did it poorly, it doesn’t effect much, and will take forever to actually gain momentum. Mac users will likely be seeing any benefits from 64 bit computing, far sooner. In fact, the most likely way that you’ll see 64 bit x86 adoption is if it comes from Apple in the form of OS X ported for AMD.
Don’t get me wrong; I hope I’m wrong for the PC markets sake. I have no problems with AMD, and I like their x86-64 implementation. It would be great if this summer AMD was ruled the winner and the entire PC market adapted x86-64, and Intel licensed it, and there was no more war or world hunger, and dogs and cats could live together in peace; but I just don’t see that happening.
Many of you talk about CISC pulling ahead of RISC, but many of you forget that Intel had to basically make their processors RISC-like to compete. These days RISC is more like CISC and CISC is more like RISC; we confuse the two a lot. Take a look at the instruction set of a PowerPC processor compared to that of an x86 processor and tell me that I’m wrong.
Right, cause the only people that make processors are IBM, Motorola, Intel and AMD.
It would be great if this summer AMD was ruled the winner and the entire PC market adopted x86-64, and Intel licensed it,
FYI, Intel doesn’t have to license anything to use x86-64. They can just go ahead and implement it and lose nothing but face. That kind of x86 extension is already covered by an old cross-licensing agreement between Intel and AMD.
The early generations of Alpha really took the RISC principles (read KISS) to the extreme. Maybe I’m thinking of the 21164 rather than the 21064, but I know that later generations (21364 and possibly 21264) started using things like instruction reordering that are less in line with the principles of RISC. The 21264 and 21364 were great designs, but they weren’t as “pure” RISC as the earlier generations. EPIC takes the RISC ideas of letting the compiler do the work one step further (though I haven’t seen any evidence that this is paying off yet).
Simultaneous multi-threading (SMT) is designed to convert thread-level parallelism into instruction-level parallelism (ILP). That is its main purpose. It does very little good on processors that have a low degree of parallelism, or whose OSes and development frameworks do not promote asynchronous processing. Windows and COM+ are not very well threaded. Though the COM+ environment allowed better threading, it was difficult to program in the unmanaged VS 6 environment. Only with .NET has Microsoft started to emphasize thread delegation and asynchronous programming, but it is a very large framework and will take a couple more years to mature.
This is not the case with OS X, which is a highly threaded Unix-based OS, and the Cocoa framework is very mature, having been in development since NeXTstep in the late 80’s as a truly object-oriented, Smalltalk-type environment. This, coupled with the fact that both IBM and Apple have a long history of developing for multiprocessing systems, as well as providing a highly parallel processor in the 970 and future 980 designs, clearly shows that it is not only possible but more than likely that many operations will achieve close to a 100% performance increase in IBM’s implementation of SMT. SMT thrives on ILP, and the P4 greatly lacks ILP. That’s just a fact; it is not meant as a personal insult, so get your emotions out of it already. Intel may be forced to step up to the plate with a competing design, and wouldn’t that be a good thing? If I were you I would be promoting competition; it’s healthy and will benefit the Intel zealots in the end as well.
By the way, databases and transaction-based systems thrive on multi-threading. It’s games that currently prefer single-threading, but that is changing as well; take a look at Quake on an SMP Mac, it rocks.
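To make the SMT idea concrete, here is a toy sketch in C; the workloads and iteration counts are arbitrary and it is only an illustration (build with -lpthread). One thread hammers the integer units while the other hammers the FPU, exactly the sort of mix an SMT core can keep in flight simultaneously, filling issue slots a single thread would leave empty:

    #include <pthread.h>
    #include <stdio.h>

    /* Two threads that stress different execution units. On an SMT
     * core both can issue in the same cycles; on a non-SMT core they
     * simply time-slice. Iteration counts are arbitrary. */

    static void *int_work(void *arg) {
        volatile long acc = 0;
        (void)arg;
        for (long i = 0; i < 50000000L; i++)
            acc += i ^ (i << 1);        /* integer ALU traffic */
        return NULL;
    }

    static void *fp_work(void *arg) {
        volatile double acc = 0.0;
        (void)arg;
        for (long i = 1; i <= 50000000L; i++)
            acc += 1.0 / (double)i;     /* floating-point traffic */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, int_work, NULL);
        pthread_create(&b, NULL, fp_work, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("both workers done");
        return 0;
    }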
As Nicholas pointed out in this article, CISC instructions are hard to decode; they are more complex, have different lengths, and so on. But this also means that a CISC instruction carries more information from memory to the processor than a RISC instruction. Nicholas also stated that the bottleneck is the processor <-> memory connection. So you can regard CISC instructions as a kind of compression algorithm: more information can be transported to the CPU, which has time to decode it into something it can handle optimally.
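A rough illustration of that density (treat the exact byte counts as approximate): a register-memory add on x86, such as add eax,[esi], encodes in 2 bytes, while a load/store RISC needs a separate load and a separate add, two fixed 4-byte instructions for 8 bytes in total, to express the same work.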
I can’t provide you a link, but IBM is thinking about integrating a gzip unit into the memory controller and the processor for its zSeries, so the data is compressed before transfer.
Anton
Unless you have completely closed your eyes, OS X 10.2 added the GPU as another processor to offload some of the OS’s GUI duties, in the form of Quartz Extreme. OpenGL was already hardware accelerated, but Quartz Extreme allows all the compositing to take place in the GPU, freeing up the CPU. In addition, since the AltiVec unit is truly orthogonal, shadows and pattern fills, along with six other desktop drawing functions, are handled by the vector unit while the rest of the processor core is free to do what it needs to.
Now, with the release of Panther, Apple has added windowing and scrolling to Quartz Extreme as well. It’s never been faster or smoother, and the CPU is even more free to handle the actual apps. Panther will greatly benefit all Macs with G4s and up. G3 Macs will also benefit from highly optimized scalar libraries that now outperform the time-tested OS 9 libraries (Jaguar had previously achieved parity). It should be a fun winter for hobbyists.
Intel pushed the 8088 as the “next” 8080, while the Z-80 was Zilog’s (loaded with former Intel engineers) vision of what the next 8080 processor should have been. The 8088 was just an 8086 with an 8-bit data bus. I do not know if the binaries were compatible, and I know the mnemonics were extended, but the idea was to be able to carry your 8080 work over to the 8088/8086. If there were such drastic differences, then it would be my guess that Intel missed the mark. But from what I have seen, the 80486 and 8080 appeared very similar at the assembly level (sorry, I have not done enough Intel assembly to have a real feel for it). I only encountered 8080 code because we used a C compiler that generated 8080 assembly to run on our Z-80s 15 years ago.
But the bottom line is that Intel intended the 8088/8086 to be a 16-bit extension of their 8-bit 8080, which came from the 8008, which owed its start to the 4-bit 4004 processor used in early calculators.
They transitioned to a micro-coded architecture with everyone else in the 80’s. But the architecture was not improved by it; it just allowed the architecture to be extended one more time. The old single-accumulator design persists even in the P4. The x86 line is a 1970’s architecture that has been tweaked into the future. The PPC is a 1990’s architecture that is near the beginning of its life, designed from day 1 as 64 bits (the 32-bit processors are implemented as a 64-bit processor with the extra bits removed). The x86 is an 8-bit architecture that has been extended to now: the A register was extended to 16 bits by renaming A as AL and pairing it with a new AH to form AX, and then when they went to 32 bits they attached another 16 bits and called it EAX.
You still have to shuffle the registers so that all math involves the AX register. You still have all the segment-register nonsense to maintain compatibility with the 80186/80286 attempts at extending the architecture. With the 80386 they added flat 32-bit memory.
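As an aside, that register aliasing is easy to picture with a small C union; this is only an illustration and assumes a little-endian host (which x86 is):

    #include <stdint.h>
    #include <stdio.h>

    /* Illustration of how x86's EAX register overlays its 16- and
     * 8-bit sub-registers. Assumes a little-endian host, like x86. */
    union x86_reg {
        uint32_t eax;           /* full 32-bit register (80386 and later) */
        uint16_t ax;            /* low 16 bits (8086) */
        struct {
            uint8_t al;         /* low byte: the old 8-bit accumulator */
            uint8_t ah;         /* high byte */
        } b;
    };

    int main(void) {
        union x86_reg r;
        r.eax = 0x12345678;
        /* prints: eax=12345678 ax=5678 ah=56 al=78 */
        printf("eax=%08x ax=%04x ah=%02x al=%02x\n",
               (unsigned)r.eax, (unsigned)r.ax,
               (unsigned)r.b.ah, (unsigned)r.b.al);
        return 0;
    }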
Do not get me wrong – Intel has done a wonderful job at keeping the platform going – I have been declaring it dead since the 80286 came out, but they keep tweaking the speeds up. The Itanium is their concession to the eventual death of the architecture. AMD seems to want to keep it alive by providing for efficient operation of 32-bit code.
Bottom line – the x86 is like an old 60’s muscle car. They have tweaked the engine so that it has the speed of a sleek new Porsche… but the Porsche does it with an engine that is half the size and double the gas mileage.
The x86 is bigger, requires twice the clock speed, and generates four times the heat to do the same amount of work as the PPC. They may be about the same speed, but the PPC has a lot more room to grow.
Dude — an apostrophe does not mean “watch out, here comes an ‘s’!!” Possessive pronouns, “its, hers, yours,” do not have apostrophes. Use apostrophes when you are using a contraction; for instance, “it’s” means “it is” and the apostrophe stands for the (space and) vowel. And I wanted to read the article because I’m a big-time architecture geek and couldn’t because of all these trivial errors — whaaaaa! 🙁 Factually, you seem to understand x86 about as well as Hannibal over at Ars understands PPC, so this might make a good companion piece, but again I can’t tell because of the frustration at de-skewing the apostrophe catastrophe — whaaaa! 🙁
I wonder why some basic features were not covered, like out-of-order execution and branch prediction, which seem to be the major items commonly found on current IA processors.
What about pipelining? Any ideas on that one?
Interested, I checked out the MorphOS website; in a paper about MorphOS “in Detail” it said the following. I think this would have been a big point in the article, but it was not mentioned. Is it true, and how does it work that it is 10x faster? And, more importantly, is that fast enough to provide a speedy OS?!
Thanks for the good article.
Microkernel Vs Macrokernel
A common problem encountered in the development of microkernel Operating Systems is speed. This is due to the CPU having to context switch back and forth between the kernel and user processes; context switching is expensive in terms of computing power. The consequence of this has been that many Operating Systems have switched from their original microkernel roots and become closer to a macrokernel by moving functionality into the kernel, e.g. Microsoft moved graphics into the Windows NT kernel, Be moved networking inside, and Linux began as a macrokernel so includes everything. This technique provides a speed boost but at the cost of stability and security, since different kernel tasks can potentially overwrite one another’s memory.
Given the above, one might wonder why Q can be based on a microkernel (strictly speaking it’s only “microkernel-like”) and still be expected to perform well. The answer lies in the fact that MorphOS runs on PowerPC and not x86 CPUs. It is a problem with the x86 architecture that causes context switches to be computationally expensive. Context switching on the PowerPC is in the region of 10 times faster, similar in speed to a subroutine call. This means PowerPC Operating Systems can use a microkernel architecture with all its advantages yet without the cost of slow context switches. There are no plans for an x86 version of MorphOS; if this changes, there will no doubt be internal changes to accommodate the different processor architecture.
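As a generic (non-MorphOS) way to get a feel for context-switch cost on any Unix, you can bounce a byte between two processes through pipes; pipe overhead is included, so treat the figure as an upper bound:

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Ping-pong one byte between parent and child through two pipes.
     * Each round trip forces at least two context switches, so the
     * total time divided by 2*ROUNDS approximates one switch (plus
     * pipe overhead). */
    #define ROUNDS 100000

    int main(void) {
        int ptc[2], ctp[2];             /* parent->child, child->parent */
        char b = 0;
        struct timeval t0, t1;

        if (pipe(ptc) != 0 || pipe(ctp) != 0) { perror("pipe"); return 1; }

        if (fork() == 0) {              /* child: echo every byte back */
            for (int i = 0; i < ROUNDS; i++) {
                read(ptc[0], &b, 1);
                write(ctp[1], &b, 1);
            }
            _exit(0);
        }

        gettimeofday(&t0, NULL);
        for (int i = 0; i < ROUNDS; i++) {   /* parent: ping, wait for pong */
            write(ptc[1], &b, 1);
            read(ctp[0], &b, 1);
        }
        gettimeofday(&t1, NULL);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6
                  + (t1.tv_usec - t0.tv_usec);
        printf("roughly %.2f us per context switch (upper bound)\n",
               us / (2.0 * ROUNDS));
        return 0;
    }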
Why the heavy-handed treatment of the author by some here? If you disagree, do so in a reasonable manner.
Eugenia, you shouldn’t even have to warn people about their tone and language with an article like this.
Lately it seems like Windows fans are worse than Mac fans in their worship!
A good article for the entry-level (me). I’d like to see a lot more of these informational articles.
A personal note: I come here a lot less these days. Why? The level of discourse on these boards has gone to hell. Most of you sound like second-graders, and that’s being really generous of me.
This article isn’t all that it looks to be. Check his sources; some are just a step above marketing speak. He has very few hard facts and mostly opinions. You should really check out Ars Technica and Aces Hardware, as others have suggested, if you want the real story. They have much more in-depth analyses with real facts, and even benchmarks that test certain subsystems to make sure they are right. Mr. Blachford attempts to speak with authority, yet he just doesn’t seem credible, especially compared to all the better sources out there. An OSNews comment is hardly an authoritative source. This is more like a college freshman’s lab report.
That doesn’t mean that he is totally wrong. He is quite right that the x86 is highly inefficient and should probably have died years ago, but it keeps getting more complex and faster.
Regarding ICC, yes, it is somewhat biased; however, it really can auto-vectorize code, which means its benchmarks are much more believable than Apple’s old Photoshop tests with hand-optimized assembly. I’m not saying that he’s completely wrong, just that ICC CAN be that fast in real applications, and doesn’t require hand-coding assembly.
He seems blatantly biased towards the G3-G5 CPUs, but just because he’s biased doesn’t mean he’s wrong. They are highly efficient, low-power CPUs. The P4 really does seem more market-driven than engineering-driven.
Power consumption is a very complex field. There is more than one or two facts that describe why a processor consumes more or less power. Nicholas writes that Intel uses high-speed transistors, which consume more power. It is true that faster transistors can waste more energy: first, leakage current is higher, and second, you have to overload the base of the transistor with a higher voltage to make it switch faster (oversaturation). But on the other side, to reach a higher clock rate you can make transistors smaller and reduce the voltage, because a smaller transistor needs fewer electrons inserted into its base area to reach saturation.
A great power consumer is the clock tree. Alphas are very power hungry due to their clock tree, which is a mesh with a very high capacitance, so to make the clock tree switch fast a lot of power has to be pumped in. I don’t know exactly which clocking structure Intel uses on its chips.
All I want to say is that the reasons why x86s are power hungry are more diverse than just the fact that Intel probably uses high-speed transistors.
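To put a first-order formula on it: dynamic CMOS power is roughly P ≈ a * C * V^2 * f, where a is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. That is why lowering the voltage pays off quadratically, and why a clock tree (a is effectively 1 and C is huge) is such a consumer; leakage then comes on top of this.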
Greetings from Anton
The article was okay, but still somewhat biased, especially in concluding that RISC processors have always been faster. In my experience of the past 10 years comparing scientific programs across different architectures, especially Suns and HPs, I’ve consistently seen average desktop x86 machines able to handle more than 2x the throughput of cutting-edge RISC boxes that are more than 20x more expensive.
It all comes down to real competition. The funny part is that everyone always predicts that Linux will fragment. What Linux has been doing is defragmenting the hardware vendors.
Freedom from vendor lockin to hardware! Down with MS! Down with Apple!
Man, I wish DEC had gotten a clue and tried to push the Alpha into the consumer arena. But in those years MS truly had a ton of lock-in…
As a fledgling computer engineer (computer systems and architecture), I enjoyed reading this article, despite the oddly colloquial writing style. Likewise, even though it’s not directly OS-related, such an article is representative of what I’d like to see more often at OSnews instead of the non-informative, highly opinionated, unresearched drivel we unfortunately seem to get so much of. Kudos!
Honestly…. this entire conversation is stupid. Most of you know very little about processor design and the merits of different schools of thought.
For example, stingerman’s conjecture that Quartz (Extreme)’s use of the GPU as a secondary processing engine is a great idea is frankly daft. To put it simply, you don’t need your windows to warp, spin, etc. No matter which way you look at it, you are creating extra system load, and the idea of having an independent framebuffer per window in memory is insane and has predictable results.
Anyone who looks at PowerPC vs. x86 architectures will come to the conclusion that the RISC vs. CISC argument is a dead one. Effectively both architectures have reached a point where they rely on a RISC core with a translator and interesting caching and processing units to compensate.
Moreover, the heat output and speed of x86 and PPC architectures are much the same in mass-market products. The Pentium 4 is a high-clocked, low-IPC architecture, while the Athlon and PPC head in the other direction. At the end of the day, however, the actual performance is inherently similar, and so are the heat outputs. Comparing the heatsink on my Athlon XP to that on my friend’s G4 indicates similar levels of heat dissipation.
At the end of the day, I do appreciate that the Mac users here (and indeed the majority of posters seem to be Mac users) would like to crow about the 970, but as the recent benchmarks and more in-depth analyses have shown, it runs at about 90% of the actual performance of the current Athlons/P4s. There is nothing wrong with that, but it is not a revolution of any kind.
Ultimately, you will always find that the PPC architecture performs at around 70-95% of the current x86 architecture in the consumer market, and this will remain the case, simply because processor design is admittedly complex and we’ve not seen massively revelatory new designs in recent years. Enhancements yield limited percentage improvements in speed, but ultimately, that is that. We haven’t seen a consumer (<- I emphasise this word) development emerge in recent years which has come from nowhere and doubled the speed over current systems. It doesn’t happen in design, and to be frank, it will only appear due to entire process changes to take advantage of new materials, or migration to quantum computing or the like. PPC will never see a significant lead over x86, due simply to economies of scale: more x86s are being sold and more people are working on enhancements. That’s life, and it may as well be dealt with.
To be frank, Mac users need to work out that their machines are more than ample for the tasks they put them to, regardless. Even migration to the 970 will yield them a limited benefit over a high-end G4; in fact, it will perhaps not even be massively noticeable in many places. Software optimisations could easily be more worthwhile than the upgrade.
How much do you want him to cover in a short article? He hit the salient points; we all understood what he meant. Great article, easy for even the lay person to understand the gist of it and feel intellectually satisfied.
By the way, my OS X is automatically spell-checking everything I type in this form and actually allows me to context-switch to the right spelling. It is so cool!
One more thing: Apple provides its developers a very nice vector library that will automatically fall back to scalar code if a vector unit is not present. It will even optimize between the different generations of vector units (it’s called vecLib). But Apple chose not to use vecLib in the SPEC test. Vector unit vs. vector unit, Apple wins hands down, so get off justifying ICC’s auto-vectorization. No one uses ICC anyway unless you’re an Intel engineer or an obscure developer. We use VS or Borland in the Wintel world; ICC is mainly used for benchmarking.
Otherwise you wouldn’t be able to make such blanket statements such as “the PPC architecture will perform around 70-95% of current x86 architecture” with a straight face, and with some facts behind you, no doubt!!!
So tell me, why did the G5 smoke x86 here:
http://www.luxology.net/company/wwdc03followup.aspx
Now, since you’re a developer, you should easily be able to show why this windows-predominant shop wasn’t able to correctly gauge the speed of the relevant processors.
I await your informed, technical reply with great anticipation!!!
Your Quartz Extreme observations are wrong; offloading more and more processing to the GPU is state of the art in computer science circles, and much research is being done on it at the university level. Every time OS X directs work to the GPU, the CPU is freer to do other work, and Apple’s implementation is not about waving windows around (like in the Longhorn demos) but about actually speeding up the whole system. And Quartz Extreme does.
You’re referring to old PC tricks to speed up screen draws; Apple’s Quartz Extreme is implementing university-level research for the future of computing. One MIT study showed that using the GPU for indexing a database can increase performance by up to 30 times with current GPUs. Apple promised at the 2002 WWDC that they had only begun to exploit the GPU for the OS, and they are showing even more work in Panther. Why do you think Microsoft is so loudly touting similar technology for their future Windows 2007?
Facts: CISC vs. RISC doesn’t matter. All conventional processors are moving towards the “heat wall”.
That a 3GHz P4 consumes over 100W (peak) and a 1GHz G4 only consumes ~10W is not relevant, as:
(1) the G4 is a low-power embedded processor and the P4 is a high-end workstation processor;
(2) the P4 is clocked 3 times higher than the G4, has a higher-bandwidth interconnect, and has more cache.
G5?
Hello???
Wow, thats news to me. I always thought it was a desktop processor. You must be thinking of the Xeon.
There seem to be two types of people here: the electrical engineer/computer engineer type, and the GUI/widgets/fonts/web designer type. The former may think that this author is simply wrong or incomplete. The latter don’t know much detail about processor design. As an example, do you understand what pipelining is and why it is good?
Therefore the Mac folks (mostly the second type) think that the engineers are full of it and simply flaming the author (and some are), while many engineers are pointing out that he is just plain wrong in some of the things he says.
Sab: I haven’t read the article past the quote I made. I like wasting my time proving people wrong, but I don’t like wasting my time for absolutely nothing. That sentence was a sign of things to come, and I didn’t even bother to read the rest, hence I didn’t even know it was slanted towards PPC until reading the comments. As such, my supposed Windows fanboyness (I don’t use Windows and am not a fan of it) is a misunderstanding on your part.
Bob: the 4004 and 8086 are not related either (except in the lame MS joke that ends with “… 1 bit of competition”). The 4004 has no architectural descendants. The 8088 and 8086 are similar (even the same at the software level), but the 8080 is a different beast. Compare:
8080: interrupts are handled via a specialized function call (1 byte long, but otherwise identical to CALL)
8086: interrupts are handled via special all-register stack dumping instructions
8080: Flat 16-bit addressing with 8-bit GP registers.
8086: Segmented 20-bit addressing with 16-bit registers. No real general-purpose registers except the accumulator.
8080: No integer arithmetic except ADD and SUB; no loop, floating-point, indexed or string-handling instructions.
8086: You know what 8086 has
etc.
Of course Intel didn’t start over as if the 8086 was their first CPU; there are bound to be more similarities between the 8080 and 8086 than between, say, the 6502 and 8086. However you can’t even trivially modify 8080 code to compile on 8086. The names 8080 and 8086 imply a stronger link than actually exists. See if you can read the following 8080 code (CP/M operating system manual, 1982 edition, pages 212-213, lines 186-199):
(cpmspt, noovf, unatrk are labels)
inr m        ; increment the byte at (HL)
mov a,m      ; load it into the accumulator
cpi cpmspt   ; compare A against cpmspt
jc noovf     ; jump if still below it (no overflow)
mvi m,0      ; otherwise zero the byte at (HL)
lhld unatrk  ; load HL from unatrk
inx h        ; increment HL
shld unatrk  ; store HL back to unatrk
xra a        ; clear A (A xor A = 0)
…
My next computer will be a Mac (G5-based if I can afford it at the time), not because of Mac OS X (I couldn’t care less about that) but because the CPU is low-heat and high-performance. I’m going completely insane over the immense noise levels that my current AthlonXP 1600+ is producing.
I considered the C3, but it seems a tad underpowered for most of my tasks, though not by much. I’m, however, concerned that the Eden platform is locked down, so I couldn’t replace, say, my graphics card should it be needed at some later stage.
I’ve always looked at the Macs and admired their clean solutions, and now I simply must own one…
Listen, Megol, I’m not the very best at math, and I certainly don’t want to defend Motorola’s clock speeds, but the fastest G4 right now is 1.42GHz.
Now to the math:
1.42GHz x 3 = 4.26GHz
Is there a 4.26GHz P4?
Is this “new math”?
So which type are these people?
http://www.luxology.net/company/wwdc03followup.aspx
The best G4 Motorola produces is 1GHz; Apple overclocks them. You can *probably* overclock a P4 to 4.26GHz too, if you can cherry-pick which P4 to overclock. With exotic cooling methods much higher frequencies have been achieved.
Alternatively, you can disregard out-of-spec frequencies. Then, yes, there exist 3GHz P4s (which is 3 x the 1GHz of the G4).
I did enjoy the article. But, the commentary has made my day.
Do you have any evidence for this, goo?
Apple has said that they don’t, so maybe you have some reference to back up your claim?
And please show me where a non-overclocked P4 at 4.2GHz is.
Facts, not “rumor”, please.
The PPC platform is so different that this is a useless discussion. Some think of the PPC as only a Mac, but IBM has been selling top-of-the-line, professional, mission-critical machines based on the PPC platform for many years. Even industrial machines run PPC every day. This is no contest. PPC can be one of the best computers for any task if designed to do so. The x86 can never be designed for a mission-critical task: it started as a toy and should remain that way. It has very basic design flaws.
Fact: IBM doesn’t produce plain G4s.
Fact: Apple apparently sells >1GHz G4s.
Fact: Except for Motorola and IBM, nobody produces G4 CPUs.
Baseless speculation without evidence or reference, fanboy yada yada yada: Apple overclocks 1GHz Motorola G4s.
Feel free to draw a different conclusion from these facts.
Ok, facts are facts, right!
🙂
Hey, I think Moto has been sucking on the gas-pipe regardless of the “facts”, and I am no fanboy of either platform.
The G5s, however, are a whole new ballgame, and the competition is good for everyone.
However you can’t even trivially modify 8080 code to compile on 8086.
That’s not what Intel said; perhaps I should have quoted the full release…
“The Intel 8086, a new microcomputer, extends the midrange 8080 family into the 16-bit arena. The chip has attributes of both 8- and 16-bit processors. By executing the full set of 8080A/8085 8-bit instructions plus a powerful new set of 16-bit instructions, it enables a system designer familiar with existing 8080 devices to boost performance by a factor of as much as 10 while using essentially the same 8080 software package and development tools.
— Intel Corporation, February, 1979
The way I look at it, it is really an argument over which instruction set you like to program in: ARM, MIPS, Alpha, SPARC, PPC, or x86. All of these CPU architectures are capable of reaching the nirvana CPU speed all CPU geeks seek. It is just a matter of a lot of money, 18 to 48 months of time, and getting good, experienced CPU micro-architects and quantum mechanics (transistor tweakers).
What we need to look at is that operating systems have become stable, if not boring; Windows 2000/XP and Mac OS X are based on research from about the late 80’s. Also, the memory and code-size limitations of the past are a distant memory in the desktop and server space, which drove feature-rich, similar OS cores. Plus, if you remember history, Tru64 Unix (Mach), Mac OS X (Mach), and Windows NT (a Cutler design, ex-DEC) are all based on modified microkernels.
What this has done is move all the CPU architectures off their polar positions; they began rationalizing their ISAs to better meet the needs of software that has evolved onto a pretty stable foundation of C/C++-based, multithreaded, multi-user operating systems. The result is an almost complete homogenization of features: core CPU instructions, floating-point instructions (SP, DP, paired single), debugging instructions, DSP-like instructions (multiply-accumulate, etc.), and vector instructions, as needed to support the market segments and applications each architecture was moving to serve.
What this did is put more pressure on CPU micro-architects to innovate, since there was going to be less innovation coming from ISA extensions. So they had two choices: fast-clock, narrow, super-pipelined architectures, or wide, slower-clocked, high-IPC micro-architectures. They had to look at innovative ways to deal with memory latency (caches, larger register sets, instruction buffers, etc.) and to understand how best to deal with control-flow issues (branch prediction). Here is where the visionaries evolved: Alpha was one of the greatest CPU experimentation environments to emerge in the last 10 years, and they tried all the variations (in-order, out-of-order, dual issue, multi-issue, multithreading, on-chip memory controllers and more). The big issue today is that all of these innovations drive gate count and chip complexity, which reduces our ability to make bigger innovations beyond waiting for the next process geometry.
When you compare and contrast the P4 and the 970, they both do something similar. If you want to crank up the clock on a CPU, the best way to do it is to go with a super-pipelined micro-architecture. And to do this at these new speeds you need to do something DEC invented on the MicroVAX processor: crack the PPC or x86 instruction set into simpler instructions (micro-ops). What’s interesting is that Intel has been doing this since the Pentium Pro; I would argue these micro-ops made Intel more RISC than the current classic ISA-level RISC processors. So now that IBM has made this leap in processor design, it is back to a race over who has the best process technology and the most innovative transistors, with minor micro-architecture tweaks. Also, with the announcement of the Power5/980 architecture, IBM and Intel are at feature parity again around SMT/HT. Here is some of the best research on the subject: (http://www.cs.washington.edu/research/smt/)
On the power issue of the x86 core, look no further than the Pentium M, which is one incredible x86 CPU: it matches the PPC G4’s 10 watts at 1GHz, with the bonus of an amazing branch-prediction unit and 1 megabyte of onboard L2 cache. So this point is moot as well, since it is a micro-architecture and quantum-mechanics (transistor tweaker) issue.
If you want to see innovation in CPU architecture, look at the following projects, since they are truly driving innovation in CPU design, compiler research, operating systems and application design. The two MIT projects are based on a MIPS-like instruction set.
http://www.cs.utexas.edu/users/cart/trips/
http://www.cag.lcs.mit.edu/scale/overview.html
http://www.cag.lcs.mit.edu/raw/
(don’t worry about it, “goo” doesn’t know what he’s talking about, but you have to give it to him, he talks a good game!!)
And hey, Nicholas, that is one of the better, factual, non-troll articles that have been here in a while.
You have to expect the heat (sorry, bad pun) from the x86 crowd when you point out the facts to them.
🙂
my sister is a nun and she’d love to take a switch to you!
(you know who you are)
i don’t think one person can possibly keep up on moderation chores on the miserable lot of you.
good day.
…this is an utterly pointless argument, since it’s coming down to different interpretations of what Intel’s 1979 press release meant.
The two processors weren’t opcode-compatible, but they were explicitly designed to have one-to-one translations from 8080 to 8086 opcodes so machine code could actually be translated simply, not reassembled. This is how the infamous QDOS, MS-DOS’s ancestor, was created, and is part of why Digital Research eventually sued Microsoft: California Computer Systems (if I’m remembering the name right) ran CP/M’s 8080 code through just such a translator, and then wrote a native BDOS for their development system. (The original releases of MS-DOS 1.0 actually had a Digital Research copyright string embedded in them because of this.)
So goo is technically right, but he’s also being knowingly pedantic, since Blachford’s point–that the 8086 was considered a descendant of the 8080 by Intel itself–is also correct.
You can get a quiet heat sink and power supply for that Athlon. It will cost you some money, but much less than getting a new Mac. Just check the decibel rating for something below 40 dB, below 30 if you can. I’ve heard of power supplies with a 28 dB rating, which is VERY quiet. Plus you can get anechoic tiles to absorb even more noise.
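For a sense of scale, sound power scales as 10^(dB difference / 10), so a 28 dB unit emits roughly 16 times less acoustic power than a 40 dB one, and each 10 dB drop sounds about half as loud to the ear.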
While it may not be quite as quiet as a new Mac, they can be very tolerable. I’ve heard (or not heard) several Dells that I could barely tell were on or off. (That depends on where you live; in urban areas that can definitely be true.)
Fan technology is no reason to spend $3000 on by itself.
P.S. I would only get a G5 for OS X, because whether or not it’s faster than an x86 CPU is probably unnoticeable, except in benchmarks.
It’s strange how polarised the debate here is. Not quite as bad as the Mac Vs PC threads though. I gave up commenting on those a long time ago because, although there are good and bad points to both platforms – and of course personal opinions – there were a lot of people who obviously couldn’t see this, and attempting to debate sensibly is a futile exercise.
Thank you to those who have made the very kind comments, it took a lot of time and effort and these make it worth the while. Thanks.
To those telling us that it’s full of basic errors: please be aware that this is not meant to be a technical reference manual, it’s an article for OSNews. I expect I did make the odd error or explained things not quite perfectly, but if I am making glaring errors please tell us where they are.
—
That said, I note that many have either not commented on or downright missed the main point of the article – that CISC processors are NOT the same as RISC, and unless Intel or AMD or someone else comes up with a *very* clever design they never will be.
RISC has the advantage and could outperform x86 CPUs if the effort was put in. That’s a very big “if”, however. x86 has the market and is likely to have it for some considerable time to come.
Paul DeMone explains this much better than I can here:
http://www.realworldtech.com/page.cfm?articleid=RWT021300000000
And no, realworldtech is not “just a step above marketing”. It’s the most technical of any of the sites I (or anyone else) have referred to.
This place is always going to be a war zone in the comments, but your article was fair, balanced, and quite accurate.
Just like many things, the fact that PPC is more efficient doesn’t mean that it is going to be better, or faster, or more wide-spread.
However, it seems to me that at some point x86 is going to need liquid cooling to keep its processors cool enough.
Are you kidding mate? The new G5 is a lovely machine, but it requires *9 fans* and cannot be that quiet. Get yourself some Zalman bits for your Athlon (not expensive) and it can be completely silent. I’m running an Athlon 2400+ system with no case fans, and it runs stable and cool with virtually zero noise. It can easily be done – it’s just that most white box builders don’t bother, which is indeed crap!
ILoveWindows: Without using SSE or AltiVec, you are really going against the abilities of modern processors, and the results you get are not meaningful. Most especially, if you look at the P4, the classic FP performance is very weak, while using SSE will literally double performance in many cases. Anandtech, Tom’s Hardware, Ars Technica and other tech sites will show you that on intensive processes such as 3D rendering, for the first half year after the P4’s release, without SSE2 recompiles of the software and relying on x87 floating point, it got creamed by the Athlon. Once software was recompiled, performance was better. Apple’s little note about disabling SSE on the P4/Xeon benchmarks effectively crippled the floating-point performance of that CPU, and they knew it.

That is not to mention that, due to the architectures of the P4 vs. the 970, they will perform differently depending on the detailed formation of the code, such as the sizes of matrices, the FP precision required, the formation of loops/conditions and a whole host of other factors. And no, I’m not a 3D developer; I’m a university researcher in image processing (2D, 3D, stereo vision) and I deal with this stuff a lot. You can’t just throw a piece of entirely un-optimised code at a CPU and expect the initial response to be true of the capabilities of the chip. In the case of the P4, this is incredibly pronounced, due to the design decisions that Intel took. I disagree in many places with their implementation, and prefer Athlons myself, precisely because they are better at x87 rather than requiring the SSE2 optimisation, but there it is.
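For illustration, here is a minimal SSE sketch in C (the array contents and size are arbitrary; it needs an SSE-capable compiler and CPU). One packed add handles four floats per instruction, where x87 code would handle them one at a time, which is why SSE recompiles matter so much:

    #include <stdio.h>
    #include <xmmintrin.h>  /* SSE intrinsics */

    #define N 1024

    int main(void) {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f; }

        /* Four floats per iteration: one packed load pair, one packed
         * add, one packed store, instead of four scalar x87 adds. */
        for (int i = 0; i < N; i += 4) {
            __m128 va = _mm_loadu_ps(&a[i]);
            __m128 vb = _mm_loadu_ps(&b[i]);
            _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
        }

        printf("c[10] = %f\n", c[10]);  /* expect 12.0 */
        return 0;
    }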
Stingerman: Firstly, “Dawnrider your wrong” should be “Dawnrider you’re wrong”. Just FYI. I won’t argue that offloading processing onto the GPU is a bad thing, because it isn’t, but it is only worthwhile if you intend to run some serious vector-based tasks on such a system. Current iterations of rendering software, for example, are using GPUs in that way. The trouble with Quartz Extreme, which I was wishing to highlight, is simply that rendering basic 2D forms in a windowing environment is a minor task for a modern processor. In fact, OpenGL effectively offloads a degree of that to the GPU to start with, which is why the graphics card needs memory for more than just a look-up table, as opposed to simply streaming a framebuffer out to the screen.

My point was that it is wrong-headed to the point of being moronic to take such a ripe source of processing power and then create spurious tasks, such as rendering shrinking windows, to saturate it with. Longhorn is daft in trying to do that as well. Moreover, OS X users suffer, because having a framebuffer-sized chunk of memory dedicated to each window rapidly chews through physical memory once you start using more than a few applications. You add massive overhead to the system and quickly reduce responsiveness if the thing has to start paging to disk to support your graphical excess. You might have no problem watching your windows shrink, spin, etc. when you just have one or two, but if I have >20 windows open at a time (and I do), it would absolutely destroy the performance of my system. In short, get rid of Quartz effects, save memory and use those GPU cycles for more useful work.
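To put rough numbers on that memory cost, here is a back-of-envelope sketch in C; the window size, depth and count are assumptions for illustration, not measured OS X figures:

    #include <stdio.h>

    /* Back-of-envelope cost of per-window backing stores. */
    int main(void) {
        long w = 1280, h = 1024;        /* one full-screen window */
        long bytes_per_pixel = 4;       /* 32-bit ARGB */
        long per_window = w * h * bytes_per_pixel;
        int windows = 20;

        printf("one %ldx%ld buffer: %ld KB\n", w, h, per_window / 1024);
        printf("%d such windows: %ld MB\n", windows,
               (windows * per_window) / (1024 * 1024));
        return 0;                       /* ~5 MB each, ~100 MB for 20 */
    }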
I got news for you.
Please see the link below…
http://www.apple.com/powermac/architecture.html
http://64.246.37.205/tech_specs.php
Peripheral design is almost the same as the commoditized brands.
In short, you will pay too much for non-commoditized parts.
Where did you find out that the P4 lacks ILP? (Not trying to be difficult, I just never heard that before.) I agree that the programming frameworks provided by MS haven’t pushed parallel processing. Still, the 100% improvement in general apps seems far-fetched.
Also, please explain how ILP and SMT are related. Again, just trying to learn here. My understanding of SMT is that it basically allows a single core to execute multiple threads at once and share a pool of execution units. It would seem that threads that don’t fully utilize the execution units would benefit most from SMT. I’m not sure what this has to do with ILP. I’ve mainly heard of ILP in relation to Intel’s EPIC architecture.
I knew I’d read somewhere that HT provided good server performance improvements though my memory is failing me as to what type of server. I’ll take your word on the database/transaction stuff. Sounds familiar. As for games, that is pretty much what I meant by “games possibly someday”. Most games are not currently written to take advantage of any sort of parallelism, but they certainly could be.
BTW, before OS X, Apple’s multiprocessor experience pretty much consisted of adding an extra processor to improve Photoshop performance (the second processor was not utilized by most applications). IBM certainly does have a lot of experience here though. NeXT certainly has a lot of multiprocessing experience too, though I’m not sure about threads, and I don’t remember seeing multiprocessor NeXT boxes. Anyone know about these?
As for the Intel zealot / competition thing, I’m not for Intel and I’m not against competition. I find AMD’s Athlon64 design MUCH more interesting than Intel’s Pentium4 (including Prescott). AMD’s system architecture (outside the processor) is very cool. I just hope they don’t stumble anymore with poor execution. I also think the PPC970 is a great processor (as well as the Power4/5).
“That said, I note that many have either not commented on or downright missed the main point of the article – that CISC processors are NOT the same as RISC, and unless Intel or AMD or someone else comes up with a *very* clever design they never will be.”
That’s what bothers me a bit. Because really, even if I am not a CPU specialist myself, in all the technical articles I have ever read (Ars Technica, etc… and a good site, French only: http://www.onversity.com/cgi-bin/progdepa/default.cgi?Eudo=bgteob&N… ), it’s said that the RISC vs. CISC debate is dead. For example, I don’t understand how you can say the G4 is purely RISC when it has SIMD units, as that is rather a heavy “unit” for a CPU?
“And no, realworldtech is not “just a step above marketing”. It’s the most technical of any of the sites I (or anyone else) have referred to.”
I really don’t think so. I don’t know realworldtech well, but Ars Technica is really a good reference, even if it is a bit PC-biased. And to me, it seems a lot better than realworldtech.
But I liked your article, though.
” You can’t just throw a piece of entirely un-optimised code at a CPU and expect the initial response to be true of the capabilities of the chip”
Again, I’ll post the link, and even throw in a quote for you:
http://www.luxology.net/company/wwdc03followup.aspx
Quote:
”
Luxology uses a custom-built cross platform toolkit to handle all platform-specific operations such as mousing and windowing. All the good bits in our app, the 3D engines, etc, are made up of identical code that is simply recompiled on the various platforms and linked with the appropriate toolkit. It is for this reason that our code is actually quite perfect for a cross platform performance test”
“In fact, the performance tuning was done on Windows and OSX. We used Intel’s vTune, AMD’s CodeAnalyst and Apple’s Shark.”
This wasn’t un-optimized code, and if they had utilized AltiVec and SSE, the results would most likely have been even more disparate.
Again, show me why these developers, who are at the very top of their discipline, in a raw test of CPU performance, did not conduct a fair test, as they claim they did?
Why is it that whenever we get an article on this site that praises PPC, Apple, Mac OS etc., there are several who respond saying that it’s just a Mac fanboy article… It seems that the signal-to-noise ratio on these boards gets worse by the day.
More like signal-to-noise AND distortion… :)