The biggest problem is that the machines are not at all similar in specifications. The testers used desktops and laptops, and in both cases the Apple machines had far less memory to go around, but slightly faster processors with better bus speeds. The Gateway all-in-one machine had 2GB of RAM more than the iMac, but a 400MHz slower processor. In the laptop category, the ASUS notebook had 1GB of RAM more than the MacBook (on page 1, that is - the results page says the MacBook had 1GB more), but a 2.2GHz processor compared to the 2.4GHz in the MacBook. In addition, the MacBook had an integrated Intel graphics chipset, whereas the ASUS notebook had an ATI one. This means that the ability to draw conclusions from the performance-related benchmarks is limited, at best.
The hardware differences will also skew the battery results in the laptop comparison. The ASUS has a more powerful graphics chipset, which also drains more battery power. The MacBook had a Penryn processor, which, according to AnandTech, offers significantly better battery life than its predecessor, the processor found in the ASUS. In addition, the ASUS has a much larger screen (15.4" compared to 13.3"), which also affects battery life.
Another big issue is that the Vista computers did not seem to have Service Pack 1 installed, which could have seriously skewed the results - this time in favour of the Macintosh. Service Pack 1 is said to fix a set of performance-related problems, and it was released quite a while ago - it should have been installed. When you do these comparisons, you always use the latest versions, fully patched. Common practice.
The last major issue I have with the results is summed up on the article's first page:
Seeing as no further details are given regarding the test subjects - no background information, no direct quotes, no anecdotes, no experience levels - the 'usability test', as PM calls it, is extremely anecdotal, and more or less useless - a common problem in usability testing. Usability testing is not a matter of putting grandma behind a computer and seeing how she gets around. It requires a scientific setup, reduction or control of external influences, larger test samples, and detailed information regarding the subjects before you can draw any conclusions from what you have observed.
As for the results of the benchmarks themselves - read the article. While many of the results do reflect my personal feelings and opinions, I do not take these specific results seriously. Even if results are in line with my own feelings and opinions, if they are based on faulty testing, they do nothing to strengthen those feelings and opinions.


