‘The Ultimate Lab Test’?

Popular Mechanics has performed three types of benchmarks on Apple and ‘ordinary’ PC machines, with the former running Mac OS X Leopard, and the latter Windows Vista. The first type of benchmark consisted of users giving ratings to things like design, ergonomics, web browsing experience, and so on. The second benchmark focused on real-world performance (launching applications, boot/shutdown times, and so on), while the third and final benchmark consisted of Geekbench and Cinebench runs. PM concludes: “The results gave us a clear winner in the performance categories, but the big surprise was how little difference we found in user preferences. Turns out, both platforms are capable and easy to use, but only one was the victor.”

Even before I got to the results, I noticed a whole set of problems with the benchmarks performed in this article that would seriously skew the results.

The biggest problem is that the machines are not at all similar in specifications. They used desktops and laptops, and in both cases the Apple machines had far less memory to go around, but did have slightly faster processors with better bus speeds. The Gateway all-in-one machine had 2GB of RAM more than the iMac, but a 400MHz slower processor. In the laptop category, the ASUS notebook had 1GB of RAM more than the MacBook (on page 1, that is – the results page says the MacBook had 1GB more), but a 2.2GHz processor compared to the 2.4GHz in the MacBook. In addition, the MacBook had an integrated Intel graphics chipset, where the ASUS notebook had an ATI one. This means that the ability to draw conclusions from the performance-related benchmarks is limited, at best.

The hardware differences will also skew the battery results in the laptop comparison. The ASUS has a more powerful graphics chipset, which also drains more battery power. The MacBook had a Penryn processor, which, according to AnandTech, offers significantly better battery life than its predecessor, the processor used in the ASUS. In addition, the ASUS has a much larger screen (15.4″ compared to 13.3″), which also affects battery life.

Another big issue is that the Vista computers did not seem to have Service Pack 1 installed, which is another thing that could have seriously skewed the results – this time in favour of the Macintosh. Service Pack 1 is said to fix a set of performance-related problems, and it was released quite a while ago – it should have been installed. When you do these comparisons, you always use the latest versions, fully patched. That is common practice.

The last major issue I have with the results is summed up on the article’s first page:

These things are largely matters of preference and style, but you can still make a reasonable attempt to quantify them, and we did. We tested two all-in-one desktops and two laptops – one Mac and one PC per category – and assembled a panel of testers with a range of experience and preference that ran the gamut from expert users to my wife’s stepfather, who, by his own account, had never actually turned on a computer. Our testers were asked to set up the computers right out of the box and explore the machines through everyday tasks such as Web surfing, document creation, uploading photos, downloading Adobe Acrobat files and playing music and movies through Media Center and Front Row (the entertainment software suites integrated into Vista and Leopard, respectively). Our testers were instructed to divorce themselves as much as possible from their previous technological preferences and rate their experiences with each computer’s software and hardware.

Seeing as no further details are given regarding the test subjects – no background information, no direct quotes, no anecdotes, no experience levels – the ‘usability test’, as PM calls it, is extremely anecdotal, and more or less useless – a common problem in usability testing. Usability testing is not a matter of putting grandma behind a computer and seeing how she gets around. It requires a scientific setup, reduction or control of external influences, larger test samples, and detailed information regarding the subjects before you can draw any conclusions from what you have observed.

As for the results of the benchmarks themselves – read the article. While many of the results do reflect my personal feelings and opinions, I do not take these specific results seriously. Even if results are in line with my own feelings and opinions, they do nothing to strengthen those feelings and opinions if they are based on faulty testing.
