Blogger Kevin Bowling takes a look at the never-ending stream of benchmarks from Phoronix, with various Linux distros pitted against each other and even against different operating systems, and he wonders: are they bullshit? Case in point: this Debian vs. FreeBSD benchmark that was submitted to OSNews yesterday.
The first link points back here and the second to the article. One is missing (the important one).
It’s here: http://www.kev009.com/wp/2008/12/phoronix-benchmarking-statisticall…
But it dates from 2008????
Seriously, the article is almost two years old, and while Phoronix benchmarks may be rubbish, I have a hard time taking anyone seriously who also posts junk like this:
http://www.kev009.com/wp/2009/02/i-hate-ubuntu/
Hey Kev,
You know what kind of people I immediately lose respect for? People who judge others on such shallow grounds.
I reviewed the post about Phoronix and the points are as valid today as they were then. In general, I try to write blog posts that have meaning beyond current events, and I use WordPress more as an old-school homepage. I guess the blog format/expectation is sometimes counterproductive. I don’t understand why Michael doesn’t simply take the advice given to him: enable error bars, document testing conditions more carefully, and dial back the sometimes comical analysis.
Seriously, the Ubuntu post was a tongue-in-cheek troll. Although I do believe that, at best, Canonical isn’t doing enough to shepherd the underlying development of their OS, I cannot think of any person or thing I hate in life. I use Linux as a workstation-class OS, and the Ubuntu community is not as helpful as others in this role. At the time, it seemed like the general tide of Ubuntu users I’d met in real life and online were giving off a sort of snobby Mac aura, and I wanted to stir the pot. The comments are quite comical, all across the board, and worth reading, so it was successful with respect to my intentions.
As people say over and over again in the Phoronix forums, without error bars most of their benchmarks are worthless.
Every measurement has an uncertainty; it might be 10% or 0.1%. You can only say that one measurement is bigger or smaller than another if you know that the difference is bigger than the uncertainty.
This is how it should be done: http://www.hep.man.ac.uk/u/sam/zgoubi-optimise/
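A minimal sketch of what that buys you, assuming repeated timing runs (the numbers and run times below are invented for illustration, not real Phoronix data): report each result as mean plus or minus the standard error, and only call one system faster when the gap exceeds the combined uncertainty.

```python
import math
import statistics

def summarize(runs):
    """Return (mean, standard error of the mean) for repeated timings."""
    mean = statistics.mean(runs)
    sem = statistics.stdev(runs) / math.sqrt(len(runs))
    return mean, sem

def significantly_different(runs_a, runs_b):
    """True only if the difference in means exceeds the combined uncertainty."""
    mean_a, sem_a = summarize(runs_a)
    mean_b, sem_b = summarize(runs_b)
    combined = math.sqrt(sem_a ** 2 + sem_b ** 2)
    return abs(mean_a - mean_b) > combined

# Hypothetical run times in seconds, lower is better.
debian = [41.2, 40.8, 41.5, 41.1, 40.9]
freebsd = [40.6, 41.3, 40.9, 41.0, 41.4]

for name, runs in (("Debian", debian), ("FreeBSD", freebsd)):
    mean, sem = summarize(runs)
    print(f"{name}: {mean:.2f} +/- {sem:.2f} s")
print("Meaningful difference?", significantly_different(debian, freebsd))
```

With overlapping error bars like these made-up ones, the honest conclusion is “no measurable difference” – which is exactly the nuance a bare bar chart hides.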
Benchmarks are not statistics. They measure how fast a computer does a specified thing. Whether the specified thing is an accurate picture of real-world use is another matter, but that’s not something you can measure with “standard deviation” or whatever.
If you repeat the test many times and get different results, statistics can come in handy – but usually benchmarks can be written in a way that the results do not vary.
Benchmarks are results; statistics are the context, and without context the results are useless. Were they running anti-virus or SETI@home in the background? Did they run 10 trials and take the best result, the worst result, the average, or a random number? There is absolutely no way that a benchmark on a computer is perfectly reproducible, and statistics like the error would tell us how variable the results are. If I fired a gun 10 times, hit the bullseye once and completely missed the target the other 9 times, a reasonable person would not conclude I am an expert marksman based on that one benchmark.
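To put a number on the marksman analogy, here is a hypothetical sketch (the ten trial timings are invented, not from any real benchmark) showing how the same trials tell opposite stories depending on whether you publish the single best run or the mean with its spread:

```python
import statistics

# Ten hypothetical trials: one lucky "bullseye" among otherwise slower runs.
trials = [38.9, 52.4, 51.8, 53.0, 52.1, 51.6, 52.8, 52.3, 51.9, 52.5]

best = min(trials)                 # the one lucky shot
mean = statistics.mean(trials)     # what the machine typically does
spread = statistics.stdev(trials)  # how variable the trials are

print(f"best run: {best:.1f} s")
print(f"mean run: {mean:.1f} +/- {spread:.1f} s over {len(trials)} trials")
```

Reporting only the lone fast run makes the machine look far faster than it typically is; the mean with its spread is the honest summary.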
Apparently, you don’t have much of a background in the sciences. Every measurement you can ever take with any instrument comes with some measure of accuracy. All scientifically significant measurements are reported with an indication of their accuracy. So while two different labs can come up with two different measurements of the same thing, we can say they actually agree if their values fall within the measure of uncertainty of each other’s values.
All anyone is asking is that they indicate the uncertainty in their measurements. Other websites with benchmarks do an equally crappy job of this.