Don't know either way, but it would be interesting to know what the COMPILE times were, to see how the Sun and GCC compilers fared for just daily coding.
Also, all of these programs are (I believe) pure C, some C++ programs would be very interesting as well.
(I know, he goes through all of this effort, and yet all we do is criticize anyway. Not my intention, seemed like a good effort.)
Thanks so much, I've really enjoyed your series so far on your old sparc box. Keep them coming!
My only question, which I have a feeling you're gonna address in future articles: did you breathe new life into the sparc? Or is it gonna sit next to your desk after all this is over?
What would be great now is to have some comments from the gcc team. Why is gcc slower? Is it just some very specific areas (e.g. floating point, long long arithmetic)? What's planned in the future for getting gcc to produce faster code?
Another thing I miss in the comparison is standards compliance. How well do gcc and the Sun compiler conform to the relevant C and C++ standards?
It's definitely a good article. A real joy to read. Keep up the good work.
I think this suggests that the earlier conclusion that 32-bit is faster than 64-bit is in fact entirely dependent on GCC. Sun's compiler seems to consistently generate 64-bit code that is as fast as or faster than its 32-bit version.
What would be great now is to have some comments from the gcc team. Why is gcc slower? Is it just some very specific areas (e.g. floating point, long long arithmetic)? What's planned in the future for getting gcc to produce faster code?
gcc for sparcv9 is relatively new. Thanks to gcc's modular backend, the difference isn't severe in these examples, although from personal experience I've seen much more drastic differences in performance of binaries compiled with the Forte Compiler Collection tools versus gcc. When I tried compiling one of our grid analysis tools here (which makes extensive use of 64-bit integers and floating point math) with both compilers, the version compiled with the Sun compiler ran about 40% faster than the version compiled with gcc 3.3.
Sun has been fine tuning their compilers for sparcv8/v9 for over a decade, and it certainly shows.
Another thing I miss in the comparison is standards compliance. How well do gcc and the Sun compiler conform to the relevant C and C++ standards?
By default gcc has some very odd behavior. With the -ansi -pedantic flags gcc behaves a bit nicer. Obviously Sun's compiler doesn't support a lot of the gcc-specific extensions which some have seen fit to use (e.g. variable argument macros, nested functions, etc.). I believe it's the responsibility of all programmers to ensure their code is portable across a number of compilers and not bound to a particular toolchain.
Most of the problems you'll run into trying to compile code developed primarily with gcc/x86 without a lot of portability testing are going to be things like endianness issues and misaligned word accesses, which are, in my opinion, coding errors.
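To make the endianness point concrete, here is a minimal sketch in Python (chosen only for brevity; the thread itself is about C). The value 0x01020304 is an arbitrary illustration, not something from the article. It shows what happens when bytes written in big-endian order (as on SPARC) are read back as if they were little-endian (as on x86). It does not demonstrate misaligned access, which cannot be shown from Python.

```python
import struct

# Arbitrary 32-bit example value (an assumption for illustration only).
value = 0x01020304

# The same integer serialized in the two byte orders.
big = struct.pack(">I", value)     # big-endian layout, as on SPARC
little = struct.pack("<I", value)  # little-endian layout, as on x86

# Code that dumps raw memory on one machine and reads it on the other
# effectively does this: interpret big-endian bytes as little-endian.
misread = struct.unpack("<I", big)[0]

print(big.hex(), little.hex(), hex(misread))
```

Code that writes integers with an explicit byte order, rather than copying raw memory, avoids the problem on both platforms.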
Yeah, that was a big surprise, which seemed to mean that 64-bit binaries with Sun's compiler, despite applications not specifically written for 64-bit, seem to be faster.
Still, the fact that performance wasn't all that far off says a lot about the quality of GCC, and why it's in such wide use. They've done a fantastic job.
What about a benchmark that tests the more advanced C++ features such as the STL? That would be nice.
But other than that it is a very good article. The performance of GCC is really impressive. I wonder if this is why KDE3.2 is so fast.
All the programs examined here appear to be integer heavy. Sparc chips are not great integer performers; those who buy them are not looking for integer performance.
Floating point performance is where Sparc really shines, and is a spot where GCC is traditionally bad. The Sparc compiler does amazing things to floating point heavy code. The code I run is fp heavy, and the sparc compilers make a noticeable difference.
The author of this article has really missed the boat. If you want to run standard GNU/Linux type stuff like what was benchmarked here, buy a cheap intel box and run linux. But if you need fp, get the Solaris stuff.
Yeah, that was a big surprise, which seemed to mean that 64-bit binaries with Sun's compiler, despite applications not specifically written for 64-bit, seem to be faster.
It's not that much of a surprise considering the majority of executables that Sun ships with Solaris are 32-bit...
These comparisons are great !
Now, anyone up for benchmarking Tendra vs gcc on x86/linux/*bsd ?
First off good article.
But with the benchmarking CFLAGS for gcc, I think it would be interesting to use "-O2 -march=ultrasparc -fomit-frame-pointer" instead of "-O3 -mcpu=ultrasparc".
-march generates code for the specified CPU without backward compatibility, so it should in theory generate faster code.
Also, I have found that -O3 slows some programs down, and -fomit-frame-pointer always seems to speed up programs, but only slightly.
Although I think the conclusion will be the same that the sun compiler generates better code...
Is Tendra even running on Linux?
And if so, where is the rpm?
I too would like to see some in-depth C++ comparisons, if the author could provide them.
Thanks
Wouldn't the easiest way to keep I/O speed out of the equation for the gzip tests be to redirect the output to /dev/null? I.e.: time gzip -dc /some/file.gz > /dev/null.
Good idea, I hadn't thought of that. I'm traveling now, so I don't have access to the system, but I'll give it a try when I do.
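For anyone who wants to script that suggestion, here is a rough sketch, assuming Python is available on the test machine. The helper name time_command and the stand-in demo command are hypothetical, not part of the article's setup. It runs a command with its output discarded (the equivalent of "> /dev/null") and reports wall-clock time.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once with stdout discarded; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    # stdout=DEVNULL plays the role of "> /dev/null" in the shell version.
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Stand-in command so the sketch runs anywhere; for the gzip test you would
# pass something like ["gzip", "-dc", "/some/file.gz"] instead.
demo = [sys.executable, "-c", "print('x' * 100000)"]
elapsed = time_command(demo)
print(f"{elapsed:.3f} s")
```

Note that this measures wall-clock time only, so reading the input file from disk still counts; only the output side of the I/O is taken out of the picture.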
Sun has the same advantage as SGI: they produce their own hardware, so Sun's compilers will outperform GCC because they are optimized for that platform.
Why no IO benchmarks? The reason for me to get Sun has always been the ability to run a superior OS (SunOS). Now there is Linux for the Intel CPUs, no reason to get Sun. Why get a Sun now? Intel still chokes on heavy IO (I2O? Dunno... never could find a board that supported it OK). How about someone gives Tony a 450 loaded with RAM and disks so we can see some IO benchmarks?
Yes. If you take that into account, the performance of GCC is even more impressive.
Much of the gcc effort revolves around x86. Thus, if gcc indeed comes so close to Workshop's performance, it's very impressive. However, as Jason already pointed out, to make any kind of statement the author should have run the full CPU2000 suite. Just gzip is not enough to evaluate anything.
gcc 3.x's -O3 sometimes produces marginally faster code on x86 cpus (it's not worth it; stick to -Os for general things and -O2 for hot spot code that gets executed a lot).
I've found -O2/-O3 being faster not to be true on other architectures. For instance, on my Alpha, gcc 3.x's -Os produces the fastest code (faster than -O2, -O3 or -O).
I think the Sun compilers would perform a bit better than gcc on higher end Sun hardware. A U5 even in its day wasn't a terribly impressive machine compared to the U60 workstation line. Today's workstations vs desktops are even more spread out if you compare CPU cache and memory bandwidth. A SunBlade 2k versus a Blade 100 is a significant difference.
-Os produces buggy code sometimes. It is not advisable to use it for most code. I know Gnumeric, Abiword and some GNOME games segfault and crash with -Os.
But Tony, Sun's compiler is so expensive! Yes, it is expensive, especially when compared to the free (as in beer and freedom) GCC. However, Sun does offer a free 60-day evaluation license for their compiler suite, which is what I used.
For Sun's compiler, I used the 60-day trial for Sun ONE Studio 7, which can be found here. It includes C, C++, and Fortran compilers, as well as Java and other development tools. The full version lists for $2,995.
It is possible to obtain these compilers for a substantially lower price. See:
http://wwws.sun.com/software/cover/2003-1027/index.html
If you run your own small business, you could get it for as low as $105/yr, which, compared to the other price is quite a deal.
Mr. Bourke,
What's the difference between the compilers when profile directed optimization comes into play?
In regards to I/O bound issues, as an addition to /dev/null, you could try running inside of /tmp and/or switching your UFS filesystems to logging,noatime.
Yours truly,
Jeffrey Boulier
I would greatly appreciate it if the actual numbers from the tests were given. By just looking at the graphs, it is difficult to tell just how pronounced the differences are.
It is good to see someone at least trying to verify the "Sun vs. GNU" statements.
Geoff
Yes. If you take that into account, the performance of GCC is even more impressive
Not really. The vast majority of code optimizations are not platform specific. gcc sports a modular backend which makes it easy to port to other architectures. When faced with complex mathematical code, I've found (see my previous post in this thread) gcc to be a poor performer on sparcv9 (doing calculations based on large sets of 64-bit fixed point grid coordinates).
Reading your story about how gcc is almost as fast as Sun's compiler, I thought I'd try it for myself given that I have both of them handy. I use Sun's Forte compiler on my product on Solaris/SPARC.
Here are the compiler versions that I currently have on a 300 MHz Ultra 60.
gcc: gcc version 3.3
cc: Forte Developer 7 C 5.4 2002/03/09
I'm doing 32 bit only, for now.
The application will be a memory manager, so one can safely say that this is integer based through and through. Lots of loads and stores with many register ops in between. There isn't any reading/writing to/from disk, so no bottlenecks there. Just straight load/store/register operations.
Compiler flags used: I don't know if I'm using the right compiler flags for gcc, and in fact, maybe I could use better compiler flags on cc as well, but nevertheless, here's what I'm currently using. If anyone wants to suggest better flags to try, please let me know and I can rerun the tests.
Compiler flags SPARC and INTEL platforms:
gcc flags: -O3
-fexpensive-optimizations
-finline-functions
-ffast-math
-fomit-frame-pointer
cc flags: -fast
The numbers are the time it takes to complete the test, so lower numbers are better:
gcc: 55s
cc: 44s
Did each run 5 times, took the average.
Sun is 25% faster by my math.
Sun's compiler produces much faster code. It takes longer to compile the application, but that's not what is important. What's important is how fast the resulting binaries are.
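As a quick check of the arithmetic on the 55s and 44s figures above, here is a small Python sketch. The variable names are mine. The result depends on which way you express it: measured as speed, cc comes out 25% faster; measured as run time, the cc binary takes 20% less time.

```python
# Average run times reported above for the SPARC build (seconds).
gcc_seconds = 55.0
cc_seconds = 44.0

# Speed ratio: how many times faster the cc binary runs than the gcc binary.
speed_ratio = gcc_seconds / cc_seconds

# Fraction of run time saved by switching from the gcc binary to the cc binary.
time_saved = (gcc_seconds - cc_seconds) / gcc_seconds

print(f"cc is {speed_ratio - 1:.0%} faster; it takes {time_saved:.0%} less time")
```

Both numbers describe the same measurement, so it helps to say which one is meant when quoting a percentage.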
On the INTEL side, I get the following results using the following compilers.
gcc: gcc version 3.3.2
cc: Sun WorkShop 6 update 2 C 5.3 2001/05/15
Execution times (lower times are better):
gcc: 95s
cc: 95s
On INTEL, in my application, they are equal.
Any questions, email me: balson@attbi.com
Jim
This may well be a minor point, but I think it would be helpful if Tony ran his benchmarks more than 3 times apiece and averaged the results. While he claims that his numbers were pretty much consistent, I think it would be instructive to run the benchmarks a large number of times to be sure that his data is mostly accurate. With a large dataset, you can easily pick off the outlying points, and have a better idea of what your distribution is. My experience is that you mostly get a lot of hits around one datapoint; however, finding bi-modal cases can be an indication of complex/interesting behavior that might warrant further investigation. It's a minor point, but it would certainly lend additional credibility to his claims.
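One possible way to pick off the outlying points is sketched below in Python. The run times are made-up numbers, not measurements from the article. The outlier rule (a modified z-score based on the median absolute deviation, with the conventional 3.5 cutoff) is my choice of technique, not something the comment above prescribes.

```python
import statistics


def summarize(times, threshold=3.5):
    """Return median, mean, sample stdev and MAD-based outliers for run times."""
    median = statistics.median(times)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median([abs(t - median) for t in times])
    if mad == 0:
        outliers = []  # all central values identical; the rule is undefined
    else:
        # Modified z-score; 0.6745 scales MAD to match a normal distribution.
        outliers = [t for t in times if 0.6745 * abs(t - median) / mad > threshold]
    return {
        "median": median,
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),
        "outliers": outliers,
    }


# Hypothetical run times in seconds (illustrative only).
runs = [44.1, 43.9, 44.0, 44.2, 43.8, 44.0, 52.3]
result = summarize(runs)
print(result)
```

This only flags isolated stragglers. Spotting a bi-modal distribution would still take a histogram or a look at the sorted values.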
that for us it ain't worth investing in proprietary compilers.
I work for a small company producing an open, cross-platform Seismic interpretation platform (www.opendtect.org).
We compile our suite with gcc on all platforms (including win32) and that is where the great benefit of gcc lies: it is the same on all platforms. If it compiles on Linux, it almost certainly also compiles on Solaris, SGI and even win32 if you leave differences in APIs out of the equation.
Whatever you do with templates and whatever clever C++ constructs you might come up with, if it compiles on one platform, it compiles everywhere.
And yes, it might cost some performance, but I don't think a performance loss in the order of 5-10% average is a big deal, compared to the benefits of having a single, standards-compliant compiler across all platforms.
This is not only beneficial to the developer (after all, the customer 'pays' the performance penalty, not the developer), because if I spend less time porting and debugging, I can spend more time on developing new features or writing better documentation.
By the way, besides the compiler there is also gdb, which is also the same across all platforms, even though the SGI one does not support debugging of multi-threaded applications.
So, I personally prefer the cross-platform benefits of gcc over the performance gain of proprietary alternatives without one second of hesitation.
If you'd like to see a bit more performance difference, try a really FP intensive code like Dyna, use the latest compiler (currently version 8), enable full optimization (at least "f90 -fast -v9b" or similar), and run on a modern machine (UIIICu, UIIIi or UIV processor).
For the gcc compiler, -march=v9 should work better.
-funroll-all-loops too.
-fomit-frame-pointer could give you 10% more performance.
You should also try -Os and -O2
So, okay. This is a decidedly integer-heavy test. Given that at least one other person has commented on the floating point focus of scc and the solaris platform, I'd love to see some good heavy floating point done. Since we're talking about real-world performance, it might also be a good idea to get some fixed-point in there.
I'm a little surprised that you're using GCC 3.3.2, when GCC 3.5 is out. GCC 3.5 produces impacts on ARM7 code speed as much as 15%; I would be excited to learn how it performed on the Sparc, both in comparison to scc and to older GCCs.
I'd also like to see how the compilers hold up to Dhrystone; to general-approach algorithms like the Mersenne Twister or Boost's Lagged Fibonacci random number generators; to large-scale large-variance code like the Boost regression tests, the Loki tests and some STL test suite or another; to large patterned number manipulation like GIMPS or a GMP rigor test; et cetera.
This is a wonderfully neat page, but it could use some work in the way the tests are done. I suggest a look at David Welch's compiler performance page for the GameBoy Advance, which though just one test is a test done in a rather more rigorous fashion. http://www.dwelch.com/gba/dhry.htm
Your articles are great, but I would suggest running the test more than three times, say at least ten times, to be able to use common statistical tools to evaluate significance of your results.
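As one example of such a statistical tool, here is a sketch of Welch's t-test in Python using only the standard library. The two lists of ten run times are invented for illustration and are not data from the article. The function returns the t statistic and the Welch-Satterthwaite degrees of freedom; you would still need a t table (or a stats package) to turn those into a p-value.

```python
import math
import statistics


def welch_t(a, b):
    """Return (t statistic, degrees of freedom) for Welch's unequal-variance t-test."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Sample variance divided by sample size, per group.
    var_over_n_a = statistics.variance(a) / len(a)
    var_over_n_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(var_over_n_a + var_over_n_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (var_over_n_a + var_over_n_b) ** 2 / (
        var_over_n_a ** 2 / (len(a) - 1) + var_over_n_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical run times in seconds for two compilers, ten runs each.
gcc_runs = [55.2, 54.8, 55.1, 55.4, 54.9, 55.0, 55.3, 54.7, 55.1, 55.0]
cc_runs = [44.1, 43.9, 44.0, 44.2, 43.8, 44.0, 44.3, 43.9, 44.1, 44.0]

t_stat, dof = welch_t(gcc_runs, cc_runs)
print(f"t = {t_stat:.1f}, df = {dof:.1f}")
```

With only three runs per configuration the degrees of freedom are so low that even sizable differences can fail to reach significance, which is the practical argument for ten or more runs.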
I'm not an expert (far from it) in compiler technology, but it seems to me that there would be some improvement in performance with the GCC compilers if they had both been built from source with the Sun cc compiler. The problem I see with the setup used here is that a binary version of the GCC 3.2 compiler was installed, so it was likely optimized for a different piece of hardware or not optimized at all, and then the 2.95 version was built with this non-optimized compiler.
I recall that the SPARC processor has an equivalent instruction set to MMX. I also recall that the Sun C compiler supports this and that GCC does not. For those of us involved in signal processing and image processing this would be a big win.
Also this would seemingly be a very big win for the Sun C compiler.
Any thoughts on this?
Also, it would be really nice to see some benchmarks with more floating point intensive operations.
- Andrew
While the article is interesting, it neglects to mention that Sun's C compiler supports a lot of things that gcc does not, like automatic parallelization (-xautopar) and OpenMP support. I've found that these can provide big wins on SMP machines (I've done some testing on a 6-processor V880 at my university).
Take a look at the cc man page - http://developers.sun.com/tools/cc/documentation/s1s8cc_documentati.... About half of it is dedicated to the various optimization flags.
Hi Tony,
Your article makes for interesting reading, and it is very close to my own experience using SunOS and Solaris for longer than 15 years. Yet I must differ with you on the actual rigor of the tests, because the optimization flags you used are *very different*; as it is, your performance comparisons are equivalent to comparing apples and oranges. In order to have equivalent testing conditions, you should have used "gcc -O3 -march=ultrasparc -mcpu=ultrasparc [-m64]", because using -mcpu alone, you only set up timer switches but take no advantage of particular register optimizations available in the target architecture. In the same vein, "-xfast" is the bane of the SunPro compilers: it creates binaries that, as a matter of fact, contain ABI incompatibilities with system shared libraries!!! Rather, you should have used "cc -xO3 -Olimit=<something very high> -xarch=v8plusa|v9a" to have equivalent binaries and therefore a valid comparative test.
In summary, if I were your technical editor or your academic supervisor, I'd have you repeat all the experiments with an adjusted experimental model.





