When doing research for my evaluation of Solaris 9 on my Ultra 5, I kept running into one comment over and over again: Sun’s C compiler produces much faster code than GCC does. However, I couldn’t find one set of benchmarks to back this up. (If you know of any, drop me an email.) Could this be yet another case of rumor-taken-as-fact?
If Sun’s compilers are indeed faster, how much faster? 5 percent? 20 percent? Twice as fast? I’ve heard all of these from various people, so I decided to run some tests of my own to see whether this supposed performance difference between GCC and Sun’s compiler is real.
Another question was which version of GCC to use: the latest (as of this writing), 3.3.2, or GCC 2.95, which is still in wide use (especially on Linux, FreeBSD, NetBSD, and OpenBSD running on SPARC). So I included both, to also answer the question of which GCC produces better code on Solaris/SPARC. 64-bit binary creation was also tested, for binaries created by GCC 3.3.2 and Sun’s compiler.
Bling Bling
But Tony, Sun’s compiler is so expensive! Yes, it is expensive, especially when compared to the free (as in beer and freedom) GCC. However, Sun does offer a free 60-day evaluation license for their compiler suite, which is what I used.
Compilers
For Sun’s compiler, I used the 60-day trial for Sun ONE Studio 7, which can be found here. It includes C, C++, and Fortran compilers, as well as Java and other development tools. The full version lists for $2,995.
For GCC 3.3.2, I used a binary version from SunFreeware, built for making 32-bit and 64-bit binaries. For GCC 2.95.3, I downloaded the source from GNU, and built it using GCC 3.3.2.
Applications
I’m using the same applications as I did for my previous article on 32-bit versus 64-bit binaries. In fact, the tests for this article and the 32-bit versus 64-bit article were all done at the same time, and they were split into separate articles in the interest of brevity. Those applications are OpenSSL 0.9.7c, GNU gzip 1.2.4a, and MySQL 4.0.17.
For the GCCs, the compiler options were “-O3 -mcpu=ultrasparc” for optimized 32-bit binaries, and “-O3 -mcpu=ultrasparc -m64” for optimized 64-bit binaries. For Sun’s compiler, I used “-fast -xarch=v8plusa” for optimized 32-bit binaries, and “-fast -xarch=v9a” for optimized 64-bit binaries.
GNU gzip 1.2.4a
I used the latest version from GNU’s ftp site, and compiled in the 32-bit and 64-bit variations with the various compilers. The gzip and gunzip operations were run 3 times each, and the results averaged.
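The measurement itself was nothing fancy. A sketch of the kind of loop involved, with a small generated file standing in for the real 624 MB test file, might look like this:

```shell
# Sketch of the timing loop (the real test used a 624 MB file;
# a small dummy file is generated here so the sketch is self-contained).
dd if=/dev/zero of=testfile bs=1024 count=1024 2>/dev/null

total=0
for i in 1 2 3; do
  start=$(date +%s)
  gzip -9 -c testfile > testfile.gz
  end=$(date +%s)
  total=$((total + end - start))
done
echo "average over 3 runs: $((total / 3)) seconds"
```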
In this test, Sun’s C compiler created a GNU gzip binary that zipped up the 624 MB file about 11% faster than GCC’s did. Surprisingly, GCC 2.95 produced a gzip binary slightly (2%) faster than GCC 3.3.2. In the unzip operation, GCC 2.95 won at 89 seconds, with GCC 3.3.2 coming in at 90 seconds, followed by Sun’s compiler at 91.
In the 64-bit run, Sun’s compiler produced the faster code.
As with the 32-bit tests, the gunzip operation produced very little delta between the three binaries.
After seeing my 64-bit versus 32-bit article, Shane Pearson wrote me posing the question of whether disk I/O was a bottleneck in the gunzip operations, thus causing the results to come out essentially the same. After I ran some tests, it turns out he may very well be right. The 624 MB file is written in 91 seconds, which is about 6.9 MB/s; add reading in the 126 MB gzip file, and the combined read-and-write throughput comes to around 8.2 MB/s. This is pretty close to the effective maximum of the system, which, measured with a 100 MB dd if=/dev/zero test and a 100 MB mkfile test, comes to about 10 MB/s with no other CPU operations going on.
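The throughput arithmetic above is easy to reproduce (the combined read-and-write figure works out to roughly 8 MB/s):

```shell
# Back-of-the-envelope throughput arithmetic from the paragraph above:
# 624 MB written in 91 s, plus 126 MB read in the same window.
awk 'BEGIN {
  printf "write: %.1f MB/s\n", 624/91
  printf "read+write: %.1f MB/s\n", (624+126)/91
}'
```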
And here is a graph showing how they all stacked up:
Sun’s compiler was the clear winner. Surprisingly, the older version of GNU’s GCC beat 3.3.2 by a very slim margin.
For the OpenSSL 0.9.7c tests, I used static builds of the binaries (static as in no separate libssl.so or libcrypto.so). I ran openssl speed rsa dsa for each of the five binaries (three 32-bit and two 64-bit) three times, and averaged the results.
The differences were not nearly as pronounced with OpenSSL as with the other tests. Generally, Sun’s compiler did slightly better than the GCCs. The two GCCs were about evenly matched. Several of the operations showed little or no difference among the three.
64-bit
There was a much more significant performance difference when comparing the 64-bit binaries created by the GCC and Sun compilers. For 64-bit code, Sun was the clear winner.
These graphs show how all 5 binaries stacked up against each other.
MySQL 4.0.17 has both C code and C++ code, and I compiled it with the three compilers in question. Getting 4.0.17 to compile under GCC 2.95.3 took the most finagling, but I was eventually able to sort it out. Because the tests take around 3 hours to run, I ran each iteration twice, and used the best run. There was very little disparity between the runs.
For the 32-bit runs, Sun’s compiler actually got beat by one or even both GCCs for many of the operations. In general, the results were quite erratic across the three compilers.
64-bit
As with the previous article, the 64-bit MySQL server binary was used, but the 32-bit Perl and MySQL client libraries were used to run the client end of the benchmarks in order to keep the client end of the tests consistent.
Sun’s compiler produced a binary that beat GCC in all but the select operation.
Here’s a view with all the compilers accounted for.
The results were all over the map for these runs. I went back to double-check that sql-bench wasn’t producing erratic results (which took quite a bit of time), but the runs were consistent.
The biggest surprise though was that Sun’s 64-bit compiler created a binary that for most operations beat out all the others, whether they be 32-bit or 64-bit.
Conclusion
These were a limited set of tests on a limited number of applications run in limited ways, and I only tested the C compiler (and for MySQL, the C++ compiler).
Still, we can draw a few interesting generalizations. First, Sun’s compiler does indeed appear to produce faster code, although not always; the difference in speed varied quite a bit, topping out at about 22%.
The greatest differences were seen when comparing the 64-bit binaries from GCC and Sun’s compiler, where Sun’s compiler held the larger advantage. Another surprise was that GCC 2.95 tended to produce slightly faster code than 3.3.2, with the difference in the 2% range and not much more.
The difference in speed varied greatly throughout the tests and specific operations, so it all depends on the application as well as the particular function it performs.
As with the 32-bit versus 64-bit tests, it all depends on your application and what you’re doing with it. If you’ve got an application that you really need optimized, trying with the three compilers listed here may be worth your time, especially since you can use Sun’s compiler free in an evaluation mode.
That GCC’s performance isn’t all that far off from Sun’s compiler suite (and in a few operations bests it) says quite a bit about the quality GCC has achieved, especially considering the multitude of platforms that GCC operates (beautifully) on.
Related reading:
Are 64-bit Binaries Really Slower than 32-bit Binaries?
My Sun Ultra 5 And Me: A Geek Odyssey
A general note on benchmarking
Don’t know either way, but it would be interesting to know what the COMPILE times were, to see how the Sun and GCC compilers fared for just daily coding.
Also, all of these programs are (I believe) pure C, some C++ programs would be very interesting as well.
(I know, he goes through all of this effort, and yet all we do is criticize anyway. Not my intention, seemed like a good effort.)
Thanks so much, I’ve really enjoyed your series so far on your old sparc box. Keep them coming!
My only question, which I have a feeling you’re gonna address in future articles: did you breathe new life into the SPARC? Or is it gonna sit next to your desk after all this is over?
What would be great now is to have some comments from the gcc team. Why is gcc slower? Is it just some very specific areas (e.g. floating point, long long arithmetic)? What’s planned in the future for getting gcc to produce faster code?
Another thing I miss in the comparison is standards compliance. How well do gcc and the Sun compiler conform to the relevant C and C++ standards?
Its definitely a good article. A real joy to read. Keep up the good work.
I think this suggests that the earlier “32-bit is faster than 64-bit” conclusion is in fact entirely dependent on GCC. Sun’s compiler seems to consistently generate 64-bit code that is the same speed as or faster than its 32-bit version.
What would be great now is to have some comments from the gcc team. Why is gcc slower? Is it just some very specific areas (e.g. floating point, long long arithmetic)? What’s planned in the future for getting gcc to produce faster code?
gcc for sparcv9 is relatively new. Thanks to gcc’s modular backend, the difference isn’t severe in these examples, although from personal experience I’ve seen much more drastic differences in performance of binaries compiled with the Forte Compiler Collection tools versus gcc. When I tried compiling one of our grid analysis tools here (which makes extensive use of 64-bit integers and floating point math) with both compilers, the version compiled with the Sun compiler ran about 40% faster than the version compiled with gcc 3.3.
Sun has been fine tuning their compilers for sparcv8/v9 for over a decade, and it certainly shows.
Another thing I miss in the comparison is standards compliance. How well do gcc and the Sun compiler conform to the relevant C and C++ standards?
By default gcc has some very odd behavior. With the -ansi -pedantic flags gcc behaves a bit nicer. Obviously Sun’s compiler doesn’t support a lot of the gcc-specific extensions which some have seen fit to use (e.g. variadic macros, nested functions, etc.). I believe it’s the responsibility of all programmers to ensure their code is portable across a number of compilers and not bound to a particular toolchain.
Most of the problems you’ll run into trying to compile code developed primarily with gcc/x86 without a lot of portability testing are going to be in things like endianness issues and addressing misaligned words, which is, in my opinion, a coding error.
Yeah, that was a big surprise, which seemed to mean that 64-bit binaries built with Sun’s compiler, even from applications not specifically written for 64-bit, tend to be faster.
Still, the fact that performance wasn’t all that far off says a lot about the quality of GCC, and why it’s in such wide use. They’ve done a fantastic job.
What about a benchmark that tests the more advanced C++ features such as the STL? That would be nice.
But other than that it is a very good article. The performance of GCC is really impressive. I wonder if this is why KDE3.2 is so fast.
All the programs examined here appear to be integer-heavy. SPARC chips are not great integer performers; those who buy them are not looking for integer performance.
Floating-point performance is where SPARC really shines, and is a spot where GCC is traditionally bad. The Sun compiler does amazing things to floating-point-heavy code. The code I run is FP-heavy, and the Sun compilers make a noticeable difference.
The author of this article has really missed the boat. If you want to run standard GNU/Linux type stuff like what was benchmarked here, buy a cheap intel box and run linux. But if you need fp, get the Solaris stuff.
Yeah, that was a big surprise, which seemed to mean that 64-bit binaries built with Sun’s compiler, even from applications not specifically written for 64-bit, tend to be faster.
It’s not that much of a surprise considering the majority of executables that Sun ships with Solaris are 32-bit…
These comparisons are great !
Now, anyone up for benchmarking Tendra vs gcc on x86/linux/*bsd ?
First off good article.
But with the benchmarking CFLAGS for gcc, I think it would be interesting to use “-O2 -march=ultrasparc -fomit-frame-pointer” instead of “-O3 -mcpu=ultrasparc”.
-march generates code for the specified CPU without backwards compatibility, so it should in theory produce faster code.
Also, I have found that -O3 slows some programs down, and -fomit-frame-pointer always seems to speed programs up, but only slightly.
Although I think the conclusion will be the same: the Sun compiler generates better code…
Is Tendra even running on Linux?
And if so, where is the rpm?
I too would like to see some in depth c++ comparisons, if the author could provide them.
Thanks
Wouldn’t the easiest way to keep I/O speed out of the equation for the gzip tests be to redirect the output to /dev/null ? Ie: time gzip -dc /some/file.gz > /dev/null.
Good idea, I hadn’t thought of that. I’m traveling now, so I don’t have access to the system, but I’ll give it a try when I do.
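For reference, Shane’s suggestion amounts to something like the following; the sample .gz file is generated inline here so the sketch stands alone:

```shell
# Time a gunzip with the output discarded, so disk writes drop out
# of the measurement (only the read of the .gz file remains).
dd if=/dev/zero bs=1024 count=4096 2>/dev/null | gzip -c > sample.gz
time gzip -dc sample.gz > /dev/null
```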
It’s that Sun has the same advantage as SGI: they produce their own hardware, so Sun’s compilers will perform better than GCC because they are optimized for that exact platform.
Why no I/O benchmarks? The reason for me to get Sun has always been the ability to run a superior OS (SunOS). Now that there is Linux for Intel CPUs, why get a Sun? Intel still chokes on heavy I/O (I2O? Dunno… never could find a board that supported it OK). How about someone gives Tony a 450 loaded with RAM and disks so we can see some I/O benchmarks?
Yes. If you take that into account, the performance of GCC is even more impressive.
Much of the gcc effort revolves around x86, so if gcc indeed comes this close to Workshop’s performance, it’s very impressive. However, as Jason already pointed out, to make any kind of definitive statement the author should have run the full CPU2000 suite. Gzip alone is not enough to evaluate anything.
gcc 3.x’s -O3 sometimes produces marginally faster code on x86 CPUs (it’s not worth it; stick to -Os for general things and -O2 for hot-spot code that gets executed a lot).
I’ve found that -O2/-O3 being faster doesn’t hold on other architectures. For instance, on my Alpha, gcc 3.x’s -Os produces the fastest code (faster than -O2, -O3, or -O).
I think the Sun compilers would perform a bit better than gcc on higher-end Sun hardware. A U5 even in its day wasn’t a terribly impressive machine compared to the U60 workstation line. Today’s workstations versus desktops are even more spread out if you compare CPU cache and memory bandwidth. A SunBlade 2k versus a Blade 100 is a significant difference.
-Os produces buggy code sometimes. It is not advisable to use it for most code. I know Gnumeric, Abiword and some GNOME games segfault and crash with -Os.
But Tony, Sun’s compiler is so expensive! Yes, it is expensive, especially when compared to the free (as in beer and freedom) GCC. However, Sun does offer a free 60-day evaluation license for their compiler suite, which is what I used.
For Sun’s compiler, I used the 60-day trial for Sun ONE Studio 7, which can be found here. It includes C, C++, and Fortran compilers, as well as Java and other development tools. The full version lists for $2,995.
It is possible to obtain these compilers for a substantially lower price. See:
http://wwws.sun.com/software/cover/2003-1027/index.html
If you run your own small business, you can get it for as low as $105/yr, which, compared to the list price, is quite a deal.
Mr. Bourke,
What’s the difference between the compilers when profile directed optimization comes into play?
In regards to I/O bound issues, as an addition to /dev/null, you could try running inside of /tmp and/or switching your UFS filesystems to logging,noatime.
Yours truly,
Jeffrey Boulier
I would greatly appreciate it if the actual numbers from the tests were given. By just looking at the graphs, it is difficult to tell just how pronounced the differences are.
It is good to see someone at least trying to verify the “Sun vs. GNU” statements.
Geoff
Yes. If you take that into account, the performance of GCC is even more impressive
Not really. The vast majority of code optimizations are not platform-specific. gcc sports a modular backend which makes it easy to port to other architectures. When faced with complex mathematical code, I’ve found (see my previous post in this thread) gcc to be a poor performer on sparcv9 (doing calculations based on large sets of 64-bit fixed-point grid coordinates).
Reading your story about how gcc is almost as fast as Sun’s compiler, I thought I’d try it for myself given that I have both of them handy. I use Sun’s Forte compiler on my product on Solaris/SPARC.
Here are the compiler versions I currently have on a 300 MHz Ultra 60.
gcc: gcc version 3.3
cc: Forte Developer 7 C 5.4 2002/03/09
I’m doing 32 bit only, for now.
The application will be a memory manager, so one can safely say that this is integer-based through and through. Lots of loads and stores with many register ops in between. There isn’t any reading/writing to/from disk, so no bottlenecks there. Just straight load/store/register operations.
Compiler flags used: I don’t know if I’m using the right compiler flags for gcc, and in fact maybe I could use better compiler flags on cc as well, but nevertheless, here’s what I’m currently using. If anyone wants to suggest better flags to try, please let me know and I can rerun the tests.
Compiler flags SPARC and INTEL platforms:
gcc flags: -O3
-fexpensive-optimizations
-finline-functions
-ffast-math
-fomit-frame-pointer
cc flags: -fast
The numbers are the time it takes to complete the test, so lower numbers are better:
gcc: 55s
cc: 44s
Did each run 5 times, took the average.
Sun is 25% faster by my math.
Sun’s compiler is much faster. It takes longer to compile the application, but that’s not what is important. What’s important is how fast the resulting binaries are.
On the INTEL side, I get the following results using these compilers.
gcc: gcc version 3.3.2
cc: Sun WorkShop 6 update 2 C 5.3 2001/05/15
Execution times (lower times are better):
gcc: 95s
cc: 95s
On INTEL, in my application, they are equal.
Any questions, email me: [email protected]
Jim
This may well be a minor point, but I think it would be helpful if Tony ran his benchmarks more than three times apiece and averaged the results. While he claims that his numbers were pretty much consistent, I think it would be instructive to run the benchmarks a large number of times to be sure the data is accurate. With a large dataset, you can easily pick off the outlying points and get a better idea of what your distribution is. My experience is that you mostly get a lot of hits around one data point; however, finding bi-modal cases can be an indication of complex/interesting behavior that might warrant further investigation. It’s a minor point, but it would certainly lend additional credibility to his claims.
that for us it ain’t worth investing in proprietary compilers.
I work for a small company producing an open, cross-platform Seismic interpretation platform (www.opendtect.org).
We compile our suite with gcc on all platforms (including win32) and that is where the great benefit of gcc lies: it is the same on all platforms. If it compiles on Linux, it almost certainly also compiles on Solaris, SGI, and even win32, if you leave differences in APIs out of the equation.
Whatever you do with templates and whatever clever C++ constructs you might come up with, if it compiles on one platform, it compiles everywhere.
And yes, it might cost some performance, but I don’t think a performance loss on the order of 5-10% on average is a big deal, compared to the benefits of having a single, standards-compliant compiler across all platforms.
This is not only beneficial to the developer (after all, the customer ‘pays’ the performance penalty, not the developer): if I spend less time porting and debugging, I can spend more time developing new features or writing better documentation.
By the way, besides the compiler there is also gdb, which is likewise the same across all platforms, even though the SGI one does not support debugging of multi-threaded applications.
So, I personally prefer the cross-platform benefits of gcc over the performance gain of proprietary alternatives without one second of hesitation.
If you’d like to see a bit more performance difference, try a really FP-intensive code like Dyna, use the latest compiler (currently version 8), enable full optimization (at least “f90 -fast -v9b” or similar), and run on a modern machine (UIIICu, UIIIi, or UIV processor).
For the gcc compiler, -march=v9 should work better.
-funroll-all-loops too.
-fomit-frame-pointer could give you 10% more performance.
You should also try -Os and -O2
So, okay. This is a decidedly integer-heavy test. Given that at least one other person has commented on the floating-point focus of Sun’s cc and the Solaris platform, I’d love to see some good heavy floating point done. Since we’re talking about real-world performance, it might also be a good idea to get some fixed-point in there.
I’m a little surprised that you’re using GCC 3.3.2 when GCC 3.5 is out. GCC 3.5 improves ARM7 code speed by as much as 15%; I would be excited to learn how it performed on SPARC, both in comparison to Sun’s cc and to older GCCs.
I’d also like to see how the compilers hold up on Dhrystone; on general-purpose algorithms like the Mersenne Twister or Boost’s lagged Fibonacci RNGs; on large-scale, high-variance code like the Boost regression tests, the Loki tests, and some STL test suite or another; on large patterned number manipulation like GIMPS or a GMP rigor test; et cetera.
This is a wonderfully neat page, but it could use some work in the way the tests are done. I suggest a look at David Welch’s compiler performance page for the GameBoy Advance, which though just one test is a test done in a rather more rigorous fashion. http://www.dwelch.com/gba/dhry.htm
Your articles are great, but I would suggest running the test more than three times, say at least ten times, to be able to use common statistical tools to evaluate significance of your results.
I’m not an expert (far from it) in compiler technology, but it seems to me that there would be some improvement in performance with the GCC compilers if they had both been built from source with the Sun cc compiler. The problem I see with the setup used here is that a binary version of the GCC 3.3.2 compiler was installed, so it was likely optimized for a different piece of hardware or not optimized at all, and then the 2.95 version was built with this non-optimized compiler.
I recall that the SPARC processor has an equivalent instruction set to MMX. I also recall that the Sun C compiler supports this and that GCC does not. For those of us involved in signal processing and image processing this would be a big win.
Also this would seemingly be a very big win for the Sun C compiler.
Any thoughts on this?
Also, it would be really nice to see some benchmarks with more floating-point-intensive operations.
– Andrew
While the article is interesting, it neglects to mention that Sun’s C compiler supports a lot of things that gcc does not, like automatic parallelization (-xautopar) and OpenMP support. I’ve found that these can provide big wins on SMP machines (I’ve done some testing on a 6-processor V880 at my university).
Take a look at the cc man page – http://developers.sun.com/tools/cc/documentation/s1s8cc_documentati…. About half of it is dedicated to the various optimization flags.
Hi Tony,
Your article makes for interesting reading, and it is very close to my own experience using SunOS and Solaris for more than 15 years. Yet I must differ with you on the actual rigor of the tests, because the optimization flags you used are *very different*; as it stands, your performance comparisons amount to comparing apples and oranges. To have equivalent testing conditions, you should have used “gcc -O3 -march=ultrasparc -mcpu=ultrasparc [-m64]”, because using -mcpu alone only sets tuning switches and takes no advantage of the particular register optimizations available in the target architecture. In the same vein, “-fast” is the bane of the SunPro compilers: it creates binaries that, as a matter of fact, contain ABI incompatibilities with system shared libraries! Rather, you should have used “cc -xO3 -Olimit=<something very high> -xarch=v8plusa|v9a” to build equivalent binaries and therefore have a valid comparative test.
In summary, if I were your technical editor or your academic supervisor, I’d have you repeat all the experiments with an adjusted experimental model.