“I have a Pentium 3 866MHz CPU. After reading the freshmeat article on optimizing GCC a few days ago, it got me thinking. So I posed the following question: How much faster would gcc compile the kernel if gcc itself was optimized?” Read the rest at Linux Gazette.
Optimizing gcc itself is a bad, BAD idea. The packages glibc, binutils, and gcc together make up the GNU “toolchain,” or a sane building environment. None of these packages should /ever/ be optimized (although somewhere in the glibc docs there is a set of minimal optimizations that are said to work). This includes both -march and -mcpu options, and virtually anything that starts with a -f. -O2 is usually the default. Always unset those CFLAGS and CXXFLAGS before you compile any of these packages! Finally, remember that they like to be built in their own directory, and not in the untarred source directory.
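For the record, a clean toolchain build looks something like this (a rough sketch; the version number and prefix are just placeholders):

unset CFLAGS CXXFLAGS                 # make sure no stray optimization flags leak in
tar xjf gcc-3.2.2.tar.bz2
mkdir gcc-build && cd gcc-build       # build in a separate directory, not the source tree
../gcc-3.2.2/configure --prefix=/usr
make bootstrap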
Cool. As a newbie it’s good to see how to do this stuff. Right now I use Knoppix, but I’ll be compilin’ soon.
There’s so much cool knowledge in the Linux world. It’s too bad there’s no way to bottle it and use it to promote the cause.
One thing you learn when writing compilers is that you can have a fast compiler or an accurate compiler. The two do not generally go hand in hand.
It’s your gun and your foot. Don’t say no one warned you.
> One thing you learn when writing compilers is that you can have a fast compiler or an accurate compiler. The two do not generally go hand in hand.
I beg to differ. At Borland, we created many fast and accurate compilers, starting with Turbo Pascal, Turbo C, and going onto Turbo Prolog, Borland C++, Delphi, etc.
You need skilled compiler writers to get both speed and accuracy, but it is quite achievable.
Aggressively using optimization flags on your compiler can lead to bad code generation, certainly. But that doesn’t guarantee that “no optimization” is putting out proper code either.
Proper testing will find the bugs and then you can fix them. At Borland, not only did we have a large and very skilled internal QA staff, we also had active alpha and beta tests and giant internal bug hunts with incentives. It worked and Borland has traditionally been well rated when it comes to quality code generation.
–ms
The gcc compilation itself is optimized – the resulting gcc binary will run faster than it would if it were compiled without optimization.
That all doesn’t change the code that gcc will create.
To improve the code output, you need to improve the optimizers, manually, by hand. Make it a better program by programming something better!
(General rule to all Open Source.)
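To see the point, imagine building the same compiler source twice, once optimized and once not (fast-gcc and slow-gcc are hypothetical names for the two binaries; foo.c is any source file):

fast-gcc -O2 -S -o fast.s foo.c    # gcc built with optimization
slow-gcc -O2 -S -o slow.s foo.c    # the same gcc built without optimization
diff fast.s slow.s                 # expect no output: identical code, only compile time differs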
And how many of his options are already included in -O3 and -march=CPU? And -mcpu has no influence if -march is specified too.
> And how many of his options are already included in -O3 and -march=CPU?
None of the options were covered by -O3 (I’m a bit surprised that -funroll-loops isn’t).
I’m not exactly sure what -mmmx and -msse actually do. I thought they controlled the generation of MMX and SSE instructions and were implied by -march, but now that I read it again it looks like they just enable the use of MMX and SSE intrinsic functions. OTOH I’ve read reports that GCC sometimes generates broken code for some targets when they aren’t specified, so it may not be a bad idea to include them.
According to the gcc-3.3 documentation that describes precisely which optimizations -O2 and -O3 activate, -fomit-frame-pointer is included in -O2. Also -mcpu=, -mmmx and -msse are all included in -march=.
I think he also overlooked the two flags -ffast-math and -fstrict-aliasing. They should be mandatory for any optimized build.
“Optimizing gcc itself is a bad, BAD idea. The packages glibc, binutils, and gcc together make up the GNU “toolchain,” or a sane building environment. None of these packages should /ever/ be optimized.”
If optimizing these packages causes problems, then surely either your compiler is messed up (since the behavior of the code differs depending on optimization) or the calling code is using the libraries in undocumented ways (relying on the exact compiled form of the code rather than the APIs).
Basically this sounds like rubbish to me.
> According to the gcc-3.3 documentation that describes precisely which optimizations -O2 and -O3 activate, -fomit-frame-pointer is included in -O2.
Not AFAICT. The current (pre-3.3) docs say that -O turns on -fomit-frame-pointer for targets where it doesn’t affect debugging, but none of the -O* options seems to turn it on universally. I just tested -O2 and -O3 with GCC 3.2.2/x86, and frame pointers were still being used in both cases.
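An easy way to check on your own box (a sketch; t.c is a throwaway file, and the grep just looks for frame-pointer setup on x86):

echo 'int f(int x) { return x + 1; }' > t.c
gcc -O2 -S -o - t.c | grep ebp                        # prints %ebp lines if frame pointers are kept
gcc -O2 -fomit-frame-pointer -S -o - t.c | grep ebp   # should print nothing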
> I think he also overlooked the two flags -ffast-math and -fstrict-aliasing.
-fstrict-aliasing is included in -O2.
I’m a little wary about using those options on random programs, since the former can change the FP semantics, and the latter depends on people not doing strange things.
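Here is the kind of “strange thing” -fstrict-aliasing assumes nobody does (a sketch; alias.c is a throwaway file, and the exact outcome varies with the gcc version):

cat > alias.c <<'EOF'
#include <stdio.h>
int main(void)
{
    float f = 1.0f;
    unsigned int *p = (unsigned int *)&f;  /* violates the aliasing rules */
    *p = 0;              /* the compiler may assume this store cannot touch f... */
    printf("%f\n", f);   /* ...and keep f in a register */
    return 0;
}
EOF
gcc -O0 alias.c -o alias && ./alias   # prints 0.000000
gcc -O2 alias.c -o alias && ./alias   # may print 1.000000 (-fstrict-aliasing is on at -O2)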
In my experience only the very latest gcc/glibc actually benefits from several of the pentium4 optimizations. I am currently using glibc-2.3.2_pre1 and gcc-3.2.2-r1, which has enabled me to compile all packages on my system with CFLAGS/CXXFLAGS="-march=pentium4 -O3 -pipe -mmmx -msse -msse2 -mfpmath=sse".
Previous attempts to use such optimizations failed miserably due to bugs in gcc itself. The only thing I don’t optimize on my system is the kernel itself, but I have used these settings for virtually everything else on my system (KDE3.1-r2/GNOME2.2/e17/XFCE-cvs/Mozilla-1.31alpha/Phoenix-0.5/Galeon-1.31/GIMP-1.3/GAIM-cvs/Abiword-cvs/OpenOffice-1.0.2 etc.).
And each time I compile a new version of gcc/glibc I also re-compile the entire toolchain with the same optimizations, ensuring self-consistency.
There are so many steps involved in tweaking the last oomph out of your Linux system, and it really is a work of art to pull it off. I have used many different kernels and all sorts of optimization combinations. Yesterday I finally used the noatime and notail mount options for my reiserfs file system: the single biggest performance boost I have yet seen. Now I can have GNOME 2.2 running, using gnome-terminal to compile the latest j2sdk from source (nice -n 19), while browsing with Mozilla, while running e17 in a separate login with two Eterms, and run Unreal Tournament at full speed (this with an Apache webserver running for my dyndns pseudo-domain, and a MySQL for my answering-machine software for my ISDN card, which keeps track of all incoming phone calls and manages my telephone book app, plus nfsd/sshd/dhcp server/squid)…
Now that is multitasking… and this on a 1.3GHz Pentium4 w/ GeForce4 MX440 w/ 768 MB RDRAM and 120 GB over 2 hard drives. And I am only using the Gentoo linux-2.4.19-r10 kernel with Con Kolivas’ patches… All the moaning about desktop multimedia multitasking is exactly that: moaning. If one takes the time and pays attention to the details, one can get a 70%-100% speed improvement over a stock install of any distro. But understanding how the myriad of inter-related configurations work is not easy, and it is only worth it for those who enjoy the challenge! The work I have invested in getting my machine to run stable and fast has kept me from investing in a CPU upgrade: I bought the 423-pin Willamette to 478-pin Northwood adapter ($40) and, due to the recent improvements to system speed, have yet to feel the need to actually go and buy a 2.4GHz 400MHz-FSB Pentium4.
Of course, Gentoo is what enables me to do this…
tips for speed:
use the latest toolchain (gcc-3.2.2 / glibc >=2.3.1)
optimize your compilations: the latest gcc/glibc can take aggressive optimizations. Experiment; your mileage may vary. Try to use the optimization settings that work for all programs
use ext3 or reiserfs, and if you use reiserfs mount it with the noatime and notail options (see the sketch after this list)
the more memory the better; >=512 MB means not having to hit swap
carefully tune your kernel config, eliminate unnecessary stuff
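A minimal sketch of the reiserfs tip above (device and mount point are placeholders):

# /etc/fstab entry:
/dev/hda3   /home   reiserfs   noatime,notail   0 0

# or apply it to an already-mounted filesystem:
mount -o remount,noatime,notail /home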
Someone already mentioned that optimizing gcc/glibc/binutils is dangerous, and my experience with compiling (a year and a half of compiling my systems) is that these packages can be optimized, but they require a huge amount of testing to make sure that they actually work. I’ve optimized all three and found out days later that a problem I was having was because I had optimized them to the point of breaking my system in strange ways.
Just because they compile with optimizations doesn’t mean that they have been built stable enough to compile everything else for you. It’s just a matter of finding the minimal optimizations that will work for you. And considering how long those packages take to build, along with the fact that errors may take days or weeks to show up, that suggests a long debugging process.
Just some food for thought.
Optimizing is an art
From my experience, the:
“make BOOT_CFLAGS=<optimization flags> bootstrap”
doesn’t end up comprehensively applying those flags to your final build. Try specifying:
BOOT_CFLAGS="-O3 -march=pentium3"
for instance, and capture all the output of make with tee, then look at how many “-g -O2” lines appear in the output. The flags specified with BOOT_CFLAGS don’t seem to propagate successfully throughout the build tree.
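Something like this makes the leakage easy to spot (a sketch; build.log is a throwaway file):

make BOOT_CFLAGS="-O3 -march=pentium3" bootstrap 2>&1 | tee build.log
grep -c -- '-g -O2' build.log    # a non-zero count means the default flags leaked through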
Re infwis:
There are still known codegen bugs related to SSE/SSE2 in gcc itself, and as someone who found some of them and has looked at the Gentoo ebuilds, I know you won’t experience any speedups, because the ebuilds explicitly strip any P4 arch settings from your CC/CXXFLAGS.
GCC also produces pretty shitty code for the P4, IME. I don’t know why; it’s probably an issue of tuning to some architectural peculiarities.
>And -mcpu has no influence if -march is specified too.
Certainly it has. E.g. -march=i386 -mcpu=i686 (this is what Red Hat compiles most packages with) -> the generated code only uses the i386 instruction set, but is optimized for an i686: optimized as in the layout of the code (better branch prediction, more cache hits) and similar. The difference from using only -march=i686 (which also implies -mcpu=i686) is very, very small.
The best way to speed up compiles is to add RAM to your machine. Compile on your speedy disk (the SCSI one, not the IDE) and add the magic “-j” switch to your make statement.
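For example (a sketch; the usual rule of thumb is one job per CPU plus one):

make -j3    # e.g. on a dual-CPU box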
Ludovic
—
http://homepage.mac.com/softkid/
Running gentoo here… gentoo already optimises gcc to whatever flags you have set.
🙂
– which is why I have edited the ebuilds myself, fine-tuning them. Several ebuilds simply wipe the slate as regards optimizing, overriding your settings in /etc/make.conf. The recent gcc/glibc does not have those same errors with SSE/SSE2, AFAIK.
While I agree that Delphi compiles very fast, C++Builder certainly does not. I’m currently working on a rather large project using C++Builder, so I know what I’m talking about. Even the modification of only one line in the sources results in compilation times of several minutes.
On a related note, I don’t understand how the compiler handles dependencies. Often I observe lots of source being recompiled that shouldn’t have been affected by my code changes.
MS Visual Studio seems to handle these things much better. OTOH C++ Builder is *much* better for building GUIs.
I think you’re forgetting that compilers are more likely to output broken (i.e. buggy) code when used on high optimisation settings, especially on non-x86 targets.
gcc is pretty slow at compiling (and getting slower; see the thread on gcc-devel), but I’d rather use ccache/gcc than risk being bitten by code generation bugs.
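ccache drops in like this, if you haven’t tried it (a sketch):

export CC="ccache gcc" CXX="ccache g++"
./configure && make    # the first build fills the cache; rebuilds hit it and skip recompilation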
–Jon
>And -mcpu has no influence if -march is specified too.
Certainly it has.
So, is there anything in the i386 section of the manual that isn’t wrong, misleading, or incomplete?
> The difference from using only -march=i686 (which also implies -mcpu=i686) is very, very small.
Not when you are talking about i686: it happens to be the architecture that enables cmov instructions, and they alone give a speedup of 10-15% on byte-manipulation and parsing code. On gcc I would guess they account for a large part of the speed-up.
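You can see this directly (a sketch; m.c is a throwaway file, and whether gcc actually emits cmov here depends on the version):

echo 'int max(int a, int b) { return a > b ? a : b; }' > m.c
gcc -O2 -march=i686 -S -o - m.c | grep cmov   # i686 may use cmov instead of a branch
gcc -O2 -march=i386 -S -o - m.c | grep cmov   # i386 cannot: expect no output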
Has anyone tried to compile gcc using icc?
How fast is it?
Several of those packages you listed won’t even compile with -march=pentium4. So perhaps you were hallucinating or high when you thought you edited them for that and built them.
As for [email protected]:
Currently the -march=i386 option is equivalent to -mcpu=i386. There is no difference. That RedHat uses it is stupid, as the chances of gcc adding some i386 specific optimizations at this time is probably about the same as hell freezing over.
The only -march flags that are different from -mcpu on x86 are for CPUs that have mmx, sse, sse2, or 3dnow insns (i386 does not).
Quote:
“So perhaps you were hallucinating or high when you thought you edited them for that and built them.”
nope…
Quote:
“Several of those packages you listed won’t even compile with -march=pentium4.”
To date I have only found 2-3 apps which won’t compile with these settings; libmpeg was one of them. There are certainly some apps which I have yet to recompile with these settings, but I have been using these settings and editing ebuilds to take advantage of them for the last couple of weeks now, and I have recompiled most of my system this way… Perhaps your experiences differ.
GCC has never been known to be a fast compiler. In fact, on Mac OS X alone, GCC (even 3.1) lags far behind the commercial C compilers in compile time and speed of code generated (sometimes up to 20x slower than Metrowerks on compile time). Apple has been doing work to get GCC up to speed and it seems to be helping. (They introduced a pre-compiled headers mechanism, etc…) But man, I hope things get better; GCC has a way to go…
yeah, for example mozilla blows up:
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&d…
KDE also explodes:
…
-O2 -march=pentium4 -O3 -pipe
…
compute.c: In function `generate_sector’:
compute.c:175: unable to find a register to spill in class ‘FLOAT_REGS’
compute.c:175: this is the insn:
(insn 187 184 189 (set (reg:DF 9 st(1) [117])
(float_extend:DF (subreg:SF (reg/v:DI rxmm0 [82]) 0))) 133
{*extendsfdf2_1} (nil) (nil))
compute.c:175: confused by earlier errors, bailing out
make: *** [compute.o] Error 1
glibc also has some hidden bugs that I’ve yet to track down, but probably not SSE related.
—
Re: GCC has always been slow
Another Apple guy is apparently working on a “compile server.” God, it’s sad.
You referred to this link:
http://gcc.gnu.org/cgi-bin/gnatsweb.pl?cmd=view%20audit-trail&a…
gcc version 3.2.2 20021218 (prerelease)
Well, that is somewhat older than the version I am using (3.2.2-r1, six weeks later), and glibc has changed since then too (2.3.2_pre1)… Mozilla compiles fine for me with pentium4; did it today with mozilla-1.3beta…
As for KDE, I believe I only had a problem with arts and kdemultimedia…
The bad news is: if, after using optimization, the program behaves differently, THE COMPILER’S OPTIMIZATION IS BROKEN, and no real optimization is taking place. If even regular compiling gives you unpredictable results, then the program is not a compiler, it’s a poor syntax checker, as its code generation is garbage. A compiler needs to create stable, usable, and repeatable binaries for libraries and executables.
Quoted from the web
“make … Debug=opt
Build the client programs optimized but also with symbolic information for the debugger. These monsters are sometimes needed when chasing for compiler optimization bugs. The binary files produced will have a suffix -dO.”
> Currently the -march=i386 option is equivalent to -mcpu=i386. There is no difference. That RedHat uses it is stupid, as the chances of gcc adding some i386 specific optimizations at this time is probably about the same as hell freezing over.
I tried compiling with -march=i386 -mcpu=athlon-xp and just -march=i386, and they did indeed produce different code. For example, the former used add and sub to adjust the stack (which should be faster), while the latter used push and pop. I don’t know how significant little things like that will be, but Red Hat’s approach does have some effect.
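Easy to reproduce (a sketch; test.c is a throwaway file, and the exact differences depend on the function and gcc version):

echo 'extern int g(int); int f(int a, int b) { return g(a) + g(b); }' > test.c
gcc -march=i386 -S -o plain.s test.c
gcc -march=i386 -mcpu=athlon-xp -S -o tuned.s test.c
diff plain.s tuned.s    # scheduling and stack-adjustment differences (if any) show up here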
> The bad news is: if, after using optimization, the program behaves differently, THE COMPILER’S OPTIMIZATION IS BROKEN, and…
Obviously, such differences are called bugs 🙂 Bugs are more likely to occur on high optimisation settings since a. they’re doing much fancier things, and b. they receive less testing than lower settings.
If anyone has been using GNU C/C++ since 2.6, they’ll remember that they had to add the ‘-fno-strength-reduce’ flag to avoid one such bug.
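For the youngsters, the workaround looked like this (foo.c is a placeholder):

gcc -O2 -fno-strength-reduce -c foo.c   # disable the pass that was miscompiling some loops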
–Jon