compiler optimization flags
Author | Message |
---|---|
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Can anyone suggest good compiler optimization flags for the different platforms we're compiling for? |
Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
Mac PPC: -arch ppc -O2 -maltivec -mabi=altivec -mcpu=7400; Mac x86: -arch i386 -O2 -msse2 -mfpmath=sse -mtune=prescott; Mac x86_64 (mostly untested): -arch x86_64 -O2 -mfpmath=sse -mtune=nocona. EDIT: Oh yeah, for the Intel platforms, feel free to add -msse, -msse3 or -mssse3 too. I've just found SSE2 has the biggest impact. |
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
"Can anyone suggest good compiler optimization flags for the different platforms we're compiling for?" GCC 4.3 is recommended for Linux. i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math; x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize. Visual Studio 2008 is recommended for Windows. windows_intelx86: -Ox -GL -fp:fast; windows_x86_64: -Ox -GL -fp:fast. Finally, not for performance but for compatibility, you should try to eliminate the dependency on some dynamic libraries, as I pointed out here. HTH |
Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
"i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math" Hmm, forgot about those flags... BTW, x86_64 machines are guaranteed to have at least SSE2 or 3 (I think it's 3), so that can be enabled for that build. EDIT: Scratch that. Early AMD64 architectures only had SSE2, so that's the safe one. Even so, it would enable some nice speedups compared to non-SSE compilation. I'm also going to try some of those other flags on my machine, provided I can get some work... |
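
A minimal sketch for checking at compile time whether a given build really has SSE2 code generation switched on. It relies on the __SSE2__ macro that GCC and Clang predefine when -msse2 is in effect; the function names are made up for illustration, and MSVC builds are not covered.

```c
#include <stdio.h>

/* Returns 1 if this file was compiled with SSE2 code generation
 * enabled, 0 otherwise. GCC and Clang predefine __SSE2__ when -msse2
 * is in effect, which is the default for x86_64 targets. */
int built_with_sse2(void)
{
#if defined(__SSE2__)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints the result, e.g. into a build log. */
void report_sse2(void)
{
    printf("SSE2 code generation: %s\n",
           built_with_sse2() ? "enabled" : "not enabled");
}
```

Calling report_sse2() once at startup of each platform build would show whether the flags reached the compiler.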
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
Dev-C++ also enables '-fexpensive-optimizations' if you tell it to 'Perform a number of minor optimizations'. Is this one worth it? |
Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0 |
Folks, -O3 is dangerous in GCC 4 and is not recommended. It has proven in the past to cause errors at run time. -O2 is the highest we can use. Dave Przybylo, MilkyWay@home Developer, Department of Computer Science, Rensselaer Polytechnic Institute |
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
"Folks, -O3 is dangerous in GCC 4 and is not recommended. It has proven in the past to cause errors at run time. -O2 is the highest we can use." This is an urban legend nowadays. -O3 has been solid since GCC 2.95. Why else would SPEC benchmarks be submitted using this very same option with GCC? HTH |
Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0 |
Here are the opt flags I'm using on a Core2 Duo in 32-bit Linux (gcc 4.1): -O2 -fomit-frame-pointer -frename-registers -fweb -fexpensive-optimizations -fno-strict-aliasing -march=i686 -msse3 -mfpmath=sse. A note about flags seen above in this thread: -ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips a lot of validity tests and math exceptions, and it may also lead to bad rounding (inferior precision on decimals). It's a no-no for Seti, for instance; I don't know about Milkyway. |
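
One concrete case of a skipped validity test is the NaN check. The sketch below assumes GCC or Clang, which predefine __FAST_MATH__ under -ffast-math, and shows the self-comparison idiom that the compiler is allowed to fold away once it may assume NaNs never occur. All function names are hypothetical.

```c
#include <assert.h>

/* 1 if compiled with -ffast-math (GCC and Clang predefine
 * __FAST_MATH__ in that case), else 0. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Classic NaN test: under IEEE 754 rules NaN is the only value that
 * compares unequal to itself. With -ffast-math the compiler may assume
 * NaNs never occur and reduce this to a constant 0. */
int is_nan_by_self_compare(double x)
{
    return x != x;
}

/* Produces a NaN at run time as 0.0 / 0.0; volatile keeps the
 * compiler from evaluating the division at compile time. */
double make_nan(void)
{
    volatile double zero = 0.0;
    return zero / zero;
}

/* Sanity check for builds that rely on IEEE behaviour: aborts if the
 * NaN test above no longer detects a NaN. */
void require_nan_detection(void)
{
    assert(is_nan_by_self_compare(make_nan()));
}
```

Whether this matters depends on whether the application ever tests for NaN or infinity; code that never does is unaffected by this particular relaxation.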
Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0 |
I've tried -O3 previously with this project and it caused run-time errors, so I had to go back down to -O2. Might give it a shot again though. Dave Przybylo, MilkyWay@home Developer, Department of Computer Science, Rensselaer Polytechnic Institute |
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
"-ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips a lot of validity tests and math exceptions, and it may also lead to bad rounding (inferior precision on decimals). It's a no-no for Seti, for instance; I don't know about Milkyway." Not so. It does relax floating-point exception handling and can produce slightly different results, though seldom different enough to be noticed when the results are output in decimal format. It's still quite usable and IS used by other projects, including SETI (I used it when I did the official port of SETI Classic to x86-64). HTH |
Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
From man gcc on my system: -ffast-math gcc version 4.0.1 (Apple Inc. build 5488) |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"-ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips a lot of validity tests and math exceptions, and it may also lead to bad rounding (inferior precision on decimals). It's a no-no for Seti, for instance; I don't know about Milkyway." Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math. |
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
And here's what they mean: |
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
"Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math." As I explained above, it should still be accurate enough (down to 1 or 2 ULPs), well within the error margin of finite floating-point math. HTH |
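
To put a number on "down to 1 or 2 ULPs", the outputs of two builds can be compared by counting how many representable doubles lie between them. A minimal sketch, assuming IEEE 754 binary64 doubles and non-negative inputs; ulp_distance is a hypothetical helper, not part of the project.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Distance in ULPs between two non-negative doubles. For such values
 * the IEEE 754 binary64 bit patterns, read as unsigned integers, are
 * ordered the same way as the values themselves, so the difference of
 * the patterns counts the representable doubles between a and b. */
uint64_t ulp_distance(double a, double b)
{
    uint64_t ia, ib;

    /* NaN fails this check too, since NaN >= 0.0 is false. */
    assert(a >= 0.0 && b >= 0.0);

    /* Map -0.0 to +0.0 so the sign bit never enters the comparison. */
    if (a == 0.0) a = 0.0;
    if (b == 0.0) b = 0.0;

    memcpy(&ia, &a, sizeof ia);  /* well-defined way to read the bits */
    memcpy(&ib, &b, sizeof ib);
    return ia > ib ? ia - ib : ib - ia;
}
```

Running the same work unit through a build with and without -ffast-math and feeding the paired results to such a function would show how far apart they actually are for this application.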
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
Allow me to repeat here a suggestion to make the application more portable across differently configured Linux systems. libgcc should not be linked dynamically, because that requires the volunteer systems to have the same version of GCC as the one used to build the application. Instead, specify the option -static-libgcc when linking, to link it statically. libstdc++ may cause the same compatibility grief, but linking it statically is a bit more involved. Namely, when linking, use "gcc" instead of "g++" and specify the options "-Wl,-Bstatic `gcc -print-file-name=libstdc++.a` -Wl,-Bdynamic". See also this. HTH |
Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0 |
"And here's what they mean:" Here is what 'man gcc' says: -funsafe-math-optimizations Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math. The default is -fno-unsafe-math-optimizations. I can assure you that using -ffast-math in optimized apps such as Seti's can lead to INVALID results (i.e. results considered not precise enough when Seti@Home validates your results by comparing them with others').
One or two bits of mantissa, perhaps, but for *each* operation: the result after many consecutive ops can be quite significant. Let me give you an example. Suppose the FPU has only 7 decimal places of precision (modern FPUs have many more, but this keeps the example simple), and take this simple operation: 15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit). Should it be optimized (for example, by out-of-order operation optimizations) as 10 / 1000000000 * 15, then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals), and 0.0000000 * 15 = 0.0000000 in the end... Believe me, the above effect is far from negligible... |
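
The 7-decimal example above can be reproduced with a toy model. This is only a sketch of the post's hypothetical machine, not of a real FPU: trunc7 and the two evaluation-order functions are illustrative names, and truncation after every operation is the assumption taken from the example.

```c
#include <assert.h>

/* Toy model of the example: keep only 7 digits after the decimal
 * point and discard the rest. Inputs must be non-negative and small
 * enough that x * 1e7 fits in a long long. */
double trunc7(double x)
{
    long long scaled;

    assert(x >= 0.0 && x < 9e11);
    scaled = (long long)(x * 1e7);  /* the cast truncates toward zero */
    return (double)scaled / 1e7;
}

/* (a * b) / c, truncating after every operation. */
double mul_then_div(double a, double b, double c)
{
    return trunc7(trunc7(a * b) / c);
}

/* (b / c) * a, truncating after every operation. */
double div_then_mul(double a, double b, double c)
{
    return trunc7(trunc7(b / c) * a);
}
```

With a = 15, b = 10 and c = 1000000000, mul_then_div keeps the leading digit (0.0000001) while div_then_mul returns 0, because 10 / 1000000000 already falls below the seventh decimal place.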
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
"One or two bits of mantissa, perhaps, but for *each* operation: the result after many consecutive ops can be quite significant." That's because your example loses one decimal digit, not one bit. Besides, all FP operations have an average error of 0.5 bit by definition. We're talking about a difference beyond the 15th decimal digit! If the output of the application is truncated to the default 5 digits, it'll never even show up. HTH |
Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
And here's what they mean: |
Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0 |
You don't seem to understand that in a chain of many operations (or worse, in a loop where each iteration uses the result of the previous one, as in a series), your 15th-decimal error will grow to the 14th, then the 13th, and so on, every dozen or so operations. In the end, the error might show in the 5th, 4th or even 3rd decimal, depending on how many loops you went through... and calculations such as BOINC's rely precisely on complex calculations done within numerous loops. Don't use -ffast-math. Period. |
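
The accumulation effect described here can be observed with ordinary IEEE arithmetic, independent of -ffast-math. The sketch below uses hypothetical names and a single-precision accumulator, chosen only so the drift shows up quickly; it adds the same term many times and compares against a double-precision reference.

```c
#include <assert.h>

/* Adds `term` to a single-precision accumulator n times. The volatile
 * qualifier forces every partial sum to be rounded to float, so the
 * result does not depend on extended-precision registers. */
float naive_sum(float term, long n)
{
    volatile float sum = 0.0f;
    long i;

    assert(n >= 0);
    for (i = 0; i < n; ++i)
        sum = sum + term;
    return sum;
}

/* Same sum with a double accumulator, used as the reference. */
double reference_sum(float term, long n)
{
    double sum = 0.0;
    long i;

    for (i = 0; i < n; ++i)
        sum += (double)term;
    return sum;
}

/* Absolute difference between the two sums. */
double drift(float term, long n)
{
    double d = (double)naive_sum(term, n) - reference_sum(term, n);
    return d < 0.0 ? -d : d;
}
```

For ten additions of 0.1f the drift is tiny; for ten million it reaches the integer digits, since each addition is rounded to the spacing of floats near the running total. This shows only that per-operation rounding compounds in loops. How much -ffast-math adds on top of that for a given application would have to be measured.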
Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
This isn't related to the -ffast-math discussion, but it looks like the x86_64 compile for Linux isn't actually doing x86_64. The i686 target has -m32 in the CXXFLAGS, but x86_64 doesn't have a -m64 flag anywhere. Good to see SSE2 is enabled though. |
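
One way to confirm what a given binary was actually built for is to have it report its own pointer width and target macros. A minimal sketch with hypothetical function names; __x86_64__ is predefined by GCC and Clang and _M_X64 by MSVC when targeting x86_64.

```c
#include <limits.h>
#include <stdio.h>

/* Width of a data pointer in bits for this build: 32 when compiled
 * with -m32, 64 for a regular x86_64 build. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* 1 if the compiler targeted x86_64, else 0. Note that the x32 ABI
 * also defines __x86_64__ while using 32-bit pointers. */
int targets_x86_64(void)
{
#if defined(__x86_64__) || defined(_M_X64)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: prints both facts, e.g. at program start. */
void report_target(void)
{
    printf("pointer width: %d bits, x86_64 target: %s\n",
           pointer_bits(), targets_x86_64() ? "yes" : "no");
}
```

If the x86_64 build prints a 64-bit pointer width, the compiler defaulted to 64-bit output even without an explicit -m64.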
©2025 Astroinformatics Group