Message boards : Application Code Discussion : compiler optimization flags
| Author | Message |
|---|---|
|
Can anyone suggest good compiler optimization flags for the different platforms we're compiling for? | |
| ID: 6639 · Rating: 0 · rate:
| |
|
Mac PPC: -arch ppc -O2 -maltivec -mabi=altivec -mcpu=7400 | |
| ID: 6646 · Rating: 0 · rate:
| |
"Can anyone suggest good compiler optimization flags for the different platforms we're compiling for?" GCC 4.3 is recommended for Linux. i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math; x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize. Visual Studio 2008 is recommended for Windows. windows_intelx86: -Ox -GL -fp:fast; windows_x86_64: -Ox -GL -fp:fast. Finally, for compatibility rather than performance, you should try to eliminate the dependency on some dynamic libraries, as I pointed out here. HTH | |
| ID: 6649 · Rating: 0 · rate:
| |
"i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math" Hmm, I forgot about those flags... BTW, x86_64 machines are guaranteed to have at least SSE2 or SSE3 (I think it's SSE3), so that can be enabled for that build. EDIT: Scratch that. Early AMD64 processors only had SSE2, so that's the safe one. Even so, it would enable some nice speedups compared to non-SSE compilation. I'm also going to try some of those other flags on my machine, provided I can get some work... | |
| ID: 6650 · Rating: 0 · rate:
| |
|
Dev-C++ also enables '-fexpensive-optimizations' if you tell it to 'Perform a number of minor optimizations'. Is this one worth it? | |
| ID: 6659 · Rating: 0 · rate:
| |
|
Folks, -O3 is dangerous in GCC 4 and is not recommended. It has been shown in the past to cause errors at run time. -O2 is the highest we can use. | |
| ID: 6741 · Rating: 0 · rate:
| |
"Folks, -O3 is dangerous in GCC 4 and is not recommended. It has been shown in the past to cause errors at run time. -O2 is the highest we can use." This is an urban legend nowadays. -O3 has been solid since GCC 2.95. Why else would SPEC benchmarks be submitted using this very same option with GCC? HTH | |
| ID: 6742 · Rating: 0 · rate:
| |
|
Here are the optimization flags I'm using on a Core 2 Duo in 32-bit Linux (gcc 4.1): | |
| ID: 6745 · Rating: 0 · rate:
| |
|
I've tried -O3 previously with this project and it caused run-time errors, so I had to go back down to -O2. I might give it a shot again, though. | |
| ID: 6750 · Rating: 0 · rate:
| |
"-ffast-math shall not be used in projects where strict IEEE math is required. It can cause problems because it skips a lot of validity tests and math exceptions, and may also lead to bad rounding (inferior precision on decimals): a no-no for SETI, for instance. I don't know about MilkyWay." Not so. It does relax floating-point exception handling and can produce slightly different results, though seldom different enough to be noticed when the results are output in decimal format. It's still quite usable and IS used by other projects, including SETI (I used it when I did the official port of SETI Classic to x86-64). HTH | |
| ID: 6773 · Rating: 0 · rate:
| |
|
From man gcc on my system: -ffast-math gcc version 4.0.1 (Apple Inc. build 5488) | |
| ID: 6774 · Rating: 0 · rate:
| |
"-ffast-math shall not be used in projects where strict IEEE math is required. It can cause problems because it skips a lot of validity tests and math exceptions, and may also lead to bad rounding (inferior precision on decimals): a no-no for SETI, for instance. I don't know about MilkyWay." Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math. | |
| ID: 6775 · Rating: 0 · rate:
| |
|
And here's what they mean:
| |
| ID: 6777 · Rating: 0 · rate:
| |
"Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math." As I explained above, it should still be accurate enough (down to 1 or 2 ULPs), well within the error margin of finite floating-point math. HTH | |
| ID: 6778 · Rating: 0 · rate:
| |
|
Allow me to repeat here a suggestion to make the application more portable across quite differently configured Linux systems. | |
| ID: 6784 · Rating: 0 · rate:
| |
"And here's what they mean:" Here's what 'man gcc' says:
I can assure you that using -ffast-math in optimized apps such as SETI's can lead to INVALID results (i.e. results considered not precise enough when SETI@home validates your results by comparing them with others).
One or two bits of mantissa, perhaps, but for *each* operation: the accumulated error after many consecutive ops can be quite significant. Let me give you an example. Suppose an FPU has only 7 decimal places of precision (modern FPUs have many more, but this keeps the example simple), and take this simple operation: 15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit). If it is optimized (for example, because of out-of-order operation optimizations) as 10 / 1000000000 * 15, then you get 10 / 1000000000 = 0.00000001, which truncates to 0.0000000 (7 decimals), and 0.0000000 * 15 = 0.0000000 in the end... Believe me, the above effect is far from negligible... | |
| ID: 6785 · Rating: 0 · rate:
| |
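The 7-decimal example in the post above can be replayed as a small sketch. This is only an illustration in Python, not anything taken from the project's code: the `decimal` module stands in for the toy FPU, and the helper names (`trunc7`, `multiply_then_divide`, `divide_then_multiply`) are hypothetical. It assumes every intermediate result is truncated to 7 places after the decimal point, as the post describes.

```python
from decimal import Decimal, ROUND_DOWN

# Assumption of this sketch: the toy FPU keeps exactly 7 digits after
# the decimal point and truncates (rounds toward zero) after each operation.
SEVEN_PLACES = Decimal("0.0000001")


def trunc7(value: Decimal) -> Decimal:
    """Truncate a value to 7 decimal places."""
    return value.quantize(SEVEN_PLACES, rounding=ROUND_DOWN)


def multiply_then_divide(a: int, b: int, c: int) -> Decimal:
    """Evaluate (a * b) / c, truncating after each step."""
    product = trunc7(Decimal(a) * Decimal(b))
    return trunc7(product / Decimal(c))


def divide_then_multiply(a: int, b: int, c: int) -> Decimal:
    """Evaluate (b / c) * a, truncating after each step."""
    quotient = trunc7(Decimal(b) / Decimal(c))
    return trunc7(quotient * Decimal(a))


if __name__ == "__main__":
    # Original order: the 1 in the 7th decimal place survives.
    print(f"{multiply_then_divide(15, 10, 1_000_000_000):f}")
    # Reordered: the quotient is truncated to zero before the multiply.
    print(f"{divide_then_multiply(15, 10, 1_000_000_000):f}")
```

The two functions are mathematically equivalent, yet only the first keeps a nonzero digit. Real compilers and real binary floating point behave far less dramatically than this toy model, so treat it as a picture of the mechanism rather than a measurement of what -ffast-math does.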
"One or two bits of mantissa, perhaps, but for *each* operation: the accumulated error after many consecutive ops can be quite significant." Because that's one decimal digit, not a bit, of difference. Besides, all FP operations have an average error of 0.5 bit by definition. We're talking about a difference beyond the 15th decimal digit! If the output of the application is truncated to the default 5 digits, it'll never even show up. HTH | |
| ID: 6789 · Rating: 0 · rate:
| |
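The claim in the post above, that a difference of 1 or 2 ULPs disappears once output is cut to 5 digits, can be checked for a single value with a short sketch. This is an illustration only: the sample value and the helper name `nudge_by_ulps` are made up for the example, and `math.nextafter` requires Python 3.9 or later. It says nothing about how errors behave over many operations.

```python
import math


def nudge_by_ulps(x: float, ulps: int) -> float:
    """Step x upward by `ulps` representable doubles (hypothetical helper)."""
    for _ in range(ulps):
        x = math.nextafter(x, math.inf)
    return x


value = 0.123456789               # arbitrary sample value for the sketch
nudged = nudge_by_ulps(value, 2)  # the same value, 2 ULPs higher

if __name__ == "__main__":
    print(value != nudged)                  # the two doubles are distinct
    print(f"{value:.5f}", f"{nudged:.5f}")  # both print identically at 5 decimals
```

For one isolated result the difference is invisible at 5 decimals. Whether it stays invisible after a long chain of dependent operations is exactly what the rest of the thread argues about.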
|
And here's what they mean:
| |
| ID: 6790 · Rating: 0 · rate:
| |
You don't seem to understand that in a chain of many operations (or worse, in a loop where each iteration uses the result of the previous one, as in a recurrence sequence), your 15th-decimal error will grow to the 14th, then the 13th, and so on, roughly every dozen operations. In the end, the error might show up in the 5th, 4th or even 3rd decimal, depending on how many iterations you went through... and calculations such as BOINC's rely precisely on complex calculations done within numerous loops. Don't use -ffast-math. Period. | |
| ID: 6798 · Rating: 0 · rate:
| |
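The growth of rounding error inside a loop, as described in the post above, can be observed with a minimal sketch. It is written in Python with ordinary IEEE doubles and does not involve -ffast-math or any compiler flag; the function name `accumulated_drift` and the step value 0.1 are assumptions chosen for illustration. `math.fsum` serves as the correctly rounded reference sum.

```python
import math


def accumulated_drift(step: float, count: int) -> float:
    """Gap between a naive running sum and a correctly rounded sum."""
    running = 0.0
    for _ in range(count):
        running += step                 # one rounding error per iteration
    reference = math.fsum([step] * count)  # correctly rounded reference
    return abs(running - reference)


if __name__ == "__main__":
    # The drift is tiny for a few iterations and grows with the loop length.
    for count in (10, 1_000, 100_000, 1_000_000):
        print(count, accumulated_drift(0.1, count))
```

In this particular loop the drift after a million additions is still far below the 5th decimal, so the sketch supports the idea that errors accumulate without settling how fast they grow in the project's actual computation. That depends on the algorithm and would need to be measured on the real code.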
|
This isn't related to the -ffast-math discussion, but it looks like the x86_64 compile for Linux isn't actually doing x86_64. The i686 target has -m32 in the CXXFLAGS, but x86_64 doesn't have a -m64 flag anywhere. Good to see SSE2 is enabled though. | |
| ID: 6800 · Rating: 0 · rate:
| |