compiler optimization flags

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2021
Credit: 26,480
RAC: 0
Message 6639 - Posted: 24 Nov 2008, 22:58:16 UTC

Can anyone suggest good compiler optimization flags for the different platforms we're compiling for?
____________

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6646 - Posted: 25 Nov 2008, 0:12:30 UTC - in response to Message 6639.
Last modified: 25 Nov 2008, 0:14:09 UTC

Mac PPC: -arch ppc -O2 -maltivec -mabi=altivec -mcpu=7400
Mac x86: -arch i386 -O2 -msse2 -mfpmath=sse -mtune=prescott
Mac x86_64 (Mostly untested): -arch x86_64 -O2 -mfpmath=sse -mtune=nocona

EDIT: Oh yeah, for the Intel platforms, feel free to add -msse -msse3 or -mssse3 too. I've just found SSE2 has the biggest impact.

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6649 - Posted: 25 Nov 2008, 2:41:45 UTC - in response to Message 6639.
Last modified: 25 Nov 2008, 2:52:28 UTC

Can anyone suggest good compiler optimization flags for the different platforms we're compiling for?

GCC 4.3 is recommended for Linux.

i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math
x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize

Visual Studio 2008 is recommended for Windows.

windows_intelx86: -Ox -GL -fp:fast
windows_x86_64: -Ox -GL -fp:fast

Finally, not for performance, but for compatibility, you should try to eliminate the dependency on some dynamic libraries, as I pointed out here.

HTH
____________

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6650 - Posted: 25 Nov 2008, 3:02:12 UTC - in response to Message 6649.
Last modified: 25 Nov 2008, 3:05:14 UTC

i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math
x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize

Hmm, forgot about those flags... BTW, x86_64 machines are guaranteed to have at least SSE2 or 3 (I think it's 3), so that can be enabled for that build.

EDIT: Scratch that. Early AMD64 architectures only had SSE2, so that's the safe one. However, it would enable some nice speedups compared to non-SSE compilation. I'm also going to try some of those other flags on my machine, provided I can get some work...

Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 37
Message 6659 - Posted: 25 Nov 2008, 12:55:38 UTC

Dev-C++ also enables '-fexpensive-optimizations' if you tell it to 'Perform a number of minor optimizations'. Is this one worth it?

Dave Przybylo
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 6741 - Posted: 26 Nov 2008, 0:24:24 UTC

Folks, -O3 is dangerous in GCC 4 and is not recommended. It has caused runtime errors in the past. -O2 is the highest we can use.
____________
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6742 - Posted: 26 Nov 2008, 0:30:27 UTC - in response to Message 6741.

Folks, -O3 is dangerous in GCC 4 and is not recommended. It has caused runtime errors in the past. -O2 is the highest we can use.

This is an urban legend nowadays. -O3 has been solid since GCC 2.95.

Why else would SPEC benchmarks be submitted using this very same option with GCC then?

HTH

____________

Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 842,117
RAC: 52
Message 6745 - Posted: 26 Nov 2008, 1:20:32 UTC

Here are the opt flags I'm using on a Core 2 Duo in 32-bit Linux (GCC 4.1):
-O2 -fomit-frame-pointer -frename-registers -fweb -fexpensive-optimizations -fno-strict-aliasing -march=i686 -msse3 -mfpmath=sse

Note about flags seen above in this thread:

-ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips many validity tests and math exceptions, and it may also lead to poor rounding (lower precision in the decimals). It's a no-no for SETI, for instance; I don't know about Milkyway.

Dave Przybylo
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 6750 - Posted: 26 Nov 2008, 1:57:11 UTC - in response to Message 6742.

I've tried -O3 previously with this project and it caused runtime errors, so I had to go back down to -O2. Might give it a shot again though.

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6773 - Posted: 26 Nov 2008, 16:03:33 UTC - in response to Message 6745.

-ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips many validity tests and math exceptions, and it may also lead to poor rounding (lower precision in the decimals). It's a no-no for SETI, for instance; I don't know about Milkyway.

Not so. It indeed relaxes floating-point exception handling and can result in slightly different results, though seldom different enough to be noticed when outputting the results in decimal format. It's still quite usable and IS used by other projects, including SETI (I did use it when I did the official port of SETI Classic to x86-64).

HTH

____________

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6774 - Posted: 26 Nov 2008, 16:21:06 UTC - in response to Message 6773.

From man gcc on my system:

-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations,
-fno-trapping-math, -ffinite-math-only, -fno-rounding-math,
-fno-signaling-nans and -fcx-limited-range.

This option causes the preprocessor macro "__FAST_MATH__" to be
defined.

This option should never be turned on by any -O option since it can
result in incorrect output for programs which depend on an exact
implementation of IEEE or ISO rules/specifications for math
functions.

gcc version 4.0.1 (Apple Inc. build 5488)

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2021
Credit: 26,480
RAC: 0
Message 6775 - Posted: 26 Nov 2008, 16:21:22 UTC - in response to Message 6773.

-ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips many validity tests and math exceptions, and it may also lead to poor rounding (lower precision in the decimals). It's a no-no for SETI, for instance; I don't know about Milkyway.

Not so. It indeed relaxes floating-point exception handling and can result in slightly different results, though seldom different enough to be noticed when outputting the results in decimal format. It's still quite usable and IS used by other projects, including SETI (I did use it when I did the official port of SETI Classic to x86-64).

HTH


Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math.

____________

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6777 - Posted: 26 Nov 2008, 16:48:08 UTC - in response to Message 6774.
Last modified: 26 Nov 2008, 17:43:02 UTC

And here's what they mean:


  • -fno-math-errno: don't bother to set the global variable errno in case of a math error (diagnostic purposes).
  • -funsafe-math-optimizations: floating-point arithmetic is not associative, so strict IEEE 754 compliance requires evaluating operations in the order written in the source code; that rules out regrouping a chain of operations, such as several multiplies, or applying distributivity, such as multiplying a sum by a factor. This option does away with this restriction.
  • -fno-trapping-math: some math operations can result in math errors, such as log (-1), so compilers tip-toe around such operations at the expense of performance; this option assumes that such invalid operations won't occur (diagnostic purposes).
  • -ffinite-math-only: assumes that arguments and intermediate results will always be finite numbers (never infinite or NaN).
  • -fno-rounding-math: this is the default anyways.
  • -fno-signaling-nans: assumes that invalid results will not raise an exception (diagnostic purposes).


As you can see, most options are for error reporting purposes and the only option that causes different results is -funsafe-math-optimizations, but different by one or two bits in the mantissa, which never shows up when outputting the values in decimal format.
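To make the regrouping effect concrete, here is a minimal sketch in Python, whose floats are IEEE 754 doubles. It does not run GCC or -ffast-math; it only evaluates the same sum in two groupings by hand, the way a reassociating compiler might. The helper name ulp_distance is made up for this illustration.

```python
import struct

def ulp_distance(x: float, y: float) -> int:
    """Count the representable doubles between two positive finite floats."""
    def to_bits(v: float) -> int:
        # Reinterpret the 64-bit double as a signed integer; for positive
        # finite values the integer order matches the floating-point order.
        return struct.unpack("<q", struct.pack("<d", v))[0]
    return abs(to_bits(x) - to_bits(y))

a, b, c = 0.1, 0.2, 0.3
source_order = (a + b) + c   # grouping as written in the source
regrouped = a + (b + c)      # grouping a reassociating compiler might choose

print(source_order == regrouped)                   # False
print(ulp_distance(source_order, regrouped))       # 1
print(f"{source_order:.5f} vs {regrouped:.5f}")    # 0.60000 vs 0.60000
```

For this single operation the two groupings differ by one unit in the last place and print identically at 5 decimals. It says nothing about how such differences behave over a long chain of operations.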

And since Milkyway has to compare results from different architectures, I don't believe that bit-by-bit comparison is used to validate the results, but rather an arithmetic comparison within an acceptable error margin, which should be much larger than the difference caused by optimizations.
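A comparison within an error margin could look roughly like the sketch below. This is not the project's actual validator: the function name results_agree and both tolerance values are assumptions chosen only for illustration.

```python
import math

def results_agree(x: float, y: float,
                  rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Hypothetical validator: accept two results that match within a margin.

    rel_tol bounds the relative difference; abs_tol covers values near zero.
    Both defaults are illustrative assumptions, not project settings.
    """
    return math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)

# Two results that differ only in the last bit pass the check...
print(results_agree(0.6, 0.6000000000000001))   # True
# ...while a difference in the second decimal does not.
print(results_agree(0.6, 0.61))                 # False
```

Whether a given margin is wide enough depends on how much the results drift over the whole computation, which a check like this cannot tell you by itself.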

HTH
____________

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6778 - Posted: 26 Nov 2008, 16:50:36 UTC - in response to Message 6775.

Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math.

As I explained above, it should still be accurate enough (down to 1 or 2 ULPs), well within the error margin of finite floating-point math.

HTH

____________

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6784 - Posted: 26 Nov 2008, 17:40:32 UTC
Last modified: 26 Nov 2008, 17:42:19 UTC

Allow me to repeat here a suggestion to make the application more portable across quite differently configured Linux systems.

libgcc should not be linked dynamically, for it then requires that the volunteer systems have the same version of GCC as the one used to build the application. Rather, specify the option -static-libgcc when linking to link it statically.

libstdc++ may cause the same compatibility grief, but it is a bit more involved to link it statically. Namely, when linking, use "gcc" instead of "g++" and specify the options "-Wl,-Bstatic `gcc -print-file-name=libstdc++.a` -Wl,-Bdynamic".

See also this.

HTH
____________

Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 842,117
RAC: 52
Message 6785 - Posted: 26 Nov 2008, 17:48:27 UTC - in response to Message 6777.

And here's what they mean:


  • -fno-math-errno: don't bother to set the global variable errno in case of a math error (diagnostic purposes)
  • -funsafe-math-optimizations: floating-point arithmetic is not associative, so strict IEEE 754 compliance requires evaluating operations in the order written in the source code; that rules out regrouping a chain of operations, such as several multiplies, or applying distributivity, such as multiplying a sum by a factor. This option does away with this restriction.
  • -fno-trapping-math: some math operations can result in math errors, such as log (-1), so compilers tip-toe around such operations at the expense of performance; this option assumes that such invalid operations won't occur (diagnostic purposes).
  • -ffinite-math-only: assumes that arguments and intermediate results will always be finite numbers (never infinite or NaN).
  • -fno-rounding-math: this is the default anyways.
  • -fno-signaling-nans: assumes that invalid results will not raise an exception (diagnostic purposes).


As you can see, most options are for error reporting purposes and the only option that causes different results is -funsafe-math-optimizations



Here's what 'man gcc' says:
-funsafe-math-optimizations Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math. The default is -fno-unsafe-math-optimizations.

I can assure you that using -ffast-math in optimized apps such as SETI's can lead to INVALID results (i.e. results considered not precise enough when SETI@home validates your results by comparing them with others').


but different by one or two bits in the mantissa, which never shows up when outputting the values in decimal format.

One or two bits of mantissa, perhaps, but for *each* operation: the result after many consecutive ops can be quite significant.
Let me give you an example. Let's say our FPU only has 7 decimal places of precision (modern FPUs have many more, but this keeps the example simple), and take this simple operation:
15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit)
Should it be optimized (for example, by reordering the operations) as:
10 / 1000000000 * 15
then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals)
and 0.0000000 * 15 = 0.0000000 in the end...

Believe me, the above effect is far from negligible...
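The example above can be reproduced with a minimal sketch, assuming a toy number format that keeps exactly 7 decimal places and truncates after every operation. Python's decimal module stands in for that imaginary FPU; the helper names trunc7, mul and div are made up for this illustration. Note that this is a fixed-point model, not how binary floating point actually stores numbers.

```python
from decimal import Decimal, ROUND_DOWN

STEP = Decimal("0.0000001")  # toy format: 7 decimal places, nothing more

def trunc7(x: Decimal) -> Decimal:
    """Truncate (not round) a value to 7 decimal places."""
    return x.quantize(STEP, rounding=ROUND_DOWN)

def mul(a: Decimal, b: Decimal) -> Decimal:
    return trunc7(a * b)

def div(a: Decimal, b: Decimal) -> Decimal:
    return trunc7(a / b)

a, b, c = Decimal(15), Decimal(10), Decimal(1000000000)

in_order = div(mul(a, b), c)    # (15 * 10) / 1000000000
reordered = mul(div(b, c), a)   # (10 / 1000000000) * 15

print(f"{in_order:.7f}")    # 0.0000001
print(f"{reordered:.7f}")   # 0.0000000
```

In this toy format the intermediate 10 / 1000000000 is already below the smallest representable step, so the reordered version loses the result entirely.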

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6789 - Posted: 26 Nov 2008, 18:10:42 UTC - in response to Message 6785.

One or two bits of mantissa, perhaps, but for *each* operation: the result after many consecutive ops can be quite significant.
Let me give you an example. Let's say our FPU only has 7 decimal places of precision (modern FPUs have many more, but this keeps the example simple), and take this simple operation:
15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit)
Should it be optimized (for example, by reordering the operations) as:
10 / 1000000000 * 15
then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals)
and 0.0000000 * 15 = 0.0000000 in the end...

Believe me, the above effect is far from negligible...

Because that's a difference of one decimal digit, not of one bit. Besides, all FP operations have an average error of 0.5 bit by definition.

We're talking about a difference smaller than 15 decimal digits! If the output of the application is truncated to the default 5 digits, it'll never even show up.

HTH

____________

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 1
Message 6790 - Posted: 26 Nov 2008, 18:16:53 UTC - in response to Message 6785.
Last modified: 26 Nov 2008, 18:18:19 UTC

And here's what they mean:


  • -fno-signed-zeros: irrelevant for non-trapping math.
  • -fno-trapping-math: assumes that no illegal math operation will happen, so that operations can be performed in any order.
  • -fassociative-math: allows associative properties to be used to speed calculations up (e.g., in modern processors, "a + b + c + d" is slower than "(a + b) + (c + d)").
  • -freciprocal-math: allows multiplication by the reciprocal instead of slow divisions (e.g., uses "a * 0.5" instead of "a / 2").
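A small sketch of the reciprocal substitution, written out by hand in Python (IEEE 754 doubles) rather than produced by a compiler. The function names are made up for this illustration. Dividing by 2 is a special case: 0.5 is exactly representable, so both forms give identical results. For a divisor such as 3 the reciprocal is itself rounded, so some inputs come out different in the last bit.

```python
def divide(a: float, d: float) -> float:
    """The division as written in the source."""
    return a / d

def multiply_by_reciprocal(a: float, d: float) -> float:
    """What a reciprocal-math transformation would compute instead."""
    reciprocal = 1.0 / d   # rounded once here...
    return a * reciprocal  # ...and rounded again here

def mismatches(d: float, limit: int = 20) -> list:
    """Integers 1..limit for which the two forms disagree for divisor d."""
    return [a for a in range(1, limit + 1)
            if divide(float(a), d) != multiply_by_reciprocal(float(a), d)]

print(mismatches(2.0))   # [] because 1/2 is exact in binary
print(mismatches(3.0))   # non-empty: 1/3 is not exactly representable
```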



HTH
____________

Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 842,117
RAC: 52
Message 6798 - Posted: 26 Nov 2008, 20:23:37 UTC - in response to Message 6789.
Last modified: 26 Nov 2008, 20:36:26 UTC


We're talking about a difference smaller than 15 decimal digits! If the output of the application is truncated to the default 5 digits, it'll never even show up.

You don't seem to understand that in a chain of many operations (or worse: in a loop where each iteration uses the result of the previous one, as in series), your 15th-decimal error will grow to the 14th, then the 13th, and so on, every dozen or so operations. In the end, the error might show up in the 5th, 4th or even 3rd decimal, depending on how many loops you went through... and calculations such as BOINC's rely precisely on complex calculations done within numerous loops.
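Here is a minimal sketch of how rounding error can pile up in a loop. It uses ordinary IEEE 754 doubles in Python and does not involve -ffast-math at all; it only shows that a per-operation error near the 16th digit does not stay there once the same accumulator is updated many times. math.fsum serves as a correctly rounded reference, and the helper names are made up for this illustration.

```python
import math

def naive_sum(values) -> float:
    """Accumulate in a plain loop, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

def drift(count: int) -> float:
    """Difference between the loop result and a correctly rounded sum."""
    values = [0.1] * count
    return abs(naive_sum(values) - math.fsum(values))

# The error after 10 additions versus after one million additions.
print(drift(10))
print(drift(1_000_000))
```

How fast the error grows depends heavily on the computation; a running sum like this one is just an easy case to observe.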

Don't use -ffast-math. Period.

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6800 - Posted: 26 Nov 2008, 20:25:33 UTC

This isn't related to the -ffast-math discussion, but it looks like the x86_64 compile for Linux isn't actually doing x86_64. The i686 target has -m32 in the CXXFLAGS, but x86_64 doesn't have a -m64 flag anywhere. Good to see SSE2 is enabled though.


Copyright © 2013 AstroInformatics Group