Welcome to MilkyWay@home

compiler optimization flags

Message boards : Application Code Discussion : compiler optimization flags

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 6639 - Posted: 24 Nov 2008, 22:58:16 UTC

Can anyone suggest good compiler optimization flags for the different platforms we're compiling for?

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6646 - Posted: 25 Nov 2008, 0:12:30 UTC - in response to Message 6639.  
Last modified: 25 Nov 2008, 0:14:09 UTC

Mac PPC: -arch ppc -O2 -maltivec -mabi=altivec -mcpu=7400
Mac x86: -arch i386 -O2 -msse2 -mfpmath=sse -mtune=prescott
Mac x86_64 (Mostly untested): -arch x86_64 -O2 -mfpmath=sse -mtune=nocona

EDIT: Oh yeah, for the Intel platforms, feel free to add -msse -msse3 or -mssse3 too. I've just found SSE2 has the biggest impact.
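Flags such as -msse3 or -mssse3 let the compiler emit instructions that older CPUs lack, so a binary built with them can die with an illegal-instruction error on such machines. A minimal C sketch of a startup check, assuming GCC 4.8+ or Clang (where `__builtin_cpu_init` and `__builtin_cpu_supports` exist), not the GCC 4.0 Apple toolchain quoted later in this thread; the function name `cpu_matches_build` is hypothetical:

```c
/* Returns 1 if the CPU this binary is running on supports every SSE level
 * the compiler was allowed to target (per the -msse* flags), else 0.
 * Assumes GCC 4.8+ or Clang; the builtins below are x86-only. */
int cpu_matches_build(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();               /* fill in the CPU feature data */
#  if defined(__SSSE3__)
    if (!__builtin_cpu_supports("ssse3")) return 0;   /* built with -mssse3 */
#  endif
#  if defined(__SSE3__)
    if (!__builtin_cpu_supports("sse3")) return 0;    /* built with -msse3 */
#  endif
#  if defined(__SSE2__)
    if (!__builtin_cpu_supports("sse2")) return 0;    /* built with -msse2 */
#  endif
#endif
    return 1; /* non-x86 target, or no SSE level requested: nothing to check */
}
```

Calling it at the top of `main` and exiting with a clear message when it returns 0 is friendlier to volunteers than a crash.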

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6649 - Posted: 25 Nov 2008, 2:41:45 UTC - in response to Message 6639.  
Last modified: 25 Nov 2008, 2:52:28 UTC

Can anyone suggest good compiler optimization flags for the different platforms we're compiling for?

GCC 4.3 is recommended for Linux.

i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math
x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize

Visual Studio 2008 is recommended for Windows.

windows_intelx86: -Ox -GL -fp:fast
windows_x86_64: -Ox -GL -fp:fast
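-ftree-vectorize (GCC) works on loops whose iterations are independent. A minimal sketch, with hypothetical function names and compiled as C99 or later for `restrict`, of a loop shape GCC can typically turn into packed SSE2 code, next to a floating-point sum that it generally leaves scalar unless reassociation is permitted (for example by -ffast-math):

```c
#include <stddef.h>

/* out[i] = a[i] * b[i] + c[i]: independent iterations, unit stride and
 * restrict-qualified (non-aliasing) pointers, the shape -ftree-vectorize
 * can typically map onto SIMD instructions. */
void mul_add_arrays(double *restrict out, const double *restrict a,
                    const double *restrict b, const double *restrict c,
                    size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}

/* A floating-point reduction: splitting it across SIMD lanes changes the
 * order of the additions (and so the rounding), so GCC normally keeps it
 * scalar unless something like -fassociative-math allows reordering. */
double sum_array(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}
```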

Finally, not for performance, but for compatibility, you should try to eliminate the dependency on some dynamic libraries, as I pointed out here.

HTH

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6650 - Posted: 25 Nov 2008, 3:02:12 UTC - in response to Message 6649.  
Last modified: 25 Nov 2008, 3:05:14 UTC

i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math
x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize

Hmm, forgot about those flags... BTW, x86_64 machines are guaranteed to have at least SSE2 or 3 (I think it's 3), so that can be enabled for that build.

EDIT: Scratch that. Early AMD64 architectures only had SSE2, so that's the safe one. However, it would enable some nice speedups compared to non-SSE compilation. I'm also going to try some of those other flags on my machine, provided I can get some work...
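On x86_64, GCC and Clang define `__SSE2__` by default because SSE2 is part of the AMD64 baseline, which matches the EDIT above. A small sketch that turns that into a build-time check and reports the highest SSE level the compiler was allowed to use; `compiled_sse_level` is a hypothetical helper:

```c
/* On x86_64, SSE2 code generation is on unless explicitly disabled. */
#if defined(__x86_64__) && !defined(__SSE2__)
#  error "unexpected: x86_64 build without SSE2"
#endif

/* Returns the highest SSE level the compiler was told it may use,
 * based on the macros GCC/Clang predefine for -msse* flags. */
const char *compiled_sse_level(void)
{
#if defined(__SSSE3__)
    return "SSSE3";
#elif defined(__SSE3__)
    return "SSE3";
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "none";
#endif
}
```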

Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 6659 - Posted: 25 Nov 2008, 12:55:38 UTC

Dev-C++ also enables '-fexpensive-optimizations' if you tell it to 'Perform a number of minor optimizations'. Is this one worth it?

Dave Przybylo
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 6741 - Posted: 26 Nov 2008, 0:24:24 UTC

Folks, -O3 is dangerous in GCC 4 and is not recommended. It has caused runtime errors in the past. -O2 is the highest level we can use.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6742 - Posted: 26 Nov 2008, 0:30:27 UTC - in response to Message 6741.  

Folks, -O3 is dangerous in GCC 4 and is not recommended. It has caused runtime errors in the past. -O2 is the highest level we can use.

This is an urban legend nowadays. -O3 has been solid since GCC 2.95.

Why else would SPEC benchmark results be submitted using this very option with GCC?

HTH

Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 2,200,784
RAC: 0
Message 6745 - Posted: 26 Nov 2008, 1:20:32 UTC

Here are the optimization flags I'm using on a Core 2 Duo under 32-bit Linux (GCC 4.1):
-O2 -fomit-frame-pointer -frename-registers -fweb -fexpensive-optimizations -fno-strict-aliasing -march=i686 -msse3 -mfpmath=sse
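-fno-strict-aliasing is usually there to protect code that reads one type through a pointer to another, which GCC's -fstrict-aliasing (on at -O2 and above) is allowed to assume never happens. A sketch of the well-defined alternative in C, assuming IEEE 754 doubles and a 64-bit integer type; `double_bits` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Returns the raw IEEE 754 bit pattern of x.
 * The tempting version, *(uint64_t *)&x, violates C's aliasing rules and
 * can be miscompiled under strict aliasing; memcpy is well-defined and
 * GCC turns it into a plain register move at -O2. */
uint64_t double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```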

Note about flags seen above in this thread:

-ffast-math should not be used in projects where strict IEEE math is required: it can cause problems because it skips many validity checks and math exceptions, and it may also introduce rounding errors (lower precision in the decimals). It's a no-no for SETI, for instance; I don't know about MilkyWay.
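As an illustration of the validity checks mentioned here, a sketch of an input guard that may be optimized away under -ffinite-math-only (which -ffast-math enables), since that option lets GCC assume NaNs and infinities never occur. `is_valid_sample` is hypothetical, and whether the test is actually folded depends on the GCC version:

```c
#include <math.h>

/* Rejects NaN and infinite inputs before they propagate.
 * With -ffinite-math-only, GCC may treat isnan()/isinf() as always
 * false, so this guard can silently disappear from a fast-math build. */
int is_valid_sample(double x)
{
    return !isnan(x) && !isinf(x);
}
```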

Dave Przybylo
Posts: 236
Credit: 49,648
RAC: 0
Message 6750 - Posted: 26 Nov 2008, 1:57:11 UTC - in response to Message 6742.  

I tried -O3 with this project before and it caused runtime errors, so I had to go back down to -O2. I might give it another shot, though.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6773 - Posted: 26 Nov 2008, 16:03:33 UTC - in response to Message 6745.  

-ffast-math should not be used in projects where strict IEEE math is required: it can cause problems because it skips many validity checks and math exceptions, and it may also introduce rounding errors (lower precision in the decimals). It's a no-no for SETI, for instance; I don't know about MilkyWay.

Not so. It indeed relaxes floating-point exception handling and can result in slightly different results, though seldom different enough to be noticed when outputting the results in decimal format. It's still quite usable and IS used by other projects, including SETI (I did use it when I did the official port of SETI Classic to x86-64).

HTH

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6774 - Posted: 26 Nov 2008, 16:21:06 UTC - in response to Message 6773.  

From man gcc on my system:
-ffast-math
Sets -fno-math-errno, -funsafe-math-optimizations,
-fno-trapping-math, -ffinite-math-only, -fno-rounding-math,
-fno-signaling-nans and -fcx-limited-range.

This option causes the preprocessor macro "__FAST_MATH__" to be
defined.

This option should never be turned on by any -O option since it can
result in incorrect output for programs which depend on an exact
implementation of IEEE or ISO rules/specifications for math
functions.

gcc version 4.0.1 (Apple Inc. build 5488)

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 6775 - Posted: 26 Nov 2008, 16:21:22 UTC - in response to Message 6773.  

Not so. It indeed relaxes floating-point exception handling and can result in slightly different results, though seldom different enough to be noticed when outputting the results in decimal format. It's still quite usable and IS used by other projects, including SETI (I did use it when I did the official port of SETI Classic to x86-64).

Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math.
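Because -ffast-math defines `__FAST_MATH__` (per the man page excerpt quoted above), a build can enforce this decision. A sketch; `REQUIRE_STRICT_IEEE` and `built_with_fast_math` are hypothetical names, not part of the MilkyWay@home sources:

```c
/* Opt-in hard failure: define REQUIRE_STRICT_IEEE (hypothetical macro)
 * in the build to refuse compilation when -ffast-math is active. */
#if defined(__FAST_MATH__) && defined(REQUIRE_STRICT_IEEE)
#  error "built with -ffast-math; this build requires strict IEEE arithmetic"
#endif

/* Runtime report of the same setting, e.g. for a startup log line. */
int built_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}
```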

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6777 - Posted: 26 Nov 2008, 16:48:08 UTC - in response to Message 6774.  
Last modified: 26 Nov 2008, 17:43:02 UTC

And here's what they mean:


  • -fno-math-errno: don't bother to set the global variable errno in case of a math error (diagnostic purposes).
  • -funsafe-math-optimizations: IEEE 754 calls for following the order of operations in the source code, ruling out commutative operations, such as two multiplies, or associative operations, such as multiplication of a sum by a factor; this option does away with this restriction.
  • -fno-trapping-math: some math operations can result in math errors, such as log(-1), so compilers tiptoe around such operations at the expense of performance; this option assumes that such invalid operations won't occur (diagnostic purposes).
  • -ffinite-math-only: assumes that intermediate results will always be finite numbers (e.g., never infinite or NaN).
  • -fno-rounding-math: this is the default anyway.
  • -fno-signaling-nans: assumes that invalid results will not raise an exception (diagnostic purposes).


As you can see, most options are for error reporting purposes and the only option that causes different results is -funsafe-math-optimizations, but different by one or two bits in the mantissa, which never shows up when outputting the values in decimal format.
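IEEE 754 addition and multiplication are in fact commutative; what reordering can break is associativity, which -fassociative-math (part of -funsafe-math-optimizations) lets the compiler exploit. A sketch with deliberately extreme inputs showing that the two groupings can disagree completely; with well-scaled data the difference is usually confined to the last bits, which is the actual point of contention in this thread. It assumes doubles are evaluated in SSE2 registers (as on x86_64); x87 extended precision gives 1.0 for both:

```c
/* With a = 1e16, b = -1e16, c = 1.0:
 *   (a + b) + c == 0.0 + 1.0 == 1.0
 *   a + (b + c) == 1e16 + (-1e16) == 0.0
 * because -1e16 + 1.0 is not representable (doubles near 1e16 are
 * spaced 2 apart) and rounds back to -1e16. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```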

And since MilkyWay has to compare results from different architectures, I don't believe that a bit-by-bit comparison is used to validate the results, but rather an arithmetic comparison within an acceptable error margin, which should be much larger than the difference caused by optimizations.
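A sketch of the kind of tolerance-based comparison described here. The function name and tolerance values are hypothetical; the thread does not say how the MilkyWay@home validator actually compares results:

```c
#include <stdbool.h>

/* x and y "match" if they differ by at most abs_tol (useful near zero),
 * or by at most rel_tol times the larger of their magnitudes. */
bool results_match(double x, double y, double rel_tol, double abs_tol)
{
    double diff  = x > y ? x - y : y - x;
    double ax    = x < 0 ? -x : x;
    double ay    = y < 0 ? -y : y;
    double scale = ax > ay ? ax : ay;
    return diff <= abs_tol || diff <= rel_tol * scale;
}
```

A relative tolerance of, say, 1e-9 would absorb differences of a few ULPs in doubles (about 2.2e-16 relative each) many times over.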

HTH


ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6778 - Posted: 26 Nov 2008, 16:50:36 UTC - in response to Message 6775.  

Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math.

As I explained above, it should still be accurate enough (to within 1 or 2 ULPs), well within the error margin of finite-precision floating-point math.
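To make "1 or 2 ULPs" (units in the last place) concrete, a sketch that counts how many representable doubles separate two values, assuming IEEE 754 binary64 and finite inputs; `ulp_distance` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Maps a double onto an integer scale where adjacent doubles differ by 1. */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);          /* well-defined bit reinterpretation */
    /* Negative doubles have the sign bit set (i < 0); mirror them so the
     * integer order matches numeric order, with -0.0 and +0.0 both at 0. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between x and y (finite inputs only). */
uint64_t ulp_distance(double x, double y)
{
    int64_t a = ordered_bits(x), b = ordered_bits(y);
    return a > b ? (uint64_t)a - (uint64_t)b : (uint64_t)b - (uint64_t)a;
}
```

For example, 0.1 + 0.2 and 0.3 are 1 ULP apart.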

HTH

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6784 - Posted: 26 Nov 2008, 17:40:32 UTC
Last modified: 26 Nov 2008, 17:42:19 UTC

Allow me to repeat here a suggestion to make the application more portable across quite differently configured Linux systems.

libgcc should not be linked dynamically, for it then requires that the volunteer systems have the same version of GCC as the one used to build the application. Rather, specify the option -static-libgcc when linking to link it statically.

libstdc++ may cause the same compatibility problems, but it is a bit more involved to link it statically. Namely, when linking, use "gcc" instead of "g++" and specify the options "-Wl,-Bstatic `gcc -print-file-name=libstdc++.a` -Wl,-Bdynamic".

See also this.

HTH

Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 2,200,784
RAC: 0
Message 6785 - Posted: 26 Nov 2008, 17:48:27 UTC - in response to Message 6777.  

As you can see, most options are for error reporting purposes and the only option that causes different results is -funsafe-math-optimizations

Here's what 'man gcc' says:
-funsafe-math-optimizations
           Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards.  When used at link-time, it
           may include libraries or startup files that change the default FPU control word or other similar optimizations.

           This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications
           for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications.  Enables -fno-signed-zeros, -fno-trapping-math,
           -fassociative-math and -freciprocal-math.

           The default is -fno-unsafe-math-optimizations.

I can assure you that using -ffast-math in optimized apps such as SETI's can lead to INVALID results (i.e., results considered not precise enough when SETI@home validates your results by comparing them with others).


but different by one or two bits in the mantissa, which never shows up when outputting the values in decimal format.

One or two bits of mantissa, perhaps, but for *each* operation: the error after many consecutive operations can be quite significant.
Let me give you an example. Suppose an FPU had only 7 decimal places of precision (modern FPUs have far more, but this keeps the example simple), and take this simple operation:
15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit)
Should it be optimized (for example, by out-of-order optimizations) as:
10 / 1000000000 * 15
then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals)
and 0.0000000 * 15 = 0.0000000 in the end...

Believe me, the above effect is far from negligible...
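A sketch that reproduces this hypothetical 7-decimal unit with integer fixed-point arithmetic (all names hypothetical). It shows why the example loses everything: its precision is absolute, a fixed step of 1e-7. Binary floating point instead keeps roughly 15 to 16 significant digits relative to each value's magnitude, so a double would hold 1e-8 almost exactly; that is where the analogy and real doubles part ways.

```c
#include <stdint.h>

/* A decimal fixed-point value with exactly 7 digits after the point,
 * stored as an integer count of 1e-7 steps. */
typedef int64_t fixed7;
#define FIXED7_ONE 10000000LL  /* 1.0 == 10^7 steps */

fixed7 fixed7_from_int(int64_t n)          { return n * FIXED7_ONE; }
fixed7 fixed7_mul_int(fixed7 x, int64_t n) { return x * n; }
fixed7 fixed7_div_int(fixed7 x, int64_t n) { return x / n; } /* truncates */

/* 15 * 10 / 1000000000, left to right: 1 step, i.e. 0.0000001 */
fixed7 order_a(void)
{
    return fixed7_div_int(fixed7_mul_int(fixed7_from_int(15), 10), 1000000000);
}

/* 10 / 1000000000 * 15, reordered: the division already truncates to 0 */
fixed7 order_b(void)
{
    return fixed7_mul_int(fixed7_div_int(fixed7_from_int(10), 1000000000), 15);
}
```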

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6789 - Posted: 26 Nov 2008, 18:10:42 UTC - in response to Message 6785.  

One or two bits of mantissa, perhaps, but for *each* operation: the error after many consecutive operations can be quite significant.
Let me give you an example. Suppose an FPU had only 7 decimal places of precision (modern FPUs have far more, but this keeps the example simple), and take this simple operation:
15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit)
Should it be optimized (for example, by out-of-order optimizations) as:
10 / 1000000000 * 15
then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals)
and 0.0000000 * 15 = 0.0000000 in the end...

Believe me, the above effect is far from negligible...

That's because your example loses a whole decimal digit, not a single bit. Besides, every FP operation already carries a rounding error of up to 0.5 ULP by definition.

We're talking about a difference beyond the 15th significant decimal digit! If the application's output is truncated to the default 5 digits, it'll never even show up.

HTH

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 6790 - Posted: 26 Nov 2008, 18:16:53 UTC - in response to Message 6785.  
Last modified: 26 Nov 2008, 18:18:19 UTC

And here's what they mean:


  • -fno-signed-zeros: irrelevant for non-trapping math.
  • -fno-trapping-math: assumes that no illegal math operation will happen, so that operations can be performed in any order.
  • -fassociative-math: allows associative properties to be used to speed calculations up (e.g., in modern processors, "a + b + c + d" is slower than "(a + b) + (c + d)").
  • -freciprocal-math: allows multiplication by the reciprocal instead of slow divisions (e.g., uses "a * 0.5" instead of "a / 2").
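Dividing by 2 is exact either way, since 0.5 is a power of two. With a divisor whose reciprocal is not exactly representable in binary, the two forms can round differently. A sketch with hypothetical function names, assuming IEEE 754 doubles evaluated in SSE2 registers (as on x86_64):

```c
/* 3.0 / 10.0 yields the double nearest 0.3, whereas 3.0 * 0.1 yields the
 * next double up (0.30000000000000004), because 0.1 itself is already
 * rounded. -freciprocal-math permits replacing the first with the second. */
double div_by_ten(double a)   { return a / 10.0; }
double mul_by_tenth(double a) { return a * 0.1; }
```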



HTH


Thierry Godefroy
Joined: 29 Jul 08
Posts: 9
Credit: 2,200,784
RAC: 0
Message 6798 - Posted: 26 Nov 2008, 20:23:37 UTC - in response to Message 6789.  
Last modified: 26 Nov 2008, 20:36:26 UTC


We're talking about a difference beyond the 15th significant decimal digit! If the application's output is truncated to the default 5 digits, it'll never even show up.

You don't seem to understand that in a chain of many operations (or worse, in a loop where each iteration reuses the result of the previous one, as in recurrence sequences), your error in the 15th decimal will grow to the 14th, then the 13th, and so on, roughly every dozen operations. In the end, the error might show in the 5th, 4th or even 3rd decimal, depending on how many iterations you went through... and calculations such as BOINC's rely precisely on complex computations done within numerous loops.

Don't use -ffast-math. Period.
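A sketch of the accumulation effect described above, alongside Kahan (compensated) summation, a standard technique not mentioned in the thread for keeping such drift in check. Both function names are hypothetical. Adding 0.1 ten million times with a plain loop drifts noticeably away from 1000000, while the compensated loop stays far closer. Note that -ffast-math can defeat it: reassociation lets the compiler simplify the compensation term to zero.

```c
#include <stddef.h>

/* Adds `value` n times with a plain running sum; each addition rounds,
 * and those rounding errors accumulate over the loop. */
double naive_repeat_sum(double value, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += value;
    return s;
}

/* Kahan summation: `c` captures the low-order bits lost by each addition
 * and feeds them back into the next one. Under -fassociative-math the
 * compiler may fold (t - s) - y to 0, turning this back into the naive loop. */
double kahan_repeat_sum(double value, size_t n)
{
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double y = value - c;  /* correct the input by the previous error */
        double t = s + y;      /* low-order bits of y may be lost here */
        c = (t - s) - y;       /* recover what was lost (negated) */
        s = t;
    }
    return s;
}
```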

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 6800 - Posted: 26 Nov 2008, 20:25:33 UTC

This isn't related to the -ffast-math discussion, but it looks like the x86_64 compile for Linux isn't actually doing x86_64. The i686 target has -m32 in the CXXFLAGS, but x86_64 doesn't have a -m64 flag anywhere. Good to see SSE2 is enabled though.
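Whether the missing -m64 matters depends on the toolchain: a GCC whose default target is x86_64 already produces 64-bit code without it, while a toolchain that defaults to 32-bit may not. A quick sketch for checking what a build actually targets (function names hypothetical):

```c
#include <stddef.h>

/* Pointer width of this build: 32 under -m32, 64 for a real x86_64 build. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * 8);
}

/* GCC defines __x86_64__ only for 64-bit x86 targets, __i386__ for 32-bit. */
const char *target_name(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}
```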

©2024 Astroinformatics Group