compiler optimization flags

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 6639 - Posted: 24 Nov 2008, 22:58:16 UTC Can anyone suggest good compiler optimization flags for the different platforms we're compiling for? ID: 6639 · Rating: 0 · rate: / Reply Quote

jedirock Send message Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 6646 - Posted: 25 Nov 2008, 0:12:30 UTC - in response to Message 6639. Last modified: 25 Nov 2008, 0:14:09 UTC Mac PPC: -arch ppc -O2 -maltivec -mabi=altivec -mcpu=7400 Mac x86: -arch i386 -O2 -msse2 -mfpmath=sse -mtune=prescott Mac x86_64 (Mostly untested): -arch x86_64 -O2 -mfpmath=sse -mtune=nocona EDIT: Oh yeah, for the Intel platforms, feel free to add -msse -msse3 or -mssse3 too. I've just found SSE2 has the biggest impact. ID: 6646 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6649 - Posted: 25 Nov 2008, 2:41:45 UTC - in response to Message 6639. Last modified: 25 Nov 2008, 2:52:28 UTC Can anyone suggest good compiler optimization flags for the different platforms we're compiling for? GCC 4.3 is recommended for Linux. i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize Visual Studio 2008 is recommended for Windows. windows_intelx86: -Ox -GL -fp:fast windows_x86_64: -Ox -GL -fp:fast Finally, not for performance, but for compatibility, you should try to eliminate the dependency on some dynamic libraries, as I pointed out here. HTH ID: 6649 · Rating: 0 · rate: / Reply Quote

jedirock Send message Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 6650 - Posted: 25 Nov 2008, 3:02:12 UTC - in response to Message 6649. Last modified: 25 Nov 2008, 3:05:14 UTC i686-pc-linux-gnu: -O3 -funroll-loops -ffast-math x86_64-pc-linux-gnu: -O3 -funroll-loops -ffast-math -ftree-vectorize Hmm, forgot about those flags... BTW, x86_64 machines are guaranteed to have at least SSE2 or 3 (I think it's 3), so that can be enabled for that build. EDIT: Scratch that. Early AMD64 architectures only had SSE2, so that's the safe one. However, it would enable some nice speedups compared to non-SSE compilation. I'm also going to try some of those other flags on my machine, provided I can get some work... ID: 6650 · Rating: 0 · rate: / Reply Quote

Emanuel Send message Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0	Message 6659 - Posted: 25 Nov 2008, 12:55:38 UTC Dev-C++ also enables '-fexpensive-optimizations' if you tell it to 'Perform a number of minor optimizations'. Is this one worth it? ID: 6659 · Rating: 0 · rate: / Reply Quote

Dave Przybylo Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0	Message 6741 - Posted: 26 Nov 2008, 0:24:24 UTC Folks O3 is dangerous in gcc 4 and is not recommended. It has proven in the past to cause errors during runtime. O2 is the highest we can use. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute ID: 6741 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6742 - Posted: 26 Nov 2008, 0:30:27 UTC - in response to Message 6741. Folks O3 is dangerous in gcc 4 and is not recommended. It has proven in the past to cause errors during runtime. O2 is the highest we can use. This is an urban legend nowadays. -O3 has been solid since GCC 2.95. Why else would SPEC benchmarks be submitted using this very same option with GCC then? HTH ID: 6742 · Rating: 0 · rate: / Reply Quote

Thierry Godefroy Send message Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0	Message 6745 - Posted: 26 Nov 2008, 1:20:32 UTC Here are the opt flags I'm using on a Core2 Duo in 32-bit Linux (gcc 4.1): -O2 -fomit-frame-pointer -frename-registers -fweb -fexpensive-optimizations -fno-strict-aliasing -march=i686 -msse3 -mfpmath=sse Note about flags seen above in this thread: -ffast-math should not be used in projects where strict IEEE math is required. It can cause problems because it skips a lot of validity tests and math exceptions, and may also lead to poor rounding (lower precision in the decimals): a no-no for Seti, for instance. I don't know about Milkyway. ID: 6745 · Rating: 0 · rate: / Reply Quote

Dave Przybylo Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0	Message 6750 - Posted: 26 Nov 2008, 1:57:11 UTC - in response to Message 6742. I've tried O3 previously with this project and it caused run-time errors, so I had to go back down to O2. Might give it a shot again though. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute ID: 6750 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6773 - Posted: 26 Nov 2008, 16:03:33 UTC - in response to Message 6745. -ffast-math shall not be used in projects where strict IEEE math is required (can cause problems because it skips a lot of validity tests and math exceptions, and may also lead to bad rounding ups (inferior precision on decimals): a no-no for Seti, for instance. I don't know for Milkyway). Not so. It indeed relaxes floating-point exception handling and can result in slightly different results, though seldom different enough to be noticed when outputting the results in decimal format. It's still quite usable and IS used by other projects, including SETI (I did use it when I did the official port of SETI Classic to x86-64). HTH ID: 6773 · Rating: 0 · rate: / Reply Quote

jedirock Send message Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 6774 - Posted: 26 Nov 2008, 16:21:06 UTC - in response to Message 6773. From man gcc on my system: -ffast-math Sets -fno-math-errno, -funsafe-math-optimizations, -fno-trapping-math, -ffinite-math-only, -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. This option causes the preprocessor macro "__FAST_MATH__" to be defined. This option should never be turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. gcc version 4.0.1 (Apple Inc. build 5488) ID: 6774 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 6775 - Posted: 26 Nov 2008, 16:21:22 UTC - in response to Message 6773. -ffast-math shall not be used in projects where strict IEEE math is required (can cause problems because it skips a lot of validity tests and math exceptions, and may also lead to bad rounding ups (inferior precision on decimals): a no-no for Seti, for instance. I don't know for Milkyway). Not so. It indeed relaxes floating-point exception handling and can result in slightly different results, though seldom different enough to be noticed when outputting the results in decimal format. It's still quite usable and IS used by other projects, including SETI (I did use it when I did the official port of SETI Classic to x86-64). HTH Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math. ID: 6775 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6777 - Posted: 26 Nov 2008, 16:48:08 UTC - in response to Message 6774. Last modified: 26 Nov 2008, 17:43:02 UTC And here's what they mean: -fno-math-errno: don't bother to set the global variable errno in case of a math error (diagnostic purposes). -funsafe-math-optimizations: IEEE754 calls for following the order in the source code, ruling out commutative operations, such as two multiplies, or associative operations, such as multiplication of a sum by a factor; this option does away with this restriction. -fno-trapping-math: some math operations can result in math errors, such as log (-1), so compilers tip-toe around such operations at the expense of performance; this option assumes that such invalid operation won't occur (diagnostic purposes). -ffinite-math-only: assumes that intermediary results will always be normal numbers (e.g., never infinite). -fno-rounding-math: this is the default anyways. -fno-signaling-nans: assumes that invalid results will not raise an exception (diagnostic purposes). As you can see, most options are for error reporting purposes and the only option that causes different results is -funsafe-math-optimizations, but different by one or two bits in the mantissa, which never shows up when outputting the values in decimal format. And since Milkyway has to compare results by different architectures, I don't believe that bit-by-bit comparison is used to validate the results, but rather an arithmetic comparison within an acceptable error margin, which should be much larger than the difference caused by optimizations. HTH ID: 6777 · Rating: 0 · rate: / Reply Quote
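
The difference between bit-by-bit comparison and comparison within a margin can be sketched in a few lines. This is only an illustration: the names `ulp_distance` and `results_agree` and the tolerance `1e-12` are hypothetical and are not taken from the MilkyWay@home or BOINC validators. Python floats are IEEE 754 doubles, so the rounding shown matches what C `double` arithmetic gives when nothing is reordered.

```python
import math
import struct


def ulp_distance(a: float, b: float) -> int:
    """Number of representable steps between two finite doubles of the same sign."""
    # Reinterpret each bit pattern as an unsigned 64-bit integer; for finite
    # values of the same sign, neighbouring doubles have neighbouring patterns.
    ia = struct.unpack("<Q", struct.pack("<d", a))[0]
    ib = struct.unpack("<Q", struct.pack("<d", b))[0]
    return abs(ia - ib)


def results_agree(a: float, b: float, rel_tol: float = 1e-12) -> bool:
    """Hypothetical validator check: accept two results within a relative margin."""
    return math.isclose(a, b, rel_tol=rel_tol)


x = 0.1 + 0.2  # rounds to 0.30000000000000004
y = 0.3

print(x == y)               # False: the bit patterns differ
print(ulp_distance(x, y))   # 1: the two values are neighbouring doubles
print(results_agree(x, y))  # True: the difference is far inside the margin
```

Whether a margin like this is acceptable depends on how much error the whole computation accumulates, not only on the error of a single operation.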

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6778 - Posted: 26 Nov 2008, 16:50:36 UTC - in response to Message 6775. Right now we're trying to get the most accurate model of the Sagittarius stream, so we need all the accuracy we can get. I think it's best to be safe and not use -ffast-math. As I explained above, it should still be accurate enough (down to 1 or 2 ULPs), well within the error margin of finite floating-point math. HTH ID: 6778 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6784 - Posted: 26 Nov 2008, 17:40:32 UTC Last modified: 26 Nov 2008, 17:42:19 UTC Allow me to repeat here a suggestion to make the application more portable across quite differently configured Linux systems. libgcc should not be linked dynamically, for it then requires that the volunteer systems have the same version of GCC as the one used to build the application. Rather, specify the option -static-libgcc when linking to link it statically. libstdc++ may cause the same compatibility problems, but it is a bit more involved to link it statically. Namely, when linking, use "gcc" instead of "g++" and specify the options "-Wl,-Bstatic `gcc -print-file-name=libstdc++.a` -Wl,-Bdynamic". See also this. HTH ID: 6784 · Rating: 0 · rate: / Reply Quote

Thierry Godefroy Send message Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0	Message 6785 - Posted: 26 Nov 2008, 17:48:27 UTC - in response to Message 6777. And here's what they mean: -fno-math-errno: don't bother to set the global variable errno in case of a math error (diagnostic purposes) -funsafe-math-optimizations: IEEE754 calls for following the order in the source code, ruling out commutative operations, such as two multiplies, or associative operations, such as multiplication of a sum by a factor; this option does away with this restriction. -fno-trapping-math: some math operations can result in math errors, such as log (-1), so compilers tip-toe around such operations at the expense of performance; this option assumes that such invalid operation won't occur (diagnostic purposes). -ffinite-math-only: assumes that intermediary results will always be normal numbers (e.g., never infinite). -fno-rounding-math: this is the default anyways. -fno-signaling-nans: assumes that invalid results will not raise an exception (diagnostic purposes). As you can see, most options are for error reporting purposes and the only option that causes different results is -funsafe-math-optimizations Here is what 'man gcc' says: -funsafe-math-optimizations Allow optimizations for floating-point arithmetic that (a) assume that arguments and results are valid and (b) may violate IEEE or ANSI standards. When used at link-time, it may include libraries or startup files that change the default FPU control word or other similar optimizations. This option is not turned on by any -O option since it can result in incorrect output for programs which depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield faster code for programs that do not require the guarantees of these specifications. Enables -fno-signed-zeros, -fno-trapping-math, -fassociative-math and -freciprocal-math. The default is -fno-unsafe-math-optimizations.
I can assure you that using -ffast-math in optimized apps such as Seti's can lead to INVALID results (i.e. results considered not precise enough when Seti@Home validates your results by comparing them with others'). "but different by one or two bits in the mantissa, which never shows up when outputting the values in decimal format." One or two bits of mantissa, perhaps, but for each operation: the result after many consecutive ops can be quite significant. Let me give you an example. Let's say we only have 7 decimal places of precision in an FPU (there are many more in modern FPUs, but that's just to make this example easier), and take this simple operation: 15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit). Should it be optimized (for example, because of out-of-order op optimizations) as 10 / 1000000000 * 15, then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals) and 0.0000000 * 15 = 0.0000000 in the end... Believe me, the above effect is far from negligible... ID: 6785 · Rating: 0 · rate: / Reply Quote
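
The 7-decimal example above can be reproduced with a short sketch. It assumes a toy fixed-point format that truncates every intermediate result to 7 decimal places, modelled here with Python's `decimal` module; the helper names are made up for the illustration. A real FPU uses binary floating point with a moving exponent, so this shows the ordering effect in the toy model only, not the behaviour of any particular compiler flag.

```python
from decimal import Decimal, ROUND_DOWN

SEVEN_PLACES = Decimal("0.0000001")


def trunc7(x: Decimal) -> Decimal:
    """Truncate (not round) a value to 7 decimal places."""
    return x.quantize(SEVEN_PLACES, rounding=ROUND_DOWN)


def multiply_then_divide() -> Decimal:
    # 15 * 10 = 150, then 150 / 1000000000 = 0.00000015 -> 0.0000001
    step = trunc7(Decimal(15) * Decimal(10))
    return trunc7(step / Decimal(1000000000))


def divide_then_multiply() -> Decimal:
    # 10 / 1000000000 = 0.00000001 -> 0.0000000, then 0 * 15 = 0
    step = trunc7(Decimal(10) / Decimal(1000000000))
    return trunc7(step * Decimal(15))


print(multiply_then_divide())  # non-zero: one unit in the 7th decimal place
print(divide_then_multiply())  # zero: the small intermediate was truncated away
```

The two orderings are algebraically identical, yet only one of them keeps the last digit, because the intermediate value in the second ordering falls below the smallest representable step.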

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6789 - Posted: 26 Nov 2008, 18:10:42 UTC - in response to Message 6785. One or two bits of mantissa, perhaps, but for each operation: the result after many consecutive ops can be quite significant. Let me give you an example. Let's say we only have 7 decimal places of precision in an FPU (there are many more in modern FPUs, but that's just to make this example easier), and take this simple operation: 15 * 10 / 1000000000 = 0.00000015 (truncated to 0.0000001 because of our 7-decimal limit). Should it be optimized (for example, because of out-of-order op optimizations) as 10 / 1000000000 * 15, then you get 10 / 1000000000 = 0.00000001 = 0.0000000 (7 decimals) and 0.0000000 * 15 = 0.0000000 in the end... Believe me, the above effect is far from negligible... That's because it's one decimal digit, not one bit, of difference. Besides, all FP operations have an average error of 0.5 bit by definition. We're talking about a difference smaller than 15 decimal digits! If the output of the application is truncated to the default 5 digits, it'll never even show up. HTH ID: 6789 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 6790 - Posted: 26 Nov 2008, 18:16:53 UTC - in response to Message 6785. Last modified: 26 Nov 2008, 18:18:19 UTC And here's what they mean: -fno-signed-zeros: irrelevant for non-trapping math. -fno-trapping-math: assumes that no illegal math operation will happen, so that operations can be performed in any order. -fassociative-math: allows associative properties to be used to speed calculations up (e.g., in modern processors, "a + b + c + d" is slower than "(a + b) + (c + d)"). -freciprocal-math: allows multiplication by reciprocal instead of slow divisions (e.g., uses "a * 0.5" instead of "a / 2"). HTH ID: 6790 · Rating: 0 · rate: / Reply Quote
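
A small sketch of what regrouping and reciprocal multiplication do to IEEE 754 doubles. Python never reorders floating-point expressions, so the alternative groupings are written out by hand to mimic rewrites that -fassociative-math and -freciprocal-math permit; whether GCC actually applies a given rewrite depends on the code and the target, and the values below are chosen only for illustration.

```python
# Reassociation: the same three terms, grouped two ways.
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False: the results differ in the last bit

# Reassociation with cancellation: the difference is no longer one bit.
big = 1e16                   # doubles near 1e16 are spaced 2 apart
lost = (big + 1.0) - big     # 0.0: the 1.0 is absorbed when added to big
kept = (big - big) + 1.0     # 1.0
print(lost, kept)

# Reciprocal multiplication: dividing by 2 is exact either way,
# because 0.5 is a power of two ...
a = 1234.5678
print(a / 2 == a * 0.5)      # True

# ... but a general divisor is not: 1/49 is rounded before the multiply.
print(49.0 / 49.0 == 1.0)            # True
print(49.0 * (1.0 / 49.0) == 1.0)    # False
```

The first and last cases are the one-bit differences discussed above; the middle case shows that regrouping terms of very different magnitude can change a result completely.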

Thierry Godefroy Send message Joined: 29 Jul 08 Posts: 9 Credit: 2,200,784 RAC: 0	Message 6798 - Posted: 26 Nov 2008, 20:23:37 UTC - in response to Message 6789. Last modified: 26 Nov 2008, 20:36:26 UTC We're talking about a difference smaller than 15 decimal digits! If the output of the application is truncated to the default 5 digits, it'll never even show up. You don't seem to understand that in a chain of many operations (or worse: in a loop where each iteration uses the result of the previous one, as in recursive sequences), your 15th decimal error will grow to the 14th, then the 13th, etc., and this with every dozen or so operations. In the end, the error might show up in the 5th, 4th or even 3rd decimal, depending on how many loops you went through... and calculations such as BOINC's rely precisely on complex calculations done within numerous loops. Don't use -ffast-math. Period. ID: 6798 · Rating: 0 · rate: / Reply Quote
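
How quickly rounding error builds up in a loop can be measured on a simple case. The sketch below repeatedly adds 0.1 to a running total and compares the outcome with `math.fsum`, which returns the correctly rounded sum of the same inputs. The helper names are made up for the illustration, and the loop shows ordinary rounding accumulation in general; it is not a measurement of what -ffast-math does to the MilkyWay application.

```python
import math


def naive_sum(values):
    """Running total: each addition rounds, and the rounded total feeds the next one."""
    total = 0.0
    for v in values:
        total += v
    return total


def relative_error(n: int) -> float:
    """Relative error of the naive running total of n copies of 0.1."""
    values = [0.1] * n
    reference = math.fsum(values)  # correctly rounded sum of the same doubles
    return abs(naive_sum(values) - reference) / reference


for n in (10, 1_000, 1_000_000):
    print(n, relative_error(n))
```

For this loop the relative error after a million additions is larger than after ten, but how fast it grows in a real application depends on the algorithm: a simple sum degrades slowly, while an iteration that amplifies its own input can lose digits much faster.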

jedirock Send message Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0	Message 6800 - Posted: 26 Nov 2008, 20:25:33 UTC This isn't related to the -ffast-math discussion, but it looks like the x86_64 compile for Linux isn't actually doing x86_64. The i686 target has -m32 in the CXXFLAGS, but x86_64 doesn't have a -m64 flag anywhere. Good to see SSE2 is enabled though. ID: 6800 · Rating: 0 · rate: / Reply Quote