Recompiled Linux 32/64 apps

Author	Message
ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 8850 - Posted: 22 Jan 2009, 15:13:31 UTC - in response to Message 8838. The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock. I use icpc -xO for this. At run-time the processor is probed and if it's by AMD, then degraded code is run instead of the SSE3 code. See http://techreport.com/discussions.x/8547 for a snippet. HTH ID: 8850 · Rating: 0 · rate: / Reply Quote

Cluster Physik Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 8851 - Posted: 22 Jan 2009, 15:34:36 UTC - in response to Message 8673. And here are flags:
CXX_i686 = icpc
CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
CXX_x86_64 = icpc
CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
You can omit "fast-transcendentals" as this is the default when specifying "-fp-model fast" (or even fast=2). I don't use -unroll-aggressive and -opt-multi-version-aggressive, does it help the performance? I would think that it doesn't bring much to the table. ID: 8851 · Rating: 0 · rate: / Reply Quote

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 8854 - Posted: 22 Jan 2009, 16:30:06 UTC - in response to Message 8851. Last modified: 22 Jan 2009, 16:30:22 UTC And here are flags:
CXX_i686 = icpc
CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
CXX_x86_64 = icpc
CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
You can omit "fast-transcendentals" as this is the default when specifying "-fp-model fast" (or even fast=2). I don't use -unroll-aggressive and -opt-multi-version-aggressive, does it help the performance? I would think that it doesn't bring much to the table.
Right, leaving them out doesn't make a difference. Any suggestions to squeeze out some more? mic. ID: 8854 · Rating: 0 · rate: / Reply Quote

mindc Send message Joined: 9 Jul 08 Posts: 7 Credit: 11,070,991 RAC: 0	Message 8898 - Posted: 23 Jan 2009, 13:19:43 UTC - in response to Message 8850. The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock. I use icpc -xO for this. At run-time the processor is probed and if it's by AMD, then degraded code is run instead of the SSE3 code. See http://techreport.com/discussions.x/8547 for a snippet. HTH 1. This article is very old: 11:58 AM on July 13, 2005 2. I've got a lot better performance with SSE2 (20% boost) than without it, and slightly better performance with SSE3 than SSE2 (another 1% boost) and I'm talking about AMD chip and milkyway app of course. ID: 8898 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 8903 - Posted: 23 Jan 2009, 15:47:42 UTC - in response to Message 8898. 1. This article is very old: 11:58 AM on July 13, 2005 2. I've got a lot better performance with SSE2 (20% boost) than without it, and slightly better performance with SSE3 than SSE2 (another 1% boost) and I'm talking about AMD chip and milkyway app of course. 1 - Yet, it's still true. It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you? 2 - 1% is too close to noise to call a boost. ID: 8903 · Rating: 0 · rate: / Reply Quote

mindc Send message Joined: 9 Jul 08 Posts: 7 Credit: 11,070,991 RAC: 0	Message 8922 - Posted: 24 Jan 2009, 0:38:19 UTC - in response to Message 8903. 1 - It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you? 2 - 1% is too close to noise to call a boost.
1. It seems you are right.
2. But I've made more tests: averaged boost in calculation times for 126 runs of milkyway app on idle machine:
SSE3 app: 121.86%
SSE2 app: 119.03%
base app: 100.00%
I don't think this is 'noise' only... I'm confused now... Maybe this is milkyway specific... ID: 8922 · Rating: 0 · rate: / Reply Quote
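Whether a gap like the one between the SSE2 and SSE3 builds is more than noise depends on the run-to-run spread, which the post does not include. The following is only a rough sketch of one way to check it, comparing the difference of the mean run times with its combined standard error. The timings are invented for illustration and are not mindc's measurements; the function names are hypothetical as well.

```python
import math
import statistics


def mean_and_stderr(times):
    """Mean run time and standard error of that mean."""
    mean = statistics.mean(times)
    stderr = statistics.stdev(times) / math.sqrt(len(times))
    return mean, stderr


def compare(times_a, times_b):
    """Return (difference of means, combined standard error, ratio of the two)."""
    mean_a, se_a = mean_and_stderr(times_a)
    mean_b, se_b = mean_and_stderr(times_b)
    diff = mean_a - mean_b
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff, combined_se, abs(diff) / combined_se


# Hypothetical CPU times in seconds, NOT the data from the thread.
sse2_times = [100.0, 101.0, 99.0, 100.5, 99.5, 100.0]
sse3_times = [97.5, 98.5, 96.5, 98.0, 97.0, 97.5]

diff, combined_se, ratio = compare(sse2_times, sse3_times)
print(f"difference {diff:.2f} s, standard error {combined_se:.2f} s, ratio {ratio:.1f}")
```

As a rule of thumb, a ratio well above 2 suggests the difference is unlikely to be noise alone. With 126 runs per build, as in the post, the standard error shrinks considerably, which is consistent with a gap of a few percent becoming distinguishable.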

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 8923 - Posted: 24 Jan 2009, 0:44:43 UTC - in response to Message 8922. 2. But I've made more tests: averaged boost in calculation times for 126 runs of milkyway app on idle machine SSE3 app: 121.86% SSE2 app: 119.03% base app: 100.00% I don't think this is 'noise' only... I'm confused now... Maybe this is milkyway specific... Hard to explain why. Maybe even though the processor doesn't get to run SSE3 code, the code is different, though SSE2, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise. Thanks. ID: 8923 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 8933 - Posted: 24 Jan 2009, 17:41:50 UTC - in response to Message 8923. 2. But I've made more tests: averaged boost in calculation times for 126 runs of milkyway app on idle machine SSE3 app: 121.86% SSE2 app: 119.03% base app: 100.00% I don't think this is 'noise' only... I'm confused now... Maybe this is milkyway specific... Hard to explain why. Maybe even though the processor doesn't get to run SSE3 code, the code is different, though SSE2, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise. Thanks. SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer. ID: 8933 · Rating: 0 · rate: / Reply Quote

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 8952 - Posted: 24 Jan 2009, 20:48:05 UTC Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14. mic. ID: 8952 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 8956 - Posted: 24 Jan 2009, 20:56:42 UTC - in response to Message 8952. Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14. I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits. ID: 8956 · Rating: 0 · rate: / Reply Quote

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 8960 - Posted: 24 Jan 2009, 21:10:05 UTC - in response to Message 8956. Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14. I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits. The results of the test units are exactly the same as with my v12. I'll post the v14 as soon as you give the OK. :) mic. ID: 8960 · Rating: 0 · rate: / Reply Quote

ebahapo Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0	Message 8976 - Posted: 24 Jan 2009, 23:13:27 UTC - in response to Message 8933. SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer. Yes, but the Intel compiler checks if the code is running on an Intel CPU and, if it's not, it runs an alternative SSE2 code instead. It'll run SSE3 or later only on Intel processors. As these results are on an AMD CPU, it's not benefiting from the SSE3 optimizations. HTH ID: 8976 · Rating: 0 · rate: / Reply Quote
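The behaviour described here can be modelled roughly as below. This is only an illustration of the idea (vendor-gated dispatch versus dispatch on feature flags alone), not the Intel compiler's actual dispatcher. It assumes Linux-style /proc/cpuinfo text, where SSE3 is reported as the "pni" flag; the function names and the sample text are made up for the example.

```python
def parse_cpuinfo(text):
    """Return (vendor_id, set of feature flags) from /proc/cpuinfo-style text."""
    vendor, flags = "", set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "vendor_id" and not vendor:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def choose_path_vendor_gated(vendor, flags):
    """Model of the behaviour described in the thread: SSE3 path only on Intel."""
    if vendor == "GenuineIntel" and "pni" in flags:
        return "sse3"
    return "sse2" if "sse2" in flags else "generic"


def choose_path_feature_only(vendor, flags):
    """Alternative for comparison: ignore the vendor, trust the feature flags."""
    if "pni" in flags:
        return "sse3"
    return "sse2" if "sse2" in flags else "generic"


# Made-up sample for an AMD CPU that supports SSE3 ("pni").
SAMPLE_AMD = """\
processor : 0
vendor_id : AuthenticAMD
flags : fpu sse sse2 pni
"""

vendor, flags = parse_cpuinfo(SAMPLE_AMD)
print(choose_path_vendor_gated(vendor, flags))  # sse2
print(choose_path_feature_only(vendor, flags))  # sse3
```

On a real Linux machine the same functions could be fed the contents of /proc/cpuinfo instead of the sample string.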

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 9017 - Posted: 25 Jan 2009, 2:14:03 UTC - in response to Message 8960. Last modified: 25 Jan 2009, 2:25:15 UTC Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14. I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits. The results of the test-units is exactly the same as my v12. I'll post the v14 as soon as give the ok. :) Looks to me like it's generating good results, so I'd go ahead and release it. Edit: scratch that. Looking at some results, the stock app and other newly compiled apps are still having the same issue (however, not as frequently). No point in updating it until this whole thing is fixed. ID: 9017 · Rating: 0 · rate: / Reply Quote

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 9023 - Posted: 25 Jan 2009, 2:25:54 UTC - in response to Message 9017. Looks to me like it's generating good results, so I'd go ahead and release it. Ok, new recompiled v14 apps for Linux on Intel CPUs:
Linux32: SSE3_32, SSE2_32, SSE_32
Linux64: SSE3_64, SSSE3_64, SSE41_64
Please report errors (or success) here. mic. ID: 9023 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 9025 - Posted: 25 Jan 2009, 2:27:32 UTC - in response to Message 9023. Although, these are running a bit faster and not erroring as frequently as before, so it's no big deal that they're released :D ID: 9025 · Rating: 0 · rate: / Reply Quote

Purple Rabbit Send message Joined: 9 Nov 08 Posts: 44 Credit: 128,043,914 RAC: 0	Message 9028 - Posted: 25 Jan 2009, 2:33:59 UTC Last modified: 25 Jan 2009, 3:23:54 UTC I'll take that as a yes :-) I just downloaded the SSE version. Unfortunately only my 1.3 GHz Celerons can take advantage of these apps (much to their pleasure I might add). The .12 version saw an increase from 2:02 to 1:37 CPU time as compared to the stock version. My AMD 3800+ x2 and AMD 5600+ x2 can't handle them (as expected). I only tried the 64 bit SSE3 app tho. Maybe the 32 bit SSE2 will work? It's much too late tonight to embark on what may be a major task. I'll try it tomorrow unless someone knows it's futile. ID: 9028 · Rating: 0 · rate: / Reply Quote

Purple Rabbit Send message Joined: 9 Nov 08 Posts: 44 Credit: 128,043,914 RAC: 0	Message 9050 - Posted: 25 Jan 2009, 4:18:06 UTC - in response to Message 9028. The .12 version saw an increase from 2:02 to 1:37 CPU time as compared to the stock version. I meant decrease ...sigh. Stupid computer sends what I type rather than what I meant! ID: 9050 · Rating: 0 · rate: / Reply Quote
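For scale, the drop from 2:02 to 1:37 can be worked out as below. This is a small sketch with a hypothetical helper; it treats both stamps as the same base-60 format. The post does not say whether they are minutes:seconds or hours:minutes, but the percentage and ratio come out the same either way.

```python
def to_units(stamp):
    """Convert a 'major:minor' base-60 stamp (e.g. '2:02') to minor units."""
    major, minor = stamp.split(":")
    return int(major) * 60 + int(minor)


def improvement(before, after):
    """Return (percent reduction in CPU time, speedup factor)."""
    b, a = to_units(before), to_units(after)
    return (b - a) / b * 100.0, b / a


reduction_pct, speedup = improvement("2:02", "1:37")
print(f"{reduction_pct:.1f}% less CPU time, {speedup:.2f}x speedup")
# prints "20.5% less CPU time, 1.26x speedup"
```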

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 9088 - Posted: 25 Jan 2009, 12:36:04 UTC Travis, please take a look at this host, everything coming in now is done with the new recompiled v15. mic. ID: 9088 · Rating: 0 · rate: / Reply Quote

Neal Chantrill Send message Joined: 17 Jan 09 Posts: 98 Credit: 72,182,367 RAC: 0	Message 9095 - Posted: 25 Jan 2009, 14:10:44 UTC You people obviously put a lot of hard work in and I thank you for that, but is there a newbies guide to installing these? I have just started playing with linux and have 2 quad cores that I'd love to try these out on. I have tried searching but to no avail. Thanks again, Neal ID: 9095 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 9107 - Posted: 25 Jan 2009, 17:59:11 UTC - in response to Message 9088. Travis, please take a look at this host, everything coming in now is done with the new recompiled v15. It looked good, except maybe for a couple of workunits which were bad... However, the stock app is STILL doing the same thing ;( I have no clue what's up. ID: 9107 · Rating: 0 · rate: / Reply Quote