Welcome to MilkyWay@home

Recompiled Linux 32/64 apps

Message boards : Application Code Discussion : Recompiled Linux 32/64 apps

ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 8850 - Posted: 22 Jan 2009, 15:13:31 UTC - in response to Message 8838.  

> The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock.
> I use icpc -xO for this.

At run time the processor is probed and, if it's made by AMD, degraded code is run instead of the SSE3 code.

See http://techreport.com/discussions.x/8547 for a snippet.

HTH

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 8851 - Posted: 22 Jan 2009, 15:34:36 UTC - in response to Message 8673.  

> And here are flags:
>
> CXX_i686 = icpc
> CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
>
> CXX_x86_64 = icpc
> CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals


You can omit "-fast-transcendentals", as this is the default when specifying "-fp-model fast" (or even fast=2).
I don't use -unroll-aggressive and -opt-multi-version-aggressive. Do they help performance? I would think they don't bring much to the table.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8854 - Posted: 22 Jan 2009, 16:30:06 UTC - in response to Message 8851.  
Last modified: 22 Jan 2009, 16:30:22 UTC

>> And here are flags:
>>
>> CXX_i686 = icpc
>> CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
>>
>> CXX_x86_64 = icpc
>> CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals
>
> You can omit "-fast-transcendentals", as this is the default when specifying "-fp-model fast" (or even fast=2).
> I don't use -unroll-aggressive and -opt-multi-version-aggressive. Do they help performance? I would think they don't bring much to the table.


Right, leaving them out doesn't make a difference.

Any suggestions to squeeze out some more?
mic.


mindc
Joined: 9 Jul 08
Posts: 7
Credit: 11,070,991
RAC: 0
Message 8898 - Posted: 23 Jan 2009, 13:19:43 UTC - in response to Message 8850.  

>> The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock.
>> I use icpc -xO for this.
>
> At run time the processor is probed and, if it's made by AMD, degraded code is run instead of the SSE3 code.
>
> See http://techreport.com/discussions.x/8547 for a snippet.
>
> HTH


1. This article is very old: 11:58 AM on July 13, 2005
2. I get much better performance with SSE2 (a 20% boost) than without it, and slightly better performance with SSE3 than with SSE2 (another 1% boost). I'm talking about an AMD chip and the milkyway app, of course.


ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 8903 - Posted: 23 Jan 2009, 15:47:42 UTC - in response to Message 8898.  

> 1. This article is very old: 11:58 AM on July 13, 2005
> 2. I get much better performance with SSE2 (a 20% boost) than without it, and slightly better performance with SSE3 than with SSE2 (another 1% boost). I'm talking about an AMD chip and the milkyway app, of course.

1 - Yet, it's still true. It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you?

2 - 1% is too close to noise to call a boost.

mindc
Joined: 9 Jul 08
Posts: 7
Credit: 11,070,991
RAC: 0
Message 8922 - Posted: 24 Jan 2009, 0:38:19 UTC - in response to Message 8903.  

> 1 - It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you?
>
> 2 - 1% is too close to noise to call a boost.


1. It seems you are right.
2. But I've made more tests:

Averaged boost in calculation times over 126 runs of the milkyway app on an idle machine:
SSE3 app: 121.86%
SSE2 app: 119.03%
base app: 100.00%

I don't think this is 'noise' only...
I'm confused now...
Maybe this is milkyway specific...
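
One way to judge whether a gap like the one between the SSE2 and SSE3 builds is more than noise is to compare it with the run-to-run scatter. The sketch below is only an illustration: the per-run timings are made up (the 126 individual measurements were not posted), and `relative_speed`, `standard_error` and `differs_beyond_noise` are hypothetical helper names.

```python
import math
import statistics


def relative_speed(base_times, app_times):
    """Speed of an app relative to the base app, in percent (base = 100)."""
    return 100.0 * statistics.mean(base_times) / statistics.mean(app_times)


def standard_error(times):
    """Standard error of the mean run time."""
    return statistics.stdev(times) / math.sqrt(len(times))


def differs_beyond_noise(times_a, times_b, sigmas=3.0):
    """True if the mean run times differ by more than `sigmas` combined standard errors."""
    gap = abs(statistics.mean(times_a) - statistics.mean(times_b))
    noise = math.hypot(standard_error(times_a), standard_error(times_b))
    return gap > sigmas * noise


# Made-up CPU times in seconds, NOT the measurements from this thread.
base = [100.2, 99.8, 100.1, 99.9, 100.0, 100.3]
sse2 = [84.1, 83.9, 84.0, 84.2, 83.8, 84.0]
sse3 = [82.1, 82.0, 81.9, 82.2, 82.0, 81.8]

print(f"SSE2 app: {relative_speed(base, sse2):.2f}%")
print(f"SSE3 app: {relative_speed(base, sse3):.2f}%")
print("SSE3 vs SSE2 beyond noise:", differs_beyond_noise(sse2, sse3))
```

With many runs per build the standard error shrinks, so even a gap of a few percent can stand out clearly from the scatter.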


ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 8923 - Posted: 24 Jan 2009, 0:44:43 UTC - in response to Message 8922.  


> 2. But I've made more tests:
>
> Averaged boost in calculation times over 126 runs of the milkyway app on an idle machine:
> SSE3 app: 121.86%
> SSE2 app: 119.03%
> base app: 100.00%
>
> I don't think this is 'noise' only...
> I'm confused now...
> Maybe this is milkyway specific...

Hard to explain why. Maybe, even though the processor doesn't get to run the SSE3 code, the code it does run, though still SSE2, is different, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise.

Thanks.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 8933 - Posted: 24 Jan 2009, 17:41:50 UTC - in response to Message 8923.  


>> 2. But I've made more tests:
>>
>> Averaged boost in calculation times over 126 runs of the milkyway app on an idle machine:
>> SSE3 app: 121.86%
>> SSE2 app: 119.03%
>> base app: 100.00%
>>
>> I don't think this is 'noise' only...
>> I'm confused now...
>> Maybe this is milkyway specific...
>
> Hard to explain why. Maybe, even though the processor doesn't get to run the SSE3 code, the code it does run, though still SSE2, is different, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise.
>
> Thanks.


SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8952 - Posted: 24 Jan 2009, 20:48:05 UTC

Travis, please take a look at this host; everything coming in after 20:45 UTC is done with the new recompiled v14.
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 8956 - Posted: 24 Jan 2009, 20:56:42 UTC - in response to Message 8952.  

> Travis, please take a look at this host; everything coming in after 20:45 UTC is done with the new recompiled v14.


I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8960 - Posted: 24 Jan 2009, 21:10:05 UTC - in response to Message 8956.  

>> Travis, please take a look at this host; everything coming in after 20:45 UTC is done with the new recompiled v14.
>
> I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits.


The results of the test units are exactly the same as with my v12.

I'll post the v14 as soon as you give the OK. :)
mic.


ebahapo
Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 8976 - Posted: 24 Jan 2009, 23:13:27 UTC - in response to Message 8933.  

> SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer.

Yes, but the Intel compiler checks if the code is running on an Intel CPU and, if it's not, it runs an alternative SSE2 code instead. It'll run SSE3 or later only on Intel processors. As these results are on an AMD CPU, it's not benefiting from the SSE3 optimizations.
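
The behaviour described above can be pictured roughly as follows. This is a toy model only, not Intel's actual dispatcher: `choose_code_path`, `choose_by_features` and the path names are illustrative assumptions. The vendor strings are the ones CPUID reports for Intel and AMD processors.

```python
def choose_code_path(vendor, features):
    """Toy model of a vendor-gated dispatcher (all names are hypothetical).

    vendor   -- CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    features -- set of instruction-set extensions the CPU reports
    """
    if vendor == "GenuineIntel" and "sse3" in features:
        return "sse3"      # optimized path, taken only on Intel CPUs
    if "sse2" in features:
        return "sse2"      # fallback path for every other CPU
    return "generic"       # no SIMD extensions assumed


def choose_by_features(features):
    """Contrasting variant that ignores the vendor and trusts the feature flags."""
    for path in ("sse3", "sse2"):
        if path in features:
            return path
    return "generic"


amd_cpu = {"sse", "sse2", "sse3"}
print(choose_code_path("AuthenticAMD", amd_cpu))  # sse2
print(choose_by_features(amd_cpu))                # sse3
```

In the vendor-gated version, an AMD CPU that supports SSE3 still ends up on the SSE2 path, which is the situation described in this post.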

HTH


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9017 - Posted: 25 Jan 2009, 2:14:03 UTC - in response to Message 8960.  
Last modified: 25 Jan 2009, 2:25:15 UTC

>>> Travis, please take a look at this host; everything coming in after 20:45 UTC is done with the new recompiled v14.
>>
>> I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits.
>
> The results of the test units are exactly the same as with my v12.
>
> I'll post the v14 as soon as you give the OK. :)


Looks to me like it's generating good results, so I'd go ahead and release it.

*edit* Scratch that. Looking at some results, the stock app and other newly compiled apps are still having the same issue (though not as frequently). No point in updating it until this whole thing is fixed.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9023 - Posted: 25 Jan 2009, 2:25:54 UTC - in response to Message 9017.  

> Looks to me like it's generating good results, so I'd go ahead and release it.

Ok, new recompiled v14 apps for Linux on Intel CPUs:

Linux32

SSE3_32

SSE2_32

SSE_32

Linux64

SSE3_64

SSSE3_64

SSE41_64

Please report errors (or success) here.
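
To decide which of the builds listed above to try, one option is to look at the `flags` line in `/proc/cpuinfo`. The sketch below is a rough helper and not part of the apps: `cpu_flags` and `pick_build` are hypothetical names, and it assumes the Linux flag names `sse`, `sse2`, `pni` (SSE3), `ssse3` and `sse4_1`. It only checks feature flags; it cannot tell whether a given build will actually run on a non-Intel CPU.

```python
import platform


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


# Build names as posted in this thread, most specialised first.
# Flag names are the ones Linux prints in /proc/cpuinfo ("pni" is SSE3).
BUILDS_64 = [("sse4_1", "SSE41_64"), ("ssse3", "SSSE3_64"), ("pni", "SSE3_64")]
BUILDS_32 = [("pni", "SSE3_32"), ("sse2", "SSE2_32"), ("sse", "SSE_32")]


def pick_build(flags, is_64bit):
    """Pick the most specialised build whose required flag is present."""
    for flag, build in (BUILDS_64 if is_64bit else BUILDS_32):
        if flag in flags:
            return build
    return None  # nothing suitable; stay with the stock app


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except OSError:
        flags = set()  # not Linux, or /proc is unavailable
    print(pick_build(flags, platform.machine() == "x86_64"))
```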

mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9025 - Posted: 25 Jan 2009, 2:27:32 UTC - in response to Message 9023.  

Although, these are running a bit faster and not erroring as frequently as before, so it's no big deal that they're released :D

Purple Rabbit
Joined: 9 Nov 08
Posts: 44
Credit: 128,043,914
RAC: 0
Message 9028 - Posted: 25 Jan 2009, 2:33:59 UTC
Last modified: 25 Jan 2009, 3:23:54 UTC

I'll take that as a yes :-) I just downloaded the SSE version. Unfortunately only my 1.3 GHz Celerons can take advantage of these apps (much to their pleasure I might add). The .12 version saw an increase from 2:02 to 1:37 CPU time as compared to the stock version.

My AMD 3800+ x2 and AMD 5600+ x2 can't handle them (as expected). I only tried the 64 bit SSE3 app tho. Maybe the 32 bit SSE2 will work? It's much too late tonight to embark on what may be a major task. I'll try it tomorrow unless someone knows it's futile.

Purple Rabbit
Joined: 9 Nov 08
Posts: 44
Credit: 128,043,914
RAC: 0
Message 9050 - Posted: 25 Jan 2009, 4:18:06 UTC - in response to Message 9028.  

> The .12 version saw an increase from 2:02 to 1:37 CPU time as compared to the stock version.


I meant decrease ...sigh. Stupid computer sends what I type rather than what I meant!
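
For reference, the change from 2:02 to 1:37 works out as in the short calculation below. The ratio is the same whether the figures are read as minutes:seconds or hours:minutes; `to_units` is just a hypothetical helper.

```python
def to_units(clock):
    """Convert an 'a:bb' clock string into the smaller unit (e.g. m:ss -> seconds)."""
    major, minor = clock.split(":")
    return int(major) * 60 + int(minor)


stock = to_units("2:02")      # 122
optimized = to_units("1:37")  # 97

reduction = 100.0 * (stock - optimized) / stock
speedup = stock / optimized

print(f"CPU time reduced by {reduction:.1f}%")  # 20.5%
print(f"Speed-up factor: {speedup:.2f}x")       # 1.26x
```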

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9088 - Posted: 25 Jan 2009, 12:36:04 UTC

Travis, please take a look at this host; everything coming in now is done with the new recompiled v15.

mic.


Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 9095 - Posted: 25 Jan 2009, 14:10:44 UTC

You people obviously put a lot of hard work in, and I thank you for that, but is there a newbie's guide to installing these? I have just started playing with Linux and have 2 quad cores that I'd love to try these out on.

I have tried searching but to no avail.

Thanks again,

Neal

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9107 - Posted: 25 Jan 2009, 17:59:11 UTC - in response to Message 9088.  

> Travis, please take a look at this host; everything coming in now is done with the new recompiled v15.


It looked good, apart from maybe a couple of workunits which were bad... However, the stock app is STILL doing the same thing ;( I have no clue what's up.

©2024 Astroinformatics Group