Recompiled Linux 32/64 apps

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8664 - Posted: 19 Jan 2009 | 16:06:06 UTC

I thought it would be better to have them in one place...

Linux 32bit:

SSE

SSE2

SSE3


Linux 64bit:

SSE3

SSSE3

SSE4.1

All compiled with Intel icpc from the original code.
All tested against the test wus and giving exactly the same result.

Please report any problems here.

____________
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 8671 - Posted: 19 Jan 2009 | 18:51:01 UTC - in response to Message 8664.

What kind of speedup did you get on these compared to the stock app? And did you use any specific compiler flags?
____________

DoctorNow
Joined: 28 Aug 07
Posts: 146
Credit: 5,183,509
RAC: 0
Message 8672 - Posted: 19 Jan 2009 | 19:13:43 UTC - in response to Message 8664.

All compiled with Intel icpc from the original code.
All tested against the test wus and giving exactly the same result.

Can you make them AMD compatible?
I just tried it out and it didn't run of course. :-\
____________
Member of BOINC@Heidelberg and ATA!

My BOINCstats

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8673 - Posted: 19 Jan 2009 | 19:39:42 UTC - in response to Message 8671.

What kind of speedup did you get on these compared to the stock app? And did you use any specific compiler flags?


This host is running the SSE41_64 version. But I don't have numbers for stock...

Temujin posted numbers for the 32bit versions.

And here are the flags:

CXX_i686 = icpc
CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals

CXX_x86_64 = icpc
CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals


____________
mic.


speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8676 - Posted: 19 Jan 2009 | 20:23:25 UTC

Due to the v11 release I removed the apps from the server.
New ones soon. :-)
____________
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 8677 - Posted: 19 Jan 2009 | 20:28:02 UTC - in response to Message 8676.

Stickying this because I think it's something our Linux users would like. Also, the Linux optimized apps seem to have been returning good results :)
____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8687 - Posted: 19 Jan 2009 | 22:59:11 UTC - in response to Message 8677.

Stickying this because I think it's something our Linux users would like. Also, the Linux optimized apps seem to have been returning good results :)


Good to hear, I hope the V11/12 still does. ;)

The SSE41_64 is already running on my Q9550; I'll post it (and the other SSE levels) if it still gets credits tomorrow.
____________
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 8720 - Posted: 20 Jan 2009 | 6:31:58 UTC - in response to Message 8687.

Stickying this because I think it's something our Linux users would like. Also, the Linux optimized apps seem to have been returning good results :)


Good to hear, I hope the V11/12 still does. ;)

The SSE41_64 is already running on my Q9550; I'll post it (and the other SSE levels) if it still gets credits tomorrow.


Things look fine on my end, it will be getting credits tomorrow.
____________

Gavin Shaw
Joined: 16 Jan 08
Posts: 98
Credit: 1,371,299
RAC: 0
Message 8722 - Posted: 20 Jan 2009 | 6:40:41 UTC
Last modified: 20 Jan 2009 | 6:41:10 UTC

I suppose the question then has to be asked.

Why are the linux opt apps giving good results, but the Win ones are causing a load of trouble? Is it a compiler flag difference or code change?

And I'm sure there are a lot of Win users (and maybe MAC users too) who would like to have similarly optimized apps (that return good results).
____________
Never surrender and never give up. In the darkest hour there is always hope.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8726 - Posted: 20 Jan 2009 | 8:18:52 UTC - in response to Message 8722.

I suppose the question then has to be asked.

Why are the linux opt apps giving good results, but the Win ones are causing a load of trouble? Is it a compiler flag difference or code change?

And I'm sure there are a lot of Win users (and maybe MAC users too) who would like to have similarly optimized apps (that return good results).


No code changes on my side, just the compiler and flags (posted earlier in this thread).

____________
mic.


speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8730 - Posted: 20 Jan 2009 | 10:14:49 UTC

New recompiled apps for Linux64 on Intel:

SSE3_64


SSSE3_64


SSE41_64

It seems they don't run well on AMD, so it might be better for AMD users to try a testfile in standalone mode --> look here

Please report errors (and success) here.
____________
mic.


Lazarus-uk
Joined: 22 Mar 08
Posts: 7
Credit: 2,608,066
RAC: 0
Message 8731 - Posted: 20 Jan 2009 | 11:12:38 UTC - in response to Message 8730.

New recompiled apps for Linux64 on Intel:

SSE3_64


SSSE3_64


SSE41_64

It seems they don't run well on AMD, so it might be better for AMD users to try a testfile in standalone mode --> look here

Please report errors (and success) here.



Just completed and validated my first WU with the SSE4.1 App.

Result here.

Running Linux-Ubuntu 8.10 64-bit, Q9450@3.4GHz, WU took 683 secs.


Looking good Mic


Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 8733 - Posted: 20 Jan 2009 | 12:34:43 UTC

Hi speedimic,

THX again for the great work!
My Linux quad likes the SSSE3 app very much... ;-)

Xubuntu 8.10 64-bit, Intel Q6600 @ 2.4 GHz

Each WU needs ~18-19 minutes on that host and no errors so far! :-)))
____________
Lovely greetings, Cori

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8744 - Posted: 20 Jan 2009 | 14:10:06 UTC - in response to Message 8730.
Last modified: 20 Jan 2009 | 14:10:22 UTC

New recompiled apps for Linux32 on Intel:

SSE3_32


SSE2_32


SSE_32

It seems they don't run well on AMD, so it might be better for AMD users to try a testfile in standalone mode --> look here

Please report errors (and success) here.
____________
mic.


[AF>HFR>RR] Sp0wn
Joined: 16 Mar 08
Posts: 10
Credit: 59,990,626
RAC: 0
Message 8753 - Posted: 20 Jan 2009 | 15:29:54 UTC - in response to Message 8744.

As I mentioned here, I have problems with some WUs, but only with the SSE4.1 x86_64 app!

I don't know if it's the application or the new assimilator/validator :/

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 8756 - Posted: 20 Jan 2009 | 15:53:26 UTC - in response to Message 8730.

It seems they don't run well on AMD, so it might be better for AMD users to try a testfile in standalone mode --> look here

Well, of course not. Binaries built with the Intel compiler run a degraded code path on AMD processors. The highest SSE level they use on AMD processors is SSE2.

HTH

____________

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 8762 - Posted: 20 Jan 2009 | 17:27:48 UTC - in response to Message 8722.

I suppose the question then has to be asked.

Why are the linux opt apps giving good results, but the Win ones are causing a load of trouble? Is it a compiler flag difference or code change?

And I'm sure there are a lot of Win users (and maybe MAC users too) who would like to have similarly optimized apps (that return good results).


Linux and OSX seem to be returning correct results (i.e., those in line with the stock app). For Windows, it seems there are 2-3 different strains of bad applications out there.
____________

[BMF]DevinK
Joined: 25 Nov 08
Posts: 1
Credit: 2,741,066
RAC: 0
Message 8831 - Posted: 22 Jan 2009 | 1:43:02 UTC - in response to Message 8762.

How do you install these clients in Ubuntu64?

Tried replacing the stock opt app in /var/lib/boinc-client/.. but it errored out.
Where to place the app_info.xml?

Guess I'm not following the right steps?

thx

mindc
Joined: 9 Jul 08
Posts: 7
Credit: 11,070,991
RAC: 0
Message 8838 - Posted: 22 Jan 2009 | 3:16:07 UTC - in response to Message 8756.

Binaries built with the Intel compiler run a degraded code path on AMD processors. The highest SSE level they use on AMD processors is SSE2.

HTH


The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock.
I use icpc -xO for this.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8842 - Posted: 22 Jan 2009 | 4:46:56 UTC - in response to Message 8831.

How do you install these clients in Ubuntu64?

Tried replacing the stock opt app in /var/lib/boinc-client/.. but it errored out.
Where to place the app_info.xml?

Guess I'm not following the right steps?

thx


First stop the service ( /etc/init.d/boinc-client stop ), then put the files from the zip (app_info.xml, milkyway_0.12_SSEwhatever...) into /var/lib/boinc-client/projects/milky..., then restart the client.
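
The copy step described here can be sketched as a small Python helper. This is only an illustration: the function name `install_optimized_app` and its parameters are hypothetical, and the project directory is left for the caller to supply because the path given here is abbreviated. Stopping and restarting the client (for example with the init script mentioned here) still has to happen around it, typically as root.

```python
import shutil
from pathlib import Path


def install_optimized_app(unpacked_dir, project_dir):
    """Copy app_info.xml and the app binary from an unpacked zip into a project folder.

    Both paths are supplied by the caller; nothing here guesses the real
    BOINC project path. Stop the BOINC client before calling this and
    restart it afterwards.
    """
    src = Path(unpacked_dir)
    dst = Path(project_dir)
    if not (src / "app_info.xml").is_file():
        raise FileNotFoundError("app_info.xml not found in " + str(src))
    if not dst.is_dir():
        raise NotADirectoryError(str(dst))

    copied = []
    for item in sorted(src.iterdir()):
        if item.is_file():
            target = dst / item.name
            # copy2 also copies permission bits, so the binary stays executable
            shutil.copy2(item, target)
            copied.append(target.name)
    return copied
```

The function returns the names of the files it copied, which makes it easy to confirm that both app_info.xml and the binary landed in the project folder.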
____________
mic.


Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 8850 - Posted: 22 Jan 2009 | 15:13:31 UTC - in response to Message 8838.

The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock.
I use icpc -xO for this.

At run-time the processor is probed and if it's by AMD, then degraded code is run instead of the SSE3 code.

See http://techreport.com/discussions.x/8547 for a snippet.

HTH
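
To illustrate what "probing the processor" means in practice: on Linux the CPU vendor string is visible in the `vendor_id` field of /proc/cpuinfo (`GenuineIntel` or `AuthenticAMD`). The sketch below only mimics the behaviour described in this post. `pick_code_path` is a hypothetical function for illustration and is not the Intel runtime's actual dispatch logic.

```python
def cpu_vendor(cpuinfo_text):
    """Return the vendor_id string from /proc/cpuinfo text, or None if absent."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("vendor_id") and ":" in line:
            return line.split(":", 1)[1].strip()
    return None


def pick_code_path(vendor, best_supported_level):
    """Mimic vendor-based dispatch as described in the post.

    Hypothetical logic for illustration only: the optimized path is
    chosen for 'GenuineIntel'; any other vendor gets the fallback.
    """
    if vendor == "GenuineIntel":
        return best_supported_level
    return "fallback"
```

Calling `cpu_vendor(open("/proc/cpuinfo").read())` on a Linux box shows which branch such a check would take on that machine.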

____________

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 8851 - Posted: 22 Jan 2009 | 15:34:36 UTC - in response to Message 8673.

And here are the flags:

CXX_i686 = icpc
CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals

CXX_x86_64 = icpc
CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals


You can omit "fast-transcendentals" as this is the default when specifying "-fp-model fast" (or even fast=2).
I don't use -unroll-aggressive and -opt-multi-version-aggressive, does it help the performance? I would think that it doesn't bring much to the table.

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8854 - Posted: 22 Jan 2009 | 16:30:06 UTC - in response to Message 8851.
Last modified: 22 Jan 2009 | 16:30:22 UTC

And here are the flags:

CXX_i686 = icpc
CXXFLAGS_i686 = -xSSE3 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals

CXX_x86_64 = icpc
CXXFLAGS_x86_64 = -xSSE4.1 -O3 -ipo -no-prec-div -static -fp-model fast=2 -fp-speculation=fast -opt-calloc -unroll-aggressive -opt-multi-version-aggressive -fast-transcendentals


You can omit "fast-transcendentals" as this is the default when specifying "-fp-model fast" (or even fast=2).
I don't use -unroll-aggressive and -opt-multi-version-aggressive, does it help the performance? I would think that it doesn't bring much to the table.


Right, leaving them out doesn't make a difference.

Any suggestions to squeeze out some more?
____________
mic.


mindc
Joined: 9 Jul 08
Posts: 7
Credit: 11,070,991
RAC: 0
Message 8898 - Posted: 23 Jan 2009 | 13:19:43 UTC - in response to Message 8850.

The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock.
I use icpc -xO for this.

At run-time the processor is probed and if it's by AMD, then degraded code is run instead of the SSE3 code.

See http://techreport.com/discussions.x/8547 for a snippet.

HTH


1. This article is very old: 11:58 AM on July 13, 2005
2. I've got a lot better performance with SSE2 (20% boost) than without it, and slightly better performance with SSE3 than SSE2 (another 1% boost) and I'm talking about AMD chip and milkyway app of course.


Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 8903 - Posted: 23 Jan 2009 | 15:47:42 UTC - in response to Message 8898.

1. This article is very old: 11:58 AM on July 13, 2005
2. I've got a lot better performance with SSE2 (20% boost) than without it, and slightly better performance with SSE3 than SSE2 (another 1% boost) and I'm talking about AMD chip and milkyway app of course.

1 - Yet, it's still true. It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you?

2 - 1% is too close to noise to call a boost.

____________

mindc
Joined: 9 Jul 08
Posts: 7
Credit: 11,070,991
RAC: 0
Message 8922 - Posted: 24 Jan 2009 | 0:38:19 UTC - in response to Message 8903.

1 - It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you?

2 - 1% is too close to noise to call a boost.


1. It seems you are right.
2. But I've made more tests:

averaged boost in calculation times for 126 runs of milkyway app on idle machine
SSE3 app: 121.86%
SSE2 app: 119.03%
base app: 100.00%

I don't think this is 'noise' only...
I'm confused now...
Maybe this is milkyway specific...
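
Reading the figures above as speed relative to the base app (an assumption, since the post does not say exactly how the "boost" was computed), the gain of SSE3 over SSE2 can be worked out like this:

```python
# Figures from the post, read here as speed relative to the base app (= 100 %).
relative_speed = {"base": 100.00, "SSE2": 119.03, "SSE3": 121.86}


def gain_percent(new, old):
    """Percentage gain of `new` over `old`, both relative to the same baseline."""
    return (new / old - 1.0) * 100.0


sse2_over_base = gain_percent(relative_speed["SSE2"], relative_speed["base"])
sse3_over_sse2 = gain_percent(relative_speed["SSE3"], relative_speed["SSE2"])

print(f"SSE2 over base: {sse2_over_base:.2f} %")
print(f"SSE3 over SSE2: {sse3_over_sse2:.2f} %")
```

Under that reading, SSE3 comes out roughly 2.4 % ahead of SSE2, small next to the SSE2 gain over the base app but measured over 126 runs.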


Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 8923 - Posted: 24 Jan 2009 | 0:44:43 UTC - in response to Message 8922.


2. But I've made more tests:

averaged boost in calculation times for 126 runs of milkyway app on idle machine
SSE3 app: 121.86%
SSE2 app: 119.03%
base app: 100.00%

I don't think this is 'noise' only...
I'm confused now...
Maybe this is milkyway specific...

Hard to explain why. Maybe even though the processor doesn't get to run SSE3 code, the code is different, though SSE2, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise.

Thanks.

____________

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 8933 - Posted: 24 Jan 2009 | 17:41:50 UTC - in response to Message 8923.


2. But I've made more tests:

averaged boost in calculation times for 126 runs of milkyway app on idle machine
SSE3 app: 121.86%
SSE2 app: 119.03%
base app: 100.00%

I don't think this is 'noise' only...
I'm confused now...
Maybe this is milkyway specific...

Hard to explain why. Maybe even though the processor doesn't get to run SSE3 code, the code is different, though SSE2, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise.

Thanks.


SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer.
____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8952 - Posted: 24 Jan 2009 | 20:48:05 UTC

Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14.
____________
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 8956 - Posted: 24 Jan 2009 | 20:56:42 UTC - in response to Message 8952.

Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14.


I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits.
____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 8960 - Posted: 24 Jan 2009 | 21:10:05 UTC - in response to Message 8956.

Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14.


I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits.


The results of the test units are exactly the same as with my v12.

I'll post the v14 as soon as you give the OK. :)
____________
mic.


Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 8976 - Posted: 24 Jan 2009 | 23:13:27 UTC - in response to Message 8933.

SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer.

Yes, but the Intel compiler checks if the code is running on an Intel CPU and, if it's not, it runs an alternative SSE2 code instead. It'll run SSE3 or later only on Intel processors. As these results are on an AMD CPU, it's not benefiting from the SSE3 optimizations.

HTH


____________

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 9017 - Posted: 25 Jan 2009 | 2:14:03 UTC - in response to Message 8960.
Last modified: 25 Jan 2009 | 2:25:15 UTC

Travis, please take a look at this host, everything coming in after 20:45 UTC is done with the new recompiled v14.


I'll let you know as soon as I get some more results from it. But it should be OK if it was returning the same results for the test workunits.


The results of the test units are exactly the same as with my v12.

I'll post the v14 as soon as you give the OK. :)


Looks to me like it's generating good results, so I'd go ahead and release it.

*edit* scratch that. Looking at some results, the stock app and other new compiled apps are still having the same issue (however not as frequently). No point in updating it until this whole thing is fixed.
____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9023 - Posted: 25 Jan 2009 | 2:25:54 UTC - in response to Message 9017.

Looks to me like it's generating good results, so I'd go ahead and release it.

Ok, new recompiled v14 apps for Linux on Intel CPUs:

Linux32

SSE3_32

SSE2_32

SSE_32

Linux64

SSE3_64

SSSE3_64

SSE41_64

Please report errors (or success) here.

____________
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 9025 - Posted: 25 Jan 2009 | 2:27:32 UTC - in response to Message 9023.

That said, these are running a bit faster and not erroring as frequently as before, so it's no big deal that they're released :D
____________

Purple Rabbit
Joined: 9 Nov 08
Posts: 43
Credit: 62,531,043
RAC: 69,842
Message 9028 - Posted: 25 Jan 2009 | 2:33:59 UTC
Last modified: 25 Jan 2009 | 3:23:54 UTC

I'll take that as a yes :-) I just downloaded the SSE version. Unfortunately only my 1.3 GHz Celerons can take advantage of these apps (much to their pleasure I might add). The .12 version saw an increase from 2:02 to 1:37 CPU time as compared to the stock version.

My AMD 3800+ x2 and AMD 5600+ x2 can't handle them (as expected). I only tried the 64 bit SSE3 app tho. Maybe the 32 bit SSE2 will work? It's much too late tonight to embark on what may be a major task. I'll try it tomorrow unless someone knows it's futile.

Purple Rabbit
Joined: 9 Nov 08
Posts: 43
Credit: 62,531,043
RAC: 69,842
Message 9050 - Posted: 25 Jan 2009 | 4:18:06 UTC - in response to Message 9028.

The .12 version saw an increase from 2:02 to 1:37 CPU time as compared to the stock version.


I meant decrease ...sigh. Stupid computer sends what I type rather than what I meant!

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9088 - Posted: 25 Jan 2009 | 12:36:04 UTC

Travis, please take a look at this host, everything coming in now is done with the new recompiled v15.

____________
mic.


Neal Chantrill
Joined: 17 Jan 09
Posts: 96
Credit: 69,405,108
RAC: 9,730
Message 9095 - Posted: 25 Jan 2009 | 14:10:44 UTC

You people obviously put a lot of hard work in and I thank you for that, but is there a newbies guide to installing these? I have just started playing with linux and have 2 quad cores that I'd love to try these out on.

I have tried searching but to no avail.

Thanks again,

Neal

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 9107 - Posted: 25 Jan 2009 | 17:59:11 UTC - in response to Message 9088.

Travis, please take a look at this host, everything coming in now is done with the new recompiled v15.


It looked good, except maybe for a couple of workunits which were bad... However, the stock app is STILL doing the same thing ;( I have no clue what's up.
____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9113 - Posted: 25 Jan 2009 | 18:38:41 UTC - in response to Message 9107.

It looked good, except maybe for a couple of workunits which were bad... However, the stock app is STILL doing the same thing ;( I have no clue what's up.

Ok, as they are not worse than stock, new recompiled v15 apps for Linux on Intel CPUs:

Linux32

SSE3_32
SSE2_32
SSE_32

Linux64

SSE3_64
SSSE3_64
SSE41_64

For the AMD users I got two new apps to try:

AMD SSE3_64
AMD SSE2_32

I only had a chance to test the AMD SSE2_32 on my Athlon64 3200+, so the rest of the testing is up to you...

____________
mic.


speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9121 - Posted: 25 Jan 2009 | 19:52:39 UTC - in response to Message 9095.

You people obviously put a lot of hard work in and I thank you for that, but is there a newbies guide to installing these? I have just started playing with linux and have 2 quad cores that I'd love to try these out on.

I have tried searching but to no avail.

Thanks again,

Neal


Basically you shut down BOINC, put the files from the zip into your project folder and restart BOINC.

Did you get BOINC from the website, or did you install it with the package manager?
____________
mic.


speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9144 - Posted: 25 Jan 2009 | 21:31:09 UTC

Travis, please take a look at this host, everything coming in now is done with the new recompiled v16.

Feels like "Groundhog Day"... :)
____________
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 1976
Credit: 26,480
RAC: 0
Message 9173 - Posted: 26 Jan 2009 | 1:52:26 UTC - in response to Message 9144.

Travis, please take a look at this host, everything coming in now is done with the new recompiled v16.

Feels like "Groundhog Day"... :)


Looks like they're doing well. Haven't had a bad result come back with any compiled v0.16 app yet :)

____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9199 - Posted: 26 Jan 2009 | 9:48:16 UTC - in response to Message 9173.

I'll post the new apps when I get back from work, around 17:00 UTC.
...hope that's soon enough to catch the stricter error checking.

____________
mic.


DoctorNow
Joined: 28 Aug 07
Posts: 146
Credit: 5,183,509
RAC: 0
Message 9200 - Posted: 26 Jan 2009 | 10:23:51 UTC - in response to Message 9113.

For the AMD users I got two new apps to try:

AMD SSE3_64

Just wanted to try this one out, but it gave me a "Not found"-message...
____________
Member of BOINC@Heidelberg and ATA!

My BOINCstats

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9203 - Posted: 26 Jan 2009 | 12:11:23 UTC - in response to Message 9200.

For the AMD users I got two new apps to try:

AMD SSE3_64

Just wanted to try this one out, but it gave me a "Not found"-message...


new version will be up soon - stay tuned ;)
____________
mic.


speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9208 - Posted: 26 Jan 2009 | 16:27:40 UTC - in response to Message 9173.

Looks like they're doing well. Haven't had a bad result come back with any compiled v0.16 app yet :)

Now then, the new recompiled v16 apps for Linux:

Linux32 on Intel

SSE3_32
SSE2_32
SSE_32

Linux64 on Intel

SSE3_64
SSSE3_64
SSE41_64

For AMD users:

AMD SSE3_64
AMD SSE2_32

I only had the chance to test the AMD SSE2_32 on my Athlon64 3200+, so the rest of the testing is up to you... Please report!

____________
mic.


DoctorNow
Joined: 28 Aug 07
Posts: 146
Credit: 5,183,509
RAC: 0
Message 9212 - Posted: 26 Jan 2009 | 17:53:45 UTC - in response to Message 9208.

AMD SSE3_64

Hi speedimic.
Just finished the first two results with the 64-bit version.
Here's one example.
Unfortunately it takes about 24 minutes, which is about 2 1/2 times longer than the Windows one.
Gives a good result anyway, but it's not worth the 64-bit and so gives less credits.
____________
Member of BOINC@Heidelberg and ATA!

My BOINCstats

Haksu
Joined: 28 Nov 08
Posts: 4
Credit: 17,795,128
RAC: 0
Message 9213 - Posted: 26 Jan 2009 | 17:55:42 UTC

Hi
Thanks a lot for these Linux apps, they are running fine. I have one older SSE-capable AMD machine, and on that I am now trying the v16 "Linux on Intel" SSE app, which seems to be working as well.
____________

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 9219 - Posted: 26 Jan 2009 | 19:30:59 UTC - in response to Message 9208.

Since x86-64 guarantees that at least SSE2 is available, did you make sure to enable vectorization through GCC's -ftree-vectorize option (implied by -O3 in versions 4.3 and later)? For that matter, any SSE build could benefit from vectorization.

HTH


____________

speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9222 - Posted: 26 Jan 2009 | 20:41:09 UTC - in response to Message 9219.

Since x86-64 guarantees that at least SSE2 is available, did you make sure to enable vectorization through GCC's -ftree-vectorize option (implied by -O3 in versions 4.3 and later)? For that matter, any SSE build could benefit from vectorization.

HTH


-O3 is on, but I use the Intel compiler. ;)
____________
mic.


speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9224 - Posted: 26 Jan 2009 | 20:49:07 UTC - in response to Message 9212.

...
Unfortunately it takes about 24 minutes, which is about 2 1/2 times longer than the Windows one.

You can't compare apples with pears. :) How long is the stock app running?

Gives a good result anyway, but it's not worth the 64-bit and so gives less credits.

If it gives less credit, you might be running into the 108 credit barrier.
What do you mean by 'it's not worth the 64bit'?

____________
mic.


Purple Rabbit
Joined: 9 Nov 08
Posts: 43
Credit: 62,531,043
RAC: 69,842
Message 9230 - Posted: 26 Jan 2009 | 21:47:57 UTC
Last modified: 26 Jan 2009 | 22:01:37 UTC

Here's apples to apples. SSE_32 and AMD SSE3_64 are both winners. First of all they run! Here's what I've seen so far (remember sample size is 1 for the new app).

SSE_32: Running on two 1.3 GHz Celeron computers running SUSE 10.3. The stock app took about 2:02 hours. The optimized app (both .14 and .16) takes about 1:35 hours.

AMD SSE3_64: My AMD64 5600+X2 running SUSE 10.3 64 bit ran the optimized .07 (SSE2) app in 0:31 hours. The stock apps from 0.12 to 0.16 ran in about 0:53 hours. The new optimized app ran in 0:23 hours.

AMD SSE3_64: My AMD64 3800+X2 OC'd to 2.35 GHz running SUSE 11.1 64 bit ran the 0.16 stock app in 1:01 hours. The new optimized app ran in 0:28 hours.

Thanks Speedimic for the optimized apps. They are working VERY well, but what will you do for an encore? :-)

Rick
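
For anyone who wants these run times as speedup factors, here is a minimal sketch. The helper names are made up for illustration, and the times are simply the h:mm figures from the post above, paired as stock versus newest optimized app.

```python
def to_minutes(hmm):
    """Convert an 'h:mm' string such as '2:02' to minutes."""
    hours, minutes = hmm.split(":")
    return int(hours) * 60 + int(minutes)


def speedup(stock, optimized):
    """How many times faster the optimized run is than the stock run."""
    return to_minutes(stock) / to_minutes(optimized)


# (host and app, stock time, optimized time) as reported in the post
runs = [
    ("Celeron 1.3 GHz, SSE_32", "2:02", "1:35"),
    ("AMD64 5600+ X2, AMD SSE3_64", "0:53", "0:23"),
    ("AMD64 3800+ X2, AMD SSE3_64", "1:01", "0:28"),
]

for host, stock, optimized in runs:
    print(f"{host}: {speedup(stock, optimized):.2f}x")
```

With a sample size of one per host, the factors are a rough indication rather than a benchmark.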

KSMarksPsych
Joined: 9 Sep 07
Posts: 22
Credit: 320,035
RAC: 0
Message 9240 - Posted: 27 Jan 2009 | 0:01:42 UTC

Many thanks from me too!

I had the .14 SSE2 running on my Core Duo (Fedora 9). Times went down from about an hour and a half to just under an hour. I found out that PNI = SSE3 so I'm trying the .16 SSE3 version now.

Running well on my Core 2 Duo (Fedora 7) as well. I'm getting ready to upgrade to the .16 release.
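
On Linux the supported SSE levels show up on the `flags` line of /proc/cpuinfo, where SSE3 is listed as `pni`. A minimal sketch of finding the highest level from that line follows. The function name is hypothetical, the mapping covers only the SSE levels of the builds offered in this thread, and whether a given build actually runs on a non-Intel CPU is a separate question.

```python
# Map /proc/cpuinfo flag names to the SSE levels used in this thread,
# ordered from highest to lowest. "pni" is how Linux reports SSE3.
FLAG_TO_LEVEL = [
    ("sse4_1", "SSE4.1"),
    ("ssse3", "SSSE3"),
    ("pni", "SSE3"),
    ("sse2", "SSE2"),
    ("sse", "SSE"),
]


def highest_sse_level(cpuinfo_text):
    """Return the highest SSE level named in the first 'flags' line, or None."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = set(line.split(":", 1)[1].split())
            for flag, level in FLAG_TO_LEVEL:
                if flag in flags:
                    return level
            return None
    return None
```

Passing the contents of /proc/cpuinfo to `highest_sse_level` gives the level to look for among the downloads.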
____________
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.

DoctorNow
Joined: 28 Aug 07
Posts: 146
Credit: 5,183,509
RAC: 0
Message 9260 - Posted: 27 Jan 2009 | 6:00:27 UTC - in response to Message 9224.
Last modified: 27 Jan 2009 | 6:02:06 UTC

How long is the stock app running?

Erm, sorry, I didn't try the 64-bit 0.16 stock app yet, I automatically thought your one would be faster... *shame-on-me* *rolleyes*

If it gives less credit, you might be running into the 108 credit barrier.

Erm, no, with that run time it gave me about 66 credits/hour (on an AMD X2 64 5200 with Suse 11 64-Bit). No limit reached, but it should normally...

What do you mean by 'it's not worth the 64bit'?

No harm intended; I meant that a 64-bit app should normally be faster than a 32-bit app.
And that is not the case here. ;-)
I don't know how Crunch3r managed it, but when he made his optimized app a while ago he also made a Linux one. Compared to his Windows app (5-6 minutes on my X2), the Linux one took 4-5 minutes...
____________
Member of BOINC@Heidelberg and ATA!

My BOINCstats

Augustine
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 9265 - Posted: 27 Jan 2009 | 16:20:02 UTC
Last modified: 27 Jan 2009 | 16:42:33 UTC

Here are some results from this host:


  • 0.16: 82min
  • 0.16: 84min
  • 0.16: 82min
  • 0.16: 81min
  • 0.16: 84min
  • 0.16: 81min
  • 0.15: 83min


I didn't record the times for 0.14, but IIRC they were roughly in the same ballpark, i.e., no noticeable improvement between versions.

Note that this host is a production server which idles most of the time, therefore Linux power-management runs it at the slowest speed of 1GHz most of the time.
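
A quick way to check the "no noticeable improvement" impression is to average the listed times per version. A minimal sketch using only the seven times posted above; with a single 0.15 sample it is indicative at best.

```python
from statistics import mean

# Run times in minutes, as listed in the post, grouped by app version.
times = {
    "0.16": [82, 84, 82, 81, 84, 81],
    "0.15": [83],
}

averages = {version: mean(values) for version, values in times.items()}

for version, avg in averages.items():
    print(f"{version}: {avg:.1f} min over {len(times[version])} run(s)")
```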
____________

Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 9267 - Posted: 27 Jan 2009 | 16:28:53 UTC - in response to Message 9265.
Last modified: 27 Jan 2009 | 16:29:51 UTC

Wow, that's fast!!!
My Linux quad (2.4 GHz) needs at least around 18 minutes with the opti SSSE3 (64bit) app!
What am I doing wrong?
____________
Lovely greetings, Cori

Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 9268 - Posted: 27 Jan 2009 | 17:22:50 UTC - in response to Message 9260.

I don't know how Crunch3r managed that, but since he made his optimized app much earlier, he also made a Linux one. Compared to his Windows app (5-6 minutes on my X2), it then took 4-5 minutes...

I think Crunch3r has never given his optimized app to anyone. And you can't call the old 1.24 Linux version an optimized version, it was just using a better compiler enabling auto vectorization (SSE2) on 64Bit systems afaik.

And you should also remember the current WUs are 4 to 4.2 times the length of the old 260credit WUs for the 1.22 version.
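
To put the old and new run times on the same footing, the scaling can be worked out roughly as below. This is only a sketch: it assumes run time grows linearly with WU length, which nobody in this thread has confirmed, and the names `OLD_MINUTES`, `LENGTH_FACTOR` and `scaled_range` are made up for illustration.

```python
# Rough scaling of old run times to the current, longer WUs.
# Assumption: run time is proportional to WU length.
OLD_MINUTES = (4, 5)          # old 4-5 minute runs quoted above
LENGTH_FACTOR = (4.0, 4.2)    # current WUs are 4 to 4.2 times as long


def scaled_range(minutes, factor):
    """Return the (low, high) run time after applying the length factor."""
    return (minutes[0] * factor[0], minutes[1] * factor[1])


low, high = scaled_range(OLD_MINUTES, LENGTH_FACTOR)
print(f"Old 4-5 minute runs correspond to roughly {low:.0f}-{high:.0f} minutes on current WUs")
```

So under that assumption an old 4-5 minute result is not directly comparable with today's timings without multiplying it up first.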

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9270 - Posted: 27 Jan 2009 | 19:45:49 UTC - in response to Message 9260.

What do you mean by 'it's not worth the 64bit'?

No harm intended; I meant that a 64-bit app should normally be faster than a 32-bit app.
And that is not the case. ;-)...


Normally the 64bit (stock) apps are compiled with SSE2 enabled, because a 64bit-capable cpu is also capable of at least SSE2.
That's not the case for all 32bit cpus, so the 32bit apps are usually compiled without SSE2 and thus slower.
____________
mic.


Augustine
Avatar
Send message
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 9271 - Posted: 27 Jan 2009 | 19:51:25 UTC - in response to Message 9270.

Normally the 64bit (stock) apps are compiled with SSE2 enabled, because a 64bit-capable cpu is also capable of at least SSE2.

And, if the compiler is capable of auto-vectorization, it should always be enabled for x86-64. For GCC, the option is -ftree-vectorize, which is implied by -O3 on versions 4.3 and later. Unfortunately, MS VS does not support auto-vectorization. For Windows the Intel compiler could be used instead.

HTH

____________

Temujin
Send message
Joined: 12 Oct 07
Posts: 77
Credit: 404,471,187
RAC: 0
Message 9297 - Posted: 28 Jan 2009 | 11:54:30 UTC - in response to Message 9240.

I found out that PNI = SSE3

According to this intel document, my Intel Xeon L5420 supports SSE4.1
cat /proc/cpuinfo flags shows -
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr

I know pni = SSE3 but how is SSE4.1 indicated?

Profile kashi
Send message
Joined: 30 Dec 07
Posts: 309
Credit: 148,432,104
RAC: 0
Message 9300 - Posted: 28 Jan 2009 | 14:13:55 UTC

Looks like it is indicated by sse4_1

My Xeon X3350 cat /proc/cpuinfo flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 lahf_lm

Thank you for your SSE41_64 version speedimic. It is working well on my computer @ 3.3GHz.
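
For anyone unsure which pack to pick, the flag lookup above can be sketched in a few lines of Python. This is a minimal sketch, not an official tool: the function names `sse_levels` and `read_cpuinfo_flags` and the table `FLAG_TO_LEVEL` are invented for this example. The flag names themselves (sse, sse2, pni, ssse3, sse4_1, sse4_2) are the ones Linux prints in /proc/cpuinfo. Note that an older kernel may simply not list a newer flag even though the CPU supports it.

```python
# Map the flag names printed by /proc/cpuinfo to SSE levels.
# Hypothetical helper names; only the flag strings come from the kernel.
FLAG_TO_LEVEL = [
    ("sse", "SSE"),
    ("sse2", "SSE2"),
    ("pni", "SSE3"),      # pni = Prescott New Instructions = SSE3
    ("ssse3", "SSSE3"),
    ("sse4_1", "SSE4.1"),
    ("sse4_2", "SSE4.2"),
]


def sse_levels(flags_line):
    """Return the SSE levels found in a space-separated flags string."""
    flags = set(flags_line.split())
    return [level for flag, level in FLAG_TO_LEVEL if flag in flags]


def read_cpuinfo_flags(path="/proc/cpuinfo"):
    """Return the flags of the first CPU, or an empty string if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""
```

Calling `print(sse_levels(read_cpuinfo_flags()))` on a Linux box lists the levels the kernel reports; the last entry would be the highest version worth trying.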

Augustine
Avatar
Send message
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 9302 - Posted: 28 Jan 2009 | 15:51:51 UTC - in response to Message 9297.

cat /proc/cpuinfo flags shows -
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr

I know pni = SSE3 but how is SSE4.1 indicated?

Update your kernel.

HTH

____________

Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 9704 - Posted: 5 Feb 2009 | 3:31:22 UTC - in response to Message 9208.

Hi mic,

I've added these to zslip

I hope that's OK


Looks like they're doing well. Haven't had a bad result come back with any compiled v0.16 app yet :)

Now then, the new recompiled v16 apps for Linux:

Linux32 on Intel

SSE3_32
SSE2_32
SSE_32

Linux64 on Intel

SSE3_64
SSSE3_64
SSE41_64

For AMD users:

AMD SSE3_64
AMD SSE2_32

I only had the chance to test the AMD SSE2_32 on my Athlon64 3200+, so the rest of the testing is up to you... Please report!


____________

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9852 - Posted: 7 Feb 2009 | 8:04:03 UTC - in response to Message 9704.

Hi mic,

I've added these to zslip

I hope that's OK


Sure.
Always good to have everything in one place.

____________
mic.


Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 9856 - Posted: 7 Feb 2009 | 8:35:57 UTC - in response to Message 9852.

Hi mic,

I've added these to zslip

I hope that's OK


Sure.
Always good to have everything in one place.

cool ;)

____________

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 11020 - Posted: 16 Feb 2009 | 14:59:23 UTC

If someone feels like crunching full speed on linux, here are the new v18d apps.
Crunch time is cut in half compared to the prior version (on my quad at least...).

Linux32 Pack
(with new SSSE3-version)

Linux64 Pack

LinuxAMD Pack

Report problems & errors here...

____________
mic.


Profile Daniel
Send message
Joined: 25 Nov 07
Posts: 25
Credit: 35,505,304
RAC: 90,405
Message 11022 - Posted: 16 Feb 2009 | 15:07:26 UTC

Thanks speedimic! Just dumped it onto my linux32 laptop.

cwhyl
Send message
Joined: 11 Nov 07
Posts: 41
Credit: 1,000,181
RAC: 0
Message 11029 - Posted: 16 Feb 2009 | 16:23:19 UTC
Last modified: 16 Feb 2009 | 16:28:32 UTC

Yeah, 32bit SSSE3 Linux dropped from 22 to 9 minutes.
64bit SSSE3 runs them in 8.5 minutes compared to 18 with the stock app.
Both on Intel Q6600.
Very nice :)

Temujin
Send message
Joined: 12 Oct 07
Posts: 77
Credit: 404,471,187
RAC: 0
Message 11040 - Posted: 16 Feb 2009 | 19:34:42 UTC - in response to Message 11029.

Thanks mic, I now have it running on my Fedora9_64 boxes

Profile arkayn
Avatar
Send message
Joined: 14 Feb 09
Posts: 914
Credit: 74,780,239
RAC: 276
Message 11043 - Posted: 16 Feb 2009 | 20:54:17 UTC

My laptop is now running around 22-25 minutes with the optimised Linux app.
____________

Profile GalaxyIce
Avatar
Send message
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 11046 - Posted: 16 Feb 2009 | 21:20:14 UTC - in response to Message 11020.
Last modified: 16 Feb 2009 | 21:20:41 UTC

If someone feels like crunching full speed on linux, here are the new v18d apps.
Crunch time is cut in half compared to the prior version (on my quad at least...).

Linux32 Pack
(with new SSSE3-version)

Linux64 Pack

LinuxAMD Pack

Report problems & errors here...

The above has been updated to zslip, thanks speedimic ;)
____________

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 11052 - Posted: 16 Feb 2009 | 21:58:30 UTC

If there's any other Linux32/64 - SSE-level combination needed, just tell me!

____________
mic.


Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 11508 - Posted: 18 Feb 2009 | 21:59:57 UTC

Just got myself the shiny new version of the Intel compiler... and made new apps.

Codebase is still 18d.
Due to the varying WUs I can't really tell if those are any faster - that's up to you to decide... ;)
So I would call updating optional!

What has changed is that everything up to and including sse3 is AMD compatible, and I made all sse-levels the compiler offered me.

Linux32-SSE
Linux32-SSE2
Linux32-SSE3
Linux32-SSSE3
Linux32-SSE4.1
Linux32-SSE4.2

Linux64-SSE3
Linux64-SSSE3
Linux64-SSE4.1
Linux64-SSE4.2



____________
mic.


Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 11514 - Posted: 18 Feb 2009 | 22:33:38 UTC - in response to Message 11508.

What has changed is that everything up to and including sse3 is AMD compatible, and I made all sse-levels the compiler offered me.

Linux32-SSE

If that's an ICC 11 version, then SSE is just a synonym for x87 (no enhanced instruction set). It won't put any SSE instructions in the code.
The difference to the stock app is just the change gcc -> ICC. But maybe that gains some speed for older machines, too.

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 11518 - Posted: 18 Feb 2009 | 22:54:58 UTC - in response to Message 11514.

If that's an ICC 11 version, then SSE is just a synonym for x87 (no enhanced instruction set). It won't put any SSE instructions in the code.
The difference to the stock app is just the change gcc -> ICC. But maybe that gains some speed for older machines, too.


someone called for that, so I made it... didn't get any feedback on the crunch time and I didn't try it.
____________
mic.


Profile Phil
Avatar
Send message
Joined: 13 Feb 08
Posts: 1124
Credit: 46,740
RAC: 0
Message 11535 - Posted: 19 Feb 2009 | 0:33:15 UTC - in response to Message 11508.

Just got myself the shiny new version of the Intel compiler... and made new apps.

Codebase is still 18d.
Due to the varying WUs I can't really tell if those are any faster - that's up to you to decide... ;)
So I would call updating optional!

What has changed is that everything up to and including sse3 is AMD compatible, and I made all sse-levels the compiler offered me.

SSE3 AMD is a really useful improvement, thank you!

Profile Dingo
Avatar
Send message
Joined: 28 Aug 07
Posts: 29
Credit: 49,053,887
RAC: 0
Message 11973 - Posted: 21 Feb 2009 | 2:44:09 UTC - in response to Message 11052.
Last modified: 21 Feb 2009 | 2:47:41 UTC

What about Intel Linux 64 Bit SSE2 ? I have a couple of Q6600's on SSE2 PC's
____________

Proud Founder and member of BOINC@AUSTRALIA

Have a look at my webcam

Profile kashi
Send message
Joined: 30 Dec 07
Posts: 309
Credit: 148,432,104
RAC: 0
Message 11989 - Posted: 21 Feb 2009 | 3:12:37 UTC - in response to Message 11973.

Q6600 has SSE3 and SSSE3, doesn't it? SSE3 is denoted by the code pni.

PNI (Prescott New Instructions) was the original engineering code name for SSE3.

Profile Dingo
Avatar
Send message
Joined: 28 Aug 07
Posts: 29
Credit: 49,053,887
RAC: 0
Message 11995 - Posted: 21 Feb 2009 | 3:27:01 UTC - in response to Message 11989.

OK on one Linux PC it just said SSE2 but on the windows using CPU-Z it says SSE3 so I guess I am fine and do not need a SSE2 version. I am not up on all this stuff :)

Thanks
____________

Proud Founder and member of BOINC@AUSTRALIA

Have a look at my webcam

Profile kashi
Send message
Joined: 30 Dec 07
Posts: 309
Credit: 148,432,104
RAC: 0
Message 12017 - Posted: 21 Feb 2009 | 4:00:41 UTC - in response to Message 11995.

No worries, judging by your stats for many projects you're struggling along quite well.
A few suitable ATI cards in some of those boxes and you'll be in orbit. :)

In case I didn't make myself clear before, Linux will not report SSE3 but pni instead. pni=SSE3.

I don't know what CPU-Z shows, as I haven't got a working SSE3 capable computer. With perfect timing my X3350 computer with a HD3850 failed a few days before the ATI 32-bit Windows teaser application was released.

Profile Dingo
Avatar
Send message
Joined: 28 Aug 07
Posts: 29
Credit: 49,053,887
RAC: 0
Message 12021 - Posted: 21 Feb 2009 | 4:13:45 UTC - in response to Message 12017.

Turns out that Ubuntu (cat /proc/cpuinfo) shows it as ssse3, and the optimized ssse3 works fine, well so far anyway.
____________

Proud Founder and member of BOINC@AUSTRALIA

Have a look at my webcam

Profile kashi
Send message
Joined: 30 Dec 07
Posts: 309
Credit: 148,432,104
RAC: 0
Message 12034 - Posted: 21 Feb 2009 | 4:54:00 UTC - in response to Message 12021.
Last modified: 21 Feb 2009 | 5:09:28 UTC

Yes that's fine, SSSE3 is different to SSE3 though, it has some extra instructions and an extra "S". :)
SSE3=pni and SSSE3=ssse3.

As mentioned in an earlier post in this thread an older Linux kernel may not report the most recent codes. This may explain why cat /proc/cpuinfo will report the same CPU differently on different boxes if you are using older and newer versions of Linux on them.

Not sure of the difference in speed between SSE3 and SSSE3 versions, I've only ever tried the Linux64 SSE4.1.
The main thing is you are crunching OK.

Augustine
Avatar
Send message
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 12403 - Posted: 22 Feb 2009 | 17:51:13 UTC - in response to Message 12034.

Not sure of the difference in speed between SSE3 and SSSE3 versions...

It should be zilch, since it's unlikely that the compiler will find opportunities in MW code to fit the multimedia-like SSSE3 instructions.

HTH

____________

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 12411 - Posted: 22 Feb 2009 | 18:28:58 UTC - in response to Message 12403.

Not sure of the difference in speed between SSE3 and SSSE3 versions...

It should be zilch, since it's unlikely that the compiler will find opportunities in MW code to fit the multimedia-like SSSE3 instructions.

HTH


right, not much difference - but a higher SSE-level surely looks faster. :D
____________
mic.


Profile RAMen
Avatar
Send message
Joined: 8 Apr 08
Posts: 41
Credit: 132,577,128
RAC: 192,111
Message 13746 - Posted: 3 Mar 2009 | 6:58:44 UTC
Last modified: 3 Mar 2009 | 7:03:50 UTC

Is there any plan for a linux app for ATI/GPU in the pipeline quad or i7

Profile gabberattack (johnny, eriq, segfault, r2k4, bully, sifon)
Send message
Joined: 6 Jun 08
Posts: 8
Credit: 152,709
RAC: 0
Message 16864 - Posted: 25 Mar 2009 | 22:18:41 UTC

Yes, release ATI app for Linux, please. CUDA for all systems is under construction now, but it makes sense to make app for stronger cards first and ATI is the one. I hate running my machine under Win just to be able to run Milky on GPU. :-( And I do not think there are more guys with NVidia GTS/GTX 200 cards than Linux ones with ATI 38xx/48xx.

Profile shaf*
Send message
Joined: 9 Mar 09
Posts: 37
Credit: 37,538,556
RAC: 0
Message 17624 - Posted: 5 Apr 2009 | 12:37:55 UTC

Still a little disappointed with the linux apps.

I have SSE3 running on an e6400@2.4 test machine ...

-rw-r--r-- 1 root root 594 2009-02-18 21:33 app_info.xml
-rw-r--r-- 1 root root 35127 2009-01-19 20:07 GPL.txt
-rwxr-xr-x 1 root root 1222624 2009-02-18 21:25 milkyway_0.18_sse3_i686-pc-linux-gnu

This yields a WU time of 35 minutes

Whereas the windows op app yields a time of 20minutes on a slower processor.(1.8ghz T2390)

The difference seems a lot to me or is this typical ?



____________

mfl0p
Send message
Joined: 18 Feb 09
Posts: 8
Credit: 2,424,453
RAC: 0
Message 17633 - Posted: 5 Apr 2009 | 15:05:54 UTC
Last modified: 5 Apr 2009 | 15:08:00 UTC

After reading that Intel compiler flag "-fp-model fast=2" was being used on these applications, I figured I would run a test, and results are as expected.

The Linux 64bit SSSE3 0.18d mkII app downloaded from zslip.com gives invalid results. I suspect that the same flag was used on the other apps, too. Flag "-fp-model precise" needs to be used to get the fitness in line with expected output.

This app's output:


[admin@ntellx4 test_files]$ ./milkyway_0.18_SSSE3_x86_64-pc-linux-gnu
[admin@ntellx4 test_files]$ more out
searchname
parameters [8]: 0.733171635575244 14.657212876628332 -1.705465347395041 16.911711745343634 28.077212666463502 -1.203290851581461 3.527360643924728 2.224821450587501
metadata: this is the metadata
fitness: -3.027909854710229
speedimic_SSSE3_64: 0.18



Correct output:

86:
searchname
parameters [8]: 0.733171635575244 14.657212876628332 -1.705465347395041 16.911711745343634 28.077212666463502 -1.203290851581461 3.527360643924728 2.224821450587501
metadata: this is the metadata
fitness: -3.027909854710189
your_app_name: 0.18


Travis, I'm starting to think a quorum should be used as in other BOINC projects, since the code is open source.
____________

Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 17634 - Posted: 5 Apr 2009 | 15:15:09 UTC - in response to Message 17633.
Last modified: 5 Apr 2009 | 15:17:20 UTC

The Linux 64bit SSSE3 0.18d mkII app downloaded from zslip.com gives invalid results.
[..]
This app's output:
fitness: -3.027909854710229

Correct output:
fitness: -3.027909854710189

No. Travis has given a range for an allowed deviation from the result you quoted. And speedimics results are within those requirements.
You also get small deviations between the results using the x87 FPU, SSE2, PowerPC FPU or AltiVec. As long they are small enough it is okay.

mfl0p
Send message
Joined: 18 Feb 09
Posts: 8
Credit: 2,424,453
RAC: 0
Message 17637 - Posted: 5 Apr 2009 | 15:37:23 UTC
Last modified: 5 Apr 2009 | 15:41:29 UTC

If that is the case, then why does Travis have a thread about testing custom apps, saying "The results should look like the following:" instead of "should be within a deviation of the following:"?

I'm sure others would like to know what this allowed deviation is. The result posted above has error as a result of about a 50% speed increase. That's pretty major.

Profile banditwolf
Avatar
Send message
Joined: 12 Nov 07
Posts: 2425
Credit: 295,133
RAC: 0
Message 17639 - Posted: 5 Apr 2009 | 16:07:47 UTC

He did post awhile ago that the results must be to the x place, I believe it was the 10th. Those results are equal to the 12th.
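
The two fitness values quoted earlier can be checked against such a threshold with a short script. Treat this as a sketch only: the 10-decimal-place requirement is recalled from memory above rather than confirmed, and the helper name `agree_to_places` is invented for the example.

```python
def agree_to_places(a, b, places):
    """True if a and b differ by less than half a unit in the given decimal place."""
    return abs(a - b) < 0.5 * 10 ** (-places)


expected = -3.027909854710189   # reference fitness from the test files
observed = -3.027909854710229   # fitness reported by the recompiled app

print("difference:", abs(expected - observed))
print("agree to 10 places:", agree_to_places(expected, observed, 10))
print("agree to 12 places:", agree_to_places(expected, observed, 12))
print("agree to 14 places:", agree_to_places(expected, observed, 14))
```

The values pass at 10 and 12 places and fail at 14, so the recompiled app would be fine if the 10th-place rule is indeed the one in force.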
____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Augustine
Avatar
Send message
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 17665 - Posted: 5 Apr 2009 | 20:48:49 UTC - in response to Message 17624.

Still a little disappointed with the linux apps...

Whereas the windows op app yields a time of 20minutes on a slower processor.(1.8ghz T2390)

The difference seems a lot to me or is this typical ?

It might be because Linux manages power differently from Windows, running BOINC applications at a slow CPU frequency in order to save energy. See more details here.

HTH
____________

Profile shaf*
Send message
Joined: 9 Mar 09
Posts: 37
Credit: 37,538,556
RAC: 0
Message 17668 - Posted: 5 Apr 2009 | 21:42:59 UTC - in response to Message 17665.

It might be because Linux manages power differently from Windows, running BOINC applications at a slow CPU frequency in order to save energy. See more details here.

HTH


Afraid not: SpeedStep and thermal throttling are both disabled in the BIOS. A solid overclock is applied, and the powernow/acpi daemons are not running once booted.

The linux client is simply inefficient compared to win32.
____________

Augustine
Avatar
Send message
Joined: 6 Sep 07
Posts: 65
Credit: 213,068
RAC: 26
Message 17677 - Posted: 5 Apr 2009 | 22:48:49 UTC - in response to Message 17668.

The linux client is simply inefficient compared to win32.

Different compilers perhaps?

____________

Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 17678 - Posted: 5 Apr 2009 | 22:49:57 UTC - in response to Message 17668.

The linux client is simply inefficient compared to win32.

I thought the fastest version of Speedimic is quite close.

Just looked it up, speedimic's own Q9550 (45nm, 2.83GHz) is taking about 1030 seconds for a 27.77 credit WU under Linux.

He has another host (Q6600, 65nm, appears to be overclocked to 2.7-2.8GHz judging from the benchmark values) completing the same tasks in about 1150 seconds under Windows. It is a somewhat flawed comparison because of the different clock speeds and because the 45nm CPUs should be faster per clock than their 65nm counterparts, but nonetheless it is clear that his Linux version can't be that bad.
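
To take the clock difference out of the comparison, one rough approach is to multiply run time by clock speed. The sketch below uses only the figures quoted here; the 2.7-2.8GHz range is a guess from benchmark values, the function name `clock_normalized_cost` is made up, and the calculation ignores per-clock differences between the 45nm and 65nm cores.

```python
def clock_normalized_cost(seconds, ghz):
    """Run time scaled by clock speed, in billions of cycles (seconds * GHz)."""
    return seconds * ghz


linux_q9550 = clock_normalized_cost(1030, 2.83)        # Linux host
windows_q6600_low = clock_normalized_cost(1150, 2.7)   # Windows host, lower clock guess
windows_q6600_high = clock_normalized_cost(1150, 2.8)  # Windows host, upper clock guess

print(f"Linux Q9550:   {linux_q9550:.1f} Gcycles per WU")
print(f"Windows Q6600: {windows_q6600_low:.1f}-{windows_q6600_high:.1f} Gcycles per WU")
print(f"Ratio Windows/Linux: {windows_q6600_low / linux_q9550:.2f}-{windows_q6600_high / linux_q9550:.2f}")
```

If the Q6600 actually runs at a different clock than guessed, the ratio shifts accordingly, so the clock value is the number to double-check first.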

Profile shaf*
Send message
Joined: 9 Mar 09
Posts: 37
Credit: 37,538,556
RAC: 0
Message 17680 - Posted: 5 Apr 2009 | 22:57:12 UTC - in response to Message 17678.
Last modified: 5 Apr 2009 | 22:58:38 UTC

The linux client is simply inefficient compared to win32.

I thought the fastest version of Speedimic is quite close.

Just looked it up, speedimic's own Q9550 (45nm, 2.83GHz) is taking about 1030 seconds for a 27.77 credit WU under Linux.

He has another host (Q6600, 65nm, appears to be overclocked to 2.7-2.8GHz judging from the benchmark values) completing the same tasks in about 1150 seconds under Windows. It is a somewhat flawed comparison because of the different clock speeds and because the 45nm CPUs should be faster per clock than their 65nm counterparts, but nonetheless it is clear that his Linux version can't be that bad.


Here are my tasks ....

Laptop win32 : http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=39511370

Linux : http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=39523720

Thanks for the pointers I'll recheck if I'm using the wrong client
____________

Profile shaf*
Send message
Joined: 9 Mar 09
Posts: 37
Credit: 37,538,556
RAC: 0
Message 17681 - Posted: 5 Apr 2009 | 23:04:16 UTC - in response to Message 17678.


I thought the fastest version of Speedimic is quite close.

Just looked it up, speedimic's own Q9550 (45nm, 2.83GHz) is taking about 1030 seconds for a 27.77 credit WU under Linux.

He has another host (Q6600, 65nm, appears to be overclocked to 2.7-2.8GHz judging from the benchmark values) completing the same tasks in about 1150 seconds under Windows. It is a somewhat flawed comparison because of the different clock speeds and because the 45nm CPUs should be faster per clock than their 65nm counterparts, but nonetheless it is clear that his Linux version can't be that bad.


OK, the Q6600 is closer in architecture to my CPU than the Q9550 is. He's using SSE4.1 on the Q9550.

Odd - He has a different client to me on his q6600. Any ideas where I can get it from ?

____________

Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 17683 - Posted: 5 Apr 2009 | 23:15:20 UTC - in response to Message 17680.

Here are my tasks ....

Laptop win32 : http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=39511370

Linux : http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=39523720

Thanks for the pointers I'll recheck if I'm using the wrong client

You could use the SSSE3 version (instead of only SSE3) on your Core2. The SSE3 binary retains compatibility with AMD CPUs and may be a bit slower.

Profile shaf*
Send message
Joined: 9 Mar 09
Posts: 37
Credit: 37,538,556
RAC: 0
Message 17684 - Posted: 5 Apr 2009 | 23:20:04 UTC - in response to Message 17683.
Last modified: 5 Apr 2009 | 23:40:17 UTC

You could use the SSSE3 version (instead of only SSE3) on your Core2. The SSE3 binary retains compatibility with AMD CPUs and may be a bit slower.


OK thanks - I don't think it'll make much difference - I'll give that a try... Seems also that he's using version 0.19....

This is looking more and more like I need to compile 0.19 from source, unless there are any binaries lying around for the lazy ;)
____________

Profile speedimic
Avatar
Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 17734 - Posted: 6 Apr 2009 | 16:01:19 UTC - in response to Message 17678.

The linux client is simply inefficient compared to win32.

I thought the fastest version of Speedimic is quite close.

Just looked it up, speedimic's own Q9550 (45nm, 2.83GHz) is taking about 1030 seconds for a 27.77 credit WU under Linux.

He has another host (Q6600, 65nm, appears to be overclocked to 2.7-2.8GHz judging from the benchmark values) completing the same tasks in about 1150 seconds under Windows. It is a somewhat flawed comparison because of the different clock speeds and because the 45nm CPUs should be faster per clock than their 65nm counterparts, but nonetheless it is clear that his Linux version can't be that bad.


To be honest, the win-apps by ClusterPhysik are a bit faster.
Those Q6600ers aren't OC'd (the benchmark results might be a little off because of the 6.1 BOINC client)
If I recall it right, Cluster made some more modifications which didn't make it into stock code...

As stated before my apps are compiled from stock source and approved by Travis.

@mfl0p:
The compiler flag "-fp-model fast=2" doesn't make any difference - neither in speed nor in accuracy (at least on my test box).
____________
mic.


mfl0p
Send message
Joined: 18 Feb 09
Posts: 8
Credit: 2,424,453
RAC: 0
Message 17767 - Posted: 6 Apr 2009 | 22:52:22 UTC - in response to Message 17734.
Last modified: 6 Apr 2009 | 22:52:41 UTC

@mfl0p:
The compiler flag "-fp-model fast=2" doesn't make any difference - neither in speed nor in accuracy (at least on my test box).


With Intel compiler 11.0 default -fp-model is fast=1. Using -fp-model precise on my linux machine gives the exact same results as expected in the test files forum thread. Changing to -fp-model fast (or any variation of fast) drops about 60 seconds off test parameters 86 runtime, at the expense of the deviated fitness result, like your app. But, it's irrelevant anyway, since it still produces a "close enough" answer, as mentioned in a few posts in this thread.

Figured I would mention it, for accuracy's sake, from Intel docs:

Recommendation: /fp:precise /fp:source (-fp-model precise -fp-model source) is the recommended form for the majority of situations where enhanced floating point consistency and reproducibility are needed.


Re: FPUs on different architectures giving different results: I have observed this on the PPC platform. Without using any math shortcuts in the compiler, all of the test file results match x86 except parameters 20, which has a deviation of 1 at the 15th decimal place.

Skip Da Shu
Avatar
Send message
Joined: 11 Apr 08
Posts: 65
Credit: 35,475,834
RAC: 0
Message 24914 - Posted: 11 Jun 2009 | 3:17:17 UTC - in response to Message 16864.

Yes, release ATI app for Linux, please. CUDA for all systems is under construction now, but it makes sense to make app for stronger cards first and ATI is the one. I hate running my machine under Win just to be able to run Milky on GPU. :-( And I do not think there are more guys with NVidia GTS/GTX 200 cards than Linux ones with ATI 38xx/48xx.

Count me in on the Linux 64b (Ubuntu) w/ HD3870 tally.

____________
- da shu @ HeliOS,
"A child's exposure to technology should never be predicated on an ability to afford it."

CTAPbIi
Send message
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 36136 - Posted: 29 Jan 2010 | 16:13:06 UTC
Last modified: 29 Jan 2010 | 16:20:12 UTC

PLEASE, PLEASE, PLEASE!!! make ATI GPU 0.20 app for linux x64. I'm pissed off using windows... Right now I'm gonna build a rig for MW, but if there is a chance to avoid this - it will be perfect :-)

And furthermore - there is app for linux in Collatz. What's the problem to make the same trick for MW?
____________

Anthony Waters
Send message
Joined: 16 Jun 09
Posts: 85
Credit: 172,476
RAC: 0
Message 36184 - Posted: 31 Jan 2010 | 23:23:03 UTC - in response to Message 36136.

PLEASE, PLEASE, PLEASE!!! make ATI GPU 0.20 app for linux x64. I'm pissed off using windows... Right now I'm gonna build a rig for MW, but if there is a chance to avoid this - it will be perfect :-)

And furthermore - there is app for linux in Collatz. What's the problem to make the same trick for MW?


Please see http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1475, once the code is available I plan on attempting to compile a version for GNU/Linux.

CTAPbIi
Send message
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 36239 - Posted: 3 Feb 2010 | 19:42:30 UTC
Last modified: 3 Feb 2010 | 19:43:47 UTC

WOW, that's great news :-) Hopefully u'll get code soon

I'm not a geek in Linux (I've been using Ubuntu for only 3 years), but if u need any help - feel free to contact me :-)
____________

Profile skildude
Avatar
Send message
Joined: 13 Jan 08
Posts: 51
Credit: 46,285,288
RAC: 394
Message 36665 - Posted: 19 Feb 2010 | 3:49:12 UTC - in response to Message 36239.

how about a linux ati app.
____________
Blah blah blah you know the rest

CTAPbIi
Send message
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 37425 - Posted: 16 Mar 2010 | 15:22:10 UTC - in response to Message 36665.

how about a linux ati app.

+1. guys, what's the news? if there is any chance to get it done? PLEASE, I wanna switch my 2nd rig to linux
____________

Divide Overflow
Avatar
Send message
Joined: 16 Feb 09
Posts: 109
Credit: 11,089,510
RAC: 0
Message 39351 - Posted: 2 May 2010 | 3:23:41 UTC

I'll add my request in for a Linux ATI GPU application. Hasn't the source code been available for a while now?

Ken_g6
Send message
Joined: 8 May 10
Posts: 2
Credit: 281,438
RAC: 2
Message 39592 - Posted: 10 May 2010 | 23:09:45 UTC

The Linux x86-64 SSE4.1 app consistently crashed on me about 30% of the way through, with a SIGSEGV (segfault) error. I have a C2Q 9400 that should support it.

CTAPbIi
Send message
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 40316 - Posted: 10 Jun 2010 | 14:54:17 UTC

looks no1 really care :( that's weird...
____________

barsanuphe
Send message
Joined: 19 Oct 08
Posts: 19
Credit: 1,463,876
RAC: 0
Message 40814 - Posted: 5 Jul 2010 | 21:15:58 UTC

i also have a hd5870 just asking to crunch under linux64... is this being worked on or is my GPU doomed to only crunch collatz?

fractal
Send message
Joined: 26 Oct 07
Posts: 8
Credit: 20,171,428
RAC: 19,112
Message 41394 - Posted: 11 Aug 2010 | 4:27:34 UTC - in response to Message 36184.

PLEASE, PLEASE, PLEASE!!! make ATI GPU 0.20 app for linux x64. I'm pissed off using windows... Right now I'm gonna build a rig for MW, but if there is a chance to avoid this - it will be perfect :-)

And furthermore - there is app for linux in Collatz. What's the problem to make the same trick for MW?


Please see http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1475, once the code is available I plan on attempting to compile a version for GNU/Linux.

7 months later ... any change? My linux64/ati boxes have been parked on Collatz for what seems like forever...

Divide Overflow
Avatar
Send message
Joined: 16 Feb 09
Posts: 109
Credit: 11,089,510
RAC: 0
Message 41569 - Posted: 18 Aug 2010 | 23:11:30 UTC - in response to Message 41394.

7 months later ... any change?

It would be nice to get an update on this since so much time has gone by. I'd still love to be able to use the ATI GPU in my Linux system.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Send message
Joined: 8 May 10
Posts: 576
Credit: 15,704,253
RAC: 0
Message 41570 - Posted: 18 Aug 2010 | 23:18:49 UTC - in response to Message 41569.

7 months later ... any change?

It would be nice to get an update on this since so much time has gone by. I'd still love to be able to use the ATI GPU in my Linux system.



I'm working on a new OpenCL one which should work everywhere(Linux/OS X/Windows/Nvidia/ATI) right now. I hope to have it ready to release in 1-2 weeks.

Divide Overflow
Avatar
Send message
Joined: 16 Feb 09
Posts: 109
Credit: 11,089,510
RAC: 0
Message 41573 - Posted: 19 Aug 2010 | 0:20:26 UTC - in response to Message 41570.

I'm working on a new OpenCL one which should work everywhere(Linux/OS X/Windows/Nvidia/ATI) right now. I hope to have it ready to release in 1-2 weeks.

That's very nice to hear, Matt! There are a lot of people looking forward to testing it out when it's ready.

CTAPbIi
Send message
Joined: 4 Jan 10
Posts: 86
Credit: 51,753,924
RAC: 0
Message 42014 - Posted: 9 Sep 2010 | 17:29:21 UTC

Hi Matt,

Nice to hear that :-)

R u talking about MilkyWay@Home Version 3 apps? if so, when we can expect them?
____________

KWH*
Avatar
Send message
Joined: 24 Aug 10
Posts: 180
Credit: 44,748,528
RAC: 115
Message 43363 - Posted: 1 Nov 2010 | 0:49:55 UTC

I am thinking of going Linux with Intel and ATI/AMD GPU farm. How is this project shaping up? Any advice for set up?


Copyright © 2013 AstroInformatics Group