Welcome to MilkyWay@home

Need specific instructions on how to optimize AMD Phenom for SSE4a, and MilkyWay@home for x64.

Message boards : Number crunching : Need specific instructions on how to optimize AMD Phenom for SSE4a, and MilkyWay@home for x64.
BadBike2

Joined: 11 Feb 11
Posts: 5
Credit: 21,146,417
RAC: 0
Message 46251 - Posted: 15 Feb 2011, 22:38:54 UTC
Last modified: 15 Feb 2011, 23:04:06 UTC

Hello,

I'm looking to optimize BOINC Manager in a way that improves processing performance without affecting BOINC's stability. My computer has a Phenom 9850 BE processor clocked at 2.9 GHz, which has support for SSE, SSE2, SSE3, and SSE4a. My video processor is an HD 4850, which is working fine with BOINC at the moment. My BOINC client is the x64 build, version 6.10.58.

What I would like to improve: the MilkyWay@home processes are running in 32-bit mode and using SSE2. Is there a way I could have these processes run as x64 and use SSE4a or SSE3? The BOINC client itself is running fine as an x64 process.
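For readers who want to check which SSE levels their own CPU advertises, here is a minimal Python sketch. It assumes a Linux-style `/proc/cpuinfo` text (the poster is on Windows, where a CPU information utility shows the same list), and `sse_levels` and `pointer_bits` are hypothetical helpers written only for this example. One detail worth knowing: Linux lists SSE3 under the flag `pni`, not `sse3`. Note that `pointer_bits` reports the bitness of the Python process running the script, not of the MilkyWay@home applications.

```python
import struct

# SSE level -> flag name as printed in the "flags" line of Linux /proc/cpuinfo.
# Linux reports SSE3 as "pni" (Prescott New Instructions).
SSE_FLAGS = {
    "SSE": "sse",
    "SSE2": "sse2",
    "SSE3": "pni",
    "SSSE3": "ssse3",
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "SSE4a": "sse4a",
}


def sse_levels(cpuinfo_text: str) -> list[str]:
    """Return the SSE levels named in the first 'flags' line of cpuinfo_text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags = set(value.split())
            return [level for level, flag in SSE_FLAGS.items() if flag in flags]
    return []  # no flags line found


def pointer_bits() -> int:
    """Pointer width (32 or 64) of the Python process running this script."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(sse_levels(f.read()))
    except OSError:
        print("no /proc/cpuinfo here (not Linux?)")
    print(pointer_bits(), "bit interpreter")
```

On a K10-family Phenom one would expect `['SSE', 'SSE2', 'SSE3', 'SSE4a']`, since that family has SSE4a but not SSSE3 or SSE4.1.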

Thanks again in advance! =)



BadBike2

Joined: 11 Feb 11
Posts: 5
Credit: 21,146,417
RAC: 0
Message 46274 - Posted: 18 Feb 2011, 0:12:14 UTC

I understand that there are many helpful guides on this website. I just want to know whether there is a safe way to optimize MilkyWay@home for SSE4a or SSE3 without affecting the quality of the results the project receives.
Cartoonman

Joined: 10 Dec 09
Posts: 18
Credit: 9,456,111
RAC: 0
Message 46276 - Posted: 18 Feb 2011, 1:00:41 UTC
Last modified: 18 Feb 2011, 1:06:52 UTC

I'm not entirely sure about 32-bit vs. 64-bit performance for this project, but what I can tell you is that the SSE versions are instruction sets, not optimizations in themselves (although their instructions can execute certain operations much faster than code without them).
I don't believe our WUs would benefit from SSE3 or SSE4/4a, because the new instructions would have to match what the application is actually computing before they make a noticeable difference in WU completion time. Just enabling them won't optimize anything unless an instruction found only in SSE3 or SSE4/4a greatly speeds up a necessary calculation.
Compared to SSE2, SSE3 and SSE4 are minor updates to the SSE instruction set, so they have much less impact on speed than the difference between having SSE2 and not having it.
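To make the point about instructions having to match the computation concrete, here is a toy Python model of a dot product on 2-lane packed doubles (one SSE2 XMM register holds two doubles). Each instruction is modeled as a plain function named after it; this only illustrates data flow under that assumption and is not real SIMD code, and the SSE2 reduction shown is just one possible shuffle-plus-add sequence. SSE3's `haddpd` replaces only the final horizontal reduction, which runs once per dot product, while the hot loop is identical either way.

```python
Vec2 = tuple[float, float]  # one modeled XMM register: two double lanes


def addpd(a: Vec2, b: Vec2) -> Vec2:
    # SSE2 ADDPD: lane-wise addition.
    return (a[0] + b[0], a[1] + b[1])


def mulpd(a: Vec2, b: Vec2) -> Vec2:
    # SSE2 MULPD: lane-wise multiplication.
    return (a[0] * b[0], a[1] * b[1])


def unpckhpd(a: Vec2, b: Vec2) -> Vec2:
    # SSE2 UNPCKHPD: interleave the high lanes -> (a[1], b[1]).
    return (a[1], b[1])


def haddpd(a: Vec2, b: Vec2) -> Vec2:
    # SSE3 HADDPD: horizontal add -> (a[0] + a[1], b[0] + b[1]).
    return (a[0] + a[1], b[0] + b[1])


def dot(xs: list[float], ys: list[float], use_sse3: bool) -> float:
    """Dot product of equal, even-length lists using the 2-lane model."""
    acc: Vec2 = (0.0, 0.0)
    for i in range(0, len(xs), 2):
        # The hot loop is the same for SSE2 and SSE3: MULPD + ADDPD.
        acc = addpd(acc, mulpd((xs[i], xs[i + 1]), (ys[i], ys[i + 1])))
    # Only the final horizontal reduction differs, and it runs once per call.
    if use_sse3:
        return haddpd(acc, acc)[0]   # SSE3: one instruction
    high = unpckhpd(acc, acc)        # SSE2: shuffle the high lane down...
    return addpd(acc, high)[0]       # ...then add it to the low lane
```

`dot([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], use_sse3=True)` and the same call with `use_sse3=False` both return 70.0; in this model the SSE3 path saves one instruction per call, not per element.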
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 46278 - Posted: 18 Feb 2011, 3:16:46 UTC
Last modified: 18 Feb 2011, 3:17:43 UTC

For non N-Body apps, it's already been done. See this thread.
arkayn

Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 46279 - Posted: 18 Feb 2011, 6:21:31 UTC - in response to Message 46278.  

For non N-Body apps, it's already been done. See this thread.


Not any more; the CPU apps have all been deprecated.
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 46285 - Posted: 18 Feb 2011, 23:26:32 UTC - in response to Message 46279.  

For non N-Body apps, it's already been done. See this thread.


Not any more; the CPU apps have all been deprecated.

Hmm, I see that now. Dang. I thought the optimised CPU apps for the 'standard' MW app were still valid?
arkayn

Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 46286 - Posted: 19 Feb 2011, 0:01:08 UTC

They were all giving invalid results after they updated the core apps.
BadBike2

Joined: 11 Feb 11
Posts: 5
Credit: 21,146,417
RAC: 0
Message 46293 - Posted: 19 Feb 2011, 5:36:28 UTC

Thanks for the information, gentlemen. I have decided to keep the official software; I can't risk compromising the computation results. I'll hold out until SSE3 or SSE4a support is integrated into the official releases, if it ever is.

I have one more question, though. My GPU is computing de_separation tasks. Are these tasks GPU-only, or can the CPU process them too?
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 46298 - Posted: 19 Feb 2011, 12:56:33 UTC

Your Task Manager screenshot shows that you were running:
3 nbody WUs on CPU
1 separation WU on CPU
1 separation WU on GPU
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 46301 - Posted: 19 Feb 2011, 15:44:04 UTC - in response to Message 46251.  

...has support for SSE, SSE2, SSE3, and SSE4a. My video processor is an HD 4850, which is working fine with BOINC at the moment. My BOINC client is the x64 build, version 6.10.58.

What I would like to improve: the MilkyWay@home processes are running in 32-bit mode and using SSE2. Is there a way I could have these processes run as x64 and use SSE4a or SSE3? The BOINC client itself is running fine as an x64 process.

These instruction sets require either the compiler to find ways to use them, or someone to hand-write code that uses them. Current compilers usually aren't particularly good at finding uses for all of the special instructions potentially available to them. I can see how some things in SSE3 could help if done by hand (I'm looking at haddpd and hsubpd), but I don't think the others are particularly useful. The jump from the antique x87 FPU to SSE2 is huge, which is part of why you should get SSE2 applications by default. (The N-body application requires SSE2, since with x87 it's a pain, and in some cases impossible, to get consistent results.) A while ago I added an easy way to the build system to rebuild everything at every SSE level, and building with SSE3, SSE4*, etc. didn't really show any improvement with GCC or clang.
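One way the x87 consistency problem shows up is double rounding. The x87 registers hold a 64-bit significand, so when the FPU runs at extended precision (a common default, depending on platform and compiler) a result is rounded once to 64 bits and then again to 53 bits when stored as a double, whereas SSE2 rounds once, directly to 53 bits. The Python sketch below emulates both paths with exact rational arithmetic; `round_to_bits` is a helper written for this example, and the operands are deliberately chosen to hit a case where the two disagree. Real x87 code can also differ depending on when the compiler spills extended-precision values to memory, which this sketch does not model.

```python
from fractions import Fraction


def round_to_bits(x: Fraction, p: int) -> Fraction:
    """Round a positive rational x to the nearest number with a p-bit significand.

    Ties go to even, matching the IEEE 754 default rounding mode.
    Overflow, underflow and negative inputs are ignored in this sketch.
    """
    # Find e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    # Scale so the significand lies in [2**(p - 1), 2**p), then round to an integer.
    scale = Fraction(2) ** (p - 1 - e)
    return Fraction(round(x * scale)) / scale  # round() on a Fraction is half-to-even


a = 1.0
b = 2.0**-53 + 2.0**-66            # exactly representable as a double
exact = Fraction(a) + Fraction(b)  # the true sum, 1 + 2**-53 + 2**-66

sse2 = float(round_to_bits(exact, 53))                    # one rounding to double
x87 = float(round_to_bits(round_to_bits(exact, 64), 53))  # 64-bit register, then store

print(sse2, x87)
```

On this input the single-rounding (SSE2-style) path gives 1.0000000000000002 while the emulated x87 path gives 1.0, a one-ulp difference of exactly the kind that makes results disagree between builds or machines.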
BadBike2

Joined: 11 Feb 11
Posts: 5
Credit: 21,146,417
RAC: 0
Message 46321 - Posted: 20 Feb 2011, 23:34:45 UTC - in response to Message 46301.  

Great, thank you very much!


©2024 Astroinformatics Group