MW@home 0.82 (ati14) performance


Message boards : Number crunching : MW@home 0.82 (ati14) performance
VictordeHollander

Send message
Joined: 9 Nov 10
Posts: 19
Credit: 71,077,081
RAC: 0
50 million credit badge9 year member badge
Message 49311 - Posted: 14 Jun 2011, 21:53:22 UTC

The performance of the new ATI separation app seems to have dropped since the last build. The ati13 build utilised 99% of my AMD 6950; ati14 now reaches only about 70%. This leads to longer WU run times (+30 sec). Can anybody explain these changes?
ID: 49311
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Send message
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
10 million credit badge9 year member badge
Message 49313 - Posted: 14 Jun 2011, 23:22:44 UTC - in response to Message 49311.  

Are you talking about 0.62 -> 0.82, or from 0.23 to now? The 'last build' part makes me think 0.62, but the ati13 part makes me think 0.23.
ID: 49313
VictordeHollander

Send message
Joined: 9 Nov 10
Posts: 19
Credit: 71,077,081
RAC: 0
50 million credit badge9 year member badge
Message 49340 - Posted: 15 Jun 2011, 17:50:03 UTC

I was talking about 0.62 -> 0.82. Two weeks ago I still had 99% utilisation; now, with no changes in hardware or software, only 70%.
ID: 49340
aad

Send message
Joined: 30 Mar 09
Posts: 63
Credit: 481,863,811
RAC: 211,719
300 million credit badge10 year member badge
Message 49341 - Posted: 15 Jun 2011, 17:59:02 UTC

A picture may help here:

In this case, a 'fix10' WU on an HD 6970 finished in 93 secs.
ID: 49341
ExtraTerrestrial Apes
Avatar

Send message
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
200 million credit badge10 year member badge
Message 49344 - Posted: 15 Jun 2011, 20:35:36 UTC

I found that if I leave one logical core free on an i7, I get 99% utilization again and nice run times. At full CPU load, however, I get ~80 +/- 5%. The higher values were obtained with frequency 30 and priority "higher than normal", while the lower numbers were at frequency 60.

The higher frequency didn't improve anything - the screen is quite smooth even at 30 Hz.
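
As a rough illustration of why the target frequency matters: the stderr logs posted further down in this thread show "Dividing into 5 chunks" for an estimated 98.7 ms iteration at 60 Hz, and "Dividing into 3 chunks" (sizes 528 528 544) for an estimated 114.8 ms at 30 Hz. The Python sketch below is one rule that happens to reproduce those numbers. It is an assumption fitted to the logs, not the client's actual code; the name `plan_chunks` and the 16-step alignment are made up for illustration.

```python
def plan_chunks(est_iter_ms, target_hz, mu_steps, align=16):
    """Hypothetical chunk planner fitted to the logged values.

    Assumption: one chunk should take about one display frame, so the
    chunk count is the estimated iteration time divided by the frame time.
    """
    frame_ms = 1000.0 / target_hz
    # At least one chunk, otherwise the integer number of whole frames.
    n = max(1, int(est_iter_ms // frame_ms))
    # Assumed: chunk sizes are rounded down to a multiple of `align`,
    # and the last chunk absorbs the remainder.
    base = (mu_steps // n) // align * align
    if base == 0:
        base = mu_steps // n
    sizes = [base] * (n - 1) + [mu_steps - base * (n - 1)]
    return n, sizes


# Values taken from the 0.62 (60 Hz) and 0.82 (30 Hz) logs in this thread.
print(plan_chunks(98.708912, 60.0, 1600))
print(plan_chunks(114.750579, 30.0, 1600))
```

With these inputs the sketch gives 5 chunks of 320 at 60 Hz and chunks of 528, 528 and 544 at 30 Hz, which matches the logs. More chunks per iteration presumably means the GPU waits on the CPU more often, which would fit the lower utilization at 60 Hz under full CPU load, but that part is a guess.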

MrS
Scanning for our furry friends since Jan 2002
ID: 49344
ExtraTerrestrial Apes
Avatar

Send message
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
200 million credit badge10 year member badge
Message 49347 - Posted: 15 Jun 2011, 20:55:46 UTC

I managed to grab the stderr output of 2 (hopefully) rather similar tasks. The interesting part is that the old (0.62) app underestimated the iteration time, whereas the new one (0.82) overestimates it by a similar margin. Overall, the new app is faster thanks to the improved likelihood calculation.

0.62 wrote:

<core_client_version>6.12.26</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome_client separation 0.62 Windows x86 double CAL++ </search_application>
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_CAYMAN
Revision: 1
CAL Version: 1.4.1385
Engine clock: 900 Mhz
Memory clock: 625 Mhz
GPU RAM: 2048
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 24
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 16384, 16384 }

Estimated iteration time 98.708912 ms
Target frequency 60.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 5 chunks
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using { 1, 5 } chunk(s) of size { 1400, 320 }
Integration time = 67.226855 s, average per iteration = 105.041961 ms
Integral 0 time = 68.930041 s
Estimated iteration time 24.677228 ms
Target frequency 60.000000 Hz, polling mode 1, using responsiveness factor of 1.000000
Dividing into 1 chunks
Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Using { 1, 1 } chunk(s) of size { 1400, 400 }
Integration time = 16.644361 s, average per iteration = 26.006814 ms
Integral 1 time = 17.272463 s
Likelihood time = 6.127880 s
<background_integral> 0.000328657628885 </background_integral>
<stream_integral> 109.601905408431290 1949.937780539551300 232.263944079244770 </stream_integral>
<background_likelihood> -3.050879407064911 </background_likelihood>
<stream_only_likelihood> -4.831767611542396 -4.439308159704498 -4.636905522242733 </stream_only_likelihood>
<search_likelihood> -2.824946951341252 </search_likelihood>
19:48:01 (2692): called boinc_finish


0.82 wrote:

<core_client_version>6.12.26</core_client_version>
<![CDATA[
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Found 1 CAL devices
Chose device 0

Device target: CAL_TARGET_CAYMAN
Revision: 1
CAL Version: 1.4.1385
Engine clock: 900 Mhz
Memory clock: 625 Mhz
GPU RAM: 2048
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 24
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 16384, 16384 }

Estimated iteration time 114.750579 ms
Target frequency 30.000000 Hz, polling mode 1
Dividing into 3 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Using 3 chunk(s) with sizes: 528 528 544
Integration time = 66.754351 s, average per iteration = 104.303674 ms
Integral 0 time = 67.639817 s
Estimated iteration time 28.687645 ms
Target frequency 30.000000 Hz, polling mode 1
Dividing into 1 chunks, initially sleeping for 0 ms
Integration range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Using 1 chunk(s) with sizes: 400
Integration time = 16.836535 s, average per iteration = 26.307086 ms
Integral 1 time = 17.073187 s
Likelihood time = 2.197505 s (yeha!)
<background_integral> 0.000327968102825 </background_integral>
<stream_integral> 109.642392926183320 1949.951009314928100 231.478490107568520 </stream_integral>
<background_likelihood> -3.050736530637475 </background_likelihood>
<stream_only_likelihood> -4.837224658148022 -4.439110900617076 -4.637413792940732 </stream_only_likelihood>
<search_likelihood> -2.824947035720017 </search_likelihood>
<search_application> milkywayathome_client separation 0.82 Windows x86_64 double CAL++ </search_application>
04:11:26 (1428): called boinc_finish
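
To put numbers on the comparison, here is a small Python sketch that pulls the estimated and measured iteration times and the per-phase times out of the two logs above. The regular expressions and the `summarize` helper are my own, written against the line formats shown in these two logs only; other app versions may print something different.

```python
import re

# Patterns match the line formats seen in the two logs above (assumed stable).
EST = re.compile(r"Estimated iteration time ([\d.]+) ms")
AVG = re.compile(r"average per iteration = ([\d.]+) ms")
PHASE = re.compile(r"(?:Integral \d+|Likelihood) time = ([\d.]+) s")

LOG_062 = """
Estimated iteration time 98.708912 ms
Integration time = 67.226855 s, average per iteration = 105.041961 ms
Integral 0 time = 68.930041 s
Estimated iteration time 24.677228 ms
Integration time = 16.644361 s, average per iteration = 26.006814 ms
Integral 1 time = 17.272463 s
Likelihood time = 6.127880 s
"""

LOG_082 = """
Estimated iteration time 114.750579 ms
Integration time = 66.754351 s, average per iteration = 104.303674 ms
Integral 0 time = 67.639817 s
Estimated iteration time 28.687645 ms
Integration time = 16.836535 s, average per iteration = 26.307086 ms
Integral 1 time = 17.073187 s
Likelihood time = 2.197505 s (yeha!)
"""


def summarize(log):
    """Return (estimate errors in percent, summed phase time in seconds)."""
    est = [float(x) for x in EST.findall(log)]
    avg = [float(x) for x in AVG.findall(log)]
    # Negative error: the estimate was too low; positive: too high.
    errors = [(e - a) / a * 100.0 for e, a in zip(est, avg)]
    total = sum(float(x) for x in PHASE.findall(log))
    return errors, total


for name, log in (("0.62", LOG_062), ("0.82", LOG_082)):
    errors, total = summarize(log)
    print(name, [round(e, 1) for e in errors], round(total, 2))
```

On these two logs, 0.62 underestimates the iteration time by roughly 5-6%, while 0.82 overestimates it by roughly 9-10%. The summed phase times come to about 92.3 s for 0.62 and 86.9 s for 0.82, and most of the difference is the likelihood step (6.1 s versus 2.2 s).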


MrS
Scanning for our furry friends since Jan 2002
ID: 49347
VictordeHollander

Send message
Joined: 9 Nov 10
Posts: 19
Credit: 71,077,081
RAC: 0
50 million credit badge9 year member badge
Message 49349 - Posted: 15 Jun 2011, 22:37:37 UTC - in response to Message 49344.  

I found that if I leave one logical core free on an i7, I get 99% utilization again and nice run times. At full CPU load, however, I get ~80 +/- 5%. The higher values were obtained with frequency 30 and priority "higher than normal", while the lower numbers were at frequency 60.

The higher frequency didn't improve anything - the screen is quite smooth even at 30 Hz.

MrS

Thanks for the quick reactions!

Leaving one logical core free has solved the utilization problem for me. Still, it is a pity to leave one core doing nothing. Run times are now also back to normal, maybe even 1-2 sec faster.
ID: 49349
skgiven
Avatar

Send message
Joined: 22 Dec 07
Posts: 35
Credit: 18,433,204
RAC: 0
10 million credit badge10 year member badge
Message 49546 - Posted: 24 Jun 2011, 19:54:03 UTC - in response to Message 49349.  

I doubt that your 1 lost logical core is "doing nothing".
Some of it will be used by the system, some by MW and some by other tasks.
Typically, if you free a CPU core/thread you see a slight improvement in many types of CPU tasks, so it's rarely a complete loss of a CPU thread.
If you are using 7 of 8 threads, I would expect you to see overall CPU usage above 87.5%; the difference is what you are not losing. What does Task Manager say?
ID: 49546
ExtraTerrestrial Apes
Avatar

Send message
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
200 million credit badge10 year member badge
Message 49590 - Posted: 26 Jun 2011, 11:47:45 UTC - in response to Message 49546.  

What does task manager say?


For me it's 87% :)
But then I don't like to run a ton of programs I don't need anyway.

MrS
Scanning for our furry friends since Jan 2002
ID: 49590
Beyond

Send message
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
500 million credit badge10 year member badge
Message 49593 - Posted: 26 Jun 2011, 14:32:40 UTC - in response to Message 49590.  

What does task manager say?

For me it's 87% :)
But then I don't like to run a ton of programs I don't need anyway. MrS

99% on all machines when running 2x WU/GPU.
ID: 49593
skgiven
Avatar

Send message
Joined: 22 Dec 07
Posts: 35
Credit: 18,433,204
RAC: 0
10 million credit badge10 year member badge
Message 49605 - Posted: 26 Jun 2011, 20:07:43 UTC - in response to Message 49593.  
Last modified: 26 Jun 2011, 20:20:13 UTC

I have an HD 5850 supported by a 2.13 GHz Core 2 Duo (6400). This thing uses about 40% CPU just to open Task Manager! When minimized to the tray it's still using around 10 or 20%. Clean XP install, and nothing else other than FF on the system (not open at the time). No CPU tasks running on that rig either; probably not worth the bother.
Point being, CPU usage is very much dependent on the CPU.
XP tends to have about a 3% overhead. Not sure about Vista and W7 but I would expect them to be at least 3%. As for the various Linux flavors, it can vary quite a bit.

PS. GPU utilization is 98%, except for the few seconds it takes to load a new task.
ID: 49605
FruehwF

Send message
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
100 million credit badge9 year member badge
Message 49607 - Posted: 26 Jun 2011, 21:02:05 UTC

PS. GPU utilization is 98%, except for the few seconds it takes to load a new task.

When you run 2 WUs concurrently, you can also mask these 5-7 seconds. That will bring you 4-5% more credit.

You need an app_info.xml file in your project data directory.

It works as follows:
WU 1 and WU 2 each get 50% of the GPU. When WU 1 has finished its GPU work and does the CPU stuff, WU 2 gets 100% of the GPU. Then the next WU is loaded, and when WU 2 finishes, the new WU 1 gets 100% of the GPU for that time. As a result, the GPU is 99-100% in use most of the time.
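
A back-of-the-envelope model of that overlap, as a Python sketch. It assumes each WU needs `gpu_s` seconds of GPU work followed by a `gap_s`-second stretch in which the GPU sits idle (loading the next task plus the CPU-only part), that running two WUs hides the gap completely, and that sharing the GPU costs nothing. Real gains will be smaller. The function names and the example numbers are hypothetical.

```python
def gpu_utilization(gpu_s, gap_s):
    """GPU busy fraction when WUs run one at a time."""
    return gpu_s / (gpu_s + gap_s)


def overlap_gain(gpu_s, gap_s):
    """Upper bound on the throughput gain from running two WUs at once.

    One at a time: one WU per (gpu_s + gap_s) seconds.
    Two overlapped, gap fully hidden: one WU per gpu_s seconds.
    Gain = (gpu_s + gap_s) / gpu_s - 1 = gap_s / gpu_s.
    """
    return gap_s / gpu_s


# Hypothetical example: 120 s of GPU work and a 6 s gap per WU.
print(f"utilization, one WU at a time: {gpu_utilization(120.0, 6.0):.1%}")
print(f"best-case gain with two WUs:   {overlap_gain(120.0, 6.0):.1%}")
```

The gain is simply the gap divided by the GPU time, so it shrinks as the gap gets shorter or the WU gets longer.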
ID: 49607
Werkstatt

Send message
Joined: 19 Feb 08
Posts: 350
Credit: 128,778,929
RAC: 74,695
100 million credit badge10 year member badge
Message 49612 - Posted: 26 Jun 2011, 23:12:35 UTC - in response to Message 49607.  

PS. GPU utilization is 98%, except for the few seconds it takes to load a new task.

When you run 2 WUs concurrently, you can also mask these 5-7 seconds. That will bring you 4-5% more credit.


4-5% more credit is a bit optimistic, but basically right.

BTW, I have never seen a low GPU load on either of my systems; my main system runs 8 CPUs at 100% and 2 GPUs at 98-99%.
Win7-64, latest driver, latest BM.
ID: 49612
FruehwF

Send message
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
100 million credit badge9 year member badge
Message 49617 - Posted: 27 Jun 2011, 7:31:31 UTC

4-5% more credit is a bit optimistic, but basically right.


You are right, a bit optimistic. I had this experience with the older 0.62 app, and I have a rather old single CPU in my system, so the CPU delay was longer.

OT: Many cruncher greetings from Aderklaa :-)
ID: 49617
Chris S
Avatar

Send message
Joined: 20 Sep 08
Posts: 1387
Credit: 186,726,858
RAC: 0
100 million credit badge10 year member badge
Message 49620 - Posted: 27 Jun 2011, 9:18:20 UTC

For those considering 100% GPU utilisation, see my post here and Zydor's reply.

GPU %
ID: 49620
dskagcommunity
Avatar

Send message
Joined: 26 Feb 11
Posts: 170
Credit: 183,085,176
RAC: 0
100 million credit badge8 year member badge
Message 49623 - Posted: 27 Jun 2011, 9:50:54 UTC - in response to Message 49617.  

4-5% more credit is a bit optimistic, but basically right.


You are right, a bit optimistic. I had this experience with the older 0.62 app, and I have a rather old single CPU in my system, so the CPU delay was longer.

OT: Many cruncher greetings from Aderklaa :-)


OT2: Do you live in Vienna? If so, greetings from Brigittenau :P If I could, I'd use the Donaukanal next door for a mega water-cooling setup :D
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 49623
FruehwF

Send message
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
100 million credit badge9 year member badge
Message 49635 - Posted: 27 Jun 2011, 18:34:02 UTC

@dskagcommunity:
You've got a PM.
ID: 49635
ExtraTerrestrial Apes
Avatar

Send message
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
200 million credit badge10 year member badge
Message 49638 - Posted: 27 Jun 2011, 20:26:51 UTC

Some (short) time ago I tested GPU utilization and found that I had to leave one CPU core free to get 98 - 99%. However, now I just went from 7 to 8 CPU threads and I'm still at normal GPU performance, not at ~80%.

The difference: back then I was running a full load of Einstein@Home (global correlations search), now I've got a mix of WCG, Yoyo and Einstein. Einstein is.. quite successful at making optimum use of the CPU. Seems like that's leaving too little to serve the GPU, whereas other projects are *nicer*.

MrS
Scanning for our furry friends since Jan 2002
ID: 49638
skgiven
Avatar

Send message
Joined: 22 Dec 07
Posts: 35
Credit: 18,433,204
RAC: 0
10 million credit badge10 year member badge
Message 49810 - Posted: 2 Jul 2011, 20:17:16 UTC - in response to Message 49638.  

Stuck in a nicer CPU (Q8400) and now using 3 cores elsewhere. Sits at 77% CPU utilization and 98% GPU utilization.
That's with an old GT240 and attached to Einstein. Alas the GT240 keeps downclocking, hence it's not being used elsewhere.
ID: 49810
arkayn
Avatar

Send message
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
50 million credit badge10 year member badge
Message 49814 - Posted: 2 Jul 2011, 23:08:57 UTC - in response to Message 49810.  

Stuck in a nicer CPU (Q8400) and now using 3 cores elsewhere. Sits at 77% CPU utilization and 98% GPU utilization.
That's with an old GT240 and attached to Einstein. Alas the GT240 keeps downclocking, hence it's not being used elsewhere.


Drop your drivers down to 266.58 and the downclock issue should go away.
ID: 49814
