Welcome to MilkyWay@home

ATI application updated to 0.60



Beyond

Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 48220 - Posted: 25 Apr 2011, 14:07:57 UTC - in response to Message 48219.  

Busy waiting is constantly checking if the GPU is done, which causes 100% CPU usage but is least likely to waste extra time waiting after a packet is done on the GPU.

Interesting. Used the busy waiting parameter (b-1) on all my machines with v.23 and it only increased CPU usage by 1-2 seconds/WU while significantly increasing GPU usage (to 99%). Something must be different...

BTW, not seeing GUI lag here in v.62 running 2 WUs/GPU on even the 2 GPU boxes.
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 48223 - Posted: 25 Apr 2011, 17:16:03 UTC - in response to Message 48220.  

Busy waiting is constantly checking if the GPU is done, which causes 100% CPU usage but is least likely to waste extra time waiting after a packet is done on the GPU.

Interesting. Used the busy waiting parameter (b-1) on all my machines with v.23 and it only increased CPU usage by 1-2 seconds/WU while significantly increasing GPU usage (to 99%). Something must be different...

BTW, not seeing GUI lag here in v.62 running 2 WUs/GPU on even the 2 GPU boxes.
I think the old one used the time estimate as an initial waiting period or something to that effect, which I'll have in the next release.
ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 48265 - Posted: 26 Apr 2011, 21:29:51 UTC - in response to Message 48223.  

I think the old one used the time estimate as an initial waiting period


Yes, for 0.23 it controlled how the GPU was polled after the initial estimate had passed.
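The waiting schemes being discussed can be sketched in a few lines (the function and parameter names here are hypothetical illustrations, not the project's actual code):

```python
import time

def wait_for_gpu(is_done, estimated_runtime=0.0, poll_interval=0.01):
    """Sleep through the initial time estimate, then check repeatedly
    until the GPU reports completion.

    Setting poll_interval to 0 degenerates into busy waiting: ~100% CPU
    on one core, but minimal wasted time after the kernel finishes.
    A nonzero poll_interval keeps CPU usage near zero at the cost of up
    to poll_interval of idle GPU time per check."""
    time.sleep(estimated_runtime)
    while not is_done():
        if poll_interval > 0:
            time.sleep(poll_interval)
```

With a good initial estimate, almost all of the wait is spent in the first sleep, which would explain why busy waiting in 0.23 only cost a second or two of CPU time per WU.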

MrS
Scanning for our furry friends since Jan 2002
[TA]Assimilator1
Joined: 22 Jan 11
Posts: 357
Credit: 42,228,692
RAC: 33
Message 48333 - Posted: 28 Apr 2011, 19:53:51 UTC

Hi Matt, thanks for the reply :).

So what's the GPU polling about then?

Hopefully if I keep increasing the target frequency it will at some point stop the lag, or at least come close.
[TA]Assimilator1
Message 48338 - Posted: 29 Apr 2011, 11:43:47 UTC

OK, so I've tried higher GPU target frequencies, and here are my notes.

240 - 97-98% load - tiny improvement in lag
300 - as above
600 - 96-99% load, 70C - much better, but still very laggy
800 - 97-99% load, 71C - no further improvement
1200 - 96-99% load, 70C - no further improvement

Something is obviously wrong here; surely at 1200 MW should have just ground to a halt??
Cannibal Corpse
Joined: 21 Mar 09
Posts: 25
Credit: 11,410,869
RAC: 0
Message 48339 - Posted: 29 Apr 2011, 13:50:46 UTC

Having two dedicated rigs GPU'ing M@H (no surfing or games, just crunching), my AMD P-IIx4 has severe lag (CPU'ing between D-W FP and Prime), but am I to understand it does not hinder crunching? Just the annoying lag? Which does not really bother me; on the other hand my i7 performs just fine. It also sounds like I can crunch 2/3 WUs on my 5870's? Do I need to insert a dummy plug, or is that the app_info stated above in an earlier post that I need, or both?
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 48340 - Posted: 29 Apr 2011, 14:29:15 UTC - in response to Message 48339.  
Last modified: 29 Apr 2011, 15:15:16 UTC

It also sounds like I can crunch 2/3 WUs on my 5870's? Do I need to insert a dummy plug, or is that the app_info stated above in an earlier post that I need, or both?

You can try running multiple WUs simultaneously on your GPUs... I've done 2 successfully, but I haven't tried 3 yet. I probably won't even bother b/c I can't imagine it'll improve my efficiency beyond what it is now.

If you intend on using both 5870 GPUs, then one 5870 will require a dummy plug, assuming you're using the other 5870 to run your display. You'll also need to "extend the desktop" to the secondary 5870 w/ the dummy plug by right-clicking on the desktop and playing with the options under "properties." If you have a motherboard with an integrated GPU (IGP), and are using it to run your display, then both of your 5870's will need dummy plugs, and you'll have to "extend the desktop" to both 5870's, since they'd both be functioning as secondary GPUs in this case.

I think an app_info.xml file is only necessary for folks who are either running some sort of optimized MW@H application, want to run more than 1 WU simultaneously, or specifically want to play with the parameters that affect GUI lag. I need an app_info.xml file for my SETI@Home project b/c I use optimized apps. But with MW@H, I don't "need" one b/c its GPU app is a standard app, not an optimized app (unlike SETI@Home GPU apps). I do use a MW@H app_info.xml file 1) to allow 2 WUs to crunch simultaneously, and 2) just in case I ever have to mess with the GUI lag parameters, but I haven't yet had to do that.
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 48341 - Posted: 29 Apr 2011, 15:09:34 UTC - in response to Message 48339.  
Last modified: 29 Apr 2011, 15:17:42 UTC

Having two dedicated rigs GPU'ing M@H (no surfing or games, just crunching), my AMD P-IIx4 has severe lag (CPU'ing between D-W FP and Prime), but am I to understand it does not hinder crunching? Just the annoying lag? Which does not really bother me; on the other hand my i7 performs just fine. It also sounds like I can crunch 2/3 WUs on my 5870's? Do I need to insert a dummy plug, or is that the app_info stated above in an earlier post that I need, or both?


You sometimes need a dummy plug if you have two cards in a PC - depends which way the wind is blowing ... :) Some find they don't need it. You won't need a dummy plug as you are running just one card per machine.

If you want to run 2 WUs per GPU - probably not a good idea to run three on one - then you need an app_info.xml file placed in the Project Data Directory. Suggested app_info file to use:

<app_info>
    <app>
        <name>milkyway</name>
    </app>
    <file_info>
        <name>milkyway_0.62_windows_intelx86__ati14.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>62</version_num>
        <plan_class>ati14</plan_class>
        <flops>1.0e11</flops>
        <avg_ncpus>0.05</avg_ncpus>
        <max_ncpus>1</max_ncpus>
        <coproc>
            <type>ATI</type>
            <count>0.5</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_0.62_windows_intelx86__ati14.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

There are other things you can put in, but as the other stuff is defaults for you there's little point. The bit that does the 2xWUs on the GPU is: <count>0.5</count>. If you were to do, say, four on a GPU (decidedly bad idea!) it would be <count>0.25</count>. You can add a <cmdline> statement to it which tweaks various things, one being the screen lag you mentioned, but don't rush to do that. If on first use you are happy with it, it ain't broke so don't fix it :)

Make it inside Notepad to prevent invisible characters being added, and save it as app_info.xml inside the MW Project Data Directory. When you run an app_info file you will find the BOINC Client shows your applications running as "Anonymous Platform".

Bear in mind when using an app_info file that it's your responsibility to amend it when new versions of the application are released (0.63 and above), else from that point on all your WUs will fall over.

Regards
Zy
Cannibal Corpse
Message 48362 - Posted: 30 Apr 2011, 8:38:42 UTC - in response to Message 48341.  
Last modified: 30 Apr 2011, 8:41:02 UTC

Very cool Zydor!! And thx! I run 1 1/2 WUs per 2 min, versus 1 per 2 min, on my AMD PIIX4 w/5870. But I still have the 3 to 5 second dead time when complete... but it's all really the same. I have my GPU throttled to 140 deg F. They are valid and run well with PPS Sieve and Folkker-D wave CPU, fixin' to convert my i7. Will this negate an app_info? I have looked at the BOINC wiki and various places to learn how and what to put in .xml files to create them, but can you enlighten me on that cool stuff? As you mentioned, when a new app comes out I just have to edit the name of the app and version, correct? Stuff like that is what I want to learn... Help me Obi-Wan Zydor.. you're my only hope... :?) Any info or direction from you or anyone, I will be grateful!
ExtraTerrestrial Apes
Message 48363 - Posted: 30 Apr 2011, 12:11:21 UTC - in response to Message 48338.  

OK so I've tried higher GPU target frequencies, & here are my notes.

240 - 97-98% load - tiny improvement in lag
300 - as above
600 - 96-99% load, 70C - much better, but still very laggy
800 - 97-99% load, 71C - no further improvement
1200 - 96-99% load, 70C - no further improvement

Something is obviously wrong here; surely at 1200 MW should have just ground to a halt??


That's really odd. On 2 machines (one HD4870, the other one HD6850@6870) running Win 7 Aero I made the GUI much more responsive by going to target frequency 60, running 1 WU at a time. The difference is clearly felt and at 60 I consider the lag negligible. Movies are not totally smooth, though.

However, when I run 2 WUs at a time it takes about 24h of run time to put the machine into a strange state, where most GUI effects (like pointing over an open program in the task bar) would cause a pause of ~30s. The GPU continues to crunch, but no screen update happens. That's why I'm running only 1 WU at a time now. Are you referring to this lag?

MrS
ExtraTerrestrial Apes
Message 48364 - Posted: 30 Apr 2011, 12:18:25 UTC

Matt,

how's the work on the next version progressing? The other day I had this random thought again: the idle GPU time between WUs could easily be reduced by exploiting the fact that several computations / parameter sets / streams / original WUs (not sure what to call them properly..) are packed into each WU by now. After one of these is finished, the CPU thread could send the next work to the GPU and then do the final likelihood calculation for this result while the GPU is still busy with the next one. Thereby at the end of the WU only the final calculation for the last stream would remain, rather than all of them. At 4 streams per WU, idle GPU time should be reduced to a quarter.
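The overlap being proposed could be sketched like this (`gpu_integrate` and `cpu_likelihood` are hypothetical stand-ins for the real kernels, not the project's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def run_workunit(parameter_sets, gpu_integrate, cpu_likelihood):
    # While the main thread runs the GPU integration for stream i+1,
    # a worker thread finishes the CPU-side likelihood for stream i.
    # Only the last stream's likelihood leaves the GPU idle.
    results = []
    with ThreadPoolExecutor(max_workers=1) as cpu:
        pending = None
        for params in parameter_sets:
            gpu_result = gpu_integrate(params)    # GPU busy here
            if pending is not None:
                results.append(pending.result())  # previous stream done
            pending = cpu.submit(cpu_likelihood, gpu_result)
        if pending is not None:
            results.append(pending.result())      # last stream, GPU idle
    return results
```

(In Python the overlap only pays off if the GPU call releases the GIL, as real GPU bindings do; the point here is just the pipelining pattern.)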

Regards, MrS
Zydor
Message 48365 - Posted: 30 Apr 2011, 12:41:57 UTC - in response to Message 48362.  
Last modified: 30 Apr 2011, 12:54:38 UTC

..Help me Obi-Wan Zydor.. you're my only hope... :?)...


In that case you're doomed; the dark side took me long ago :)

This is going to be a bit long but you did ask ..... so I put it inside a downloadable file to avoid a huge post blocking the thread:

Click here for app_info outline

It will look daunting at first sight, but take the time to work through it and you'll find it's logical and easy to use. If you have issues using one, just paste it into a post; there's always someone around who will check its contents for you.

The full formal instructions are at: BOINC App_Info.xml Formal Syntax

.... But I still have the 3 to 5 second dead time when complete... but it's all really the same.....

Correct ... there is a myth spinning around that 2 WUs on a GPU will overcome the slowing of the last few seconds - it will not. At present the last few seconds use the CPU to calculate the final bit of the WU; it takes 4-8 seconds depending on your machine.

Nothing will ever change that until Matt gets the time to write some code to do that bit on the GPU. Where some time can be saved by running 2 WUs on a GPU is the load/unload time when the WU gets sent from CPU to GPU, might save 1 or 2 seconds per GPU, and to do that you need an app_info file to tell BOINC you want 2 WUs per GPU. If your graphics card tosses out, say, 40 WUs an hour, and, say, 2 seconds are saved per WU, that's 80 seconds per hour, roughly 5,000 credits a day. Is that worth it for you? Personal choice. Results will differ from person to person depending on the GPU type, speed of CPU, and speed of the PCI-E bus (the latter carries the WU from CPU to GPU). You pays your money, you takes your choice, as they say.
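A rough check of that estimate (the 40 WU/hour and 2-second figures are from the post above; the credits-per-WU value is an assumed illustration, not a project figure):

```python
# Assumed numbers for illustration only.
wus_per_hour = 40
seconds_saved_per_wu = 2
credits_per_wu = 267.2   # assumed typical WU value

seconds_saved_per_day = 24 * wus_per_hour * seconds_saved_per_wu  # 1920 s
seconds_per_wu = 3600 / wus_per_hour                              # 90 s each
# Saved time converts into extra completed WUs, hence extra credits.
extra_credits_per_day = seconds_saved_per_day / seconds_per_wu * credits_per_wu
print(round(extra_credits_per_day))  # 5700 - same ballpark as the 5,000 quoted
```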

I have my GPU throttled to 140 deg F

TThrottle is a nifty utility, but it's there as a last line of defense. It should not be used as a sledgehammer. If you are overheating, find out why: improve airflow, declock the CPU/GPU, etc. When back to "normal" and running fine, then set TThrottle to, say, 10 degrees above normal, and you're done.

They are valid and run well with PPS Sieve and Folkker-D wave CPU, fixin' to convert my i7. Will this negate an app_info?

No. You place an app_info.xml file into the Project Data Directory of the Project it refers to. Therefore you can have several, one in each of the Projects you run. The beast will apply the app_info instructions only to the Project it's designed for. It's hardware-independent, so the fact that you are sorting out an i7 is not relevant.

As you mentioned, when a new app comes out I just have to edit the name of the app and version, correct?

Yes, maybe some other stuff as well; just reread the attachment I made re app_info files. You either make a new one, or take an old one and change the relevant parts. After creating the new one, you put it into the Project Data Directory, and you're done.

I think that covers it for the moment. If there is anything else please yell - no problem

Regards
Zy
ExtraTerrestrial Apes
Message 48366 - Posted: 30 Apr 2011, 14:35:53 UTC - in response to Message 48365.  

there is a myth spinning around that 2 WUs on a GPU will overcome the slowing of the last few seconds - it will not. At present the last few seconds use the CPU to calculate the final bit of the WU, it takes 4-8 seconds depending on your machine.

Nothing will ever change that until Matt gets the time to write some code to do that bit on the GPU. Where some time can be saved by running 2 WUs on a GPU is the load/unload time when the WU gets sent from CPU to GPU, might save 1 or 2 seconds per GPU


Come on Zy, you can do better than this!

It's really simple: if one of the two WUs finishes, the CPU still has to do the final calculation. However, if another WU is running on the GPU then the GPU is not idle. It just crunches the other WU full time. Let it run for a while and your average GPU utilization hardly drops at all upon WU switches.

Putting in some precise numbers: I'm running an oc'ed & unlocked HD6950 with a C2Q at 2.88 GHz. So it's a fairly powerful GPU with a moderately fast CPU - probably about the worst case for CPU overhead.

Typically a 267.2 credit WU takes 94.4 s, 8.0 s of which is CPU time -> 86.4 s pure GPU time. That's 24 * 3600 * 267.2 / 94.4 = 244.5k RAC. Running 2 WUs at a time I'd finish 2 of them in 2 * 86.4 s, which would yield 24 * 3600 * (2*267.2) / (2*86.4) = 267.2k RAC. That's still "only" a difference of 9.3%, but works out to be 22700 credits a day.
I'm not saying everyone should use an app_info to run 2 WUs in parallel, but just calling it insignificant does not do it justice.
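The arithmetic above, spelled out (all numbers taken from the post):

```python
credit = 267.2     # credits per WU
wall_time = 94.4   # seconds per WU, running one at a time
cpu_time = 8.0     # seconds of CPU-only finishing work
gpu_time = wall_time - cpu_time          # 86.4 s of pure GPU time

# One WU at a time: the GPU idles during the CPU finishing step.
rac_single = 24 * 3600 * credit / wall_time
# Two WUs overlapped: the GPU stays busy, so throughput is set by GPU time.
rac_double = 24 * 3600 * (2 * credit) / (2 * gpu_time)
gain_pct = 100 * (rac_double / rac_single - 1)
print(round(rac_single), round(rac_double), round(gain_pct, 1))
# 244556 267200 9.3
```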

Regards, MrS
Zydor
Message 48367 - Posted: 30 Apr 2011, 16:12:50 UTC - in response to Message 48366.  
Last modified: 30 Apr 2011, 16:14:06 UTC

OK, as it's you I'll give it a whirl and see what happens :)

Currently for that type of WU I'm running at 122 secs to completion, and a further four seconds for the additional CPU bit, total recorded as 126 secs, on each of four GPUs.

The Beastie is a 1090T @ 3.7GHz, 2x5970s @ 790/500, 16GB RAM, with two CPU cores kept uncommitted (I have 4 PG 5oBs running on the other four).

I'll just put in a clean app_info and run 8 x WU, two per GPU core, and see what happens. It'll be a while; I need to clear the 48-WU cache first.

Drum roll, Maestro :)

Regards
Zy
Sunny129
Message 48368 - Posted: 30 Apr 2011, 16:24:46 UTC

^ looking forward to the results...

I've been running 2 WUs at a time for the past few days, and my increase in PPD seems to be more in line with Zydor's estimate of 5000... perhaps I need to give it more time and see if average productivity increases more.
Matt Arsenault
Message 48369 - Posted: 30 Apr 2011, 16:27:45 UTC - in response to Message 48364.  

Matt,

how's the work on the next version progressing?

Kind of slow. I have a bunch of stuff I need to do since it's the end of the semester.
Zydor
Message 48373 - Posted: 30 Apr 2011, 17:12:28 UTC - in response to Message 48366.  
Last modified: 30 Apr 2011, 17:25:00 UTC

Now derr's a 'ting - don't you hate it when ET can phone home for help :)

Looking fine so far. The timings are a bit all over the place, but that's because the cards really are running at full stretch with 8 WUs going. It's looking like about 10-12 seconds a WU; that definitely would be chunky :)

..... It's really simple: if one of the two WUs finishes the CPU still has to do the final calculation. However, if another WU is running on the GPU then the GPU is not idle. It just crunches the other WU full time.....

That's the bit that I missed - I expected 2-4 seconds just from the load/unload, having done this before a while back, prior to Matt's version. What you picked up, and I failed to, was the fact that while the CPU bit is going ahead, the GPU is free to fully service the other one. Well spotted :)

For now assume a 10-second saving; it'll be one or two either side of that, at 118 secs running per WU at present. That'll work out at an additional 50,000 all up total for 4 GPUs. Could well be up to (say) 60,000 for 4 GPUs once it's settled and I've tweaked a couple of things.

I'm going to play with the memory a bit; I'm going to need the bandwidth a bit more from what I'm seeing with eight running at once. Temperatures are not too bad; they went up a degree or two on the GPU, but I would expect that. I'll fiddle a bit and come back with something more definitive once it's settled.

Yup, give ET a cigar :)

Regards
Zy
Zydor
Message 48374 - Posted: 30 Apr 2011, 17:58:37 UTC

Settled at 770/300 for now. The VRMs were getting hammered with 8 WUs going through the cards, and temperatures on the VRMs went up a lot. GPU temps are fine; if anything they are a little down.

Looking like a bit less saving than first thought, but still chunky; it will probably settle at around 40,000 to 45,000 additional credits a day as an all-up total for both 5970s in the 1090T machine - around 11,000+ per GPU.

I'll play around with the second machine shortly, that only has 1 x 5850, so lets see what happens with one GPU.

Regards
Zy
Zydor
Message 48377 - Posted: 30 Apr 2011, 19:02:32 UTC

I ended up pulling out the app_info for the 1090T/2x5970s. The VRM temperatures were nuts; the extra load caused by 8 WUs going through at once was too much. Anyone reading this with 5870s, don't be put off. Yes, a 5970 is essentially 2x5870s on one card, but there was a design fault with 5970s that mispositioned the VRMs such that they heat up abnormally, and that needs watching. Putting this many WUs through, together with the design fault, means it's no good for me.

However, run a 5870 with two and you should be fine. It would be about 12,000-15,000 extra credits, depending on how high it's overclocked. Just watch the VRM temperatures. They will not be affected as much as on 5970s, but it's something to keep an eye on.

Setting up the 5850 now

Regards
Zy
Zydor
Message 48379 - Posted: 30 Apr 2011, 19:28:27 UTC

The 5850 was much more stable; it took them easily.

Phenom II 940 @3.3Ghz 8Gb RAM 775/300

I have a standard edition 5850, not a Black Edition, so with the latter you might tweak a little more out of it. For mine, it is about 5 seconds a card on average, which works out at a gain of around 7,500 credits; that's decent enough to be worth the trouble of running an app_info when compared against a 5850's normal output. Temperatures looked fine; maybe set the fan to manual and turn it up to (say) 65, but that will depend on individual case cooling. Temps are not an issue for the 5850; easily controlled.

Definitely leaving it on for the 5850. Unfortunately the 5970 was a no go for me.

Regards
Zy

©2019 Astroinformatics Group