Message boards :
Number crunching :
Is there a way to force "trip" the next MW cycle in .36?
Joined: 22 Jun 09 | Posts: 52 | Credit: 74,110,876 | RAC: 0
I'm still bashing my setup here trying to get everything to play nicely together, but not having much luck. I was running the .21 manager and thought all was going well until I got up this morning to find 15 instances of MW trying to run at once (with absolutely no changes from last night, when it was only running between 1 and 3), causing the BOINC manager to display incorrect info (it showed ready-to-report units that were already gone) and crash when I tried to close it. I'm thinking that's a manager bug, not a MW one.

So I upgraded to .36 again, and it definitely behaves differently. On the plus side, it seems far more stable and predictable than .21 for getting work. Unfortunately, it seems to request much less work, and each time it runs dry and requests the next batch, it doesn't start it. It's as if it thinks MW is a CPU app: it makes it wait for one of the others to hit its timer, then trips the first MW unit and goes back to running 4 CPU apps as it should. As long as it's got MW work queued it will keep it running alongside the other apps, starting a new WU every time it finishes one, but each downloaded batch ends up waiting again.

So my question is: is there a way to "trip" that next cycle somehow? Or is there a manager we like more than .21?

Update: Tried 6.2.19 and it was even worse than 6.6.36, because it would run just 1 WU and then idle MW; it wouldn't even make it through the whole batch. Trying 6.4.7 now and it seems like a happy medium so far. It's keeping MW running along with the 4 CPU projects and keeping the queue full. It does occasionally panic and run MW units at high priority, but nothing is competing for GPU time, so I don't care. I'll update after it's had some time to run and mess up the debts. It's also not incrementing the timer when a unit is "running" but not getting any GPU time yet, because I'm on n1.
6.6.21 was incrementing all the timers, so when it flipped out and had 13 WUs "running", they were showing 30-40 minutes to complete, which was messing up the prediction and queuing (because BOINC thought they were each taking 30+ minutes). It's also more "chatty" than 6.6.36 and 6.6.21 were, in that it's updating the project every 2-3 WUs and getting more, which I like, though it probably adds overhead compared to the way 6.6.x handled it by updating in large batches.
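The prediction problem described above comes down to how the BOINC client projects a task's remaining runtime. A rough sketch of the estimate (simplified from the client's published work-fetch logic; the numbers below are hypothetical) shows why inflated timers poison work fetch through the duration correction factor:

```python
def estimated_runtime_seconds(rsc_fpops_est, projected_flops, duration_correction_factor):
    """Rough sketch of how the BOINC client projects a task's runtime.

    rsc_fpops_est: the server's estimate of total FP operations for the WU
    projected_flops: the client's measured speed for this app version (flop/s)
    duration_correction_factor: per-project fudge factor, adjusted as tasks finish
    """
    return rsc_fpops_est / projected_flops * duration_correction_factor

# Hypothetical numbers: a 1e15-flop WU on a device doing 1e11 flop/s.
# With DCF = 1.0 the estimate is 10,000 s; if bogus "running" timers drive
# DCF up to 100, the same WU is predicted to take ~278 hours, so the client
# thinks the queue is massively oversubscribed and stops fetching work.
print(estimated_runtime_seconds(1e15, 1e11, 1.0))    # 10000.0
print(estimated_runtime_seconds(1e15, 1e11, 100.0))  # 1000000.0
```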
Joined: 24 Dec 07 | Posts: 1947 | Credit: 240,884,648 | RAC: 0
I was just thinking a similar thought. This morning I noticed that my quady with the 2 GPUs has a lower RAC than it did last night. So my question for the GPU crunchers is: what setup works well for you?

- BOINC version
- MW resource share
- Other projects being run and their resource share
- Network connection preferences
- Work buffer preferences
- app_info.xml info
Joined: 15 May 08 | Posts: 7 | Credit: 126,077,128 | RAC: 0
I have had a similar problem with my i7 system from the start of using the GPU. My solution is to run only Climate Prediction and do the management of the Climate tasks myself. I keep 7 tasks for Climate (when more are downloaded, I suspend them) and Climate is set to no new tasks. The moment a task finishes, I manually arrange for a new task to be downloaded. Resource settings are 700 for Climate and 100 for Milkyway. In the app_info file I set the Milkyway jobs to use 0.50 of a CPU, and there are two units running on the GPU. This has been running fine for a few weeks now. Problems occur the moment Milkyway runs dry (last week I had an outage of my internet connection). After the network was working again, MW started loading new workunits and started 6 jobs. After some time, two MW jobs were paused, and at that moment MW stopped working. The only recovery was a total restart.
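For reference, the "0.50 of a CPU, two units per GPU" arrangement above is typically expressed in the anonymous-platform app_info.xml roughly as follows. This is a hedged sketch of the standard BOINC structure only; the executable name is a placeholder, and the exact file names and version number must match the app files actually in your project directory:

```xml
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <!-- placeholder name: use the actual optimized app's executable -->
    <name>milkyway_ati_opt.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>19</version_num>
    <!-- each GPU task reserves half a CPU core for feeding the GPU -->
    <avg_ncpus>0.5</avg_ncpus>
    <max_ncpus>0.5</max_ncpus>
    <coproc>
      <type>ATI</type>
      <!-- 0.5 of a GPU per task means two tasks share one GPU -->
      <count>0.5</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_ati_opt.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```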
Joined: 29 Aug 07 | Posts: 81 | Credit: 60,360,858 | RAC: 0
Q9450, ATI 4870, BOINC CC 6.5.0, Cluster's .19f ATI opt. app, connect every 0.125 days, 0.25 days additional work buffer, no suspending on activity. Resource share is the same (100) as the other projects (S@H, PG, ABC) on that computer. Since the WUs started flowing again, I've never run out of work except for occasional server hiccups. Oh, and I did have to adjust the TDCF, which was way too high, and the FPOPS value in app_info.xml, which I can post later when I get home from work. [edit] Added RS and other projects info. [/edit] BR,
Joined: 22 Jun 09 | Posts: 52 | Credit: 74,110,876 | RAC: 0
I ran 3 days full out before it accrued enough negative debt to stop pulling work again. This is with manager 6.4.7, 200 resource share (everything else at 100), the only GPU app, with everything in MW at default except n1 is set (I saw no gain from letting it run concurrent WUs; I think the 512MB of RAM is a limiting factor, but I may revisit that).

Easily worked around (and now automated as a Windows scheduler task for me). Run it from your BOINC install directory, as running it from any other path will make it prompt you for your BOINC client auth information:

boinccmd --set_debts http://milkyway.cs.rpi.edu/milkyway 0 0

It can be run while the client is active. I was sitting at around -18000 debt and no longer downloading MW work (which = idle GPU). I reset debts for just MW (which does nudge other debts around, so don't change it too drastically), and within 60 seconds it had contacted MW for more work, and it has been running since. I'll see what this does long-term and go from there. Pushing ~60,000/day from a 512MB HD4870.
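The scheduled-task workaround above can be sketched as a small wrapper. This is an illustrative assumption, not a canonical script: the install path is the usual Windows default (adjust to yours), and the working directory matters because boinccmd looks for the GUI RPC auth file where it is run from, as the post notes:

```python
import subprocess
from pathlib import Path

# Assumption: default Windows install location; change to your BOINC directory.
BOINC_DIR = Path(r"C:\Program Files\BOINC")
PROJECT_URL = "http://milkyway.cs.rpi.edu/milkyway"

def set_debts_cmd(project_url, std_debt=0, ltd_debt=0):
    """Build the boinccmd invocation that resets a project's ST and LT debts."""
    return ["boinccmd", "--set_debts", project_url, str(std_debt), str(ltd_debt)]

def reset_mw_debts():
    # Run from the BOINC directory so boinccmd finds its RPC auth info
    # instead of prompting for it.
    subprocess.run(set_debts_cmd(PROJECT_URL), cwd=BOINC_DIR, check=True)

# To actually apply it (requires a running BOINC client):
# reset_mw_debts()
```

Scheduling this function hourly (or calling the one-liner directly from Task Scheduler with "Start in" set to the BOINC directory) reproduces the workaround.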
Joined: 4 Oct 08 | Posts: 1734 | Credit: 64,228,409 | RAC: 0
I presume you could use a script like this to reset the duration_correction_factor to 0.000000 on a regular basis, from the usual 100.000000? Go away, I was asleep
Joined: 22 Jun 09 | Posts: 52 | Credit: 74,110,876 | RAC: 0
I presume you could use a script like this to reset the duration_correction_factor to 0.000000 on a regular basis, from the usual 100.000000?

Not sure; I see that value in client_state but not in boinccmd.

I've run into some odd behavior again today. The MW app got stuck in the manager: it showed 3 running, but none were getting any GPU time. I had to restart the manager to sort that out. I bumped up to n2 and set my debt reset to run every hour, after watching it stop pulling work again today. I also set it to debt 100 100 instead of 0 0, just to see if it makes much difference in how long before it runs into a problem again.
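Since duration_correction_factor isn't exposed through boinccmd, the only place to reset it is the per-project entry in client_state.xml. A hedged sketch of such a script follows; the helper name is mine, the client's normal default for DCF is 1.0 (the 0/100 values above are the poster's), and editing client_state.xml is only safe while the BOINC client is stopped, or it will overwrite the change on its next state save. Note this naive version resets the factor for every project in the file, not just MW:

```python
import re

def reset_dcf(state_xml: str, new_value: float = 1.0) -> str:
    """Replace every <duration_correction_factor> value in client_state.xml text.

    Caution: this hits all projects' entries; filter by <master_url> first
    if you only want to touch one project.
    """
    return re.sub(
        r"<duration_correction_factor>[^<]*</duration_correction_factor>",
        f"<duration_correction_factor>{new_value:.6f}</duration_correction_factor>",
        state_xml,
    )

# Tiny illustrative fragment, not a full client_state.xml:
sample = "<project><duration_correction_factor>100.000000</duration_correction_factor></project>"
print(reset_dcf(sample))
# <project><duration_correction_factor>1.000000</duration_correction_factor></project>
```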
©2024 Astroinformatics Group