Delay in getting new work units until all work units have cleared

Author	Message
Chooka Send message Joined: 13 Dec 12 Posts: 101 Credit: 1,782,952,901 RAC: 0	Message 69135 - Posted: 28 Sep 2019, 5:19:31 UTC - in response to Message 69126. Got it sorted, thank you. You're right...the Primegrid WU's take longer than I thought. Einstein would be better. Kind regards Chooka ID: 69135 · Rating: 0 · rate: / Reply Quote

JHMarshall Send message Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0	Message 69136 - Posted: 28 Sep 2019, 16:49:55 UTC - in response to Message 69131.
[quote]We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing. Jake said that the problem appeared to be some obscure BOINC setting somewhere, and had asked BOINC forums about it. It looks like this issue disappears in the new beta of a BOINC client, so they must have patched whatever was causing problems. When that is released, hopefully the problem will be resolved. - Tom[/quote]
Tom, Sorry but I think you are incorrect. The community has not found fixes to the problem (delay in getting work). We use workarounds that process work from other projects while MW sits on its butt. I've seen nothing that shows this is a client issue, especially since it started after MW server changes. I run many projects and only one project has this issue. I can duplicate the problem on all my systems: slow GPUs, fast GPUs, Nvidia GPUs, and AMD GPUs. I have logs showing the strange MW behavior and normal behavior from Einstein on the same system with identical settings. MW logs show a strange "resource backoff" of 10 minutes. That backoff doesn't show up in Einstein logs. MW really has two problems unique to MW:
1. Consistent failure to send new tasks when reporting a completed task.
2. Strange "resource backoff" that results in long delays in refilling the cache when all tasks are complete.
Hence, we process other projects while waiting. I have a commented log (62K text file) with my settings and how to duplicate the issue. I can send it to you via private message if you are interested. I don't know what the limit is for text in a forum post. I could paste it in a forum post for all to see and analyze if a 62k post is allowed. I don't use any internet shares, so I can't post a link. All my data is kept local. I might be looking at the log incorrectly.
But I, and maybe many others in the community, would like to see some interest from the project in resolving this issue. It's not just going to go away. Pretty please, Joe
ID: 69136 · Rating: 0 · rate: / Reply Quote

mikey Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0	Message 69137 - Posted: 29 Sep 2019, 13:24:10 UTC Last modified: 29 Sep 2019, 13:25:25 UTC Those thinking the new BOINC software will fix the problems here at MW will have to wait another day or so. Version 7.16.2 is out BUT has a major problem and will be replaced by version 7.16.3 asap!! Do NOT download version 7.16.2, as it will immediately cause every existing workunit on your computer to have a computation error. As soon as you click on the update button for each project, though, you will get new workunits, and they seem to work just fine. As I said, the developers know what the problem is and will fix it in the next release, which should be out today or tomorrow. These are STILL beta versions though, so for most people I would suggest NOT trying it until some of the testers figure out if it really helps or not. ID: 69137 · Rating: 0 · rate: / Reply Quote

JHMarshall Send message Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0	Message 69138 - Posted: 29 Sep 2019, 18:19:10 UTC OK … I finally found some information on the "resource backoff". This is normal client operation. If the client doesn't get work for a specific resource (in our case the GPU) when it requests work, it stops asking for a certain time interval. This is the resource backoff time. On all my systems MW NEVER sends new tasks when reporting completed tasks. This results in the client setting a resource backoff. The problem is more apparent with fast GPUs because they always have tasks to report at the 90 sec RPC backoff interval. Therefore, they are always in a resource backoff situation until they no longer have tasks to report. The resource backoff seems to start at a value between 100 and 400 secs, with an increment of 600 secs. If a computer takes several minutes to run a MW task, it takes hours to empty the cache and a 5 to 15 minute gap is not very noticeable. On a system with very fast GPUs the cache can be emptied in 40 minutes or less. Then a 5 to 15 minute gap is an eternity and is very frustrating. After reporting the last completed task(s) and failing to get new tasks, the client will not request new MW tasks until the resource backoff has counted down. This is the gap we fill with "0 resource share" projects until the client is allowed to request new MW tasks. A user update request clears the resource backoff. This is why a user update request, made after all tasks are complete and the RPC backoff time has counted down, refills the cache. If MW could figure out why the server NEVER sends new tasks when the client requests them while reporting completed task(s), I think our issues would be resolved. Joe ID: 69138 · Rating: 0 · rate: / Reply Quote
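To put rough numbers on why the same gap hurts fast GPUs far more than slow ones, here is a small back-of-the-envelope sketch. It is only arithmetic on the figures in the post above, not a model of the BOINC client: the 40 minute drain time and the 5 to 15 minute gap come from the post, while the 6 hour drain time for a slow machine and the function name `idle_fraction` are assumptions made for illustration.

```python
def idle_fraction(drain_minutes: float, gap_minutes: float) -> float:
    """Share of each refill cycle that the GPU spends with no MW work.

    One cycle = time to crunch through the cache (drain) plus the wait
    before the client is allowed to ask for work again (gap).
    """
    return gap_minutes / (drain_minutes + gap_minutes)


# Drain times: 40 min is from the post; 360 min (6 h) is an assumed
# stand-in for "it takes hours to empty the cache".
machines = (("slow GPU, cache lasts 6 h", 360), ("fast GPU, cache lasts 40 min", 40))

for label, drain in machines:
    for gap in (5, 15):  # gap range reported in the post
        share = idle_fraction(drain, gap)
        print(f"{label}: {gap} min gap -> {share:.1%} of the time idle")
```

Under these assumed numbers a 15 minute gap costs the slow machine about 4% of its time but the fast machine about 27%, which is why the problem is mostly reported by people with fast cards.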

Joseph Stateson Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 1	Message 69140 - Posted: 30 Sep 2019, 2:04:52 UTC - in response to Message 69135. Last modified: 30 Sep 2019, 2:05:58 UTC [quote]Got it sorted, thank you. You're right...the Primegrid WU's take longer than I thought. Einstein would be better. Kind regards Chooka[/quote] Interesting -- I just enabled the internal video on my i7-4790s (Haswell 4th gen CPU) and immediately got an Einstein "OpenCL" work unit. It took 11 minutes to execute. I currently have 3 pending validation. https://einsteinathome.org/task/885870568 I took the HDMI cable off the ATI board to use elsewhere and enabled the HD-4600 VGA as I had one of those cables. Microsoft went off on its own and got a driver that had OpenCL. I forgot that Einstein has an OpenCL Intel beta app. ID: 69140 · Rating: 0 · rate: / Reply Quote

mmonnin Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,012,468,714 RAC: 21,423	Message 69143 - Posted: 30 Sep 2019, 13:49:49 UTC - in response to Message 69131. [quote]We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing. Jake said that the problem appeared to be some obscure BOINC setting somewhere, and had asked BOINC forums about it. It looks like this issue disappears in the new beta of a BOINC client, so they must have patched whatever was causing problems. When that is released, hopefully the problem will be resolved. - Tom[/quote] What fixes are those? MW work runs out, the client waits a couple of minutes, and then the server finally gives us more work. The server should be giving us more work the entire time, not waiting until our MW queues are empty to provide more. ID: 69143 · Rating: 0 · rate: / Reply Quote

Joseph Stateson Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 1	Message 69144 - Posted: 30 Sep 2019, 20:10:41 UTC Last modified: 30 Sep 2019, 20:29:39 UTC
Same problem with 7.16.3: one does not get any data for up to 15 minutes after running out, and then 900 or so are all downloaded at once. The client reports a bunch of tasks, immediately asks for more, and gets nothing:
[code]
9/30/2019 10:11:25 AM Starting BOINC client version 7.16.3 for windows_x86_64
107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.
108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks
109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU
110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks
[/code]
Curious: Is it possible, in the app, to ask for more data before reporting or uploading? Can the app be built with VS2017 or later?
ID: 69144 · Rating: 0 · rate: / Reply Quote

JHMarshall Send message Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0	Message 69145 - Posted: 30 Sep 2019, 23:24:19 UTC - in response to Message 69144.
[quote]Same problem with 7.16.3: one does not get any data for up to 15 minutes after running out, and then 900 or so are all downloaded at once. The client reports a bunch of tasks, immediately asks for more, and gets nothing:
[code]
9/30/2019 10:11:25 AM Starting BOINC client version 7.16.3 for windows_x86_64
107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.
108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks
109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU
110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks
[/code]
Curious: Is it possible, in the app, to ask for more data before reporting or uploading? Can the app be built with VS2017 or later?[/quote]
From the research I've done, the delay is normal operation for the BOINC client. The delay is designed to keep the client from continually pestering a project when it has no work. In our case, MW has work but fails to send it when the client requests it. This makes the client think the project has no work, so the client backs off its requests. The real problem with MW is exactly what you show in lines 107 to 110 in your log. The client asks for work and MW fails to send it. This is exactly what I see in my logs. Joe
ID: 69145 · Rating: 0 · rate: / Reply Quote
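For anyone who wants to check their own event log for the pattern in lines 107 to 110, a short script along these lines might help. It is a sketch only: it assumes the log has been saved as plain text in the same format as the excerpt above and contains lines from a single project, and the function name `starved_requests` is made up for this example.

```python
import re

# Sample taken from the log excerpt quoted above.
LOG = """\
107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.
108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks
109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU
110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks
"""

REPORT = re.compile(r"Reporting (\d+) completed tasks")
REQUEST = re.compile(r"Requesting new tasks for (.+)")
RESULT = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def starved_requests(log_text):
    """Yield (tasks_reported, resource) for every scheduler request that
    reported tasks, asked for new work, and received none."""
    reported = resource = None
    for line in log_text.splitlines():
        if "Sending scheduler request" in line:
            reported = resource = None  # a new exchange starts here
        elif m := REPORT.search(line):
            reported = int(m.group(1))
        elif m := REQUEST.search(line):
            resource = m.group(1).strip()
        elif m := RESULT.search(line):
            if reported and resource and int(m.group(1)) == 0:
                yield reported, resource
            reported = resource = None


for count, res in starved_requests(LOG):
    print(f"reported {count} tasks, requested work for {res}, got nothing")
```

A log from a project that behaves normally, such as the Einstein comparison mentioned earlier in the thread, should produce few or no hits from the same script.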

gambatesa Send message Joined: 23 Feb 18 Posts: 26 Credit: 4,744,416,145 RAC: 0	Message 69146 - Posted: 1 Oct 2019, 17:11:37 UTC - in response to Message 69145. JHMarshall, this was exactly what I was talking about. Want your Kids stay off from Drugs? Get them building Crunching PC's and they'll never have enough money for drugs ID: 69146 · Rating: 0 · rate: / Reply Quote

Tom Donlon Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0	Message 69147 - Posted: 1 Oct 2019, 20:18:45 UTC Apologies, I thought the problem had been resolved. I'm looking into the problem and will hopefully have a solution soon (or at least an explanation). - Tom ID: 69147 · Rating: 0 · rate: / Reply Quote

JHMarshall Send message Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0	Message 69148 - Posted: 2 Oct 2019, 4:17:19 UTC - in response to Message 69147. [quote]Apologies, I thought the problem had been resolved. I'm looking into the problem and will hopefully have a solution soon (or at least an explanation). - Tom[/quote] Tom, Thank you, Joe ID: 69148 · Rating: 0 · rate: / Reply Quote

Chooka Send message Joined: 13 Dec 12 Posts: 101 Credit: 1,782,952,901 RAC: 0	Message 69149 - Posted: 2 Oct 2019, 8:18:54 UTC Yes, thank you Tom. p.s JStateson - I went back to Primegrid. It actually takes slightly less time than Einstein and I'm happy with my E@H rank but PG needs some work :D ID: 69149 · Rating: 0 · rate: / Reply Quote

JAWS Send message Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0	Message 69164 - Posted: 8 Oct 2019, 4:14:53 UTC Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info! ID: 69164 · Rating: 0 · rate: / Reply Quote

mikey Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0	Message 69165 - Posted: 8 Oct 2019, 11:36:41 UTC - in response to Message 69164. [quote]Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info![/quote] What is your cache size? The smaller it is the fewer workunits you will get. ID: 69165 · Rating: 0 · rate: / Reply Quote

Joseph Stateson Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 1	Message 69166 - Posted: 8 Oct 2019, 14:59:14 UTC - in response to Message 69164. Last modified: 8 Oct 2019, 15:04:24 UTC
[quote]Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info![/quote]
Your card is virtually identical to my s9000 and is capable of running 4 or even 5 at a time. That should allow for a larger download.
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.19</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
      <cmdline>--verbose</cmdline>
    </gpu_versions>
  </app>
</app_config>
I was never able to run more than one at a time when taking part in that seti WOW event. However, the seti app I was running used CUDA while MW uses OpenCL. If that causes a problem, change both numbers to .25
ID: 69166 · Rating: 0 · rate: / Reply Quote

JAWS Send message Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0	Message 69167 - Posted: 8 Oct 2019, 19:19:38 UTC - in response to Message 69166. Last modified: 8 Oct 2019, 19:22:41 UTC
[quote][quote]Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info![/quote]
Your card is virtually identical to my s9000 and is capable of running 4 or even 5 at a time. That should allow for a larger download.
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.19</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
      <cmdline>--verbose</cmdline>
    </gpu_versions>
  </app>
</app_config>
I was never able to run more than one at a time when taking part in that seti WOW event. However, the seti app I was running used CUDA while MW uses OpenCL. If that causes a problem, change both numbers to .25[/quote]
Nice! I now see gpu0 running 4-5 wu's at a time. What do I add to have both gpu's run multiples? I have 2 280's. I think device 0 and 1. Mikey, where would I find my cache size? In my computing preferences I have 5 days' worth of work and 5 GB of work. Thanks again for the help! edit: Nope, it's just my wu limit. I see device 0 working on 5. Device 1 is only working on 2. So I have a setting somewhere limiting me to only 7 wu's at a time. hmmm?
ID: 69167 · Rating: 0 · rate: / Reply Quote

Joseph Stateson Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 1	Message 69168 - Posted: 8 Oct 2019, 22:26:58 UTC - in response to Message 69167. Last modified: 8 Oct 2019, 22:33:23 UTC [quote]<gpu_usage>0.19</gpu_usage> <cpu_usage>0.20</cpu_usage> I see device 0 working on 5. Device 1 is only working on 2. so I have a setting somewhere only limiting me to 7 wu's at a time. hmmm?[/quote] Your Celeron(R) CPU G3930 has only 2 processors, and you have a pair of r200. Not enough CPU to go around. Change the .19 to .333 but leave the .20 for the CPU and restart the client. That should give you 6 work units and leave part of a CPU to run the OS. [edit] Possible upgrade path to get 4 or 8 threads: http://www.cpu-upgrade.com/CPUs/Intel/Celeron_Dual-Core/G3930.html ID: 69168 · Rating: 0 · rate: / Reply Quote
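The arithmetic behind the .19 versus .333 suggestion can be sketched as follows. This is a simplified reading of what `gpu_usage` and `cpu_usage` mean (each task claims that fraction of one GPU and of one CPU), not the client's real scheduling logic; the function name `gpu_plan` is hypothetical, and the two-GPU, two-processor figures come from the posts above.

```python
def gpu_plan(gpu_usage: float, cpu_usage: float, n_gpus: int):
    """Return (tasks per GPU, total tasks, CPUs reserved) for one app.

    Assumes each task claims `gpu_usage` of one GPU and `cpu_usage`
    of one CPU, so a GPU holds as many tasks as fit into 1.0.
    """
    per_gpu = int(1 / gpu_usage)      # whole tasks that fit on one GPU
    total = per_gpu * n_gpus
    cpus_reserved = total * cpu_usage
    return per_gpu, total, cpus_reserved


N_GPUS = 2   # two R9 280 cards, per the posts above
N_CPUS = 2   # the G3930 has 2 processors, per the post above

for gpu_usage in (0.19, 0.333):
    per_gpu, total, cpus = gpu_plan(gpu_usage, 0.20, N_GPUS)
    print(f"gpu_usage={gpu_usage}: {per_gpu} per GPU, {total} total, "
          f"{cpus:.1f} of {N_CPUS} CPUs reserved")
```

With 0.19 the two cards could in principle hold 10 tasks, which would claim both processors entirely; with 0.333 they hold 6 tasks and claim about 1.2 processors, leaving some CPU for the OS. The sketch does not reproduce the 5 + 2 split actually observed, which presumably also depended on the other CPU work running on that machine.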

JAWS Send message Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0	Message 69169 - Posted: 9 Oct 2019, 1:32:51 UTC Last modified: 9 Oct 2019, 1:39:44 UTC Ok sounds good. I'll give it a try. Yeah, I could tell the WCG would pause one wu when MW@home was running. It's probably not helping that I'm running that too. I bought the cpu for an old mining rig back in the day. Thanks for the help! edit: Ok, that setting split the work across the gpu's. I waited for everything to finish, waited for the 1:30 countdown to finish, and hit update. It downloaded 7 wu: device 0 had 3 and device 1 had 3, with 1 left over. So it's working fine, just only downloading 7 wu's at a time. ID: 69169 · Rating: 0 · rate: / Reply Quote

JAWS Send message Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0	Message 69170 - Posted: 9 Oct 2019, 4:14:37 UTC Sorry, wish I could edit. I don't know what I did but now I have at least 50 waiting. Maybe it was in the Boinc preferences. I bumped it back to 0.19 and it's running fine. thanks for all the help! ID: 69170 · Rating: 0 · rate: / Reply Quote

Mr P Hucker Send message Joined: 5 Jul 11 Posts: 993 Credit: 377,180,214 RAC: 510	Message 69193 - Posted: 30 Oct 2019, 14:07:56 UTC Last modified: 30 Oct 2019, 14:08:35 UTC Surely there is some way to set our BOINC clients to not report completed tasks so often. I don't have to send back 2 or 3 work units every 2 minutes when I have 6 hours to compute. If I could get BOINC to only report completed tasks every half hour, then it could request work in between (without simultaneous task reporting) and work around the problem of not being able to do both at once. ID: 69193 · Rating: 0 · rate: / Reply Quote