Message boards :
Number crunching :
Delay in getting new work units until all work units have cleared
Author | Message |
---|---|
Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 0 |
Got it sorted, thank you. You're right... the Primegrid WUs take longer than I thought. Einstein would be better. Kind regards, Chooka |
Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0 |
"We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing." Tom, sorry, but I think you are incorrect. The community has not found fixes to the problem (the delay in getting work). We use workarounds that process work from other projects while MW sits on its butt. I've seen nothing that shows this is a client issue, especially since it started after the MW server changes. I run many projects and only one project has this issue. I can duplicate the problem on all my systems: slow GPUs, fast GPUs, Nvidia GPUs, and AMD GPUs. I have logs showing the strange MW behavior and normal behavior from Einstein on the same system with identical settings. The MW logs show a strange "resource backoff" of 10 minutes that doesn't show up in the Einstein logs. MW really has two problems unique to MW: 1. Consistent failure to send new tasks when reporting a completed task. 2. A strange "resource backoff" that results in long delays in refilling the cache when all tasks are complete. Hence, we process other projects while waiting. I have a commented log (62K text file) with my settings and how to duplicate the issue. I can send it to you via private message if you are interested. I don't know what the limit is for text in a forum post; if a 62K post is allowed, I could paste it in a forum post for all to see and analyze. I don't use any online file shares, so I can't post a link; all my data is kept local. I might be reading the log incorrectly, but I, and maybe many others in the community, would like to see some interest from the project in resolving this issue. It's not just going to go away. Pretty please, Joe |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Those thinking the new BOINC software will fix the problems here at MW will have to wait another day or so: version 7.16.2 is out BUT has a major problem and will be replaced by version 7.16.3 ASAP!! Do NOT download version 7.16.2, as it will immediately cause every existing workunit on your computer to have a computation error. As soon as you click the Update button for each project, though, you will get new workunits, and they seem to work just fine. As I said, the developers know what the problem is and will fix it in the next release, which should be out today or tomorrow. These are STILL beta versions, though, so for most people I would suggest NOT trying them until some of the testers figure out whether they really help. |
Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0 |
OK … I finally found some information on the "resource backoff". This is normal client operation. If the client doesn't get work for a specific resource (in our case the GPU) when it requests work, it stops asking for a certain time interval. This is the resource backoff time. On all my systems MW NEVER sends new tasks when reporting completed tasks. This results in the client setting a resource backoff. The problem is more apparent with fast GPUs because they always have tasks to report at the 90 sec RPC backoff interval. Therefore, they are always in a resource backoff situation until they no longer have tasks to report. The resource backoff seems to start at a value between 100 and 400 secs, with an increment of 600 secs. If a computer takes several minutes to run a MW task, it takes hours to empty the cache, and a 5 to 15 minute gap is not very noticeable. On a system with very fast GPUs the cache can be emptied in 40 minutes or less; then a 5 to 15 minute gap is an eternity and is very frustrating. After reporting the last completed task(s) and failing to get new tasks, the client will not request new MW tasks until the resource backoff has counted down. This is the gap we fill with "0 resource share" projects until the client is allowed to request new MW tasks. A user update request clears the resource backoff. This is why a user update request, made after all tasks are complete and the RPC backoff time has counted down, refills the cache. If MW could figure out why the server NEVER sends new tasks when the client requests them while reporting completed task(s), I think our issues would be resolved. Joe |
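The backoff behaviour described above can be sketched as a toy model. This is not the BOINC client's actual work-fetch code: the 100 to 400 s first backoff and the 600 s increment are taken from the observations in the post, read here (as an assumption) as "each further failed request adds 600 s", and the class name is hypothetical.

```python
import random

# Toy model of the client-side "resource backoff" described above.
# ASSUMPTION: these numbers come from logs observed in this thread,
# not from the BOINC source code.
INITIAL_MIN_S = 100   # observed lower bound of the first backoff
INITIAL_MAX_S = 400   # observed upper bound of the first backoff
INCREMENT_S = 600     # assumed growth per further failed request


class ResourceBackoff:
    """Tracks when the client may next ask one project for GPU work."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.interval = 0.0   # current backoff length in seconds
        self.until = 0.0      # no GPU work request before this time

    def can_request(self, now):
        return now >= self.until

    def on_reply(self, now, got_tasks):
        if got_tasks:
            # In this model a successful request clears the backoff.
            self.interval = 0.0
            self.until = 0.0
        elif self.interval == 0.0:
            # First empty reply: random backoff in the observed range.
            self.interval = self.rng.uniform(INITIAL_MIN_S, INITIAL_MAX_S)
            self.until = now + self.interval
        else:
            # Further empty replies: backoff grows by a fixed increment.
            self.interval += INCREMENT_S
            self.until = now + self.interval

    def user_update(self):
        # A manual "Update" clears the backoff, as described in the post.
        self.interval = 0.0
        self.until = 0.0


backoff = ResourceBackoff(random.Random(1))
backoff.on_reply(now=0, got_tasks=False)   # report, but 0 new tasks
print(backoff.can_request(90))             # next 90 s RPC: still backed off
```

Even in this simplified model, a fast GPU that always has something to report keeps getting empty replies and so keeps the backoff armed; only a reply with work, or a manual update, resets it.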
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,105,537 RAC: 17,880 |
"Got it sorted, thank you." Interesting -- I just enabled the internal video on my i7-4790S (Haswell, 4th-gen CPU) and immediately got an Einstein "OpenCL" work unit. It took 11 minutes to execute. I currently have 3 pending validation. https://einsteinathome.org/task/885870568 I took the HDMI cable off the ATI board to use elsewhere and enabled the HD-4600 VGA, as I had one of those cables. Microsoft went off on its own and got a driver that had OpenCL. I forgot that Einstein has an OpenCL Intel beta app. |
Joined: 2 Oct 16 Posts: 167 Credit: 1,010,669,144 RAC: 38,501 |
"We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing." What fixes are those? MW work runs out, we wait a couple of minutes, and then the server finally gives us more work. The server should be giving us more work the entire time, not waiting until our MW queues are empty to provide more. |
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,105,537 RAC: 17,880 |
Same problem with 7.16.3: one does not get any data for up to 15 minutes after running out, then gets 900 or so all downloaded at once. I report a bunch of tasks, immediately ask for more, and get nothing:
[code]
9/30/2019 10:11:25 AM Starting BOINC client version 7.16.3 for windows_x86_64
107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.
108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks
109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU
110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks
[/code]
Curious: is it possible, in the app, to ask for more data before reporting or uploading? Can the app be built with VS2017 or later? |
Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0 |
"Same problem with 7.16.3: one does not get any data for up to 15 minutes after running out, then getting 900 or so all downloaded at once." From the research I've done, the delay is normal operation for the BOINC client. It is designed to keep the client from continually pestering a project that has no work. In our case, MW has work but fails to send it when the client requests it. This makes the client think the project has no work, so the client backs off its request times. The real problem with MW is exactly what you show in lines 107 to 110 of your log: the client asks for work and MW fails to send it. This is exactly what I see in my logs. Joe |
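To count how often this happens in your own event log, a short script can pair each "Reporting N completed tasks" line with the next "Scheduler request completed" line for the same project. This is a minimal sketch assuming the line layout in the log excerpt quoted in this thread (message number, project name without spaces, date, time with AM/PM, message text); the function name is hypothetical.

```python
import re

# ASSUMPTION: event log lines look like the excerpt in this thread:
# "<msg#> <project> <M/D/YYYY> <h:mm:ss> <AM|PM> <message text>"
# Project names containing spaces are not handled by this pattern.
LINE_RE = re.compile(
    r"^(?P<num>\d+)\s+(?P<project>\S+)\s+"
    r"(?P<date>\d+/\d+/\d+)\s+(?P<time>\d+:\d+:\d+\s+[AP]M)\s+"
    r"(?P<text>.*)$"
)
REPORT_RE = re.compile(r"Reporting (\d+) completed tasks?")
DONE_RE = re.compile(r"Scheduler request completed: got (\d+) new tasks?")


def empty_replies_after_report(lines):
    """Return (msg#, tasks_reported) for each report answered with 0 tasks."""
    pending = {}  # project -> (msg# of the report line, tasks reported)
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the client start-up line, which has no project
        project, text = m.group("project"), m.group("text")
        rep = REPORT_RE.search(text)
        if rep:
            pending[project] = (int(m.group("num")), int(rep.group(1)))
            continue
        done = DONE_RE.search(text)
        if done and project in pending:
            num, reported = pending.pop(project)
            if int(done.group(1)) == 0:
                hits.append((num, reported))
    return hits
```

Fed the five lines of the excerpt above, it flags message 108 (19 tasks reported, 0 received). Requests that report nothing are ignored, so a count from this script isolates the "report plus fetch, get nothing" case rather than ordinary empty fetches.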
Joined: 23 Feb 18 Posts: 26 Credit: 4,744,416,145 RAC: 0 |
JHMarshall, this was exactly what I was talking about. Want your kids to stay off drugs? Get them building crunching PCs and they'll never have enough money for drugs |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Apologies, I thought the problem had been resolved. I'm looking into the problem and will hopefully have a solution soon (or at least an explanation). - Tom |
Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0 |
"Apologies, I thought the problem had been resolved. I'm looking into the problem and will hopefully have a solution soon (or at least an explanation)." Tom, thank you. Joe |
Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 0 |
Yes, thank you, Tom. P.S. JStateson - I went back to Primegrid. It actually takes slightly less time than Einstein, and I'm happy with my E@H rank, but PG needs some work :D |
Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0 |
OK, so I'm not going crazy. I've been with SETI since 2001, but this Sunday they had another outage. Instead of letting my GPUs just sit there, I attached to MW@home. I have two R9 280s and can't believe how fast they crunch a WU here. Having to manually update to receive any WUs threw me for a loop. I'd been working on it for an hour, trying different settings, and then I found this thread. Anyway, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so well! Thanks for any info! |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"How do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more." What is your cache size? The smaller it is, the fewer workunits you will get. |
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,105,537 RAC: 17,880 |
"I have two R9 280's and can't believe how fast they crunch a wu here." Your card is virtually identical to my S9000 and is capable of running 4 or even 5 at a time. That should allow for a larger download.
[code]
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.19</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
      <cmdline>--verbose</cmdline>
    </gpu_versions>
  </app>
</app_config>
[/code]
I was never able to run more than one at a time when taking part in that SETI WOW event. However, the SETI app I was running used CUDA, while MW uses OpenCL. If there is a problem, change both numbers to .25. |
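As a rough guide to what the gpu_usage value does: BOINC treats it as the fraction of one GPU each task claims (for example, .5 lets two tasks share one GPU), so roughly as many tasks run per GPU as fit without the fractions adding up past 1. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
import math


def tasks_per_gpu(gpu_usage):
    """How many tasks fit on one GPU if each claims `gpu_usage` of it."""
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in (0, 1]")
    # Small epsilon so exact fractions like 0.25 (1/4) are not rounded
    # down by floating-point error.
    return math.floor(1 / gpu_usage + 1e-9)


for usage in (0.19, 0.25, 0.333, 0.5, 1.0):
    print(usage, tasks_per_gpu(usage))
```

With the .19 in the config above this gives 5 tasks per GPU, and the fallback .25 gives 4, which lines up with the "4 or even 5 at a time" estimate.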
Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0 |
Nice! I now see GPU 0 running 4-5 WUs at a time. What do I add to have both GPUs run multiples? I have two 280s; I think they are device 0 and device 1. Mikey, where would I find my cache size? In my computing preferences I have 5 days' worth of work and 5 GB of work. Thanks again for the help! Edit: Nope, it's just my WU limit. I see device 0 working on 5, but device 1 is only working on 2, so I have a setting somewhere limiting me to 7 WUs at a time. Hmm? |
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,105,537 RAC: 17,880 |
Your Celeron(R) CPU G3930 has only 2 processors, and you have a pair of R200-series cards. There is not enough CPU to go around. Change the .19 to .333 but leave the .20 for the CPU, and restart the client. That should give you 6 work units and leave part of a CPU to run the OS. [edit] Possible upgrade path to get 4 or 8 threads: http://www.cpu-upgrade.com/CPUs/Intel/Celeron_Dual-Core/G3930.html |
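The arithmetic behind that suggestion, as a sketch: two GPUs at .333 each run 3 tasks, and 6 tasks at .20 CPU each reserve 1.2 of the Celeron's 2 threads. This assumes a simplified budget in which the CPU set aside for GPU tasks is just tasks × cpu_usage; the real BOINC client scheduler applies more rules, and the function name is hypothetical.

```python
import math


def gpu_plan(num_gpus, gpu_usage, cpu_usage, cpu_threads):
    """Simplified budget: concurrent GPU tasks and CPU left for everything else.

    ASSUMPTION: CPU reserved for GPU tasks is just tasks * cpu_usage; the
    real BOINC client scheduler has more rules than this.
    """
    per_gpu = math.floor(1 / gpu_usage + 1e-9)  # tasks that fit on one GPU
    tasks = num_gpus * per_gpu
    reserved = tasks * cpu_usage
    return tasks, reserved, cpu_threads - reserved


tasks, reserved, left = gpu_plan(num_gpus=2, gpu_usage=0.333,
                                 cpu_usage=0.20, cpu_threads=2)
print(f"{tasks} tasks, {reserved:.2f} CPU reserved, {left:.2f} CPU left")
```

This prints `6 tasks, 1.20 CPU reserved, 0.80 CPU left`. Under the same simplified budget the original .19 would mean 10 tasks reserving all 2.00 threads, which is the "not enough CPU to go around" situation.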
Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0 |
OK, sounds good. I'll give it a try. Yeah, I could tell WCG would pause one WU when MW@home was running; it's probably not helping that I'm running that too. I bought the CPU for an old mining rig back in the day. Thanks for the help! Edit: OK, that setting split the work across the GPUs. I waited for everything to finish, waited for the 1:30 countdown to finish, and hit Update. It downloaded 7 WUs: device 0 had 3 and device 1 had 3, with 1 left over. So it's working fine; it's just only downloading 7 WUs at a time. |
Joined: 6 Oct 19 Posts: 4 Credit: 34,040,968 RAC: 0 |
Sorry, I wish I could edit. I don't know what I did, but now I have at least 50 waiting. Maybe it was in the BOINC preferences. I bumped it back to 0.19 and it's running fine. Thanks for all the help! |
Joined: 5 Jul 11 Posts: 991 Credit: 376,804,361 RAC: 2,402 |
Surely there is some way to set our BOINC clients to not report completed tasks so often. I don't have to send back 2 or 3 work units every 2 minutes when I have 6 hours to compute them. If I could get BOINC to report completed tasks only every half hour, it could request work in between (without simultaneous task reporting) and work around the problem of not being able to do both at once.
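The policy proposed above could look like the sketch below. It is purely hypothetical: it is not an existing BOINC option, the 30-minute interval comes from the post, and the class, field, and function names are made up for illustration. The idea is that each scheduler request does only one job, either fetching or reporting, never both; whether the MW server would then send work is exactly the open question in this thread.

```python
from dataclasses import dataclass

REPORT_INTERVAL_S = 30 * 60  # "every half hour", from the post above


@dataclass
class ClientState:
    now: float          # current time in seconds
    last_report: float  # time of the last report-only request
    completed: int      # finished tasks waiting to be reported
    queued: int         # tasks still waiting to run
    low_water: int      # fetch more work when queued drops below this


def next_request(s: ClientState) -> str:
    """Pick at most one action per RPC: 'fetch', 'report', or 'idle'."""
    if s.queued < s.low_water:
        # Keep the GPUs fed first; ask for work without reporting anything.
        return "fetch"
    if s.completed and s.now - s.last_report >= REPORT_INTERVAL_S:
        # Report the batch of finished tasks, without asking for work.
        return "report"
    return "idle"
```

Fetching is checked first on the assumption that an empty GPU costs more than a finished task waiting a little longer to be reported.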
©2025 Astroinformatics Group