Welcome to MilkyWay@home

Delay in getting new work units until all work units have cleared

Chooka
Joined: 13 Dec 12
Posts: 101
Credit: 1,782,658,327
RAC: 0
Message 69135 - Posted: 28 Sep 2019, 5:19:31 UTC - in response to Message 69126.  

Got it sorted, thank you.
You're right...the Primegrid WU's take longer than I thought. Einstein would be better.

Kind regards
Chooka

ID: 69135
JHMarshall
Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 69136 - Posted: 28 Sep 2019, 16:49:55 UTC - in response to Message 69131.  

We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing.

Jake said that the problem appeared to be some obscure BOINC setting somewhere, and had asked BOINC forums about it. It looks like this issue disappears in the new beta of a BOINC client, so they must have patched whatever was causing problems. When that is released, hopefully the problem will be resolved.

- Tom



Tom,

Sorry but I think you are incorrect. The community has not found fixes to the problem (delay in getting work). We use workarounds that process work from other projects
while MW sits on its butt. I've seen nothing that shows this is a client issue especially since it started after MW server changes. I run many projects and only one project
has this issue.

I can duplicate the problem on all my systems: slow GPUs, fast GPUs, Nvidia GPUs, and AMD GPUs. I have logs showing the strange MW behavior
and normal behavior from Einstein on the same system with identical settings. MW logs show a strange "resource backoff" of 10 minutes.
That backoff doesn't show up in Einstein logs.

MW really has two problems unique to MW:
1. Consistent failure to send new tasks when reporting a completed task.
2. Strange "resource backoff" that results in long delays in refilling the cache when all tasks are complete. Hence, we process other projects while waiting.

I have a commented log (62K text file) with my settings and how to duplicate the issue. I can send it to you via private message if you are interested. I don't
know what the limit is for text in a forum post. I could paste it in a forum post for all to see and analyze if a 62k post is allowed. I don't use any internet shares,
so I can't post a link. All my data is kept local.

I might be looking at the log incorrectly. But, I and maybe many others in the community would like to see some interest from the project in resolving this issue.
It's not just going to go away.

Pretty please,

Joe
ID: 69136
mikey
Joined: 8 May 09
Posts: 3328
Credit: 522,584,620
RAC: 82,357
Message 69137 - Posted: 29 Sep 2019, 13:24:10 UTC
Last modified: 29 Sep 2019, 13:25:25 UTC

Those hoping the new BOINC software will fix the problems here at MW will have to wait another day or so. Version 7.16.2 is out BUT has a major problem and will be replaced by version 7.16.3 asap!! Do NOT download version 7.16.2, as it will immediately cause every existing workunit on your computer to have a computation error. As soon as you click on the update button for each project, though, you will get new workunits, and they seem to work just fine. As I said, the developers know what the problem is and will fix it in the next release, which should be out today or tomorrow. These are STILL beta versions though, so for most people I would suggest NOT trying it until some of the testers figure out whether it really helps.
ID: 69137
JHMarshall
Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 69138 - Posted: 29 Sep 2019, 18:19:10 UTC

OK … I finally found some information on the "resource backoff". This is normal client operation. If the client doesn't get work for a specific
resource (in our case the GPU) when it requests work, it stops asking for a certain time interval. This is the resource backoff time.

On all my systems MW NEVER sends new tasks when reporting completed tasks. This results in the client setting a resource backoff.

The problem is more apparent with fast GPUs because they always have tasks to report at the 90 sec RPC backoff interval. Therefore, they are
always in a resource backoff situation until they no longer have tasks to report. The resource backoff seems to start at a value between 100 and 400 secs,
with an increment of 600 secs. If a computer takes several minutes to run a MW task, it takes hours to empty the cache and a 5 to 15 minute gap is not very
noticeable. On a system with very fast GPUs the cache can be emptied in 40 minutes or less. Then a 5 to 15 minute gap is an eternity and is very frustrating.

After reporting the last completed task(s) and failing to get new tasks, the client will not request new MW tasks until the resource backoff has counted down.
This is the gap we fill with "0 resource share" projects until the client is allowed to request new MW tasks.

A user update request clears the resource backoff. This is why a user update request after all tasks are complete and the RPC backoff time has
counted down refills the cache.

If MW could figure out why the server NEVER sends new tasks when the client requests new tasks when reporting a completed task(s), I think our issues would be resolved.
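
To put rough numbers on the gap described above, here is a minimal Python sketch. It is not BOINC client code: the function names are made up for illustration, and it simply assumes a 90 second RPC backoff, a 600 second resource backoff, and that a user update clears only the resource backoff, as observed in this thread. The 6 hour cache for a slower host is also an assumed figure.

```python
def next_work_request(last_rpc_s, rpc_backoff_s=90, resource_backoff_s=600,
                      user_update=False):
    """Earliest time (seconds) at which the client may ask for GPU work again.

    Assumption: the last scheduler request reported tasks and got 0 new tasks,
    so both the RPC backoff and the resource backoff start at last_rpc_s.
    """
    earliest = last_rpc_s + rpc_backoff_s
    if not user_update:
        # Without a manual update the resource backoff must also count down.
        earliest = max(earliest, last_rpc_s + resource_backoff_s)
    return earliest


def idle_fraction(cache_runtime_s, gap_s):
    """Share of wall-clock time the GPU has no MW work in one cache cycle."""
    return gap_s / (cache_runtime_s + gap_s)


if __name__ == "__main__":
    gap = next_work_request(0)                       # automatic behavior
    manual = next_work_request(0, user_update=True)  # user clicks update
    print("automatic wait (s):", gap)
    print("wait after manual update (s):", manual)
    # Fast GPU: cache emptied in 40 minutes. Slow host: assumed 6 hours.
    print("fast GPU idle share: %.1f%%" % (100 * idle_fraction(40 * 60, gap)))
    print("slow host idle share: %.1f%%" % (100 * idle_fraction(6 * 3600, gap)))
```

The same 10 minute gap costs a fast GPU a much larger share of its time than a slow host, which is why the problem is so much more noticeable on fast cards.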

Joe
ID: 69138
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69140 - Posted: 30 Sep 2019, 2:04:52 UTC - in response to Message 69135.  
Last modified: 30 Sep 2019, 2:05:58 UTC

Got it sorted, thank you.
You're right...the Primegrid WU's take longer than I thought. Einstein would be better.

Kind regards
Chooka


Interesting -- I just enabled the internal video on my i7-4790s (Haswell 4th gen CPU) and immediately got an Einstein OpenCL work unit. It took 11 minutes to execute. I currently have 3 pending validation.
https://einsteinathome.org/task/885870568

I took the HDMI cable off the ATI board to use elsewhere and enabled the HD-4600 VGA as I had one of those cables.

Microsoft went off on its own and got a driver that had OpenCL. I forgot that Einstein has an OpenCL Intel beta app.
ID: 69140
mmonnin
Joined: 2 Oct 16
Posts: 167
Credit: 1,006,572,984
RAC: 5,593
Message 69143 - Posted: 30 Sep 2019, 13:49:49 UTC - in response to Message 69131.  

We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing.

Jake said that the problem appeared to be some obscure BOINC setting somewhere, and had asked BOINC forums about it. It looks like this issue disappears in the new beta of a BOINC client, so they must have patched whatever was causing problems. When that is released, hopefully the problem will be resolved.

- Tom


What fixes are those? MW work runs out, waits a couple of minutes, then the server finally gives us more work. The server should be giving us more work the entire time, not waiting until our MW queues are empty to provide more work.
ID: 69143
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69144 - Posted: 30 Sep 2019, 20:10:41 UTC
Last modified: 30 Sep 2019, 20:29:39 UTC

Same problem with 7.16.3: no work arrives for up to 15 minutes after running out, and then 900 or so tasks download all at once.

I report a bunch of tasks, immediately ask for more, and get nothing:

[code]
9/30/2019 10:11:25 AM Starting BOINC client version 7.16.3 for windows_x86_64

107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.
108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks
109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU
110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks
[/code]

Curious: Is it possible, in the app, to ask for more data before reporting or uploading? Can the app be built with VS2017 or later?
ID: 69144
JHMarshall
Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 69145 - Posted: 30 Sep 2019, 23:24:19 UTC - in response to Message 69144.  

Same problem with 7.16.3: no work arrives for up to 15 minutes after running out, and then 900 or so tasks download all at once.

I report a bunch of tasks, immediately ask for more, and get nothing:

[code]
9/30/2019 10:11:25 AM Starting BOINC client version 7.16.3 for windows_x86_64

107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.
108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks
109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU
110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks
[/code]

Curious: Is it possible, in the app, to ask for more data before reporting or uploading? Can the app be built with VS2017 or later?


From the research I've done, the delay is normal operation for the BOINC client. The delay is designed to keep the client from continually
pestering a project when it has no work.

In our case, MW has work but fails to send it when the client requests it. This makes the client think the project has no work and the client
backs off request times.

The real problem with MW is exactly what you show in lines 107 to 110 in your log. The client asks for work and MW fails to send it.

This is exactly what I see in my logs.
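
For anyone who wants to check their own event log for the same pattern, a rough Python sketch follows. It is only an illustration: the function name is hypothetical, and the regular expressions assume the message wording shown in the log excerpt quoted above, which may differ between client versions.

```python
import re

# Patterns taken from the wording of the log excerpt quoted in this thread.
REPORT = re.compile(r"Reporting (\d+) completed tasks")
REQUEST = re.compile(r"Requesting new tasks for (.+)")
REPLY = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def find_empty_replies(lines):
    """Return (tasks_reported, resource) for every scheduler request that
    reported completed tasks, asked for new work, and received 0 tasks."""
    hits = []
    reported, resource = 0, None
    for line in lines:
        if "Sending scheduler request" in line:
            reported, resource = 0, None  # a new request starts here
            continue
        m = REPORT.search(line)
        if m:
            reported = int(m.group(1))
            continue
        m = REQUEST.search(line)
        if m:
            resource = m.group(1).strip()
            continue
        m = REPLY.search(line)
        if m:
            if int(m.group(1)) == 0 and reported > 0 and resource:
                hits.append((reported, resource))
            reported, resource = 0, None
    return hits


sample = [
    "107 Milkyway@Home 9/30/2019 10:15:44 AM Sending scheduler request: To fetch work.",
    "108 Milkyway@Home 9/30/2019 10:15:44 AM Reporting 19 completed tasks",
    "109 Milkyway@Home 9/30/2019 10:15:44 AM Requesting new tasks for AMD/ATI GPU",
    "110 Milkyway@Home 9/30/2019 10:15:47 AM Scheduler request completed: got 0 new tasks",
]

if __name__ == "__main__":
    for reported, resource in find_empty_replies(sample):
        print(f"reported {reported} tasks, asked for {resource} work, got nothing")
```

Running the same function over a saved log from Einstein on the same machine would be one way to compare the two projects side by side.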

Joe
ID: 69145
gambatesa
Joined: 23 Feb 18
Posts: 26
Credit: 4,744,416,145
RAC: 0
Message 69146 - Posted: 1 Oct 2019, 17:11:37 UTC - in response to Message 69145.  

JHMarshall, this was exactly what I was talking about.
Want your Kids stay off from Drugs? Get them building Crunching PC's and they'll never have enough money for drugs
ID: 69146
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 69147 - Posted: 1 Oct 2019, 20:18:45 UTC

Apologies, I thought the problem had been resolved. I'm looking into the problem and will hopefully have a solution soon (or at least an explanation).

- Tom
ID: 69147
JHMarshall
Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 69148 - Posted: 2 Oct 2019, 4:17:19 UTC - in response to Message 69147.  

Apologies, I thought the problem had been resolved. I'm looking into the problem and will hopefully have a solution soon (or at least an explanation).

- Tom

Tom,

Thank you,

Joe
ID: 69148
Chooka
Joined: 13 Dec 12
Posts: 101
Credit: 1,782,658,327
RAC: 0
Message 69149 - Posted: 2 Oct 2019, 8:18:54 UTC

Yes, thank you Tom.

p.s JStateson - I went back to Primegrid. It actually takes slightly less time than Einstein and I'm happy with my E@H rank but PG needs some work :D

ID: 69149
JAWS
Joined: 6 Oct 19
Posts: 4
Credit: 34,040,968
RAC: 0
Message 69164 - Posted: 8 Oct 2019, 4:14:53 UTC

Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info!
ID: 69164
mikey
Joined: 8 May 09
Posts: 3328
Credit: 522,584,620
RAC: 82,357
Message 69165 - Posted: 8 Oct 2019, 11:36:41 UTC - in response to Message 69164.  

Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info!


What is your cache size? The smaller it is the fewer workunits you will get.
ID: 69165
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69166 - Posted: 8 Oct 2019, 14:59:14 UTC - in response to Message 69164.  
Last modified: 8 Oct 2019, 15:04:24 UTC

Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info!


Your card is virtually identical to my s9000 and is capable of running 4 or even 5 at a time. That should allow for a larger download.
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.19</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
      <cmdline>--verbose</cmdline>
    </gpu_versions>
  </app>
</app_config>



I was never able to run more than one at a time when taking part in that seti WOW event. However, the seti app I was running used CUDA while MW uses OpenCL. If there is a problem, change both numbers to .25
ID: 69166
JAWS
Joined: 6 Oct 19
Posts: 4
Credit: 34,040,968
RAC: 0
Message 69167 - Posted: 8 Oct 2019, 19:19:38 UTC - in response to Message 69166.  
Last modified: 8 Oct 2019, 19:22:41 UTC

Ok, so I'm not going crazy. I've been with seti since 2001, but this Sunday they had another outage. Instead of my gpu's just sitting there I attached to MW@home. I have two R9 280's and can't believe how fast they crunch a wu here. Having to manually update to receive any wu's threw me for a loop. I've been working on it for an hour trying different settings and then I find this thread. anyways, how do you guys get so many downloaded? I get 9 at a time, then I manually update to get 9 more. Is it just because I'm new? Just nice to see my older cards working so good! thanks for any info!


Your card is virtually identical to my s9000 and is capable of running 4 or even 5 at a time. That should allow for a larger download.
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.19</gpu_usage>
      <cpu_usage>0.20</cpu_usage>
      <cmdline>--verbose</cmdline>
    </gpu_versions>
  </app>
</app_config>


I was never able to run more than one at a time when taking part in that seti WOW event. However, the seti app I was running used CUDA while MW uses OpenCL. If there is a problem, change both numbers to .25


Nice! I now see gpu0 running 4-5 wu's at a time. What do I add to have both gpu's run multiples? I have 2 280's. I think device 0 and 1.

Mikey, where would I find my cache size? In my computing preferences I have 5 days worth of work. and 5gb of work.

Thanks again for the help! edit: Nope it's just my wu limit. I see device 0 working on 5. Device 1 is only working on 2. so I have a setting somewhere only limiting me to 7 wu's at a time. hmmm?
ID: 69167
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69168 - Posted: 8 Oct 2019, 22:26:58 UTC - in response to Message 69167.  
Last modified: 8 Oct 2019, 22:33:23 UTC


<gpu_usage>0.19</gpu_usage>
<cpu_usage>0.20</cpu_usage>
I see device 0 working on 5. Device 1 is only working on 2. so I have a setting somewhere only limiting me to 7 wu's at a time. hmmm?


Your Celeron(R) CPU G3930 has only 2 processors to feed a pair of R9 200-series cards. That is not enough CPU to go around.

Change the .19 to .333 but leave the .20 for the CPU and restart the client.

That should give you 6 work units running at once (3 per GPU) and leave part of a CPU to run the OS.
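
The arithmetic behind those numbers can be sketched in a few lines of Python. This is a simplification, not how the BOINC scheduler is implemented: it assumes each GPU runs floor(1 / gpu_usage) tasks at once and that every running GPU task reserves cpu_usage of a CPU. The function name is hypothetical.

```python
import math


def concurrent_gpu_tasks(gpu_usage, cpu_usage, n_gpus):
    """Return (tasks running at once, CPUs those tasks reserve).

    Assumption: a GPU accepts tasks until their gpu_usage values add up to 1.
    """
    # Small epsilon guards against float rounding when 1/gpu_usage is whole.
    per_gpu = math.floor(1.0 / gpu_usage + 1e-9)
    total = per_gpu * n_gpus
    return total, total * cpu_usage


if __name__ == "__main__":
    for usage in (0.19, 0.333):
        total, cpus = concurrent_gpu_tasks(usage, 0.20, n_gpus=2)
        print(f"gpu_usage={usage}: {total} tasks, {cpus:.1f} CPUs reserved")
```

Under these assumptions, 0.19 asks for ten tasks that together want all of a 2-processor CPU, while 0.333 gives six tasks and reserves about 1.2 CPUs.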

[edit] possible upgrade path to get 4 or 8 threads
http://www.cpu-upgrade.com/CPUs/Intel/Celeron_Dual-Core/G3930.html
ID: 69168
JAWS
Joined: 6 Oct 19
Posts: 4
Credit: 34,040,968
RAC: 0
Message 69169 - Posted: 9 Oct 2019, 1:32:51 UTC
Last modified: 9 Oct 2019, 1:39:44 UTC

Ok sounds good. I'll give it a try. Yeah I could tell the WCG would pause one wu when MW@home was running. It's probably not helping I'm running that too. I bought the cpu for an old mining rig back in the day. Thanks for the help!

edit. ok that setting split the work on the gpu's. i waited for everything to finish. waited for the 1:30 countdown to finish. hit update downloaded 7 wu, device 0 had 3 and device 1 had 3. 1 left over. so it's working fine. Just only downloading 7 wu's at a time.
ID: 69169
JAWS
Joined: 6 Oct 19
Posts: 4
Credit: 34,040,968
RAC: 0
Message 69170 - Posted: 9 Oct 2019, 4:14:37 UTC

Sorry, wish I could edit. I don't know what I did but now I have at least 50 waiting. Maybe it was in the Boinc preferences. I bumped it back to 0.19 and it's running fine. thanks for all the help!
ID: 69170
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 69193 - Posted: 30 Oct 2019, 14:07:56 UTC
Last modified: 30 Oct 2019, 14:08:35 UTC

Surely there is some way to set our BOINC clients to not report completed tasks so often. I don't have to send back 2 or 3 work units every 2 minutes when I have 6 hours to compute. If I could get BOINC to only report completed tasks every half hour, then it could request work in between (without simultaneous task reporting) and work around the problem of not being able to do both at once.
ID: 69193

©2024 Astroinformatics Group