30 Workunit Limit Per Request - Fix Implemented

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68428 - Posted: 27 Mar 2019, 20:43:25 UTC

Hey Everyone,

Some users were only able to request a maximum of 30 workunits at a time, which is far too low to ensure optimal GPU usage. We have increased the feeder's shared memory size, which should hopefully increase the number of workunits available to request at any given time. Please let me know here if this has helped the issue.

Best,

Jake
ID: 68428
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 638,784,850
RAC: 55,527
Message 68429 - Posted: 27 Mar 2019, 20:50:56 UTC

Jake -

Just tried a "user update" and got another 30 tasks.

And still, every time I complete a task and try to "replace it", no tasks are downloaded.
ID: 68429
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68430 - Posted: 27 Mar 2019, 21:09:29 UTC

Hey wb8ili,

Can you give it another try for me and let me know what you see?

Best,

Jake
ID: 68430
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 68432 - Posted: 27 Mar 2019, 22:09:31 UTC

Getting 200 tasks per request now, well done Jake. I was only getting 40-50 of them before this fix.
ID: 68432
mmonnin

Joined: 2 Oct 16
Posts: 167
Credit: 1,008,062,758
RAC: 834
Message 68434 - Posted: 27 Mar 2019, 22:13:53 UTC - in response to Message 68429.  

Jake -

Just tried a "user update" and got another 30 tasks.

And still, every time I complete a task and try to "replace it", no tasks are downloaded.


This is the issue.
ID: 68434
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 638,784,850
RAC: 55,527
Message 68435 - Posted: 28 Mar 2019, 0:50:24 UTC

Jake -

I just did a "user request for tasks" and received 51 new ones.

As I have written before, every 4 minutes or so I complete a task, report it, request new tasks, and get nothing. I have about 200 tasks now from my user requests, so I will have to wait until tomorrow and see if my stockpile bleeds down.
ID: 68435
mmonnin

Joined: 2 Oct 16
Posts: 167
Credit: 1,008,062,758
RAC: 834
Message 68438 - Posted: 28 Mar 2019, 12:23:07 UTC - in response to Message 68435.  

Jake -

I just did a "user request for tasks" and received 51 new ones.

As I have written before, every 4 minutes or so I complete a task, report it, request new tasks, and get nothing. I have about 200 tasks now from my user requests, so I will have to wait until tomorrow and see if my stockpile bleeds down.


I've run out of tasks several times overnight. BOINCTasks history shows a lot of MW tasks, then 1 task from Moo or Collatz (my 0% resource share backup projects), then a lot more MW tasks. There were several cycles of this.
ID: 68438
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 638,784,850
RAC: 55,527
Message 68440 - Posted: 28 Mar 2019, 14:31:56 UTC

Jake -

27 Mar 2019 20:44:37 Received 51 tasks via a User Requested Update. Now have approx. 200 tasks on hand.

28 Mar 2019 10:26:00 Since then I have completed 190 tasks. I automatically reported each completed task and asked for new tasks each time, but received none. Now have 10 tasks on hand. Will probably run out of work in less than an hour. Will report back what happens then.
ID: 68440
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68441 - Posted: 28 Mar 2019, 15:15:15 UTC

Hey guys,

So the current setup allows users to have up to 200 workunits per GPU on their computer and another 40 workunits per CPU, with a maximum of 600 workunits in total.

On the server, we try to store a cache of 10,000 workunits. Sometimes when a lot of people request work all at the same time, this cache will run low.

So all of the numbers I have listed are tunable. What would you guys recommend for changes to these numbers?
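For reference, in a stock BOINC server these limits live in the project's config.xml. A rough sketch with the numbers above plugged in (option names as in the BOINC ProjectOptions documentation; treat the exact mapping and the values as illustrative rather than a dump of our production config):

<config>
    <!-- in-progress limits per resource -->
    <max_wus_in_progress>40</max_wus_in_progress>            <!-- per CPU -->
    <max_wus_in_progress_gpu>200</max_wus_in_progress_gpu>   <!-- per GPU -->
    <!-- cap on how many results a single scheduler reply may contain -->
    <max_wus_to_send>200</max_wus_to_send>
    <!-- size of the feeder's shared-memory array of ready-to-send results;
         this is the value that was just increased (number here is a placeholder) -->
    <shmem_work_items>1000</shmem_work_items>
</config>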

Jake
ID: 68441
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 638,784,850
RAC: 55,527
Message 68442 - Posted: 28 Mar 2019, 15:27:26 UTC
Last modified: 28 Mar 2019, 15:47:37 UTC

Here is the rest of my story -

28 Mar 2019 11:06:29 I finished my last task and was out of work.

28 Mar 2019 11:13:08 The BOINC Manager (I assume) sent a request for new tasks. Got 200. I am good for the next 15 hours.

I was out of work for 6.5 minutes with no user intervention.

Jake - something has changed in the recent past (new server?) such that MY queue isn't being maintained. I still think it has something to do with requesting tasks too frequently, and I still think around 6 minutes is the cutoff. However, I don't get a message like "Too recent since last request" or similar.

Edit: I checked the log on another of my computers that takes 14 minutes to complete a task and the log shows that that computer gets a new task pretty much every time it finishes one.
ID: 68442
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 68443 - Posted: 28 Mar 2019, 19:13:40 UTC

I have been closely monitoring my BOINC machines as well and I can confirm that, despite downloading 200 tasks per request, the app is still running out of work. As wb8ili said, the queue is not maintained and the client goes through those 200 tasks without downloading new ones. The Event Log repeatedly shows something like this:

28/03/2019 20:06:26 | Milkyway@Home | Sending scheduler request: To fetch work.
28/03/2019 20:06:26 | Milkyway@Home | Reporting 8 completed tasks
28/03/2019 20:06:26 | Milkyway@Home | Requesting new tasks for NVIDIA GPU
28/03/2019 20:06:29 | Milkyway@Home | Scheduler request completed: got 0 new tasks

Eventually, after the queue is completely cleared and there are no more tasks to crunch, 200 new tasks are downloaded, but there is always a period of time during which the client is out of work before the queue is refilled.
ID: 68443
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68444 - Posted: 28 Mar 2019, 19:33:00 UTC

I'm not sure how to force the client to download new work when it returns workunits to the scheduler. I've been searching the BOINC documentation looking for flags to try, but I can't find anything.

Anyone have any suggestions?

Jake
ID: 68444
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 638,784,850
RAC: 55,527
Message 68447 - Posted: 28 Mar 2019, 20:26:46 UTC - in response to Message 68444.  
Last modified: 28 Mar 2019, 20:33:24 UTC

Jake -

It is a timing thing. Somewhere between 4.5 minutes and 14 minutes (6 minutes?), there is a "timer" in Milkyway that stops new tasks from being downloaded unless more than the "timer" has elapsed since the last request. All other projects that I am familiar with have this feature. In CPDN it is one hour. And all projects that I use reset the timer to the "max" if you request tasks before the "timer" has expired.
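If that is what is happening, the standard server-side knob for this kind of per-host timer in stock BOINC is min_sendwork_interval in the project's config.xml, something like the line below (the value is purely illustrative, and I have no idea what Milkyway actually sets or whether this is the option involved here):

<!-- minimum seconds after sending work to a host before it is sent more;
     requests arriving sooner are answered with no work -->
<min_sendwork_interval>360</min_sendwork_interval>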

I have 6 machines running Milkyway GPU tasks. Only my fastest has this issue (running out of work).

Edit: I just checked Einstein and it looks like the "timer" is about 1 minute.
ID: 68447
mmonnin

Joined: 2 Oct 16
Posts: 167
Credit: 1,008,062,758
RAC: 834
Message 68450 - Posted: 28 Mar 2019, 22:10:59 UTC - in response to Message 68441.  
Last modified: 28 Mar 2019, 22:12:07 UTC

Hey guys,

So the current setup allows users to have up to 200 workunits per GPU on their computer and another 40 workunits per CPU, with a maximum of 600 workunits in total.

On the server, we try to store a cache of 10,000 workunits. Sometimes when a lot of people request work all at the same time, this cache will run low.

So all of the numbers I have listed are tunable. What would you guys recommend for changes to these numbers?

Jake


It's not any of these settings. When the server allows work, we get work. But there is a timeout to prevent users from spamming projects with frequent requests. These tasks are so quick that completed tasks are constantly uploading, about every 30-35 seconds for me. So we are constantly requesting too frequently until all tasks are done, the delay passes, and then we can get more work.

I still have a PC that has not contacted the server since the upgrade. Its sched_reply_milkyway.cs.rpi.edu_milkyway.xml file does not have this line at all, so it was not present in the old version:

<next_rpc_delay>600.000000</next_rpc_delay>

https://boinc.berkeley.edu/trac/wiki/ProjectOptions#client-control

For reference, the entire old version of the file minus some user info.

<scheduler_reply>
<scheduler_version>707</scheduler_version>
<dont_use_dcf/>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
<request_delay>91.000000</request_delay>
<project_name>Milkyway@Home</project_name>
<project_preferences>
<resource_share>10</resource_share>
<no_cpu>1</no_cpu>
<no_ati>0</no_ati>
<no_cuda>0</no_cuda>
<project_specific>
<max_gfx_cpu_pct>20</max_gfx_cpu_pct>
<gpu_target_frequency>60</gpu_target_frequency>
<nbody_graphics_poll_period>30</nbody_graphics_poll_period>
<nbody_graphics_float_speed>5</nbody_graphics_float_speed>
<nbody_graphics_textured_point_size>250</nbody_graphics_textured_point_size>
<nbody_graphics_point_point_size>40</nbody_graphics_point_point_size>
</project_specific>
<venue name="home">
<resource_share>50</resource_share>
<no_cpu>0</no_cpu>
<no_ati>1</no_ati>
<no_cuda>1</no_cuda>
<project_specific>
<max_gfx_cpu_pct>20</max_gfx_cpu_pct>
<gpu_target_frequency>60</gpu_target_frequency>
<nbody_graphics_poll_period>30</nbody_graphics_poll_period>
<nbody_graphics_float_speed>5</nbody_graphics_float_speed>
<nbody_graphics_textured_point_size>250</nbody_graphics_textured_point_size>
<nbody_graphics_point_point_size>40</nbody_graphics_point_point_size>
</project_specific>
</venue>
</project_preferences>

<result_ack>
    <name>de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_3_1533467104_9447502_1</name>
</result_ack>
<result_ack>
    <name>de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_1_1533467104_9241919_1</name>
</result_ack>
</scheduler_reply>
ID: 68450
Keith Myers

Joined: 24 Jan 11
Posts: 715
Credit: 555,959,376
RAC: 45,802
Message 68453 - Posted: 29 Mar 2019, 0:45:20 UTC - in response to Message 68450.  

I think mmonnin has found the problem for fast clients. The rpc_delay is much too long for fast turnaround clients. They can exhaust all 200 tasks in the ten minute span before next connection.

This is a server side parameter that project staff can alter. Seti has a 303 second rpc_delay which is borderline too long for fast clients also.
ID: 68453
mmonnin

Joined: 2 Oct 16
Posts: 167
Credit: 1,008,062,758
RAC: 834
Message 68455 - Posted: 29 Mar 2019, 3:14:20 UTC

Well, for my 280X it takes over 2 hours, but there are quite a few requests within 10 min. I wish for 30 min. :) I'd guess a PC would need to take over 10 min per task to not run into the issue.

These 4 lines come every 2-4 task completions. Some tasks complete, but none are downloaded. The queue runs dry. Moo/Collatz take over 10 min for a task, and then a new set of MW tasks arrives.

362062 Milkyway@Home 3/28/2019 11:01:04 PM Sending scheduler request: To fetch work.
362063 Milkyway@Home 3/28/2019 11:01:04 PM Reporting 2 completed tasks
362064 Milkyway@Home 3/28/2019 11:01:04 PM Requesting new tasks for AMD/ATI GPU
362065 Milkyway@Home 3/28/2019 11:01:06 PM Scheduler request completed: got 0 new tasks

Can the server distinguish between auto updates like those above and user updates so that the former could have a lower limit than possible user spam? The log file mentions a user update but does the server know?
ID: 68455
JHMarshall

Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 68461 - Posted: 31 Mar 2019, 19:17:58 UTC - in response to Message 68450.  

I also think mmonnin has found the problem for fast clients. On my fastest system I complete the 200 WUs in 40 min, getting no new work from the requests made each time completed WUs are reported. After all WUs are completed, my BOINC client does not request any new work for 10 minutes. So 40 min of computing and 10 minutes sitting idle is not very efficient!!
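(Roughly: 10 idle minutes in every 40 + 10 = 50 minute cycle, so about 20% of the GPU's time is wasted waiting.)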


I would certainly like to see the next_rpc_delay reduced.

Joe
ID: 68461
Hurr1cane78

Joined: 7 May 14
Posts: 57
Credit: 206,540,646
RAC: 2
Message 68462 - Posted: 31 Mar 2019, 21:25:45 UTC - in response to Message 68428.  
Last modified: 31 Mar 2019, 21:28:08 UTC

Hi, I need at least 8 days' worth of WUs stored; I need offline WUs for at least 2 days at a time. I'm doing 19 s per WU. Cheers
ID: 68462
mikey

Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 68464 - Posted: 1 Apr 2019, 10:29:03 UTC - in response to Message 68462.  

Hi, I need at least 8 days' worth of WUs stored; I need offline WUs for at least 2 days at a time. I'm doing 19 s per WU. Cheers


Unless you have a very slow GPU, that won't happen here at MilkyWay; they don't want to tie up that many workunits on one machine at one time.
The key here is to return a WU so you can get another one. Recently that has been a minor problem, but the Admin has been working very hard to get things back up and running like they did under the old server-side software version.
ID: 68464
Mad_Max

Joined: 2 Aug 11
Posts: 13
Credit: 44,453,057
RAC: 0
Message 68465 - Posted: 1 Apr 2019, 18:34:15 UTC - in response to Message 68453.  
Last modified: 1 Apr 2019, 18:42:05 UTC

I think mmonnin has found the problem for fast clients. The rpc_delay is much too long for fast turnaround clients. They can exhaust all 200 tasks in the ten minute span before next connection.

This is a server side parameter that project staff can alter. Seti has a 303 second rpc_delay which is borderline too long for fast clients also.


200 computed tasks in less than 10 minutes? That is not possible even for the fastest machines. The very best computers, with a few modern powerful GPUs working in parallel and dedicated solely to MW, can do 200 tasks "only" in ~20-40 min.

The real problem is not rpc_delay by itself but the mismatch between <next_rpc_delay> = 600 sec and <request_delay> = 91 sec.
The server currently asks the client to wait at least 91 sec before its next request, but it "bans" the client (gives no new tasks and resets the countdown timer back to 600 sec) if less than 600 sec have passed since the previous request.
Fast computers report completed tasks and request new ones every few minutes (they think this is fine, since the server asks them to wait only 91 sec, so sending a new request after 120 sec, for example, looks OK) and get "banned" every time because less than 600 sec have passed since their latest request.

With <request_delay> > <next_rpc_delay> there would be no such problem. So INCREASING <request_delay> (shown as the "communication deferred" countdown timer in the project status of the BOINC client) to more than 600 sec should fix this problem. It would also reduce server load significantly.
Or decrease the server-side timeout.
In either case the client-side delay should be bigger than, or at least equal to, the server-side delay/timeout to avoid falsely detecting fast clients as "misbehaving" ones.
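In concrete terms, a consistent scheduler reply would then advertise something like the following (illustrative values only, not an actual reply):

<request_delay>600.000000</request_delay>
<next_rpc_delay>600.000000</next_rpc_delay>

or keep <request_delay> at 91 sec and drop <next_rpc_delay> to 91 sec or less. Server-side, these are normally controlled by the min_sendwork_interval and next_rpc_delay options in config.xml (the mapping of request_delay to min_sendwork_interval is my assumption).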
ID: 68465