Welcome to MilkyWay@home

30 Workunit Limit Per Request - Fix Implemented



Mad_Max

Joined: 2 Aug 11
Posts: 13
Credit: 44,453,057
RAC: 0
Message 68466 - Posted: 1 Apr 2019, 19:06:14 UTC
Last modified: 1 Apr 2019, 19:18:46 UTC

P.S.

Actually, <next_rpc_delay> is NOT a server-side delay (the server-side throttle has a different name, which I forget).
<next_rpc_delay> is also a client-side delay, but it is a maximum rather than a minimum: not "do not contact the server until this time passes", but "DO contact the server once this time has passed, even if there is no need for it, e.g. nothing to report and no new tasks to request".
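For reference, this knob lives in the BOINC server's config.xml, next to the minimum-delay option it is easy to confuse it with. A hedged sketch (the 600-second values are illustrative, not MilkyWay's actual settings):

```xml
<config>
  <!-- Minimum: the client must wait at least this many seconds
       between scheduler requests -->
  <min_sendwork_interval>600</min_sendwork_interval>
  <!-- Maximum: the client WILL contact the server after this many seconds,
       even with nothing to report and no work needed -->
  <next_rpc_delay>600</next_rpc_delay>
</config>
```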

Is it really needed? This option forces ALL attached clients to contact the server every 10 minutes, even if the client is not currently doing any work for the project. For example, here is a log snippet from one of my computers where MW is set as a low-priority (backup) project:
....................................
01/04/2019 15:54:30 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 15:54:30 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 15:54:32 | Milkyway@Home | Scheduler request completed
01/04/2019 16:04:37 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:04:37 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 16:04:40 | Milkyway@Home | Scheduler request completed
01/04/2019 16:14:44 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:14:44 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 16:14:46 | Milkyway@Home | Scheduler request completed
01/04/2019 16:24:52 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:24:52 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 16:24:54 | Milkyway@Home | Scheduler request completed
01/04/2019 16:34:56 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:34:56 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 16:34:58 | Milkyway@Home | Scheduler request completed
01/04/2019 16:45:03 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:45:03 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 16:45:05 | Milkyway@Home | Scheduler request completed
01/04/2019 16:55:06 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:55:06 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full)
01/04/2019 16:55:09 | Milkyway@Home | Scheduler request completed
01/04/2019 17:05:14 | Milkyway@Home | Sending scheduler request: Requested by project.
...........................................
and so on, every ~10 minutes.
It keeps hammering the server with useless requests. Usually this is useful only for specific purposes, such as canceling a WU in progress from the server side: the server cannot contact the client directly, so instead it asks the client to "check in" every X minutes/hours for possible new instructions. But even for that case, a few hours is usually enough; every 10 minutes is overkill.
jpmboy
Joined: 29 Apr 17
Posts: 33
Credit: 7,041,502,264
RAC: 0
Message 68467 - Posted: 1 Apr 2019, 23:55:51 UTC - in response to Message 68465.  
Last modified: 2 Apr 2019, 0:43:55 UTC

I'm having this issue, and my two Titan Vs sit idle for most of the day waiting for a task-batch download. Each task completes in less than 1 minute; I run 12 tasks at a time (6 per GPU), so 200 tasks complete in ~17 minutes.

We really need a fix for this that accounts for GPUs that are not DP-crippled. ;)
VietOZ

Joined: 28 Mar 18
Posts: 14
Credit: 761,475,797
RAC: 0
Message 68468 - Posted: 2 Apr 2019, 4:24:39 UTC - in response to Message 68467.  

Two GPUs should give you 400 tasks. Run 1 task per GPU and set up 6 instances; that still gives you 12 tasks at a time. Add another GPU to your coproc.xml and lock it. Set the cache to 10/10 and you'll get 600 tasks per instance. Make a tickler that runs every 5 minutes, so that when the 600 tasks run out, the machine gets work again after only a few minutes idle. I know it's not a long-term solution, but help Jake out and give him time to pinpoint the problem. There are many ways to work around this. My VII may be losing about 80k points per day because of this issue... not really a big deal.
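The "tickler" mentioned above is just something that periodically forces a project update. A minimal sketch as a crontab entry (assumes boinccmd is on PATH and GUI RPC authentication is already configured; the 5-minute interval matches the suggestion above):

```shell
# Illustrative crontab entry: every 5 minutes, ask the local BOINC client
# to contact the MilkyWay scheduler, so an idle host picks up work quickly.
*/5 * * * * boinccmd --project http://milkyway.cs.rpi.edu/milkyway/ update
```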
jpmboy
Joined: 29 Apr 17
Posts: 33
Credit: 7,041,502,264
RAC: 0
Message 68469 - Posted: 2 Apr 2019, 12:08:46 UTC - in response to Message 68468.  

Thank you for the reply. I'll try jerry-rigging something when I get a chance.
Jim1348

Joined: 9 Jul 17
Posts: 100
Credit: 16,967,906
RAC: 0
Message 68475 - Posted: 4 Apr 2019, 21:06:50 UTC

I don't know if this is the right thread or not, but since no one seems to know what the problem is, I will try it.
I attached my RX 570 (Win7 64-bit) at 10 AM and got 74 work units. They take 1 minute 49 seconds to run.
Then after a couple of hours I got another 72, and a few hours thereafter another 71 work units.

Then, at 5 PM I ran out and got nothing. But a manual request caused another 71 to download.
If they run out again, it is back to Folding. I can't stay up all night to get them.
Jim1348

Joined: 9 Jul 17
Posts: 100
Credit: 16,967,906
RAC: 0
Message 68478 - Posted: 5 Apr 2019, 16:05:07 UTC - in response to Message 68475.  

Then, at 5 PM I ran out and got nothing. But a manual request caused another 71 to download.
If they run out again, it is back to Folding. I can't stay up all night to get them.

After the last work unit finishes and it gets zero on the next request, BOINC waits 10 minutes and tries again.
Then, it gets a full load (85 on the last request). It is a strange problem, but not a major one for me.
Link
Joined: 19 Jul 10
Posts: 627
Credit: 19,303,095
RAC: 1,042
Message 68479 - Posted: 5 Apr 2019, 19:19:24 UTC
Last modified: 5 Apr 2019, 19:22:10 UTC

@ all having issues getting WUs here because of that 10-minute thing...

Have you tried setting your cache to
"Store at least 0.01 days of work" (plus whatever you need in the additional buffer), and then using "network activity based on preferences" rather than "network activity always"?

That should limit scheduler requests to about once every 14.4 minutes... I think. I don't have a MilkyWay-compatible GPU to try it.
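The same minimum-cache preference can also be set locally in BOINC's global_prefs_override.xml, which overrides website preferences. A sketch (0.01 days ≈ 14.4 minutes; the 0.05 additional-days value is just an example):

```xml
<global_preferences>
  <!-- "Store at least 0.01 days of work": with an empty queue, the client
       requests work at most about once every 14.4 minutes -->
  <work_buf_min_days>0.01</work_buf_min_days>
  <!-- "Store up to an additional ... days of work" -->
  <work_buf_additional_days>0.05</work_buf_additional_days>
</global_preferences>
```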

Just a thought until it's fixed on the server...
Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,959,689
RAC: 45,795
Message 68480 - Posted: 5 Apr 2019, 20:07:16 UTC - in response to Message 68465.  

200 computed tasks in less than 10 minutes? It is not possible even for the fastest machines. The very best computers, with a few modern powerful GPUs working in parallel and dedicated to MW alone, can do 200 tasks "only" in ~20-40 min.


Beg to differ. If I have a host with 8 RTX 2080 Ti cards or similar, I can easily crunch through 200 tasks in ten minutes. There are many hosts with mining-rig pedigrees that have multiple GPUs. I have a minimum of 3 cards in every host.
bluestang

Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 68481 - Posted: 5 Apr 2019, 22:53:57 UTC - in response to Message 68480.  

200 computed tasks in less than 10 minutes? It is not possible even for the fastest machines. The very best computers, with a few modern powerful GPUs working in parallel and dedicated to MW alone, can do 200 tasks "only" in ~20-40 min.


Beg to differ. If I have a host with 8 RTX 2080 Ti cards or similar, I can easily crunch through 200 tasks in ten minutes. There are many hosts with mining-rig pedigrees that have multiple GPUs. I have a minimum of 3 cards in every host.


Now that's an XtremeSystem! Just like our team likes :)
mmonnin

Joined: 2 Oct 16
Posts: 167
Credit: 1,008,062,758
RAC: 834
Message 68483 - Posted: 6 Apr 2019, 1:59:19 UTC - in response to Message 68480.  
Last modified: 6 Apr 2019, 2:01:26 UTC

200 computed tasks in less than 10 minutes? It is not possible even for the fastest machines. The very best computers, with a few modern powerful GPUs working in parallel and dedicated to MW alone, can do 200 tasks "only" in ~20-40 min.


Beg to differ. If I have a host with 8 RTX 2080 Ti cards or similar, I can easily crunch through 200 tasks in ten minutes. There are many hosts with mining-rig pedigrees that have multiple GPUs. I have a minimum of 3 cards in every host.


Then your task count is higher because you have more cards. 200 is the limit for one GPU, and the statement was about a single GPU. Only a Titan V or a Radeon VII can crunch 200 tasks per GPU in that time.

It seems hard enough to get the admins to realize that the issue wasn't how many tasks can be downloaded at once, but the timeout issue completely preventing tasks from downloading at all. Please stay on topic instead of e-peening about how your GPUs can do it in 10 minutes.
bluestang

Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 68484 - Posted: 6 Apr 2019, 3:06:26 UTC

8x 2080ti = 1x Radeon VII for MilkyWay lol

(Sorry, couldn't resist)
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 68485 - Posted: 6 Apr 2019, 7:41:54 UTC - in response to Message 68483.  

... Please stay on topic instead of e-peening about omg my gpus can do it in 10minutes.


THIS last sentence of yours is way off ....

You are missing the point.

Have a nice day!

(just had to say it)
Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,959,689
RAC: 45,795
Message 68487 - Posted: 6 Apr 2019, 22:52:10 UTC - in response to Message 68483.  

and the statement was in regards to 1 single GPU

That WAS NOT apparent from your post, which just said that no computer could crunch through 200 tasks in 10 minutes.
Hurr1cane78

Joined: 7 May 14
Posts: 57
Credit: 206,540,646
RAC: 2
Message 68490 - Posted: 7 Apr 2019, 11:43:15 UTC

Keen observer of the skies and of this forum here. Enlighten me with more stats from the RTX 2080 Ti owners, please. Seriously, with maxed-out instances on a single RTX 2080 Ti, how many WUs can you peel through?
Aurum
Joined: 11 Jul 17
Posts: 20
Credit: 1,429,841,456
RAC: 0
Message 68491 - Posted: 8 Apr 2019, 20:42:41 UTC
Last modified: 8 Apr 2019, 20:43:11 UTC

I never get enough MW WUs. I can run 200 at a time. Some computers don't get any, while others get a steady supply. They all have the same app_config.xml. Server Status shows 10,000 WUs ready, but I have computers that haven't gotten any in forever.
Is there some kind of governor on this project???
I can't find anything to explain the difference.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68492 - Posted: 9 Apr 2019, 15:25:23 UTC

Hey Everyone,

Sorry I didn't have a chance to reply last week. I was at a conference sharing some of the great new results we have. I'm back now, so I'm going to start reading through this thread to see what I can do to fix this issue on my end. I saw that some people were suggesting changing the rpc_delay, which I think we have set to 90 seconds.

I'll post again soon when I put a plan together.

Jake
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68493 - Posted: 9 Apr 2019, 16:01:48 UTC

Hey Everyone,

I see that the rpc_delay was not set to 90 seconds as I thought. I have updated it accordingly. Hopefully this will solve our issues.

Best,

Jake
Manfred Reiff
Joined: 27 Apr 18
Posts: 11
Credit: 72,923,580
RAC: 0
Message 68494 - Posted: 9 Apr 2019, 16:44:16 UTC - in response to Message 68428.  

For me, the increase of the feeder's shared memory size did not work. I still receive only 20 (!) workunits at any given time. I'm using an Intel Core i9-7900X with 20 logical processors. Increasing the work cache (from 2 + 2 days to, for instance, 2 + 4 or 2 + 5 days) does not help; the number of workunits never exceeds 20.
I can't see a correlation with the type of CPU used, because I also use a computer with an Intel Core i7-8700K (12 logical processors). That computer gets far more workunits at any given time. At present there are 41 WUs waiting, and I'll get new workunits for finished and uploaded WUs. That's a bit irritating...

Greetings from the summery, warm German midlands
Manfred
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 68495 - Posted: 9 Apr 2019, 17:47:35 UTC

Hey Manfred,

Most of these solutions were implemented with GPUs in mind and are limited to GPUs. I'll take a look at the number of CPU workunits currently allowed and maybe try doubling it.

Best,
Jake
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 68496 - Posted: 10 Apr 2019, 10:20:54 UTC
Last modified: 10 Apr 2019, 10:21:35 UTC

Looks like an rpc_delay of 90 secs is a bit hard on the server? Since yesterday, my Event Log has been showing a lot of these:

10/04/2019 12:09:37 | Project communication failed: attempting access to reference site
10/04/2019 12:09:37 | Milkyway@Home | Scheduler request failed: Failure when receiving data from the peer
10/04/2019 12:09:38 | Internet access OK - project servers may be temporarily down.

and these

10/04/2019 12:17:09 | Milkyway@Home | Scheduler request failed: Couldn't connect to server
10/04/2019 12:17:10 | Project communication failed: attempting access to reference site
10/04/2019 12:17:12 | Internet access OK - project servers may be temporarily down


©2024 Astroinformatics Group