30 Workunit Limit Per Request - Fix Implemented
Author | Message |
---|---|
Send message Joined: 2 Aug 11 Posts: 13 Credit: 44,453,057 RAC: 0 |
P.S. Actually, <next_rpc_delay> is NOT a server-side delay; that one has another name (I forgot it). <next_rpc_delay> is also a client-side delay, but it is not a minimum delay (do not contact the server until this time has passed); it is a maximum delay (DO contact the server after this time has passed, even if there is no need for it, e.g. nothing to report and no need to ask for new tasks). Is it really needed? This option forces ALL attached clients to contact the server every 10 min, even if the client is not currently doing any work for the project. For example, here is a log snippet from one of my computers where MW is set as a backoff project (low priority): .................................... 01/04/2019 15:54:30 | Milkyway@Home | Sending scheduler request: Requested by project. 01/04/2019 15:54:30 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 15:54:32 | Milkyway@Home | Scheduler request completed 01/04/2019 16:04:37 | Milkyway@Home | Sending scheduler request: Requested by project. 01/04/2019 16:04:37 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 16:04:40 | Milkyway@Home | Scheduler request completed 01/04/2019 16:14:44 | Milkyway@Home | Sending scheduler request: Requested by project. 01/04/2019 16:14:44 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 16:14:46 | Milkyway@Home | Scheduler request completed 01/04/2019 16:24:52 | Milkyway@Home | Sending scheduler request: Requested by project. 01/04/2019 16:24:52 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 16:24:54 | Milkyway@Home | Scheduler request completed 01/04/2019 16:34:56 | Milkyway@Home | Sending scheduler request: Requested by project.
01/04/2019 16:34:56 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 16:34:58 | Milkyway@Home | Scheduler request completed 01/04/2019 16:45:03 | Milkyway@Home | Sending scheduler request: Requested by project. 01/04/2019 16:45:03 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 16:45:05 | Milkyway@Home | Scheduler request completed 01/04/2019 16:55:06 | Milkyway@Home | Sending scheduler request: Requested by project. 01/04/2019 16:55:06 | Milkyway@Home | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: job cache full) 01/04/2019 16:55:09 | Milkyway@Home | Scheduler request completed 01/04/2019 17:05:14 | Milkyway@Home | Sending scheduler request: Requested by project. ........................................... and so on, every ~10 min. It keeps hammering the server with useless requests. Usually this is useful only for specific purposes, like canceling WUs in progress from the server side. The server cannot contact a client directly, so instead it asks the client to "check in" every X minutes/hours for possible new instructions. But even in this case, a few hours is usually enough. Every 10 min is overkill. |
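The "every ~10 min" reading of the log above can be checked with a short sketch. It takes the timestamps of the "Sending scheduler request" lines as quoted in the post and measures the gaps between them. The day/month/year date format and the helper name `request_intervals` are assumptions made for illustration, not anything BOINC provides.

```python
from datetime import datetime

# Timestamps of the "Sending scheduler request" lines quoted in the post.
# Assumption: dates are day/month/year (all entries fall on the same day,
# so the gaps do not depend on this choice).
SENT = [
    "01/04/2019 15:54:30",
    "01/04/2019 16:04:37",
    "01/04/2019 16:14:44",
    "01/04/2019 16:24:52",
    "01/04/2019 16:34:56",
    "01/04/2019 16:45:03",
    "01/04/2019 16:55:06",
    "01/04/2019 17:05:14",
]


def request_intervals(stamps):
    """Return the gaps, in seconds, between consecutive timestamps."""
    times = [datetime.strptime(s, "%d/%m/%Y %H:%M:%S") for s in stamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]


intervals = request_intervals(SENT)
average = sum(intervals) / len(intervals)
print([round(gap) for gap in intervals])
print(f"average gap: {average:.0f} s")
```

Every gap comes out a few seconds over 600 s, which fits a 10-minute timer plus the time the request itself takes.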
Send message Joined: 29 Apr 17 Posts: 33 Credit: 7,041,502,264 RAC: 0 |
I'm having this issue, and my two Titan Vs sit idle for most of the day waiting for a task-batch download. Each task completes in less than 1 min, and I run 12 tasks at a time (6 per GPU), so 200 tasks complete in ~17 min. We really need a fix for this that accounts for GPUs that are not DP crippled. ;) |
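The ~17 min figure follows from the numbers in the post. The sketch below is rough arithmetic only: it assumes tasks run in lockstep "waves" of 12 and that each task takes a full minute (the post says less than 1 min, so this is an upper bound). The function name `batch_runtime_minutes` is made up for illustration.

```python
import math


def batch_runtime_minutes(tasks, concurrent, minutes_per_task):
    """Wall-clock minutes to drain `tasks` when `concurrent` run at once.

    Simplification: tasks start and finish together in waves.
    """
    waves = math.ceil(tasks / concurrent)
    return waves * minutes_per_task


# Figures from the post: 200 tasks, 12 at a time, about 1 minute each.
minutes = batch_runtime_minutes(200, 12, 1.0)
print(minutes)  # 17.0
```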
Send message Joined: 28 Mar 18 Posts: 14 Credit: 761,475,797 RAC: 0 |
2 GPUs should give you 400 tasks. Run 1 task per GPU and set up 6 instances. That still gives you 12 tasks at a time. Add another GPU to your coproc.xml and lock it. Set cache to 10/10 and you'll get 600 tasks per instance. Make a tickler for every 5 minutes; that way, when the 600 tasks run out, the machine still gets work after a few minutes idle. I know it's not a long-term solution, but help Jake out. Give him time to pinpoint the problem. There are many ways to get around this. My VII may be losing about 80k points per day with this issue ... not really a big deal. |
Send message Joined: 29 Apr 17 Posts: 33 Credit: 7,041,502,264 RAC: 0 |
Thank you for the reply. I'll try jerry-rigging something when I get a chance. |
Send message Joined: 9 Jul 17 Posts: 100 Credit: 16,967,906 RAC: 0 |
I don't know if this is the right thread or not, but since no one seems to know what the problem is, I will try it. I attached my RX 570 (Win7 64-bit) at 10 AM and got 74 work units. They take 1 minute 49 seconds to run. Then after a couple of hours I got another 72, and then a few hours thereafter 71 work units. Then, at 5 PM I ran out and got nothing. But a manual request caused another 71 to download. If they run out again, it is back to Folding. I can't stay up all night to get them. |
Send message Joined: 9 Jul 17 Posts: 100 Credit: 16,967,906 RAC: 0 |
Then, at 5 PM I ran out and got nothing. But a manual request caused another 71 to download. After the last work unit finishes and it gets zero on the next request, BOINC waits 10 minutes and tries again. Then, it gets a full load (85 on the last request). It is a strange problem, but not a major one for me. |
Send message Joined: 19 Jul 10 Posts: 624 Credit: 19,300,007 RAC: 2,492 |
@ all having issues getting WUs here because of that 10-minute thing... Have you tried setting your cache to "Store at least 0.01 days of work" plus whatever additional amount you like, and then setting "network activity based on preferences" instead of "network activity always"? This should actually limit scheduler requests to once every 14.4 minutes... I think. I don't have a Milkyway-compatible GPU to try that. Just a thought until it's fixed on the server... |
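The 14.4 minutes in the post is just the 0.01-day cache setting converted to minutes. A minimal sketch of that conversion follows. Whether the client really spaces its scheduler requests by this amount is the poster's guess and is not checked here. The helper name `days_to_minutes` is hypothetical.

```python
def days_to_minutes(days):
    """Convert a BOINC-style cache size given in days to minutes."""
    return days * 24 * 60


# "Store at least 0.01 days of work" expressed in minutes.
print(f"{days_to_minutes(0.01):.1f} minutes")
```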
Send message Joined: 24 Jan 11 Posts: 715 Credit: 555,470,547 RAC: 38,476 |
200 computed tasks in less 10 minutes? It is not possible even for fastest machines. Very best computers with few modern powerful GPUs working in parallel dedicated to the single project of MW can do 200 tasks "only" in ~20-40 min. Beg to differ. If I have a host with 8 RTX 2080 TI cards or similar, I can easily crunch through 200 tasks in ten minutes. There are many hosts with mining rig pedigrees that have multiple gpus. I have a minimum of 3 cards in every host. |
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
200 computed tasks in less 10 minutes? It is not possible even for fastest machines. Very best computers with few modern powerful GPUs working in parallel dedicated to the single project of MW can do 200 tasks "only" in ~20-40 min. Now that's an XtremeSystem! Just like our team likes :) |
Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,008,062,758 RAC: 2,245 |
200 computed tasks in less 10 minutes? It is not possible even for fastest machines. Very best computers with few modern powerful GPUs working in parallel dedicated to the single project of MW can do 200 tasks "only" in ~20-40 min. Then your task count is higher with more cards. 200 is the limit for 1 GPU and the statement was in regards to 1 single GPU. Only a TV or 7 is crunching in that time with 200 tasks per GPU. It seems like it's hard enough to get the admins to realize the issue wasn't how many tasks can be downloaded at once but the timeout issue completely preventing tasks from downloading at all. Please stay on topic instead of e-peening about omg my gpus can do it in 10 minutes. |
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
8x 2080ti = 1x Radeon VII for MilkyWay lol (Sorry, couldn't resist) |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
... Please stay on topic instead of e-peening about omg my gpus can do it in 10minutes. THIS last sentence of yours is way off .... You are missing the point. Have a nice day! (just had to say it) |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 555,470,547 RAC: 38,476 |
and the statement was in regards to 1 single GPU That WAS NOT apparent from just your post that no computer could crunch through 200 tasks in 10 minutes. |
Send message Joined: 7 May 14 Posts: 57 Credit: 206,540,646 RAC: 5 |
Keen observer of the skies and this forum, please enlighten me with more stats from RTX 2080 Ti owners. Seriously: with maxed-out instances on just a single RTX 2080 Ti, how many WUs can you peel? |
Send message Joined: 11 Jul 17 Posts: 20 Credit: 1,429,841,456 RAC: 0 |
I never get enough MW WUs. I can run 200 at a time. Some computers don't get any while some others get a steady supply. They all have the same app_config.xml. Server Status shows 10,000 WUs ready but I have computers that haven't gotten any in forever. Is there some kind of governor on this project??? I can't find anything to explain the difference. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, Sorry I didn't have a chance to reply last week. I was at a conference sharing some of the great new results we have. I am back now, so I am going to start reading through this thread to see what I can do to fix this issue on my end. I saw that some people were suggesting changing the rpc_delay, which I think we have set to 90 seconds. I'll post again soon when I put a plan together. Jake |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, I see that the rpc_delay was not set to 90 seconds as I thought. I have updated this accordingly. Hopefully this will solve our issues. Best, Jake |
Send message Joined: 27 Apr 18 Posts: 11 Credit: 72,923,580 RAC: 0 |
For me, the increase of the feeder's shared memory size did not work. I still receive only 20 (!) workunits at any given time. I'm using an Intel Core i9-7900X with 20 processors. Increasing the amount of downloaded workunits (from 2 + 2 days to, for instance, 2 + 4 or 2 + 5 days) does not work. The maximum number of workunits never exceeds 20. I can't see a connection with the type of CPU used, because I also use a computer with an Intel Core i7-8700K (with 12 processors). That computer is getting many more workunits at any given time. At present there are 41 WUs waiting, and I get new workunits for finished and uploaded WUs. That's a bit irritating... Greetings from the summery warm German midlands, Manfred |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Manfred, Most of these solutions were implemented with GPUs in mind and are limited to GPUs. I'll take a look at the number of CPU workunits currently allowed and maybe try doubling it. Best, Jake |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Looks like rpc_delay of 90 secs is a bit hard on the server? Since yesterday, my Event Log is showing a lot of these: 10/04/2019 12:09:37 | Project communication failed: attempting access to reference site 10/04/2019 12:09:37 | Milkyway@Home | Scheduler request failed: Failure when receiving data from the peer 10/04/2019 12:09:38 | Internet access OK - project servers may be temporarily down. and these 10/04/2019 12:17:09 | Milkyway@Home | Scheduler request failed: Couldn't connect to server 10/04/2019 12:17:10 | Project communication failed: attempting access to reference site 10/04/2019 12:17:12 | Internet access OK - project servers may be temporarily down |
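For a sense of scale, here is a rough sketch of how many scheduler requests a single client would make per day at the delays mentioned in the thread. It assumes a client contacts the server exactly once per interval, which the thread does not confirm for the 90-second setting. The 3-hour row is only a stand-in for the "few hours" suggested earlier, and `requests_per_day` is a made-up helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def requests_per_day(delay_seconds, clients=1):
    """Requests per day if each client contacts the server once every
    `delay_seconds` (an assumption, not measured behaviour)."""
    return clients * SECONDS_PER_DAY // delay_seconds


# Delays taken from the thread; "3 h" is a hypothetical "few hours" value.
for label, delay in [("90 s", 90), ("10 min", 600), ("3 h", 3 * 3600)]:
    print(f"{label:>6}: {requests_per_day(delay)} requests/day per client")
```

Under that assumption, 90 seconds works out to 960 requests per client per day, against 144 at 10 minutes and 8 at 3 hours. Multiply by the number of attached hosts to get the server-side total.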