Validation Pending too many tasks
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
Now it's 3690194… not sure who is getting WUs validated; it certainly is not me.
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0
Response times are back "in the cellar".
Joined: 12 Nov 21 · Posts: 236 · Credit: 575,038,236 · RAC: 0
Cheer up, folks! It's only gonna get worse! The dead time between Separation GPU reloads is gone! Woohoo! I seem to be getting new work at about 125 or so left to finish. Noice, so to speak.
Joined: 13 Oct 21 · Posts: 44 · Credit: 226,871,939 · RAC: 18,214
Really? I haven't seen it yet, and I just had to manually request tasks as the queue emptied out. Even with an empty queue I haven't been getting the max of 300 tasks like before; I just got 224, and even fewer before that.
Joined: 12 Jun 10 · Posts: 57 · Credit: 6,171,817 · RAC: 46
I believe that when you get sent over 200 _0 GPU tasks it doesn't help with clearing the queue; in fact it makes it longer.
Stop creating new tasks, allow the queue(s) to clear, then allow new tasks to be created so the completed work can be validated.
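As a sketch of what that throttling could look like on the server side: a gate beside the work generator that pauses task creation while the validation backlog sits above a threshold, with a lower resume threshold so it doesn't flap on and off. All names and numbers below are hypothetical, not actual MilkyWay@home server code.

```python
# Hypothetical backlog gate for a work generator -- illustrative only.
PAUSE_THRESHOLD = 4_000_000   # stop generating when pending exceeds this
RESUME_THRESHOLD = 500_000    # resume once the backlog has drained

def should_generate(pending_validation: int, currently_paused: bool) -> bool:
    """Hysteresis gate: pause above one threshold, resume below another,
    so the generator doesn't flap around a single limit."""
    if currently_paused:
        return pending_validation < RESUME_THRESHOLD
    return pending_validation <= PAUSE_THRESHOLD
```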
Joined: 12 Nov 21 · Posts: 236 · Credit: 575,038,236 · RAC: 0
> Really? I haven't seen it yet and just had to manually request tasks as the queue emptied out. Even with an empty queue I haven't been getting the max 300 tasks like before, just got 224 and even less before.

Well, krap. Just watched it count down to zero and had to tickle it to send more work. Earlier today I was characterizing the AMD GPU by changing the app config file. Maybe that is what did it. I must confess, I did go all happy feet over it. At least I got the 300 WUs.
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
> I believe when you get sent over 200 _0 GPU tasks it doesn't help the situation of clearing the queue, in fact it makes it longer

It would also help if they could send out any task with a non-_0 name next, instead of putting it at the end of the now very long queue: maybe put any task with a non-_0 in its name in its own folder and tell the server to empty that folder before going back to the normal folder to send out tasks.
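A rough sketch of that two-queue idea (purely illustrative; the real BOINC feeder fills a shared-memory array from the database rather than holding in-memory queues like these):

```python
from collections import deque

# Illustrative two-queue dispatch: retries (_1, _2, ...) live in their
# own queue and are always drained before any fresh _0 task goes out.
fresh = deque()    # new _0 tasks, oldest first
resends = deque()  # _1/_2/... retries, oldest first

def enqueue(task_name: str) -> None:
    (fresh if task_name.endswith("_0") else resends).append(task_name)

def next_task():
    """Serve any waiting retry before touching the fresh queue."""
    if resends:
        return resends.popleft()
    return fresh.popleft() if fresh else None
```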
Joined: 12 Jun 10 · Posts: 57 · Credit: 6,171,817 · RAC: 46
> I believe when you get sent over 200 _0 GPU tasks it doesn't help the situation of clearing the queue, in fact it makes it longer

Another way this could be achieved: anything higher than _0 gets a shorter deadline; in theory this should push it to the front of your processing queue.
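If retries really were issued with a reduced delay bound, a client that falls back to earliest-deadline-first scheduling under deadline pressure would naturally run them ahead of _0 tasks fetched at the same time. A toy illustration, with made-up numbers (the 12-day bound and the 0.5 factor are assumptions, not project settings):

```python
from datetime import datetime, timedelta

NORMAL_DELAY = timedelta(days=12)  # assumed normal delay bound
RETRY_FACTOR = 0.5                 # hypothetical shorter bound for retries

def deadline(sent_at: datetime, task_name: str) -> datetime:
    factor = 1.0 if task_name.endswith("_0") else RETRY_FACTOR
    return sent_at + NORMAL_DELAY * factor

now = datetime.now()
queue = ["de_wu_a_0", "de_wu_b_1", "de_wu_c_0"]
# Sorting earliest-deadline-first puts the _1 retry ahead of both _0 tasks.
print(sorted(queue, key=lambda name: deadline(now, name)))
```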
Joined: 16 Mar 10 · Posts: 211 · Credit: 108,207,841 · RAC: 5,151
> I believe when you get sent over 200 _0 GPU tasks it doesn't help the situation of clearing the queue, in fact it makes it longer

There is an option to the feeder that allows "priority" tasks to get precedence, but priority is apparently only assigned to work units that have overdue results (which, I think, includes "Not started by deadline" as well as "No Reply"), so it wouldn't apply here. However, there is also an option to order by priority and then by work-unit id -- theoretically that would help clear out the older units first, so it looks like a sensible default for projects that don't need to prioritize new work [1]. If that's already engaged here, it doesn't seem to be the solution to clearing this backlog, although it ought to push out non-_0 tasks as early as possible...

[Edit - just seen Speedy51's post, which effectively points to the same place!]

Cheers - Al.

[1] In general, using adaptive replication (as at MilkyWay) might suggest that getting as many work units as possible processed as quickly as possible is the goal, in which case prioritizing retries that are not due to time-outs may not be appropriate (or necessary, if there isn't already a huge backlog!...)
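For concreteness, the "priority, then work-unit id" ordering described above amounts to a sort like the one below. This is a simplified model of the dispatch order only; the real feeder reads from MySQL into a shared-memory slot array, and the ids and priorities here are invented.

```python
# Simplified model: dispatch by priority (descending), then by
# work-unit id (ascending), so elevated-priority retries go out first
# and, within equal priority, older work units drain before newer ones.
tasks = [
    {"wu_id": 101, "priority": 0, "name": "wu_101_0"},
    {"wu_id": 57,  "priority": 5, "name": "wu_57_2"},   # overdue retry
    {"wu_id": 99,  "priority": 0, "name": "wu_99_0"},
]

dispatch_order = sorted(tasks, key=lambda t: (-t["priority"], t["wu_id"]))
print([t["name"] for t in dispatch_order])
# -> ['wu_57_2', 'wu_99_0', 'wu_101_0']
```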
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
The waiting-for-validation backlog is over 4 million and climbing. The Simulation backlog seems to be around 14 days and Separation around 7. Would it be a good idea to stop generating new tasks until the backlog is reduced?
Joined: 13 Oct 21 · Posts: 44 · Credit: 226,871,939 · RAC: 18,214
It seems like whatever the problem is, it's affecting the validator. If tasks were getting validation attempts, they'd be getting marked as valid, invalid, or inconclusive. Instead they're stuck in pending.

On a quick look it seems like almost none of the pending tasks have "wingman" tasks generated yet; otherwise they'd show up as assigned (to a machine) or unsent. So the only thing the task generator can do is generate new, _0, tasks. But that also doesn't seem to be working well, at least for Separation, as work there is hard to get.

In general, tasks get assigned to users in the order they were created, so we just need to keep crunching: when validation starts working again and second-attempt tasks start getting generated, they'll be going to the back of the queue, and to get to them we need to process everything in front of them.

So I'd say the best thing to do is just to keep crunching. Hopefully things get resolved soon on the server side. The good news is that there's a plan to replace the server, by the end of the year I believe, which should prevent a recurrence of the significant problems the project has experienced this year.
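A toy model of the flow described above, assuming adaptive replication (mentioned later in this thread as what MilkyWay uses): a result from a host the server already trusts can validate on its own, while everything else sits pending until a wingman copy is generated, returned, and compared. The function and field names are invented for illustration; only the state names mirror the website.

```python
def workunit_state(results: list[dict], host_trusted: bool) -> str:
    """Toy version of the pending/valid/inconclusive decision."""
    returned = [r for r in results if r["returned"]]
    if not returned:
        return "in progress"
    if host_trusted and len(returned) == 1:
        return "valid"  # adaptive replication: no wingman needed
    if len(returned) < 2:
        # Stuck here until a wingman (_1) task is created, sent,
        # crunched, and returned.
        return "pending"
    a, b = returned[:2]
    return "valid" if a["output"] == b["output"] else "inconclusive"
```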
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
> It seems like whatever the problem is it's affecting the Validator. If tasks were getting validation attempts they'd be getting marked as valid, invalid, or inconclusive. Instead they're stuck in pending. Upon quick look it seems like almost none of the pending tasks have "wing-man" tasks generated yet. Otherwise they'd show up as assigned (to a machine) or unsent. So the only thing that the Task Generator can do is generate new, _0, tasks. But that also doesn't seem to be working well, at least for Separation, as work there is hard to get.

You say you are having trouble getting tasks, and I see that you are doing GPU Separation tasks. I'm doing both CPU and GPU Separation tasks (not on the same machine) and am only having problems getting tasks when the website is unresponsive. I have had to set up a backup CPU project for when I can't get those tasks, but I seem to be keeping the cache full enough of GPU tasks for the few machines running those. I have a 1-day plus 0.5-day cache set up for each machine.
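For anyone unsure how those two cache numbers combine: roughly, the client tries to hold at least the first value's worth of work and, once it dips below that, refills up to the sum of the two. A rough model of that behavior (a sketch, not the client's actual fetch logic):

```python
MIN_DAYS = 1.0    # "Store at least N days of work"
EXTRA_DAYS = 0.5  # "Store up to an additional N days of work"

def days_to_request(days_queued: float) -> float:
    """Request nothing while the buffer is above the minimum;
    otherwise refill to min + extra."""
    if days_queued >= MIN_DAYS:
        return 0.0
    return (MIN_DAYS + EXTRA_DAYS) - days_queued
```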
Joined: 22 Jul 12 · Posts: 11 · Credit: 1,008,373 · RAC: 0
> Putting this here so I'll remember what it is now: Workunits waiting for validation 3685233

Now: Workunits waiting for validation 4329195. Higher than before, but some have been validated. Backlog + new?
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
> Putting this here so I'll remember what it is now: Workunits waiting for validation 3685233

Now it's 4379516… gone up again. Simulation tasks wait around 17 days for validation; for Separation, some (one or two) are instant, most around 7 days. I have emptied my cache now.
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
On 11 Oct 2022 I posted this:
In progress (747) · Validation pending (4510) · Validation inconclusive (256) · Valid (3269) · Invalid (2) · Error (1)

And today, 16 Oct 2022, I have this:
In progress (692) · Validation pending (7400) · Validation inconclusive (409) · Valid (4311) · Invalid (2) · Error (2)
Workunits waiting for validation: 4397896

So almost 11 days' worth of tasks have gotten validated in 5 calendar days. I am ONLY doing Separation tasks, both CPU and GPU ones.
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
I can only do CPU tasks, as I have an Intel GPU, which is not supported. Maybe GPU tasks are validated more quickly, as they will undoubtedly be shorter.
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0
Hate to say it, but my N-Body Simulation tasks are doing fine ...
Joined: 31 Mar 12 · Posts: 96 · Credit: 152,502,177 · RAC: 12
Here is mine:
State: All (125) · In progress (91) · Validation pending (0) · Validation inconclusive (6) · Valid (28) · Invalid (0) · Error (0)
Application: All (133) · Milkyway@home N-Body Simulation (125)

I am doing N-Body, which seems to be validating tasks from... 4 weeks ago. The data are available here: https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway@home&from=now-7d&to=now&viewPanel=3
The above link is for the last 7 days, but you can increase or decrease the time range.
Joined: 28 May 17 · Posts: 76 · Credit: 4,398,902,681 · RAC: 15,487
> I can only do CPU tasks, as I have an Intel GPU, which is not supported. Maybe GPU tasks are validated more quickly, as they will undoubtedly be shorter.

Nope: I have over 200k GPU tasks waiting for validation, and that number just keeps increasing. I backed a few of my rigs off it to work on something else. Might end up taking them all off until the problem starts to resolve itself.