Validation Pending too many tasks
Author | Message |
---|---|
Send message Joined: 4 Feb 11 Posts: 86 Credit: 60,913,150 RAC: 0 |
I am wondering if something is wrong with either the transitioner or the validator. When enough successful tasks for a work unit have been turned in, it is normally the job of the transitioner to send the work unit to the validator's queue. The validator can then declare the tasks valid, invalid, or inconclusive; in the inconclusive case another task must be generated for the work unit so that the validator can try again with more results. For most projects, inconclusive generally means that one of the tasks appears to be bad, so another task must be made to find and disqualify the bad result. This project uses it either for that or when it decides that it needs more tasks before validation can be performed. I have noticed that when the transitioner malfunctions in some projects, I have to wait for my result to hit its deadline. Once the deadline passes, something notices that enough tasks to attempt validation of the work unit have been returned, so the work unit is sent to the validator instead of having another task generated to replace a late task. I have noticed that the only tasks in my queue that are validating are the ones that have 3 or more tasks in the work unit. I have currently taken my machine off BOINC duty, but that is because my apartment's air conditioner has failed and has nothing to do with this project's validation problem. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
On 11 Oct 2022 I posted this: And today, 16 Oct 2022, I have this: In progress (692) · Validation pending (7400) · Validation inconclusive (409) · Valid (4311) · Invalid (2) · Error (2) Total Workunits waiting for validation 4397896. 17 Oct 2022 for me: In progress (978) · Validation pending (8503) · Validation inconclusive (388) · Valid (3783) · Invalid (3) · Error (5) Very little progress being made at all!! Total Workunits waiting for validation 4922896 |
Send message Joined: 22 Jul 12 Posts: 11 Credit: 1,008,373 RAC: 0 |
I have three ready to report from last night that aren't going anywhere. 50 with validation pending. |
Send message Joined: 22 Jul 12 Posts: 11 Credit: 1,008,373 RAC: 0 |
I have three ready to report from last night that aren't going anywhere. 50 with validation pending. I just realized servers are down. |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
mikey: I guess you are talking about Separation tasks ... |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
It seems like whatever the problem is, it's affecting the Validator. If tasks were getting validation attempts they'd be getting marked as valid, invalid, or inconclusive. Instead they're stuck in pending. On a quick look it seems like almost none of the pending tasks have "wing-man" tasks generated yet. Otherwise they'd show up as assigned (to a machine) or unsent. So the only thing that the Task Generator can do is generate new, _0, tasks. But that also doesn't seem to be working well, at least for Separation, as work there is hard to get. The workunit generators don't generate tasks (wingman tasks or initial tasks) if the WU pools have more tasks than they should. So when the nbody pool had something like 100k tasks in it, any tasks that you sent back were essentially put on hold because the WU generator wouldn't make any wingman tasks for validation until the pool was cleared. |
Send message Joined: 13 Oct 21 Posts: 44 Credit: 227,128,361 RAC: 12,803 |
So it seems like the problems are unlikely to go away until after the migration to new hardware (and fixing the issues that might come with that) as the workunit pool overfill bug seems to be very persistent. The WU overfill bug still doesn't explain the growing validation queue, I don't think. When the task generator creates new tasks, why does it seem like no wingman/validation tasks are being created, just new, initial ones, given the large and ever-growing Waiting for Validation queue? Overfill or not, I don't see why the Validation queue is so large and growing. |
Send message Joined: 28 May 17 Posts: 76 Credit: 4,398,910,125 RAC: 128 |
So it seems like the problems are unlikely to go away until after the migration to new hardware (and fixing the issues that might come with that) as the workunit pool overfill bug seems to be very persistent. The bug is the reason why. When there are already too many tasks waiting to be sent, new tasks are not created. The new tasks are the wingman tasks that need to be created, and they won't be created until the backlog of tasks waiting to be sent goes down enough for the work generators to start generating new work. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
mikey: yes |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
When the task generator creates new tasks, why does it seem like no wingman/validation tasks are being created, just new, initial ones, given the large and ever-growing Waiting for Validation queue? Overfill or not, I don't see why the Validation queue is so large and growing. Because in a lot of cases, in the past anyway, no wingman task is needed, and instead of cancelling them they just never get generated in the first place. The problem seems to be that when the Server decides it needs a wingman task, it adds it to the end of the task list rather than to a separate folder that would get cleared before more _0 tasks are sent out. |
Send message Joined: 13 Oct 21 Posts: 44 Credit: 227,128,361 RAC: 12,803 |
Skillz, mikey, There's no huge backlog like a few months ago that will take many weeks to clear. Current queues get cleared within a couple of days or so as the numbers are in thousands and tens of thousands instead of millions or tens of millions. So work generation still occurs just somewhat irregularly. With such a huge and growing validation queue, I'm wondering why the work that's being generated is not almost all wingman/validation work. Yes, as tasks get generated (new or resends) they go to the back of the queue but the queue gets cleared out every couple of days or so, thus I'd expect validation to be occurring regularly and not be piling up. If you look at your Validation Inconclusive tasks you'll notice that there's a task In Progress or Unsent. In Validation Pending - there's nothing like that so that makes me think that validation hasn't been attempted yet on those tasks. I could be missing something but I suspect that something may be up with the validator. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Skillz, mikey, That makes sense why it's the way it is then, thanks for seeing that. A lot of people have had 'ghost tasks' in the past and most projects get a few of them here and there; it will time out for me and then hopefully get sent out for real to someone else. My current numbers are: In progress (957) · Validation pending (7855) · Validation inconclusive (484) · Valid (3588) · Invalid (2) · Error (5). That is a few more errors than the other day and A LOT more Validation pending tasks. The Validation inconclusive tasks have also gone up, but the number of Valid tasks is still about the same. You can see I have a couple hundred more tasks in progress, and that's because Einstein is running out of their GRP#1 tasks until they get a new batch at the end of the month. Posted on 17 Oct: In progress (747) · Validation pending (4510) · Validation inconclusive (256) · Valid (3269) · Invalid (2) · Error (1) |
Send message Joined: 28 May 17 Posts: 76 Credit: 4,398,910,125 RAC: 128 |
Oh yes, you are right. I didn't realize the work waiting to send was so low. I just assumed it was like last time and the work ready to send was way up there. Something is wrong with the validator. It seems every time the server restarts it starts working down the validation backlog, but then it stops and the pending validations just start to rise again. Whatever the server does when it first starts up and starts validating tasks isn't able to keep up with the number of tasks being returned, so the few thousand it does get overshadowed within a day by the returns. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Skillz, mikey, That's a good thing. If you look at your Validation Inconclusive tasks you'll notice that there's a task In Progress or Unsent. In Validation Pending - there's nothing like that so that makes me think that validation hasn't been attempted yet on those tasks. I could be missing something but I suspect that something may be up with the validator. The first part makes sense, and I guess that is why it's growing. I agree that the validator seems to have a problem; whether that's the memory problem Tom was talking about or something else, I don't know. |
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
As far as I can tell, the current problem with validation did not start until early September, when the profile of Nbody jobs changed from a few minutes to an unspecified number of hours, even days. Ever since then the total waiting for validation has escalated. All the Nbody jobs I did had been aborted at least twice before I got them. Whether they will ever validate I don't know. The coincidence between increasing Nbody run times and the validation backlog surely needs checking out? |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
As far as I can tell, the current problem with validation did not start until early September, when the profile of Nbody jobs changed from a few minutes to an unspecified number of hours, even days. Ever since then the total waiting for validation has escalated. All the Nbody jobs I did had been aborted at least twice before I got them. Whether they will ever validate I don't know. The coincidence between increasing Nbody run times and the validation backlog surely needs checking out? I agree |
Send message Joined: 31 Oct 10 Posts: 15 Credit: 281,009,768 RAC: 2 |
I have stopped running for now. Yesterday my count was Validation pending (4414); this morning it is at Validation pending (3550), without me running any WUs. I will stay stopped for the weekend and see what the count is on Monday. |
Send message Joined: 13 Oct 21 Posts: 44 Credit: 227,128,361 RAC: 12,803 |
Validation is still happening, just at a very slow pace, so if one stops crunching, a reduction in one's Validation Pending is to be expected. Even without stopping there are occasional temporary reductions. However, users stopping is likely to slow things down even more, as there will be even fewer machines doing the little validation that is happening. Unfortunately, it seems unlikely that things will get fixed until after the server migration. I haven't run N-Body much over the last few months. Is N-Body validation also a problem or is it just Separation? |
Send message Joined: 28 May 22 Posts: 17 Credit: 402,111,833 RAC: 0 |
Suddenly, current tasks I've just returned are showing Completed and validated! Martin |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Suddenly, current tasks I've just returned are showing Completed and validated ! WOO HOO!!! |
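
The mechanics argued over in this thread can be sketched as a toy model. This is not BOINC server code. The names `WorkUnit`, `report`, `transition`, `validate`, the `pool_has_room` flag, and a quorum of 2 are all assumptions made for illustration. The sketch combines the transitioner/validator hand-off described in the first post with the explanation, offered by some posters and questioned by others, that wingman tasks are not created while the pool of unsent tasks is too full.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    PENDING = "validation pending"
    INCONCLUSIVE = "validation inconclusive"
    VALID = "valid"


@dataclass
class WorkUnit:
    min_quorum: int                               # agreeing results required (assumed: 2)
    returned: list = field(default_factory=list)  # result values reported so far
    outstanding: int = 0                          # tasks created but not yet returned
    needed: int = 0                               # results wanted before (re)validating
    outcome: Outcome = Outcome.PENDING

    def __post_init__(self):
        if self.needed == 0:
            self.needed = self.min_quorum


def report(wu: WorkUnit, value: float) -> None:
    """A host returns a result for this work unit."""
    if wu.outstanding > 0:
        wu.outstanding -= 1
    wu.returned.append(value)


def transition(wu: WorkUnit, pool_has_room: bool) -> str:
    """Hypothetical transitioner step: decide what happens next."""
    if wu.outcome is Outcome.VALID:
        return "done"
    if len(wu.returned) >= wu.needed:
        return "validate"                 # hand the work unit to the validator
    missing = wu.needed - len(wu.returned) - wu.outstanding
    if missing > 0:
        if not pool_has_room:
            return "stalled"              # result sits in "validation pending"
        wu.outstanding += missing         # create the wingman task(s)
        return "wingman created"
    return "waiting"                      # wingman exists but has not reported


def validate(wu: WorkUnit) -> Outcome:
    """Hypothetical validator step: look for min_quorum matching results."""
    _, count = Counter(wu.returned).most_common(1)[0]
    if count >= wu.min_quorum:
        wu.outcome = Outcome.VALID
    else:
        wu.outcome = Outcome.INCONCLUSIVE
        wu.needed = len(wu.returned) + 1  # ask for one more result, then retry
    return wu.outcome


if __name__ == "__main__":
    wu = WorkUnit(min_quorum=2)
    report(wu, 1.0)                                # first (_0) result comes back
    print(transition(wu, pool_has_room=False))     # no wingman while pool is full
    print(transition(wu, pool_has_room=True))      # wingman gets created
    report(wu, 1.0)                                # wingman agrees
    print(transition(wu, pool_has_room=True), validate(wu))
```

In this model a returned result stays in "validation pending" for as long as `transition` keeps answering "stalled" or "waiting", and only moves to "validation inconclusive" after `validate` has actually run and found no agreement. That matches the observation in the thread that inconclusive work units show an extra task In Progress or Unsent while pending ones do not, but it does not settle whether the real cause was the generator or the validator.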
©2024 Astroinformatics Group