Admin Updates Discussion
Author | Message |
---|---|
Send message Joined: 23 Dec 18 Posts: 23 Credit: 10,214,542 RAC: 102 |
As WUs stopped being validated and credited as usual without any official explanation, I felt like reallocating resources towards another project rather than wasting power. This is coming from a volunteer not very versed in BOINC's inner workings. We can speculate on the reason why no credits are being given, but I won't be convinced until we hear back from the developers. Everything stays But it still changes Ever so slightly Daily and nightly In little ways When everything stays... |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
... but that's nothing we can change by not crunching for Milkyway anyway, only project admin can fix it). You are welcome to do what you like, but they are making a MILLION tasks for us to crunch. I don't think some people quitting crunching is something they are going to notice; with a MILLION tasks they most certainly have very long-term goals. |
Send message Joined: 16 Mar 10 Posts: 213 Credit: 108,975,727 RAC: 29,966 |
Link, interesting comments... The transitioner code I looked at has a backstop mechanism whereby in some code paths it sets a safety-net time for when it should next look at the workunit[1]. So it presumably doesn't matter whether the retry has been sent out or is still stuck at Waiting to be sent with no deadline - the transitioner will look at the workunit anyway and will act the same as it does when it sees certain sorts of non-success returns (where it seems to push a retry into the feeder at once rather than just queueing a request...) An example from elsewhere... At the time of writing, WCG seems to be having problems getting retries issued in some circumstances; eventually the transitioner seems to notice there has been no activity and the retry tasks get sent out. It takes 6 days for that to happen, which just happens to be the deadline length... That happens however at the deadline of any of the completed tasks, not the new ones, they don't have a deadline yet, they get it when they are sent out. When that happens, my guess is, that the validator will put them back to the inconclusive state. You may be right about how that happened; it makes more sense than the validator doing it :-), and would suggest that the "Is there any point in only sending out one initial task?" issue remains (i.e. the first result is extremely unlikely to go valid at once, rather than Inconclusive)[2]... alanb1951 wrote: Both results are different, so actually they are inconclusive. I don't think the validator marked them as valid, it was more likely Kevin trying to get rid of all separation tasks by marking all inconclusive results as valid in the hope they will be purged from the database after that.
Well, they are still there, more than 48 hours after becoming valid, so that didn't work I guess. It seems that the MilkyWay validator can mark the first result Valid without calling it the canonical result (and hence not awarding a credit score or invoking the assimilator!) Here is a workunit which even has two completed-and-validated 0.00-credit results at the moment, plus one task in progress: 963764114 However, patching like that wouldn't work to get rid of the orphaned Separation results because the related workunits aren't there any longer and [as far as I can tell] the purge system operates on WUs, not individual results :-) -- I fear that the only way to be rid of them is to explicitly hack out all traces of results that have very high result-IDs and which don't have a valid workunit-ID value, and that's not a task I'd want to do without shutting down all BOINC activity for as long as it takes to do a full backup and the "hack" (which sounds familiar from comments back when Separation was shut down.) If the strange valid tags were the result of a database hack (either explicit or using something in the Admin toolkit [which would be broken if it did that!]), it will be interesting to see what happens when the third result comes in :-) It shouldn't struggle to pick a canonical result, but... Cheers - Al P.S. I hope we aren't "talking past one another"... [1] One situation where this happens is if a retry is requested when there are no other tasks still out in the field for a workunit; if there are tasks out there, it usually seems to leave the existing "next look" time in place (and there are other cases that will keep a shorter wait time on record, if I recall correctly...) [2] That said, I don't know whether turning off the BOINC Adaptive Replication status for the application might break the TAO logic. |
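The backstop idea described in this post can be pictured with a toy model. This is only a sketch under stated assumptions, not the real BOINC transitioner: the `Workunit` class, the `next_look` and `retry_unsent` fields, and `transitioner_pass` are all hypothetical names made up for illustration, and the 6-day figure is simply the deadline length mentioned for WCG.

```python
from dataclasses import dataclass

SIX_DAYS = 6 * 24 * 3600  # deadline length quoted in the post, in seconds


@dataclass
class Workunit:
    wu_id: int
    next_look: int      # hypothetical "safety-net" time, seconds since creation
    retry_unsent: bool  # a retry exists but is still "Waiting to be sent"


def transitioner_pass(workunits, now):
    """Return IDs of workunits whose safety-net time has passed
    and which still have a retry that was never sent out."""
    return [wu.wu_id for wu in workunits
            if wu.next_look <= now and wu.retry_unsent]


wus = [Workunit(1, SIX_DAYS, True), Workunit(2, SIX_DAYS, False)]
early = transitioner_pass(wus, now=SIX_DAYS - 1)  # one second too soon
late = transitioner_pass(wus, now=SIX_DAYS)       # safety-net time reached
```

In this model nothing happens to a stuck retry until the safety-net time arrives, which would match the observation that retries only appear after one deadline length has passed.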
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
However, patching like that wouldn't work to get rid of the orphaned Separation results because the related workunits aren't there any longer. Sure they are, just click on any separation WU and you get a list of tasks for that WU. So they are still in the database incl. all results (they are in std_err, not separate files), the corresponding IDs are valid (why shouldn't they be, the ID becomes invalid when the WU is purged from db), but they are not purged for the same reason as the N-Body WUs "validated" by Kevin: no canonical result. This WU for example can be purged, but not all those without a canonical result. If the strange valid tags were the result of a database hack (either explicit or using something in the Admin toolkit [which would be broken if it did that!]), it will be interesting to see what happens when the third result comes in :-) It shouldn't struggle to pick a canonical result, but... You mean for Separation or N-Body? Separation will be stuck in waiting for validation, while the validator for N-Body will simply do its job (unless something is completely broken because of the hack/forced validation). |
Send message Joined: 16 Mar 10 Posts: 213 Credit: 108,975,727 RAC: 29,966 |
Link -- sorry if my wording wasn't clear enough; I'll try to clarify... I should perhaps have defined "orphan"... My remark was about Separation tasks that have low task numbers and workunit numbers such as 2141411706 -- good luck finding anything other than "Unable to handle request: can't find workunit" in those cases! :-) I didn't regard Separation tasks from after the mass WU/task renumbering that was needed in early 2021 as orphaned; their [parent] WUs are usually still present! (Most of my left-over Separation tasks are from 2021!) However, patching like that wouldn't work to get rid of the orphaned Separation results because the related workunits aren't there any longer. Sure they are, just click on any separation WU and you get a list of tasks for that WU. So they are still in the database incl. all results (they are in std_err, not separate files), the corresponding IDs are valid (why shouldn't they be, the ID becomes invalid when the WU is purged from db), but they are not purged for the same reason as the N-Body WUs "validated" by Kevin: no canonical result. This WU for example can be purged, but not all those without a canonical result. In this case I was talking about NBody; sorry if that wasn't clear from the context :-) If the strange valid tags were the result of a database hack (either explicit or using something in the Admin toolkit [which would be broken if it did that!]), it will be interesting to see what happens when the third result comes in :-) It shouldn't struggle to pick a canonical result, but... You mean for Separation or N-Body? Separation will be stuck in waiting for validation, while the validator for N-Body will simply do its job (unless something is completely broken because of the hack/forced validation). Cheers - Al. |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
My remark was about Separation tasks that have low task numbers and workunit numbers such as 2141411706 -- good luck finding anything other than "Unable to handle request: can't find workunit" in those cases! :-) I didn't regard Separation tasks from after the mass WU/task renumbering that was needed in early 2021 as orphaned; their [parent] WUs are usually still present! (Most of my left-over Separation tasks are from 2021!) Ah, OK, I wasn't crunching Separation in 2021, so I don't have those in my list, only what was left when they "finished" it. Anyway, now they will have to remove all Separation WUs manually, if they can't "simply" find and delete Separation WUs and results. AFAICT anything with WU number 953xxxxxx and task number 912xxxxxx (and below) can be deleted; we are now at WU number 96xxxxxxx and task number 93xxxxxxx. Well, anything below those and then any 10-digit WU numbers, since the example you posted is a 10-digit number and we are at 9-digit numbers. Now that I see those numbers I remember they had to start over with counting as they had reached the 2^31 limit. Perhaps BOINC needs to be updated to 64-bit. ;-) |
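For a sense of scale on the 2^31 remark, here is a quick bit of arithmetic. It assumes the IDs were held in a signed 32-bit integer, which is what the "2^31 limit" suggests but is not confirmed in the thread; the only number taken from the discussion is the example workunit ID 2141411706.

```python
INT32_MAX = 2**31 - 1        # 2147483647, largest signed 32-bit value
INT64_MAX = 2**63 - 1        # largest signed 64-bit value, for comparison

example_wu_id = 2141411706   # 10-digit workunit number quoted in the thread

# How many more IDs could have been issued after this one before overflow
headroom = INT32_MAX - example_wu_id   # 6071941

# Fraction of the 32-bit ID space already used at that point
used_fraction = example_wu_id / INT32_MAX
```

So the quoted workunit sits roughly six million IDs short of the ceiling, which fits the recollection that the counters had to be restarted.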
Send message Joined: 23 Dec 18 Posts: 23 Credit: 10,214,542 RAC: 102 |
Seems like the WUs previously marked valid but zero-credit have been moved to validation inconclusive again. Is anyone else seeing this? Hopefully this means we will now get those validation WUs sent properly. Everything stays But it still changes Ever so slightly Daily and nightly In little ways When everything stays... |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
Yes, same here. Probably the mechanism alanb1951 was talking about kicked in. alanb1951 wrote: The transitioner code I looked at has a backstop mechanism whereby in some code paths it sets a safety-net time for when it should next look at the workunit[1]. So it presumably doesn't matter whether the retry has been sent out or is still stuck at Waiting to be sent with no deadline - the transitioner will look at the workunit anyway and will act the same as it does when it sees certain sorts of non-success returns |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
Kevin Roux wrote: Question No warnings here, but they have been gone for me for a while. |
Send message Joined: 4 Jul 09 Posts: 99 Credit: 17,434,413 RAC: 2,338 |
The number of Tasks ready to send is still increasing.... so the correction to fix the oversupply is probably not working quite as intended. I may be wrong, but the number of tasks actually in progress seems smaller than I think it should be. Are Tasks not being released and assigned quite right? Thanks Bill F In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; There was no expiration date. |
Send message Joined: 1 Jan 17 Posts: 39 Credit: 113,155,484 RAC: 40,539 |
@Kevin Roux, thanks for your continuous work on fixes and improvements! Bill F wrote: The number of Tasks ready to send is still increasing.... so the correction to fix the over supply is probably not working quite as intended. I'd say it is stagnating. Bill F wrote: I may be wrong but the number of tasks actually being in progress seems smaller than I think it should be. Are Tasks not being released and assigned quite right? This figure is fluctuating around a somewhat constant level. Have a look for yourself: server_stats.php history of the past 30 days -- https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway@home&from=now-30d&to=now |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
This figure is fluctuating around a somewhat constant level. Yes, and it should start to drop once we are through the pile of _0 tasks and start processing the resends. Until then (on average) we report one task, we get a replacement, and for the reported task a resend task is created, so the amount of ready to send tasks is pretty constant. |
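The steady-state argument above can be checked with a small toy simulation. It is a sketch only: it assumes a strict first-in-first-out queue, that every _0 task triggers exactly one _1 resend, and that every resend validates without further retries. None of that is confirmed for the real scheduler, and `simulate` is a made-up helper name.

```python
from collections import deque


def simulate(initial_tasks, steps):
    """Each step one queued task is sent out and reported.
    A first-pass (_0) task puts one resend (_1) at the back of the queue;
    a resend creates nothing new. Returns the queue length after each step."""
    queue = deque(["_0"] * initial_tasks)
    history = []
    for _ in range(steps):
        if not queue:
            break
        task = queue.popleft()
        if task == "_0":
            queue.append("_1")
        history.append(len(queue))
    return history


history = simulate(initial_tasks=3, steps=6)
```

With three initial tasks the ready-to-send count stays at 3 while _0 tasks are being worked through and only then falls to 0, which is the flat-then-dropping shape described in the post.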
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
This figure is fluctuating around a somewhat constant level. Yes, and it should start to drop once we are through the pile of _0 tasks and start processing the resends. Until then (on average) we report one task, we get a replacement, and for the reported task a resend task is created, so the amount of ready to send tasks is pretty constant. Personally I wish they could insert the _1 task at the beginning of the list so people aren't waiting as long; either that, or just generate both the _0 and _1 tasks at the same time and then delay the _1 task by 1 or 2 days in case it's not needed. Then, if it's not needed after it's sent out, just delete it on the server side so that it gets deleted from us crunchers too. |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
Once we have processed that huge pile of WUs and are back to the new buffer of 10000 ready-to-send workunits, this type of micromanagement won't be necessary. Perhaps we can even return to 1000; AFAICT there were no issues with that. |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,225 RAC: 0 |
Seems like something helped the server. Image source is in the header of the image or is available here: https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway@home&from=now-7d&to=now |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
All my Separation tasks are gone. *thumbsup* |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
All my Separation tasks are gone. *thumbsup* mine too WOO HOO!!! |
Send message Joined: 18 Feb 10 Posts: 57 Credit: 222,646,444 RAC: 5,838 |
All my Separation tasks are gone. *thumbsup* They're not gone from my 2 hosts, maybe it will happen soon... |
Send message Joined: 19 Jul 10 Posts: 627 Credit: 19,362,373 RAC: 3,550 |
They're not gone from my 2 hosts, maybe it will happen soon... Those are from 2021; it looks like Kevin will have to look at those separately. The tasks exist, but not the WUs, so it might be a bit more complicated. |
Send message Joined: 16 Mar 10 Posts: 213 Credit: 108,975,727 RAC: 29,966 |
They're not gone from my 2 hosts, maybe it will happen soon... Those are from 2021, looks like Kevin will have to look at those separately, the tasks exist, but not the WUs, so it might be a bit more complicated. There is a fairly simple script for SysAdmins in the source repository that looks as if it would do the trick if it is present on the MW site. Its name is delete_orphan_results.php :-) Cheers - Al. |
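To make "the tasks exist, but not the WUs" concrete, here is a minimal sketch of how orphaned results could be found. It does not reproduce delete_orphan_results.php, whose contents are not shown in this thread; the two-table schema, the column names and the sample IDs are simplified assumptions for illustration, run against an in-memory SQLite database rather than a project database.

```python
import sqlite3

# Toy schema: a result row points at its parent workunit by ID (assumed layout)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE workunit (id INTEGER PRIMARY KEY);
CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER);
INSERT INTO workunit (id) VALUES (10), (11);
INSERT INTO result (id, workunitid) VALUES (100, 10), (101, 11), (102, 999);
""")

# An orphan is a result whose parent workunit row no longer exists
orphans = [row[0] for row in conn.execute("""
    SELECT r.id
    FROM result AS r
    LEFT JOIN workunit AS w ON w.id = r.workunitid
    WHERE w.id IS NULL
    ORDER BY r.id
""")]
conn.close()
```

Listing the orphans first, before deleting anything, is the cautious order of operations, in line with the earlier advice about taking a full backup before any such "hack".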
©2024 Astroinformatics Group