Admin Updates Discussion
Joined: 9 Aug 22; Posts: 82; Credit: 2,839,211; RAC: 6,824
Joined: 19 Jul 10; Posts: 624; Credit: 19,294,135; RAC: 2,403
> Cancelled all of the "Ready to send" separation workunits

I guess you need to cancel the separation tasks that are "waiting for assimilation" now, so they can finally be removed from our results lists.

Regarding one other change that appeared after the server maintenance: is ~3 million ready-to-send N-Body tasks the new target for the work generators? At that level it will take up to two weeks before the _1 and any further resend tasks make it through that pile (when we had 1.5 million, they needed around 5-7 days). And since N-Body WUs seem to always need two results to validate, wouldn't it make sense to set the minimum quorum, and with it the initial replication, to 2? Or are there any WUs that validate with only one result?
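The estimate of up to two weeks follows from simple proportional scaling. A minimal sketch, assuming the queue drains at the same rate as when it held 1.5 million tasks and that a resend has to wait behind roughly the whole queue (the real feeder need not work strictly first-in, first-out). The helper name `drain_days` is illustrative only.

```python
def drain_days(queue_size: float, ref_size: float, ref_days: float) -> float:
    """Days to work through `queue_size` tasks at the rate implied by a past observation."""
    rate_per_day = ref_size / ref_days  # tasks processed per day in the reference period
    return queue_size / rate_per_day


# Past observation from the post: a 1.5 million task queue took about 5-7 days.
for ref_days in (5, 7):
    days = drain_days(3_000_000, 1_500_000, ref_days)
    print(f"{ref_days} days per 1.5M -> {days:.0f} days for 3M")
```

With the 5-day figure this gives 10 days, and with the 7-day figure 14 days, which matches the "up to two weeks" estimate.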
Joined: 23 Dec 18; Posts: 23; Credit: 10,213,119; RAC: 0
I haven't had any tasks validated for the past two days now, with 80+ stuck at "validation inconclusive" because of that. Maybe we should prioritize sending WUs that need validation?

Everything stays
But it still changes
Ever so slightly
Daily and nightly
In little ways
When everything stays...
Joined: 9 Aug 22; Posts: 82; Credit: 2,839,211; RAC: 6,824
> I guess you need to cancel separation tasks "waiting for assimilation" now, so they can finally be removed from our results lists.

I am not sure why 3 million workunits were generated. The cap was set pretty low, to 1000 (now 10,000), but the generator just ignored it, and I still haven't found out why. I will try to go in and remove/cancel these workunits so validations can be done.
Joined: 8 May 09; Posts: 3339; Credit: 524,010,781; RAC: 0
> > I guess you need to cancel separation tasks "waiting for assimilation" now, so they can finally be removed from our results lists.
> I am not sure why 3 million workunits were generated. The cap was set pretty low to 1000 (now 10,000) but it just ignored that which I still haven't found the reason why. I will try to go in and remove/cancel these workunits so validations can be done.

I obviously have no clue how familiar you are with the BOINC server-side code, but apparently things live in SEVERAL places in the code instead of just one. For example, an admin at a different project tried to change the credits given out for a task and found they were hard-coded in at least 3 different sections. I'm NOT asking for a credit award change; I'm just using it as an example. Also, I don't know if you know it, but there is a BOINC admin email group where you can ask questions of other BOINC project admins who may have already been through what you are, I'm assuming, still learning about.
Joined: 19 Jul 10; Posts: 624; Credit: 19,294,135; RAC: 2,403
> I am not sure why 3 million workunits were generated. The cap was set pretty low to 1000 (now 10,000) but it just ignored that which I still haven't found the reason why.

It has happened in the past after server maintenance that lots of N-Body tasks were generated during the maintenance, but then the number dropped to 1000 as the tasks were processed. This time, however, the work generators maintain the 3 million ready-to-send WUs, which is why I asked. 1000 actually worked pretty well AFAICT: enough to always get new work when asking for it, while resend tasks went out just a few minutes after they were created. 10,000 should work too, I guess. If you can get the work generators to follow the limit, the issue will clear itself; there's no need to abort anything.
Joined: 16 Mar 10; Posts: 213; Credit: 108,358,235; RAC: 4,774
> > I am not sure why 3 million workunits were generated. The cap was set pretty low to 1000 (now 10,000) but it just ignored that which I still haven't found the reason why.
> It happened in the past after server maintenance, that during the maintenance lots of N-Body tasks were generated, but than it dropped to 1000 as the tasks were processed. This time however the work generators maintain the 3 millions of ready to send WUs, that's why I asked. 1000 worked pretty well actually AFAICT, enough to always get new work when asking for it while resend tasks were out just few minutes after they were created, but 10,000 should work too I guess. If you can get the work generators to follow the limit, the issue will clear itself, no need to abort anything.

I think there may be an issue in the base MilkyWay WU generator that could cause runaway WU creation if there is a transitioner backlog. If the new build is using the original generator, any fixes applied might have been lost :-(

I did a bit of a code dive at the time of the previous occurrence of this issue, and I posted about it (without going into too much technical detail) in a thread called "Server Trouble". I also sent Tom a private message with details of a possible solution, based on how recent versions of the example BOINC WU generator code make sure that transitioner backlogs don't cause a problem. I have no idea whether any fix applied bore any resemblance to what I highlighted :-)

Just a thought...

Cheers - Al.
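The runaway-generation mechanism suggested in this post can be illustrated with a toy model. This is a hypothetical sketch, not the MilkyWay@home generator or BOINC's sample work generator: it assumes the generator decides how much work to create by counting unsent results, and that a new workunit only gets results once the transitioner processes it. Under those assumptions, a generator that counts only unsent results keeps creating work while the transitioner lags, whereas one that also counts workunits still waiting for the transitioner stays at the cap. All names (`ToyServer`, `generate`, `simulate`) and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyServer:
    """Hypothetical model: a workunit only becomes an unsent result after the transitioner runs."""
    untransitioned_wus: int = 0  # workunits created, but no results made for them yet
    unsent_results: int = 0      # results sitting in the ready-to-send queue

    def transition(self, batch: int) -> None:
        # A slow transitioner handles at most `batch` new workunits per pass,
        # creating one initial result for each (initial replication 1).
        done = min(batch, self.untransitioned_wus)
        self.untransitioned_wus -= done
        self.unsent_results += done


def generate(server: ToyServer, cap: int, count_backlog: bool) -> int:
    """Top the queue up to `cap` and return how many workunits were created."""
    pending = server.unsent_results
    if count_backlog:
        # Also count workunits the transitioner has not turned into results yet.
        pending += server.untransitioned_wus
    created = max(0, cap - pending)
    server.untransitioned_wus += created
    return created


def simulate(count_backlog: bool, passes: int = 100, cap: int = 1000, batch: int = 10) -> int:
    """Alternate generator and transitioner passes with no clients fetching work; return queued work."""
    server = ToyServer()
    for _ in range(passes):
        generate(server, cap, count_backlog)
        server.transition(batch)
    return server.untransitioned_wus + server.unsent_results


print(simulate(count_backlog=False))  # 50500, far past the cap of 1000
print(simulate(count_backlog=True))   # 1000
```

No clients fetch work in this model, so the numbers only show the gap between the two counting strategies, not real queue sizes.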
Joined: 2 Aug 20; Posts: 1; Credit: 14,835,833; RAC: 51,537
It seems that all of my pending jobs have now been marked as validated with no credit. Is this a temporary problem? (e.g. https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=963887023)
Joined: 9 Aug 22; Posts: 82; Credit: 2,839,211; RAC: 6,824
> It seems that all of my pending jobs have now been marked as validated with no credit - is this a temporary problem? (ie https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=963887023)

I will take a look at that.
Joined: 27 Jun 14; Posts: 3; Credit: 3,445,901; RAC: 0
Same issue here: all of yesterday's and today's WUs validated with 0.00 credits.
Joined: 19 Jul 10; Posts: 624; Credit: 19,294,135; RAC: 2,403
Same here. Interestingly, the resend tasks waiting in the queue have not been cancelled; they are still "unsent" rather than "didn't need", which is what usually happens when a WU validates late, after a resend task has been created but before it was sent out.
Joined: 8 May 09; Posts: 3339; Credit: 524,010,781; RAC: 0
> same issue here all of yesterday and today WUs validated with 0.00 credits

Mine too, but there is a new thing, I THINK: it now says "initial replication 2".

name: de_nbody_11_02_2023_v183_pal5__data__3_1705426859_64425
application: Milkyway@home N-Body Simulation
created: 16 Jan 2024, 17:51:23 UTC
minimum quorum: 1
initial replication: 2

Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application |
---|---|---|---|---|---|---|---|---|
932928572 | 857711 | 19 Jan 2024, 7:50:31 UTC | 19 Jan 2024, 18:27:19 UTC | Completed and validated | 24,333.81 | 45,806.73 | 0.00 | Milkyway@home N-Body Simulation v1.83 (mt) windows_x86_64 |
Joined: 19 Jul 10; Posts: 624; Credit: 19,294,135; RAC: 2,403
> Mine too but there is a new thing I THINK, it now says "initial replication 2"

No, that's not new. For some reason, "initial replication" here always increases by one when a returned result becomes inconclusive; for the old separation tasks you can even find an initial replication of 3 if two results had already been returned. WUs whose first result has not been returned yet still have an initial replication of 1.
Joined: 23 Dec 18; Posts: 23; Credit: 10,213,119; RAC: 0
Well, I'll probably stop crunching for a bit till this gets sorted.
Joined: 27 Jun 14; Posts: 3; Credit: 3,445,901; RAC: 0
> Well i'll probably stop crunching for a bit till this gets sorted

Agreed. We're donating, not squandering, especially with the current cost of electricity.
Joined: 16 Mar 10; Posts: 213; Credit: 108,358,235; RAC: 4,774
Regarding validated results without credit...

I've just sifted through [most of] my NBody results that are listed as Valid, and I note that the ones with no credit don't seem to have a canonical result yet (the validator could pass them to the assimilator if they did!), and they all have an unsent task or a wingman in progress. I also have tasks Pending Verification (listed as "Validation inconclusive" here), and those all have an unsent task and say "pending" in the credit column instead (as one might expect...)

It seems that the MilkyWay validator can mark the first result Valid without calling it the canonical result (and hence without awarding a credit score or invoking the assimilator!). Given the use of the Toolkit for Asynchronous Optimization, this may actually be intentional¹. I think that when the previous runaway WU generation problems happened, Adaptive Replication wasn't working properly for NBody (I never saw a task that didn't get wingmen!), so it may not have shown the same behaviour back then, leaving the first result Pending rather than Valid!

It may well sort itself out as tasks get their confirmation results validated... I have just seen two results I returned nearly a fortnight ago that eventually got a wingman to return something on the 20th; both of those have a credit score now, and they hadn't been fully assimilated (purged) yet. How long it takes the others to catch up may depend on other factors, such as how many other WUs have tasks (initial or retry) waiting to go out, and the choice of WUs considered by the feeder on each pass. Hopefully it won't take the length of the deadline interval (12 days?) to get round to forcing the retries into the feeder², but I fear that it might :-(

Cheers - Al.

¹ The TAO puts an extra layer of decision into the validation process, and [from a cursory investigation during the previous crash and runaway] it seems it can decide whether retries are necessary based on the outcome of previous workunits. (I'm willing to be told otherwise by someone at MW or by Travis Desell...)

² An example from elsewhere... At the time of writing, WCG seems to be having problems getting retries issued in some circumstances; eventually the transitioner seems to notice there has been no activity, and the retry tasks get sent out. It takes 6 days for that to happen, which just happens to be the deadline length...
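The hypothesis in this post, that a result can be marked Valid before its workunit has a canonical result, can be sketched as a small state model. This is illustrative only: `ToyWorkunit` and its methods are hypothetical, and the rule that credit is granted only once a canonical result exists is the assumption being modelled, not a description of the MilkyWay validator or the TAO.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyWorkunit:
    """Hypothetical model of the behaviour described above; not BOINC's real schema or validator."""
    credit_per_result: float
    canonical_result_id: Optional[int] = None
    states: Dict[int, str] = field(default_factory=dict)     # result id -> "inconclusive" or "valid"
    granted: Dict[int, float] = field(default_factory=dict)  # result id -> granted credit

    def report_inconclusive(self, rid: int) -> None:
        self.states[rid] = "inconclusive"

    def mark_valid(self, rid: int, make_canonical: bool) -> None:
        self.states[rid] = "valid"
        if make_canonical and self.canonical_result_id is None:
            self.canonical_result_id = rid
        if self.canonical_result_id is not None:
            # Assumption: credit is granted only once the workunit has a canonical
            # result, and then to every result already marked valid.
            for r, state in self.states.items():
                if state == "valid":
                    self.granted[r] = self.credit_per_result
        else:
            # Valid but no canonical result yet: nothing is granted, so 0.00 shows.
            self.granted[rid] = 0.0

    def credit_column(self, rid: int) -> str:
        """What the results page would show in the credit column for this result."""
        if self.states.get(rid) != "valid":
            return "pending"
        return f"{self.granted[rid]:.2f}"


wu = ToyWorkunit(credit_per_result=100.0)  # hypothetical credit value
wu.report_inconclusive(1)
print(wu.credit_column(1))                 # pending
wu.mark_valid(1, make_canonical=False)
print(wu.credit_column(1))                 # 0.00
wu.mark_valid(2, make_canonical=True)      # a wingman's result becomes canonical
print(wu.credit_column(1), wu.credit_column(2))  # 100.00 100.00
```

Under this assumption a "Completed and validated" result showing 0.00 would pick up its credit later, once a wingman's result is accepted, which is consistent with the two late-credited results mentioned in the post.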
Joined: 19 Jul 10; Posts: 624; Credit: 19,294,135; RAC: 2,403
> Well i'll probably stop crunching for a bit till this gets sorted

Probably not necessary IMHO. Results returned now seem to become inconclusive as usual, so they should validate one day, and the more we crunch, the sooner that will happen. The _1+ tasks needed for proper validation of the 0-credit tasks have also not been cancelled after the "validation", so that should sort itself out as well once they are sent out and returned (unless the servers decide they are not needed before sending them out, but that's nothing we can change by not crunching for MilkyWay anyway; only the project admin can fix it).
Joined: 1 Jan 17; Posts: 37; Credit: 111,019,624; RAC: 38,592
> alanb1951 wrote: It seems that the MilkyWay validator can mark the first result Valid without calling it the canonical result (and hence not awarding a credit score or invoking the assimilator!)

Here is a workunit which even has two completed-and-validated 0.00-credit results at the moment, plus one task in progress: 963764114

PS: here is a timeline of various server_status.php data, if it helps in any way: https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway@home&from=1704585600000&to=now
Joined: 19 Jul 10; Posts: 624; Credit: 19,294,135; RAC: 2,403
> An example from elsewhere... At the time of writing, WCG seems to be having problems getting retries issued in some circumstances; eventually the transitioner seems to notice there has been no activity and the retry tasks get sent out. It takes 6 days for that to happen, which just happens to be the deadline length...

That happens, however, at the deadline of one of the completed tasks, not of the new ones; those don't have a deadline yet, they get one when they are sent out. When that happens, my guess is that the validator will put them back into the inconclusive state.

> > alanb1951 wrote: It seems that the MilkyWay validator can mark the first result Valid without calling it the canonical result (and hence not awarding a credit score or invoking the assimilator!)
> Here is a workunit which even has two completed-and-validated 0.00-credit results at the moment, plus one task in progress: 963764114

Both results are different, so they are actually inconclusive. I don't think the validator marked them as valid; it was more likely Kevin trying to get rid of all separation tasks by marking all inconclusive results as valid, in the hope that they would be purged from the database after that. Well, they are still there, more than 48 hours after becoming valid, so I guess that didn't work.
Joined: 27 Jun 14; Posts: 3; Credit: 3,445,901; RAC: 0
> ... but that's nothing we can change by not crunching for Milkyway anyway, only project admin can fix it).

Well, by not crunching we can express our disappointment (and motivate the project admin to fix the issue).
©2024 Astroinformatics Group