Welcome to MilkyWay@home

Admin Updates Discussion

Finn the Human
Joined: 23 Dec 18
Posts: 23
Credit: 10,211,388
RAC: 768
Message 76787 - Posted: 21 Jan 2024, 21:14:52 UTC

As WUs stopped being validated and credited as usual without any official explanation, I felt like reallocating resources towards another project rather than wasting power. This is coming from a volunteer not very versed in BOINC's inner workings. We can speculate about why no credit is being given, but I won't be convinced until we hear back from the developers.
Everything stays
But it still changes
Ever so slightly
Daily and nightly
In little ways
When everything stays...

ID: 76787
mikey
Joined: 8 May 09
Posts: 3322
Credit: 520,666,371
RAC: 33,249
Message 76790 - Posted: 21 Jan 2024, 22:19:30 UTC - in response to Message 76786.  

... but that's nothing we can change by not crunching for Milkyway anyway, only project admin can fix it).


Well, by not crunching we can express our disappointment (and motivate project admin to fix the issue).


You are welcome to do what you like, but they are making a MILLION tasks for us to crunch. I don't think some people quitting crunching is something they are going to notice; with a MILLION tasks they most certainly have very long-term goals.
ID: 76790
alanb1951
Joined: 16 Mar 10
Posts: 210
Credit: 106,132,063
RAC: 23,923
Message 76791 - Posted: 22 Jan 2024, 8:23:02 UTC - in response to Message 76785.  
Last modified: 22 Jan 2024, 8:23:47 UTC

Link, interesting comments...

An example from elsewhere... At the time of writing, WCG seems to be having problems getting retries issued in some circumstances; eventually the transitioner seems to notice there has been no activity and the retry tasks get sent out. It takes 6 days for that to happen, which just happens to be the deadline length...
However, that happens at the deadline of any of the completed tasks, not the new ones; they don't have a deadline yet, as they get it when they are sent out. When that happens, my guess is that the validator will put them back to the inconclusive state.
The transitioner code I looked at has a backstop mechanism whereby in some code paths it sets a safety-net time for when it should next look at the workunit [1]. So it presumably doesn't matter whether the retry has been sent out or is still stuck at Waiting to be sent with no deadline - the transitioner will look at the workunit anyway and will act the same as it does when it sees certain sorts of non-success returns (where it seems to push a retry into the feeder at once rather than just queueing a request...)
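A toy model of the backstop described above might look like the sketch below. It is not BOINC's transitioner code: the `Workunit` class, its field names, and the six-day delay are assumptions made for illustration (the six days is borrowed from the WCG deadline length mentioned above). It only shows the idea that a "next look" time is set when a retry is requested and nothing else is out in the field, and is otherwise left alone.

```python
from dataclasses import dataclass

SIX_DAYS = 6 * 24 * 3600  # assumed safety-net delay, in seconds


@dataclass
class Workunit:
    # Hypothetical, simplified stand-in for a server-side workunit record.
    tasks_in_progress: int = 0
    retry_unsent: bool = False
    next_look: float = float("inf")  # when the transitioner should revisit the WU


def request_retry(wu: Workunit, now: float) -> None:
    """Queue a retry; set a backstop time only if no other task is in the field."""
    wu.retry_unsent = True
    if wu.tasks_in_progress == 0:
        wu.next_look = min(wu.next_look, now + SIX_DAYS)
    # Otherwise the existing "next look" time is left in place.


def is_due(wu: Workunit, now: float) -> bool:
    """True if the workunit is due for a look on this transitioner pass."""
    return now >= wu.next_look


wu = Workunit()
request_retry(wu, now=0.0)
due_early = is_due(wu, now=SIX_DAYS - 1)  # retry still unsent, backstop not reached
due_late = is_due(wu, now=SIX_DAYS)       # backstop reached, WU gets looked at anyway
```

In this model the workunit is revisited at the backstop time whether or not the retry has left the queue, which matches the behaviour guessed at above.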

alanb1951 wrote:
It seems that the MilkyWay validator can mark the first result Valid without calling it the canonical result (and hence not awarding a credit score or invoking the assimilator!)
Here is a workunit which even has two completed-and-validated 0.00-credit results at the moment, plus one task in progress: 963764114
Both results are different, so actually they are inconclusive. I don't think the validator marked them as valid; it was more likely Kevin trying to get rid of all Separation tasks by marking all inconclusive results as valid in the hope they would be purged from the database after that. Well, they are still there, more than 48 hours after becoming valid, so that didn't work, I guess.
You may be right about how that happened; it makes more sense than the validator doing it :-), and would suggest that the "Is there any point in only sending out one initial task?" issue remains (i.e. the first result is extremely unlikely to go valid at once, rather than Inconclusive) [2]...

However, patching like that wouldn't work to get rid of the orphaned Separation results because the related workunits aren't there any longer and [as far as I can tell] the purge system operates on WUs, not individual results :-) -- I fear that the only way to be rid of them is to explicitly hack out all traces of results that have very high result-IDs and which don't have a valid workunit-ID value, and that's not a task I'd want to do without shutting down all BOINC activity for as long as it takes to do a full backup and the "hack" (which sounds familiar from comments back when Separation was shut down.)

If the strange valid tags were the result of a database hack (either explicit or using something in the Admin toolkit [which would be broken if it did that!]), it will be interesting to see what happens when the third result comes in :-) It shouldn't struggle to pick a canonical result, but...

Cheers - Al

P.S. I hope we aren't "talking past one another"...

[1] One situation where this happens is if a retry is requested when there are no other tasks still out in the field for a workunit; if there are tasks out there, it usually seems to leave the existing "next look" time in place (and there are other cases that will keep a shorter wait time on record, if I recall correctly...)

[2] That said, I don't know whether turning off the BOINC Adaptive Replication status for the application might break the TAO logic.
ID: 76791
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76793 - Posted: 22 Jan 2024, 9:34:48 UTC - in response to Message 76791.  

However, patching like that wouldn't work to get rid of the orphaned Separation results because the related workunits aren't there any longer
Sure they are, just click on any separation WU and you get a list of tasks for that WU. So they are still in the database incl. all results (they are in std_err, not separate files), the corresponding IDs are valid (why shouldn't they, the ID becomes invalid when the WU is purged from db), but they are not purged for the same reason as the N-Body WUs "validated" by Kevin: no canonical result. This WU for example can be purged, but not all those without a canonical result.


If the strange valid tags were the result of a database hack (either explicit or using something in the Admin toolkit [which would be broken if it did that!]), it will be interesting to see what happens when the third result comes in :-) It shouldn't struggle to pick a canonical result, but...
You mean for Separation or N-Body? Separation will be stuck in waiting for validation, while the validator for N-Body will simply do its job (unless something is completely broken because of the hack/forced validation).
ID: 76793
alanb1951
Joined: 16 Mar 10
Posts: 210
Credit: 106,132,063
RAC: 23,923
Message 76795 - Posted: 22 Jan 2024, 10:30:38 UTC - in response to Message 76793.  

Link -- sorry if my wording wasn't clear enough; I'll try to clarify...

However, patching like that wouldn't work to get rid of the orphaned Separation results because the related workunits aren't there any longer
Sure they are, just click on any separation WU and you get a list of tasks for that WU. So they are still in the database incl. all results (they are in std_err, not separate files), the corresponding IDs are valid (why shouldn't they, the ID becomes invalid when the WU is purged from db), but they are not purged for the same reason as the N-Body WUs "validated" by Kevin: no canonical result. This WU for example can be purged, but not all those without a canonical result.
I should perhaps have defined "orphan"... My remark was about Separation tasks that have low task numbers and workunit numbers such as 2141411706 -- good luck finding anything other than "Unable to handle request: can't find workunit" in those cases! :-) I didn't regard Separation tasks from after the mass WU/task renumbering that was needed in early 2021 as orphaned; their [parent] WUs are usually still present! (Most of my left-over Separation tasks are from 2021!)
If the strange valid tags were the result of a database hack (either explicit or using something in the Admin toolkit [which would be broken if it did that!]), it will be interesting to see what happens when the third result comes in :-) It shouldn't struggle to pick a canonical result, but...
You mean for Separation or N-Body? Separation will be stuck in waiting for validation, while the validator for N-Body will simply do its job (unless something is completely broken because of the hack/forced validation).
In this case I was talking about NBody; sorry if that wasn't clear from the context :-)

Cheers - Al.
ID: 76795
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76797 - Posted: 23 Jan 2024, 11:23:50 UTC - in response to Message 76795.  

My remark was about Separation tasks that have low task numbers and workunit numbers such as 2141411706 -- good luck finding anything other than "Unable to handle request: can't find workunit" in those cases! :-) I didn't regard Separation tasks from after the mass WU/task renumbering that was needed in early 2021 as orphaned; their [parent] WUs are usually still present! (Most of my left-over Separation tasks are from 2021!)
Ah, OK, I wasn't crunching Separation in 2021, so I don't have those in my list, only what was left when they "finished" it. Anyway, now they will have to remove all Separation WUs manually if they can't "simply" find and delete Separation WUs and results. AFAICT anything with WU number 953xxxxxx and task number 912xxxxxx (and below) can be deleted; we are now at WU number 96xxxxxxx and task number 93xxxxxxx. Well, anything below those, and then any 10-digit WU numbers, since the example you posted is a 10-digit number and we are at 9-digit numbers. Now that I see those numbers, I remember they had to start counting over because they had reached the 2^31 limit. Perhaps BOINC needs to be updated to 64-bit. ;-)
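To put numbers on the 2^31 remark, here is a small sketch. It assumes the IDs were stored as signed 32-bit integers, which is an inference from this thread rather than something checked against the BOINC schema; the helper names are made up for illustration.

```python
import struct

INT32_MAX = 2**31 - 1      # 2,147,483,647, largest signed 32-bit value
old_wu_id = 2141411706     # the 10-digit WU number quoted earlier in the thread

# IDs that were left before a signed 32-bit counter would run out.
headroom = INT32_MAX - old_wu_id


def fits_int32(value: int) -> bool:
    """True if value can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", value)
        return True
    except struct.error:
        return False


def fits_int64(value: int) -> bool:
    """True if value can be packed as a signed 64-bit integer."""
    try:
        struct.pack("<q", value)
        return True
    except struct.error:
        return False


print(headroom)               # 6071941
print(fits_int32(2**31))      # False
print(fits_int64(2**31))      # True
```

So the old WU number was only about six million IDs short of the limit, which fits with the renumbering having been needed around then.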
ID: 76797
Finn the Human
Joined: 23 Dec 18
Posts: 23
Credit: 10,211,388
RAC: 768
Message 76799 - Posted: 23 Jan 2024, 20:56:26 UTC

It seems the WUs previously marked valid but with zero credit have been moved to validation inconclusive again. Is anyone else seeing this? Hopefully this means those validation WUs will now get sent out properly.
Everything stays
But it still changes
Ever so slightly
Daily and nightly
In little ways
When everything stays...

ID: 76799
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76800 - Posted: 24 Jan 2024, 10:08:29 UTC - in response to Message 76799.  

Yes, same here. Probably the mechanism alanb1951 was talking about kicked in.

alanb1951 wrote:
The transitioner code I looked at has a backstop mechanism whereby in some code paths it sets a safety-net time for when it should next look at the workunit [1]. So it presumably doesn't matter whether the retry has been sent out or is still stuck at Waiting to be sent with no deadline - the transitioner will look at the workunit anyway and will act the same as it does when it sees certain sorts of non-success returns

ID: 76800
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76806 - Posted: 25 Jan 2024, 10:47:16 UTC

Kevin Roux wrote:
Question
- I do not see the PHP warnings anymore on the result pages. Is this still an issue that people are seeing? Please let me know if it is so I can take a look at it.
No warnings here, but they have been gone for me for a while.
ID: 76806
Bill F
Joined: 4 Jul 09
Posts: 87
Credit: 16,788,157
RAC: 2,908
Message 76810 - Posted: 26 Jan 2024, 5:14:34 UTC

The number of Tasks ready to send is still increasing.... so the correction to fix the oversupply is probably not working quite as intended. I may be wrong, but the number of tasks actually in progress seems smaller than I think it should be. Are Tasks not being released and assigned quite right?

Thanks
Bill F
In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic;
There was no expiration date.


ID: 76810
xii5ku
Joined: 1 Jan 17
Posts: 34
Credit: 100,662,486
RAC: 285,526
Message 76811 - Posted: 26 Jan 2024, 7:31:20 UTC - in response to Message 76810.  
Last modified: 26 Jan 2024, 7:33:09 UTC

@Kevin Roux, thanks for your continuous work on fixes and improvements!

Bill F wrote:
The number of Tasks ready to send is still increasing.... so the correction to fix the oversupply is probably not working quite as intended.
I'd say it is stagnating.

Bill F wrote:
I may be wrong, but the number of tasks actually in progress seems smaller than I think it should be. Are Tasks not being released and assigned quite right?
This figure is fluctuating around a somewhat constant level.

Have a look for yourself: server_stats.php history of the past 30 days -- https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway@home&from=now-30d&to=now
ID: 76811
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76812 - Posted: 26 Jan 2024, 10:42:05 UTC - in response to Message 76811.  

This figure is fluctuating around a somewhat constant level.
Yes, and it should start to drop once we are through the pile of _0 tasks and start processing the resends. Until then, (on average) we report one task, we get a replacement, and a resend task is created for the reported task, so the number of ready-to-send tasks stays pretty constant.
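The argument above can be checked with a toy simulation. This is only a sketch under simplifying assumptions: no new workunits are generated, the queue is strictly first-in-first-out, every _0 result spawns exactly one _1 resend, and the task fetched at each step is also the one reported back. The function name `simulate` is made up for illustration.

```python
from collections import deque


def simulate(initial_tasks: int, steps: int) -> list[int]:
    """Return the ready-to-send count after each step of the toy model."""
    queue = deque(["_0"] * initial_tasks)  # FIFO: the oldest tasks go out first
    history = []
    for _ in range(steps):
        if queue:
            task = queue.popleft()   # one task leaves the queue for a host
            if task == "_0":
                queue.append("_1")   # the reported _0 result spawns one resend
        history.append(len(queue))
    return history


history = simulate(initial_tasks=5, steps=12)
print(history)  # [5, 5, 5, 5, 5, 4, 3, 2, 1, 0, 0, 0]
```

While _0 tasks are still going out, every task leaving the queue is replaced by a resend, so the count stays flat; it only starts dropping once the resends themselves reach the front.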
ID: 76812
mikey
Joined: 8 May 09
Posts: 3322
Credit: 520,666,371
RAC: 33,249
Message 76813 - Posted: 26 Jan 2024, 11:21:15 UTC - in response to Message 76812.  

This figure is fluctuating around a somewhat constant level.
Yes, and it should start to drop once we are through the pile of _0 tasks and start processing the resends. Until then, (on average) we report one task, we get a replacement, and a resend task is created for the reported task, so the number of ready-to-send tasks stays pretty constant.


Personally, I wish they could insert the _1 task at the beginning of the list so people aren't waiting as long. Either that, or just generate both the _0 and _1 tasks at the same time and then delay the _1 task by 1 or 2 days in case it's not needed. Then, if it turns out not to be needed after it's sent out, just delete it on the server side so that it gets deleted from us crunchers too.
ID: 76813
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76814 - Posted: 26 Jan 2024, 14:26:55 UTC - in response to Message 76813.  

Once we have processed that huge pile of WUs and are back to the new buffer of 10000 ready-to-send workunits, then this type of micromanagement won't be necessary. Perhaps we can even return to 1000; AFAICT there were no issues with that.
ID: 76814
Kiska
Joined: 31 Mar 12
Posts: 94
Credit: 151,956,524
RAC: 1,103
Message 76815 - Posted: 26 Jan 2024, 15:03:10 UTC

Seems like something helped the server.


Image source is in the header of the image or is available here: https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway@home&from=now-7d&to=now
ID: 76815
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76824 - Posted: 27 Jan 2024, 12:02:27 UTC

All my Separation tasks are gone. *thumbsup*
ID: 76824
mikey
Joined: 8 May 09
Posts: 3322
Credit: 520,666,371
RAC: 33,249
Message 76828 - Posted: 28 Jan 2024, 11:43:30 UTC - in response to Message 76824.  

All my Separation tasks are gone. *thumbsup*


mine too WOO HOO!!!
ID: 76828
JohnDK
Joined: 18 Feb 10
Posts: 53
Credit: 221,728,720
RAC: 4,126
Message 76831 - Posted: 28 Jan 2024, 15:36:36 UTC - in response to Message 76828.  

All my Separation tasks are gone. *thumbsup*


mine too WOO HOO!!!

They're not gone from my 2 hosts, maybe it will happen soon...
ID: 76831
Link
Joined: 19 Jul 10
Posts: 597
Credit: 18,980,267
RAC: 5,797
Message 76832 - Posted: 28 Jan 2024, 15:40:19 UTC - in response to Message 76831.  
Last modified: 28 Jan 2024, 15:42:05 UTC

They're not gone from my 2 hosts, maybe it will happen soon...
Those are from 2021. Looks like Kevin will have to look at those separately: the tasks exist, but not the WUs, so it might be a bit more complicated.
ID: 76832
alanb1951
Joined: 16 Mar 10
Posts: 210
Credit: 106,132,063
RAC: 23,923
Message 76838 - Posted: 29 Jan 2024, 14:50:01 UTC - in response to Message 76832.  

They're not gone from my 2 hosts, maybe it will happen soon...
Those are from 2021. Looks like Kevin will have to look at those separately: the tasks exist, but not the WUs, so it might be a bit more complicated.
There is a fairly simple script for SysAdmins in the source repository that looks as if it would do the trick if it is present on the MW site.

Its name is delete_orphan_results.php :-)
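For anyone wondering what "orphan" means in database terms, here is a minimal sketch using an in-memory SQLite database. The two-table schema and the column names are a deliberately tiny, assumed stand-in for the real project database, and the query is not taken from delete_orphan_results.php (whose exact behaviour is not shown here); it only illustrates the idea of a result whose parent workunit row no longer exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Toy schema: assumed names, far simpler than a real project database.
    CREATE TABLE workunit (id INTEGER PRIMARY KEY);
    CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER);
    INSERT INTO workunit (id) VALUES (10), (11);
    INSERT INTO result (id, workunitid) VALUES (1, 10), (2, 11), (3, 99);
    """
)

# A result is an orphan when its workunitid matches no row in workunit.
orphans = conn.execute(
    """
    SELECT r.id, r.workunitid
    FROM result AS r
    LEFT JOIN workunit AS w ON w.id = r.workunitid
    WHERE w.id IS NULL
    ORDER BY r.id
    """
).fetchall()

print(orphans)  # [(3, 99)]
```

Only result 3 is reported, because workunit 99 has no row; a cleanup would then delete exactly those result rows.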

Cheers - Al.
ID: 76838

©2024 Astroinformatics Group