Message boards : News : Admin Updates Discussion
GWGeorge007 (Joined: 6 Jan 18, Posts: 18, Credit: 91,073,942, RAC: 13,557)
Thank you! I sure will!

Best regards,
GWGeorge007

George
GWGeorge007 (Joined: 6 Jan 18, Posts: 18, Credit: 91,073,942, RAC: 13,557)
> > Somewhat annoying...
>
> Well, this WU was created before the change (well, at least before Kevin's message); not sure if the change applies to new results or just new WUs.

Your message indicates that the WU you linked (https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1003011105) was created AFTER Kevin's post:

Your link: Created 28 May 2025, 19:09:11 UTC
Kevin's post: Posted 28 May 2025, 18:56:56 UTC - in response to Message 77453.

George
(Joined: 18 Feb 10, Posts: 62, Credit: 224,641,383, RAC: 4,104)
Well yes, he made a new post; we'll see if that solves the problem :)
GWGeorge007 (Joined: 6 Jan 18, Posts: 18, Credit: 91,073,942, RAC: 13,557)
I'm still getting errors and invalids... as well as some valids. You seem to be on the right track, but I think you need to make more changes.

George
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
> I'm still getting errors and invalids... as well as some valids.

As far as I can see, all the WUs that errored out for you were created before Kevin's latest message. It will take some time before those are out of the system. I'm not sure whether Kevin was able to increase the limit for new results or only for new WUs.
Bill F (Joined: 4 Jul 09, Posts: 108, Credit: 18,317,753, RAC: 2,586)
Slightly off the current topic, but still a MilkyWay project problem: we also still have the issue of GFLOPS being computed incorrectly. Once this is resolved, task allocation for volunteer systems running multiple projects should improve.

Thanks
Bill F

In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; there was no expiration date.
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
I don't know if this had something to do with the recent changes, but I'm not sure this should have happened:

02/06/2025 15:10:05 | Milkyway@home | Result de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__32_1740880091_2549890_3 is no longer usable
02/06/2025 15:10:05 | Milkyway@home | Result de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__33_1740880091_2606249_5 is no longer usable

Both results had a few days left until the deadline, so it can't be that.
(Joined: 16 Mar 10, Posts: 218, Credit: 110,420,422, RAC: 3,848)
There are two sorts of abort request that can be sent from the scheduler to the client. The one we are used to seeing is an "abort if not started"; it doesn't have a scheduler message text associated with it. However, if a WU is marked as not needed for some reason (such as cancellation of bad units), the scheduler will send an "abort" request with the "Result xxxxxxx is no longer usable" message as accompaniment. Note that this sort of abort is mandatory (which might be why BOINC admins sometimes don't seem to like using it?)

The client handles the two abort types as separate cases: if it's a mandatory abort request, the message from the scheduler will appear, and if the task happens to be active there should also be a client-generated "aborted by project - no longer usable" message in the BOINC log. So if bad [batches of] WUs are killed off properly, there will be sightings of that message (and probably some users complaining about wasted CPU time!)

Cheers - Al.

References: current sources - sched/handle_request.cpp and client/cs_scheduler.cpp
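[Editor's note: a minimal sketch of the two abort paths Al describes, illustrative only. The real logic lives in the BOINC sources he cites; the struct, enum, and function names below are invented for the example.]

```cpp
#include <iostream>
#include <string>

// NOT the actual BOINC code (see sched/handle_request.cpp and
// client/cs_scheduler.cpp); names here are invented for illustration.
struct Task {
    std::string name;
    bool started = false;   // has the task begun computing?
    bool aborted = false;
};

enum class AbortKind {
    IfNotStarted,   // optional: honored only if the task hasn't started yet
    Mandatory       // "no longer usable": always honored
};

void handle_scheduler_abort(Task& task, AbortKind kind) {
    if (kind == AbortKind::IfNotStarted && task.started) {
        return;  // optional abort ignored; the in-progress work continues
    }
    if (kind == AbortKind::Mandatory && task.started) {
        // The client-generated log line users see for an active task:
        std::cout << task.name << ": aborted by project - no longer usable\n";
    }
    task.aborted = true;  // the task and its files are discarded
}

int main() {
    Task t{"de_nbody_example_task_0", true};
    handle_scheduler_abort(t, AbortKind::IfNotStarted);  // no effect: running
    handle_scheduler_abort(t, AbortKind::Mandatory);     // always aborts
}
```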
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
And the next two:

03/06/2025 09:54:29 | Milkyway@home | Result de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__33_1740880091_2546703_6 is no longer usable
03/06/2025 09:54:29 | Milkyway@home | Result de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__33_1740880091_2313614_5 is no longer usable
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
> if a WU is marked as not needed for some reason (such as cancellation of bad units), the scheduler will send an "abort" request with the "Result xxxxxxx is no longer usable" message as accompaniment. Note that this sort of abort is mandatory (which might be why BOINC admins sometimes don't seem to like using it?)

I think this has more to do with message 77440: "I made a mistake when I was trying to perform some tests and cancelled a lot of workunits which included tasks that were in progress."

Those were WUs that were cancelled once, then sent out to new hosts, but for some reason they have now been purged from the database just as if they had been completed at the time of cancellation (or at the time they were sent to me, which was the last time anything happened with them). There were even other "completed, validation inconclusive" results waiting for me to return my result. It's as if the server didn't notice that the WUs were uncancelled and simply purged them x hours after the last update.

If some of the three temporarily cancelled WUs that I still have running suddenly become "not needed", that would confirm this theory, as I don't think someone at Milkyway@home is sitting there cancelling a few WUs every now and then; if they needed to do that, they'd do it at once for all the WUs that aren't needed.
GWGeorge007 (Joined: 6 Jan 18, Posts: 18, Credit: 91,073,942, RAC: 13,557)
> > I'm still getting errors and invalids... as well as some valids.
>
> As far as I can see, all the WUs that errored out for you were created before Kevin's latest message. It will take some time before those are out of the system. I'm not sure whether Kevin was able to increase the limit for new results or only for new WUs.

As far as I can tell myself, these are all resends - those ending in numbers greater than 1, i.e. with suffixes of _2 through _5. Am I mistaken, or were you supposed to have deleted these error-prone tasks? Either way, I have only a few left in my cache - less than 100. Do you still have more resends to send out?

George
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
> Am I mistaken, or were you supposed to have deleted these error-prone tasks?

No, sometimes they succeed, like this one for example.

> Either way, I have only a few left in my cache - less than 100. Do you still have more resends to send out?

All v1.87 tasks are resends now.
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
> > Am I mistaken, or were you supposed to have deleted these error-prone tasks?
>
> No, sometimes they succeed, like this one for example.

And this one errored out with EXIT_DISK_LIMIT_EXCEEDED on one computer, but three other computers were able to complete it successfully (one was invalid, but that's a different story). So no, definitely do not abort those.
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
@Kevin: you might want to check the checkpointing and resuming from checkpoints. There is some indication that it doesn't work as it should and in some cases causes wrong results.
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
> > > > @Kevin: you might need to check the disk limit for the current WUs; it might be necessary to increase it. I got EXIT_DISK_LIMIT_EXCEEDED errors on some of the de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__33_1740880091_* tasks, and the wingmen got them too. The slot directories of the running WUs seem to be quite a bit larger than they used to be, as far as I remember - some of them close to or even over 30MB, and the errored-out tasks went over 50MB. Perhaps the current WU set needs 100-200MB as the limit.
> > >
> > > I set the limit to 100MB.
> >
> > 1002940398, 1003025268 and 1002958647, all created 3-15 days after your post, error out with EXIT_DISK_LIMIT_EXCEEDED after exceeding 50MB. You might need to check again.
>
> Seems like I might have changed the wrong thing?

I think that was still not the right setting. de_nbody_orbit_fitting_05_23_2025_v190_OCS__data__02_1748458520_904835, created 23 Jun 2025 17:50:00 UTC, has <rsc_disk_bound>52428800.000000</rsc_disk_bound> according to my client_state.xml. It's the same for all Milkyway WUs, no matter when they were created.
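[Editor's note: for anyone wondering what EXIT_DISK_LIMIT_EXCEEDED actually checks, the client conceptually compares the size of the task's slot directory against the workunit's rsc_disk_bound. A minimal, self-contained sketch of that check follows; it is illustrative only - the slot path and function name are made up, and the real client code is more involved.]

```cpp
#include <cstdint>
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

// Sum the sizes of all regular files under a task's slot directory.
std::uintmax_t slot_dir_usage(const fs::path& slot_dir) {
    std::uintmax_t total = 0;
    if (!fs::exists(slot_dir)) return total;
    for (const auto& entry : fs::recursive_directory_iterator(slot_dir)) {
        if (entry.is_regular_file()) total += entry.file_size();
    }
    return total;
}

int main() {
    // 50MB: the <rsc_disk_bound> value seen in client_state.xml above.
    const double rsc_disk_bound = 52428800.0;
    const fs::path slot_dir = "slots/0";  // hypothetical slot directory

    if (slot_dir_usage(slot_dir) > rsc_disk_bound) {
        std::cerr << "aborting task: disk usage exceeded disk bound\n";
        return 196;  // EXIT_DISK_LIMIT_EXCEEDED in the BOINC sources
    }
    return 0;
}
```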
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
It seems there's something wrong with some of the tasks from the batch de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887. Examples:

de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887_7089
de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887_7871

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The system cannot find the drive specified.
(0xf) - exit code 15 (0xf)</message>
<stderr_txt>
<search_application> milkyway_nbody 1.92 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 16 max threads on a system with 16 processors
Running MilkyWay@home Nbody v1.92
Optimal Softening Length = 0.085112487261321 kpc
Dwarf Initial Position: [1.281420702397406,10.476795349009908,-5.431031550046485]
Dwarf Initial Velocity: [-30.499564089416765,-120.779390042410128,137.859170158334422]
Initial LMC position: [102.342473508093903,649.777072098520534,-172.926723231397546]
Initial LMC velocity: [-14.218141115165659,-111.129275159404870,5.342240784179240]
Error reading histogram line 43: EMDStart = {-33,138}
[the previous line is repeated 138 more times]
strftime() failed
called boinc_finish(15)
</stderr_txt>
]]>
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
More of those:

de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887_475
de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887_6989
de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887_6990
de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887_7030

It was also perhaps not a good idea to let v1.92 crunch tasks originally made for v1.87:

de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__32_1747416295_73267
de_nbody_orbit_fitting_03_25_2025_v186_OCS__data__32_1747416295_75085

The new application doesn't like them:

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The data is invalid.
(0xd) - exit code 13 (0xd)</message>
<stderr_txt>
<search_application> milkyway_nbody 1.92 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 8 max threads on a system with 8 processors
Running MilkyWay@home Nbody v1.86
Optimal Softening Length = 0.000011993015169 kpc
Error evaluating NBodyCtx: [string "-- /* Copyright (c) 2016 - 2018 Siddhartha ..."]:106: bad argument #1 to 'create' (Missing required named argument 'PMSigma')
Failed to read input parameters file
strftime() failed
called boinc_finish(13)
</stderr_txt>
]]>
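[Editor's note: the "Missing required named argument" failure above is easy to picture: v1.92 requires a parameter ('PMSigma') that v1.86-era input files never defined. Below is a rough, self-contained sketch of that kind of required-argument check, illustrative only - the field names and values are invented, and the real check happens in the application's Lua bindings.]

```cpp
#include <iostream>
#include <map>
#include <string>

using NamedArgs = std::map<std::string, double>;

// Fail if a required named argument is missing from the parsed input file.
bool require_arg(const NamedArgs& args, const std::string& name) {
    if (args.find(name) == args.end()) {
        std::cerr << "bad argument: Missing required named argument '"
                  << name << "'\n";
        return false;
    }
    return true;
}

int main() {
    // A v1.86-era parameter set (hypothetical fields) has no PMSigma entry...
    NamedArgs old_params = {{"mass", 12.0}, {"scaleRadius", 0.2}};

    // ...so a newer application that requires it bails out, as in the stderr.
    if (!require_arg(old_params, "PMSigma")) {
        std::cerr << "Failed to read input parameters file\n";
        return 13;  // matches the "exit code 13" in the task output
    }
    return 0;
}
```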
Kevin Roux (Joined: 9 Aug 22, Posts: 98, Credit: 4,474,934, RAC: 8)
> I think that was still not the right setting. de_nbody_orbit_fitting_05_23_2025_v190_OCS__data__02_1748458520_904835, created 23 Jun 2025 17:50:00 UTC, has <rsc_disk_bound>52428800.000000</rsc_disk_bound> according to my client_state.xml. It's the same for all Milkyway WUs, no matter when they were created.

Third time's a charm. I changed rsc_disk_bound to 100MB in the compiled server code, so hopefully it will be fixed now.
(Joined: 11 Sep 24, Posts: 13, Credit: 32,459, RAC: 1,485)
> It seems there's something wrong with some of the tasks from the batch de_nbody_orbit_fitting_06_25_2025_v192_OCS_lmc__data__01_1750887887.

This has been taken care of. The run was put up with an older version of the histogram, which ended up not being compatible with the final version of the new code. All of the workunits with this error should be taken down now.
(Joined: 19 Jul 10, Posts: 775, Credit: 20,502,551, RAC: 9,758)
> Third time's a charm. I changed rsc_disk_bound to 100MB in the compiled server code, so hopefully it will be fixed now.

Yes, for v1.92 WUs it's now <rsc_disk_bound>104858000.000000</rsc_disk_bound>; for the v1.90 WUs it's still the old value, but I guess that will change later with new runs. There were no issues with those anyway - only with v1.87, as far as I have seen. Thanks for the fix.