Welcome to MilkyWay@home

Admin Updates Discussion

gimmyk
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 11 Sep 24
Posts: 13
Credit: 32,581
RAC: 1,457
Message 77645 - Posted: 22 Sep 2025, 21:18:22 UTC

We have found what appears to be the source of the differing results between Windows and Linux. The problematic function is not essential, so we will be doing runs without it until the code is replaced in our next version. We expect the number of invalid results to be significantly reduced going forward, but we will keep an eye out for additional issues. As mentioned, one may be an issue with runs that restart after a shutdown. If you happen to shut down during some tasks, let us know if you notice anything!
gimmyk
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 11 Sep 24
Posts: 13
Credit: 32,581
RAC: 1,457
Message 77650 - Posted: 1 Oct 2025, 14:47:44 UTC

The invalid results given when resuming a run were caused by some checkpoint files not storing all of the information that was needed. A fix will be included in the next update.
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77651 - Posted: 1 Oct 2025, 16:15:20 UTC - in response to Message 77650.  
Last modified: 1 Oct 2025, 16:38:31 UTC

Thank you for keeping us updated. :-)
Cavalary
Joined: 23 Aug 11
Posts: 58
Credit: 18,325,901
RAC: 21,567
Message 77652 - Posted: 2 Oct 2025, 16:43:41 UTC - in response to Message 77650.  

Good to know. Thanks for the update.
Bill F
Joined: 4 Jul 09
Posts: 108
Credit: 18,317,753
RAC: 2,586
Message 77699 - Posted: 13 Nov 2025, 4:34:26 UTC

The tasks generated after the changes on Nov 11 appear to have increased overall task flow, if you look at BOINCStats at the project level:

https://www.boincstats.com/stats/61/project/detail/credit

Bill F
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77700 - Posted: 13 Nov 2025, 12:27:21 UTC

The rsc_fpops_est update has been implemented on the server.
Let me know if there are any issues.
You can expect to see these changes on Workunits generated after 11 Nov 2025 at 19:30:00 UTC.

So here are the results for my first 9 WUs generated after this point:

WU #:          1011108637
Name:          de_nbody_orbit_fitting_10_23_2025_v193_OCS_north_MW2014__data__3_1762888230_1618
created:       11 Nov 2025, 20:14:36 UTC
est GFLOPs:    743,470
est run time:  19:01:28
run time:      00:00:02


WU #:          1011109141
Name:          de_nbody_10_27_2025_v193_OCS_north_MW2014__data__01_1762888230_2122
created:       11 Nov 2025, 20:46:39 UTC
est GFLOPs:    93,901
est run time:  02:24:10
run time:      00:27:22
avg GFLOPs:    57.19


WU #:          1011118536
Name:          de_nbody_orbit_fitting_10_23_2025_v193_OCS_north_MW2014__data__3_1762888230_11517
created:       12 Nov 2025, 7:13:57 UTC
est GFLOPs:    992,244
est run time:  1d 01:23:25
run time:      0d 00:00:02


WU #:          1011125009
Name:          de_nbody_10_27_2025_v193_OCS_north_MW2014__data__01_1762888230_17990
created:       12 Nov 2025, 14:52:50 UTC
est GFLOPs:    14,058
est run time:  00:20:27
run time:      00:01:24
avg GFLOPs:    167.36


WU #:          1011127295
Name:          de_nbody_orbit_fitting_10_23_2025_v193_OCS_north_MW2014__data__3_1762888230_20276
created:       12 Nov 2025, 17:54:14 UTC
est GFLOPs:    318,937
est run time:  08:00:53
run time:      01:36:33
avg GFLOPs:    55.06


WU #:          1011127296
Name:          de_nbody_orbit_fitting_10_23_2025_v193_OCS_north_MW2014__data__3_1762888230_20277
created:       12 Nov 2025, 17:54:14 UTC
est GFLOPs:    364,805
est run time:  09:10:02
run time:      01:52:22
avg GFLOPs:    54.11


WU #:          1011127322
Name:          de_nbody_10_27_2025_v193_OCS_north_MW2014__data__01_1762888230_20303
created:       12 Nov 2025, 17:54:54 UTC
est GFLOPs:    13,250
est run time:  00:19:58
run time:      00:01:15
avg GFLOPs:    176.67


WU #:          1011125579
Name:          de_nbody_10_27_2025_v193_OCS_north_MW2014__data__01_1762888230_18560
created:       12 Nov 2025, 15:52:54 UTC
est GFLOPs:    64,766
est run time:  01:36:08
run time:      00:17:22
avg GFLOPs:    62.16


WU #:          1011127324
Name:          de_nbody_10_27_2025_v193_OCS_north__data__07_1762888230_20305
created:       12 Nov 2025, 17:55:02 UTC
est GFLOPs:    23,655
est run time:  00:35:39
run time:      00:07:04
avg GFLOPs:    55.79

I assume the estimated runtimes will be lower in general once the APR adjusts itself to the new estimates, but the tasks seem to be split into two groups (if we ignore the tasks that nearly filled up my 1.2-day cache only to end after 2 seconds): one group of longer-running tasks and one group of "shorties". As you can see from the calculated processing rates, one of those groups has a GFLOPs estimate that is either too high or too low by a factor of about 3. If this can be corrected (and once the APR adjusts itself), I think the new estimates are going to be OK-ish.
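
The "avg GFLOPs" values in the list above look like the estimated work divided by the actual run time. A minimal Python sketch of that arithmetic, using three of the listed WUs; the function names are made up for illustration and are not part of BOINC:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' run time into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def avg_gflops(est_gflop: float, run_time: str) -> float:
    """Effective processing rate: estimated work divided by actual run time."""
    return est_gflop / to_seconds(run_time)


# (est GFLOPs, actual run time) copied from the WU list above
tasks = {
    "1011109141": (93_901, "00:27:22"),
    "1011125009": (14_058, "00:01:24"),
    "1011127295": (318_937, "01:36:33"),
}

rates = {wu: round(avg_gflops(work, rt), 2) for wu, (work, rt) in tasks.items()}
for wu, rate in rates.items():
    print(wu, rate)
```

The short task comes out at roughly three times the rate of the two longer ones, which is the factor of about 3 mentioned above.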
Vester
Joined: 30 Dec 14
Posts: 35
Credit: 911,354,348
RAC: 29,368
Message 77701 - Posted: 13 Nov 2025, 12:27:31 UTC - in response to Message 76946.  
Last modified: 13 Nov 2025, 12:28:23 UTC

Deleted.
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77703 - Posted: 15 Nov 2025, 14:33:42 UTC - in response to Message 77700.  
Last modified: 15 Nov 2025, 14:44:14 UTC

one group of longer running tasks and one group of "shorties" and as you see from the calculated processing rates, one of those groups has either too high or too low a GFLOPs estimate by a factor of about 3.
After checking some random short tasks, it seems like there might be some kind of exponential overestimation for short tasks, which starts to be visible on tasks running below 15-20 minutes on my system. Everything above those 15-20 minutes has an APR of 55-60 GFLOPs; the task in my previous post that ran 17m22s had a calculated APR of 62 GFLOPs, so slightly above; tasks running around 10 minutes are already at around 80-120 GFLOPs; and the real shorties (1-2 minutes run time) are somewhere in the range of 150-180 GFLOPs.

But in general it gets better and better as the APR of the app adapts to the new estimates, so I think this should be good enough to keep the desired cache size, at least as long as there are not too many 2-second-WUs in there.
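
To illustrate why the estimates improve as the APR adapts, here is a rough Python sketch. It assumes the estimated run time is simply the work estimate divided by the processing rate the client currently assumes; that is a simplification, and the function names are hypothetical. The numbers are those of WU 1011109141 from my previous post.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration into seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def estimated_runtime_s(est_gflop: float, rate_gflops: float) -> float:
    """Estimated run time in seconds for a work estimate and an assumed processing rate."""
    return est_gflop / rate_gflops


est_gflop = 93_901                             # est GFLOPs of WU 1011109141
shown_estimate_s = to_seconds("02:24:10")      # estimate shown before the task ran
implied_rate = est_gflop / shown_estimate_s    # rate that estimate was based on
measured_rate = 57.19                          # avg GFLOPs calculated after the run

print(f"implied rate: {implied_rate:.2f} GFLOPs")
minutes = estimated_runtime_s(est_gflop, measured_rate) / 60
print(f"estimate at measured rate: {minutes:.1f} min")
```

Under that assumption the original estimate was based on a rate of about 11 GFLOPs, and once the rate approaches the measured 55-60 GFLOPs the estimate for the same task drops to about 27 minutes.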

Btw, any reason why you still didn't set initial replication to 2 considering that AFAICT every WU needs two results to validate? This would speed up validation a lot.
bobsmith18

Joined: 1 Nov 10
Posts: 18
Credit: 2,335,992
RAC: 6,149
Message 77704 - Posted: 15 Nov 2025, 16:01:31 UTC - in response to Message 77703.  

What you say would be true if it were the vast majority of tasks that required a third (or more) result to be returned and validate against each other.
BUT
The majority of tasks validate with only two results being returned (on my computers it looks like only about 2% need more).
There's also an issue with the way MilkyWay categorises tasks: when the first result is returned it shows as "validation inconclusive", which is not right. There is no other result yet to validate against, so such tasks should be categorised as "validation pending". Only once two results have been returned and they don't validate can "validation inconclusive" be applied and thus a third task be sent out.
Sending out three tasks initially would result in about a third of returned work being "wasted" in that it would not be required in the validation process.
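
A quick check of the replication arithmetic discussed here, as a minimal Python sketch. It assumes a quorum of two matching results, as described in this thread, and that the first results agree; the function names are made up for illustration.

```python
def wasted_fraction(initial_replication: int, min_quorum: int) -> float:
    """Share of returned results not needed when the first `min_quorum` results agree."""
    surplus = max(initial_replication - min_quorum, 0)
    return surplus / initial_replication


def send_rounds(initial_replication: int, min_quorum: int) -> int:
    """Minimum number of send rounds before the quorum can be reached."""
    return 1 if initial_replication >= min_quorum else 2


for replication in (1, 2, 3):
    print(
        replication,
        round(wasted_fraction(replication, 2), 3),
        send_rounds(replication, 2),
    )
```

With a quorum of two, an initial replication of 3 makes one result in three surplus, 2 wastes nothing, and 1 wastes nothing either but always needs a second send round before validation is possible.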
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77705 - Posted: 16 Nov 2025, 10:05:10 UTC - in response to Message 77704.  
Last modified: 16 Nov 2025, 10:14:32 UTC

Sending out three tasks initially would result in about a third of returned work being "wasted" in that it would not be required in the validation process.
I wrote that they should send 2 tasks, not 3. They are now sending just one, and that's not enough for validation. IIRC it was enough for most Separation WUs, so an initial replication of 1 was perfect there, but that part of the project finished a few years ago, and for N-body this setting seems wrong IMHO; at least I can't remember any time when N-body WUs were able to validate with just one result.
bobsmith18

Joined: 1 Nov 10
Posts: 18
Credit: 2,335,992
RAC: 6,149
Message 77706 - Posted: 16 Nov 2025, 17:03:31 UTC - in response to Message 77705.  

Sorry, I was confused by having a number of tasks on my screen showing "initial replication" as 2 and 3. Looking at other tasks I see that the real initial task shows "initial replication" as 1.
This being the case, I agree with you: the project should move to the nomenclature and practice used by most other projects that require "proper" validation. That means sending out 2 tasks in the first batch, with "initial replication" set to 2, using the "validation pending" status correctly, and not abusing "validation inconclusive" when only one of the initial pair of tasks has been returned.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Cavalary
Joined: 23 Aug 11
Posts: 58
Credit: 18,325,901
RAC: 21,567
Message 77707 - Posted: 17 Nov 2025, 1:54:36 UTC

The issues with an initial replication of 1 and the incorrect use of "validation inconclusive" to lump together actual inconclusives with pending have existed all along though...
What is new now are the awfully long incorrect estimates since this last change. I have estimates of up to 14 days, while few WUs take more than one day (running single thread) and most take way less. Right now I am looking at three running tasks with initial estimates of around a week which should be done in under one day. So the buffer, which so far tended to need to be kept low because estimates could be shorter than the real run times and a big buffer risked having tasks time out, now has to be high so you don't risk running out.
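
A rough Python sketch of why overlong estimates shrink the effective buffer. It assumes the client fills the buffer purely according to estimated run times, which is a simplification of BOINC's actual work fetch; the 2-day buffer and the function name are hypothetical.

```python
def real_days_buffered(
    buffer_days: float, est_days_per_task: float, real_days_per_task: float
) -> float:
    """Days of actual work held when the buffer is filled according to the estimates."""
    # Tasks fetched per CPU slot; kept fractional for simplicity.
    tasks_fetched = buffer_days / est_days_per_task
    return tasks_fetched * real_days_per_task


# Hypothetical 2-day buffer; estimate of about a week vs. about one day of real run time
print(real_days_buffered(buffer_days=2.0, est_days_per_task=7.0, real_days_per_task=1.0))
```

Under these assumptions a nominal 2-day buffer holds well under half a day of real work, so the buffer setting has to be raised to avoid running dry.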
bobsmith18

Joined: 1 Nov 10
Posts: 18
Credit: 2,335,992
RAC: 6,149
Message 77708 - Posted: 17 Nov 2025, 9:34:46 UTC - in response to Message 77707.  

Hmmm......
I am seeing estimated runtimes in the same ballpark as the real runtimes, so this disparity between your estimates and mine is a bit of a concern. Perhaps it is time for Milkyway to sit down, look at their server configuration/BOINC software and cure these issues rather than having us develop workarounds.....
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77709 - Posted: 17 Nov 2025, 11:00:14 UTC - in response to Message 77708.  

I guess it's going to take some time for the estimates to adjust; mine are definitely getting better and better every day.

@Cavalary: Contrary to the original assumption, running Milkyway in single-thread mode isn't the most efficient way to run it; you might want to check out my "Milkyway Nbody ST vs. MT: real benchmarking" thread. On my 5700G it's best to run 2x 7-thread tasks (I leave 2 threads free for iGPU feeding); we are talking about around 2-4x the work done per day, depending on the size of the WU. Since your 8700G has the same cache size, it should be similar.
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77719 - Posted: 18 Nov 2025, 16:25:23 UTC
Last modified: 18 Nov 2025, 16:32:22 UTC

The update to N-body version 1.94 will be happening tomorrow, November 18th at around 18:00 UTC. (...)
This update includes a new momentum likelihood component, an updated softening length, and various other small changes and bugfixes.
In case the new application is not compatible with WUs created for v1.93 (and those release notes suggest that it's not or at least will generate different results), please make sure that resends of old WUs won't be assigned to the new version like it happened during the switch from v1.87 to v1.92.
gimmyk
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 11 Sep 24
Posts: 13
Credit: 32,581
RAC: 1,457
Message 77721 - Posted: 18 Nov 2025, 19:16:12 UTC - in response to Message 77719.  

We should not have any issues with WUs this time around, since we are putting the new version on the nbody application which has had no tasks for a while now. This update should hopefully be much less trouble than the last.

We are going to keep the initial replication for runs at 1 for the time being. We are doing this because we plan to make changes to improve our optimizations sometime soon, and as part of this we will not require all WUs to validate with two results (it will work like separation used to).
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77722 - Posted: 19 Nov 2025, 11:02:30 UTC - in response to Message 77721.  
Last modified: 19 Nov 2025, 11:15:37 UTC

Thanks for the info.

There seems to be an issue with "Max # of threads for each MilkyWay@home task" for the new application. I have set it to 7, and for v1.93 I was getting only MT tasks using 7 threads; for v1.94 I have now got a single-thread task.

I fixed this for myself, since BOINC can be a major PITA when it has to run a mix of ST and MT tasks, but maybe you can find the reason why the server sends ST tasks for Milkyway@home N-Body Simulation but not for Milkyway@home N-Body Simulation with Orbit Fitting when the user has set it to a specific value.

This is how I fixed it via app_config.xml (not tested yet, but it should work as it's the same binary; BOINC Manager already shows it as a 7-thread WU):
<app_config>
 <app>
  <name>milkyway_nbody</name>
  <fraction_done_exact/>
 </app>
 <app>
  <name>milkyway_nbody_orbit_fitting</name>
  <fraction_done_exact/>
 </app>
 <app_version>
  <app_name>milkyway_nbody</app_name>
  <version_num>194</version_num>
  <platform>windows_x86_64</platform>
  <avg_ncpus>7.000000</avg_ncpus>
  <cmdline>--nthreads 7</cmdline>
 </app_version>
 <project_max_concurrent>16</project_max_concurrent>
</app_config>

Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77723 - Posted: 19 Nov 2025, 17:01:11 UTC - in response to Message 77722.  

OK, it's a different binary this time, so my method didn't work. Hopefully you can fix it on the server side.
JohnDK
Joined: 18 Feb 10
Posts: 62
Credit: 224,641,383
RAC: 4,104
Message 77724 - Posted: 19 Nov 2025, 17:24:49 UTC

I had the same problem some time ago. So far, I have fixed it by choosing "No limit" for max threads and using app_config.xml to set the number I want.
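
For anyone who wants to generate such an app_config.xml rather than type it by hand, here is a minimal Python sketch (Python 3.9+ for ET.indent). The app name and the --nthreads option are taken from the posts above. The plan class "mt" is an assumption, so check client_state.xml for the plan class your tasks actually use. The function name is made up.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, threads: int, plan_class: str = "mt") -> str:
    """Build a minimal app_config.xml that pins one app to a fixed thread count."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    # Assumed plan class; verify against client_state.xml before relying on it.
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads}"
    ET.indent(root, space=" ")
    return ET.tostring(root, encoding="unicode")


config_text = build_app_config("milkyway_nbody", 7)
print(config_text)
```

The output would go into app_config.xml in the project's folder inside the BOINC data directory, followed by re-reading the config files from BOINC Manager. This is only a workaround on the client side and does not fix what the server sends.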
Link
Joined: 19 Jul 10
Posts: 775
Credit: 20,503,497
RAC: 9,818
Message 77726 - Posted: 20 Nov 2025, 9:05:06 UTC - in response to Message 77724.  

Well, yes, there are ways to fix it for yourself, but we are supposed to report bugs here so they can fix it for everyone.

©2025 Astroinformatics Group