Welcome to MilkyWay@home

Repeat/Duplicate Nbody Workunits?

Message boards : Number crunching : Repeat/Duplicate Nbody Workunits?

Ian&Steve C.
Joined: 18 Nov 22
Posts: 102
Credit: 656,449,543
RAC: 177,942
Message 77970 - Posted: 12 May 2026, 1:04:14 UTC

As others might have seen, I recently created the CUDA application for Nbody, and I've been testing it and re-running the invalids to check for bugs. To do this, I added some extra prints, specifically the input arguments and the WU types, potentials, etc., so that I can more easily reproduce and re-run WUs that come back invalid.

While looking at some of the invalids, I noticed that I seem to have received the exact same WU (same input arguments, same output, same calculation performed) multiple times. Normally this is invisible to most users, since the default application does not print this data. But my app does, so it seems strange to run the exact same calculation in separate, discrete WUs. When you look at the inputs and the resulting likelihoods, they are identical from WU to WU. The only thing that differs is the seed, but that doesn't seem to affect the computation at all.

examples:
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056901926
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056840161
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056812286
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056781232
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056616567
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056525411
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056469706

Maybe the admins can elaborate here. Why is essentially the same WU being sent out and crunched so many times?

ID: 77970 · Rating: 0
Ian&Steve C.
Joined: 18 Nov 22
Posts: 102
Credit: 656,449,543
RAC: 177,942
Message 77972 - Posted: 12 May 2026, 15:56:41 UTC - in response to Message 77970.  
Last modified: 12 May 2026, 15:57:24 UTC

Update: thanks to Pavel/ahorek's team for pointing this out.

The seed is held fixed in the Lua file, so the seed argument from the input args essentially does nothing.

Is this intended? And if so, why send out the same WU so many times with different seeds when the fixed seed in the Lua file gives the same output every time?
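
To illustrate the effect described above, here is a minimal Python sketch. It is not the project's code: the function name, the pinned seed value, and the toy "initial state" are all hypothetical stand-ins for a parameter file that hardcodes its own seed while the application still accepts one on the command line.

```python
import random

# Hypothetical stand-in for a workunit's parameter file that pins the seed.
FIXED_SEED_IN_CONFIG = 34086709  # arbitrary illustrative value


def build_initial_state(cli_seed: int, n_bodies: int = 5) -> list[float]:
    """Generate toy initial positions the way a seed-pinning config would."""
    # The seed passed on the command line is accepted but never used,
    # because the config hardcodes its own value.
    rng = random.Random(FIXED_SEED_IN_CONFIG)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_bodies)]


run_a = build_initial_state(cli_seed=111)
run_b = build_initial_state(cli_seed=222)
print(run_a == run_b)  # True: different seed arguments, identical inputs
```

With the seed pinned like this, two tasks that differ only in their seed argument start from the same state and so produce the same likelihood.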

ID: 77972 · Rating: 0
gimmyk
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 11 Sep 24
Posts: 35
Credit: 710,354
RAC: 98
Message 77974 - Posted: 14 May 2026, 1:49:37 UTC

This is due to the way our optimizations work. Without going into details, there will be more repeat workunits as a run narrows in on a final answer. Once we have our optimal parameters, all workunits will be identical.

You happened to catch this run just as it was finishing up. It has been taken down and a new run will replace it. You may still see a few of these repeat tasks as the last of them validate.
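
A toy Python sketch of why repeats pile up near the end of a run. This is an assumption-heavy illustration, not the project's optimizer: it supposes that candidates are drawn around the current best value, that the search radius shrinks as the run converges, and that parameters are written out with finite precision (3 decimals here). All names and numbers are made up.

```python
import random


def unique_candidates(spread: float, rng: random.Random,
                      best: float = 1.25, n: int = 20) -> int:
    """Count distinct parameter values proposed around the current best."""
    # Rounding to 3 decimals stands in for whatever finite precision the
    # real workunit arguments are written with (an assumption).
    proposals = {round(best + rng.uniform(-spread, spread), 3) for _ in range(n)}
    return len(proposals)


rng = random.Random(0)
# The search radius shrinks as the (toy) run converges.
spreads = [1.0, 0.1, 0.01, 0.001, 0.0001]
counts = [unique_candidates(s, rng) for s in spreads]
for s, c in zip(spreads, counts):
    print(f"spread={s:<7} distinct workunits out of 20: {c}")
```

Once the radius drops below the rounding precision, every proposal collapses onto the same value, which matches the "all workunits will be identical" end state.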
ID: 77974 · Rating: 0
Ian&Steve C.
Joined: 18 Nov 22
Posts: 102
Credit: 656,449,543
RAC: 177,942
Message 77975 - Posted: 14 May 2026, 14:10:30 UTC - in response to Message 77974.  

Thanks gimmy. I just wanted to make sure that it wasn't some kind of problem or bug.

At the very least, I've been able to use it as a point of reference for performance tracking when testing out different host configurations.

ID: 77975 · Rating: 0
Link
Joined: 19 Jul 10
Posts: 832
Credit: 21,855,781
RAC: 8,177
Message 77976 - Posted: 14 May 2026, 15:22:36 UTC - in response to Message 77974.  

Thanks for the explanation. From our end it looked like a bad batch or something like that. :-)
ID: 77976 · Rating: 0
ahorek's team

Joined: 8 Sep 07
Posts: 13
Credit: 2,582,026
RAC: 931
Message 77977 - Posted: 14 May 2026, 16:33:54 UTC

It sounds to me like a missed optimization. The scheduler should detect already processed WUs based on their input parameters and avoid creating new WUs that end up computing the same thing. That seems pretty wasteful.

If those duplicates are extremely rare, such an optimization might not be worth the added complexity. However, Ian’s random example shows that the same task has already been successfully computed at least 14 times. Eliminating duplicates on the server side, even across millions of tasks, shouldn’t be impossible, and it’s certainly less demanding than recomputing the same work over and over again. Some cross-validation between different hosts is necessary to ensure reliability, but computing the exact same work 14 times has no additional value. If this redundancy can be avoided, it could save a significant amount of computing resources.

It might be more difficult than I’m assuming, and I don’t know the details of how the work is generated or whether it would actually save as much effort as I expect, but I think it’s worth considering.
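
A rough Python sketch of the kind of server-side check suggested above, under stated assumptions: the parameter names, the `WorkGenerator` class, and the quorum of 2 are hypothetical, and the seed is excluded from the key only because it appears not to affect the result in this run. The real work generator may look nothing like this.

```python
import hashlib
import json
from collections import defaultdict

QUORUM = 2  # assumed number of matching results needed for validation


def workunit_key(params: dict) -> str:
    """Hash the inputs that affect the result; the seed is left out here
    on the assumption that it does not change the computation."""
    relevant = {k: v for k, v in params.items() if k != "seed"}
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class WorkGenerator:
    def __init__(self) -> None:
        self.issued = defaultdict(int)  # key -> copies already sent out

    def should_issue(self, params: dict) -> bool:
        """Issue a workunit only while its inputs are below the quorum."""
        key = workunit_key(params)
        if self.issued[key] >= QUORUM:
            return False
        self.issued[key] += 1
        return True


gen = WorkGenerator()
same_inputs = [{"mass": 12.0, "radius": 0.2, "seed": s} for s in range(14)]
sent = sum(gen.should_issue(p) for p in same_inputs)
print(sent)  # 2
```

Here 14 requests with identical inputs result in only 2 issued tasks, enough for cross-validation between hosts without recomputing the same work further.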
ID: 77977 · Rating: 0
gimmyk
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 11 Sep 24
Posts: 35
Credit: 710,354
RAC: 98
Message 77983 - Posted: 17 May 2026, 0:47:02 UTC

I agree; I'll get this looked at. I wouldn't be surprised if there's a catch that makes this difficult, but it sounds easy enough to at least check against the best results that we store. Doing so should catch a majority of the duplicate work.
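
A small Python sketch of what checking a candidate against stored best results could look like. Everything here is assumed for illustration: the layout of the store, the parameter vectors, and the likelihood values are made up, and the tolerance-based comparison is just one possible way to match floating-point parameters.

```python
import math

# Hypothetical store of best results: (parameter vector, likelihood).
# The numbers are invented for this example.
best_results = [
    ((3.95, 0.98, 0.2, 12.0), -25.4),
    ((4.10, 1.02, 0.2, 11.5), -31.7),
]


def lookup_best(candidate, store=best_results, rel_tol=1e-9):
    """Return the stored likelihood if the candidate matches a best result."""
    for params, likelihood in store:
        if len(params) == len(candidate) and all(
            math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(params, candidate)
        ):
            return likelihood
    return None  # not seen before, so a new workunit is needed


print(lookup_best((3.95, 0.98, 0.2, 12.0)))  # -25.4
print(lookup_best((3.96, 0.98, 0.2, 12.0)))  # None
```

A hit means the likelihood is already known and no new workunit needs to go out for that parameter set.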
ID: 77983 · Rating: 0


©2026 Astroinformatics Group