Repeat/Duplicate Nbody Workunits?

Author	Message
Ian&Steve C. Joined: 18 Nov 22 Posts: 102 Credit: 656,449,543 RAC: 177,942	Message 77970 - Posted: 12 May 2026, 1:04:14 UTC
As others might have seen, I recently created the CUDA application for Nbody, and I've been testing it and checking the invalids to re-test for bugs and whatnot. To do this, I added some extra prints, specifically the input arguments and WU types/potentials/etc., so that I can more easily reproduce and re-run WUs that I get invalids on.
While looking at some of the invalids, I noticed that I seem to have received the exact same WU (same input arguments, same output, same calculation performed) multiple times. Normally this kind of thing is invisible to most users, since the default application does not print this data. My app does, and it seems strange to run the exact same calculation in separate, discrete WUs. When you look at the inputs and the resulting likelihoods, they are exactly the same from WU to WU. The only thing that differs is the seed, but that doesn't seem to affect the computation at all.
Examples:
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056901926
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056840161
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056812286
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056781232
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056616567
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056525411
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056469706
Maybe the admins can elaborate here. Why is essentially the same WU being sent out and crunched so many different times?

Ian&Steve C. Joined: 18 Nov 22 Posts: 102 Credit: 656,449,543 RAC: 177,942	Message 77972 - Posted: 12 May 2026, 15:56:41 UTC - in response to Message 77970. Last modified: 12 May 2026, 15:57:24 UTC
Update, with thanks to Pavel/ahorek's team for pointing it out: the seed is being held fixed in the Lua file, so the seed argument from the input args essentially does nothing.
Is this intended? And if so, why send out the same WU so many times with different seeds when the fixed seed from the Lua file gives the same output every time?
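
For readers who want to see why a hard-coded seed makes the seed argument irrelevant, here is a minimal Python sketch. It is a toy stand-in, not the project's Lua file or the Nbody code: `FIXED_SEED`, `run_workunit`, and the parameter values are all hypothetical names and numbers chosen for illustration.

```python
import random

# Hypothetical value standing in for a seed hard-coded in the workunit's Lua file.
FIXED_SEED = 12345


def run_workunit(params, seed_arg):
    """Toy stand-in for an N-body run; returns a pseudo 'likelihood'."""
    # seed_arg is accepted but never used: the generator is always built
    # from FIXED_SEED, which mirrors the behaviour described in the post.
    rng = random.Random(FIXED_SEED)
    noise = sum(rng.random() for _ in range(1000))
    return sum(params) + noise


# Same (made-up) parameters, three different command-line seeds.
params = (3.95, 0.2, 12.0)
results = {run_workunit(params, seed_arg=s) for s in (1, 2, 3)}
print(len(results))  # number of distinct outputs across the three seeds
```

Because the random number generator never sees `seed_arg`, every call with the same parameters follows the same path and returns the same value, which matches what the extra prints showed.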

gimmyk Volunteer moderator Project administrator Project developer Project scientist Joined: 11 Sep 24 Posts: 35 Credit: 710,354 RAC: 98	Message 77974 - Posted: 14 May 2026, 1:49:37 UTC
This is due to the way our optimizations work. Without going into details, there will be more repeat workunits as our run narrows in on a final answer. When we have our optimal parameters, all workunits will be identical.
You happened to catch this run just as it was finishing up. It has been taken down and a new run will replace it. You may still see a few of these repeat tasks as the last of them validate.

Ian&Steve C. Joined: 18 Nov 22 Posts: 102 Credit: 656,449,543 RAC: 177,942	Message 77975 - Posted: 14 May 2026, 14:10:30 UTC - in response to Message 77974.
Thanks gimmy. I just wanted to make sure that it wasn't some kind of problem or bug. At the very least, I've been able to use it as a point of reference for some performance tracking when testing out different host configurations.

Link Joined: 19 Jul 10 Posts: 832 Credit: 21,855,781 RAC: 8,177	Message 77976 - Posted: 14 May 2026, 15:22:36 UTC - in response to Message 77974.
Thanks for the explanation. From our end it looked like a bad batch or something like that. :-)

ahorek's team Joined: 8 Sep 07 Posts: 13 Credit: 2,582,026 RAC: 931	Message 77977 - Posted: 14 May 2026, 16:33:54 UTC
It sounds to me like a missed optimization. The scheduler should detect already processed WUs based on their input parameters and avoid creating new WUs that end up computing the same thing. That seems pretty wasteful.
If those duplicates are extremely rare, such an optimization might not be worth the added complexity. However, Ian’s random example shows that the same task has already been successfully computed at least 14 times. Eliminating duplicates on the server side, even across millions of tasks, shouldn’t be impossible, and it’s certainly less demanding than recomputing the same work over and over again.
Some cross-validation between different hosts is necessary to ensure reliability, but computing the exact same work 14 times has no additional value. If this redundancy can be avoided, it could save a significant amount of computing resources.
It might be more difficult than I’m assuming, and I don’t know the details of how the work is generated or whether it would actually save as much effort as I expect, but I think it’s worth considering.
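
As a rough illustration of the suggestion above, the following Python sketch deduplicates work by hashing the input parameters that affect the result. It is only a sketch under stated assumptions: `workunit_key`, `WorkGenerator`, the `quorum` setting, and the parameter names are hypothetical and are not taken from the MilkyWay@home server or from BOINC. It also assumes the seed can be left out of the key, since the thread reports that the seed argument does not change the output.

```python
import hashlib
import json


def workunit_key(params, ignored=("seed",)):
    """Build a stable key from the inputs that affect the result."""
    relevant = {k: v for k, v in params.items() if k not in ignored}
    # sort_keys gives the same string regardless of dict insertion order.
    canonical = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class WorkGenerator:
    """Hypothetical generator that caps copies issued per distinct input."""

    def __init__(self, quorum=2):
        self.quorum = quorum  # copies still allowed, for cross-validation
        self.issued = {}      # key -> number of copies issued so far

    def should_issue(self, params):
        key = workunit_key(params)
        count = self.issued.get(key, 0)
        if count >= self.quorum:
            return False      # enough copies exist; skip the duplicate
        self.issued[key] = count + 1
        return True


gen = WorkGenerator(quorum=2)
# Fourteen requests with made-up parameters that differ only in the seed.
decisions = [gen.should_issue({"mass": 12.0, "radius": 0.2, "seed": s})
             for s in range(14)]
print(sum(decisions))  # how many of the 14 requests were actually issued
```

Keeping a small quorum rather than a hard limit of one preserves the cross-validation between hosts that the post says is still needed. The memory cost is one hash entry per distinct parameter set, which stays modest even across millions of tasks.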

gimmyk Volunteer moderator Project administrator Project developer Project scientist Joined: 11 Sep 24 Posts: 35 Credit: 710,354 RAC: 98	Message 77983 - Posted: 17 May 2026, 0:47:02 UTC
I agree; I'll get this looked at. I wouldn't be surprised if there's a catch that makes this difficult, but it sounds easy enough to at least check against the best results that we store. Doing so should catch a majority of the duplicate work.
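
One way the check against stored best results might look is sketched below in Python. Everything here is an assumption made for illustration: the `BestResults` class, its `capacity`, the parameter tuples, and the likelihood values are invented, and the sketch assumes that a higher likelihood is better and that parameters can be compared for exact equality.

```python
import heapq


class BestResults:
    """Hypothetical store of the top `capacity` (likelihood, params) pairs."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self._heap = []       # min-heap, so the worst stored result is on top
        self._by_params = {}  # params tuple -> likelihood, for fast lookup

    def add(self, params, likelihood):
        if params in self._by_params:
            return  # already stored
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, (likelihood, params))
            self._by_params[params] = likelihood
        elif likelihood > self._heap[0][0]:
            # Replace the worst stored result with the better new one.
            _, evicted = heapq.heapreplace(self._heap, (likelihood, params))
            del self._by_params[evicted]
            self._by_params[params] = likelihood

    def lookup(self, params):
        """Return the stored likelihood, or None if params are not stored."""
        return self._by_params.get(params)


best = BestResults(capacity=2)
best.add((3.95, 0.2, 12.0), -35.2)   # made-up parameters and likelihoods
best.add((4.10, 0.2, 11.5), -40.7)
best.add((3.90, 0.25, 12.1), -33.1)  # better than -40.7, which is evicted

candidate = (3.95, 0.2, 12.0)
cached = best.lookup(candidate)
if cached is None:
    print("not stored: issue a new workunit")
else:
    print("already computed, reuse likelihood", cached)
```

Since a converging run tends to propose parameters at or near its current best, a lookup against a small top-N store could plausibly catch most repeats without tracking every task ever issued, at the price of missing duplicates of results that were never among the best.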