Repeat/Duplicate Nbody Workunits?
| Author | Message |
|---|---|
|
Joined: 18 Nov 22 Posts: 102 Credit: 656,573,935 RAC: 182,434 |
So as others might have seen, I recently created the CUDA application for Nbody, and I've been testing it and checking the invalids to re-test for bugs and whatnot. To do this, I added some extra prints, specifically the input arguments and WU types/potentials/etc., so that I can more easily reproduce and re-run the WUs that I get invalids on.

While looking at some of the invalids, I noticed that I seem to have received the exact same WU (same input arguments, same output, same calculation performed) multiple times. Normally this kind of thing is invisible to most users, since the default application does not print this data. But my app does, and it seems strange to run the exact same calculation in separate, discrete WUs. When you look at the inputs and the resulting likelihoods, they are exactly the same from WU to WU. The only thing that differs is the seed, but that doesn't seem to affect the computation at all.

Examples:
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056901926
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056840161
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056812286
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056781232
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056616567
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056525411
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1056469706

Maybe the admins can elaborate here. Why is essentially the same WU being sent out and crunched so many times?
|
|
Joined: 18 Nov 22 Posts: 102 Credit: 656,573,935 RAC: 182,434 |
Update: thanks to Pavel/ahorek's team for pointing this out. The seed is being held fixed in the Lua file, so the seed argument from the input args essentially does nothing. Is this intended? And if so, why send out the same WU so many times with different seeds when the fixed seed in the Lua file gives the same output every time?
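To illustrate the effect described above, here is a minimal toy sketch in Python (not the project's Lua or the real Nbody code). `FIXED_SEED`, `run_workunit`, and the parameter values are all made up for illustration; the only point is that a seed hard-coded in the parameter file makes the command-line seed irrelevant.

```python
import random

FIXED_SEED = 12345  # hypothetical value hard-coded in the parameter file


def run_workunit(params, cli_seed):
    """Toy stand-in for an Nbody run; returns a pseudo 'likelihood'."""
    # The command-line seed is accepted but never used, mirroring a
    # parameter file that sets its own seed.
    rng = random.Random(FIXED_SEED)
    noise = sum(rng.random() for _ in range(1000))
    return sum(params) + noise


# Same (made-up) parameters, three different command-line seeds.
params = (3.95, 0.2, 12.0)
results = {run_workunit(params, cli_seed=s) for s in (1, 2, 3)}
```

In this toy version `results` collapses to a single value, which matches what the printed likelihoods looked like across the linked tasks.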
|
|
Joined: 11 Sep 24 Posts: 35 Credit: 710,354 RAC: 98 |
This is due to the way our optimizations work. Without going into details, there will be more repeat workunits as our run narrows in on a final answer. When we have our optimal parameters, all workunits will be identical. You happened to catch this run just as it was finishing up. It has been taken down and a new run will replace it. You may still see a few of these repeat tasks as the last of them validate. |
|
Joined: 18 Nov 22 Posts: 102 Credit: 656,573,935 RAC: 182,434 |
Thanks, gimmy. I just wanted to make sure that it wasn't some kind of problem or bug. At the very least, I've been able to use it as a point of reference for some performance tracking when testing out different host configurations.
|
|
Joined: 19 Jul 10 Posts: 832 Credit: 21,858,634 RAC: 8,030 |
Thanks for the explanation. From our end it looked like a bad batch or something like that. :-)
|
|
Joined: 8 Sep 07 Posts: 13 Credit: 2,582,026 RAC: 803 |
It sounds to me like a missed optimization. The scheduler should detect already processed WUs based on their input parameters and avoid creating new WUs that end up computing the same thing. That seems pretty wasteful. If those duplicates are extremely rare, such an optimization might not be worth the added complexity. However, Ian’s random example shows that the same task has already been successfully computed at least 14 times. Eliminating duplicates on the server side, even across millions of tasks, shouldn’t be impossible, and it’s certainly less demanding than recomputing the same work over and over again. Some cross-validation between different hosts is necessary to ensure reliability, but computing the exact same work 14 times has no additional value. If this redundancy can be avoided, it could save a significant amount of computing resources. It might be more difficult than I’m assuming, and I don’t know the details of how the work is generated or whether it would actually save as much effort as I expect, but I think it’s worth considering. |
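A rough sketch of what such a server-side check could look like, in Python. This is not how the project's work generator is written; `fingerprint`, `WorkGenerator`, `QUORUM`, and the argument names are hypothetical. It also assumes the seed can be ignored when comparing inputs, which only holds while the seed is fixed elsewhere, as reported earlier in the thread.

```python
import hashlib
import json
from collections import Counter

QUORUM = 2  # assumed number of matching results needed for validation


def fingerprint(args):
    """Hash the inputs that affect the result, ignoring the seed."""
    relevant = {k: v for k, v in args.items() if k != "seed"}
    blob = json.dumps(relevant, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


class WorkGenerator:
    """Issues a parameter set at most QUORUM times."""

    def __init__(self):
        self.issued = Counter()  # fingerprint -> number of WUs issued

    def should_issue(self, args):
        key = fingerprint(args)
        if self.issued[key] >= QUORUM:
            return False  # enough copies already out for cross-validation
        self.issued[key] += 1
        return True


gen = WorkGenerator()
base = {"evolve_time": 3.95, "mass": 12.0}  # made-up parameter names
decisions = [gen.should_issue({**base, "seed": s}) for s in range(14)]
issued_count = sum(decisions)
```

With 14 requests for the same inputs, only `QUORUM` of them are issued here; the rest would be skipped rather than recomputed. A real implementation would also have to handle failed or invalid results, which this sketch leaves out.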
|
Joined: 11 Sep 24 Posts: 35 Credit: 710,354 RAC: 98 |
I agree; I'll get this looked at. I wouldn't be surprised if there's a catch that makes this difficult, but it sounds easy enough to at least check against the best results that we store. Doing so should catch a majority of the duplicate work. |
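For illustration only, checking a candidate against a small store of best results might look something like the Python sketch below. `BestResults`, its capacity, and the sample values are assumptions, as is the convention that a higher likelihood is better. Exact tuple matching is used for simplicity; real parameters would probably need rounding to a canonical form first.

```python
class BestResults:
    """Keeps the top `capacity` parameter sets seen so far (toy version)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.by_params = {}  # params tuple -> likelihood

    def record(self, params, likelihood):
        self.by_params[params] = likelihood
        if len(self.by_params) > self.capacity:
            # Assumes higher likelihood is better; drop the worst entry.
            worst = min(self.by_params, key=self.by_params.get)
            del self.by_params[worst]

    def lookup(self, params):
        """Return the stored likelihood, or None if not among the best."""
        return self.by_params.get(params)


store = BestResults(capacity=2)
store.record((3.95, 0.2, 12.0), -10.5)
store.record((4.10, 0.2, 11.0), -25.0)
store.record((3.90, 0.3, 12.5), -12.0)  # evicts the -25.0 entry

candidate = (3.95, 0.2, 12.0)
cached = store.lookup(candidate)  # found, so no new WU would be needed
```

Since the store only holds the best results, this would miss duplicates of poor parameter sets, but near the end of a run the duplicates are mostly the best ones anyway.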
©2026 Astroinformatics Group