Some tasks seem never-ending.
Author | Message |
---|---|
Send message Joined: 2 Feb 12 Posts: 3 Credit: 5,236,155 RAC: 3,470 |
For some time now, tasks from this project have occasionally been behaving strangely. For example, a task is initially estimated to take less than 2 hours. Yet after more than 2 hours of calculation, the estimate says that more than 23 days are still needed, and this time keeps increasing! I understand that the calculation algorithms can be complex. However, I believe that when there is no chance of finishing in a reasonable amount of time, a task should terminate on its own rather than waste my processor's power and my electricity on calculations that nobody will need after such a long time. It would probably be possible to implement some kind of watchdog that forces such calculations to be abandoned. I have already seen this several times on two different computers. I aborted one such task after more than 24 hours of calculation, and a few others after a few hours. One of those cases: https://drive.google.com/file/d/1-6UbrvuVOs91uJylDOo-u4ztubtPZvaH/view?usp=sharing |
Send message Joined: 19 Jul 10 Posts: 624 Credit: 19,297,971 RAC: 2,484 |
Is the % completed increasing? Is the WU using any CPU time? If yes, just let them run; the estimate is just an estimate and, in the case of N-Body, often very wrong. If not, do you allow BOINC to use 100% of CPU time? If not, that's usually the reason why such a WU gets stuck. Set BOINC to 100% CPU time and restart it; the WU should then continue from the last checkpoint. |
Send message Joined: 22 Oct 10 Posts: 17 Credit: 144,544,832 RAC: 3,390 |
N-Body seems to be broken like this. When you notice it, exit BOINC and restart it, and the tasks should finish normally. |
Send message Joined: 12 Oct 11 Posts: 7 Credit: 23,330,249 RAC: 7,587 |
I'm getting a large number of never-ending work units (de_nbody_orbit_fitting_03-06-2024_v186_OCS_data_3_1709197135_xxxxx), at least five of them in the past five days. I've just rejoined MilkyWay@home after an absence from the project, and am immediately getting an alarming number of them: WUs that are estimated at a mere couple of minutes are taking over twenty-four hours, with the time remaining still increasing. I can't keep checking the progress every now and again to abort bad WUs, and I don't want to waste processing time and energy. Is this a bad lot of WUs, or could there be something wrong with my setup? Other projects (Einstein@Home, Rosetta@Home) are behaving normally; even WCU, though its WUs come down only sporadically. |
Send message Joined: 19 Jul 10 Posts: 624 Credit: 19,297,971 RAC: 2,484 |
"or could there be something wrong with my setup?" If you don't allow BOINC to compute 100% of the CPU time, then it's your setup. It's OK to limit the number of CPUs used, but for Milkyway you must allow 100% of CPU time. |
Send message Joined: 12 Oct 11 Posts: 7 Credit: 23,330,249 RAC: 7,587 |
Thanks, glad it's something simple. I'll make that change. |
Send message Joined: 12 Oct 11 Posts: 7 Credit: 23,330,249 RAC: 7,587 |
Quick note to confirm that setting processor time to 100% solved the problem, thanks Link :-) |
Send message Joined: 19 Jul 10 Posts: 624 Credit: 19,297,971 RAC: 2,484 |
Thanks for confirming that it worked for you; there are many people with this issue. |
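
The checks suggested in this thread (is the % completed increasing, is the WU using CPU time, is BOINC allowed 100% of CPU time) can be sketched as a small Python helper. This is only an illustration under stated assumptions, not part of BOINC or MilkyWay@home: `TaskSnapshot` and `diagnose` are hypothetical names, and the numbers are assumed to be read by hand from the BOINC Manager a few minutes apart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSnapshot:
    """One manual observation of a running task (hypothetical structure)."""
    fraction_done: float  # progress between 0.0 and 1.0
    cpu_time_s: float     # accumulated CPU time in seconds


def diagnose(earlier: TaskSnapshot, later: TaskSnapshot, cpu_limit_pct: float) -> str:
    """Apply the thread's advice to two observations of the same task.

    cpu_limit_pct is the "use at most N % of CPU time" preference.
    Returns one of three hypothetical labels:
      "let_run"         - progress and CPU time both grow; ignore the estimate
      "raise_cpu_limit" - task is stuck and CPU time is limited below 100%
      "restart"         - task is stuck although CPU time is already 100%
    """
    progressing = later.fraction_done > earlier.fraction_done
    using_cpu = later.cpu_time_s > earlier.cpu_time_s

    if progressing and using_cpu:
        # The remaining-time estimate is often very wrong for N-Body tasks.
        return "let_run"
    if cpu_limit_pct < 100:
        # Reported in this thread as the usual reason for stuck tasks.
        return "raise_cpu_limit"
    # Restarting BOINC lets the task resume from its last checkpoint.
    return "restart"


if __name__ == "__main__":
    before = TaskSnapshot(fraction_done=0.40, cpu_time_s=7200.0)
    after = TaskSnapshot(fraction_done=0.40, cpu_time_s=7200.0)
    print(diagnose(before, after, cpu_limit_pct=60))
```

The example at the bottom describes a task whose progress and CPU time have both stopped while CPU time is limited to 60%, which is the situation that was resolved above by setting processor time to 100%.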
©2024 Astroinformatics Group