Welcome to MilkyWay@home

Some tasks seem never-ending.

Message boards : Number crunching : Some tasks seem never-ending.

Grzegorz Skoczylas

Joined: 2 Feb 12
Posts: 3
Credit: 4,413,644
RAC: 3,088
Message 76388 - Posted: 20 Sep 2023, 17:55:51 UTC

For some time now, tasks from this project have occasionally been behaving strangely. For example, at the start the whole task is estimated to take less than 2 hours. Yet after more than 2 hours of calculation, it turns out that more than 23 days are still needed to finish, and this time keeps increasing!

I understand that the calculation algorithms can be complex. However, when there is no chance of completing the calculations in a reasonable amount of time, I believe the task should terminate on its own rather than waste my processor's power and my electricity on worthless calculations that nobody will need after such a long time.

It would probably be possible to implement some kind of watchdog to force the abandonment of calculations in such situations.

I have already had several such situations on two different computers. One such task I aborted after more than 24 hours of calculation. A few others – after a few hours.

One of those cases: https://drive.google.com/file/d/1-6UbrvuVOs91uJylDOo-u4ztubtPZvaH/view?usp=sharing
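
A minimal sketch of the kind of watchdog suggested above, assuming a task exposes its elapsed time and fraction done. It projects the total runtime linearly from the progress so far and flags the task once that projection exceeds a multiple of the initial estimate. The function names, the factor of 10 and the one-hour grace period are illustrative assumptions, not anything taken from BOINC or the MilkyWay@home application.

```python
def projected_total_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear projection of the total runtime from the progress made so far."""
    if fraction_done <= 0.0:
        return float("inf")  # no measurable progress yet
    return elapsed_s / fraction_done


def should_abandon(
    elapsed_s: float,
    fraction_done: float,
    initial_estimate_s: float,
    max_factor: float = 10.0,   # assumed tolerance: 10x the initial estimate
    grace_s: float = 3600.0,    # assumed grace period before judging a task
) -> bool:
    """Return True if the projected runtime is far beyond the initial estimate."""
    if elapsed_s < grace_s:
        return False  # too early to judge
    projected = projected_total_seconds(elapsed_s, fraction_done)
    return projected > max_factor * initial_estimate_s
```

With numbers like those in the post (initial estimate about 2 hours, about 23 days still remaining after 2 hours), `should_abandon(7200, 0.0036, 7200)` returns `True`. A linear projection is crude, so a real watchdog would probably need a more careful rule.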
ID: 76388 · Rating: 0
Link

Joined: 19 Jul 10
Posts: 597
Credit: 18,982,792
RAC: 5,697
Message 76389 - Posted: 20 Sep 2023, 19:27:22 UTC - in response to Message 76388.  

Is the % completed increasing? Is the WU using any CPU time?

If yes, just let them run; the estimate is just an estimate, and in the case of N-Body it is often very wrong.

If not, do you allow BOINC to use 100% of CPU time? If not, that's usually the reason why such a WU gets stuck. Set BOINC to 100% CPU time and restart it; the WU should then continue from the last checkpoint.
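
The two checks above can be written down as a small decision rule. This is only a sketch: the `Snapshot` type and `diagnose` function are hypothetical, and the two readings are assumed to be taken by hand (for example from the task properties in the BOINC Manager) some minutes apart. It treats a task as running only if both the progress and the CPU time went up, which is one reading of the advice.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    fraction_done: float  # progress, 0.0 to 1.0
    cpu_time_s: float     # CPU seconds consumed so far


def diagnose(earlier: Snapshot, later: Snapshot, eps: float = 1e-6) -> str:
    """Classify a task from two readings taken some minutes apart."""
    progressing = later.fraction_done - earlier.fraction_done > eps
    using_cpu = later.cpu_time_s - earlier.cpu_time_s > eps
    if progressing and using_cpu:
        return "running"  # let it run; the time estimate is unreliable
    return "stuck"        # check the CPU time limit, then restart BOINC
```

For example, `diagnose(Snapshot(0.10, 500.0), Snapshot(0.10, 500.0))` gives `"stuck"`, while a task whose progress and CPU time both increased gives `"running"`.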
ID: 76389 · Rating: 0
keputnam

Joined: 22 Oct 10
Posts: 16
Credit: 143,923,624
RAC: 3,967
Message 76426 - Posted: 13 Oct 2023, 1:16:35 UTC - in response to Message 76388.  

N-Body seems to be broken like this

When you notice it, end BOINC and restart, and they should finish normally
ID: 76426 · Rating: 0
julianop

Joined: 12 Oct 11
Posts: 7
Credit: 22,235,194
RAC: 1,331
Message 76983 - Posted: 25 Mar 2024, 2:57:03 UTC

I'm getting a large number of never-ending work units (de_nbody_orbit_fitting_03-06-2024_v186_OCS_data_3_1709197135_xxxxx): at least five of them in the past five days.
I've just rejoined MilkyWay@home after an absence from the project, and am immediately getting an alarming number of them: WUs that are estimated at a mere couple of minutes are taking over twenty-four hours, with the time remaining increasing.
I can't keep monitoring the progress every now and again to abort bad WUs, and I don't want to waste processing time and energy.

Is this a bad lot of WUs, or could there be something wrong with my setup? Other projects (Einstein@Home, Rosetta@Home) are behaving normally; even WCU, though WUs come down only sporadically.
ID: 76983 · Rating: 0
Link

Joined: 19 Jul 10
Posts: 597
Credit: 18,982,792
RAC: 5,697
Message 76984 - Posted: 25 Mar 2024, 13:35:30 UTC - in response to Message 76983.  

or could there be something wrong with my setup?
If you don't allow computing 100% of the CPU time, then it's your setup. It's OK to limit the number of CPUs used, but for Milkyway you must allow 100% of CPU time.
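
For anyone who wants to check the setting outside the BOINC Manager, here is a rough sketch that reads the CPU time limit from a preferences XML string. It assumes the setting is stored in a `cpu_usage_limit` element, as it is, to my knowledge, in BOINC's `global_prefs_override.xml`, and that `max_ncpus_pct` is the separate limit on the number of CPUs. The helper names, the sample XML and the default of 100 when the element is missing are made up for illustration, so verify against your own file.

```python
import xml.etree.ElementTree as ET

# Sample preferences for illustration only (the values are made up).
SAMPLE_PREFS = """
<global_preferences>
    <max_ncpus_pct>50</max_ncpus_pct>
    <cpu_usage_limit>60</cpu_usage_limit>
</global_preferences>
"""


def cpu_time_limit_pct(prefs_xml: str, default: float = 100.0) -> float:
    """Return the CPU time limit in percent, or `default` if it is not set."""
    root = ET.fromstring(prefs_xml.strip())
    node = root.find(".//cpu_usage_limit")
    if node is None or not (node.text or "").strip():
        return default  # assumption: a missing element means no limit
    return float(node.text)


def allows_full_cpu_time(prefs_xml: str) -> bool:
    """True if tasks may use 100% of CPU time under these preferences."""
    return cpu_time_limit_pct(prefs_xml) >= 100.0
```

With the sample above, `allows_full_cpu_time(SAMPLE_PREFS)` is `False` because CPU time is capped at 60%, even though limiting the number of CPUs to 50% would be fine on its own.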
ID: 76984 · Rating: 0
julianop

Joined: 12 Oct 11
Posts: 7
Credit: 22,235,194
RAC: 1,331
Message 76986 - Posted: 25 Mar 2024, 21:52:47 UTC - in response to Message 76984.  

Thanks, glad it's something simple. I'll make that change.
ID: 76986 · Rating: 0
julianop

Joined: 12 Oct 11
Posts: 7
Credit: 22,235,194
RAC: 1,331
Message 76988 - Posted: 27 Mar 2024, 18:36:28 UTC - in response to Message 76986.  
Last modified: 27 Mar 2024, 18:36:51 UTC

Quick note to confirm that setting processor time to 100% solved the problem, thanks Link :-)
ID: 76988 · Rating: 0
Link

Joined: 19 Jul 10
Posts: 597
Credit: 18,982,792
RAC: 5,697
Message 76989 - Posted: 27 Mar 2024, 19:25:30 UTC - in response to Message 76988.  

Thanks for confirming that it worked for you; there are many people with this issue.
ID: 76989 · Rating: 0

©2024 Astroinformatics Group