Welcome to MilkyWay@home

n-body WU going for 7 day deadline. Bug in n-body app regards <ncpus>N</ncpus> config setting?


marmot
Joined: 12 Dec 15
Posts: 53
Credit: 132,600,165
RAC: 30,940
Message 64253 - Posted: 17 Jan 2016, 15:01:56 UTC
Last modified: 17 Jan 2016, 15:03:27 UTC

Reprinted from the n-body release thread, where it received no responses.
I have a 6-core n-body WU that has run for 3 days and 10 hours and is only at 48% completion. The other 6-core n-body WUs in the queue report estimates of 38 to 58 minutes.
Is a 7-day run time possible?
Will this 7-day WU get paid an appropriate amount of credit?
Should I abort this WU?
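
A rough sanity check on those figures, as a minimal sketch. It assumes progress is linear in time, which BOINC's percentage is not guaranteed to be, and the helper name `projected_total` is made up for illustration:

```python
from datetime import timedelta

def projected_total(elapsed: timedelta, fraction_done: float) -> timedelta:
    """Linearly extrapolate total run time from elapsed time and progress.

    Assumption: the reported fraction grows linearly with run time.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return timedelta(seconds=elapsed.total_seconds() / fraction_done)

elapsed = timedelta(days=3, hours=10)    # run time reported above
total = projected_total(elapsed, 0.48)   # 48% complete
remaining = total - elapsed

print(f"projected total: {total.total_seconds() / 86400:.2f} days")
print(f"remaining:       {remaining.total_seconds() / 86400:.2f} days")
```

Under that assumption the WU would need a little over 7 days in total, so it would land right at or just past a 7 day deadline.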

I can't judge what is normal because your server deletes my results so quickly. It would be much appreciated if you would keep 2 weeks of results in our account history, so we can see when packets failed, which machines are under-performing, or whether a WU app is behaving badly.

Besides this extremely long calculation time, I've noticed that an 8-core n-body WU will sometimes be running alongside another 2 WUs from other projects, even though BOINC thinks this machine has only 8 cores. It seems the n-body WU doesn't suspend when BOINC does a WU switch-over every 30 minutes.

My configuration is probably rare. Many of my machines have the cc_config.xml option <ncpus>N</ncpus> set, where N is 2 higher than the actual number of system cores. It's the only solution that actually fixes the work fetch anomaly where the BOINC debt/work-fetch algorithm idles a core (or 2) so that a high resource share project with no current WUs has a core ready to go. I see this work fetch problem on many of my machines that have Citizen Grid (or a few other intermittent projects) set to a resource share of 99 while the 6 or 8 other projects are set to 20 or less. With the higher N, all the real cores are kept working 24/7, and when an intermittent high-priority project actually gets work fetched, the BOINC virtual cores get that WU and the OS deals with the extra thread sharing.
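
For anyone unfamiliar with the setting, here is a minimal sketch of what such a cc_config.xml looks like, generated in Python. The function name `build_cc_config` is hypothetical, and `os.cpu_count()` reports logical CPUs, which may differ from the physical core count; the only BOINC-specific part is the `<cc_config><options><ncpus>` nesting.

```python
import os
import xml.etree.ElementTree as ET

def build_cc_config(extra_cores: int = 2) -> str:
    """Return cc_config.xml text with <ncpus> set above the detected CPU count.

    Assumption: "actual system cores" is approximated by os.cpu_count(),
    which counts logical CPUs (hyperthreads included).
    """
    real_cores = os.cpu_count() or 1
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Tell the BOINC client to act as if it had real_cores + extra_cores CPUs.
    ET.SubElement(options, "ncpus").text = str(real_cores + extra_cores)
    return ET.tostring(root, encoding="unicode")

print(build_cc_config())
```

The resulting text would go into cc_config.xml in the BOINC data directory. On an 8-CPU machine this yields an `<ncpus>` value of 10, which matches the "2 higher than actual" setup described above.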

Is setting <ncpus>N</ncpus> higher than the real number of CPUs an issue for the n-body app?


Again, the questions I'm seeking answers to are:

1) Is a 7-day run time possible?
2) Will this 7-day WU get paid an appropriate amount of credit?
3) Should I abort this WU?
Dr Who Fan
Joined: 8 Aug 08
Posts: 21
Credit: 344,402
RAC: 170
Message 64264 - Posted: 21 Jan 2016, 5:37:05 UTC - in response to Message 64253.  

KILL IT!!!!! - IT SHOULD NEVER RUN THAT LONG. One to Two hours is the MOST I have ever seen on a normal N-body MT task.
Sounds like a very OLD BUG has crept into their code again.

Jeremy

Joined: 6 Jan 19
Posts: 1
Credit: 105,257,031
RAC: 0
Message 68060 - Posted: 24 Jan 2019, 20:21:08 UTC

Almost every N-body job I have allowed to run has failed to complete. I'm currently running one that has 2d 19h elapsed and 8d remaining. Its original estimate was 12 minutes. (I would have aborted it earlier, but I've been away.)
