Welcome to MilkyWay@home

N-Body's Trashing Task & Project Queues...

Jimmy Gondek

Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 52139 - Posted: 31 Dec 2011, 18:27:10 UTC

Here's a new one by me...

The MWAH server has been sending out long runs of N-Body 0.84 (mt) WUs the past couple of days (with the occasional 0.82 tossed in). After crunching nothing but N-Body WUs for 10 minutes or so, the Task and Project queues in BOINC Manager suddenly go blank, empty, nada, zilch. No error is reported, no WUs are being crunched (as they've all disappeared), and BOINC Manager needs to be restarted...where it happily picks up where it left off, with everything it had prior to this, er, "crash" intact. This requires constant babysitting of BOINC Manager/MWAH.

Related to this, BOINC Manager sends and receives WUs only for intervals of under 5 minutes, until it comes up dry on new WUs and then requires a manual update to begin the cycle anew. This also requires constant babysitting of BOINC Manager/MWAH.

Log Report leading up to most recent occurrence...

Sat Dec 31 12:46:18 2011 | Milkyway@home | [task] task_state=EXITED for nbody-Plum_Embedded_3297335_0 from handle_exited_app
Sat Dec 31 12:46:18 2011 | Milkyway@home | [task] process exited with status 0
Sat Dec 31 12:46:18 2011 | Milkyway@home | Computation for task nbody-Plum_Embedded_3297335_0 finished
Sat Dec 31 12:46:18 2011 | Milkyway@home | [task] result state=FILES_UPLOADING for nbody-Plum_Embedded_3297335_0 from CS::app_finished
Sat Dec 31 12:46:18 2011 | Milkyway@home | [task] result state=FILES_UPLOADED for nbody-Plum_Embedded_3297335_0 from CS::update_results
Sat Dec 31 12:46:18 2011 | Milkyway@home | [task] ACTIVE_TASK::start(): forked process: pid 10390
Sat Dec 31 12:46:18 2011 | Milkyway@home | [task] task_state=EXECUTING for nbody-Plum_Embedded_3297328_0 from start
Sat Dec 31 12:46:18 2011 | Milkyway@home | Starting task nbody-Plum_Embedded_3297328_0 using milkyway_nbody version 84
Sat Dec 31 12:46:21 2011 | Milkyway@home | Sending scheduler request: To fetch work.
Sat Dec 31 12:46:21 2011 | Milkyway@home | Reporting 10 completed tasks, requesting new tasks for CPU
Sat Dec 31 12:46:21 2011 | | [error] No HTTP input file sched_request_milkyway.cs.rpi.edu_milkyway.xml
Sat Dec 31 12:46:21 2011 | Milkyway@home | Scheduler request initialization failed: fopen() failed
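For anyone wanting to pull the failure lines out of a longer log, here is a minimal Python sketch. It assumes the pipe-separated layout shown in the excerpt above (timestamp | project | message); the function names `parse_line` and `find_failures` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Sample lines copied from the log excerpt above.
LOG = """\
Sat Dec 31 12:46:18 2011 | Milkyway@home | Starting task nbody-Plum_Embedded_3297328_0 using milkyway_nbody version 84
Sat Dec 31 12:46:21 2011 | Milkyway@home | Sending scheduler request: To fetch work.
Sat Dec 31 12:46:21 2011 | Milkyway@home | Reporting 10 completed tasks, requesting new tasks for CPU
Sat Dec 31 12:46:21 2011 | | [error] No HTTP input file sched_request_milkyway.cs.rpi.edu_milkyway.xml
Sat Dec 31 12:46:21 2011 | Milkyway@home | Scheduler request initialization failed: fopen() failed
"""


def parse_line(line):
    """Split one log line into (timestamp, project, message).

    Assumes exactly the 'timestamp | project | message' layout seen above;
    the project field is empty for client-level messages.
    """
    ts, project, message = (part.strip() for part in line.split("|", 2))
    return datetime.strptime(ts, "%a %b %d %H:%M:%S %Y"), project, message


def find_failures(text):
    """Return the parsed lines that carry an [error] tag or mention 'failed'."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        when, project, message = parse_line(line)
        if message.startswith("[error]") or "failed" in message:
            hits.append((when, project, message))
    return hits


failures = find_failures(LOG)
for when, project, message in failures:
    print(when.isoformat(), project or "(client)", message)
```

On the sample it picks out the two lines at 12:46:21: the missing scheduler request file and the fopen() failure that followed it.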

I'm reporting this here as I've never encountered this with my other two projects, SETI & CPDN...if this needs to be reported at BOINC instead I'll be glad to send it their way.

:)
Jimmy Gondek

Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 52142 - Posted: 31 Dec 2011, 20:15:59 UTC

Another occurrence, this time followed by an auto-reboot about a minute later...

Sat Dec 31 15:11:39 2011 | Milkyway@home | [task] task_state=EXECUTING for nbody-Plum_Embedded_3430140_1 from start
Sat Dec 31 15:11:39 2011 | Milkyway@home | Starting task nbody-Plum_Embedded_3430140_1 using milkyway_nbody version 84
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] Process for nbody-Plum_Embedded_3430140_1 exited
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] task_state=EXITED for nbody-Plum_Embedded_3430140_1 from handle_exited_app
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] process exited with status 0
Sat Dec 31 15:11:45 2011 | Milkyway@home | Computation for task nbody-Plum_Embedded_3430140_1 finished
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] result state=FILES_UPLOADING for nbody-Plum_Embedded_3430140_1 from CS::app_finished
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] result state=FILES_UPLOADED for nbody-Plum_Embedded_3430140_1 from CS::update_results
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] ACTIVE_TASK::start(): forked process: pid 13778
Sat Dec 31 15:11:45 2011 | Milkyway@home | [task] task_state=EXECUTING for nbody-Plum_Embedded_3430129_1 from start
Sat Dec 31 15:11:45 2011 | Milkyway@home | Starting task nbody-Plum_Embedded_3430129_1 using milkyway_nbody version 84
Jimmy Gondek

Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 52146 - Posted: 31 Dec 2011, 22:58:53 UTC
Last modified: 31 Dec 2011, 22:59:34 UTC

Another variation on the theme, this time causing a WU freeze while forcing 4 CPUs to run continuously for over an hour (until it was babysat) with zero work being performed! This one required a manual BOINC restart...

31-Dec-2011 16:29:58 [Milkyway@home] Computation for task nbody-Plum_Embedded_3510080_0 finished
31-Dec-2011 16:29:58 [Milkyway@home] [task] result state=FILES_UPLOADING for nbody-Plum_Embedded_3510080_0 from CS::app_finished
31-Dec-2011 16:29:58 [Milkyway@home] [task] result state=FILES_UPLOADED for nbody-Plum_Embedded_3510080_0 from CS::update_results
31-Dec-2011 16:29:58 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 15500
31-Dec-2011 16:29:58 [Milkyway@home] [task] task_state=EXECUTING for nbody-Plum_Embedded_3510079_0 from start
31-Dec-2011 16:29:58 [Milkyway@home] Starting task nbody-Plum_Embedded_3510079_0 using milkyway_nbody version 84
31-Dec-2011 16:30:04 [Milkyway@home] [task] Process for nbody-Plum_Embedded_3510079_0 exited
31-Dec-2011 16:30:04 [Milkyway@home] [task] task_state=EXITED for nbody-Plum_Embedded_3510079_0 from handle_exited_app
31-Dec-2011 16:30:04 [Milkyway@home] [task] process exited with status 0
31-Dec-2011 16:30:04 [Milkyway@home] Computation for task nbody-Plum_Embedded_3510079_0 finished
31-Dec-2011 16:30:04 [Milkyway@home] [task] result state=FILES_UPLOADING for nbody-Plum_Embedded_3510079_0 from CS::app_finished
31-Dec-2011 16:30:04 [Milkyway@home] [task] result state=FILES_UPLOADED for nbody-Plum_Embedded_3510079_0 from CS::update_results
31-Dec-2011 16:30:04 [Milkyway@home] [task] ACTIVE_TASK::start(): forked process: pid 15506
31-Dec-2011 16:30:04 [Milkyway@home] [task] task_state=EXECUTING for nbody-Plum_Embedded_3511208_0 from start
31-Dec-2011 16:30:04 [Milkyway@home] Starting task nbody-Plum_Embedded_3511208_0 using milkyway_nbody version 84
31-Dec-2011 17:48:22 [---] Starting BOINC client version 6.12.35 for x86_64-apple-darwin
31-Dec-2011 17:48:22 [---] log flags: file_xfer, sched_ops, task, task_debug
31-Dec-2011 17:48:22 [---] Libraries: libcurl/7.19.7 OpenSSL/0.9.7l zlib/1.2.3 c-ares/1.6.0
31-Dec-2011 17:48:22 [---] Data directory: /Library/Application Support/BOINC Data
31-Dec-2011 17:48:22 [---] Processor: 8 GenuineIntel Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz [x86 Family 6 Model 42 Stepping 7]
31-Dec-2011 17:48:22 [---] Processor features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 CX16 TPR PDCM SSE4.1 SSE4.2 xAPIC POPCNT AES XSAVE OSXSAVE PCID TSCTMR AVX1.0
31-Dec-2011 17:48:22 [---] OS: Mac OS X 10.6.8 (Darwin 10.8.0)
paris
Joined: 26 Apr 08
Posts: 87
Credit: 64,801,496
RAC: 0
Message 52148 - Posted: 1 Jan 2012, 0:40:47 UTC

I'm not getting the problems the OP has been having, but several dozen N-body units have finished with an error after a few seconds of crunching. It only happens on a Mac mini Core Duo running Tiger (10.4.11). It does not occur on my Mac mini running with a Core 2 Duo and Leopard (10.5.x). Separation units are fine on both machines.


Plus SETI Classic = 21,082 WUs
Jimmy Gondek

Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 52174 - Posted: 2 Jan 2012, 12:19:25 UTC

Sunday Update: The sporadically available strings of N-Body WUs were all received, crunched and sent back without incident... :)

Monday Update: The MWAH server is again sending a full complement of WUs, and things seem to be behaving normally.

As for the weekend glitches I can only guess...corrupt WUs? Or does BOINC Manager/MWAH have difficulty scheduling/processing long runs of 5-second-long 8-CPU N-Body WUs?

Just reporting the facts...thought you folks would like to know about any errors that occurred when the system was running under those conditions.

:)
Roger

Joined: 18 Jun 08
Posts: 7
Credit: 712,972
RAC: 0
Message 52567 - Posted: 18 Jan 2012, 15:06:37 UTC

I'm not an expert, so I'm not sure if this is the correct place to put this or not. I am having an issue with these N-Body tasks in BOINC, though a bit different from what I have read so far.

When first started, I get about a dozen WUs downloaded to BOINC and it happily starts working along with the other projects. I notice there are two primary types of files, however:

ps_separation_82_2S_mix0_3_xxxxxx_x

and

nbody-Plum_embedded_xxxxxx_0 for example


Any file with the ps_separation name works fine and gets processed with no issues. Once they run out, however, they all seem to be replaced with the nbody-Plum type files, and those NEVER run. I have watched it over the course of days: once all the downloaded files are of the nbody type, I never get another WU processed. I have tried resetting the project, which resets OK but repeats the same pattern. I have also detached and re-attached the project, again with the same results.

Any reason why Milkyway gets hung up on these small files? I have no objection to the small WUs, but I do object when it reaches the point of stopping and never doing another WU without babysitting.
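To see at a glance whether a queue has filled up with only one type of task, the names can be tallied by prefix. This is a minimal Python sketch; telling the types apart by the "ps_separation" and "nbody" prefixes is an assumption based purely on the names quoted in this thread, and the `classify` helper and sample list are made up for illustration.

```python
from collections import Counter

# Assumed mapping from task-name prefix to application type, based only on
# the two name patterns quoted in this thread.
PREFIXES = {
    "ps_separation": "separation",
    "nbody": "n-body",
}


def classify(task_name):
    """Return the application type for a task name, or 'unknown'."""
    lowered = task_name.lower()
    for prefix, label in PREFIXES.items():
        if lowered.startswith(prefix):
            return label
    return "unknown"


# Illustrative queue: two names taken from the logs earlier in the thread and
# the placeholder separation name quoted above.
queue = [
    "nbody-Plum_Embedded_3297335_0",
    "nbody-Plum_Embedded_3297328_0",
    "ps_separation_82_2S_mix0_3_xxxxxx_x",
]

counts = Counter(classify(name) for name in queue)
print(dict(counts))
if counts and set(counts) == {"n-body"}:
    print("Queue holds only N-Body tasks.")
```

A queue where only the "n-body" count is non-zero would match the stalled state described here.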


Any suggestions?
Jimmy Gondek

Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 53470 - Posted: 28 Feb 2012, 13:57:10 UTC

Resolution Update: Looks like the problem was identified as a narrow Mac OS X 10.6.8 / BOINC 6.12.34/35 issue, which was cleared up by BOINC 6.12.41, as noted here...

"WU Freezes BOINC Manager" Redux...:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2789&nowrap=true#53464

...thanks Kashi and scasady for figuring this out! Also thanks to everyone at MWAH & BOINC who spent time with me trying to solve this!

Oh, happy day,
:)
Jimmy Gondek

Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 54068 - Posted: 19 Apr 2012, 13:00:37 UTC

6-Week Final Update: ...looks like BOINC Manager 6.12.43 has solved all of my issues with spontaneous restarts, waiting-for-GPU statuses and mDNSResponder system freezes...yay!...

...seeing how 6.12.35 was not playing well with OS X 10.6.8, perhaps the kind folks at BOINC would consider making 6.12.43 their preferred v6 OS X install on this page?...

Download BOINC client software:
http://boinc.berkeley.edu/download_all.php

...and again, my sincere thanks to everyone for all your time, help, insights and suggestions!... :)

