New Nbody version

Message boards : News : New Nbody version

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 60
Credit: 326,757
RAC: 1,065

Message 62395 - Posted: 24 Sep 2014, 18:00:27 UTC

Hey all,

I put out a new version of Nbody, version 1.44. This should fix some library issues people were having. If any issues arise, let me know.

-Sidd

greg_be
Joined: 18 Aug 09
Posts: 83
Credit: 3,772,113
RAC: 4,478

Message 62410 - Posted: 27 Sep 2014, 20:29:11 UTC - in response to Message 62395.

9/27/2014 9:59:52 PM | Milkyway@Home | Task ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited with zero status but no 'finished' file
9/27/2014 9:59:52 PM | Milkyway@Home | If this happens repeatedly you may need to reset the project.
9/27/2014 9:59:52 PM | Milkyway@Home | [task] task_state=UNINITIALIZED for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 from handle_premature_exit
9/27/2014 9:59:52 PM | Milkyway@Home | [task] task_state=EXECUTING for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 from start
9/27/2014 10:00:59 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:02:04 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:03:08 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:04:12 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:05:17 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:06:22 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:07:26 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:08:30 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:09:35 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:10:40 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:11:45 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:12:49 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:13:55 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:14:59 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:16:04 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:17:08 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:18:12 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:19:16 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:20:21 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:21:25 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:22:24 PM | Milkyway@Home | [task] Process for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited, exit code 0, task state 1
9/27/2014 10:22:24 PM | Milkyway@Home | Task ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited with zero status but no 'finished' file
9/27/2014 10:22:24 PM | Milkyway@Home | If this happens repeatedly you may need to reset the project.


(The task is aborted after this cycle repeats 3 or 4 times.)

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62414 - Posted: 28 Sep 2014, 16:30:41 UTC - in response to Message 62410.

Is this an isolated instance or does this consistently happen?

Eric Findley
Joined: 1 Jan 14
Posts: 24
Credit: 4,261,451
RAC: 0

Message 62465 - Posted: 4 Oct 2014, 10:58:41 UTC

Task 839721663


Name ps_nbody_08_05_orphan_sim_3_1411504411_79004_0
Workunit 624947649
Created 30 Sep 2014, 17:06:39 UTC
Sent 2 Oct 2014, 21:06:27 UTC
Report deadline 14 Oct 2014, 21:06:27 UTC
Received 3 Oct 2014, 17:30:20 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID 554457
Run time 4 hours 26 min 21 sec
CPU time 1 days 5 hours 0 min 48 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 28.32 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)

Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
<search_application> milkyway_nbody 1.44 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 8 max threads on a system with 8 processors
RHO MAX IS 3192.19230
3192.19230Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors

</stderr_txt>
]]>




Recent error I got. First one in a while.
____________

Eric Findley
Joined: 1 Jan 14
Posts: 24
Credit: 4,261,451
RAC: 0

Message 62501 - Posted: 7 Oct 2014, 0:07:42 UTC

Name ps_nbody_08_05_orphan_sim_3_1411504411_79004_0
Workunit 624947649
Created 30 Sep 2014, 17:06:39 UTC
Sent 2 Oct 2014, 21:06:27 UTC
Report deadline 14 Oct 2014, 21:06:27 UTC
Received 3 Oct 2014, 17:30:20 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID 554457
Run time 4 hours 26 min 21 sec
CPU time 1 days 5 hours 0 min 48 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 28.32 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)
Another one for you
____________

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62503 - Posted: 7 Oct 2014, 14:48:54 UTC - in response to Message 62465.

This appears to be an issue with that specific work unit. RHO_MAX = 3000 is absurd. If it's just an isolated instance, I wouldn't worry.


Jake

mycal
Joined: 13 Jan 08
Posts: 19
Credit: 820,482
RAC: 16

Message 62522 - Posted: 9 Oct 2014, 9:52:15 UTC

All appear to be going well except this one.

Task 843900336
Name de_nbody_08_05_orphan_sim_3_1411504411_49264_4
Workunit 624873849
Created 6 Oct 2014, 4:47:51 UTC
Sent 7 Oct 2014, 0:06:01 UTC
Report deadline 19 Oct 2014, 0:06:01 UTC
Received 9 Oct 2014, 3:18:18 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 572628
Run time 1 days 12 hours 46 min 25 sec
CPU time 5 days 3 hours 15 min 14 sec
Validate state Checked, but no consensus yet
Credit 0.00
Device peak FLOPS 9.41 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)
Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 1.44 Windows x86 double OpenMP, Crlibm </search_application>
Using OpenMP 4 max threads on a system with 4 processors
RHO MAX IS 994.77569
994.77569Using OpenMP 4 max threads on a system with 4 processors
Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Using OpenMP 4 max threads on a system with 4 processors
Using OpenMP 4 max threads on a system with 4 processors
<search_likelihood>-481.668042842351160</search_likelihood>
03:17:15 (6108): called boinc_finish

</stderr_txt>
]]>

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62527 - Posted: 9 Oct 2014, 11:44:06 UTC - in response to Message 62522.

This succeeded. I see some checkpointing issues, but nothing critical. Am I missing something?
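The repeated "Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint'" lines suggest the application writes each checkpoint to a temporary file and then moves it over the previous one. Below is a minimal Python sketch of that write-then-rename pattern with a few retries. It is not the actual milkyway_nbody code; the function name, retry count, and delay are hypothetical, and the guess that another process may briefly hold the target file open is only one plausible cause.

```python
import os
import tempfile
import time


def write_checkpoint(path, data, retries=5, delay=0.5):
    """Write `data` (bytes) to `path` via a temp file and a rename.

    Returns True on success, False if every rename attempt failed.
    All names and defaults here are illustrative assumptions.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(prefix="checkpoint_tmp_", dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # push the bytes to disk before renaming

    for _attempt in range(retries):
        try:
            # Replaces the old checkpoint in one step, so a reader never
            # sees a half-written file.
            os.replace(tmp_path, path)
            return True
        except OSError:
            # The move failed (for example, the target is held open by
            # another process); wait briefly and try again.
            time.sleep(delay)

    os.remove(tmp_path)  # give up, leaving the previous checkpoint intact
    return False
```

If every attempt fails, the previous checkpoint is still valid, which would be consistent with the task finishing successfully despite the warnings.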

Jake

greg_be
Joined: 18 Aug 09
Posts: 83
Credit: 3,772,113
RAC: 4,478

Message 62536 - Posted: 11 Oct 2014, 7:00:16 UTC

There are some serious bugs in this task: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=57700

Only 2 users have completed this task successfully, and their validation was inconclusive. 3 have had errors while computing, and the task was terminated.
1 user (me) aborted the task after 1,007,779.00 seconds.
Another user has only just been sent the task.

The 3 with computation errors are using Linux with Intel(R) Core(TM)2 Duo CPU E8500, Intel(R) Xeon(R) CPU E5410, Intel(R) Xeon(R) CPU E5504
I use Win8.1 64 Pro with an AMD FX(tm)-6350 Six-Core Processor (this task hogged all 6 of my cores).

The two with success have: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (Linux machine) and Intel(R) Core(TM) i7-4770 CPU (Win 7 64 pro SP1)

The new person is using: Intel(R) Core(TM) i3-2350M CPU with Win 7 64 pro SP1

I have been busy all week so I did not have time to look more closely at the logs for my system. But today I aborted it (all I seem to do is abort tasks from you guys) because it got caught up in the Exit 0 (What is this????) loop and kept restarting.

Maybe you guys should add some code that checks for Exit 0 errors and, if a task gets more than 3 to 5 of them, forces Boincmgr to terminate the task.
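As a rough illustration of that suggestion, here is a small Python sketch that scans event-log lines like the ones posted earlier in this thread and lists tasks that hit the "exited with zero status but no 'finished' file" message a given number of times. It only reads log text; it does not talk to the BOINC client, and the function names and the default limit of 3 are assumptions made for this example.

```python
import re
from collections import Counter

# Matches the event-log message quoted earlier in this thread.
PREMATURE_EXIT = re.compile(
    r"Task (?P<task>\S+) exited with zero status but no 'finished' file"
)


def count_premature_exits(log_lines):
    """Return a Counter mapping task name -> number of premature exits."""
    counts = Counter()
    for line in log_lines:
        match = PREMATURE_EXIT.search(line)
        if match:
            counts[match.group("task")] += 1
    return counts


def tasks_to_abort(log_lines, limit=3):
    """List tasks whose premature-exit count has reached `limit`."""
    counts = count_premature_exits(log_lines)
    return sorted(task for task, n in counts.items() if n >= limit)
```

Fed a saved copy of the event log, `tasks_to_abort(open("eventlog.txt"))` would return the names of tasks stuck in the restart loop (the file name here is hypothetical).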

Eric Findley
Joined: 1 Jan 14
Posts: 24
Credit: 4,261,451
RAC: 0

Message 62541 - Posted: 11 Oct 2014, 22:42:02 UTC

Name de_nbody_08_05_orphan_sim_1_1411504411_153_4
Workunit 621565038
Created 10 Oct 2014, 5:17:34 UTC
Sent 10 Oct 2014, 6:12:16 UTC
Report deadline 22 Oct 2014, 6:12:16 UTC
Received 11 Oct 2014, 1:54:46 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID 554457
Run time 1 hours 53 min 55 sec
CPU time 12 hours 30 min 15 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 28.34 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)
Here I have another
____________

Chris Lee
Joined: 18 Feb 10
Posts: 3
Credit: 1,265,618
RAC: 122

Message 62596 - Posted: 19 Oct 2014, 8:34:46 UTC - in response to Message 62541.

I only have a single task at the moment de_nbody_08_05_orphan_sim_2_ etc etc
It has been running for 28+ hours and has over 448 hours estimated remaining.
Is this the new normal??

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62598 - Posted: 19 Oct 2014, 16:30:24 UTC - in response to Message 62596.

This is normal. N-body simulations are proving very difficult to estimate completion times for. Apologies. It should not actually take that long (probably no more than 15 hours).

Jake

Chris Lee
Joined: 18 Feb 10
Posts: 3
Credit: 1,265,618
RAC: 122

Message 62600 - Posted: 19 Oct 2014, 21:01:56 UTC - in response to Message 62598.

Hi Jake. Thanks.
In addition to the extraordinary length of time being estimated, my other "concern" is that I have no other tasks being queued. Maybe this is a function of the time estimate (?)

swiftmallard
Joined: 18 Jul 09
Posts: 289
Credit: 302,980,648
RAC: 0

Message 62601 - Posted: 20 Oct 2014, 0:25:23 UTC - in response to Message 62600.

Hi Jake. Thanks.
In addition to the extraordinary length of time being estimated, my other "concern" is that I have no other tasks being queued. Maybe this is a function of the time estimate (?)

There is no maybe about it. I always abort n-body WUs that take longer than 24 hours.

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62602 - Posted: 20 Oct 2014, 13:53:21 UTC - in response to Message 62601.

Yes. This is absolutely true. If I recall, the algorithm that did this was changed recently, and we had to significantly revise this calculation. The issue with time estimates in n-body simulations is that the time estimate is potentially dependent on any parameters you use to describe your system, the number of bodies (resolution), and whatever n-body solver you are using. By time, I am of course referring to CPU time and not clock time. Actually, this issue has piqued my interest in this topic. I might take another stab at it to see if we can get a better time estimate.
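To make the dependence on body count and solver concrete, here is a toy cost model in Python. It is only a sketch under stated assumptions: a direct-sum solver is taken to cost about N^2 interactions per step and a tree solver about N log2 N, and the `flops_per_interaction` constant is a made-up placeholder. It is not the estimator the project actually uses.

```python
import math


def estimate_cpu_seconds(n_bodies, n_steps, device_flops,
                         solver="tree", flops_per_interaction=20.0):
    """Toy CPU-time estimate for an n-body run (illustrative only).

    n_bodies: number of bodies (must be > 1)
    n_steps: number of timesteps
    device_flops: sustained floating-point operations per second
    solver: "direct" (~N^2 per step) or "tree" (~N log2 N per step)
    flops_per_interaction: hypothetical cost of one pairwise interaction
    """
    if solver == "direct":
        interactions = n_bodies * n_bodies
    elif solver == "tree":
        interactions = n_bodies * math.log2(n_bodies)
    else:
        raise ValueError("solver must be 'direct' or 'tree'")
    total_flops = interactions * n_steps * flops_per_interaction
    return total_flops / device_flops
```

Even in this simplified model, doubling the number of bodies quadruples the direct-sum estimate, so a small change in the simulation parameters can move the predicted time a long way.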

Jake

Andrew Sanchez
Joined: 19 Oct 14
Posts: 1
Credit: 27,506
RAC: 0

Message 62608 - Posted: 21 Oct 2014, 21:58:26 UTC
Last modified: 21 Oct 2014, 22:13:08 UTC

Hi, I just joined mw@home and I'm having issues with these nbody WUs. I don't mind that they use all 4 CPUs and take a long time, but I don't like the number of "validation inconclusive" results I'm getting for the amount of time I'm spending crunching these WUs. And it's not just me; the inconclusive WUs are inconclusive on other computers too. Some computers can't even run them without errors. These are the culprits:
ps_nbody_09_10_orphan_real_1_1413455402_15397
de_nbody_09_10_orphan_real_1_1413455402_19277
ps_nbody_09_10_orphan_real_2_1413455402_17734
de_nbody_08_05_orphan_sim_0_1413455402_1910
de_83_DR8_rev_8_4_00001_1413455402_1542781
de_nbody_08_05_orphan_sim_0_1413455402_14328

I guess one of those isn't an nbody, but it is inconclusive for me and for another computer as well.
I was going to stop crunching these nbody WUs, but I thought I'd post here first and see what you guys had to say about it. Will I get credit for these sometime, or will they stay inconclusive? How many computers have to successfully run the WU before a consensus is reached (or is it not like that)?


LOL. I just went back to my "validation inconclusive" page and noticed one was missing since I made the above list: ps_nbody_09_10_orphan_real_1_1413455402_15397.
That WU must have been validated while I was typing. So I guess I answered my own questions: yes, I will get credit, and it takes 3 successful runs to reach a consensus. LOL
(Instead of erasing this post, I'll leave it in case someone else has the same issue/questions I had.)

Russell
Joined: 25 Dec 10
Posts: 1
Credit: 2,672,901
RAC: 499

Message 62616 - Posted: 23 Oct 2014, 18:22:46 UTC

Hi,
Still getting a problem where the task starts at ~6 hours of processing and then over time increases to 30, 50, and up to 90 hours of processing remaining.

I have had to kill several tasks that have been running for days with less than 10% complete.

Jake Bauer
Project developer
Project tester
Project scientist
Send message
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62623 - Posted: 24 Oct 2014, 12:32:48 UTC - in response to Message 62616.

I almost have a solution. I'm just testing now. Abort these if you wish, but they shouldn't actually take that long.

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 19
Credit: 452,024
RAC: 1

Message 62630 - Posted: 26 Oct 2014, 11:05:54 UTC

Hi

I just realized I have had a MilkyWay@Home N-Body Simulation 1.44 (mt) task running on my iMac (i7, late 2009) for more than a day.

Name ps_nbody_08_05_orphan_sim_3_1413455402_39907_0
Application MilkyWay@Home N-Body Simulation 1.44 (mt)
Workunit name ps_nbody_08_05_orphan_sim_3_1413455402_39907
State Running
Received 24 Oct 2014, 23:00:02
Report deadline 5 Nov 2014, 22:00:02
Estimated app speed 13.03 GFLOPs/sec
Estimated task size 159,243 GFLOPs
Resources 8 CPUs
CPU time at last checkpoint 09d,07:26:27
CPU time 09d,07:30:09
Elapsed time 01d,09:48:49
Estimated time remaining 01d,04:35:09
Fraction done 54.189%
Virtual memory size 2,472.48 MB
Working set size 30.50 MB
Directory slots/9
Process ID 86875


The % is moving slowly, but it's moving. The WU is eating avg 600/650 % CPU on the mac (out of 800% max, Mac OS X measures 100% per core), so it's working.

So the calculation estimate is more than another day, it will be over 2 days of calculation if it terminates based on this estimate.

Is this OK? Should I let it run?
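For anyone wanting to sanity-check figures like the ones in the task properties above, here is a short Python sketch of the arithmetic. It assumes the remaining-time figure is a simple linear extrapolation from the percentage done, and that a naive up-front estimate is just task size divided by speed; both are assumptions for illustration rather than a description of how the BOINC client actually computes these values.

```python
def to_seconds(days, hours, minutes, seconds):
    """Convert a 'DDd,HH:MM:SS' style duration to seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def remaining_from_progress(elapsed_s, fraction_done):
    """Linear extrapolation: projected total time minus time already spent."""
    return elapsed_s / fraction_done - elapsed_s


def average_busy_cores(cpu_s, elapsed_s):
    """CPU seconds consumed per wall-clock second."""
    return cpu_s / elapsed_s


def runtime_from_size(task_gflops, speed_gflops_per_s):
    """Naive estimate: task size divided by speed."""
    return task_gflops / speed_gflops_per_s


# Figures copied from the task properties in this post.
elapsed = to_seconds(1, 9, 48, 49)    # elapsed (wall-clock) time
cpu = to_seconds(9, 7, 30, 9)         # CPU time summed over all threads
fraction_done = 0.54189               # 54.189 %

remaining = remaining_from_progress(elapsed, fraction_done)
cores = average_busy_cores(cpu, elapsed)
naive = runtime_from_size(159243, 13.03)

print(f"extrapolated remaining: {remaining / 3600:.1f} h")
print(f"average busy cores:     {cores:.1f}")
print(f"size / speed estimate:  {naive / 3600:.1f} h")
```

Under these assumptions the extrapolated remaining time comes out at roughly 28.6 hours, very close to the 01d,04:35:09 shown, and the average of about 6.6 busy cores is in line with the CPU usage reported for the Mac. The size-over-speed figure is only a few hours, which illustrates how far off an up-front estimate can be for these work units.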

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0

Message 62638 - Posted: 27 Oct 2014, 15:52:57 UTC - in response to Message 62630.

2 days is absurd and you shouldn't need this long to compute any of these simulations running on 6 cores. I would recommend aborting if it looks like it is going to take this long.

Copyright © 2017 AstroInformatics Group