Welcome to MilkyWay@home

New Nbody version

Sidd
Project developer
Project tester
Project scientist

Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 62395 - Posted: 24 Sep 2014, 18:00:27 UTC

Hey all,

I put out a new version of Nbody, version 1.44. This should fix some library issues people were having. If any issues arise, let me know.

-Sidd
ID: 62395 · Rating: 0
greg_be
Joined: 18 Aug 09
Posts: 123
Credit: 21,096,302
RAC: 1,845
Message 62410 - Posted: 27 Sep 2014, 20:29:11 UTC - in response to Message 62395.  

9/27/2014 9:59:52 PM | Milkyway@Home | Task ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited with zero status but no 'finished' file
9/27/2014 9:59:52 PM | Milkyway@Home | If this happens repeatedly you may need to reset the project.
9/27/2014 9:59:52 PM | Milkyway@Home | [task] task_state=UNINITIALIZED for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 from handle_premature_exit
9/27/2014 9:59:52 PM | Milkyway@Home | [task] task_state=EXECUTING for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 from start
9/27/2014 10:00:59 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:02:04 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:03:08 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:04:12 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:05:17 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:06:22 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:07:26 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:08:30 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:09:35 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:10:40 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:11:45 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:12:49 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:13:55 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:14:59 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:16:04 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:17:08 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:18:12 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:19:16 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:20:21 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:21:25 PM | Milkyway@Home | [checkpoint] result ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed
9/27/2014 10:22:24 PM | Milkyway@Home | [task] Process for ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited, exit code 0, task state 1
9/27/2014 10:22:24 PM | Milkyway@Home | Task ps_nbody_09_10_orphan_real_2_1411504411_17126_2 exited with zero status but no 'finished' file
9/27/2014 10:22:24 PM | Milkyway@Home | If this happens repeatedly you may need to reset the project.


(Task Aborted after this cycle repeats 3 or 4 times)
ID: 62410 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62414 - Posted: 28 Sep 2014, 16:30:41 UTC - in response to Message 62410.  

Is this an isolated instance or does this consistently happen?
ID: 62414 · Rating: 0
Eric Findley
Joined: 1 Jan 14
Posts: 24
Credit: 4,277,349
RAC: 0
Message 62465 - Posted: 4 Oct 2014, 10:58:41 UTC

Task 839721663


Name ps_nbody_08_05_orphan_sim_3_1411504411_79004_0
Workunit 624947649
Created 30 Sep 2014, 17:06:39 UTC
Sent 2 Oct 2014, 21:06:27 UTC
Report deadline 14 Oct 2014, 21:06:27 UTC
Received 3 Oct 2014, 17:30:20 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID 554457
Run time 4 hours 26 min 21 sec
CPU time 1 days 5 hours 0 min 48 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 28.32 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)

Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>
<stderr_txt>
<search_application> milkyway_nbody 1.44 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 8 max threads on a system with 8 processors
RHO MAX IS 3192.19230
3192.19230Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors
Using OpenMP 8 max threads on a system with 8 processors

</stderr_txt>
]]>

Recent error I got. First one in a while.
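Exit status 196 (EXIT_DISK_LIMIT_EXCEEDED) in the report above means the task used more disk space than it was allowed. The following is only a rough sketch of that kind of check, not the BOINC client's actual code: the directory, the file name written in the demo, and the 1024-byte limit are all hypothetical values chosen for illustration.

```python
import os
import tempfile


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def exceeds_disk_limit(path: str, limit_bytes: int) -> bool:
    """Return True when the directory holds more bytes than the limit."""
    return directory_size_bytes(path) > limit_bytes


# Demo with a throwaway directory standing in for a task's working directory.
with tempfile.TemporaryDirectory() as slot_dir:
    with open(os.path.join(slot_dir, "nbody_checkpoint"), "wb") as handle:
        handle.write(b"\0" * 4096)  # hypothetical 4 KiB checkpoint file
    demo_size = directory_size_bytes(slot_dir)
    demo_over_limit = exceeds_disk_limit(slot_dir, limit_bytes=1024)

print(demo_size, demo_over_limit)
```

Pointing `directory_size_bytes` at the task's slot directory while it runs would show whether disk use is growing steadily or jumps at some point.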
ID: 62465 · Rating: 0
Eric Findley
Joined: 1 Jan 14
Posts: 24
Credit: 4,277,349
RAC: 0
Message 62501 - Posted: 7 Oct 2014, 0:07:42 UTC

Name ps_nbody_08_05_orphan_sim_3_1411504411_79004_0
Workunit 624947649
Created 30 Sep 2014, 17:06:39 UTC
Sent 2 Oct 2014, 21:06:27 UTC
Report deadline 14 Oct 2014, 21:06:27 UTC
Received 3 Oct 2014, 17:30:20 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID 554457
Run time 4 hours 26 min 21 sec
CPU time 1 days 5 hours 0 min 48 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 28.32 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)
Another one for you
ID: 62501 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62503 - Posted: 7 Oct 2014, 14:48:54 UTC - in response to Message 62465.  

This appears to be an issue with that specific work unit. RHO_MAX = 3000 is absurd. If it's just an isolated instance, I wouldn't worry.


Jake
ID: 62503 · Rating: 0
mycal
Joined: 13 Jan 08
Posts: 19
Credit: 820,482
RAC: 0
Message 62522 - Posted: 9 Oct 2014, 9:52:15 UTC

All appear to be going well except this one.

Task 843900336
Name de_nbody_08_05_orphan_sim_3_1411504411_49264_4
Workunit 624873849
Created 6 Oct 2014, 4:47:51 UTC
Sent 7 Oct 2014, 0:06:01 UTC
Report deadline 19 Oct 2014, 0:06:01 UTC
Received 9 Oct 2014, 3:18:18 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 572628
Run time 1 days 12 hours 46 min 25 sec
CPU time 5 days 3 hours 15 min 14 sec
Validate state Checked, but no consensus yet
Credit 0.00
Device peak FLOPS 9.41 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)
Stderr output

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 1.44 Windows x86 double OpenMP, Crlibm </search_application>
Using OpenMP 4 max threads on a system with 4 processors
RHO MAX IS 994.77569
994.77569Using OpenMP 4 max threads on a system with 4 processors
Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Failed to move file 'nbody_checkpoint_tmp_5240' to 'nbody_checkpoint' (317): (null)Using OpenMP 4 max threads on a system with 4 processors
Using OpenMP 4 max threads on a system with 4 processors
<search_likelihood>-481.668042842351160</search_likelihood>
03:17:15 (6108): called boinc_finish

</stderr_txt>
]]>
ID: 62522 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62527 - Posted: 9 Oct 2014, 11:44:06 UTC - in response to Message 62522.  

This succeeded. I see some checkpointing issues, but nothing critical. Am I missing something?

Jake
ID: 62527 · Rating: 0
greg_be
Joined: 18 Aug 09
Posts: 123
Credit: 21,096,302
RAC: 1,845
Message 62536 - Posted: 11 Oct 2014, 7:00:16 UTC

There are some serious bugs in this task: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=57700

Only 2 users have completed this task successfully and their validation was inconclusive. 3 have had errors while computing and the task was terminated.
1 user (me) has aborted the task after 1,007,779.00 seconds
One more user has only just been sent the task.

The 3 with computation errors are using Linux with Intel(R) Core(TM)2 Duo CPU E8500, Intel(R) Xeon(R) CPU E5410, Intel(R) Xeon(R) CPU E5504
I use Win8.1 64 Pro with AMD FX(tm)-6350 Six-Core Processor (this task hogged all 6 of my CPU cores)

The two with success have: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (Linux machine) and Intel(R) Core(TM) i7-4770 CPU (Win 7 64 pro SP1)

The new person is using: Intel(R) Core(TM) i3-2350M CPU with Win 7 64 pro SP1

I have been busy all week so I did not have time to look more closely at the logs for my system. But today I aborted it (all I seem to do is abort tasks from you guys) because it got caught up in the Exit 0 (What is this????) loop and kept restarting.

Maybe you guys should add some code that checks for Exit 0 errors and, if a task gets more than 3 or 5 of them, forces Boincmgr to terminate the task.
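The idea of counting "Exit 0" restarts can be sketched as a scan of the client event log. This is only an illustration under stated assumptions: it matches the message text exactly as it appears in the log posted earlier in this thread, the function names and the `max_restarts` threshold are hypothetical, and it only reports tasks rather than aborting anything.

```python
import re
from collections import Counter

# Message text copied from the event log lines quoted in this thread.
PREMATURE_EXIT = re.compile(
    r"Task (\S+) exited with zero status but no 'finished' file"
)


def count_premature_exits(log_lines):
    """Count premature-exit messages per task name."""
    counts = Counter()
    for line in log_lines:
        match = PREMATURE_EXIT.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def tasks_over_limit(log_lines, max_restarts=3):
    """Return task names that exited prematurely more than max_restarts times."""
    counts = count_premature_exits(log_lines)
    return sorted(task for task, n in counts.items() if n > max_restarts)


sample_line = (
    "9/27/2014 10:22:24 PM | Milkyway@Home | Task "
    "ps_nbody_09_10_orphan_real_2_1411504411_17126_2 "
    "exited with zero status but no 'finished' file"
)
sample_log = [sample_line] * 4 + [
    "9/27/2014 10:21:25 PM | Milkyway@Home | [checkpoint] result "
    "ps_nbody_09_10_orphan_real_2_1411504411_17126_2 checkpointed"
]
flagged = tasks_over_limit(sample_log, max_restarts=3)
print(flagged)
```

Keeping the threshold as a parameter makes it easy to try both the "3" and the "5" suggested above against a real log.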
ID: 62536 · Rating: 0
Eric Findley
Joined: 1 Jan 14
Posts: 24
Credit: 4,277,349
RAC: 0
Message 62541 - Posted: 11 Oct 2014, 22:42:02 UTC

Name de_nbody_08_05_orphan_sim_1_1411504411_153_4
Workunit 621565038
Created 10 Oct 2014, 5:17:34 UTC
Sent 10 Oct 2014, 6:12:16 UTC
Report deadline 22 Oct 2014, 6:12:16 UTC
Received 11 Oct 2014, 1:54:46 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID 554457
Run time 1 hours 53 min 55 sec
CPU time 12 hours 30 min 15 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 28.34 GFLOPS
Application version MilkyWay@Home N-Body Simulation v1.44 (mt)
Here I have another
ID: 62541 · Rating: 0
Chris Lee
Joined: 18 Feb 10
Posts: 3
Credit: 1,265,618
RAC: 0
Message 62596 - Posted: 19 Oct 2014, 8:34:46 UTC - in response to Message 62541.  

I only have a single task at the moment de_nbody_08_05_orphan_sim_2_ etc etc
It has been running for 28+ hours and has over 448 hours estimated remaining.
Is this the new normal??
ID: 62596 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62598 - Posted: 19 Oct 2014, 16:30:24 UTC - in response to Message 62596.  

This is normal. N-body simulations are proving very difficult to estimate completion times for. Apologies. It should not actually take that long (probably no more than 15 hours).

Jake
ID: 62598 · Rating: 0
Chris Lee
Joined: 18 Feb 10
Posts: 3
Credit: 1,265,618
RAC: 0
Message 62600 - Posted: 19 Oct 2014, 21:01:56 UTC - in response to Message 62598.  

Hi Jake. Thanks.
In addition to the extraordinary length of time being estimated, my other "concern" is that I have no other tasks being queued. Maybe this is a function of the time estimate (?)
ID: 62600 · Rating: 0
swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,565,482
RAC: 0
Message 62601 - Posted: 20 Oct 2014, 0:25:23 UTC - in response to Message 62600.  

Hi Jake. Thanks.
In addition to the extraordinary length of time being estimated, my other "concern" is that I have no other tasks being queued. Maybe this is a function of the time estimate (?)

There is no maybe about it. I always abort n-body WUs that take longer than 24 hours.
ID: 62601 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62602 - Posted: 20 Oct 2014, 13:53:21 UTC - in response to Message 62601.  

Yes. This is absolutely true. If I recall, the algorithm that did this was changed recently, and we had to significantly revise this calculation. The issue with time estimates in n-body simulations is that the estimate potentially depends on any parameters you use to describe your system, the number of bodies (resolution), and whatever n-body solver you are using. By time, I am of course referring to CPU time and not clock time. Actually, this issue has piqued my interest in this topic. I might take another stab at it to see if we can get a better time estimate.
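To illustrate why the number of bodies and the choice of solver matter so much, here is a small sketch using textbook scalings only. It is not the project's estimator: the two functions are hypothetical, the direct-sum count is the number of body pairs per step, and the tree-code figure is just the usual N log N order of magnitude with the constant factor ignored.

```python
import math


def direct_sum_interactions(n_bodies: int, n_steps: int) -> int:
    """Pairwise force evaluations for a direct-summation solver."""
    return n_steps * n_bodies * (n_bodies - 1) // 2


def tree_code_interactions(n_bodies: int, n_steps: int) -> float:
    """Order-of-magnitude work for a tree-based solver (constant ignored)."""
    return n_steps * n_bodies * math.log2(n_bodies)


# Same number of time steps, increasing resolution.
for n in (1_000, 10_000, 100_000):
    direct = direct_sum_interactions(n, n_steps=100)
    tree = tree_code_interactions(n, n_steps=100)
    print(f"N={n:>7}: direct={direct:.3e}  tree~{tree:.3e}")
```

Even in this simplified picture, a tenfold increase in bodies changes the work by very different factors depending on the solver, before any of the physical parameters of the system are taken into account.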

Jake
ID: 62602 · Rating: 0
Andrew Sanchez
Joined: 19 Oct 14
Posts: 1
Credit: 27,506
RAC: 0
Message 62608 - Posted: 21 Oct 2014, 21:58:26 UTC
Last modified: 21 Oct 2014, 22:13:08 UTC

Hi, I just joined mw@home and I'm having issues with these nbody WUs. I don't mind that they use all 4 CPUs and take a long time, but I don't like the number of "validation inconclusive" results I'm getting for the amount of time I'm spending crunching these WUs. And it's not just me; the inconclusive WUs are inconclusive on other computers too. Some computers can't even run them without errors. These are the culprits:
ps_nbody_09_10_orphan_real_1_1413455402_15397
de_nbody_09_10_orphan_real_1_1413455402_19277
ps_nbody_09_10_orphan_real_2_1413455402_17734
de_nbody_08_05_orphan_sim_0_1413455402_1910
de_83_DR8_rev_8_4_00001_1413455402_1542781
de_nbody_08_05_orphan_sim_0_1413455402_14328

I guess one of those isn't an nbody, but it is inconclusive for me and for another computer as well.
I was going to stop crunching these nbody WUs but I thought I'd post here first and see what you guys had to say about it. Will I get credit for these sometime or will they stay inconclusive? How many computers have to successfully run the WU before a consensus is reached (or is it not like that)?


LOL. I just went back to my "validation inconclusive" page and noticed one was missing since I made the above list: ps_nbody_09_10_orphan_real_1_1413455402_15397
That WU must have been validated while I was typing. So I guess I answered my own questions: yes, I will get credit, and it takes 3 successful runs to reach a consensus. LOL
(Instead of erasing this post I'll leave it in case someone else has the same issue/questions I had.)
ID: 62608 · Rating: 0
Russell
Joined: 25 Dec 10
Posts: 1
Credit: 3,757,042
RAC: 0
Message 62616 - Posted: 23 Oct 2014, 18:22:46 UTC

Hi,
Still getting a problem where the task starts at ~6 hours of processing and then over time increases to 30, 50 and up to 90 hours of processing remaining.

I have had to kill several tasks that have been running for days with less than 10% complete.
ID: 62616 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62623 - Posted: 24 Oct 2014, 12:32:48 UTC - in response to Message 62616.  

I almost have a solution. I'm just testing now. Abort these if you wish, but they shouldn't actually take that long.
ID: 62623 · Rating: 0
[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 30
Credit: 84,658,304
RAC: 28
Message 62630 - Posted: 26 Oct 2014, 11:05:54 UTC

Hi

I just realized I have had a MilkyWay@Home N-Body Simulation 1.44 (mt) task running on my iMac (i7, late 2009) for more than a day.

Name ps_nbody_08_05_orphan_sim_3_1413455402_39907_0
Application MilkyWay@Home N-Body Simulation 1.44 (mt)
Workunit name ps_nbody_08_05_orphan_sim_3_1413455402_39907
State Running
Received 24/10/2014 23:00:02
Report deadline 05/11/2014 22:00:02
Estimated app speed 13.03 GFLOPs/sec
Estimated task size 159,243 GFLOPs
Resources 8 CPUs
CPU time at last checkpoint 09d,07:26:27
CPU time 09d,07:30:09
Elapsed time 01d,09:48:49
Estimated time remaining 01d,04:35:09
Fraction done 54.189%
Virtual memory size 2,472.48 MB
Working set size 30.50 MB
Directory slots/9
Process ID 86875


The % is moving slowly, but it's moving. The WU is eating avg 600/650 % CPU on the mac (out of 800% max, Mac OS X measures 100% per core), so it's working.

So the estimate is more than another day of calculation; it will be over 2 days in total if it finishes according to this estimate.
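The figures in the task properties above can be cross-checked with a little arithmetic. This sketch makes two assumptions that the thread does not confirm: that the remaining-time figure is a simple linear extrapolation from the fraction done, and that CPU time divided by elapsed time is a fair measure of how many cores were busy. The helper `parse_boinc_duration` is a hypothetical name.

```python
def parse_boinc_duration(text: str) -> int:
    """Convert a duration such as '09d,07:30:09' or '07:30:09' to seconds."""
    days = 0
    if "d," in text:
        day_part, text = text.split("d,")
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


cpu_time = parse_boinc_duration("09d,07:30:09")   # CPU time from the post
elapsed = parse_boinc_duration("01d,09:48:49")    # elapsed (wall-clock) time
fraction_done = 0.54189                           # 54.189% from the post

# Average number of cores kept busy over the run so far.
busy_cores = cpu_time / elapsed

# Linear extrapolation (an assumption): time per unit of progress stays constant.
remaining = elapsed * (1 - fraction_done) / fraction_done
total_days = (elapsed + remaining) / 86400

print(f"average busy cores: {busy_cores:.1f}")
print(f"extrapolated time remaining: {remaining / 3600:.1f} h")
print(f"extrapolated total run time: {total_days:.1f} days")
```

Under those assumptions the ratio comes out at about 6.6 busy cores, which is in line with the 600/650% CPU reading, and the extrapolated remainder is about 28.6 hours, within a second of the displayed 01d,04:35:09. That gives a total of roughly 2.6 days of elapsed time.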

Is this OK? Should I let it run?
ID: 62630 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 62638 - Posted: 27 Oct 2014, 15:52:57 UTC - in response to Message 62630.  

2 days is absurd and you shouldn't need this long to compute any of these simulations running on 6 cores. I would recommend aborting if it looks like it is going to take this long.
ID: 62638 · Rating: 0

©2024 Astroinformatics Group