Welcome to MilkyWay@home

New Release- Nbody version 1.52

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 63907 - Posted: 8 Sep 2015, 18:08:31 UTC
Last modified: 8 Sep 2015, 18:10:05 UTC

Hey all,

We will be rolling out a new version of Nbody, v1.52, within the next 24 hours. I have left such a large time window because, as these things sometimes go, there could be unforeseen setbacks. I will update everyone once we are about to hit the go button (if only we had an actual go button; that would be so cool).

We are releasing for Windows again. I tested the binaries on a Windows machine and they appear to be working, but you never know. Therefore, I will be paying very close attention to the forums for reports of weird behavior. Please let us know!

If you would like more details on what is included in this release, check out my development blog:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3798

Thanks, everyone, for your support.
Cheers,
Sidd
ID: 63907 · Rating: 0

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 63913 - Posted: 9 Sep 2015, 17:15:00 UTC - in response to Message 63907.  

Hey All,

I have released the new version and am about to put up runs. Please let me know if there are any problems.


Thanks!
Cheers,
Sidd
ID: 63913 · Rating: 0

Robert Kapernick
Joined: 21 Dec 09
Posts: 8
Credit: 30,540,368
RAC: 0
Message 63914 - Posted: 9 Sep 2015, 22:30:20 UTC - in response to Message 63913.  

I am running N-Body 1.52, and the first task that started (ps_nbody_9_09_15_orphan_sim_0_1437561602_553_1) has been running for over an hour and has been at 100% complete most of that time. The estimated time remaining when I first downloaded it was approx. 2 minutes. I am running it on a Mac with a 3.4 GHz i7 and an AMD HD 6970M graphics card with 2 GB of video RAM, running OS X 10.9.5.
ID: 63914 · Rating: 0

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 63915 - Posted: 9 Sep 2015, 23:47:02 UTC - in response to Message 63914.  

Hey,

Thanks for letting me know. Is it still at 100%? If it remains there for a few more hours, I would say cancel it. Can you send me the work unit ID or the parameters that went with it?


Thanks,
Sidd
ID: 63915 · Rating: 0

Robert Kapernick
Joined: 21 Dec 09
Posts: 8
Credit: 30,540,368
RAC: 0
Message 63921 - Posted: 11 Sep 2015, 2:11:13 UTC - in response to Message 63915.  

I have run 4 Nbody 1.52 tasks so far. Here are my results:

1 - ps_nbody_9_09_15_orphan_sim_0_1437561602_553_1 - killed after running 9:44:39
    Task 1251716235, WU 921752030

2 - ps_nbody_9_09_15_orphan_sim_0_1437561602_13760_2 - completed
    Task 1251925803, WU 921905293

3 - de_nbody_9_09_15_orphan_sim_0_1437561602_12617_0 - completed
    Task 1251715785, WU 921889142

4 - de_nbody_9_09_15_orphan_sim_0_1437561602_10185_1 - computational error
    CPU time 00:43:52, elapsed time 00:10:52
    Task 1251715808, WU 921858448
ID: 63921 · Rating: 0

kararom
Joined: 9 Jan 09
Posts: 19
Credit: 1,009,149,134
RAC: 0
Message 63925 - Posted: 12 Sep 2015, 4:51:35 UTC

What should app_config.xml look like for crunching this application?
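
For reference, a minimal sketch of how such a file could be generated. The element names (app, max_concurrent, app_version, plan_class, avg_ncpus, cmdline) are standard BOINC app_config.xml elements, but the app name "milkyway_nbody", the plan class "mt", the thread count, and the --nthreads flag are assumptions not confirmed anywhere in this thread; check them against your own client's task list before using the output. The file would normally go in the project's folder inside the BOINC data directory.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, threads: int) -> str:
    """Return the text of an app_config.xml for one multi-threaded app."""
    root = ET.Element("app_config")

    # <app>: run at most one task of this application at a time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = "1"

    # <app_version>: how many CPUs the client should budget per task.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    # Assumed command-line flag; verify it against the application itself.
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads}"

    # Pretty-print when the running Python supports it (3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# All three arguments are illustrative assumptions, not values from this thread.
config_text = build_app_config("milkyway_nbody", "mt", 4)
print(config_text)
```

After saving the file, the client has to re-read its config files (or be restarted) before the limits take effect.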
ID: 63925 · Rating: 0

Robert Kapernick
Joined: 21 Dec 09
Posts: 8
Credit: 30,540,368
RAC: 0
Message 63927 - Posted: 12 Sep 2015, 7:13:13 UTC - in response to Message 63913.  

Sidd,

FYI, I ran 7 more: 2 completed and 5 failed because of computational errors (WUs 923563904, 923563905, 923563906, 923399449, 923563898).

Robert
ID: 63927 · Rating: 0

Wisesooth
Joined: 2 Oct 14
Posts: 43
Credit: 54,788,421
RAC: 2,543
Message 63929 - Posted: 13 Sep 2015, 19:22:26 UTC - in response to Message 63914.  

I run Milkyway@home on two machines: a liquid-cooled Intel i7 screamer and a mini-ITX PC with an Intel i5. Most of the time the N-Body tasks finish quickly. However, I found two very long-running tasks, one on each machine.

Task 1252250516, WU 922273871, CPU i7, RT 6719.88, CPU time 40511.71

Task 1253332518, WU 923062623, CPU i5, RT 5528.59, CPU time 20362.66

Both finished and await consensus. Remember, you asked.
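
One way to read those two lines, assuming RT is the elapsed run time in seconds and CPU time is summed over all threads: dividing CPU time by run time gives the average number of cores the task kept busy. A small sketch of that arithmetic (the function name is made up for illustration):

```python
def average_busy_cores(cpu_seconds: float, run_seconds: float) -> float:
    """CPU time summed over threads, divided by wall-clock run time."""
    return cpu_seconds / run_seconds


# (task, CPU time in seconds, run time in seconds), copied from the post above.
reported = [
    ("1252250516 (i7)", 40511.71, 6719.88),
    ("1253332518 (i5)", 20362.66, 5528.59),
]

ratios = {}
for task, cpu_seconds, run_seconds in reported:
    ratios[task] = average_busy_cores(cpu_seconds, run_seconds)
    print(f"Task {task}: about {ratios[task]:.2f} cores busy on average")
```

A ratio well above 1 suggests the task really was computing on several threads for most of its run, which is different from a task that sits at 100% while using almost no CPU.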
ID: 63929 · Rating: 0

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 63931 - Posted: 14 Sep 2015, 16:56:53 UTC - in response to Message 63929.  

Thank you!

I think it is OK if they run for a while, because for those parameters the timestep was probably determined to be a bit small. The real issue is if they hang at 100% for a long time.


Cheers,
Sidd
ID: 63931 · Rating: 0

Captiosus
Joined: 9 Apr 14
Posts: 35
Credit: 9,708,616
RAC: 0
Message 63934 - Posted: 17 Sep 2015, 12:19:10 UTC

It works! Awesome! I just re-enabled it and my CPU is chewing through MT tasks like they're candy.


I would like to ask, though: is there anything that can be done to further optimize CPU usage so there aren't large periods of low (single-thread) CPU utilization?

As it stands right now, on my CPU with the MT tasks, there's about a minute of single-thread activity (which, as I understand it, is the initialization period that cannot be multi-threaded), and once the initialization period is complete there's a quick (30 sec) burst where the task uses all of the designated threads (in my case, 15) and completes.
Is there any way that this could be altered without breaking NBody again?
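
To put a rough number on that, here is a small sketch that treats the task as two phases, using the approximate figures above (about 60 s on 1 thread, then about 30 s on 15 threads) and assuming all 15 threads stay reserved for the task the whole time. The function name is made up for illustration.

```python
def reserved_thread_utilization(phases, reserved_threads):
    """Fraction of reserved thread-seconds that were actually busy.

    phases: list of (seconds, busy_threads) tuples, one per phase.
    """
    busy = sum(seconds * threads for seconds, threads in phases)
    available = sum(seconds for seconds, _ in phases) * reserved_threads
    return busy / available


# Approximate figures from the post: 60 s single-threaded, 30 s on 15 threads.
phases = [(60, 1), (30, 15)]
utilization = reserved_thread_utilization(phases, reserved_threads=15)
print(f"Average utilization of the reserved threads: {utilization:.1%}")
```

With these inputs, 510 of the 1350 reserved thread-seconds are busy, so well under half of the reserved capacity is used, and shortening the compute burst would not change that much because the serial phase dominates.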
ID: 63934 · Rating: 0

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 63936 - Posted: 18 Sep 2015, 0:13:06 UTC - in response to Message 63934.  

Hey,


That would be an awesome thing to get working. We have been tossing ideas around about how to do it. But it is still, unfortunately, a work in progress.

Also, right now we want to focus on making sure the application can actually return results before tinkering with the code again. But, again, we have that on our to-do list!

Cheers,
Sidd
ID: 63936 · Rating: 0

TomB
Joined: 6 Oct 09
Posts: 3
Credit: 2,757,339
RAC: 0
Message 63937 - Posted: 18 Sep 2015, 4:43:12 UTC

I've got one that seems to be causing problems. It has been running for 78 hours, has been at 100% since about the 9-hour mark, and never saved a checkpoint. The original estimate was for about 1.5 hours. BOINC indicates it is using 8 CPUs, but Activity Monitor shows it is only using 1.3% CPU with 2 threads. I've seen some tasks with the previous version of N-body work like this, so I'll probably watch it for a little while longer to see if it finishes.

I'm running a Mac Mini with OS X 10.10.5.

The work unit is de_nbody_9_09_15_orphan_sim_0_1437561602_56033_0
ID: 63937 · Rating: 0

Captiosus
Joined: 9 Apr 14
Posts: 35
Credit: 9,708,616
RAC: 0
Message 63938 - Posted: 18 Sep 2015, 9:25:12 UTC - in response to Message 63936.  

Sidd wrote:
> Hey,
>
> That would be an awesome thing to get working. We have been tossing ideas around about how to do it. But it is still, unfortunately, a work in progress.
>
> Also, right now we want to focus on making sure the application can actually return results before tinkering with the code again. But, again, we have that on our to-do list!
>
> Cheers,
> Sidd

Oh goodie. Any ideas on doing that that seem viable?
ID: 63938 · Rating: 0

[AF>Amis des Lapins] Phil1966
Joined: 25 Dec 11
Posts: 20
Credit: 119,498,787
RAC: 0
Message 63943 - Posted: 20 Sep 2015, 13:16:52 UTC - in response to Message 63937.  
Last modified: 20 Sep 2015, 13:53:01 UTC

TomB wrote:
> I've got one that seems to be causing problems. It has been running for 78 hours, has been at 100% since about the 9-hour mark, and never saved a checkpoint. The original estimate was for about 1.5 hours. BOINC indicates it is using 8 CPUs, but Activity Monitor shows it is only using 1.3% CPU with 2 threads. I've seen some tasks with the previous version of N-body work like this, so I'll probably watch it for a little while longer to see if it finishes.
>
> I'm running a Mac Mini with OS X 10.10.5.
>
> The work unit is de_nbody_9_09_15_orphan_sim_0_1437561602_56033_0



Hello!

Same for me:

This app is supposed to use X cores, but it seems that, regardless of the CPU type, it uses only up to 25% of the CPU during 50% of the WU's length, and then up to 100% for the remaining 50% of the time.

Concerning "never-ending WUs" (same problem with GPU WUs), why don't you add a WU time limit, as Bitcoin Utopia and others do?

NB: In general, MW WUs could easily be 10x to 100x bigger :)

Thank You

Best

Phil1966
ID: 63943 · Rating: 0

Captiosus
Joined: 9 Apr 14
Posts: 35
Credit: 9,708,616
RAC: 0
Message 63948 - Posted: 22 Sep 2015, 11:38:04 UTC - in response to Message 63943.  

Phil1966 wrote:
> TomB wrote:
> > I've got one that seems to be causing problems. It has been running for 78 hours, has been at 100% since about the 9-hour mark, and never saved a checkpoint. The original estimate was for about 1.5 hours. BOINC indicates it is using 8 CPUs, but Activity Monitor shows it is only using 1.3% CPU with 2 threads. I've seen some tasks with the previous version of N-body work like this, so I'll probably watch it for a little while longer to see if it finishes.
> >
> > I'm running a Mac Mini with OS X 10.10.5.
> >
> > The work unit is de_nbody_9_09_15_orphan_sim_0_1437561602_56033_0
>
> This app is supposed to use X cores, but it seems that, regardless of the CPU type, it uses only up to 25% of the CPU during 50% of the WU's length, and then up to 100% for the remaining 50% of the time.


I think the long period of low CPU utilization (1-2 cores at most) is the initialization period, the setting up of the work so that computation can actually proceed. The problem is that it is one of those serial tasks that can't easily be split up into multiple threads for processing. I get the same thing as well: a long batch of single-threaded work (3-5 min) running on a single core, and then a short burst using all of the set cores to do the actual compute.

What I was thinking about suggesting was splitting the initialization period and the compute period into 2 distinct tasks. Uninitialized work is sent out in batches and gets prepped for computing in groups (with my Xeon it'd be 15 tasks at once getting initialized). Once initialized, the workunit is passed through an SHA hash function, and the resulting hash is sent to the MW@H servers for comparison. If the results from (n) number of clients match, the workunits on those computers are flagged as ready for compute and will be processed at the next task switch. Once the MT tasks are complete, they are sent in for standard end-of-work processing and credit is awarded.

Alternatively, the initialized work itself (not just the hash) is sent to the MW@H servers for comparison to ensure it was initialized properly. A small block of credit is awarded if it was, and then the initialized work is sent back out to begin the actual compute process like any other work unit. Once complete, it's sent back in and normal end-of-work processing is done.

A third alternative would be to have the MW@H program take uninitialized workunits, initialize them in batches, checkpoint them at the end of the initialization period, then once a number are ready, switch to MT mode and rip through them before sending the work in.


Now, I am aware this would increase overhead for the project by a not insignificant amount, but in the end the whole idea would be to minimize idle time on the clients (the major issue at the moment) so more work can be done. Any ideas to improve it would be nice.
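
The first alternative above (hash the initialized state and compare hashes across clients) can be sketched roughly as follows. This is only an illustration of the idea, not how the project validates anything: SHA-256 is picked as one member of the SHA family, the serialized state is assumed to be available as raw bytes, and the function names and quorum value are hypothetical.

```python
import hashlib
from collections import Counter


def state_digest(state_bytes: bytes) -> str:
    """Client side: SHA-256 digest of the serialized, initialized state."""
    return hashlib.sha256(state_bytes).hexdigest()


def quorum_digest(digests, quorum):
    """Server side: return the digest reported by at least `quorum`
    clients, or None if no digest has reached the quorum yet."""
    if not digests:
        return None
    digest, count = Counter(digests).most_common(1)[0]
    return digest if count >= quorum else None


# Three hypothetical clients report; two agree, one differs.
reports = [
    state_digest(b"initialized bodies, seed 553"),
    state_digest(b"initialized bodies, seed 553"),
    state_digest(b"something went wrong"),
]
agreed = quorum_digest(reports, quorum=2)
print("ready for compute" if agreed == reports[0] else "still waiting")
```

One caveat with any byte-exact comparison: if the initialization involves floating-point math, results can differ slightly between platforms or compilers, in which case identical hashes would not be guaranteed even for correct work.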
ID: 63948 · Rating: 0

[AF>Amis des Lapins] Phil1966
Joined: 25 Dec 11
Posts: 20
Credit: 119,498,787
RAC: 0
Message 63952 - Posted: 26 Sep 2015, 20:07:29 UTC - in response to Message 63931.  
Last modified: 26 Sep 2015, 20:08:47 UTC

Dear Sidd,

Any idea why this application's WUs stay at 0% for 1 minute before they start computing?

NB: No progress and almost 0% CPU use for 1 minute, at least on my computers (IDs 551092 and 612238).

Thank You

Best Regards
ID: 63952 · Rating: 0

Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 539,266,369
RAC: 95,384
Message 63953 - Posted: 26 Sep 2015, 20:20:56 UTC - in response to Message 63952.  

That is an effect of running the newest BOINC Manager, 7.6.9. The changes to how percentages are displayed were made for some other troublesome projects.

BOINC Version History 7.6.9
ID: 63953 · Rating: 0

Wisesooth
Joined: 2 Oct 14
Posts: 43
Credit: 54,788,421
RAC: 2,543
Message 63988 - Posted: 11 Oct 2015, 16:53:52 UTC

Just upgraded my liquid-cooled Intel i7 to Windows 10. I noticed two errors on my stats within an hour or two, even though I suspended the tasks without leaving their state in cache before I started. The new Windows 10 OS called shutdown() for some strange reason (probably an update requiring a restart). It did not bring BOINC down gracefully, and the checkpoint did not recover the task successfully after I resumed activity. Thought you might like to know.
ID: 63988 · Rating: 0

Robert Kapernick
Joined: 21 Dec 09
Posts: 8
Credit: 30,540,368
RAC: 0
Message 64018 - Posted: 20 Oct 2015, 17:40:42 UTC - in response to Message 63913.  

I have found out that if a task runs long (more than a reasonable time), all I have to do is suspend it and then restart BOINC Manager, and it will reset itself and run to completion. I have not had to abort any since I started doing this.

Robert
ID: 64018 · Rating: 0

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 30
Credit: 84,549,863
RAC: 0
Message 64064 - Posted: 6 Nov 2015, 7:21:02 UTC

Hi

Nbody is really not working well on my iMac: it behaves like an MT app (i.e., it stops all other apps running on my 8-core i7), but it only takes 1/8th of the CPU power.

(99% means "of one core", not all 8 cores; look at the CPU load at the bottom.)

Besides, sometimes it doesn't stop running after 100% completion and just continues forever using 0% CPU, blocking BOINC, so I have to cancel it (if I notice it; I don't spend all day in front of my computer...).

Actually, the current one is not at 100% yet but stopped using the CPU a while ago (and it has already lasted much longer than the 9 min of the previous one, which ran well).

Or it will crash by itself (there were others).

I'm removing Nbody from my preferences for the moment; I hope this will be solved soon.

Thanks for your help.
ID: 64064 · Rating: 0

©2024 Astroinformatics Group