Welcome to MilkyWay@home

updated the nbody applications again

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 42109 - Posted: 14 Sep 2010, 8:09:24 UTC

Now at v0.06. Let us know how they're running here.

(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 42115 - Posted: 14 Sep 2010, 13:54:00 UTC
Last modified: 14 Sep 2010, 13:54:39 UTC

I received no workunits for nbody 0.06, only for milkyway 0.19. However, I prefer to leave the latter to the ATI guys. *grin* So I set up an app_info.xml again and downloaded nbody 0.06 manually. This is the app_info part only for nbody 0.06 CPU tasks:

<app_info>
 <app>
  <name>milkyway_nbody</name>
 </app>
 <file_info>
  <name>milkyway_nbody_0.06_windows_x86_64__sse2.exe</name>
  <executable/>
 </file_info>
 <app_version>
  <app_name>milkyway_nbody</app_name>
  <version_num>6</version_num>
  <file_ref>
   <file_name>milkyway_nbody_0.06_windows_x86_64__sse2.exe</file_name>
   <main_program/>
  </file_ref>
 </app_version>
</app_info>


This is for Windows 64bit; for 32bit, remove the _64 part from the download link and from the file references in app_info. Btw, I think it's a good idea to include the sse2 requirement now; it should save people with old computers some frustration.
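
For anyone who would rather generate the file than edit it by hand, here is a minimal Python sketch that writes the same app_info structure for either platform. The helper name `build_app_info` and its parameters are hypothetical, and the 32bit file name is only derived from the "remove the _64 part" rule above, so compare it with the actual download before relying on it.

```python
import xml.etree.ElementTree as ET


def build_app_info(bits=64, version="0.06", version_num=6):
    """Return app_info.xml text for the nbody CPU app (hypothetical helper)."""
    # Assumption: the 32bit file name is the 64bit name without "_64".
    arch = "x86_64" if bits == 64 else "x86"
    exe = f"milkyway_nbody_{version}_windows_{arch}__sse2.exe"

    root = ET.Element("app_info")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "milkyway_nbody"

    file_info = ET.SubElement(root, "file_info")
    ET.SubElement(file_info, "name").text = exe
    ET.SubElement(file_info, "executable")

    app_version = ET.SubElement(root, "app_version")
    ET.SubElement(app_version, "app_name").text = "milkyway_nbody"
    ET.SubElement(app_version, "version_num").text = str(version_num)
    file_ref = ET.SubElement(app_version, "file_ref")
    ET.SubElement(file_ref, "file_name").text = exe
    ET.SubElement(file_ref, "main_program")

    # ET.indent only exists on Python 3.9+; older versions emit one long line.
    if hasattr(ET, "indent"):
        ET.indent(root, space=" ")
    # ElementTree writes "<tag />"; match the "<tag/>" style of the file above.
    return ET.tostring(root, encoding="unicode").replace(" />", "/>")


if __name__ == "__main__":
    print(build_app_info(bits=32))
```

Redirect the output to app_info.xml in the project directory. The version number stays a plain parameter so that it can be set to whatever the downloaded build actually reports.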

(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 42116 - Posted: 14 Sep 2010, 14:23:32 UTC
Last modified: 14 Sep 2010, 14:37:46 UTC

Another issue: Is there (or was there) some glitch in the database concerning the nbody application name? For example, see here: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=150916643 Above the task list it says "This is displayed on the workunit pageDatabase Error", and the cell for the application name shows "v0.00", which is also repeated in the task itself. I've seen a number of instances. Maybe this is already resolved, because here is a workunit where a task with the Linux app 0.06 has finished and the app version is shown correctly.

EDIT: Checkpointing seems to work now. I shut down BOINC on purpose and work was resumed at the last checkpoint. This is Win 7 64bit. Great.

Checkpoint: tnow = 2.01929. time since last = 361.459s
Checkpoint exists. Attempting to resume from it.
Thawing state
Successfully read checkpoint
Checkpoint: tnow = 2.46124. time since last = 972626s


Regards
Alex
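
The checkpoint lines in the log above are regular enough to check with a script. Below is a minimal sketch that pulls out the `tnow` and "time since last" values and flags unusually long intervals, such as the 972626s reported right after the resume. The one-hour cutoff and the function names are assumptions made for illustration, not anything the application defines.

```python
import re

# Matches lines such as "Checkpoint: tnow = 2.01929. time since last = 361.459s"
CHECKPOINT_RE = re.compile(
    r"Checkpoint: tnow = (\S+)\. time since last = (\S+)s\s*$"
)


def parse_checkpoints(stderr_text):
    """Return a list of (tnow, seconds_since_last) tuples found in the text."""
    found = []
    for line in stderr_text.splitlines():
        match = CHECKPOINT_RE.search(line)
        if match:
            found.append((float(match.group(1)), float(match.group(2))))
    return found


def suspicious_intervals(checkpoints, limit_s=3600.0):
    """Return checkpoints whose interval exceeds limit_s (an assumed cutoff)."""
    return [cp for cp in checkpoints if cp[1] > limit_s]


LOG = """\
Checkpoint: tnow = 2.01929. time since last = 361.459s
Checkpoint exists. Attempting to resume from it.
Thawing state
Successfully read checkpoint
Checkpoint: tnow = 2.46124. time since last = 972626s
"""

if __name__ == "__main__":
    checkpoints = parse_checkpoints(LOG)
    print(checkpoints)
    print(suspicious_intervals(checkpoints))
```

A flagged interval only says that the reported number is large. It does not show whether the timer or the run itself is at fault.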

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 42120 - Posted: 14 Sep 2010, 18:45:40 UTC

I have 3 crunched WUs, none of them validated. One example:
Checked, but no consensus yet
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=195759632

Regards

Alexander

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42121 - Posted: 14 Sep 2010, 18:50:47 UTC - in response to Message 42120.  

That doesn't look like the new release. That is an old one, with the checkpoint restart problem on Windows.

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 42122 - Posted: 14 Sep 2010, 18:59:26 UTC


(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 42123 - Posted: 14 Sep 2010, 19:19:28 UTC - in response to Message 42122.  

Hmm.. from my result list it seems that v0.06 will not validate against v0.04. Anyone else seeing this? My only valid v0.06 result (Win 64bit) so far was against Linux v0.06. On the other hand, I've also seen some v0.04 results which did not validate against another v0.04. Guess we could use some more data here *g*.

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 42124 - Posted: 14 Sep 2010, 19:23:48 UTC
Last modified: 14 Sep 2010, 19:38:47 UTC

I've checked another result:
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=196316955

It contains the message
Number of bins does not match those in histogram file. Expected 34, got 0
Failed to calculate chisq
<search_likelihood>1.#QNAN</search_likelihood>
<search_application>milkywayathome nbody 0.06 Windows x86 double</search_application>
21:02:22 (5220): called boinc_finish

HTH

Alexander


Hi Matt, I still have ~12 wu's in cache. Does it make sense to crunch them or can they be killed?
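
A result like the one quoted above can be spotted automatically by reading the `<search_likelihood>` tag from the stderr text. The sketch below is only an illustration with hypothetical function names. It treats anything Python's `float()` cannot parse, such as "1.#QNAN", as not-a-number.

```python
import math
import re

LIKELIHOOD_RE = re.compile(r"<search_likelihood>(.*?)</search_likelihood>")


def read_likelihood(stderr_text):
    """Return the likelihood as a float, NaN if unparsable, None if absent."""
    match = LIKELIHOOD_RE.search(stderr_text)
    if match is None:
        return None
    raw = match.group(1).strip()
    try:
        return float(raw)
    except ValueError:
        # Strings such as "1.#QNAN" are rejected by float().
        return math.nan


def looks_valid(stderr_text):
    """True only if a finite likelihood value was reported."""
    value = read_likelihood(stderr_text)
    return value is not None and math.isfinite(value)


if __name__ == "__main__":
    sample = "<search_likelihood>1.#QNAN</search_likelihood>"
    print(read_likelihood(sample), looks_valid(sample))
```

This checks only the reported number. Whether the project validator would accept the result is a separate question.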

(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 42125 - Posted: 14 Sep 2010, 19:31:25 UTC - in response to Message 42124.  
Last modified: 14 Sep 2010, 19:34:05 UTC


Failed to calculate chisq


Yeah, seen this, too. On my only valid v0.06 result here, it was included in the output of both results. So the big question is whether this is really a valid result for the project.

Btw, Guten Abend, Alexander! *g*

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 42126 - Posted: 14 Sep 2010, 19:36:04 UTC - in response to Message 42125.  



Btw, Guten Abend, Alexander! *g*


Hi, nice to meet you again!
You're still working too?

Alexander

(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 42127 - Posted: 14 Sep 2010, 19:42:53 UTC - in response to Message 42126.  

Depends on how you define 'working'. I'm not getting paid for what I do right now. But that's fine with me. :) Guess we shouldn't chat here too much. *g*

The result I linked to below was just purged from the database...

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42129 - Posted: 14 Sep 2010, 20:44:21 UTC

Turns out I had a stupid build system issue: the histogram file wasn't being resolved with BOINC, so it got opened as an empty file. I'll start making another release now.

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 42130 - Posted: 14 Sep 2010, 21:39:28 UTC
Last modified: 14 Sep 2010, 21:43:33 UTC

Can confirm the working checkpoint.
I see the histogram issue too.
7 WUs with runtimes between 1000 and 3000 seconds seem to claim roughly 5 credits per 1000 seconds. None validated, so none granted yet. I wonder what multiplier will be used when they are validated.
1 WU (150658380) was at 2 hrs and 20%; I canceled it as it seemed so far out of line with the other runtimes. Still, the checkpointing looked OK and showed slow progress of the WU.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 42133 - Posted: 14 Sep 2010, 22:26:08 UTC - in response to Message 42130.  

I updated to Matt's new release. Hopefully this is working better.

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 42144 - Posted: 15 Sep 2010, 0:16:02 UTC

Might still be a small problem: there is no % progression, and I see this in the stderr output.

<core_client_version>6.11.7</core_client_version>
<![CDATA[
<stderr_txt>
shmget in attach_shmem: Invalid argument
16:10:20 (83546): Can't set up shared mem: -1. Will run in standalone mode.
Starting fresh nbody run
Starting nbody system
<plummer_r> -4.0267114691334 14.424159068236 6.8061497609999 </plummer_r>
<plummer_v> 199.37230954409 111.54102111951 -177.06111744164 </plummer_v>
Checkpoint: tnow = 0.905146. time since last = 1.28451e+09s
Checkpoint: tnow = 1.89042. time since last = 304.232s
Checkpoint: tnow = 3.16037. time since last = 303.48s
Making final checkpoint
Simulation complete
<search_likelihood>-88.370444765717294899</search_likelihood>
<search_application>milkywayathome nbody 0.07 Darwin x86_64 double</search_application>
16:27:25 (83546): called boinc_finish

</stderr_txt>


http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=196437612

I have several model 1_1 units that are over an hour and still at 0%, but I will let them run just to see if they will finish or error out.

(retired account)
Joined: 17 Oct 08
Posts: 36
Credit: 411,744
RAC: 0
Message 42145 - Posted: 15 Sep 2010, 0:25:05 UTC - in response to Message 42144.  


shmget in attach_shmem: Invalid argument
16:10:20 (83546): Can't set up shared mem: -1. Will run in standalone mode.



I've seen the same error on this wingman of mine, also a Mac running Darwin x86_64: http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=196251224

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42146 - Posted: 15 Sep 2010, 0:29:02 UTC - in response to Message 42144.  

Yeah, I'm seeing that when I run it on OS X. From your log it worked (except for the "will run in standalone mode" part, which is probably the problem). It didn't happen in the old releases. It actually works; the only issue seems to be that the progress bars don't show up in the manager. It is actually progressing and working when I manually inspect the checkpoints. I have no idea what I could have done to stop the progress bars from working, but I'll look into it.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42147 - Posted: 15 Sep 2010, 0:29:41 UTC - in response to Message 42146.  

They also appear to work on Linux and Windows.

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 42149 - Posted: 15 Sep 2010, 0:34:12 UTC

Ok, 2 WU's with version 0.07 done and validated.
The histogram message is gone now.


That's on a Phenom II @ 2.8GHz:
2,996.59s claimed 12.78 granted 12.77 -> 15.34 cr/h
2,649.30s claimed 11.60 granted 11.80 -> 16.03 cr/h

Time to define a reasonable multiplier
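
For comparing runs, the hourly rate is just credit times 3600 divided by the runtime in seconds. A minimal sketch using the two results listed above (the function name is made up for illustration):

```python
def credits_per_hour(credit, runtime_s):
    """Convert a credit amount and a runtime in seconds to credits per hour."""
    if runtime_s <= 0:
        raise ValueError("runtime must be positive")
    return credit * 3600.0 / runtime_s


# (runtime in seconds, claimed credit, granted credit) as reported above
RESULTS = [
    (2996.59, 12.78, 12.77),
    (2649.30, 11.60, 11.80),
]

if __name__ == "__main__":
    for runtime_s, claimed, granted in RESULTS:
        print(
            f"{runtime_s:9.2f}s"
            f"  claimed {credits_per_hour(claimed, runtime_s):.2f} cr/h"
            f"  granted {credits_per_hour(granted, runtime_s):.2f} cr/h"
        )
```

Dividing a target hourly rate by the measured one gives the multiplier that would be needed. What the target should be is a project decision and is not assumed here.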

Henry Bundy
Joined: 2 Mar 10
Posts: 5
Credit: 105,634,798
RAC: 0
Message 42167 - Posted: 15 Sep 2010, 16:23:33 UTC

This version is gobbling my processor 100%, ignoring the restrictions I have put in place. I'm not doing any more processing until you get this fixed.

©2024 Astroinformatics Group