Welcome to MilkyWay@home

Nbody 1.06

Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 57126 - Posted: 1 Feb 2013, 12:04:58 UTC

Instead of resetting the project you could copy this
pthreadGC2_64_nbody_1.06.dll into your mw directory and rename it to pthreadGC2_64.dll after you deleted the old one.
Remember to stop mw while doing this.

Ray Murray
Joined: 8 Oct 07
Posts: 24
Credit: 111,325
RAC: 0
Message 57127 - Posted: 1 Feb 2013, 13:24:43 UTC

Maybe there needs to be a front-page announcement, or whatever method there is to post a Notice in everyone's BOINC Manager, as there will be users who do not read these boards. Others may run "headless" and simply "set and forget", only checking their machines occasionally, and be oblivious to the problem. The wingman I cited below has now munched his/her way through 2000 WUs and counting.

Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 57129 - Posted: 1 Feb 2013, 13:37:53 UTC

I just posted to the client and front page.


Jeff

I7 James
Joined: 27 Jan 13
Posts: 1
Credit: 794,635
RAC: 0
Message 57130 - Posted: 1 Feb 2013, 13:59:17 UTC - in response to Message 57073.  
Last modified: 1 Feb 2013, 14:09:20 UTC

This update for the n-body dll is interesting. I downloaded the new file and placed it in the project directory on my Windows 7 computer. In F:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway\ I found pthreadGC2_64_nbody_1.04.dll but no pthreadGC2_64.dll, so when I renamed pthreadGC2_64_nbody_1.06.dll to pthreadGC2_64.dll, no file was replaced.

Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 57133 - Posted: 1 Feb 2013, 15:27:54 UTC - in response to Message 57126.  

"Instead of resetting the project you could copy this
pthreadGC2_64_nbody_1.06.dll into your mw directory and rename it to pthreadGC2_64.dll after you deleted the old one.
Remember to stop mw while doing this."

There's a subtle technical reason why I think this is perhaps unwise advice - it may cause unexpected and hard-to-diagnose problems in the future. It's to do with the Windows DLL search order described in MSDN - Dynamic-Link Library Search Order (Windows).

Look at the 'SafeDllSearchMode' list (enabled by default starting with Windows XP with Service Pack 2) under 'Standard Search Order for Desktop Applications'.

NBody (all versions) requires access to a file called "pthreadGC2_64.dll". Milkyway distributes a 'versioned' copy of this file with the name (currently) "pthreadGC2_64_nbody_1.06.dll": this name will change over time, and the contents of the file may or may not change at the same time.

The way BOINC copes with this is with

    <file_ref>
        <file_name>pthreadGC2_64_nbody_1.06.dll</file_name>
        <open_name>pthreadGC2_64.dll</open_name>
        <copy_file/>
    </file_ref>

That places a copy of the file in the 'slot' (working) directory which BOINC provides as a scratchpad for input, output, checkpoint, and sundry similar files while the task is running.
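
As a rough illustration of what the copy step amounts to, here is a minimal Python sketch. It is not BOINC's own code: the helper name stage_file and the temporary directory layout are made up for the example, and only the two file names are taken from the <file_ref> block above.

```python
import shutil
import tempfile
from pathlib import Path


def stage_file(project_dir: Path, slot_dir: Path, file_name: str, open_name: str) -> Path:
    """Copy project_dir/file_name to slot_dir/open_name (hypothetical helper)."""
    slot_dir.mkdir(parents=True, exist_ok=True)
    target = slot_dir / open_name
    shutil.copyfile(project_dir / file_name, target)
    return target


def demo() -> Path:
    # Throwaway directories standing in for the project and slot directories.
    root = Path(tempfile.mkdtemp())
    project_dir = root / "projects" / "milkyway"
    project_dir.mkdir(parents=True)
    versioned = project_dir / "pthreadGC2_64_nbody_1.06.dll"
    versioned.write_bytes(b"stand-in for the 1.06 binary")
    return stage_file(project_dir, root / "slots" / "0",
                      "pthreadGC2_64_nbody_1.06.dll", "pthreadGC2_64.dll")


if __name__ == "__main__":
    print(demo())
```

The versioned file stays untouched in the project directory; only the slot directory receives a copy under the unversioned name.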

But look back at that DLL search order:

1. The directory from which the application loaded.
2. The system directory. Use the GetSystemDirectory function to get the path of this directory.
3. The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
4. The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
5. The current directory.
6. The directories that are listed in the PATH environment variable.

In BOINC terms, #1 is the project directory and #5 is the slot directory.

This means that if a file called pthreadGC2_64.dll is present in the project directory - no matter how old it is - it will always be loaded by preference, even if a newer version has been distributed as pthreadGC2_64_nbody_x.yz.dll and copied to the slot directory.
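
The effect can be mimicked with a small Python sketch. This is only a simulation of a first-match search over an ordered list of directories, not the Windows loader itself; the function resolve_dll and the three temporary directories are assumptions made for the example, with "system" standing in for items 2 to 4 of the list above.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Tuple


def resolve_dll(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first search_dir/name that exists, or None (first match wins)."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def demo() -> Tuple[str, str]:
    root = Path(tempfile.mkdtemp())
    project_dir = root / "project"  # item 1: directory the application loaded from
    system_dir = root / "system"    # stand-in for items 2 to 4
    slot_dir = root / "slot"        # item 5: the current directory
    for directory in (project_dir, system_dir, slot_dir):
        directory.mkdir()
    order = [project_dir, system_dir, slot_dir]

    # Only the copy staged in the slot directory exists: it is the one found.
    (slot_dir / "pthreadGC2_64.dll").write_text("new copy staged by BOINC")
    before = resolve_dll("pthreadGC2_64.dll", order)

    # A hand-renamed copy in the project directory now shadows the slot copy.
    (project_dir / "pthreadGC2_64.dll").write_text("old hand-renamed copy")
    after = resolve_dll("pthreadGC2_64.dll", order)
    return before.parent.name, after.parent.name


if __name__ == "__main__":
    print(demo())
```

In this simulation the lookup returns the slot copy until a same-named file appears in the project directory, after which the project copy wins regardless of its age.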

Note that this behaviour is surprising and unexpected for many programmers (including scientific programmers) whose primary OS is *nix based - I gather they do things differently there...

Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 57134 - Posted: 1 Feb 2013, 15:34:53 UTC

P.S. One of the scientific Linux programmers who was surprised when we uncovered this at SETI was David Anderson. It's been put in the BOINC documentation now:

Dynamic library naming issues

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 57141 - Posted: 2 Feb 2013, 2:42:02 UTC - in response to Message 57133.  

"Milkyway distributes a 'versioned' copy of this file with the name (currently) 'pthreadGC2_64_nbody_1.06.dll'"

AFAIR, the last time I let nbody download automatically, it did the renaming during the download, but that was so long ago that my memory could be wrong. You are right that nowadays the file comes down to the client with the version number in its name. Personally I prefer using an app_info, so I can set the command line params I want.

It should not be too hard to see the proper name from the faulty dll; the correct filename is there, only with the wrong binary inside. So it's no big science to replace it with a fresh copy from the server. ;)
I see Jeffery posted the proper steps in his announcement, Message 57128.
That makes it clear that the versioned filename is to be used now.

BOINC's soft link 'feature' and its side effects (search path etc.) ... I really don't want to comment on that.

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 57142 - Posted: 2 Feb 2013, 11:24:59 UTC

I tried the bug fix; I still have 6 WUs that did not validate, and a lot of computation errors from the wingmen.

cornishteddyboy
Joined: 29 Nov 10
Posts: 7
Credit: 17,351,897
RAC: 0
Message 57143 - Posted: 2 Feb 2013, 12:52:18 UTC
Last modified: 2 Feb 2013, 12:53:21 UTC

I must have had one of these work units but, to be truthful, I didn't look when I aborted it.

I left it running as always; when I next looked, it said it was High Priority. No problem there, but 8 hours had passed, whilst a usual unit takes 2-3 hours on my laptop.

Still no problem, but what made me abort was that it was showing 16,500-odd hours to completion and rising by a few minutes each second!

I'm used to long hours, as my Climate Prediction work units take 600-700 hours to complete, but 16,500 hours is a lifetime!

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 57145 - Posted: 2 Feb 2013, 14:10:30 UTC - in response to Message 57142.  

"I tried the bug-fix, still 6 not validated wu's, and a lot of computation errors from the wingmen."


Seems it's working for you. Your invalid list has only tasks that 'can't validate' because of wingmen with still-faulty dlls.
I don't know whether there is a server option to force a reload of the app and its dlls without pushing out a v1.07.

Ray Murray
Joined: 8 Oct 07
Posts: 24
Credit: 111,325
RAC: 0
Message 57147 - Posted: 2 Feb 2013, 19:36:53 UTC

Bet you were hoping all the very long ones had been weeded out, but I've just aborted this one after spotting it at 1.1% after 14 hrs, with the estimate having increased from an initial 135 hrs to 165 hrs and still climbing when I pulled the plug on it.

James L. Neill
Joined: 28 Aug 11
Posts: 7
Credit: 29,852,657
RAC: 0
Message 57150 - Posted: 3 Feb 2013, 9:53:54 UTC

To abort or not to abort that is the question? Whether it is nobler in the mind.................?

I have a work unit that has:-

Remaining(estimated) completion : 11455:02:48
Deadline : 14 Feb 2013 at 09:47:01 GMT
Elapsed : 10:16:30
Progress : 2.226%

Using the Remaining (estimated) time, the unit will complete approximately 477 days from now.
Using the Elapsed/Progress figures, which are a reasonable yardstick, the unit will complete in approximately 19 days.
There are 11 days between now and 14 Feb 2013, including today.
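
For anyone who wants to check the arithmetic, here is a minimal Python sketch using the four figures quoted above. The helper names are made up for the example, and the Elapsed/Progress projection assumes progress continues at a constant rate, which may not hold for these work units.

```python
def hms_to_hours(text: str) -> float:
    """Convert an 'H:MM:SS' string (hours may exceed 24) to fractional hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def projected_total_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Linear extrapolation: total run time if progress keeps its current rate."""
    return elapsed_hours / (progress_percent / 100)


elapsed = hms_to_hours("10:16:30")              # Elapsed
client_remaining = hms_to_hours("11455:02:48")  # Remaining (estimated)
total = projected_total_hours(elapsed, 2.226)   # Progress: 2.226%

client_days = client_remaining / 24
projected_total_days = total / 24
projected_remaining_days = (total - elapsed) / 24

if __name__ == "__main__":
    print(f"client estimate: {client_days:.1f} days remaining")
    print(f"elapsed/progress: {projected_total_days:.1f} days in total, "
          f"{projected_remaining_days:.1f} days remaining")
```

Both quoted figures come out as stated: roughly 477 days from the client's estimate and roughly 19 days from Elapsed/Progress, either of which is past the 11-day deadline.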

While I can accept that some units will be longer or much longer than others, this unit is a little strange.
Is there some explanation for the rather contradictory time values?

Thanks

James

PS: a). Task : http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=393848038
b). Computer : http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=493180

EdwardPF
Joined: 8 Apr 10
Posts: 25
Credit: 268,525
RAC: 0
Message 57154 - Posted: 3 Feb 2013, 15:50:55 UTC - in response to Message 57150.  

My 2 cents:

The new 1.06 WUs are better behaved than the previous release with regard to estimated time, but still need some refining ;-)

I'm not sure how they can produce a good number for remaining time, since the nbody WUs are multi-threaded, i.e. one can run on one or all of your CPUs at the same time, in spite of what your BOINC params say.

My current nbody is running on 4 CPUs even though I have BOINC set to 12.5% of my ("8") CPUs. Its original estimated time was 59 hrs; it has run for 64 hrs with 36.75 hrs remaining ... I'm sure the "boys back at the office" are working on this.

Ed F

Matthias Lehmkuhl
Joined: 29 Sep 07
Posts: 18
Credit: 4,533,464
RAC: 0
Message 57157 - Posted: 4 Feb 2013, 12:23:59 UTC
Last modified: 4 Feb 2013, 12:26:12 UTC

I also got a resend of an old result that had errored.
Previously I could finish newer results without a problem.
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=330097

Here are only the MilkyWay@Home N-Body Simulation results:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=330097&offset=0&show_names=1&state=0&appid=7

This is a long-running result (it shows only around 0.4% after a runtime of 1300 sec.):
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=304430097
The two previous results have crashed.

Before getting these results I did a reset of the project.

Edit:
Aborting the result causes a crash:
"Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x000007FEFD743C72

Engaging BOINC Windows Runtime Debugger..."
Matthias

©2024 Astroinformatics Group