Nbody 1.06
Author | Message |
---|---|
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
Instead of resetting the project you could copy this pthreadGC2_64_nbody_1.06.dll into your mw directory and rename it to pthreadGC2_64.dll after you have deleted the old one. Remember to stop mw while doing this. |
Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
Maybe there needs to be a front page announcement, or whatever method there is to post a Notice in everyone's BOINC Manager, as there will be users who do not read these boards. Others may run "headless" and simply "set and forget", only checking their machines occasionally, and be oblivious to the problem. The wingman I cited below has now munched his/her way through 2000 WUs and counting. |
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I just posted to the client and front page. Jeff |
Joined: 27 Jan 13 Posts: 1 Credit: 794,635 RAC: 0 |
This update for the n-body dll is interesting. I downloaded the new file and placed it in the project directory on my Windows 7 computer. Specifically, in F:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway\ I found pthreadGC2_64_nbody_1.04.dll but no pthreadGC2_64.dll, so when I renamed pthreadGC2_64_nbody_1.06.dll to pthreadGC2_64.dll, there was no file being replaced. |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
"Instead of resetting the project you could copy this" There's a subtle technical reason why I think this is perhaps unwise advice: it may cause unexpected and hard-to-diagnose problems in the future. It has to do with the Windows DLL search order described in MSDN, "Dynamic-Link Library Search Order (Windows)". Look at the 'SafeDllSearchMode' list (enabled by default starting with Windows XP with Service Pack 2) under 'Standard Search Order for Desktop Applications'. NBody (all versions) requires access to a file called "pthreadGC2_64.dll". Milkyway distributes a 'versioned' copy of this file with the name (currently) "pthreadGC2_64_nbody_1.06.dll": this name will change over time, and the contents of the file may or may not change at the same time. The way BOINC copes with this is with `<file_ref> <file_name>pthreadGC2_64_nbody_1.06.dll</file_name> <open_name>pthreadGC2_64.dll</open_name> <copy_file/> </file_ref>`. That places a copy of the file, under the name pthreadGC2_64.dll, in the 'slot' (working) directory which BOINC provides as a scratchpad for input, output, checkpoint, and sundry similar files while the task is running. But look back at that DLL search order: "1. The directory from which the application loaded." In BOINC terms, #1 is the project directory, and the slot directory is #5 (the current directory). This means that if a file called pthreadGC2_64.dll is present in the project directory, no matter how old it is, it will always be loaded by preference, even if a newer version has been distributed as pthreadGC2_64_nbody_x.yz.dll and copied to the slot directory. Note that this behaviour is surprising and unexpected for many programmers (including scientific programmers) whose primary OS is *nix based - I gather they do things differently there... |
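The search-order point in the post above can be illustrated with a minimal Python sketch. It does not call the Windows loader; it only models the two directories discussed (project directory searched first, slot directory searched later) using temporary folders. The names `resolve_dll` and `demo` and the placeholder file contents are hypothetical, chosen purely for illustration.

```python
import tempfile
from pathlib import Path

DLL_NAME = "pthreadGC2_64.dll"


def resolve_dll(name, search_dirs):
    """Return the first existing file called `name`, walking search_dirs in order."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def demo():
    """Show which copy wins before and after the stale project-directory copy is removed."""
    with tempfile.TemporaryDirectory() as root:
        project = Path(root) / "project"  # stands in for search position #1
        slot = Path(root) / "slot"        # stands in for search position #5
        project.mkdir()
        slot.mkdir()

        # A manually renamed copy left behind in the project directory (assumption).
        stale = project / DLL_NAME
        stale.write_text("stale manual copy")
        # The copy BOINC places in the slot directory via <copy_file/>.
        (slot / DLL_NAME).write_text("fresh copy placed by BOINC")

        search_order = [project, slot]
        before = resolve_dll(DLL_NAME, search_order).read_text()

        stale.unlink()  # delete the manual copy from the project directory
        after = resolve_dll(DLL_NAME, search_order).read_text()
        return before, after


if __name__ == "__main__":
    print(demo())
```

In this model the stale copy shadows the fresh one until it is deleted, which is the failure mode the post warns about.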
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
P.S. One of the scientific Linux programmers who was surprised when we uncovered this at SETI was David Anderson. It's been put in the BOINC documentation now: Dynamic library naming issues |
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
"Milkyway distributes a 'versioned' copy of this file with the name (currently) 'pthreadGC2_64_nbody_1.06.dll'" AFAIR, the last time I did the automatic download for nbody, I think it did the renaming while downloading. It was so long ago that my memory could be wrong. You are right that the file nowadays comes down to the client with the version number. Personally I prefer using an app_info, so I can set the command line params I want. It should not be too hard to see the proper name from the faulty dll; the correct filename is there, only with the wrong binary inside. So it's no big science to replace it with a fresh copy from the server. ;) I see Jeffery posted the proper steps in his announcement, Message 57128. It makes clear that the versioned filename is to be used now. BOINC's soft link 'feature' and its side effects (search path etc.) ... I really don't want to comment on that. |
Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
I tried the bug-fix, still 6 not validated wu's, and a lot of computation errors from the wingmen. |
Joined: 29 Nov 10 Posts: 7 Credit: 17,351,897 RAC: 0 |
I must have had one of these work units, but to be truthful I didn't look when I aborted it. I left it running as always; when I next looked it said it was High Priority. No problem there, but 8 hours had passed, whereas a usual unit takes 2-3 hours on my laptop. Still no problem, but what made me abort was that it was showing 16,500-odd hours to completion and rising by a few minutes each second! I'm used to long hours, as my Climate Prediction work units take 600-700 hours to complete, but 16,500 hours is a lifetime! |
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
"I tried the bug-fix, still 6 not validated wu's, and a lot of computation errors from the wingmen." Seems it's working for you. Your invalid list has only tasks that 'can't validate' because of wingmen who still have faulty dlls. I don't know if there is a server option to force a reload of the app with its dlls without pushing out a v1.07. |
Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
Bet you were hoping all the very long ones had been weeded out but I've just aborted this one after spotting it at 1.1% after 14hrs with estimate having increased from an initial 135hrs to 165hrs and climbing when I pulled the plug on it. |
Joined: 28 Aug 11 Posts: 7 Credit: 29,852,657 RAC: 0 |
To abort or not to abort, that is the question. Whether it is nobler in the mind...? I have a work unit that has: Remaining (estimated) completion: 11455:02:48; Deadline: 14 Feb 2013 at 09:47:01 GMT; Elapsed: 10:16:30; Progress: 2.226%. Using the Remaining (estimated) time, the unit will complete in approximately 477 days from now. Using the Elapsed/Progress figures, which are a reasonable yardstick, the unit will complete in approximately 19 days' time. There are 11 days between now and 14 Feb 2013, including today. While I can accept that some units will be longer or much longer than others, this unit is a little strange. Is there some explanation for the rather contradictory time values? Thanks, James. PS: a) Task: http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=393848038 b) Computer: http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=493180 |
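The two figures in the post above can be reproduced with a short Python sketch. It assumes progress is linear in elapsed time, which is the poster's own yardstick and not necessarily how the BOINC client computes its estimate; the helper names `hms_to_hours` and `remaining_from_progress` are hypothetical.

```python
def hms_to_hours(text):
    """Convert an 'H:MM:SS' string, as shown in BOINC Manager, to hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def remaining_from_progress(elapsed_hours, progress_percent):
    """Remaining hours if progress continues at the average rate so far (assumption)."""
    total_hours = elapsed_hours / (progress_percent / 100)
    return total_hours - elapsed_hours


# Values quoted in the post.
elapsed = hms_to_hours("10:16:30")
client_estimate_days = hms_to_hours("11455:02:48") / 24
linear_estimate_days = remaining_from_progress(elapsed, 2.226) / 24

if __name__ == "__main__":
    print(round(client_estimate_days), round(linear_estimate_days))  # prints: 477 19
```

Either way the task would overrun the 11 days left before the deadline; the two estimates differ only in by how much.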
Joined: 8 Apr 10 Posts: 25 Credit: 268,525 RAC: 0 |
My 2 cents: The new 1.06 WUs are better behaved than the previous release in regard to estimated time but still need some refining ;-) I'm not sure how they can produce a good number for remaining time since the nbody WUs are multi-threaded, i.e. one can run on one or all of your CPUs at the same time, in spite of what your BOINC params say. My current nbody is running on 4 CPUs even though I have BOINC set to 12.5% of my ("8") CPUs. Its original estimated time was 59 hrs, and it has run for 64 hrs with 36.75 hrs remaining ... I'm sure the "boys back at the office" are working on this. Ed F |
Joined: 29 Sep 07 Posts: 18 Credit: 4,533,464 RAC: 0 |
I also got a resend of an old result with an error. Previously I could finish newer results without a problem. http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=330097 Here are only the MilkyWay@Home N-Body Simulation results: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=330097&offset=0&show_names=1&state=0&appid=7 This is a result with a long running time (it shows only around 0.4% after a runtime of 1300 sec.): http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=304430097 The two previous results have crashed. Before getting these results I had done a reset on the project. Edit: aborting the result causes a crash: "Unhandled Exception Detected... - Unhandled Exception Record - Reason: Breakpoint Encountered (0x80000003) at address 0x000007FEFD743C72 Engaging BOINC Windows Runtime Debugger..." Matthias |
©2024 Astroinformatics Group