Nbody 1.06
Author | Message |
---|---|
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I have released a version of the nbody binaries to address the large work unit sizes and the errors with checkpoint files. Jake will be starting some runs to test the values generated. I am looking into the issue with threads on 64-bit Windows systems. Jeff Thompson |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
I have released a version of the nbody binaries to address the large work unit sizes and the errors with checkpoint files. Jake will be starting some runs to test the values generated. I am looking into the issue with threads on 64-bit Windows systems. Is this app supposed to be 1) Multithreaded? 2) CPU only? You (or your BOINC Administrator) have defined Linux OpenCL plan_classes for "amd_ati" and "nvidia" - again. And no MT plan_classes, for any platform, yet. |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
|
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
Thanks, looking at the signature file now. |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I updated the signature file, and it now verifies against the public key. Can you test via the interface again? I will work with the admin to get the plan classes populated. All the nbody binaries are compiled against OpenMP to be multithreaded. Jeff |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
... can you test via the interface again. Will do, but can you ask Jake to (re-)submit a batch of tasks, please? I think all last night's must have errored out and emptied the pool. |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
It's OK - managed to get some resends, and the app has downloaded properly. But from WU 297041595, it appears that the validator thinks that -inf is different from -1.#INF00000000000 |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I will get Jake to start a new run, and look at that work unit. Thank you. Jeff |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
I will get Jake to start a new run. Have a look at WU 296570890 while you're at it. Four successful returns: two 'inf', two '-1.#INF00000000000' - and all four still inconclusive. |
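The mismatch reported above looks like a text-formatting difference rather than a numerical one: older Microsoft C runtimes print non-finite doubles as `1.#INF`, `-1.#IND` and similar, padded with zeros to the requested precision, while other platforms print `inf` or `-inf`. A minimal Python sketch of a comparison that normalises both spellings before comparing follows. It is not the project's validator; the function names, the regular expression and the tolerance are all assumptions made for illustration.

```python
import math
import re

# Assumed pattern for the older MSVC spellings of non-finite values,
# e.g. "1.#INF", "-1.#INF00000000000", "-1.#IND", "1.#QNAN".
_MSVC_NONFINITE = re.compile(r"^([+-]?)1\.#(INF|IND|QNAN|SNAN)0*$", re.IGNORECASE)


def parse_result_value(text):
    """Parse a result string, accepting both 'inf'-style and MSVC-style spellings."""
    text = text.strip()
    match = _MSVC_NONFINITE.match(text)
    if match:
        sign, kind = match.groups()
        if kind.upper() == "INF":
            return -math.inf if sign == "-" else math.inf
        return math.nan  # IND / QNAN / SNAN are all treated as NaN here
    return float(text)  # handles "inf", "-inf", "nan" and ordinary numbers


def results_agree(a, b, rel_tol=1e-9):
    """Hypothetical comparison: infinities must match in sign, NaN only matches NaN."""
    x, y = parse_result_value(a), parse_result_value(b)
    if math.isnan(x) or math.isnan(y):
        return math.isnan(x) and math.isnan(y)
    if math.isinf(x) or math.isinf(y):
        return x == y
    return math.isclose(x, y, rel_tol=rel_tol)
```

With this sketch, `results_agree("-inf", "-1.#INF00000000000")` is true, while `results_agree("inf", "-1.#INF00000000000")` is false because the signs differ.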
Send message Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
All 5 of my 1.06 tasks failed on startup with -1073741511 (0xffffffffc0000139) Unknown error number. I was going to try a project reset in case the new ones had missed something in the download, but I've got a couple of ordinary 1.00s running, so I won't be able to reset for a couple of hours until they've finished. Win 7 (64) |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I am seeing these being returned, but only on Windows clients. I am checking the files to make sure the support DLLs are coming down with the exe. Thank you |
Send message Joined: 29 Nov 11 Posts: 18 Credit: 815,433 RAC: 0 |
Received 5 n-body tasks and all of them went to "computation error". <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> - exit code -1073741511 (0xc0000139) </message> ]]> This is the stderr output for one of them. It is the same for all 5. |
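The negative exit code and the hexadecimal value in these reports are the same 32-bit number: Windows status codes are unsigned, and -1073741511 is the signed reading of 0xC0000139. A small Python sketch of the conversion follows. The lookup table is only an illustrative subset of well-known NTSTATUS values, and the function name is made up for this example.

```python
# Illustrative subset of well-known NTSTATUS values (not a complete table).
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000139: "STATUS_ENTRYPOINT_NOT_FOUND",
}


def describe_exit_code(code):
    """Reinterpret a signed exit code as an unsigned 32-bit status value."""
    unsigned = code & 0xFFFFFFFF  # keep the low 32 bits, dropping sign extension
    name = NTSTATUS_NAMES.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"


print(describe_exit_code(-1073741511))  # 0xC0000139 (STATUS_ENTRYPOINT_NOT_FOUND)
```

The same masking also reduces the sign-extended 64-bit form, 0xffffffffc0000139, to 0xC0000139.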
Send message Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
Did the reset just now and got 3 more (estimates of 40s - 40hrs), with the same error. The accompanying downloads were: libgomp_64-1_nbody_1.06.dll, pthreadGC2_64_nbody_1.06.dll and milkyway_nbody_1.06_windows_x86_64__mt.exe. Is that what you expect, or should there be something else? |
Send message Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
I am seeing these being returned, but only on Windows clients. I am checking the files to make sure the support DLLs are coming down with the exe. OK, some suggestions. Error 0xc0000139 is 'Entry point not found' - usually a DLL error, but more likely a wrong version rather than a completely missing DLL. I've just tried reactivating host 479865. The new DLLs downloaded correctly, and the first task ran OK. But... checking with Process Explorer, I found that the application was using copies of libgomp_64-1.dll (48,128 bytes) and pthreadGC2_64.dll (49,664 bytes), which I had downloaded manually and placed in the project directory when the relaunch of NBody was announced - my copies are dated 13 November 2012. When I removed these manual downloads and tried using the 1.06 versions of the files - 49,152 bytes and 49,152 bytes respectively (??? identical size - suspicious) - the app failed with a 0xc0000139 error. Ah - the two DLLs are binary identical, according to FC. Finger-fumble with the copying/versioning process? |
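The check described above (two differently named DLLs turning out to be byte-for-byte identical, as the Windows `fc` file-compare command reported) can also be done by hashing each file. The Python sketch below groups files in a directory by SHA-256 digest and returns any groups with more than one member. The function names and the `*.dll` pattern are illustrative assumptions, not part of BOINC or the project's tooling.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_identical_files(directory, pattern="*.dll"):
    """Group files matching `pattern` by content; return groups of 2 or more."""
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Run against a project directory, a non-empty result would list sets of files with identical content, which is the situation suspected here for the two 1.06 DLLs.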
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
I pulled the original files from the Windows development box. They have the same size on disk due to block sizes but differ; I am running diff to verify. I am currently replacing them on the server and updating the signatures. I will stop the current run and start a new one after I have tested that they are going out correctly. |
Send message Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
Just did another reset to force download of the new/old DLLs and all seems to be working fine again. I missed the estimate of the first one, which finished in one minute, but of those I've paid attention to so far: 1+3/4 hour estimate finished in 2 mins; 4+1/2 hour estimate finished in 6 mins; 8 hour estimate finished in 8 mins; 10+1/4 hour estimate finished in 10 mins; 10+1/2 hour estimate finished in 12 mins. No rogue, huge estimates, and the estimate ticks down rather than up. A 113 hour estimate just came in. Its estimate ticks up between progress steps but drops at each progress update. If there is a similar initial overestimate of c.60X, then I expect this to finish after 113 mins rather than 113 hours, which would sit about right compared to the wingman's faster machine finishing in 89 mins. |
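The "c.60X" figure can be checked from the numbers in the post above: each overestimate factor is the estimate in minutes divided by the actual run time in minutes. The short Python sketch below does that arithmetic. The data pairs are copied from the post; the helper names are made up for this example.

```python
# (estimated hours, actual minutes) pairs as reported in the post above
observations = [(1.75, 2), (4.5, 6), (8.0, 8), (10.25, 10), (10.5, 12)]


def overestimate_factor(estimated_hours, actual_minutes):
    """Ratio of estimated run time to actual run time."""
    return estimated_hours * 60 / actual_minutes


def predicted_minutes(estimated_hours, factor):
    """Scale a new estimate down by an assumed overestimate factor."""
    return estimated_hours * 60 / factor


factors = [overestimate_factor(h, m) for h, m in observations]
mean_factor = sum(factors) / len(factors)

print(factors)                             # per-task overestimate factors
print(round(mean_factor, 1))               # average factor across the five tasks
print(predicted_minutes(113, 60))          # 113 h estimate at an assumed 60x factor
print(round(predicted_minutes(113, mean_factor)))  # same estimate at the observed mean
```

The five factors range from 45 to 61.5, so a 60X factor sits at the upper end of what was observed; using the mean instead gives a somewhat longer predicted run time for the 113 hour task.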
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
The functions that calculate the time estimates assume that the parameter space involves a large number of calculations. For some of the extreme parameters we are sweeping over, there may be fewer calculations in this space. As these results come back, the server will target more relevant space and the actual number of calculations will go up. Jeff |
Send message Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
Should the corrected DLLs be going out with any new WUs, or are people who got the faulty ones now stuck with them until they either reset the project (as I did to get the corrected DLLs, as shown in the posts below) or 1.07 is issued? I ask after looking at my only invalid task. All the others who got that WU errored out, voiding the WU. One wingman there who got the task after me presumably still has the faulty DLLs and so errored out, but he has over 1000 errors with 1.06. |
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
They will need to reset at this point. Pushing a 1.07 isn't happening right now. Jeff |
Send message Joined: 29 Nov 11 Posts: 18 Credit: 815,433 RAC: 0 |
Does resetting the project cause me to lose all previous results/data? |