N-Body 1.36
Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0
Just kidding! I messed up the release for 1.34, so expect 1.36, with the same changes mentioned in the 1.34 thread, in a couple of hours. Sorry about that, Jake
Joined: 21 Jun 13 Posts: 2 Credit: 10,795 RAC: 0
Are the updates to the software downloaded automatically? If not, where can I download the software? Thanks, Gary
Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0
They should be downloaded automatically. Are you seeing errors because of this?
Joined: 21 Jun 13 Posts: 2 Credit: 10,795 RAC: 0
No errors. Glad it is downloaded automatically! Thanks, Gary
Joined: 21 May 10 Posts: 19 Credit: 100,867,126 RAC: 0
Hi Jake, FYI, I received several "Validation inconclusive" MilkyWay@Home N-Body Simulation results with v1.36: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=105340&offset=0&show_names=0&state=3&appid=7 Thank you! BDDave
Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0
Do you know if you were awarded credit? This is a rare occurrence and something this group is looking into.
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
> Do you know if you were awarded credit?

On the contrary, it's something I see quite regularly: six from the last two days alone (the "Validation inconclusive MilkyWay@Home N-Body Simulation tasks" list for computer 479865).

Using standard BOINC terminology, it seems as if, after the task has been processed and reported, the validator randomly decides whether the host is 'reliable' or 'trusted' (I'm not quite sure which applies here). If the host is not reliable or trusted, a second replication is generated and sent out. When that second copy is returned, it is invariably (in my experience) treated as unreliable as well, leading to a third replication being issued. Only once the third copy is complete does true validation (a comparison of the results) take place, and tasks which pass the test are granted credit: see for example WU 408707072 (same host), where the three successive replicated tasks have all been granted credit.

Most BOINC projects either require every workunit to be replicated and the results compared, or none of them. MilkyWay seems to have an unusual server configuration with optional validation. Projects which require 100% validation usually send all replicated copies out at the same time. That saves a lot of time (and server storage space) when long-running tasks need to be compared: the serial implementation here has kept two of my inconclusives waiting since 30 July and 15 July respectively, while 'wingmates' (as we call them) slowly catch up.

Edit: it looks as if the scheme you're using is Adaptive Replication.
Joined: 21 May 10 Posts: 19 Credit: 100,867,126 RAC: 0
Hi Richard, I've been monitoring what you described, and it does look that way. I'm now down to only 2 showing "Validation inconclusive." Thanks for the feedback; I'm happy to see all is running smoothly. Get Crunchin' BDDave
Joined: 3 Aug 13 Posts: 1 Credit: 14,873,781 RAC: 0
Hi, I'm new to MilkyWay, and it's part of my team's current challenge, but I've tried to run a few units like this one: ps_nbody_07_23_no_dark_2_1372784655_1072292_0. All six of them errored out instantly with this message:

Stderr output
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741515 (0xc0000135)
</message>
]]>

Can you help, or do you need more info? Thanks in advance. Paul
Joined: 27 Jun 11 Posts: 4 Credit: 2,409,006 RAC: 0
> <core_client_version>7.0.64</core_client_version>

All (100%) of the N-Body WUs have errored out this way on a machine I added in mid-June: machine id 523191, Win7 x64 SP1. The only references to this error message I've found to date all talk about a missing or bad piece of Microsoft's .NET 3.x package, which appears to be OK on the machine. So I've just been watching them crash, crunching the rest of the WUs, and checking back here from time to time. Bob
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0
The binaries should be compiled completely self-contained except for the two DLLs that are distributed with the application. But the reports of this we have been seeing have all come from Windows 7 machines. We test against Windows 7.

What I will try to do is load one of my Windows 7 VMs and see if it errors. I will also try some of the updates to see if a conflict came down the pike. I can't duplicate it, and from what I am seeing currently it appears to be OS-limited. If others have more details, posting them here would be good, unless the thread gets too big, in which case we may start a secondary thread. I will let you know how the tests go, and I will check with Jake about where he was with the cases he was looking at with users directly.

Granted, the error code being reported is a Windows error code, so saying we are only seeing it on Windows 7 may just reflect how the error is reported on Windows; we may be seeing these errors on other operating systems, just looking different. The overall error reports are consistent with other binaries when they are running well. I am off to break the VMs to see what I can see. Jeff
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
Error 0xc0000135 is a generic Windows error code which translates as "The application failed to initialize properly". The commonest cause is a missing DLL, and the commonest answer found by Google is that the DLL in question is part of the Microsoft .NET Framework, but that ain't necessarily so: it's more likely to be one of the support DLLs provided from this project's servers.

N-Body 1.36 is running fine on this Windows 7/64 machine as I type. Client_state.xml says that both plan_class variants (MT and single-threaded) are correctly specified to reference:

<file_ref>
<file_name>libgomp_64-1_nbody_1.36.dll</file_name>
<open_name>libgomp_64-1.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>pthreadGC2_64_nbody_1.36.dll</file_name>
<open_name>pthreadGC2_64.dll</open_name>
<copy_file/>
</file_ref>

If either is missing, the download URLs are:

<download_url>http://milkyway.cs.rpi.edu/milkyway/download/libgomp_64-1_nbody_1.36.dll</download_url>
<download_url>http://milkyway.cs.rpi.edu/milkyway/download/pthreadGC2_64_nbody_1.36.dll</download_url>

I've tested both, and both are currently working.

The running MW app, according to Process Explorer, has loaded:

C:\Windows\libgomp_64-1.dll
C:\BOINCdata\slots\5\pthreadGC2_64.dll

BOINCdata\slots\ is the correct load location for this machine, given the specifications above and my configuration. C:\Windows\libgomp_64-1.dll is suspicious, and may be the result of my manual hacking in the early days of N-Body, when the DLLs weren't being correctly specified. But it does appear that BOINC has loaded a full copy of libgomp_64-1.dll into slot 5 as directed, so things seem to be working correctly.
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0
I will test the 32-bit version specifically, then, as you ruled out the 64-bit one. I remember these errors from before. If anyone getting the error can confirm whether they are on a 32-bit or 64-bit OS, that would be helpful. Jeff
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
There's no 32-bit version of N-Body for Windows listed on the Applications page, and Bob's host 523191 is 64-bit, unless (unlikely) he's loaded a 32-bit version of BOINC?
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0
Sorry, that is correct: we have a 32-bit version we are trying to get working again, after Jake W. fixed some issues with 32-bit in the separation binaries. Sorry it hasn't gone live yet. Jeff
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0
Checking the server, we have the DLLs named as in your versions, and they are in the version.xml. libgomp and pthread in the last bundle from MinGW were the same size, so I always check them with diff to make sure they are different files. So what is on the server matches what should be there. I will run a fresh VM install tomorrow and see if everything comes down cleanly to a brand-new install. Jeff
Joined: 27 Jun 11 Posts: 4 Credit: 2,409,006 RAC: 0
> The binaries should be compiled completely self contained except for the two dlls

That gave me something to look at. Here's what I saw; hope it helps.

First, the other MilkyWay projects have a pretty long list of DLLs they use, at least as reported by Process Explorer: advapi32, KernelBase, lpk, ntdll, sechost and many more. N-Body craters so fast I haven't been able to catch it; perhaps you could run it in a debugger.

Second, the pthread.my.dll (I won't type that long name, as I'd mistype it) has a different name from the other pthread DLLs installed on the machine, as it should if you've done something to it. None of those other DLLs are, or should be, loaded when N-Body craters. I can't verify which pthread is actually being loaded, or whether it even gets that far.

Third, when I examine the interfaces exported from pthread.my.dll, the names match those in the other pthread DLLs, but the DLL address and relative address don't (expected, since they're different versions). All the pthread DLLs on the machine show a Relative Address of the form 0x000010c3, which looks relative. All the pthread DLLs on the machine EXCEPT pthread.my also export an Address in the same form, while pthread.my exports addresses like 0x672c1ae0, which look absolute to me and could cause an out-of-address-space error if interpreted as relative (what we called 0C4 errors back in the mainframe days).

And fourth, the error code is frequently linked to .NET 3.5. This machine came with only .NET 4, which matches a couple of other Win7 machines I looked at, though they're not involved in any @home activity.

Hope this helps track down the problem. Bob
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
> First, the other milkyway projects have a pretty long list of dlls they use - at least as reported by Process Explorer. Such as advapi32, KernalBase, lpk, ntdll, sechost, many more. The n-body craters so fast I haven't been able to catch it - perhaps if you run it in a debugger

Most of those will be standard Windows system DLLs. If you run Dependency Walker against the main N-Body executable, it will tell you which DLLs are linked and whether they are present on your computer. It will probably flag libgomp_64-1.dll and pthreadGC2_64.dll (because the distribution names are different), and you usually get warnings about delay-load dependencies too; those can be ignored. But it helps to narrow down the list of suspects.
Joined: 27 Jun 11 Posts: 4 Credit: 2,409,006 RAC: 0
> Most of those will be standard Windows system DLLs.

Indeed, but that disproves the stated "self-contained binaries" premise. Sorry, I come from The Old Days, when "self-contained" meant, well, everything's there: no exceptions or external dependencies, save the base OS, which DLLs really aren't, since there can be multiple different versions, as on my machine.

Dependency Walker appears to be a tool I've been looking for off and on. Thanks. For the past several years I've been breaking systems (white hat) more than building them, which is more networks and people than code. Bob
Joined: 2 Apr 11 Posts: 14 Credit: 4,527,461 RAC: 0
The majority of the MT tasks my 2-core Intel Mac receives bomb out at the 100% mark (I think). Most of those which survive get an inconclusive validation when reported, but are validated eventually. No problems at all with the non-MT app, but then it hasn't had any tasks lately! NG
©2024 Astroinformatics Group