N-Body 1.36
Message boards : News : N-Body 1.36

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 59
Credit: 406,916
RAC: 2

Message 59436 - Posted: 23 Jul 2013, 19:05:16 UTC

Just kidding!

I messed up the release for 1.34, so expect 1.36, with the same changes mentioned in the 1.34 thread, in a couple of hours.

Sorry about that,

Jake

Gary J. Ussia
Joined: 21 Jun 13
Posts: 2
Credit: 10,795
RAC: 0

Message 59476 - Posted: 29 Jul 2013, 4:23:58 UTC

Are the software updates downloaded automatically? If not, where can I download the software?

Thanks,

Gary

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 59
Credit: 406,916
RAC: 2

Message 59478 - Posted: 29 Jul 2013, 12:59:30 UTC - in response to Message 59476.

They should be downloaded automatically. Are you seeing errors because of this?

Gary J. Ussia
Joined: 21 Jun 13
Posts: 2
Credit: 10,795
RAC: 0

Message 59487 - Posted: 31 Jul 2013, 6:19:28 UTC - in response to Message 59478.

No errors. Glad it is downloaded automatically!

Thanks,

Gary

BDDave
Joined: 21 May 10
Posts: 17
Credit: 10,794,621
RAC: 14,533

Message 59516 - Posted: 4 Aug 2013, 13:56:27 UTC - in response to Message 59436.

Hi Jake,

FYI, I did receive several "Validation inconclusive MilkyWay@Home N-Body Simulation" results on v1.36.

http://milkyway.cs.rpi.edu/milkyway/results.php?userid=105340&offset=0&show_names=0&state=3&appid=7


Thank you!

BDDave


____________

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 59
Credit: 406,916
RAC: 2

Message 59528 - Posted: 6 Aug 2013, 17:56:11 UTC - in response to Message 59516.

Do you know if you were awarded credit?

This is a rare occurrence and something this group is looking into.

Richard Haselgrove
Joined: 4 Sep 12
Posts: 168
Credit: 190,899
RAC: 0

Message 59529 - Posted: 6 Aug 2013, 18:19:56 UTC - in response to Message 59528.
Last modified: 6 Aug 2013, 18:38:35 UTC

> Do you know if you were awarded credit?
>
> This is a rare occurrence and something this group is looking into.

On the contrary, it's something I see quite regularly - six from the last two days alone:

Validation inconclusive MilkyWay@Home N-Body Simulation tasks for computer 479865

Using standard BOINC terminology, it seems as if - after the task has been processed and reported - the validator randomly decides whether the host is 'reliable' or 'trusted' (not quite sure which applies here). If the host is not reliable or trusted, a second replication is generated and sent out. And when that second copy is returned, it is invariably (in my experience) treated as unreliable as well, leading to a third replication being issued.

Only once the third copy is complete does true validation (a comparison of the results) take place, and tasks which pass the test are granted credit: see for example WU 408707072 (same host), where the three successive replicated tasks have all been granted credit.

Most BOINC projects either require every workunit to be replicated and the results compared, or none of them. Milkyway seems to have an unusual server configuration with optional validation.

Projects which require 100% validation usually send all replicated copies out at the same time - that saves a lot of time (and server storage space) when long-running tasks need to be compared: the serial implementation here has kept two of my inconclusives waiting since 30 July and 15 July respectively, while 'wingmates' (as we call them) slowly catch up.

Edit - it looks as if the schema you're using is Adaptive Replication.
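
For readers unfamiliar with the idea, the behaviour described above (a result from an untrusted host triggers a second copy, while a trusted host is only checked now and then) can be sketched roughly as follows. This is a simplified model and not the BOINC server code: the threshold `RELIABLE_AFTER`, the probability `SPOT_CHECK_PROB` and the `Host` class are hypothetical names and values chosen for illustration.

```python
import random
from dataclasses import dataclass

# Hypothetical values, chosen only for illustration.
RELIABLE_AFTER = 10     # consecutive valid results before a host counts as reliable
SPOT_CHECK_PROB = 0.1   # chance that a reliable host's result is replicated anyway


@dataclass
class Host:
    consecutive_valid: int = 0


def needs_replication(host: Host, rng: random.Random,
                      spot_check_prob: float = SPOT_CHECK_PROB) -> bool:
    """Decide whether a result from this host should be compared with another copy."""
    if host.consecutive_valid < RELIABLE_AFTER:
        return True  # host not yet trusted: always send out another replica
    # Trusted host: only replicate occasionally as a spot check.
    return rng.random() < spot_check_prob


def record_validation(host: Host, valid: bool) -> None:
    """Update the host's track record after a result has been validated."""
    host.consecutive_valid = host.consecutive_valid + 1 if valid else 0
```

Under this model a single invalid result resets the host's track record, so it goes back to being replicated every time until it has rebuilt its run of valid results.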

BDDave
Joined: 21 May 10
Posts: 17
Credit: 10,794,621
RAC: 14,533

Message 59603 - Posted: 16 Aug 2013, 3:42:55 UTC - in response to Message 59529.

Hi Richard,

I've been monitoring what you described, and it looks to be so. I'm now down to only 2 showing "Validation Inconclusive." Thanks for the feedback; I'm happy to see all is running smoothly.

Get Crunchin'
BDDave
____________

Paul of TSBT
Joined: 3 Aug 13
Posts: 1
Credit: 11,441,394
RAC: 5,832

Message 59623 - Posted: 19 Aug 2013, 23:25:42 UTC

Hi,

I'm new to Milkyway, and it's part of my team's current challenge, but
I've tried to run a few units like this one:
ps_nbody_07_23_no_dark_2_1372784655_1072292_0

All six of them have errored out instantly with this message:
Stderr output

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741515 (0xc0000135)
</message>
]]>


Can you help, or do you need some more info?

Thanks in advance.

Paul

Bob Benson
Joined: 27 Jun 11
Posts: 4
Credit: 725,733
RAC: 2,242

Message 59665 - Posted: 25 Aug 2013, 16:49:30 UTC - in response to Message 59623.

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741515 (0xc0000135)
</message>
]]>


All (100%) of the N-Body WUs have errored out this way on a machine I added in mid-June: machine ID 523191, Win7 x64 SP1. The only references to this error message I've found to date all talk about a missing or bad piece of Microsoft's .NET 3.x package, which appears to be OK on the machine. So I've just been watching them crash, crunching the rest of the WUs, and checking back here from time to time.

Bob

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 144
Credit: 6,706,199
RAC: 9

Message 59666 - Posted: 25 Aug 2013, 17:20:30 UTC
Last modified: 25 Aug 2013, 17:21:37 UTC

The binaries should be compiled completely self-contained except for the two DLLs that are distributed with the application. But all of the reports of this that we have been seeing have come from Windows 7 machines. We test against Windows 7. What I will try to do is load one of my Windows 7 VMs and see if it errors. I will also try some of the updates to see if there is a conflict that came down the pike. I can't duplicate the error, and from what I am seeing currently it appears to be OS-limited. If others have more details, this thread is a good place to post them, unless it gets too big, in which case we may start a secondary thread.

I will let you know how the tests go and I will see Jake about where he was with the cases of the issues he was looking at with users directly.

Now granted, the error code being reported is a Windows error code, so saying we are only seeing it on Windows 7 may just reflect how the error is reported on Windows; we may be seeing the same errors on other operating systems, where they just look different.

The overall error reports are consistent with other binaries when they are running well.

I am off to break the VMs to see what I can see.


Jeff

Richard Haselgrove
Joined: 4 Sep 12
Posts: 168
Credit: 190,899
RAC: 0

Message 59672 - Posted: 25 Aug 2013, 23:38:00 UTC - in response to Message 59666.

Error 0xc0000135 is a generic Windows error code which translates as "The application failed to initialize properly". The commonest cause is a missing DLL, and the commonest answer found by Google is that the DLL in question is part of the Microsoft .NET framework - but that ain't necessarily so: it's more likely to be one of the support DLLs provided from this project's servers.
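
As a side note on the two forms of the code in the stderr output: the negative exit code and the hex value are the same 32-bit number, read as signed and unsigned respectively. A minimal sketch of the conversion follows. The small lookup table lists a few well-known NTSTATUS names from Microsoft's published status codes and is not exhaustive; the function names are made up for this example.

```python
# A few well-known NTSTATUS values; this table is deliberately incomplete.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def exit_code_to_ntstatus(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code: int) -> str:
    """Format an exit code as hex, with its NTSTATUS name when it is in the table."""
    status = exit_code_to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08x} ({name})"


print(describe(-1073741515))  # prints: 0xc0000135 (STATUS_DLL_NOT_FOUND)
```

The status name is consistent with a missing DLL being the commonest cause.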

N-Body 1.36 is running fine on this Windows 7/64 machine as I type.

Client_state.xml says that both plan_class variants (MT and single-threaded) are correctly specified to reference

<file_ref>
<file_name>libgomp_64-1_nbody_1.36.dll</file_name>
<open_name>libgomp_64-1.dll</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>pthreadGC2_64_nbody_1.36.dll</file_name>
<open_name>pthreadGC2_64.dll</open_name>
<copy_file/>
</file_ref>

If either is missing, the download URLs are:

<download_url>http://milkyway.cs.rpi.edu/milkyway/download/libgomp_64-1_nbody_1.36.dll</download_url>
<download_url>http://milkyway.cs.rpi.edu/milkyway/download/pthreadGC2_64_nbody_1.36.dll</download_url>

I've tested both, and both are working currently.
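
Anyone who wants to run the same check on their own machine could do it along these lines. This is only a sketch: the `<app_version>` wrapper is added so that the fragment parses as well-formed XML, and the function names and the project directory passed to `missing_files` are assumptions for illustration, not paths taken from this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# The two <file_ref> entries quoted above, wrapped in one element so the
# fragment is well-formed.
SAMPLE = """
<app_version>
  <file_ref>
    <file_name>libgomp_64-1_nbody_1.36.dll</file_name>
    <open_name>libgomp_64-1.dll</open_name>
    <copy_file/>
  </file_ref>
  <file_ref>
    <file_name>pthreadGC2_64_nbody_1.36.dll</file_name>
    <open_name>pthreadGC2_64.dll</open_name>
    <copy_file/>
  </file_ref>
</app_version>
"""


def file_refs(xml_text: str) -> dict[str, str]:
    """Map each distributed file name to the name the application opens it under."""
    root = ET.fromstring(xml_text)
    refs = {}
    for ref in root.iter("file_ref"):
        name = ref.findtext("file_name")
        if name:
            refs[name] = ref.findtext("open_name") or name
    return refs


def missing_files(refs: dict[str, str], project_dir: Path) -> list[str]:
    """Return the distributed file names that are not present in project_dir."""
    return [name for name in refs if not (project_dir / name).is_file()]
```

Calling `missing_files(file_refs(SAMPLE), Path(...))` with the project's own directory under the BOINC data directory would then list any support DLL that never arrived.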

The running MW app, according to Process Explorer, has loaded

C:\Windows\libgomp_64-1.dll
C:\BOINCdata\slots\5\pthreadGC2_64.dll

BOINCdata\slots\ is the correct load location for this machine, given the specifications above and my configuration.

C:\Windows\libgomp_64-1.dll is suspicious, and may be the result of my manual hacking in the early days of N-Body, when the DLLs weren't being correctly specified. But it does appear that BOINC has loaded a full copy of libgomp_64-1.dll into slot 5 as directed, so it seems things are working correctly.
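
A stray copy can win because a DLL is resolved by walking a list of directories in order and taking the first match. The sketch below models only that "first match wins" behaviour. The actual directories and their order depend on Windows settings and on how the application is launched, so the list passed in is an assumption supplied by the caller, and `first_match` is a name made up for this example.

```python
from pathlib import Path
from typing import Iterable, Optional


def first_match(dll_name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing copy of dll_name in an ordered list of directories."""
    for directory in search_dirs:
        candidate = directory / dll_name
        if candidate.is_file():
            return candidate
    return None


# Hypothetical usage; the order below is an assumption, not a statement of
# what Windows does on any particular machine:
# first_match("libgomp_64-1.dll",
#             [Path(r"C:\Windows"), Path(r"C:\BOINCdata\slots\5")])
```

If an old copy sits in a directory that is searched earlier, it is found before the copy BOINC placed in the slot directory.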

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 144
Credit: 6,706,199
RAC: 9

Message 59673 - Posted: 25 Aug 2013, 23:39:50 UTC

I will test the 32-bit version specifically, then, as you have ruled out the 64-bit one.
I remember these errors from before.

If anyone getting the error can confirm whether they are on a 32-bit or 64-bit OS, that would be helpful.


Jeff

Richard Haselgrove
Joined: 4 Sep 12
Posts: 168
Credit: 190,899
RAC: 0

Message 59674 - Posted: 25 Aug 2013, 23:46:51 UTC - in response to Message 59673.

There's no 32-bit version of N-Body for Windows listed on the Applications page, and Bob's host 523191 is 64-bit - unless (unlikely) he's loaded a 32-bit version of BOINC?

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 144
Credit: 6,706,199
RAC: 9

Message 59675 - Posted: 25 Aug 2013, 23:51:37 UTC

Sorry, that is correct. We have a 32-bit build that we are trying to get working again after Jake W. fixed some issues with 32-bit in the separation binaries. It hasn't gone live yet.

Jeff

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 144
Credit: 6,706,199
RAC: 9

Message 59676 - Posted: 26 Aug 2013, 0:19:32 UTC

Checking the server, we have the DLLs named as in your versions; they are in the version.xml.

libgomp and pthread in the last bundle from MinGW were the same size, so I always check them with diff to make sure they are different files.
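
The same "same size, but are they really different?" check can be scripted. This is a sketch using Python's standard `filecmp` module rather than diff itself, and the function name is made up for this example.

```python
import filecmp
from pathlib import Path


def same_size_but_different(a: Path, b: Path) -> bool:
    """True when two files have equal size but their contents differ."""
    same_size = a.stat().st_size == b.stat().st_size
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # the size and timestamp metadata.
    identical = filecmp.cmp(a, b, shallow=False)
    return same_size and not identical
```

A result of True for the two DLLs would be the expected, healthy case here: equal sizes, different files.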

So what is on the server is matching what should be there.

I will run a fresh VM install tomorrow and see if everything comes down cleanly to a brand new install.



Jeff

Bob Benson
Joined: 27 Jun 11
Posts: 4
Credit: 725,733
RAC: 2,242

Message 59680 - Posted: 26 Aug 2013, 6:34:41 UTC - in response to Message 59666.

> The binaries should be compiled completely self contained except for the two dlls that are distributed with the application. But of the reports of this we have been seeing they have all been Windows 7 machines. We test against Windows 7.


Gave me something to look at. Here's what I saw, hope it helps -

First, the other Milkyway projects have a pretty long list of DLLs they use - at least as reported by Process Explorer. Such as advapi32, KernelBase, lpk, ntdll, sechost, many more. The N-Body app craters so fast I haven't been able to catch it - perhaps if you run it in a debugger.

Second, the pthread.my.dll (I won't type that long name cuz I'd mistype it) has a different name from other pthread DLLs installed on the machine. As it should if you've done something to it. None of those other DLLs are loaded, or should be loaded, when N-Body craters. I can't verify which pthread is actually being loaded, or if it even gets that far.

Third, when I examine the interfaces exported from pthread.my.dll, the names match those in other pthread DLLs, but the DLL address and relative address don't (expected, different versions). All the pthread DLLs on the machine show a Relative Address of the form 0x000010c3, which looks relative. All of them EXCEPT pthread.my.dll export an Address in the same form, while pthread.my.dll exports addresses like 0x672c1ae0, which looks absolute to me and could cause an out-of-address-space error if interpreted as relative (what we called 0C4 errors back in the mainframe days).
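
One possible reading of the two address forms, offered as a guess rather than a diagnosis: some tools print an export's address as the DLL's preferred image base plus the relative address, so an absolute-looking value may still correspond to an ordinary relative one. The arithmetic is sketched below. The image base used is purely hypothetical and was not read from any file in this thread.

```python
def rva(virtual_address: int, image_base: int) -> int:
    """Relative virtual address: offset of an address from the image base."""
    if virtual_address < image_base:
        raise ValueError("address lies below the image base")
    return virtual_address - image_base


# Hypothetical image base, chosen only to illustrate the subtraction.
ASSUMED_IMAGE_BASE = 0x672C0000

print(hex(rva(0x672C1AE0, ASSUMED_IMAGE_BASE)))  # prints: 0x1ae0
```

Whether that reading applies here depends on the real image base of the DLL, which a PE viewer would show in the file's optional header.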

And fourth, the error code is frequently linked to .NET 3.5. This machine came only with .NET 4. That matches a couple of other Win7 machines I looked at, but they're not involved in any @home activity.

Hope this helps tracking down the problem

Bob

Richard Haselgrove
Joined: 4 Sep 12
Posts: 168
Credit: 190,899
RAC: 0

Message 59681 - Posted: 26 Aug 2013, 8:19:01 UTC - in response to Message 59680.

> First, the other milkyway projects have a pretty long list of dlls they use - at least as reported by Process Explorer. Such as advapi32, KernalBase, lpk, ntdll, sechost, many more. The n-body craters so fast I haven't been able to catch it - perhaps if you run it in a debugger

Most of those will be standard Windows system DLLs.

If you run Dependency Walker against the main N-Body executable, it will tell you which DLLs are linked and whether they are present on your computer. It will probably flag up libgomp_64-1.dll and pthreadGC2_64.dll (because the distribution names are different), and you usually get warnings about late-loading dependencies too - they can be ignored. But it helps to narrow the list of suspects down.
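
If Dependency Walker isn't to hand, a very crude first pass is to scan the executable for byte strings ending in ".dll". To be clear, this is not how Dependency Walker works (it reads the import tables properly), and the scan can both miss late-loaded DLLs and pick up names that are never loaded. The function names are made up for this sketch.

```python
import re
from pathlib import Path

# ASCII runs that look like DLL file names.
DLL_NAME = re.compile(rb"[A-Za-z0-9_.\-]+\.dll", re.IGNORECASE)


def dll_names_in(binary: bytes) -> list[str]:
    """Return the distinct DLL-like names found in a blob of bytes."""
    names = {m.group().decode("ascii") for m in DLL_NAME.finditer(binary)}
    return sorted(names, key=str.lower)


def dll_names_in_file(path: Path) -> list[str]:
    """Convenience wrapper: scan a file on disk."""
    return dll_names_in(path.read_bytes())
```

The resulting list can then be checked by hand against what is actually present on the machine.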

Bob Benson
Joined: 27 Jun 11
Posts: 4
Credit: 725,733
RAC: 2,242

Message 59693 - Posted: 27 Aug 2013, 8:44:03 UTC - in response to Message 59681.
Last modified: 27 Aug 2013, 9:06:05 UTC

> Most of those will be standard Windows system DLLs.

Indeed, but that disproves the stated "self-contained binaries" premise. Sorry, I come from The Old Days, where "self-contained" meant, well, everything's there: no exceptions or external dependencies, save the base OS. DLLs don't really count as base OS, since there can be multiple different versions, as on my machine.

Walker appears to be a tool I've been looking for off and on. Thanks. For the past number of years I've been breaking systems (white hat) more than building them, which is more networks and people than code.

Bob

Nigel Garvey
Joined: 2 Apr 11
Posts: 8
Credit: 465,145
RAC: 243

Message 59748 - Posted: 29 Aug 2013, 22:53:24 UTC

The majority of the MT tasks my 2-core Intel CPU Mac receives bomb out at the 100% mark (I think). Most of those which survive get an inconclusive validation when reported, but are validated eventually. No problems at all with the non-MT app, but then it's not had any tasks lately!

NG


Copyright © 2014 AstroInformatics Group