Welcome to MilkyWay@home

Posts by Jeffery M. Thompson

41) Message boards : News : New Separation Runs (Message 60615)
Posted 17 Dec 2013 by Jeffery M. Thompson
Post:
I have updated the bounds of the dataset.
Some of the searches were looking for a center outside the dataset.
I believe this may have been causing some parameters to search in completely empty parts of the dataset.
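
A check along these lines would flag a search whose center falls outside the dataset. This is only a sketch: the parameter names, bounds, and center values below are hypothetical and are not taken from the actual run configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Inclusive lower and upper limit for one search parameter."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def out_of_bounds(center: dict[str, float], bounds: dict[str, Bounds]) -> list[str]:
    """Return the names of center coordinates that fall outside the bounds."""
    return [
        name
        for name, value in center.items()
        if name in bounds and not bounds[name].contains(value)
    ]


# Hypothetical bounds and search center, for illustration only.
dataset_bounds = {"mu": Bounds(310.0, 419.0), "r": Bounds(16.0, 22.5)}
search_center = {"mu": 425.0, "r": 20.0}
print(out_of_bounds(search_center, dataset_bounds))
```

Any name returned by `out_of_bounds` marks a coordinate that would send the search into an empty region.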

The old searches on DR8_82_Rev_4_2_2 are down, and I put up DR8_82_Rev_4_3 in their place. If you have any more issues, please let me know.



Jeff
42) Message boards : News : New Separation Runs (Message 60587)
Posted 14 Dec 2013 by Jeffery M. Thompson
Post:
Thanks for the info.

There are two things I have to look at in this. Right now I am focusing on the different values being returned; there is also the problem of the wrong app being assigned.



Jeff
43) Message boards : News : New Separation Runs (Message 60585)
Posted 14 Dec 2013 by Jeffery M. Thompson
Post:
These applications haven't changed, so the problem is in the data, and specifically in how the systems are processing an unconstrained stream.

The overall error rate for the application is not peaking or rising past that of a normal run.
So I am trying to find out why it is spiking for this one set of parameters, and why it is spiking for a particular user base.

Right now I am not seeing a pattern that would let me home in on which part of the system is causing the bug.

The background and the third stream set should change with each other; if one is off, the other should be off also.

I am going to grab some invalids from different runs as well, to see whether the problem is also present in other data runs.


So I am looking into this and will report back with a bit more on what we see, but I have nothing conclusive to report right now. I brought down one run on the stripe in question and will bring down the other shortly. When they clear through, I want to see if the problem is on other runs also.

Jeff
44) Message boards : News : New Separation Runs (Message 60546)
Posted 10 Dec 2013 by Jeffery M. Thompson
Post:
Looking into it. I will post follow-ups here.


Jeff Thompson
45) Message boards : News : Server Updates (Message 60089)
Posted 3 Oct 2013 by Jeffery M. Thompson
Post:
It looks like the reboot, along with a DNS edit, resolved the issue; we are still monitoring this. If it hasn't fixed everything, it has improved a good portion of it.



Jeff
46) Message boards : News : Server Reboot (Message 60084)
Posted 2 Oct 2013 by Jeffery M. Thompson
Post:
The server is going down for a reboot in one hour.
It should be up shortly after 3pm Eastern Time.


Jeff
47) Message boards : News : Server Updates (Message 60075)
Posted 2 Oct 2013 by Jeffery M. Thompson
Post:
Investigating further; getting some lag, but not in all functions.


Jeff
48) Message boards : News : Server Updates (Message 60074)
Posted 2 Oct 2013 by Jeffery M. Thompson
Post:
Trying to duplicate the problem and see if there is anything common among the profiles experiencing it.
49) Message boards : News : Server Updates (Message 59858)
Posted 9 Sep 2013 by Jeffery M. Thompson
Post:
All systems appear to be back up and functioning.
We will continue to monitor.


Thank you,


Jeff
50) Message boards : News : Server Updates (Message 59853)
Posted 9 Sep 2013 by Jeffery M. Thompson
Post:
The updates have begun.
Services will be down for a brief period.

More information to follow.


Jeff
51) Message boards : News : Server Updates (Message 59830)
Posted 6 Sep 2013 by Jeffery M. Thompson
Post:
We are in the process of updating the server operating system.
This process should begin Monday and continue through the first part of next week.

I will keep posting details of the time scales here as this task progresses.


Thank you,

Jeff Thompson
52) Message boards : News : N-Body 1.36 (Message 59755)
Posted 30 Aug 2013 by Jeffery M. Thompson
Post:
The errors all seem to be related to checkpointing.

I will try the parameters on a test machine and see what we get.


No details outside of this right now.

Jeff
53) Message boards : News : N-Body 1.36 (Message 59752)
Posted 29 Aug 2013 by Jeffery M. Thompson
Post:
I have been running MT on different machines to check the validation. The delay in the eventual validation seems to come from some of the longer-running work units, which have to wait until there are enough units in the system to validate against. For the GPU separation code, the units are processed at about the same relative rate, so they flow through the system fairly quickly. I am still monitoring it and trying to get a more coherent picture of the issue. If you could post the work units that are failing, I could look at them and try to run them on my machines to see why they failed.

I haven't had an error on the MTs on my OS X 10.7 or OS X 10.8 machines. I haven't checked our 10.6 machine for errors lately, so any details on system specifics (OS version / BOINC version) may help isolate an error, though the binaries for the Unix-based operating systems are much more self-contained than those for the Windows systems.


Jeff
54) Message boards : News : N-Body 1.36 (Message 59676)
Posted 26 Aug 2013 by Jeffery M. Thompson
Post:
Checking the server, we have the DLLs named the same as in your versions; they are listed in the version.xml.

libgomp and pthread in the last bundle from MinGW were the same size, so I always check them with diff to make sure they are different files.

So what is on the server is matching what should be there.
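
For anyone who wants to repeat the same-size check without diff, a byte-for-byte comparison can be sketched in Python with the standard `filecmp` module. The file names below are stand-ins created just for the demo; they are not the real DLLs from the bundle.

```python
import filecmp
import os
import tempfile


def same_content(path_a: str, path_b: str) -> bool:
    """Compare two files by content (shallow=False skips the os.stat shortcut)."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo with two stand-in files of equal size but different bytes.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "libgomp_standin.dll")
    b = os.path.join(tmp, "pthread_standin.dll")
    with open(a, "wb") as f:
        f.write(b"AAAA")
    with open(b, "wb") as f:
        f.write(b"AAAB")
    sizes_match = os.path.getsize(a) == os.path.getsize(b)
    contents_match = same_content(a, b)

print(sizes_match, contents_match)
```

Equal sizes with `contents_match` false is the case described above: two distinct files that only look alike by size.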

I will run a fresh vm install tomorrow and see if everything comes down to a brand new install cleanly.



Jeff
55) Message boards : News : N-Body 1.36 (Message 59675)
Posted 25 Aug 2013 by Jeffery M. Thompson
Post:
Sorry, that is correct. We have a 32-bit version that we are trying to get working again after Jake W. fixed some issues with 32-bit in the separation binaries. Sorry it hasn't gone live yet.

Jeff
56) Message boards : News : N-Body 1.36 (Message 59673)
Posted 25 Aug 2013 by Jeffery M. Thompson
Post:
I will test the 32-bit version specifically then, since you ruled out the 64-bit.
I remember seeing these errors before.

If anyone getting the error can confirm whether they are on a 32-bit or 64-bit OS, that would be helpful.
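
For anyone who happens to have Python installed, a short script like the following is one way to collect those details. It is a rough sketch, not a project tool: `system_summary` is a name made up here, and the pointer size it reports describes the Python interpreter itself, which can be 32-bit even on a 64-bit OS, so the `machine` field is the better hint for the OS.

```python
import platform
import struct


def system_summary() -> dict[str, str]:
    """Collect basic OS details that help narrow down a 32-bit vs 64-bit issue."""
    return {
        "os": platform.system(),            # e.g. "Windows", "Linux", "Darwin"
        "os_release": platform.release(),   # OS release string
        "machine": platform.machine(),      # hardware name reported by the OS
        "python_bits": str(struct.calcsize("P") * 8),  # interpreter pointer width
    }


for key, value in system_summary().items():
    print(f"{key}: {value}")
```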


Jeff
57) Message boards : News : N-Body 1.36 (Message 59666)
Posted 25 Aug 2013 by Jeffery M. Thompson
Post:
The binaries should be compiled completely self-contained except for the two DLLs that are distributed with the application. However, the reports of this that we have been seeing have all come from Windows 7 machines. We test against Windows 7. What I will try to do is load one of the Windows 7 VMs I have and see if it errors. I will also try some of the updates to see if there is a conflict that came down the pike. I can't duplicate the error, and from what I am seeing currently it appears to be limited to that OS. If others have more details, this thread is a good place to post them; if the thread gets too big, we may start a secondary thread.

I will let you know how the tests go, and I will check with Jake about where he was with the cases he was looking at with users directly.

Granted, the error code being reported is a Windows error code, so the fact that we are only seeing it on Windows 7 may just be down to how the error is reported in Windows. We may be seeing these errors on other operating systems too, just looking different.

The overall error reports are consistent with other binaries when they are running well.

I am off to break the VMs to see what I can see.


Jeff
58) Message boards : News : New Separation Runs (Message 59507)
Posted 2 Aug 2013 by Jeffery M. Thompson
Post:
I have added the run

de_separation_80_DR_8_rev_3_2


Jeff
59) Message boards : News : New Separation Runs (Message 59486)
Posted 31 Jul 2013 by Jeffery M. Thompson
Post:
On the Vista machine, it appears the first bad workunit gets this:

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Failed to commit move of 'separation_checkpoint_tmp' to 'separation_checkpoint' (6704): It is too late to perform the requested operation, since the Transaction has already been aborted.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory
Write checkpoint failed
04:55:30 (6020): called boinc_finish

</stderr_txt>
]]>



Let me know if the reboot helps.
That particular work unit has made it through two other machines, and your version of the app hasn't changed, so the problem is local to the machine.
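
For readers unfamiliar with the pattern behind those messages, the log shows the checkpoint being written to 'separation_checkpoint_tmp' and then moved over 'separation_checkpoint'. A minimal Python sketch of that write-then-move pattern, with a few retries, looks like this. It is an illustration only and not the application's actual code; the retry count and delay are arbitrary assumptions.

```python
import os
import tempfile
import time


def write_checkpoint(path: str, data: bytes, retries: int = 5, delay: float = 0.1) -> None:
    """Write data to path + '_tmp', then move it over path, retrying on failure."""
    tmp_path = path + "_tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before the move
    last_error = None
    for _ in range(retries):
        try:
            os.replace(tmp_path, path)  # overwrites the destination if it exists
            return
        except OSError as err:
            last_error = err
            time.sleep(delay)
    raise RuntimeError(f"Write checkpoint failed after {retries} attempts") from last_error


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = os.path.join(tmp, "separation_checkpoint")
    write_checkpoint(checkpoint, b"state 1")
    write_checkpoint(checkpoint, b"state 2")
    with open(checkpoint, "rb") as f:
        final_state = f.read()
    tmp_left_behind = os.path.exists(checkpoint + "_tmp")

print(final_state, tmp_left_behind)
```

When the move keeps failing for reasons outside the program, as with the file system errors in the log above, retrying cannot help, which is why a reboot is the first thing to try.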

60) Message boards : News : New Separation Runs (Message 59485)
Posted 31 Jul 2013 by Jeffery M. Thompson
Post:
Reboot.
The previous problems were with validation issues due to user aborts.
When a machine starts to have a run of errors like that, the first thing to do is reboot it.

If it continues after the reboot, what tasks are you getting the error for, and what is the stderr.txt output for the errors?
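
When posting stderr.txt output, a quick tally of the error codes can make a long log easier to read. The sketch below assumes the "Failed ... (code): message" line format seen in stderr output reported in this thread; the regular expression and the function name are my own and are not part of the application.

```python
import re
from collections import Counter

# Match the numeric code in lines such as "Failed to move file ... (6801): ...".
# Anchoring on "Failed" skips lines like "04:55:30 (6020): called boinc_finish".
ERROR_CODE = re.compile(r"^Failed .*?\((\d+)\): ", re.MULTILINE)


def count_error_codes(stderr_text: str) -> Counter:
    """Count the numeric error codes reported on 'Failed ...' lines."""
    return Counter(int(code) for code in ERROR_CODE.findall(stderr_text))


sample = """\
Failed to commit move of 'separation_checkpoint_tmp' to 'separation_checkpoint' (6704): It is too late to perform the requested operation, since the Transaction has already been aborted.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory
Write checkpoint failed
04:55:30 (6020): called boinc_finish
"""

counts = count_error_codes(sample)
for code, n in sorted(counts.items()):
    print(code, n)
```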



©2024 Astroinformatics Group