Problem with new W/Us
Author | Message |
---|---|
Send message Joined: 13 Oct 07 Posts: 12 Credit: 1,130,149 RAC: 0 |
The new w/u's with task names starting with gs_373082 etc. will not run on my up-to-date Ubuntu Hardy box. They just sit at 0% and don't move. They work on all my other Hardy boxes that haven't been updated. Anyone else have this problem? |
Send message Joined: 9 Feb 08 Posts: 3 Credit: 126,332 RAC: 0 |
It's starting with gs_3737082, for example these tasks:
gs_3737082_1211940453_948699_0
gs_3737082_1211948419_976766_0

The client returns an error code of -1, with this error message:
<core_client_version>6.1.16</core_client_version>
<![CDATA[
<message>
- exit code -1 (0xffffffff)
</message>
<stderr_txt>
Error reading into background_parameters:
</stderr_txt>
]]>

They are reported as 0 CPU seconds, and they consistently fail immediately upon execution on my system, computer 7004, app version 1.22 on Windows XP Pro SP2 using BOINC 6.1.16 prerelease. It also occurs on clients 5.10.45 and 6.1.0, and on Windows Vista and Windows Server. I have not seen any errored-out units on Linux machines.

This workunit [1] has errored out on 5 machines, and this [2] is a sample return. This [3] is another return, where "Process got signal 11".
[1] http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=30463612
[2] http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=30889520
[3] http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=30876823

Update: I have located several units that do not fail. I'm taking the workunit names from the BOINC client for this information, and do not know how to find the matching pages on the website. Once the results post, I will give you links.

These names fail:
gs_3737082_1211949629_980245_0
gs_3737082_1211949631_980276_0

These do not:
gs_3737082_1211949631_980277_0
gs_3737082_1211949631_980275_0
gs_3737082_1211949631_980278_0
gs_3737082_1211949629_980246_0 |
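For anyone who wants to pick these tasks out of a queue by name, here is a minimal Python sketch. The field layout (prefix, series, timestamp, id, replica) is only inferred from the names quoted above, not taken from project documentation, and `parse_task_name` and `is_suspect` are hypothetical helper names. The third field looks like a Unix timestamp, since the values decode to late May 2008, but that is also an inference.

```python
from datetime import datetime, timezone


def parse_task_name(name):
    """Split a task name such as gs_3737082_1211949629_980245_0 into fields.

    Assumed layout (inferred from this thread, not documented):
    prefix _ series _ unix-timestamp _ workunit-id _ replica
    """
    prefix, series, stamp, wu_id, replica = name.split("_")
    return {
        "prefix": prefix,
        "series": series,
        "created": datetime.fromtimestamp(int(stamp), tz=timezone.utc),
        "id": int(wu_id),
        "replica": int(replica),
    }


def is_suspect(name, bad_series="3737082"):
    """True if the task belongs to the series reported as failing."""
    return parse_task_name(name)["series"] == bad_series


tasks = [
    "gs_3737082_1211949629_980245_0",  # reported as failing
    "gs_3737082_1211949631_980277_0",  # reported as completing
]
for name in tasks:
    print(name, is_suspect(name))
```

Note that both sample names are flagged even though the second one was reported as completing: the series number marks a task as worth watching, but on its own it does not predict a failure.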
Send message Joined: 22 Mar 08 Posts: 7 Credit: 9,175,991 RAC: 0 |
> The new w/u's with task names starting with gs_373082 etc. will not run on my up-to-date Ubuntu Hardy box. They just sit at 0% and don't move. They work on all my other Hardy boxes that haven't been updated. Anyone else have this problem?

I'm also having lots of compute errors. Running BOINC 5.10.45 on Win XP Pro (SP3).

27/05/2008 23:11:16|Milkyway@home|Output file gs_3737082_1211945903_968750_0_0 for task gs_3737082_1211945903_968750_0 absent |
Send message Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0 |
I haven't noticed any problems with my Win boxes yet, but some of my Ubuntu boxes are having problems. I noticed 1 WU had been running 58 mins with no progress, and a few other boxes had hung WUs. I aborted them and set those boxes to NNW until I see what's going on ... :) |
Send message Joined: 9 Feb 08 Posts: 3 Credit: 126,332 RAC: 0 |
This is obviously a new problem, but app 1.22 has been out for a while. Interesting that it hangs on Linux boxes but fails immediately on Windows. That explains why I haven't found any compute errors on non-Windows OSes. I have a few good results that I'm going to allow to finish, then I'm switching to No New Work, at least until I'm available to test some more. Good luck to the dev team on this one! |
Send message Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0 |
I've had 3 WUs fail so far today (nothing really out of the ordinary) on 10 Win boxes; other than that they seem to be running OK on those boxes. As long as they do I'll keep running them for now, but the Linux boxes will stay on NNW until we hear something from the devs on the matter ... :) |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> I've had 3 WUs fail so far today (nothing really out of the ordinary) on 10 Win boxes; other than that they seem to be running OK on those boxes. As long as they do I'll keep running them for now, but the Linux boxes will stay on NNW until we hear something from the devs on the matter ... :)

I'm taking a look into this to try and figure out what went wrong. It doesn't look like Nate has done anything different from what I have been doing. Hopefully we'll figure this out shortly. |
Send message Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0 |
It looks like they are failing on my Win boxes too, Travis. I just wasn't getting that many, which is why I didn't see many errors. But as I got more, they failed (the gs_3737082_ ones, that is; the other ones are OK so far) ... :) I just went through my boxes and aborted the gs_3737082 ones so the boxes can at least keep running. |
Send message Joined: 28 Aug 07 Posts: 2 Credit: 2,084,862 RAC: 0 |
I have several of those WUs here as well. They have now been running for more than 3h on a Q6600 running 64-bit Ubuntu Linux (normal finishing time is about 4 or 5 mins). I have aborted those WUs; the rest of the WUs are running normally. wuid=30469638: the run time so far for this one is 2h 24 mins. |
Send message Joined: 21 Dec 07 Posts: 69 Credit: 7,048,412 RAC: 0 |
Don't know if I have had any error out, but all my boxes that were crunching Milkyway overnight had results stuck at 0% after 2 to 5 hours of crunching. All of them had "gs_373082_" as the beginning of the result name. Others (gs_59#_) are working OK. All are running 64-bit Ubuntu (Hardy) with the 64-bit BOINC client. They all worked fine yesterday and I have changed nothing at my end (no software updates installed overnight either).

Join the #1 Aussie Alliance on MilkyWay! |
Send message Joined: 30 Mar 08 Posts: 50 Credit: 11,593,755 RAC: 0 |
> Don't know if I have had any error out, but all my boxes that were crunching Milkyway overnight had results stuck at 0% after 2 to 5 hours of crunching. All of them had "gs_373082_" as the beginning of the result name. Others (gs_59#_) are working OK.

I have aborted all the "373082" WUs on my rigs. I hope the server stops sending them. This is not the time for the project to start playing that popular distributed computing game called "poison the code".

Voltron |
Send message Joined: 5 Feb 08 Posts: 3 Credit: 3,833,679 RAC: 0 |
Same problem here running Ubuntu (Feisty Fawn) on two machines: 2 units at 7+ hours and the other 2 at 9+ hours, all sitting at 0%. |
Send message Joined: 22 Dec 07 Posts: 13 Credit: 46,606,530 RAC: 0 |
It could not be something as simple as the name of the text file associated with this series of work, could it? The other units all seem to link to files with the words convolved or unconvolved in them, and this one doesn't. It is just called stars_82.txt. Proud member of BOINC@AUSTRALIA |
Send message Joined: 11 Mar 08 Posts: 28 Credit: 818,194 RAC: 0 |
I've had a couple of failures here too on an AMD Duron, with BOINC 5.10.30 running on WinME. Here's an example:

28/05/2008 07:45:47|Milkyway@home|Starting task gs_3737082_1211965821_1020895_1 using astronomy version 122

All failed after 1 second with similar output messages to the above. And, just to confuse things, I've had a couple complete with no problems. They ran for 13 minutes exactly, rather than the old workunits' 13:46 - 13:48:

28/05/2008 07:18:56|Milkyway@home|Starting gs_3737082_1211974875_1041980_0 |
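The client log lines quoted in this thread are easy to scan by script. The sketch below assumes three `|`-separated fields (timestamp, project, message) and a day/month/year timestamp, which is what the quoted lines show; other client versions or locales may format the log differently. `parse_log_line` and `tasks_with_missing_output` are hypothetical helper names.

```python
import re
from datetime import datetime

# Matches the "Output file ... for task ... absent" message quoted in this thread.
ABSENT = re.compile(r"Output file (\S+) for task (\S+) absent")


def parse_log_line(line):
    """Split one message-log line into (datetime, project, message)."""
    stamp, project, message = line.split("|", 2)
    when = datetime.strptime(stamp.strip(), "%d/%m/%Y %H:%M:%S")
    return when, project.strip(), message.strip()


def tasks_with_missing_output(lines):
    """Return the task names whose output file was reported absent."""
    failed = []
    for line in lines:
        _, _, message = parse_log_line(line)
        match = ABSENT.search(message)
        if match:
            failed.append(match.group(2))
    return failed


log = [
    "28/05/2008 07:45:47|Milkyway@home|Starting task gs_3737082_1211965821_1020895_1 using astronomy version 122",
    "27/05/2008 23:11:16|Milkyway@home|Output file gs_3737082_1211945903_968750_0_0 for task gs_3737082_1211945903_968750_0 absent",
]
print(tasks_with_missing_output(log))
```

Matching on the "absent" message only catches the immediate failures seen on Windows; a task that hangs at 0% never writes that line, so it would have to be spotted by run time instead.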
Send message Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0 |
Well, here's the reason why it's not working (not an app issue):

(parameters_generated_1211968864_1027907)
number_parameters: 4
background_weight: 0.000000
background_parameters: 1.000000, nan, nan, 1.000000
background_h: 0.020000, 0.004000, 0.090000, 0.020000
background_mutation_range: 0.200000, 0.200000, 20.000000, 0.200000
background_parameter_min: 0.000000, 0.300000, 1.000000, 0.100000
background_parameter_max: 3.000000, 1.000000, 30.000000, 3.000000
optimize_parameter: false, true, true, false
number_streams: 1, 5
stream_weight: nan
stream_weight_h: 0.001000
stream_weight_mutation_range: 5.000000
stream_weight_min: -20.000000
stream_weight_max: 20.000000
optimize_weight: true
stream_parameters: nan, nan, nan, nan, nan
stream_h: 0.030000, 0.010000, 0.010000, 0.010000, 0.001000
stream_mutation_range: 20.000000, 10.000000, 0.500000, 0.500000, 1.000000
stream_parameter_min: -53.000000, 0.700000, 0.000000, 0.000000, 1.000000
stream_parameter_max: 76.000000, 45.700000, 6.283185, 6.283185, 20.000000
optimize_parameter: true, true, true, true, true
convolve: 30
wedge: 82
r_steps: 175
mu_steps: 400
nu_steps: 20
info: generated: 2807, rand: 0.051824

As you can see, the WUs got messed up. Here's a copy of that WU so you can check it yourself: parameters_generated_1211968864_1027907 (gs_3737082_1211941340_952205_1)

Join Support science! Join Team BOINC United now! |
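A quick way to check a parameter file like the one above for bad values is sketched below. It assumes the file holds one `key: v1, v2, ...` entry per line, and `find_nan_fields` is a hypothetical helper, not part of the project's code. The sample is a short excerpt of the values posted above.

```python
import math


def find_nan_fields(text):
    """Return the names of 'key: v1, v2, ...' entries containing a NaN."""
    bad = []
    for line in text.splitlines():
        key, sep, values = line.partition(":")
        if not sep:
            continue  # not a 'key: values' line
        for token in values.split(","):
            try:
                value = float(token)
            except ValueError:
                continue  # non-numeric entries such as 'true' / 'false'
            if math.isnan(value):
                bad.append(key.strip())
                break
    return bad


sample = """\
number_parameters: 4
background_parameters: 1.000000, nan, nan, 1.000000
background_h: 0.020000, 0.004000, 0.090000, 0.020000
stream_weight: nan
stream_parameters: nan, nan, nan, nan, nan
optimize_parameter: true, true, true, true, true
"""
print(find_nan_fields(sample))
# prints ['background_parameters', 'stream_weight', 'stream_parameters']
```

On this excerpt the check flags exactly the three entries that carry `nan` in the posted file; the step sizes, bounds and flags all parse cleanly.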
Send message Joined: 21 Dec 07 Posts: 69 Credit: 7,048,412 RAC: 0 |
> Well, here's the reason why it's not working (not an app issue)

And I guess "nan" means "Not a Number" and is an error. I've been aborting work units all day but can't be bothered babysitting my computers any further, so I have set MW to "NNW" (which is "No New Work") for now.

Join the #1 Aussie Alliance on MilkyWay! |
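For readers who have not met NaN before, the short Python sketch below shows how such a value behaves under standard floating-point rules. The bounds 0.3 and 1.0 are the min and max posted for the second background parameter; `in_range` is a hypothetical helper, and nothing here claims to show what the application itself does with these values.

```python
import math


def in_range(value, low, high):
    """Range check of the kind the posted min/max bounds suggest."""
    return low <= value <= high


nan = float("nan")

print(nan == nan)                 # False: NaN is not equal to anything, itself included
print(in_range(nan, 0.3, 1.0))    # False: every ordered comparison with NaN is false
print(math.isnan(nan * 2.0 + 1))  # True: arithmetic on NaN yields NaN
print(math.isnan(0.3))            # False: ordinary values are unaffected
```

Because a NaN fails every comparison and spreads through any arithmetic it touches, a single bad starting value is enough to spoil everything computed from it.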
Send message Joined: 7 Sep 07 Posts: 444 Credit: 5,712,523 RAC: 0 |
> It could not be something as simple as the name of the text file associated with this series of work, could it? The other units all seem to link to files with the words convolved or unconvolved in them, and this one doesn't. It is just called stars_82.txt.

A 5MB file named stars_82.txt was downloaded recently, so that is correct. I was wondering if it was the cause, but as Crunch3r has posted, it appears to be the WU itself. And as I type this another WU had an error, so No New Work till I see some posting from the admins. Edit: Didn't mean that to sound harsh - I just won't have time to 'babysit' my hosts much over the next 2-3 days. Hope to be crunching here again soon. |
Send message Joined: 25 May 08 Posts: 1 Credit: 8,471,005 RAC: 0 |
Well I'm glad that I'm not the only one.... I had one W/U go for just over 3 hours today with 0% progression. Also on P4 running Ubuntu Hardy. |
Send message Joined: 9 Sep 07 Posts: 22 Credit: 320,035 RAC: 0 |
I killed one on Fedora 7 x64 after it had run 9+ hours. Kathryn :o) The BOINC FAQ Service The Unofficial BOINC Wiki The Trac System More BOINC information than you can shake a stick of RAM at. |
Send message Joined: 30 Mar 08 Posts: 50 Credit: 11,593,755 RAC: 0 |
> I killed one on Fedora 7 x64 after it had run 9+ hours.

Presumably, the class of noobie programmers that produced this junk will be correcting the code and reissuing them. So be prepared for another round of "weeding" your work queue. Thanks Nate!

Voltron |
©2024 Astroinformatics Group