Welcome to MilkyWay@home

Problem with new W/Us

Profile mgpower0
Joined: 13 Oct 07
Posts: 12
Credit: 1,130,149
RAC: 0
Message 3525 - Posted: 27 May 2008, 21:07:13 UTC
Last modified: 27 May 2008, 21:07:40 UTC

The new W/Us with task names starting with gs_373082 etc. will not run on my up-to-date Ubuntu Hardy box. They just sit at 0% and don't move. They work on all my other Hardy boxes that haven't been updated. Anyone else have this problem?
ID: 3525 · Rating: 0
[B^S]ST47
Joined: 9 Feb 08
Posts: 3
Credit: 126,332
RAC: 0
Message 3526 - Posted: 27 May 2008, 22:14:34 UTC
Last modified: 27 May 2008, 22:23:06 UTC

it's starting with gs_3737082, for example these tasks:
gs_3737082_1211940453_948699_0
gs_3737082_1211948419_976766_0

The client returns an error code of -1, with this error message:

<core_client_version>6.1.16</core_client_version>
<![CDATA[
<message>
- exit code -1 (0xffffffff)
</message>
<stderr_txt>
Error reading into background_parameters:

</stderr_txt>
]]>

They are reported as 0 CPU seconds, and they consistently fail immediately upon execution on my system (computer 7004, app version 1.22, Windows XP Pro SP2, BOINC 6.1.16 prerelease). It also occurs on clients 5.10.45 and 6.1.0, and on Windows Vista and Windows Server. I have not seen any errored-out units on Linux machines. This workunit [1] has errored out on 5 machines, and this [2] is a sample return. This [3] is another return, where "Process got signal 11".

[1] http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=30463612
[2] http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=30889520
[3] http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=30876823

Update: I have located several units that do not fail. I'm using the workunit names shown in the BOINC client for this information, and do not know how to find these pages on the website. Once the results post, I will give you links.

These names fail:
gs_3737082_1211949629_980245_0
gs_3737082_1211949631_980276_0

These do not:
gs_3737082_1211949631_980277_0
gs_3737082_1211949631_980275_0
gs_3737082_1211949631_980278_0
gs_3737082_1211949629_980246_0
ID: 3526 · Rating: 0
Lazarus-uk
Joined: 22 Mar 08
Posts: 7
Credit: 9,175,991
RAC: 0
Message 3527 - Posted: 27 May 2008, 22:22:05 UTC - in response to Message 3525.  

The new W/Us with task names starting with gs_373082 etc. will not run on my up-to-date Ubuntu Hardy box. They just sit at 0% and don't move. They work on all my other Hardy boxes that haven't been updated. Anyone else have this problem?



I'm also having lots of compute errors. Running BOINC 5.10.45 on Win XP Pro (SP3).

27/05/2008 23:11:16|Milkyway@home|Output file gs_3737082_1211945903_968750_0_0 for task gs_3737082_1211945903_968750_0 absent
ID: 3527 · Rating: 0
STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 576,548,171
RAC: 0
Message 3528 - Posted: 27 May 2008, 22:26:45 UTC

I haven't noticed any problems with my Win boxes yet, but some of my Ubuntu boxes are having problems. I noticed 1 WU had been running 58 mins with no progress, and noticed a few other boxes with hung WUs.

I aborted them and set to NNW until I see what's going on ... :)
ID: 3528 · Rating: 0
[B^S]ST47
Joined: 9 Feb 08
Posts: 3
Credit: 126,332
RAC: 0
Message 3529 - Posted: 27 May 2008, 22:30:46 UTC

This is obviously a new problem, but app 1.22 has been out for a while. Interesting that it hangs on Linux boxes but fails immediately on Windows. That explains why I haven't found any compute errors on non-Windows OSes. I have a few good results that I'm going to allow to finish, then I'm going to set No New Work, at least until I'm available to test some more. Good luck to the dev team on this one!
ID: 3529 · Rating: 0
STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 576,548,171
RAC: 0
Message 3530 - Posted: 27 May 2008, 23:02:34 UTC
Last modified: 27 May 2008, 23:03:04 UTC

I've had 3 WUs fail so far today (nothing really out of the ordinary) on 10 Win boxes; other than that they seem to be running OK on those boxes. As long as they do I'll keep running them for now, but the Linux boxes will stay on NNW until we hear something from the devs on the matter ... :)
ID: 3530 · Rating: 0
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3531 - Posted: 28 May 2008, 0:04:28 UTC - in response to Message 3530.  

I've had 3 WUs fail so far today (nothing really out of the ordinary) on 10 Win boxes; other than that they seem to be running OK on those boxes. As long as they do I'll keep running them for now, but the Linux boxes will stay on NNW until we hear something from the devs on the matter ... :)


I'm taking a look into this to try and figure out what went wrong. It doesn't look like Nate has done anything different than I have been doing. Hopefully we'll figure this out shortly.
ID: 3531 · Rating: 0
STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 576,548,171
RAC: 0
Message 3532 - Posted: 28 May 2008, 0:40:11 UTC

It looks like they are failing on my Win boxes too, Travis. I just wasn't getting that many, which is why I didn't see many errors. But as I got more, they failed (the gs_3737082_ ones, that is); the other ones are OK so far ... :)

I just went through my boxes and aborted the gs_3737082 ones so they can keep running that way at least.
ID: 3532 · Rating: 0
Profile [BMF] kdr_98
Joined: 28 Aug 07
Posts: 2
Credit: 2,084,862
RAC: 0
Message 3533 - Posted: 28 May 2008, 0:51:33 UTC
Last modified: 28 May 2008, 1:08:53 UTC

I have several of those WUs here as well; they have now been running for more than 3 h on a Q6600 running 64-bit Ubuntu Linux (normal finishing time is about 4 or 5 mins).
I have aborted those WUs; the rest of the WUs are running normally.

wuid=30469638
The run time so far for this one is 2h 24 mins.
ID: 3533 · Rating: 0
Profile Webmaster Yoda
Joined: 21 Dec 07
Posts: 69
Credit: 7,048,412
RAC: 0
Message 3534 - Posted: 28 May 2008, 1:56:28 UTC
Last modified: 28 May 2008, 1:59:20 UTC

Don't know if I have had any error out, but all my boxes that were crunching Milkyway overnight had results stuck at 0% after 2 to 5 hours of crunching. All of them had "gs_373082_" as the beginning of the result name. Others (gs_59#_) are working OK

All are running 64 bit Ubuntu (Hardy) with 64bit BOINC client.

They all worked fine yesterday and I have changed nothing at my end (no software updates installed overnight either).
Join the #1 Aussie Alliance on MilkyWay!
ID: 3534 · Rating: 0
voltron
Joined: 30 Mar 08
Posts: 50
Credit: 11,593,755
RAC: 0
Message 3535 - Posted: 28 May 2008, 2:24:02 UTC - in response to Message 3534.  

Don't know if I have had any error out, but all my boxes that were crunching Milkyway overnight had results stuck at 0% after 2 to 5 hours of crunching. All of them had "gs_373082_" as the beginning of the result name. Others (gs_59#_) are working OK

All are running 64 bit Ubuntu (Hardy) with 64bit BOINC client.

They all worked fine yesterday and I have changed nothing at my end (no software updates installed overnight either).


I have aborted all the "373082" WUs on my rigs. I hope the server stops sending them. This is not the time for the project to start playing that popular distributed computing game called "poison the code".

Voltron
ID: 3535 · Rating: 0
Profile Snowdog of TSBT
Joined: 5 Feb 08
Posts: 3
Credit: 3,833,679
RAC: 0
Message 3536 - Posted: 28 May 2008, 6:21:55 UTC

Same problem here running Ubuntu (Feisty Fawn) on two machines: 2 units at 7+ hours and the other 2 at 9+ hours, all sitting at 0%.
ID: 3536 · Rating: 0
Profile Wang Solutions
Joined: 22 Dec 07
Posts: 13
Credit: 46,606,530
RAC: 0
Message 3537 - Posted: 28 May 2008, 6:37:22 UTC
Last modified: 28 May 2008, 6:43:36 UTC

It could not be something as simple as the name of the text file associated with this series of work, could it? The other units all seem to link to files with the words convolved or unconvolved in them, and this one doesn't. It is just called stars_82.txt.
Proud member of BOINC@AUSTRALIA
ID: 3537 · Rating: 0
Profile niterobin
Joined: 11 Mar 08
Posts: 28
Credit: 818,194
RAC: 0
Message 3538 - Posted: 28 May 2008, 6:57:15 UTC

I've had a couple of failures here too on an AMD Duron, with BOINC 5.10.30 running on WinME.

Here's an example:

28/05/2008 07:45:47|Milkyway@home|Starting task gs_3737082_1211965821_1020895_1 using astronomy version 122
28/05/2008 07:45:48|Milkyway@home|Computation for task gs_3737082_1211965821_1020895_1 finished
28/05/2008 07:45:48|Milkyway@home|Output file gs_3737082_1211965821_1020895_1_0 for task gs_3737082_1211965821_1020895_1 absent


All failed after 1 second with similar output messages to the above.

And, just to confuse things, I've had a couple complete with no problems. They ran for 13 minutes exactly, rather than the old workunits' 13:46 - 13:48:

28/05/2008 07:18:56|Milkyway@home|Starting gs_3737082_1211974875_1041980_0
28/05/2008 07:18:56|Milkyway@home|Starting task gs_3737082_1211974875_1041980_0 using astronomy version 122
28/05/2008 07:18:59|Milkyway@home|Started upload of gs_3737082_1211974875_1041979_0_0
28/05/2008 07:19:03|Milkyway@home|Finished upload of gs_3737082_1211974875_1041979_0_0
28/05/2008 07:32:00|Milkyway@home|Computation for task gs_3737082_1211974875_1041980_0 finished
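The two patterns in the log excerpts above (a task that "finished" and then reports its output file "absent", versus one that finishes cleanly) can be separated mechanically. The following is a minimal sketch that assumes the `date|project|message` layout and the message wording seen in the lines pasted in this post; `classify_tasks` and the regular expressions are hypothetical helpers, not part of BOINC.

```python
import re

# A few lines copied from the log excerpts in this post.
LOG = """\
28/05/2008 07:45:47|Milkyway@home|Starting task gs_3737082_1211965821_1020895_1 using astronomy version 122
28/05/2008 07:45:48|Milkyway@home|Computation for task gs_3737082_1211965821_1020895_1 finished
28/05/2008 07:45:48|Milkyway@home|Output file gs_3737082_1211965821_1020895_1_0 for task gs_3737082_1211965821_1020895_1 absent
28/05/2008 07:18:56|Milkyway@home|Starting task gs_3737082_1211974875_1041980_0 using astronomy version 122
28/05/2008 07:32:00|Milkyway@home|Computation for task gs_3737082_1211974875_1041980_0 finished
"""

# Message wording is taken from the excerpts; other client versions may differ.
FINISHED = re.compile(r"Computation for task (\S+) finished$")
ABSENT = re.compile(r"Output file \S+ for task (\S+) absent$")


def classify_tasks(log_text):
    """Return (ok, failed): finished tasks with and without an output file."""
    finished, failed = [], []
    for line in log_text.splitlines():
        parts = line.split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue
        message = parts[2].strip()
        match = FINISHED.search(message)
        if match:
            finished.append(match.group(1))
            continue
        match = ABSENT.search(message)
        if match:
            failed.append(match.group(1))
    ok = [task for task in finished if task not in failed]
    return ok, failed


ok, failed = classify_tasks(LOG)
print("completed:", ok)
print("output file absent:", failed)
```

Run against a full message log, a listing like this would show which gs_3737082 tasks are worth aborting and which can be left to finish.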
ID: 3538 · Rating: 0
Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 3539 - Posted: 28 May 2008, 7:11:47 UTC - in response to Message 3537.  
Last modified: 28 May 2008, 7:18:46 UTC

Well, here's the reason why it's not working (not an app issue)
(parameters_generated_1211968864_1027907)


number_parameters: 4
background_weight: 0.000000
background_parameters: 1.000000, nan, nan, 1.000000
background_h: 0.020000, 0.004000, 0.090000, 0.020000
background_mutation_range: 0.200000, 0.200000, 20.000000, 0.200000
background_parameter_min: 0.000000, 0.300000, 1.000000, 0.100000
background_parameter_max: 3.000000, 1.000000, 30.000000, 3.000000
optimize_parameter: false, true, true, false
number_streams: 1, 5
stream_weight: nan
stream_weight_h: 0.001000
stream_weight_mutation_range: 5.000000
stream_weight_min: -20.000000
stream_weight_max: 20.000000
optimize_weight: true
stream_parameters: nan, nan, nan, nan, nan
stream_h: 0.030000, 0.010000, 0.010000, 0.010000, 0.001000
stream_mutation_range: 20.000000, 10.000000, 0.500000, 0.500000, 1.000000
stream_parameter_min: -53.000000, 0.700000, 0.000000, 0.000000, 1.000000
stream_parameter_max: 76.000000, 45.700000, 6.283185, 6.283185, 20.000000
optimize_parameter: true, true, true, true, true
convolve: 30
wedge: 82
r_steps: 175
mu_steps: 400
nu_steps: 20
info: generated: 2807, rand: 0.051824

as you can see, the wus got messed up.
here's a copy of that wu so you can check it yourself
parameters_generated_1211968864_1027907 (gs_3737082_1211941340_952205_1)
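For anyone who wants to check their own downloaded parameter files the same way, here is a minimal sketch that flags every field containing a `nan` value. It assumes only the `name: value, value, ...` layout visible in the listing above; `find_nan_fields` is a hypothetical helper, not part of the MilkyWay@home application, and the sample is an abbreviated copy of that listing.

```python
import math

# Abbreviated copy of the parameter listing posted above.
SAMPLE = """\
number_parameters: 4
background_weight: 0.000000
background_parameters: 1.000000, nan, nan, 1.000000
background_h: 0.020000, 0.004000, 0.090000, 0.020000
optimize_parameter: false, true, true, false
stream_weight: nan
stream_parameters: nan, nan, nan, nan, nan
convolve: 30
"""


def find_nan_fields(text):
    """Map each field name to the positions of its NaN values, if any."""
    bad = {}
    for line in text.splitlines():
        name, sep, values = line.partition(":")
        if not sep:
            continue  # not a "name: values" line
        nan_positions = []
        for index, token in enumerate(values.split(",")):
            try:
                value = float(token)
            except ValueError:
                continue  # non-numeric tokens such as "true" / "false"
            if math.isnan(value):
                nan_positions.append(index)
        if nan_positions:
            bad[name.strip()] = nan_positions
    return bad


for field, positions in find_nan_fields(SAMPLE).items():
    print(f"{field}: nan at position(s) {positions}")
```

On the sample, this reports `background_parameters`, `stream_weight`, and `stream_parameters`, which are the three fields that look wrong in the listing.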

Join Support science! Join Team BOINC United now!
ID: 3539 · Rating: 0
Profile Webmaster Yoda
Joined: 21 Dec 07
Posts: 69
Credit: 7,048,412
RAC: 0
Message 3540 - Posted: 28 May 2008, 7:22:27 UTC - in response to Message 3539.  
Last modified: 28 May 2008, 7:22:59 UTC

Well, here's the reason why it's not working (not an app issue)
(parameters_generated_1211968864_1027907)


And I guess the "nan" in those parameters means "Not a Number" and is an error.
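To illustrate what "Not a Number" means in practice, here is a small sketch in Python (which uses the same IEEE 754 floating-point values as most languages). The bounds 0.3 and 1.0 are the second `background_parameter_min` / `background_parameter_max` entries from the posted file; `in_range` is a hypothetical helper. This only shows how NaN behaves in general. How the MilkyWay application actually handles such a value is not established in this thread.

```python
import math

value = float("nan")  # the text "nan" parses to an IEEE 754 NaN


def in_range(x, lo, hi):
    """Ordinary bounds check; every comparison involving NaN is False."""
    return lo <= x <= hi


checks = {
    "is_nan": math.isnan(value),
    "equals_itself": value == value,          # NaN is never equal to anything
    "within_bounds": in_range(value, 0.3, 1.0),
    "below_min": value < 0.3,
    "above_max": value > 1.0,
    "sum_is_nan": math.isnan(value + 1.0),    # NaN propagates through arithmetic
}

for name, outcome in checks.items():
    print(f"{name}: {outcome}")
```

Because a NaN is neither inside nor outside any range and contaminates every calculation it touches, a program fed such a parameter can fail or stall in different ways depending on how it was written and compiled.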

I've been aborting work units all day but can't be bothered baby-sitting my computers any further so have set MW to "nnw" (which is "No New Work") for now.
Join the #1 Aussie Alliance on MilkyWay!
ID: 3540 · Rating: 0
Odd-Rod
Joined: 7 Sep 07
Posts: 444
Credit: 5,712,523
RAC: 0
Message 3542 - Posted: 28 May 2008, 10:52:21 UTC - in response to Message 3537.  
Last modified: 28 May 2008, 11:06:36 UTC

It could not be something as simple as the name of the text file associated with this series of work, could it? The other units all seem to link to files with the words convolved or unconvolved in them, and this one doesn't. It is just called stars_82.txt.


A 5MB file named stars_82.txt was downloaded recently, so that is correct.
I was wondering if it was the cause, but as Crunch3r has posted, it appears to be the WU itself.

And as I type this another WU had an error, so No New Work till I see some posting from the admins.

Edit: Didn't mean that to sound harsh - I just won't have time to 'babysit' my hosts much over the next 2-3 days.

Hope to be crunching here again soon.
ID: 3542 · Rating: 0
jave200372
Joined: 25 May 08
Posts: 1
Credit: 8,471,005
RAC: 0
Message 3543 - Posted: 28 May 2008, 11:12:30 UTC

Well I'm glad that I'm not the only one.... I had one W/U go for just over 3 hours today with 0% progression. Also on P4 running Ubuntu Hardy.
ID: 3543 · Rating: 0
Profile KSMarksPsych
Joined: 9 Sep 07
Posts: 22
Credit: 320,035
RAC: 0
Message 3544 - Posted: 28 May 2008, 11:13:50 UTC

I killed one on Fedora 7 x64 after it had run 9+ hours.
Kathryn :o)
The BOINC FAQ Service
The Unofficial BOINC Wiki
The Trac System
More BOINC information than you can shake a stick of RAM at.
ID: 3544 · Rating: 0
voltron
Joined: 30 Mar 08
Posts: 50
Credit: 11,593,755
RAC: 0
Message 3545 - Posted: 28 May 2008, 11:44:00 UTC - in response to Message 3544.  

I killed one on Fedora 7 x64 after it had run 9+ hours.


Presumably, the class of noobie programmers that produced this junk will be correcting the code and reissuing them. So be prepared for another round of "weeding" your work queue. Thanks Nate!

Voltron
ID: 3545 · Rating: 0