Welcome to MilkyWay@home

Problem with new W/Us

Message boards : Number crunching : Problem with new W/Us

stefsaber
Joined: 2 Apr 08
Posts: 32
Credit: 1,017,362
RAC: 0
Message 3546 - Posted: 28 May 2008, 12:43:55 UTC

For a few days now I've noticed my Windows machine hanging on WUs. Since the 19th I've had to kill a few, and I can't get a steady stream of results from it.

And now I've noticed my Mac has been working on a new WU for 48+ minutes. It looks like I've had a few WUs finish in 3900 seconds as opposed to 400 seconds, which I'm fine with if they return valid results. The only thing that's a little irksome is that the credit handed out is the same: 4.07 for a 400-second WU vs 4.07 for a 3900-second WU...
ID: 3546 · Rating: 0
Sysadm@Nbg
Joined: 24 Jan 08
Posts: 6
Credit: 14,836
RAC: 0
Message 3547 - Posted: 28 May 2008, 13:30:01 UTC - in response to Message 3546.  

I just killed one on Ubuntu Hardy: it had been running for a couple of hours with no progress. The others on my other machines hadn't started running yet, so I killed them before they could waste CPU.

And for the moment I'm set to "no new work" for Milky, sorry!

Sysadm@Nbg
ID: 3547 · Rating: 0
alijay
Joined: 15 Apr 08
Posts: 55
Credit: 24,047
RAC: 0
Message 3548 - Posted: 28 May 2008, 13:34:47 UTC - in response to Message 3546.  

This looks like a different problem, as the new WUs have only been available for the past day, and when they do run they seem to be fractionally faster: approx. 13 min CPU time.

My Windows machine errors out in a matter of seconds with the new WUs.
ID: 3548 · Rating: 0
Dave Przybylo
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 3549 - Posted: 28 May 2008, 14:27:07 UTC - in response to Message 3548.  

This is understandable due to the NaNs. We are pulling all those erroring WUs from the database as we speak. Sorry about the inconvenience.
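
For anyone unfamiliar with NaNs, here is a minimal Python sketch of why a single NaN tends to poison everything computed from it. It is a generic illustration of IEEE 754 floating-point behavior only, not the MilkyWay@home application code; the helper name `is_valid_result` is made up for this example.

```python
import math

nan = float("nan")

# NaN propagates through arithmetic: anything computed from it is NaN too.
print(nan + 1.0)         # nan

# Every ordered comparison with NaN is False, so a test such as
# "error < tolerance" can never succeed once a NaN has crept in.
print(nan < 1e-6)        # False
print(nan == nan)        # False

# The reliable way to detect it is an explicit check.
print(math.isnan(nan))   # True


def is_valid_result(value: float) -> bool:
    """Hypothetical sanity check: reject NaN and infinite values."""
    return math.isfinite(value)
```

A check along these lines, applied before a result is used any further, is one common way to catch the problem early.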
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute
ID: 3549 · Rating: 0
Saenger
Joined: 28 Aug 07
Posts: 133
Credit: 29,423,179
RAC: 0
Message 3550 - Posted: 28 May 2008, 16:01:24 UTC
Last modified: 28 May 2008, 16:21:28 UTC

I got two 2-hour ones yesterday and one 4-hour one just now. I've killed them, set the project to NNW (no new work), and am just waiting to report them back before resetting the project. Or is there no need to report them?
Greetings from Sänger
ID: 3550 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3551 - Posted: 28 May 2008, 16:37:59 UTC - in response to Message 3550.  

We might have found the problem. We had the wrong script set up to dump the database every few days, so when I took the server down this morning to get rid of the ones from the new search, the database was too slow to do anything.

I'm running a full purge (which is going to take quite a while), but after that I'll get the remaining gs_3737... WUs out of the system. I've already stopped the search, so there won't be any new ones generated. Hopefully they'll get out of the system quickly.
ID: 3551 · Rating: 0
stefsaber
Joined: 2 Apr 08
Posts: 32
Credit: 1,017,362
RAC: 0
Message 3554 - Posted: 28 May 2008, 19:37:26 UTC - in response to Message 3551.  

Thanks for the update! Looking forward to getting on with the new WUs once this is sorted out :)
ID: 3554 · Rating: 0
Dingo
Joined: 28 Aug 07
Posts: 35
Credit: 89,063,438
RAC: 0
Message 3557 - Posted: 28 May 2008, 22:48:39 UTC - in response to Message 3554.  
Last modified: 28 May 2008, 22:53:04 UTC

I just got three of the 3737 WUs. I thought they had all been flushed from the system?

EDIT: All my PCs are getting them now. I have set "No new work" again until it is sorted out.

ID: 3557 · Rating: 0
Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 3558 - Posted: 28 May 2008, 22:58:35 UTC - in response to Message 3557.  

I'm getting gs_59x ones; perhaps the problem has been resolved.
ID: 3558 · Rating: 0
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 3559 - Posted: 28 May 2008, 23:00:06 UTC - in response to Message 3558.  

NO, they are mixed together ...

Windows systems seem to error on these bad Tasks immediately ...

My OS-X computers seem to run them for an hour and they complete and validate ... oddly enough ...
ID: 3559 · Rating: 0
Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 3560 - Posted: 28 May 2008, 23:30:26 UTC
Last modified: 29 May 2008, 0:29:45 UTC

I just got some 3737s mixed in... not many, 2 of 20.
ID: 3560 · Rating: 0
mgpower0
Joined: 13 Oct 07
Posts: 12
Credit: 1,130,149
RAC: 0
Message 3561 - Posted: 28 May 2008, 23:31:40 UTC

They are slowly getting rid of the bad WUs, with fewer and fewer coming through. I seem to be having a minor problem with validation (as in, it isn't happening) and am getting quite a list of pending results, but this may be due to them trying to weed out the bad WUs?
ID: 3561 · Rating: 0
Labbie
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 3562 - Posted: 29 May 2008, 0:08:00 UTC

I've got a bunch of pendings too.


ID: 3562 · Rating: 0
Dave Przybylo
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 3563 - Posted: 29 May 2008, 0:55:21 UTC - in response to Message 3562.  

I believe that because we took them out of the database, the bad workunits can't validate when and if they finish: BOINC doesn't know what they are. It thinks you're giving it a workunit that doesn't exist.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute
ID: 3563 · Rating: 0
Skip Da Shu
Joined: 11 Apr 08
Posts: 82
Credit: 62,568,296
RAC: 59,480
Message 3564 - Posted: 29 May 2008, 1:34:02 UTC
Last modified: 29 May 2008, 2:22:41 UTC

Task ID 30924039
Workunit 30447102
CPU time 33014.631285
stderr out

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 302.729235230077
Granted credit 0
application version 1.23

Dang, I thought I'd found the one that ran for 10+ hours, but this only comes to 9 or so hours. Xubuntu 64-bit, v8.04.

UPDATED: Here we go...
Task ID 30921915
Name gs_3737082_1211945903_968749_3
Workunit 30465650
CPU time 37535.465819

You know what I wanna whine about... but will resist.

Signed, the credit whore


EDIT2: Oh so so sad... two more 9-hour WUs, a 10 hr 45 min one, and my entry for the grand booby prize: over 11 hours, 10 min, 48 seconds on a 2.55 GHz Phenom:

Task ID 30896339
Name gs_3737082_1211948989_978283_1
Workunit 30475184
CPU time 40248.863396

Did I win, huh? huh?
ID: 3564 · Rating: 0
Conan
Joined: 2 Jan 08
Posts: 123
Credit: 69,816,057
RAC: 704
Message 3568 - Posted: 29 May 2008, 11:35:49 UTC - in response to Message 3564.  
Last modified: 29 May 2008, 11:37:48 UTC

As they say in the game of pool, "Foul stroke, four away." I believe I have pipped you at the post there, Skip Da Shu. Try 13.5 hours:

Task ID 30899627
Name gs_3737082_1211952813_988295_0
Workunit 30485196
Created 27 May 2008 23:08:45 UTC
Sent 27 May 2008 23:09:38 UTC
Received 28 May 2008 19:16:42 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
Computer ID 4778
Report deadline 1 Jun 2008 23:09:38 UTC
CPU time 48620.108619
stderr out

<core_client_version>5.10.21</core_client_version>
<![CDATA[
<message>
Maximum CPU time exceeded
</message>
<stderr_txt>

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 165.045104245639
Granted credit 0
application version 1.22

I have a few more that are up to 1.5 hours now, so I will abort them. All of these are on my Linux system (Fedora Core 6), AMD Opteron 285.
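
For anyone who wants to double-check the hours being quoted in this thread, here is a minimal Python sketch that converts the CPU time BOINC reports (in seconds) into hours, minutes, and seconds. The function name `fmt_cpu_time` is made up for illustration; the sample values are simply the CPU times posted in this thread.

```python
def fmt_cpu_time(seconds: float) -> str:
    """Format a CPU time given in seconds as 'Hh MMm SSs'.

    Fractional seconds are dropped (truncated), not rounded.
    """
    total = int(seconds)                  # truncate the fractional part
    hours, rem = divmod(total, 3600)      # whole hours, leftover seconds
    minutes, secs = divmod(rem, 60)       # whole minutes, leftover seconds
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# CPU times (in seconds) quoted in the posts in this thread
cpu_times = [33014.631285, 37535.465819, 40248.863396, 48620.108619]

for t in cpu_times:
    print(f"{t:>14.6f} s = {fmt_cpu_time(t)}")
```

With these inputs, 40248.863396 s comes out as 11h 10m 48s and 48620.108619 s as 13h 30m 20s, which matches the "over 11 hours, 10 min, 48 seconds" and "13.5 hours" figures above.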
ID: 3568 · Rating: 0
STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 576,548,171
RAC: 0
Message 3569 - Posted: 29 May 2008, 12:18:00 UTC - in response to Message 3563.  
Last modified: 29 May 2008, 12:38:57 UTC

You haven't got them all out yet. I've been getting them as fast as I can abort them: I abort 4 and get 4 back again. I'm set to NNW again for now; it's too much work going through all that just to get a few good ones... 0_o
ID: 3569 · Rating: 0
mgpower0
Joined: 13 Oct 07
Posts: 12
Credit: 1,130,149
RAC: 0
Message 3570 - Posted: 29 May 2008, 12:51:04 UTC

I had the same for the last 30-40 min, but the last few updates haven't had any. They seem to be coming in waves. I have about an hour before I have to stop monitoring my rigs, and I'm not sure I'm going to let MW run unattended. If I don't get any more in the next hour I may risk it :)
I'm sure I will regret this decision :)
ID: 3570 · Rating: 0
hans_lion
Joined: 5 Mar 08
Posts: 5
Credit: 66,164
RAC: 0
Message 3571 - Posted: 29 May 2008, 12:52:51 UTC
Last modified: 29 May 2008, 13:07:49 UTC

Got a task from this WU.
name: gs_3737082_1211956623_998318_3
WU name: gs_3737082_1211956623_998318
app version num: 122
checkpoint CPU time: 0.000000
current CPU time: 1258.242634
fraction done: 0.000000

According to the posts above it's funny that
the reverse of the fraction-done-value gives ...
NaN :-)

EDIT: when I reported my aborted task,
the next one was created immediately afterwards.

This may be overcome, as seen on SAH beta, by setting max#oftotaltasks to zero in the affected WUs.

EDIT2: Anonymous got it onto its host.
ID: 3571 · Rating: 0
Odd-Rod
Joined: 7 Sep 07
Posts: 444
Credit: 5,712,523
RAC: 0
Message 3572 - Posted: 29 May 2008, 14:46:41 UTC
Last modified: 29 May 2008, 14:52:30 UTC

Since Windows WUs error out quickly, rather than taking many hours to get nowhere as they do on Linux, I've enabled Work Fetch on 2 of my boxes to help clear them out. Apart from lost credit, I'm hoping there won't be any problems.

You see, I don't only crunch for credits...

Rod
ID: 3572 · Rating: 0

©2024 Astroinformatics Group