Welcome to MilkyWay@home

Broken WUs

Message boards : Number crunching : Broken WUs

Stevea
Joined: 14 Jul 08
Posts: 50
Credit: 8,398,033
RAC: 0
Message 31038 - Posted: 16 Sep 2009, 3:37:53 UTC
Last modified: 16 Sep 2009, 3:39:46 UTC

Just my quad, and these wu's - de_constrainted_82_2s_6_

I think all these are Completed, marked as invalid



<core_client_version>6.6.36</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q9650 @ 3.00GHz (4 cores/threads) 3.73704 GHz (386ms)

WU completed. It took 1234.65 seconds CPU time and 1265.4 seconds wall clock time @ 3.73704 GHz.

</stderr_txt>
]]>


This is a watercooled box that has never errored out a wu before these came out...

This needs to be worked out. My RAC has gone from 23,000 to 16,000, and these are not helping matters.
ID: 31038

John R. @ SETI.USA
Joined: 1 Jan 09
Posts: 15
Credit: 85,816,654
RAC: 0
Message 31042 - Posted: 16 Sep 2009, 4:09:35 UTC

I certainly appreciate all the work being done to correct this problem.

It looks like I'm still getting the dreaded invalid tags on my quads.

I'm not having any problems on the two GPU machines, or the single cores that I'm running.......just the quads.

I think if I am still seeing the same problem in the morning, I shall let the quaddies do all their work for SETI until you smart guys can figure it out.
ID: 31042

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 31043 - Posted: 16 Sep 2009, 4:15:30 UTC - in response to Message 31042.  

I certainly appreciate all the work being done to correct this problem.

It looks like I'm still getting the dreaded invalid tags on my quads.

I'm not having any problems on the two GPU machines, or the single cores that I'm running.......just the quads.

I think if I am still seeing the same problem in the morning, I shall let the quaddies do all their work for SETI until you smart guys can figure it out.


This is weird, because looking at the validator log, it's showing all of your reported workunits as valid.
ID: 31043

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31044 - Posted: 16 Sep 2009, 4:23:41 UTC - in response to Message 31043.  

This is weird, because looking at the validator log, it's showing all of your reported workunits as valid.

Maybe the validator update you mentioned in your post 3:37 UTC already helped?
When I look at this system of John's, he returned a whole load of WUs at 3:56 UTC, and all the 82_2s_6 WUs there also validated!
ID: 31044

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 31045 - Posted: 16 Sep 2009, 4:26:17 UTC - in response to Message 31044.  

This is weird, because looking at the validator log, it's showing all of your reported workunits as valid.

Maybe the validator update you mentioned in your post 3:37 UTC already helped?
When I look at this system of John's, he returned a whole load of WUs at 3:56 UTC, and all the 82_2s_6 WUs there also validated!


Would be cool if it did :) Hopefully it's not because he moved his quads elsewheres and they're not reporting WUs anymore, lol.
ID: 31045

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31046 - Posted: 16 Sep 2009, 4:28:57 UTC - in response to Message 31045.  
Last modified: 16 Sep 2009, 4:29:55 UTC

This is weird, because looking at the validator log, it's showing all of your reported workunits as valid.

Maybe the validator update you mentioned in your post 3:37 UTC already helped?
When I look at this system of John's, he returned a whole load of WUs at 3:56 UTC, and all the 82_2s_6 WUs there also validated!

Would be cool if it did :)

Stevea's quadcore here shows the same behaviour: WUs returned at 3:21 UTC were marked invalid, while the 82_2s_6 WUs reported at 4:23 UTC were all valid.

So what was the problem you solved?
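
The report times quoted in this thread can be lined up against the update as a quick sanity check. This is only a sketch: it assumes the validator change took effect around 3:37 UTC (the time of the post that mentioned it, not a confirmed deployment time), and the names `ASSUMED_FIX_TIME` and `consistent_with_fix` are made up for illustration.

```python
from datetime import datetime, timezone

# Assumption: the fix went live around the 3:37 UTC post on 16 Sep 2009.
ASSUMED_FIX_TIME = datetime(2009, 9, 16, 3, 37, tzinfo=timezone.utc)


def utc(hour, minute):
    """Build a UTC timestamp on 16 Sep 2009, the day discussed in the thread."""
    return datetime(2009, 9, 16, hour, minute, tzinfo=timezone.utc)


# (host owner, report time, outcome) as described in the posts above.
REPORTS = [
    ("Stevea", utc(3, 21), "invalid"),
    ("John R.", utc(3, 56), "valid"),
    ("Stevea", utc(4, 23), "valid"),
]


def consistent_with_fix(reported_at, outcome, fix_time=ASSUMED_FIX_TIME):
    """True if the outcome matches 'invalid before the fix, valid after it'."""
    expected = "valid" if reported_at >= fix_time else "invalid"
    return outcome == expected


for owner, when, outcome in REPORTS:
    print(owner, when.strftime("%H:%M"), outcome, consistent_with_fix(when, outcome))
```

All three reports fit the "invalid before, valid after" pattern under that assumption, which is what points at the validator rather than the hosts.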
ID: 31046

John R. @ SETI.USA
Joined: 1 Jan 09
Posts: 15
Credit: 85,816,654
RAC: 0
Message 31047 - Posted: 16 Sep 2009, 4:32:14 UTC

Nah, haven't moved 'em yet..........

Just really really dislike seeing all those 0's.......

I'll let 'em run so y'all can see if your tweaks helped.


ID: 31047

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 31049 - Posted: 16 Sep 2009, 4:50:18 UTC - in response to Message 31047.  

Nah, haven't moved 'em yet..........

Just really really dislike seeing all those 0's.......

I'll let 'em run so y'all can see if your tweaks helped.



Mind sending me the hostid's of the machines with problems? I've been looking at WUs from your user id, but if the hosts are under a different one that might be why I haven't seen any issues.
ID: 31049

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 31052 - Posted: 16 Sep 2009, 5:30:37 UTC - in response to Message 31049.  

Nah, haven't moved 'em yet..........

Just really really dislike seeing all those 0's.......

I'll let 'em run so y'all can see if your tweaks helped.



Mind sending me the hostid's of the machines with problems? I've been looking at WUs from your user id, but if the hosts are under a different one that might be why I haven't seen any issues.


http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=42033&offset=0&show_names=0&state=4

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=45681&offset=0&show_names=0&state=4

His other Quad last had invalids on the 13th.
ID: 31052

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 31054 - Posted: 16 Sep 2009, 5:53:44 UTC - in response to Message 31047.  

Nah, haven't moved 'em yet..........

Just really really dislike seeing all those 0's.......

I'll let 'em run so y'all can see if your tweaks helped.


Haven't seen any errors yet, maybe I fixed it :)
ID: 31054

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 31055 - Posted: 16 Sep 2009, 5:55:38 UTC - in response to Message 31052.  

Nah, haven't moved 'em yet..........

Just really really dislike seeing all those 0's.......

I'll let 'em run so y'all can see if your tweaks helped.



Mind sending me the hostid's of the machines with problems? I've been looking at WUs from your user id, but if the hosts are under a different one that might be why I haven't seen any issues.


http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=42033&offset=0&show_names=0&state=4

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=45681&offset=0&show_names=0&state=4

His other Quad last had invalids on the 13th.


Pretty sure the errors were from before the update.
ID: 31055

[Russia] michs
Joined: 16 Oct 08
Posts: 18
Credit: 164,409,593
RAC: 0
Message 31056 - Posted: 16 Sep 2009, 6:06:01 UTC - in response to Message 31055.  

ok! all new wu validating correctly!
ID: 31056

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 31058 - Posted: 16 Sep 2009, 7:48:37 UTC

As you know, I downclocked my CPU-only quad, and this still produced a small number of invalid _2s_6_ WU results. I cannot go further back than about 2.30am. But all the work done after 3.30 am has validated without any exceptions so far, and I see the cache is holding the WUs which were becoming invalid.

I think Travis may have tweaked something and solved it?
Go away, I was asleep


ID: 31058

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31065 - Posted: 16 Sep 2009, 11:56:48 UTC

Looks like there was some time limit for CPU-crunched WUs in the validator. If a WU took less than 1500 seconds, it was declared invalid. That is also the reason why only the shorter _2s WUs on faster machines were affected, and 0.20 in fact worsened the situation, as it needs ~10% or so less time.

But as Travis has that fixed now, there is nothing holding you back I hope.
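
To make the effect concrete, here is a minimal sketch of such a time floor. It is not the project's validator code, which isn't shown in this thread; `MIN_CPU_SECONDS`, `passes_time_floor`, and `time_on_0_20` are hypothetical names, and the 1600-second machine is an invented example. Only the 1500-second limit, the ~10% speedup, and the 1234.65-second run come from the posts above.

```python
# Threshold as reported in the thread; the constant name is hypothetical.
MIN_CPU_SECONDS = 1500.0


def passes_time_floor(cpu_seconds, floor=MIN_CPU_SECONDS):
    """Return True if a result would survive a 'too fast to be real' check."""
    return cpu_seconds >= floor


def time_on_0_20(time_on_0_19, speedup_fraction=0.10):
    """Estimate the 0.20 run time, assuming it needs ~10% less time than 0.19."""
    return time_on_0_19 * (1.0 - speedup_fraction)


# The overclocked Q9650 run posted at the top of the thread.
print(passes_time_floor(1234.65))

# Invented example: a slower host that is fine on 0.19 but not on 0.20.
print(passes_time_floor(1600.0))
print(passes_time_floor(time_on_0_20(1600.0)))
```

A correct result is rejected purely for finishing quickly, so faster hosts and faster application versions are hit hardest, which matches what people saw.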
ID: 31065

John R. @ SETI.USA
Joined: 1 Jan 09
Posts: 15
Credit: 85,816,654
RAC: 0
Message 31067 - Posted: 16 Sep 2009, 12:41:52 UTC

WOOOOOHOOOOOO...............

That tweak seems to have fixed the problem.....

Heh......now I'm wondering why the GPUs, which reported in far less time, weren't being rated as invalid too?

I surely am grateful to you smart guys for the fast response to this problem.

As they say down here in Dixie......I'm as happy as a dead pig in the sunshine....
ID: 31067

Stevea
Joined: 14 Jul 08
Posts: 50
Credit: 8,398,033
RAC: 0
Message 31094 - Posted: 17 Sep 2009, 2:09:05 UTC - in response to Message 31054.  



Haven't seen any errors yet, maybe I fixed it :)


Seems ok now... time will tell
ID: 31094

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31097 - Posted: 17 Sep 2009, 4:01:12 UTC

Um, on GPU de_constrainted_82_3s_6_1186629_1253159606_0:

First task I have had fail in like forever ...


<core_client_version>6.10.3</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Running Milkyway@home ATI GPU application version 0.20 (Win32, SSE2) by Gipsel
ignoring unknown input argument in app_info.xml: --device
ignoring unknown input argument in app_info.xml: 0
Couldn't find input file [astronomy_parameters.txt] to read astronomy parameters.
APP: error reading astronomy parameters: 1

</stderr_txt>
]]>
ID: 31097

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31098 - Posted: 17 Sep 2009, 4:12:16 UTC - in response to Message 31097.  

Um, on GPU de_constrainted_82_3s_6_1186629_1253159606_0:

First task I have had fail in like forever ...


<core_client_version>6.10.3</core_client_version>
<![CDATA[
[..]
Couldn't find input file [astronomy_parameters.txt] to read astronomy parameters.
APP: error reading astronomy parameters: 1

</stderr_txt>
]]>

Looks to be some kind of a download error. An input parameter file is simply missing.
ID: 31098

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31112 - Posted: 17 Sep 2009, 12:27:15 UTC - in response to Message 31098.  

Looks to be some kind of a download error. An input parameter file is simply missing.

My guess too ... but one never knows so I report errors when possible ...
ID: 31112

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31116 - Posted: 17 Sep 2009, 13:39:35 UTC - in response to Message 31112.  

Looks to be some kind of a download error. An input parameter file is simply missing.

My guess too ... but one never knows so I report errors when possible ...

I just saw that you are using the 6.10.3 client. With 6.10.4 these errors are quite common (~10% of the WUs affected), so maybe it also happens with 6.10.3, just not that often? It's the client's task to put files with symbolic links to the real input files into the slot directories. To me it appears 6.10.4 has real problems doing that right for some reason.

But it is supposedly fixed in the 6.10.5 preview version Crunch3r compiled from the source in the svn trunk. By the way, they are messing around a lot with the scheduler again according to the checkin notes ;)
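
A rough sketch of what resolving such a link file could look like, assuming the slot directory holds a small text file that wraps the real path in a `<soft_link>` element. That file format and the helper name `resolve_input` are assumptions for illustration; the actual application presumably goes through the BOINC API rather than code like this.

```python
import os
import re

# Assumed link-file format: <soft_link>relative/path/to/real/file</soft_link>
SOFT_LINK_RE = re.compile(r"<soft_link>(.*?)</soft_link>", re.S)


def resolve_input(slot_dir, logical_name):
    """Return the real path behind a slot-directory entry, or None if unusable."""
    link_path = os.path.join(slot_dir, logical_name)
    if not os.path.isfile(link_path):
        # The client never created the entry: the failure mode seen above.
        return None
    with open(link_path, "r", errors="replace") as fh:
        head = fh.read(1024)
    match = SOFT_LINK_RE.search(head)
    if match is None:
        # Not a link file, so treat the entry itself as the input.
        return link_path
    target = os.path.normpath(os.path.join(slot_dir, match.group(1).strip()))
    # A link that points at a file that was never downloaded is also a failure.
    return target if os.path.isfile(target) else None
```

If `resolve_input(slot_dir, "astronomy_parameters.txt")` comes back as `None`, either the link file or its target is missing, which would produce exactly the "Couldn't find input file" message quoted earlier.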
ID: 31116