Validate errors

Message boards : Number crunching : Validate errors

Author Message
Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46750 - Posted: 27 Mar 2011, 11:36:41 UTC

I am getting plenty of validate errors lately. Could someone look into the server to see what is going on? Example work units that are getting validate errors for their results include work units 260747742, 260762949, 260774277, and 260745047.

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46751 - Posted: 27 Mar 2011, 11:45:23 UTC
Last modified: 27 Mar 2011, 11:45:32 UTC

Another work unit that resulted in validate errors is work unit 260747648.

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46752 - Posted: 27 Mar 2011, 12:02:53 UTC

New validate error: work unit 260786638

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46753 - Posted: 27 Mar 2011, 12:05:25 UTC
Last modified: 27 Mar 2011, 12:05:43 UTC

And another one bites the dust due to validate errors: work unit 260784171

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46754 - Posted: 27 Mar 2011, 12:10:11 UTC

One more looking like it will die due to validate error: 260791824

I will not be reporting more for a while: BOINC 6.10.60 was just released, so I am draining my work unit queue in preparation for the upgrade, and my MilkyWay@home queue is already empty.

krahulik
Joined: 7 Nov 08
Posts: 14
Credit: 179,303,710
RAC: 0
Message 46756 - Posted: 27 Mar 2011, 12:40:42 UTC

I also have new validate errors (the error rate is between 10% and 15%).

kashi
Joined: 30 Dec 07
Posts: 309
Credit: 148,432,104
RAC: 0
Message 46757 - Posted: 27 Mar 2011, 12:48:33 UTC

Yes, I think they are all of the de_separation_10_3s_free_1 type. They may also all have max # of error/total/success tasks of 1, 9, 6. This has been happening for some time now; I aborted many before switching to another project.

cncguru
Joined: 11 Jun 10
Posts: 329
Credit: 1,166,219,987
RAC: 0
Message 46758 - Posted: 27 Mar 2011, 12:48:34 UTC

OK, I am getting loads of validate errors on all 5 of mine!
On one, maybe it's too much overclocking, but on all 5?
We have a server-side problem, methinks!

Len LE/GE
Joined: 8 Feb 08
Posts: 235
Credit: 88,023,145
RAC: 36,454
Message 46761 - Posted: 27 Mar 2011, 13:23:34 UTC
Last modified: 27 Mar 2011, 13:23:49 UTC

Yep, the de_separation_10_3s_free_1 tasks are causing trouble.
I checked a couple of them and about half were not validating :(

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,364,343
RAC: 18,216
Message 46762 - Posted: 27 Mar 2011, 13:58:41 UTC
Last modified: 27 Mar 2011, 13:59:43 UTC

Been away from the Project for a while, restarted today - reckon I hexed it :)

Same here - it's the _10_3s_free WUs. It's happening on both machines, on 5850s as well as 5970s. Validate errors mostly - a couple of other types too, but that small number of "non _free" failures was me settling in. It runs to about 40% dead WUs overall.

The majority of those _10_3s_free types are falling over. I'm going to switch projects for a short while until this one settles back again.

Regards
Zy

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,364,343
RAC: 18,216
Message 46763 - Posted: 27 Mar 2011, 14:06:50 UTC
Last modified: 27 Mar 2011, 14:14:36 UTC

Trying one more thing .... I set NNT (no new tasks) on both machines and aborted all the _free's, to see if the ones left go through error free - I suspect so, but I'm giving it a whirl; it should nail it one way or the other.

EDIT
It's definitely the _10_3s_free's. I aborted those in the queue on both machines and the validate errors stopped; all other types went through no bother. Something is amiss with the _free's, and they need stopping at the server.

Regards
Zy

vandiesel
Joined: 10 May 10
Posts: 27
Credit: 43,104,187
RAC: 0
Message 46764 - Posted: 27 Mar 2011, 14:16:51 UTC
Last modified: 27 Mar 2011, 14:18:04 UTC

Same here, both machines: 2x4870 and 1x6950.

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,364,343
RAC: 18,216
Message 46765 - Posted: 27 Mar 2011, 14:27:27 UTC
Last modified: 27 Mar 2011, 14:32:15 UTC

Watch out with these ...... I was originally worried about temperatures. I monitored the non _free's and they are normal; the _free's are all crunching at around 10 degrees above normal on the card VRMs.

That will put the card into very hot territory without the user knowing, unless the card's VRM is being monitored. The card will show a slightly raised GPU temperature that still appears OK; the VRM temperatures, however, are a different matter.

The _free's are heating up the VRMs alarmingly - if you are still running the WUs, watch your VRMs like a hawk.

It's time to freeze all WUs; the _free's are too dangerous for the VRMs.

Regards
Zy

Mark Doom
Joined: 27 Mar 11
Posts: 1
Credit: 309,533
RAC: 0
Message 46766 - Posted: 27 Mar 2011, 19:46:04 UTC

I've been having all the same issues today.

I have just been keeping an eye on it, though, and aborting any WU that comes up with "free_1" in its name.

Seems everything else is working just fine.
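
The manual workaround described in this thread (abort anything with "free_1" in its name, let everything else run) amounts to a simple name filter. Below is a minimal sketch in Python, assuming the task names are already available as plain strings. The function name `tasks_to_abort` and the sample queue are hypothetical, made up for illustration, and the code does not talk to the BOINC client at all:

```python
def tasks_to_abort(task_names, marker="free_1"):
    """Return the task names that contain the problem marker."""
    return [name for name in task_names if marker in name]


# Hypothetical queue contents, for illustration only.
queue = [
    "de_separation_10_3s_free_1_0001",
    "de_separation_13_3s_fix_1_0002",
    "de_separation_10_3s_free_1_0003",
]

for name in tasks_to_abort(queue):
    print("abort:", name)
```

Passing a broader marker such as `"_free"` would also catch other "_free" variants, if those turn out to be affected as well.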

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46775 - Posted: 28 Mar 2011, 2:06:52 UTC

If you have a Radeon HD 6xxx series card, you should not have to abort results due to the risk of overheating the VRM. AMD implemented PowerTune in these cards: if the code being sent to the GPU would force it over its power usage limit, PowerTune automatically underclocks the GPU, level by level, until it reaches a clock at which the GPU stays under the limit, and then restores the clock once the power-hungry code goes away.
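
The behaviour described above can be pictured as "pick the highest clock level that still fits the power budget". The Python sketch below is only a toy model of that idea and not AMD's actual algorithm: it assumes, simplistically, that power draw scales linearly with clock speed, and the function name `powertune_clock`, the clock levels, and the wattages are all hypothetical:

```python
def powertune_clock(power_at_full_clock, power_limit, clock_steps):
    """Pick the highest clock (MHz) whose estimated power fits the limit.

    Toy model: power is assumed to scale linearly with clock speed.
    """
    full = max(clock_steps)
    for clock in sorted(clock_steps, reverse=True):
        estimated = power_at_full_clock * clock / full
        if estimated <= power_limit:
            return clock
    # Nothing fits the budget: fall back to the lowest available level.
    return min(clock_steps)


# Hypothetical clock levels (MHz) and power limit (W), for illustration only.
steps = [880, 800, 700, 600]
limit = 250

print(powertune_clock(200, limit, steps))  # ordinary workload
print(powertune_clock(300, limit, steps))  # power-hungry workload
print(powertune_clock(200, limit, steps))  # hungry code gone again
```

In this model the ordinary workload runs at the full 880 MHz, the power-hungry one is stepped down to 700 MHz, and the full clock comes back as soon as the demand drops.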

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46778 - Posted: 28 Mar 2011, 5:23:54 UTC

Two more validate errors: work units 261192073 and 261168214

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46779 - Posted: 28 Mar 2011, 6:49:02 UTC

One more: work unit 261208862

Jesse Viviano
Joined: 4 Feb 11
Posts: 60
Credit: 25,198,767
RAC: 0
Message 46780 - Posted: 28 Mar 2011, 8:00:57 UTC

Two more: 261218718 and 261221074

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,865,573
RAC: 0
Message 46781 - Posted: 28 Mar 2011, 9:59:37 UTC

Yep, all free_ wu's have a validate error. Off to DNETC until this gets resolved...

Sunny129
Joined: 25 Jan 11
Posts: 250
Credit: 179,153,312
RAC: 530,810
Message 46782 - Posted: 28 Mar 2011, 13:12:59 UTC - in response to Message 46781.
Last modified: 28 Mar 2011, 13:13:15 UTC

Yep, all free_ wu's have a validate error. Off to DNETC until this gets resolved...

So it's not just the "_10_3s_free" WUs? It's all "_free" WUs? The reason I ask is that my queue of MW@H tasks is all de separation tasks at the moment, most of which are "_10_3s_free" WUs, and the few remaining are "_13_3s_free" WUs. I'm currently at work, and the host in question is at home. The project is also currently suspended, so I can't do any experimenting or troubleshooting at the moment.


Copyright © 2013 AstroInformatics Group