Validate Error

Message boards : Number crunching : Validate Error

RobbieJ
Joined: 13 Feb 09
Posts: 1
Credit: 338,168
RAC: 530
Message 52321 - Posted: 8 Jan 2012 | 21:05:11 UTC

All my work units on a new CPU & Motherboard are coming up with a validate error on them. Is there any way I can determine what's wrong?

RobbieJ

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,704,253
RAC: 0
Message 52330 - Posted: 8 Jan 2012 | 22:35:27 UTC - in response to Message 52321.

All my work units on a new CPU & Motherboard are coming up with a validate error on them. Is there any way I can determine what's wrong?

RobbieJ
You're using an antique version which doesn't report results the same way as newer versions. You need to update to something newer.

Chris
Joined: 16 Dec 10
Posts: 42
Credit: 92,861,127
RAC: 181,344
Message 52378 - Posted: 11 Jan 2012 | 9:24:05 UTC

Since the new server went online I have again had some validate errors on GPU tasks... not that many, but more than last time on the old server...

Len LE/GE
Joined: 8 Feb 08
Posts: 232
Credit: 86,879,903
RAC: 43,187
Message 52382 - Posted: 11 Jan 2012 | 12:18:48 UTC

WU 49477159

GPU (ATI) result not validated against Linux 32-bit and Linux 64-bit.
Before this one I hadn't seen an invalid result in ages on this GPU.

StrongARM
Joined: 28 Dec 11
Posts: 4
Credit: 1,289
RAC: 0
Message 52468 - Posted: 14 Jan 2012 | 2:05:06 UTC

I have had a validate error, but it seems it was due to a coding issue (going by the error log). I wouldn't be so peeved, or bother mentioning it, but for the fact that a file was clearly missing, yet the application went ahead and processed for 89,170.97 seconds anyway (despite likely producing an invalid unit), and then it turns out that the canonical result it chose took 2.92 seconds of processing... That might explain a few of the problems people have had here: check your error logs for the work unit!

I would also say there is a fundamental flaw in going ahead with processing a unit that has an incorrect set of files, since that is surely likely to produce an invalid result. I believe this may be the second time this has happened to me, but I can't find any of the old results to check, and two out of four WUs makes me think my processing time is better spent elsewhere :(

WU 48513484

StrongARM

StrongARM
Joined: 28 Dec 11
Posts: 4
Credit: 1,289
RAC: 0
Message 52469 - Posted: 14 Jan 2012 | 2:11:07 UTC

I should have added this above - it is the Stderr output from my WU...

Stderr output

<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path

</stderr_txt>
]]>


StrongARM

StrongARM
Joined: 28 Dec 11
Posts: 4
Credit: 1,289
RAC: 0
Message 52580 - Posted: 20 Jan 2012 | 8:33:23 UTC

Can anyone suggest what this means? It'd be useful to know if it is a common problem (or even non-problem) and whether I can start looking at doing more work for Milkyway@Home or just assume that a lot of my work will always be thrown away...

Thank you for any help or advice,

StrongARM

Len LE/GE
Joined: 8 Feb 08
Posts: 232
Credit: 86,879,903
RAC: 43,187
Message 52582 - Posted: 20 Jan 2012 | 10:49:12 UTC

That message about the Lua script is a bit misleading.
You can ignore it. Everyone sees it.
At startup the program checks the different ways the parameters can be supplied.
That's where it comes from. It's more an annoyance than a real error.
The current program gets its parameters via the command line instead of a separate file, and the result is sent back via stderr instead of a result file.
That makes it a bit harder for you to find the results (hidden in the logs) on your computer, but makes it far easier for the server to handle the heavy load.
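
To illustrate the "try one format, fall back to the other" behaviour described above, here is a minimal Python sketch. It is not the project's code: both formats, the function names and the message texts are hypothetical stand-ins, chosen only to show why a harmless error line can appear in stderr before the fallback succeeds.

```python
import sys


def parse_new_format(text):
    """Hypothetical 'new' format: lines of the form `name = value`."""
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip().isidentifier():
            raise ValueError(f"'<name>' expected near {line.strip()!r}")
        params[name.strip()] = float(value)
    return params


def parse_old_format(text):
    """Hypothetical 'old' format: lines of the form `name: value`."""
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"':' expected in {line.strip()!r}")
        params[name.strip()] = float(value)
    return params


def load_parameters(text, log=None):
    """Try the new format first; on failure, log the error and try the old one."""
    if log is None:
        log = lambda message: print(message, file=sys.stderr)
    try:
        return parse_new_format(text)
    except ValueError as err:
        # The failure is reported even though the fallback below may succeed,
        # which is why the log can look alarming for a perfectly good run.
        log(f"Error loading new-style parameters: {err}")
        log("Trying old parameters file")
    return parse_old_format(text)
```

Calling `load_parameters("number_parameters: 4")` in this sketch logs two lines and still returns usable parameters, which is the same pattern as the repeated messages in the stderr output posted earlier.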

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,704,253
RAC: 0
Message 52585 - Posted: 20 Jan 2012 | 22:18:52 UTC - in response to Message 52580.

In some cases the stderr from BOINC is truncated or missing completely. I think I have mostly fixed the problem for future updates.

Irishgeezah
Joined: 10 Nov 07
Posts: 37
Credit: 5,888,399
RAC: 18,013
Message 52643 - Posted: 26 Jan 2012 | 14:30:24 UTC

I've taken one of my machines off M@H altogether due to validation errors: 78,000 seconds of CPU time for 0 credit, and that's just from the results that haven't been purged :(

Len LE/GE
Joined: 8 Feb 08
Posts: 232
Credit: 86,879,903
RAC: 43,187
Message 52662 - Posted: 26 Jan 2012 | 22:19:22 UTC - in response to Message 52643.

I've taken one of my machines off M@H altogether due to validation errors, 78,000 seconds cpu time for 0 credit from the results that haven't been purged :(


Those 3 invalids (AMD CPU, SSE3 path) are all separation WUs validated against 2 ATI GPUs.
I have seen something similar before (2 Linux valid, 1 ATI invalid).
I wonder whether it's a precision problem between the applications for different hardware or a problem with the validator.
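
To make the "precision problem" idea above concrete, here is a small Python sketch of a tolerance-based comparison with a quorum of two. It is an assumption-laden illustration, not the MilkyWay@Home validator: the function names, the tolerance values and the sample numbers are all made up.

```python
import math


def results_agree(a, b, rel_tol=1e-9):
    """True if two result vectors match element-wise within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


def pick_canonical(results, quorum=2, rel_tol=1e-9):
    """Return (canonical_result, indices_that_agree), or (None, []) if no quorum.

    A result becomes canonical as soon as at least `quorum` results,
    itself included, agree with it within the tolerance.
    """
    for candidate in results:
        agreeing = [
            j for j, other in enumerate(results)
            if results_agree(candidate, other, rel_tol=rel_tol)
        ]
        if len(agreeing) >= quorum:
            return candidate, agreeing
    return None, []


# Made-up numbers: one CPU result and two GPU results for the same work unit.
cpu = [-3.012345678]
gpu_a = [-3.012345600]
gpu_b = [-3.012345601]

strict = pick_canonical([cpu, gpu_a, gpu_b], rel_tol=1e-9)
loose = pick_canonical([cpu, gpu_a, gpu_b], rel_tol=1e-7)
```

With the strict tolerance only the two GPU results agree, so the CPU result would be the one marked invalid; with the looser tolerance all three agree. If the real applications round slightly differently on different hardware, a too-tight tolerance would produce exactly the "2 valid, 1 invalid" pattern reported here.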

StrongARM
Joined: 28 Dec 11
Posts: 4
Credit: 1,289
RAC: 0
Message 52670 - Posted: 27 Jan 2012 | 2:43:50 UTC - in response to Message 52582.

Thank you for clearing up the error messages.

It's a shame to receive the validation errors though. I don't have much of a processor to crunch with, so the time is precious. I remember the teething troubles Seti@home had with verifying Intel against AMD way back, and I guess the GPU / CPU thing is going to cause a few projects the same trouble... I'll just have to stick to the ones that don't do GPUs yet or have short WUs *sigh* but I like doing the astronomy ones...

Oh well, if things get better I'll be back :)

StrongARM

Paul John
Joined: 1 Jan 12
Posts: 6
Credit: 178,436
RAC: 316
Message 52716 - Posted: 28 Jan 2012 | 18:38:21 UTC

Can anyone tell me why over half of my large Milkyway@home v0.88 uploads have validate errors? They have runtimes of 21, 28 and 42 thousand seconds each, so it's rather disappointing to see them fail like that. Thanks.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,389
RAC: 4
Message 52722 - Posted: 29 Jan 2012 | 15:58:30 UTC - in response to Message 52716.

Can anyone tell me why over half of my large Milkyway@home v0.88 uploads have validate errors? They have runtimes of 21, 28 and 42 thousand seconds each, so it's rather disappointing to see them fail like that. Thanks.

Might take a stab at it but your computers are hidden. Validate errors are a problem here though and seem to be climbing again :(

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,346,915
RAC: 284,602
Message 52727 - Posted: 29 Jan 2012 | 17:43:13 UTC
Last modified: 29 Jan 2012 | 17:44:22 UTC

The validator is borked, so you will see lots of "Waiting for Validation" as opposed to "validation inconclusive" (which means wait for a wingman) or "Validate error".

It depends which you are getting, but as pointed out we can't see them as you have hidden your computers.

Regards
Zy

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,704,253
RAC: 0
Message 52734 - Posted: 29 Jan 2012 | 19:38:17 UTC - in response to Message 52716.

There seem to be an increasing number of cases where the ATI stuff doesn't validate against CPU ones; I'm looking into it. This doesn't happen in any of my current tests but I do see people getting some where it does happen.

Paul John
Joined: 1 Jan 12
Posts: 6
Credit: 178,436
RAC: 316
Message 52747 - Posted: 30 Jan 2012 | 1:19:07 UTC - in response to Message 52727.

The validator is borked, so you will see lots of "Waiting for Validation" as opposed to "validation inconclusive" (which means wait for a wingman) or "Validate error".

It depends which you are getting, but as pointed out we can't see them as you have hidden your computers.

Regards
Zy


Thanks for that. I didn't know my system was hidden. How do I unhide it? I'm off-line a lot as I have to use a dongle at the moment due to no phone line.
Cheers.

Paul

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,389
RAC: 4
Message 52748 - Posted: 30 Jan 2012 | 6:44:58 UTC - in response to Message 52747.

Thanks for that. I didn't know my system was hidden. How do I unhide it? I'm off-line a lot as I have to use a dongle at the moment due to no phone line.
Cheers. Paul

In your account page go to: "Preferences for this project - MilkyWay@Home preferences" and set:
"Should MilkyWay@Home show your computers on its web site?" to "yes"

Paul John
Joined: 1 Jan 12
Posts: 6
Credit: 178,436
RAC: 316
Message 52750 - Posted: 30 Jan 2012 | 13:23:42 UTC - in response to Message 52748.

Many thanks, changed and Updated my preferences.

Len LE/GE
Joined: 8 Feb 08
Posts: 232
Credit: 86,879,903
RAC: 43,187
Message 52753 - Posted: 31 Jan 2012 | 0:40:43 UTC - in response to Message 52750.

Many thanks, changed and Updated my preferences.


One invalid left in your list and it's one of those cases Matt mentioned: CPU against GPU (ATI and NVIDIA validated, CPU lost)

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,389
RAC: 4
Message 52769 - Posted: 1 Feb 2012 | 18:20:25 UTC - in response to Message 52585.

In some cases the stderr from BOINC is truncated or missing completely. I think I mostly fixed the problem for future updates

I've been watching the invalid count and it seems to cycle. When the invalids on one machine go up, so do the others. They also seem to go down more or less in sync. Could it be a problem with the way some WUs are formed? Could you also post a fixed ATI .exe version so we can test it with an app_info.xml?

valterc
Joined: 28 Aug 09
Posts: 18
Credit: 68,115,027
RAC: 42,946
Message 52774 - Posted: 2 Feb 2012 | 11:39:22 UTC - in response to Message 52585.
Last modified: 2 Feb 2012 | 11:40:30 UTC

In some cases the stderr from BOINC is truncated or missing completely. I think I mostly fixed the problem for future updates


I also sometimes get this kind of error (empty stderr):
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,704,253
RAC: 0
Message 52777 - Posted: 2 Feb 2012 | 17:35:26 UTC - in response to Message 52769.

In some cases the stderr from BOINC is truncated or missing completely. I think I mostly fixed the problem for future updates

I've been watching the invalid count and it seems to cycle. When the invalids on one machine go up, so do the others. They also seem to go down more or less in sync. Could it be a problem with the way some WUs are formed? Could you also post a fixed ATI .exe version so we can test it with an app_info.xml?
The problem there is that the stderr is redirected to a file which isn't necessarily flushed to disk by the time that the BOINC client reads it.

There are still a few more things I want to take care of before actual release:
http://milkyway.cs.rpi.edu/milkyway/download/beta/separation_0.96/
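
The flushing issue described above can be reproduced in a few lines. The sketch below is in Python rather than the application's own language, and the file contents are a made-up placeholder. It only demonstrates the general effect: text written to a buffered file is invisible to a second reader until the writer flushes it.

```python
import os
import tempfile


def read_before_and_after_flush(text):
    """Return what a second reader sees before and after the writer flushes."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        writer = open(path, "w")      # buffered: small writes stay in memory
        writer.write(text)            # not yet handed to the operating system
        with open(path) as reader:
            before = reader.read()    # what a reader sees at this moment
        writer.flush()                # push the buffer to the operating system
        os.fsync(writer.fileno())     # ask the OS to write it through to disk
        with open(path) as reader:
            after = reader.read()
        writer.close()
        return before, after
    finally:
        os.remove(path)


before, after = read_before_and_after_flush("result: -3.0\n")
```

For a short string like this, the first read typically comes back empty and the second contains the full text, which matches the empty or truncated stderr sections people have posted in this thread.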

288larsson
Joined: 8 Dec 09
Posts: 5
Credit: 138,227,415
RAC: 199,931
Message 52778 - Posted: 3 Feb 2012 | 0:00:55 UTC - in response to Message 52777.

There are still a few more things I want to take care of before actual release:
http://milkyway.cs.rpi.edu/milkyway/download/beta/separation_0.96/

Looks promising on speed. A test with an HD 7970 was up to 25% faster :))

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,346,915
RAC: 284,602
Message 52779 - Posted: 3 Feb 2012 | 0:30:23 UTC

Whoa ..... Matt ..... Well Done!

That rewrote my OpenCL expectations .... I had no idea it could go like that.

It's running really well, as above: circa +25% on a 7970.

Brilliant stuff, my grateful thanks .... *Tips Hat* :)

Regards
Zy

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,704,253
RAC: 0
Message 52780 - Posted: 3 Feb 2012 | 0:49:29 UTC - in response to Message 52778.

There are still a few more things I want to take care of before actual release:
http://milkyway.cs.rpi.edu/milkyway/download/beta/separation_0.96/

Looks promising on speed. A test with an HD 7970 was up to 25% faster :))
Erm, what? The only boost I noticed was a few percent from a minor change I made.

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,346,915
RAC: 284,602
Message 52781 - Posted: 3 Feb 2012 | 1:05:20 UTC
Last modified: 3 Feb 2012 | 1:11:18 UTC

I got the same ..... +25% on the 7970s. I was sending them back at around 88 secs with two WUs per card. I kept the same settings (1235/1375) for this version, but it was running a little hotter, so I dialled it back for now so that I didn't introduce other factors into any errors.

Result ... no errors or invalids so far, the cards are pushing them out at 65 secs for 2 WU per card at 1225/1375, and currently running at 78 and 82 degrees for the two cards.

All have validated.

Brilliant Job :)

(AMD Driver I am using is the RC11 release candidate driver for 79XX for 64bit, and BOINC 7.0.11)

Regards
Zy

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,346,915
RAC: 284,602
Message 52784 - Posted: 3 Feb 2012 | 1:37:16 UTC

All the cache et al mechanisms have now kicked in as it's been running a while.

It's now at 63 seconds per card with 2 WUs on each card, running at 1225/1375

Regards
Zy

STE\/E
Joined: 29 Aug 07
Posts: 486
Credit: 572,432,344
RAC: 11
Message 52792 - Posted: 3 Feb 2012 | 9:36:56 UTC

Same here, 62-65 secs per WU running 4 at a time on two 7970s, 2 per GPU ... Thanks
____________
STE\/E

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,346,915
RAC: 284,602
Message 52797 - Posted: 3 Feb 2012 | 15:04:16 UTC

Just posting an observation ... not sure what to make of it yet. I am running two WUs per card on Beta 0.96 OpenCL, and in the strictest sense that's outside the normal parameters of an app ... maybe not this time, I don't know formally ... so this could be a comment out of scope.

With the Beta 0.96 I have noticed a fluctuation in runtimes of 6 seconds. At present it seems likely ... though not yet absolutely certain ... that it happens when I open a browser on the 7970 machine, surf a little, then close it (rare, normally the machine is left alone when the wife isn't using it). On each occasion, after I shut the browser, the GPU attached to the screen instantly starts to change behaviour, and on the graph large troughs appear when WUs load and unload. It takes a while, up to 3-6 hours, for it to gradually return to a flat-line usage graph steady at 99%.

All the while this is happening, GPU2, not attached to the monitor, stays flat-lined at 99%, and doesn't even blip when loading/unloading WUs. The completion times reflect that. Card 1 (attached to the monitor) is currently at 65 seconds for 2 WUs, Card 2 is currently at 59 secs for 2 WUs. When Card 1 flat-lines in a few hours, both will be at 59 secs, I'm fairly certain of that. It's a behaviour similar to what I saw in 0.82.

Not sure what to conclude, hence just reporting a slightly strange behaviour. That Card 1 will be affected by usage on screen is clear, but to the extent of slipping 6 seconds is a little strange. It could well be "you get what you get" when mixing usage of the machine, and that would be a fair comment.

Anyway .... for what it's worth, there it is. If there is a mechanism to prevent such a long-lasting dip as seen on Card 1 after browser use, that would be good.

Regards
Zy

Vortac
Joined: 22 Apr 09
Posts: 3
Credit: 106,919,630
RAC: 0
Message 52799 - Posted: 4 Feb 2012 | 4:00:32 UTC - in response to Message 52797.
Last modified: 4 Feb 2012 | 4:02:50 UTC

I am now seriously considering a purchase of 2x7970. What are the run times and GPU utilization with one MW 0.96 beta task per 7970 GPU?

Zydor
Joined: 24 Feb 09
Posts: 608
Credit: 85,346,915
RAC: 284,602
Message 52801 - Posted: 4 Feb 2012 | 11:15:57 UTC - in response to Message 52799.
Last modified: 4 Feb 2012 | 11:28:25 UTC

Not tried it. After initial tests I went straight to two per GPU. Worst case, divide my results by two, so you are looking at around 31-32 secs per WU. Utilisation is always 99% with the current Beta app running two per card. Wait and see the final release version; I can't really predict utilisation until then, but looking at v0.82 and v0.96 it is high: 97-99% is the norm. Utilisation is wholly dependent on the application, so for that, wait for the final release version.

Currently I am recovering from a 9 hour ISP outage so it's not yet settled back as such; at present it's running at 65 seconds per card with two WUs running per card. It will come down a couple of seconds by the time it settles. I am slightly overvolted; basic settings are:

GPU: 1220 (default 925, max inside CCC 1125, max known external to BOINC o/c stable 1260)
Memory: 1375 (default 1375, not worth messing with memory on 7970 at MW leave it at default)
Voltage: 1.218v (default 1.174v, card max 1.3v, max known external to BOINC o/c @1.3v for stable 1260 GPU)

I know another dual 7970 user on MW who is "stuck" at 71/73 secs per card with two WUs per card; for some reason it won't go below that on his with the same voltage as mine. Bear in mind that all my timings and the other example given are on Matt's Beta OpenCL v0.96, which is not yet 100% ready for prime time.

As an overall crude rule of thumb you are looking at the speed of a 6970 +85% when the 7970 is run at 1220/1375 and 1.218v with a single WU. Without overvolting I would estimate +/- 38 seconds per WU per single card on v0.96, that's roughly a 6970 +70% (an extrapolated estimate, don't hold me to it: it was 52 seconds for the 7970 pre-0.96, and 0.96 is a vast improvement by +25%).

Don't make assumptions re overvolting and o/c above 1.218v/1220, because above 1220/1230, at present, whilst it is possible to go faster it's real hard going and the additional voltage is not worth it. Between 1220 and 1250 you might shave another couple of secs per WU, but the voltage increase to get there is way too high in my view for 7x24. So I would plan on 1220/1.218v as a yardstick, and if you get higher and are happy with the voltage, great stuff, treat it as a "bonus".

I stress .... Beta app, still water to go under the bridge, and it may all change by the time it's on general release.

Regards
Zy
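
For anyone comparing the timings quoted in this thread, the arithmetic behind "divide by two" and "+25%" can be written out as a short Python sketch. The function names are made up for illustration; the 88 and 65 second figures are the before/after times reported above for two WUs per card, and they were taken at slightly different clocks, so treat the outcome as rough.

```python
def seconds_per_wu(batch_seconds, concurrent_wus):
    """Effective time per WU when several WUs finish together on one card."""
    return batch_seconds / concurrent_wus


def time_reduction(old_seconds, new_seconds):
    """Fraction by which the runtime shrank (0.25 means 25% less time)."""
    return (old_seconds - new_seconds) / old_seconds


def throughput_gain(old_seconds, new_seconds):
    """Fraction by which WUs per hour increased for the same workload."""
    return old_seconds / new_seconds - 1.0


per_wu = seconds_per_wu(65, 2)      # two WUs finishing every 65 seconds
reduction = time_reduction(88, 65)  # pre-0.96 time versus 0.96 beta time
gain = throughput_gain(88, 65)
```

Note that a shorter runtime and a higher throughput are not the same percentage: going from 88 to 65 seconds is a bit over a quarter less time per batch, but noticeably more than a quarter extra work done per hour.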

Copyright © 2013 AstroInformatics Group