
Validate Error

Message boards : Number crunching : Validate Error

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 52769 - Posted: 1 Feb 2012, 18:20:25 UTC - in response to Message 52585.  

> In some cases the stderr from BOINC is truncated or missing completely. I think I mostly fixed the problem for future updates.

I've been watching the invalid count and it seems to cycle. When the invalids on one machine go up, so do the others; they also seem to go down more or less in sync. Could it be a problem with the way some WUs are formed? Could you also post a fixed ATI .exe version so we can test it with an app_info.xml?
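For context, testing a replacement binary the way Beyond suggests uses BOINC's anonymous-platform mechanism: an app_info.xml placed in the project directory that names the executable. A rough sketch of its usual shape follows - the binary file name and plan class here are illustrative placeholders, not the actual beta files:

```xml
<app_info>
    <app>
        <name>milkyway</name>
    </app>
    <file_info>
        <!-- Placeholder binary name; substitute the actual test build -->
        <name>milkyway_separation_ati.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>96</version_num>
        <plan_class>ati</plan_class>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_separation_ati.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```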
valterc
Joined: 28 Aug 09
Posts: 23
Credit: 1,265,963,340
RAC: 123,734
Message 52774 - Posted: 2 Feb 2012, 11:39:22 UTC - in response to Message 52585.  
Last modified: 2 Feb 2012, 11:40:30 UTC

> In some cases the stderr from BOINC is truncated or missing completely. I think I mostly fixed the problem for future updates.

I also sometimes get this kind of error (empty stderr):
<core_client_version>6.10.60</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 52777 - Posted: 2 Feb 2012, 17:35:26 UTC - in response to Message 52769.  

> In some cases the stderr from BOINC is truncated or missing completely. I think I mostly fixed the problem for future updates.
>
> I've been watching the invalid count and it seems to cycle. When the invalids on one machine go up, so do the others. They also seem to go down more or less in sync. Could it be a problem with the way some WUs are formed? Could you also post a fixed ATI .exe version so we can test it with an app_info.xml?

The problem there is that the stderr is redirected to a file which isn't necessarily flushed to disk by the time the BOINC client reads it.
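Matt's explanation can be illustrated with a small, self-contained sketch (generic Python, not the actual separation code): output redirected to a file sits in a userspace buffer until flushed, so a reader that inspects the file too early sees it empty - exactly the empty <stderr_txt> shown earlier in the thread.

```python
# Generic illustration (not MilkyWay's actual code) of why an unflushed,
# file-redirected stderr can look empty to whoever reads it.
import os
import tempfile

def read_back(path: str) -> str:
    """What the BOINC client would see if it read the file right now."""
    with open(path) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "stderr.txt")
stderr_like = open(path, "w")   # block-buffered, like a redirected stderr

stderr_like.write("separation: checkpoint reached\n")
before = read_back(path)        # '' - the text is still in the buffer

stderr_like.flush()             # the fix: flush before anyone reads
after = read_back(path)         # now the line is actually in the file

print(repr(before), repr(after))
stderr_like.close()
```

Running it prints an empty string for the pre-flush read and the full line afterwards, which matches the empty-stderr reports above.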

There are still a few more things I want to take care of before actual release:
http://milkyway.cs.rpi.edu/milkyway/download/beta/separation_0.96/
288larsson
Joined: 8 Dec 09
Posts: 14
Credit: 902,727,796
RAC: 0
Message 52778 - Posted: 3 Feb 2012, 0:00:55 UTC - in response to Message 52777.  

> There are still a few more things I want to take care of before actual release:
> http://milkyway.cs.rpi.edu/milkyway/download/beta/separation_0.96/

Looks promising on speed - a test with an HD 7970 is up to 25% faster :))
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 52779 - Posted: 3 Feb 2012, 0:30:23 UTC

Whoa ..... Matt ..... Well Done!

That rewrote my OpenCL expectations .... I had no idea it could go like that.

It's running really well - as above, circa +25% on a 7970.

Brilliant stuff, my grateful thanks .... *Tips Hat* :)

Regards
Zy
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 52780 - Posted: 3 Feb 2012, 0:49:29 UTC - in response to Message 52778.  

> There are still a few more things I want to take care of before actual release:
> http://milkyway.cs.rpi.edu/milkyway/download/beta/separation_0.96/
>
> Looks promising on speed. Test with a HD7970 up to 25% faster :))

Erm, what? The only boost I noticed was a few percent, from a minor change I made.
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 52781 - Posted: 3 Feb 2012, 1:05:20 UTC
Last modified: 3 Feb 2012, 1:11:18 UTC

I got the same ..... +25% on the 7970s. I was sending them back at around 88 secs with two WUs per card. I kept the same settings (1235/1375) for this version; it was running a little hotter, so I dialled it back for now so I didn't introduce other factors into any errors.

Result ... no errors or invalids so far. The cards are pushing them out at 65 secs for 2 WUs per card at 1225/1375, currently running at 78 and 82 degrees for the two cards.

All have validated.

Brilliant Job :)

(AMD Driver I am using is the RC11 release candidate driver for 79XX for 64bit, and BOINC 7.0.11)

Regards
Zy
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 52784 - Posted: 3 Feb 2012, 1:37:16 UTC

All the cache et al. mechanisms have now kicked in, as it's been running a while.

It's now at 63 seconds per card with 2 WUs on each card, running at 1225/1375.

Regards
Zy
STE\/E

Joined: 29 Aug 07
Posts: 486
Credit: 576,512,666
RAC: 37,574
Message 52792 - Posted: 3 Feb 2012, 9:36:56 UTC

Same here, 62-65 secs per WU running 4 at a time on two 7970s, 2 per GPU ... Thanks
STE\/E
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 52797 - Posted: 3 Feb 2012, 15:04:16 UTC

Just posting an observation ... not sure what to make of it yet. I am running two WUs per card on Beta 0.96 OpenCL, and in the strictest sense that's outside the normal parameters of an app ... maybe not this time, don't know formally ... so this could be a comment out of scope.

With the Beta 0.96 I have noticed fluctuation in runtimes of 6 seconds. At present it seems likely ... though not yet absolutely certain ... that it happens when I open a browser on the 7970 machine, surf a little, then close it (rare; normally the machine is left alone when the wife isn't using it). On each occasion, after I shut the browser, the GPU attached to the screen instantly starts to change behaviour, and on the graph large troughs appear when WUs load and unload. It takes a while - up to 3-6 hours - for it to gradually return to a flat-line usage graph steady at 99%.

All the while this is happening, GPU 2, not attached to the monitor, stays flat-lined at 99% and doesn't even blip when loading/unloading WUs. The results on time to completion reflect that: card 1 (attached to the monitor) is currently 65 seconds for 2 WUs; card 2 is currently 59 secs for 2 WUs. When card 1 flat-lines in a few hours, both will be 59 secs - fairly certain of that. It's a behaviour similar to what I saw in 0.82.

Not sure what to conclude, hence just reporting slightly strange behaviour. That card 1 will be affected by usage on screen is clear, but slipping by 6 seconds is a little strange. Could well be "you get what you get" when mixing usage of the machine, and that would be a fair comment.

Anyway ... for what it's worth, there it is. If there is a mechanism to prevent such a dip as seen in card 1 after browser use - for such a long period - that would be good.

Regards
Zy
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 52799 - Posted: 4 Feb 2012, 4:00:32 UTC - in response to Message 52797.  
Last modified: 4 Feb 2012, 4:02:50 UTC

I am now seriously considering a purchase of 2x7970. What are the run times and GPU utilization with one MW 0.96 beta task per 7970 GPU?
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 52801 - Posted: 4 Feb 2012, 11:15:57 UTC - in response to Message 52799.  
Last modified: 4 Feb 2012, 11:28:25 UTC

Not tried it; after initial tests I went straight to two per GPU. Worst case, divide my results by two, so you are looking at around 31-32 secs per WU. Utilisation is always 99% with the current beta app running two per card. Wait and see the final release version - can't really predict utilisation until then - but looking at v0.82 and v0.96, high utilisation of 97-99% is the norm. Utilisation is wholly dependent on the application, so for that, wait for the final release version.

Currently I am recovering from a 9-hour ISP outage so it's not yet settled back as such; at present it's running at 65 seconds per card with two WUs running per card. It will come down a couple of seconds by the time it settles. I am slightly over-volted; basic settings are:

GPU: 1220 (default 925, max inside CCC 1125, max known external to BOINC o/c stable 1260)
Memory: 1375 (default 1375, not worth messing with memory on 7970 at MW leave it at default)
Voltage: 1.218v (default 1.174v, card max 1.3v, max known external to BOINC o/c @1.3v for stable 1260 GPU)

I know another dual-7970 user on MW who is "stuck" at 71/73 secs per card with two WUs per card; for some reason it won't go below that on his machine with the same voltage as mine. Bear in mind that all my timings and the other example given are on Matt's Beta OpenCL v0.96, which is not yet 100% ready for prime time.

As an overall crude rule of thumb, you are looking at the speed of a 6970 +85% when the 7970 is run at 1220/1375 and 1.218v with a single WU. Without overvolting I would estimate it would be +/- 38 seconds per WU per single card on v0.96; that's roughly a 6970 +70% (extrapolated estimate, don't hold me to that as such - it was 52 seconds for the 7970 pre-0.96, and 0.96 is a vast improvement of +25%).
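For what it's worth, the rule-of-thumb arithmetic in this post (two concurrent WUs, so divide elapsed time by two; and the roughly 25% gain over the pre-0.96 times) can be sketched as:

```python
# Back-of-envelope arithmetic behind the estimates in this thread.
# All numbers are the ones quoted above, not new measurements.

def per_wu_time(elapsed_s: float, concurrent_wus: int) -> float:
    """Worst-case effective time per WU when running several at once."""
    return elapsed_s / concurrent_wus

def improvement(old_s: float, new_s: float) -> float:
    """Fractional runtime improvement going from old_s to new_s."""
    return (old_s - new_s) / old_s

# 7970 finishing 2 concurrent WUs in ~65 s:
print(per_wu_time(65, 2))              # 32.5 s per WU

# ~52 s single-WU pre-0.96 vs an estimated ~38 s on 0.96:
print(round(improvement(52, 38), 2))   # 0.27, i.e. roughly the +25% reported
```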

Don't make assumptions re overvolting and o/c above 1.218v/1220, because above 1220/1230, at present, whilst it is possible to go faster it's really hard going and the additional voltage is not worth it. Between 1220 and 1250 you might shave another couple of secs per WU, but the voltage increase to get there is way too high in my view for 7x24. So I would plan on 1220/1.218v as a yardstick, and if you get higher and are happy with the voltage, great stuff - treat it as a "bonus".

I stress .... beta app, still water to go under the bridge; it may all change by the time it's on general release.

Regards
Zy

©2024 Astroinformatics Group