Many "errors while computing" in the stats!
Author | Message |
---|---|
Send message Joined: 16 Mar 09 Posts: 58 Credit: 1,129,612 RAC: 0 |
When I browse my tasks, I see that more or less every work unit has at least one "Error while computing" among the hosts processing it (often/always after 0.00 seconds). From your side, do you see that many wrong configs? What could it be? On my side, MilkyWay has a history of full reliability (ATI app); it has never trashed a WU unless I did it intentionally. |
Send message Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
When did the errors start? The reason for asking is that version 0.23 came out recently, and it was harder on the GPU than previous versions and ran hotter. Some cards o/c'd right to the edge will have been pushed over it by the change. That would be caused by too high a GPU clock (MHz) and/or not enough overvolting, depending on the o/c being done. If you do o/c, try a session for a couple of hours at stock speed and see what happens. Along the lines of the new version, have you detached and reattached via BOINCstats Hosts? That will make sure you have all the correct up-to-date files coming in. Are other projects behaving? You might have a memory stick going on the mainboard; unlikely for sure, but they do go every now and then, albeit rarely. Regards Zy |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
That's exactly what I have been getting for days, but I may have solved it. I was running the enhanced app 0.20b; I've now upgraded to 0.22. I was also running an earlier version of BOINC Manager (6.4.7), so I took out the <coproc> lines in the app_info file. Since then I updated to 6.10.43 but forgot to put the lines back. I've now done so. As I'm running Catalyst 8.12 for stability, I also made sure I downloaded the AMD versions of the enhanced apps. It seems to have worked so far (2 hours); we will see. If the CPU and GPU workunits are different, then it could have been that the GPU was running the wrong ones. Don't drink water, that's the stuff that rusts pipes |
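For anyone wanting to check the same thing on their own box, here is a minimal Python sketch that lists the `<app_version>` entries in an app_info file that carry no `<coproc>` block. The sample XML, the file name `example_gpu_app.exe` and the helper name `versions_missing_coproc` are made up for illustration and are not taken from the real MilkyWay app package. It also assumes the usual anonymous-platform layout, where a GPU app version contains a `<coproc>` element with `<type>` and `<count>`.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical app_info.xml content, for illustration only.
SAMPLE_APP_INFO = """
<app_info>
  <app><name>milkyway</name></app>
  <file_info><name>example_gpu_app.exe</name><executable/></file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>23</version_num>
    <coproc><type>ATI</type><count>1</count></coproc>
    <file_ref><file_name>example_gpu_app.exe</file_name><main_program/></file_ref>
  </app_version>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>22</version_num>
    <file_ref><file_name>example_gpu_app.exe</file_name><main_program/></file_ref>
  </app_version>
</app_info>
"""


def versions_missing_coproc(app_info_xml: str) -> List[str]:
    """Return 'name vNN' labels for app_version entries without a <coproc> block."""
    root = ET.fromstring(app_info_xml)
    missing = []
    for app_version in root.iter("app_version"):
        if app_version.find("coproc") is None:
            name = app_version.findtext("app_name", default="(unnamed)").strip()
            version = app_version.findtext("version_num", default="?").strip()
            missing.append(f"{name} v{version}")
    return missing


if __name__ == "__main__":
    for label in versions_missing_coproc(SAMPLE_APP_INFO):
        print("no <coproc> block:", label)
```

An entry without `<coproc>` is not necessarily wrong, since a CPU-only app version legitimately has none. The check only flags entries worth a second look if they are meant to run on the GPU.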
Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
"I was running the enhanced app 0.20b; I've now upgraded to 0.22. I was also running an earlier version of BOINC Manager (6.4.7), so I took out the <coproc> lines in the app_info file. Since then I updated to 6.10.43 but forgot to put the lines back. I've now done so. As I'm running Catalyst 8.12 for stability, I also made sure I downloaded the AMD versions of the enhanced apps." v0.23 is current. Earlier versions will give incorrect results, especially with 58xx/59xx cards. Also try the v10.3 drivers; the stability issues were solved long ago AFAIK. http://milkyway.cs.rpi.edu/milkyway/apps.php |
Send message Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0 |
"When did the errors start? The reason for asking is that version 0.23 came out recently, and it was harder on the GPU than previous versions and ran hotter. Some cards o/c'd right to the edge will have been pushed over it by the change. That would be caused by too high a GPU clock (MHz) and/or not enough overvolting, depending on the o/c being done." I don't know where you got the impression that the "new" app works the GPU any harder than the previous one. They're the same. The only thing that "changed" was the handling of the streams in regard to the 58xx series. That's all.
I don't think it's too much OC either. I suspect the 10.3 drivers are quite buggy/unstable and of no use here. I had something similar happen here, where 1.4.556 computed only garbage, and after downgrading to 10.2 everything is fine again. Join Support science! Joinc Team BOINC United now! |
Send message Joined: 16 Mar 09 Posts: 58 Credit: 1,129,612 RAC: 0 |
Maybe I expressed myself badly. I do not have any problem! In fact, you can see that the few WUs I do on my 4870 all complete fine, using the stock app and without OC. I was only pointing out that many (all?) of my WUs have wingmen with problems. In many, if not all, of them there's someone who has errored out in no time. Those are machines that maybe put real stress on the database, am I wrong? Out of curiosity, I wanted to ask how there can be so many hosts doing bad work (or, better said, no work at all, because they often error out after 0.00 seconds of computation, mainly on the CUDA and ATI apps). Is it clearer now? |
Send message Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
I clocked about 50 MHz back in order to keep the app stable; and still the GPU runs 4 to 5 degrees hotter than before... |
Send message Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
I had to increase the fan by 5% when I started the new 0.23, as it had jumped to running at 89-91 degrees. It's now back to running at 81-83 degrees. Regards Zy |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Maybe I expressed myself badly." I'm not seeing very many errors on my end. Also, the number of invalids the server is seeing is down to < 2% (including results that just error out). Swapping to the new application should drop this even further. |
Send message Joined: 16 Mar 09 Posts: 58 Credit: 1,129,612 RAC: 0 |
"I'm not seeing very many errors on my end. Also, the number of invalids the server is seeing is down to < 2% (including results that just error out). Swapping to the new application should drop this even further." Good! Thanks a lot! |
Send message Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
"I had to increase the fan by 5% when I started the new 0.23, as it had jumped to running at 89-91 degrees. It's now back to running at 81-83 degrees." Same experience here. The temp. is something like +5 to +10 Celsius up with the new 0.23 apps, but my new PowerColor Radeon HD 5870 1GB LCS will arrive today. Fry on! :) |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
Many thanks for the response there, Beyond; appreciated. I've checked again this morning and so far my 7 boxes with cards are reporting and validating OK. Crunch3r feels that the 10.3 drivers are buggy and thinks 10.2 is better. I'd be quite happy to upgrade to 0.23, but it needs driver 9.3 or above. Any consensus out there as to which Catalyst driver, 9.3 or above, is now stable? The other problem I had was that some of the boxes have AGP cards, and I had to use the 8.12 AGP hotfix version to get them to work. Is there a hotfix for AGP above 9.3? Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
I've used 10.3 on a 5970 for a couple of weeks now and it's been stable for me, but clearly that's not an indication it works for all, as everyone has their own specific circumstances that the driver responds to. I don't know the situation re AGP on 10.3; if 8.12 had an AGP hotfix, I would have thought that by now it would have been incorporated into the main driver, but that is only a guess. Give 10.2 or 10.3 a whirl on one AGP and one 5xxx/4xxx box and see how it goes for 24 hrs as a test. It would be a good thing overall to run all the boxes on a 10.x series driver in the longer term. Regards Zy |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
Thanks Zydor, good advice, I'll upgrade a couple of boxes and see how I get on. Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
With the recent MW outage I took the time to move my P4 with the 3850 AGP from an old box to a newer one, and while I was at it I upgraded it to 10.3, which didn't take, so I tried the 10.3 hotfix and it worked a treat. |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
"With the recent MW outage I took the time to move my P4 with the 3850 AGP from an old box to a newer one, and while I was at it I upgraded it to 10.3, which didn't take, so I tried the 10.3 hotfix and it worked a treat." That is just what I wanted to hear! Well done GG, I owe you a beer! :-) Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
I have been running 10.3 since before it officially came out, but I needed it as the original drivers were 10.2 and did not completely support the card. |
Send message Joined: 2 Jan 08 Posts: 122 Credit: 69,479,692 RAC: 1,456 |
I have been away from Milkyway for a while as I was doing DNETC, but it started to throw "xfer_error" errors at a rate of 25%, so I switched back to Milky to test the cards. Not a good move, unfortunately, as all I kept getting were VPU reset errors. I did not get those before on any project and was not getting them when I last left. I updated to the latest 0.23 version but nothing changed: as soon as I try to do some work on the same Windows computer that is running Milky, the computer freezes and then a VPU error reset follows. Only the one card locks up, and a suspend/resume will allow the WU to start again with a much longer run time, as the process time does not stop; only the percent done stops. I have now switched to Collatz and, after again updating the app version, it is running great with no hassles, and I can use the computer at the same time without it freezing. I am using two 4800 cards, heat does not appear to be a problem, and the cards are standard. So I am at a loss as to why I can no longer run Milkyway. |
Send message Joined: 17 Mar 08 Posts: 165 Credit: 410,228,216 RAC: 0 |
The newer WUs seem to run hotter for me, so I lowered my GPU mem speed and all is well now. Lowering the mem speed did not affect my output very much. |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
OK, over the w/e I upgraded an HD4850 box to Catalyst 9.3 and the 0.23 ATI app, and all seems well so far. I'll give it a couple more days and then do some others. Haven't noticed any heat issues, but that's probably because I run Akasa Vortexx Neo after-market coolers :-) Don't drink water, that's the stuff that rusts pipes |