Welcome to MilkyWay@home

Computation errors.

Message boards : Number crunching : Computation errors.

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60278 - Posted: 2 Nov 2013, 19:14:58 UTC - in response to Message 60276.  

Yeah, that was my thought as well, Richard. I've seen units get flagged as out of date when updating and simply replaced, but these units started and ran for at least 1 second each before failing. Upon updating the client and getting new units issued, these all failed as well. It definitely has nothing to do with just not getting fresh units from the project after a major update.

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60280 - Posted: 2 Nov 2013, 20:06:36 UTC - in response to Message 60278.  

Richard, I'm hunting for something specific which I can't currently find on the E@H forums, and I noticed you've posted in a lot of the threads I'm browsing through, so maybe you can help me find the fix I need. Over on AnandTech one of our teammates has been trying to help figure out what's up as well, and posted this:

I was not able to dig up what I was looking for, but I'm pretty sure it can be found by searching the Einstein@Home forums. I seem to recall that parts of the OpenCL library were left out of a specific Catalyst driver version (or versions), though I cannot recall which version(s) off the top of my head... and I may be remembering things inaccurately, so alternatively it might have been the APP SDK that was left out of a particular Catalyst driver version (or versions). IIRC, this caused problems for AMD 5xxx series GPUs, but not AMD 6xxx series GPUs and up. Unfortunately I'll be going out of town for the weekend right after I get out of work this afternoon, so I won't be able to poke around the Einstein@Home forums to look for possible problems and solutions until next week. In the meantime, I suggest you skim over the E@H message boards while I'm away (assuming you haven't already solved the problem via a fresh driver install... after all, it seems like you're onto something, having recalled that the GTX 260 used to be installed on that machine... let us know how that goes).


You should probably refer to the thread for context. I posted the link earlier, but here it is again: http://forums.anandtech.com/showthread.php?t=2350667

Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 60281 - Posted: 2 Nov 2013, 22:13:35 UTC - in response to Message 60280.  

OK. The best I can do.

I've never run BOINC on an AMD/ATI GPU, but I read a lot - and from what I read, AMD/ATI driver support is all over the place. Components come and go, sometimes with an announcement and sometimes silently. Some projects can write applications that run under practically any driver, some projects write applications which are fussy and only run under one driver (or group of drivers), other projects write applications which run under other drivers. And sometimes - for some reason the numbers 13.4 stick in my mind - there's a driver which is on the 'must have' list for one project, and on the 'can't get it to run at all' list for another project.

And that's before we even get onto the subject of operating systems...

One thing I can state for certain: for Windows XP, OpenCL was removed from driver 12.2 (February 2012 - that's what the numbers mean) and later, although the download page at AMD claimed it was still included. I did the original research for that one, and the data is still on this hard disk:

 Directory of C:\AMD\Support\12-1_xp32_dd_ccc\Packages\Apps

17/05/2012  16:51    <DIR>          .
17/05/2012  16:51    <DIR>          ..
17/05/2012  16:50    <DIR>          ATIPCE
17/05/2012  16:50    <DIR>          CCC
17/05/2012  16:51    <DIR>          CIM
17/05/2012  16:51    <DIR>          dotnetfx
17/05/2012  16:51    <DIR>          OpenCL
               0 File(s)              0 bytes
               7 Dir(s)  43,120,893,952 bytes free

 Directory of C:\AMD\Support\12-2_xp32_dd_ccc\Packages\Apps

29/06/2012  09:36    <DIR>          .
29/06/2012  09:36    <DIR>          ..
29/06/2012  09:36    <DIR>          ATIPCE
29/06/2012  09:36    <DIR>          CCC
29/06/2012  09:36    <DIR>          CIM
29/06/2012  09:36    <DIR>          dotnetfx
               0 File(s)              0 bytes
               6 Dir(s)  43,120,893,952 bytes free

Spot the difference ;)

Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'
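The search described above can be scripted. The following is a minimal Python sketch, not part of the original post: the function name find_opencl_installers is made up for illustration, and it simply assumes that any file called OpenCL.msi found anywhere under the support folder counts as a hit, whatever subfolder the installer actually unpacked it to.

```python
from pathlib import Path


def find_opencl_installers(support_root, filename="OpenCL.msi"):
    """Return sorted paths under support_root whose name matches filename.

    The comparison ignores case. A missing root folder yields an empty list.
    """
    root = Path(support_root)
    if not root.is_dir():
        return []
    wanted = filename.lower()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.lower() == wanted
    )


if __name__ == "__main__":
    # Folder name taken from the post; adjust it if your drivers unpacked elsewhere.
    for path in find_opencl_installers(r"C:\AMD\Support"):
        print(path)
```

A driver package folder that never shows up in the output would be one that, on this reading, shipped without the OpenCL installer.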

Many people in BOINC-land advise that you remove all traces of all AMD drivers before you change to a different one. AMD seem to have heard them, and produced a driver removal tool. The only official-looking link Google could find for me is http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx, but that's asking me for a user/password login, which I don't have. Maybe somebody else here does?

Another useful resource, especially for matching up the various version numbers of the various internal components and tying them back to the simple year.month Catalyst identifier is HAL 9000's ATI Driver Version Cheat Sheet.
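To illustrate the year.month identifier mentioned above (12.2 meaning February 2012), here is a small Python sketch. It is only an illustration: the function name decode_catalyst_version is hypothetical, it assumes a plain two-part "yy.m" string with the year counted from 2000, and it does not try to handle beta or hotfix suffixes.

```python
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def decode_catalyst_version(version):
    """Turn a 'year.month' identifier such as '12.2' into (2012, 'February')."""
    year_part, month_part = version.strip().split(".")
    month = int(month_part)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {version!r}")
    # Assumption: two-digit years are offsets from 2000.
    return 2000 + int(year_part), MONTHS[month - 1]


if __name__ == "__main__":
    print(decode_catalyst_version("12.2"))  # (2012, 'February')
    print(decode_catalyst_version("13.4"))  # (2013, 'April')
```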

One extremely common and simple question which you posted over at Anandtech is 'the card is detecting but is listed as "not used"'. With two dissimilar GPUs in the same computer (from the same 'family'), BOINC will by default only use the 'better' card, and will 'not use' the lesser card - that's been policy for a long, long time. You can override it by setting the <use_all_gpus> option described in client configuration.
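For reference, a cc_config.xml that turns this option on is very small. The Python sketch below just builds that text; the helper name build_cc_config is made up here, and the only BOINC-specific parts are the cc_config, options and use_all_gpus element names.

```python
import xml.etree.ElementTree as ET


def build_cc_config(use_all_gpus=True):
    """Return the text of a minimal cc_config.xml setting <use_all_gpus>."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Prints: <cc_config><options><use_all_gpus>1</use_all_gpus></options></cc_config>
    print(build_cc_config())
```

The printed text would be saved as cc_config.xml, after which restarting the BOINC client should be enough for it to be read.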

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 60287 - Posted: 3 Nov 2013, 11:58:54 UTC - in response to Message 60276.  

I'll try this if a fix is found for the errors. Right now updating the client to that version is pointless anyway, because doing so made my 6970 instantly fail all GPU work units, making updating a moot point. At least with the old version I can run both concurrently and run some of the GPU work units. And the card isn't technically disabled: it's showing up as enabled in Device Manager and in the drivers, the same with both versions. The version switch was also done without a reboot (I rebooted to see if the card would start getting them, but when I switched back to the old version it instantly started working, so I didn't bother rebooting again, making the first reboot redundant), telling me the card was being detected fine to begin with; BOINC was just disabling it as a compute card.

Of course they failed: they were downloaded under an older version of BOINC, and since you are now using a newer version any existing units WILL ALWAYS fail; this is just one of the many security checks BOINC has. Most projects just resend you the same units again under the new version, though. That's one reason you either just accept that it's going to happen, or set the PC to no new tasks prior to updating.


Sorry, that's one thing you need to UN-learn today.


DONE, THANK YOU for setting me straight!!

Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 60290 - Posted: 3 Nov 2013, 12:31:28 UTC - in response to Message 60281.  
Last modified: 3 Nov 2013, 12:40:25 UTC

Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'


I think that explains why I am failing all OpenCL tasks on one of my GTX 460 machines while another similar 460 runs OpenCL fine. I once had an ATI video board in it. After reading Richard's ATI info post, I uninstalled the AMD Catalyst set using "express uninstall manager" and rebooted, but it seems I have the same problem:

Before rebooting there were 7 or so MW tasks that had compute errors and one that had not started yet, as I had managed to suspend the project. After rebooting I resumed the MW project and the tasks started up and failed immediately. An additional task was downloaded and it also failed.

It would appear that there is still some ATI OpenCL driver somewhere causing a problem. This system is an old Tyan S2892 server with onboard ATI video (unused and disabled via jumper), two GTX 460s and Win7 x64 Pro. It would appear I will have to use some driver cleaner to get rid of the ATI OpenCL. Alternatively, there is some other problem. Maybe a guru can spot another problem here.

EDIT - Collatz and Prime run fine on this system but are not using OpenCL.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 60293 - Posted: 3 Nov 2013, 15:53:35 UTC - in response to Message 60290.  


To get rid of the last remnants, try 'driversweeper' or 'driverfusion', both available at http://darkryder.com/#pricing_modified, on the bottom right-hand side of the page.

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60297 - Posted: 3 Nov 2013, 20:06:39 UTC - in response to Message 60293.  

I ran driversweeper and I am still having problems on my ATI system. As of the last update I posted, where I tried to fix it, I had already run driversweeper on the system twice.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 60299 - Posted: 3 Nov 2013, 20:26:23 UTC - in response to Message 60297.  


The webpage I gave you says that 'driver fusion' is the newer version; maybe you could try that?

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60300 - Posted: 3 Nov 2013, 20:32:50 UTC - in response to Message 60299.  
Last modified: 3 Nov 2013, 20:33:50 UTC

I will investigate it. I was about to sit down for a session of DJing before my family gets home, so I can actually run my full 1000 W set of monitors and subs; they were supposed to get home today, but it sounds like they're going to be a few hours. I also set my work unit cache for BOINC to 10 days' worth and deleted all the offending units issued before they could start, until I only had a cache of separation (modified fit) units to run for the next couple of hours. I'll watch it closely until I can tinker with it, because that rig's RAC is really suffering: it's still going up, but at a much slower rate relative to the rig's computational power than on my 2x 7950s. That 5-7 s plus task switch time is adding up really fast on all these failed units :(

Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 60301 - Posted: 3 Nov 2013, 20:42:52 UTC - in response to Message 60300.  

Please don't download masses of tasks, only to abort them later - that really is unkind to the project and other participants.

Use the tools provided on MilkyWay@Home preferences so you only download tasks for the application(s) you're interested in.

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60303 - Posted: 4 Nov 2013, 2:04:03 UTC - in response to Message 60301.  

They're getting downloaded and turned into errors anyway, and my other rig is running them fine. Is there a way to set this on a client-by-client basis?

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 60304 - Posted: 4 Nov 2013, 12:50:19 UTC - in response to Message 60303.  
Last modified: 4 Nov 2013, 12:50:57 UTC


Yes, there is. Go to the MilkyWay webpage, under your account, and then under preferences for this project you will see the default settings. If you scroll down that page you will see that you can set up settings for other groups too, i.e. work, home and school, each with its own settings. After you set them up to your liking (you can use one or all of the groups), go back into your account, then to computers on this account, and click on the details of each PC. At the bottom of that page you will see a small drop-down box with those same home, work and school options. The dash means default; choose the one you set up for that PC, and that PC will start using those settings as of its next connection to the project.

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60306 - Posted: 4 Nov 2013, 22:29:50 UTC - in response to Message 60304.  

Awesome. I set it so it's running like this. I've been busy catching up on things that piled up while I was sick, and I'm hoping to look into why it's actually failing tonight. Over on AnandTech we have a working model of the problem and some possible fixes laid out. It turns out not all versions of the AMD drivers include the OpenCL components, and we think that something required to run the units is missing from whatever driver version I'm currently using (and was destroyed in my initial reinstall, which would explain my 5870 no longer working with the units like it did). I'm going to do the footwork and find a driver version that works, figure out what's special about it if I can, and see if there are any others that also work. The 5870/6970 combo (or any combo of 58xx/69xx cards) is probably going to become really common, as anyone who didn't already upgrade to 79xx is probably looking at upgrading to an R9 card now, and there's a ton of cards up for sale in my area. Just the other day I saw someone locally selling 15 6970s for $100 a pop, and I was sitting there salivating over the WUs that many cards would produce for such a price. It'll be important to make sure people buying these old parts know which drivers are needed to run the tasks until the project finds a workaround or AMD stops slacking on driver support (assuming that's the issue). I'm also looking at possibly forgoing Windows computing and installing a Linux boot/VM to compute in, since performance and driver support both seem to be better there right now, especially for GPU compute, simply because of the lack of overhead. I'll take a look at this option later, though, since I have other things to fix and deal with first before any upgrades like that can be made.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 60307 - Posted: 5 Nov 2013, 0:46:12 UTC - in response to Message 60306.  
Last modified: 5 Nov 2013, 0:46:38 UTC


This page might help with the drivers. The best advice is to stay away from the beta drivers, as they are not always supported by all projects. Also, the latest and greatest drivers are usually written to benefit gaming, not crunching. For instance, the latest Nvidia drivers have been shown to be 20% slower for most GPUs on most projects.
http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60308 - Posted: 5 Nov 2013, 6:22:46 UTC - in response to Message 60307.  

Thanks a bunch, mikey. This is exactly what sunny and I needed over on the AnandTech forums to complete the picture for the fix he had worked up; now I know exactly what drivers I can and can't use.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 60310 - Posted: 5 Nov 2013, 12:38:45 UTC - in response to Message 60308.  
Last modified: 5 Nov 2013, 12:40:38 UTC


No problem, glad I could help. Did you notice that waaay down at the bottom of the page are the actual download links? It took me a while to find those, which is why I am asking. I originally stopped at the bottom of the chart, but the download links are below that.

Doctor
Joined: 11 Nov 09
Posts: 17
Credit: 7,324,208
RAC: 0
Message 60315 - Posted: 6 Nov 2013, 3:42:00 UTC

I've got three work units that failed because of an undefined BACKGROUND_PROFILE. Is there any way I can fix this? My computer was hung one morning; maybe the failures were due to that?

"C:\Users\Kyle\AppData\Local\Temp\OCLFFBA.tmp.cl", line 232: error: identifier
          "BACKGROUND_PROFILE" is undefined
          if (BACKGROUND_PROFILE == FAST_HERNQUIST)
              ^

1 error detected in the compilation of "C:\Users\Kyle\AppData\Local\Temp\OCLFFBA.tmp.cl".

Frontend phase failed compilation.

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60316 - Posted: 6 Nov 2013, 5:05:32 UTC - in response to Message 60310.  

No, I didn't actually; I JUST closed the link from the downloads. I went to guru3d because AMD's page wouldn't give me a link when I looked for 13.4, but guru3d, as always, had 13.4 WHQL at the top of Google :)

Faxon
Joined: 16 May 10
Posts: 15
Credit: 302,964,461
RAC: 0
Message 60317 - Posted: 6 Nov 2013, 6:45:10 UTC

Okay, so I got the drivers working, everything is golden, and then the unit fails right as it hits 100%. I'm thinking the BOINC version might be the issue now, but the cc_config.xml fix won't load properly, and when I did update it I let it run for a while but it never got any units to test whether the 6970 would pass or fail, so I reverted and I'm waiting for some questions to be answered. I'm guessing I use the default BOINC install folder (not the hidden ProgramData folder Microsoft has for all apps to store critical info in), but I put the file there and it didn't work. Did I do anything wrong? If not, I will uninstall BOINC cleanly, reboot and try again, or one of you can get on Skype or something with me and yell at me like I'm a teenager looking at porn and not an A+ certified tech who should be able to figure this out themselves /sobs. I'm so rusty lol, I need to focus on keeping up to date when I go away from tech for a while.

Doctor
Joined: 11 Nov 09
Posts: 17
Credit: 7,324,208
RAC: 0
Message 60318 - Posted: 6 Nov 2013, 7:17:57 UTC - in response to Message 60317.  
Last modified: 6 Nov 2013, 7:19:45 UTC

cc_config.xml: doesn't that go into the data directory?
C:\ProgramData\BOINC
What does the cc_config do? I ran a 6970 for a while and BOINC worked fine. I can't recall if I needed to edit the cc_config other than to tell BOINC not to use the CPU, but that option is now in the GUI.

Edit:
http://boinc.berkeley.edu/wiki/Client_configuration says data directory.
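
A quick way to check whether the file really is in the data directory, and is readable, is sketched below in Python. This is only an illustration under stated assumptions: the function name check_cc_config is made up, C:\ProgramData\BOINC is the folder named above and may differ on other installs, and the .txt check covers just one possible cause (an editor saving the file as cc_config.xml.txt).

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def check_cc_config(data_dir):
    """Return a short status string for cc_config.xml inside data_dir."""
    folder = Path(data_dir)
    config = folder / "cc_config.xml"
    if not config.is_file():
        # Some editors silently append .txt when saving.
        if (folder / "cc_config.xml.txt").is_file():
            return "saved as cc_config.xml.txt"
        return "missing"
    try:
        root = ET.parse(str(config)).getroot()
    except ET.ParseError:
        return "not well-formed XML"
    return "ok" if root.tag == "cc_config" else "unexpected root element"


if __name__ == "__main__":
    # Data directory as given in the post; change it if yours is elsewhere.
    print(check_cc_config(r"C:\ProgramData\BOINC"))
```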

©2024 Astroinformatics Group