Computation errors.
Author | Message |
---|---|
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
Yeah, that was my thought as well, Richard. I've seen units get flagged as out of date when updating and simply replaced, but these units started and ran for at least 1 second each before failing. Upon updating the client and getting new units issued, these all failed as well. It definitely has nothing to do with just not getting fresh units from the project after a major update. |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
Richard, I'm hunting for something specific which I can't currently find on the E@H forums, and I noticed you've posted in a lot of the threads I'm browsing through, so maybe you can help me find the fix I need. Over on AnandTech one of our teammates has been trying to help figure out what's up as well and posted this: [quote]I was not able to dig up what I was looking for, but I'm pretty sure it can be found by searching the Einstein@Home forums. I seem to recall that parts of the OpenCL library were left out of a specific Catalyst driver version (or versions), though I cannot recall which version(s) off the top of my head... and I may be remembering things inaccurately, so alternatively it might have been the APP SDK that was left out of a particular Catalyst driver version (or versions). IIRC, this caused problems for AMD 5xxx series GPUs, but not AMD 6xxx series GPUs and up. Unfortunately I'll be going out of town for the weekend right after I get out of work this afternoon, so I won't be able to poke around the Einstein@Home forums to look for possible problems and solutions until next week. In the meantime, I suggest you skim over the E@H message boards while I'm away (assuming you haven't already solved the problem via a fresh driver install... after all, it seems like you're onto something having recalled that the GTX 260 used to be installed on that machine... let us know how that goes).[/quote] You should probably refer to the thread for context. I posted the link earlier, but here it is again: http://forums.anandtech.com/showthread.php?t=2350667 |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
OK. The best I can do. I've never run BOINC on an AMD/ATI GPU, but I read a lot - and from what I read, AMD/ATI driver support is all over the place. Components come and go, sometimes with an announcement and sometimes silently. Some projects can write applications that run under practically any driver, some projects write applications which are fussy and only run under one driver (or group of drivers), other projects write applications which run under other drivers. And sometimes - for some reason the numbers 13.4 stick in my mind - there's a driver which is on the 'must have' list for one project, and on the 'can't get it to run at all' list for another project. And that's before we even get onto the subject of operating systems...

One thing I can state for certain: for Windows XP, OpenCL was removed from driver 12.2 (February 2012 - that's what the numbers mean) and later, although the download page at AMD claimed it was still included. I did the original research for that one, and the data is still on this hard disk:

    Directory of C:\AMD\Support\12-1_xp32_dd_ccc\Packages\Apps
    17/05/2012 16:51 <DIR> .
    17/05/2012 16:51 <DIR> ..
    17/05/2012 16:50 <DIR> ATIPCE
    17/05/2012 16:50 <DIR> CCC
    17/05/2012 16:51 <DIR> CIM
    17/05/2012 16:51 <DIR> dotnetfx
    17/05/2012 16:51 <DIR> OpenCL
    0 File(s) 0 bytes
    7 Dir(s) 43,120,893,952 bytes free

    Directory of C:\AMD\Support\12-2_xp32_dd_ccc\Packages\Apps
    29/06/2012 09:36 <DIR> .
    29/06/2012 09:36 <DIR> ..
    29/06/2012 09:36 <DIR> ATIPCE
    29/06/2012 09:36 <DIR> CCC
    29/06/2012 09:36 <DIR> CIM
    29/06/2012 09:36 <DIR> dotnetfx
    0 File(s) 0 bytes
    6 Dir(s) 43,120,893,952 bytes free

Spot the difference ;)

Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'.

Many people in BOINC-land advise that you remove all traces of all AMD drivers before you change to a different one.
AMD seem to have heard them, and produced a driver removal tool. The only official-looking link Google could find for me is http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx, but that's asking me for a user/password login, which I don't have. Maybe somebody else here does? Another useful resource, especially for matching up the version numbers of the various internal components and tying them back to the simple year.month Catalyst identifier, is HAL 9000's ATI Driver Version Cheat Sheet. One extremely common and simple question which you posted over at AnandTech is 'the card is detecting but is listed as "not used"'. With two dissimilar GPUs in the same computer (from the same 'family'), BOINC will by default only use the 'better' card, and will 'not use' the lesser card - that's been policy for a long, long time. You can override it by setting the <use_all_gpus> option described in client configuration. |
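For anyone who wants to repeat the 'spot the difference' check on their own machine, here is a rough Python sketch. It assumes the unpacked driver folders are named like the two in the listing (`12-1_xp32_dd_ccc`, `12-2_xp32_dd_ccc`), that the leading numbers are year-month with a two-digit year meaning 20xx (so 12-2 is February 2012), and that OpenCL shows up either as a `Packages\Apps\OpenCL` folder or as an `OpenCL.msi` file somewhere below the package folder. The function names `catalyst_version` and `opencl_report` are made up for this example; they are not part of any AMD or BOINC tool.

```python
import re
import tempfile
from pathlib import Path


def catalyst_version(folder_name):
    """Decode '12-2_xp32_dd_ccc' into (2012, 2); return None if it doesn't fit."""
    match = re.match(r"(\d{1,2})-(\d{1,2})(?:_|$)", folder_name)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    if not 1 <= month <= 12:
        return None
    return 2000 + year, month  # assumption: a two-digit year means 20xx


def opencl_report(support_root):
    """Return {folder name: True/False} for each unpacked driver package."""
    root = Path(support_root)
    report = {}
    if not root.is_dir():
        return report  # no installer was ever unpacked under this path
    for package in sorted(root.iterdir()):
        if not package.is_dir():
            continue
        has_dir = (package / "Packages" / "Apps" / "OpenCL").is_dir()
        # Windows file names are case-insensitive, so compare in lower case.
        has_opencl = has_dir or any(
            p.name.lower() == "opencl.msi" for p in package.rglob("*")
        )
        report[package.name] = has_opencl
    return report


if __name__ == "__main__":
    # Rebuild the two folder trees from the listing inside a temp directory so
    # the demo runs anywhere; on a real machine, point it at C:/AMD/Support.
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for sub in ("ATIPCE", "CCC", "CIM", "dotnetfx", "OpenCL"):
            (base / "12-1_xp32_dd_ccc" / "Packages" / "Apps" / sub).mkdir(parents=True)
        for sub in ("ATIPCE", "CCC", "CIM", "dotnetfx"):
            (base / "12-2_xp32_dd_ccc" / "Packages" / "Apps" / sub).mkdir(parents=True)
        for name, present in opencl_report(base).items():
            status = "OpenCL present" if present else "OpenCL missing"
            print(name, catalyst_version(name), status)
```

This only tells you what the installer package contained, not what is actually installed and registered on the system, so treat a "missing" result as a hint about which driver package to avoid rather than as a diagnosis.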
Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
I'll try this if a fix is found for the errors. Right now updating the client to that version is pointless anyway, because doing so made my 6970 instantly fail all GPU work units. At least with the old version I can run both concurrently and run some of the GPU work units. And the card isn't technically disabled: it's showing up as enabled in Device Manager and in the drivers, the same with both versions. The version switch was also done without a reboot (I rebooted to see if the card would start getting them, but when I switched back to the old version it instantly started working, so I didn't bother rebooting again, making the first reboot redundant), which tells me the card was detecting fine to begin with and BOINC was just disabling it as a compute card. DONE, THANK YOU for setting me straight!! |
Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0 |
[quote]Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'[/quote] I think that explains why I am failing all OpenCL tasks on one of my GTX 460 machines while another similar 460 runs OpenCL fine: I once had an ATI video board in it. After reading Richard's ATI info post, I uninstalled the AMD Catalyst set using "express uninstall manager" and rebooted, but it seems I have the same problem. Before rebooting there were 7 or so MW tasks that had a compute error and one that had not started yet, as I had managed to suspend the project. After rebooting I resumed the MW project and the tasks started up and failed immediately. An additional task was downloaded and it also failed. It would appear that there is still some ATI OpenCL driver somewhere causing a problem. This system is an old Tyan S2892 server with onboard ATI video (unused and disabled via jumper), two GTX 460s and Win7 x64 Pro. It would appear I will have to use some driver cleaner to get rid of the ATI OpenCL. Alternatively, there is some other problem. Maybe a guru can spot another problem here. EDIT - Collatz and Prime run fine on this system but are not using OpenCL. |
Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
[quote]Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'[/quote] To get rid of the last remnants try 'driversweeper' or 'driverfusion', both available at http://darkryder.com/#pricing_modified on the bottom right-hand side of the page. |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
I ran driversweeper and I am still having problems on my ATI system. As of the last update I posted, where I tried to fix it, I had already run driversweeper on the system twice. |
Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
[quote]I ran driversweeper and I am still having problems on my ATI system. As of the last update I posted, where I tried to fix it, I had already run driversweeper on the system twice.[/quote] The webpage I gave you says that 'driver fusion' is the newer version; maybe you could try that? |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
I will investigate it. I was about to sit down for a session of DJing before my family gets home, so I can actually run my full 1000 W set of monitors and subs; they were supposed to get home today, but it sounds like they're going to be a few hours. I also set my work unit cache for BOINC to 10 days' worth and deleted all the offending units issued before they could start, until I only had a cache of separation (modified fit) units to run for the next couple of hours. I'll watch it closely until I can tinker with it, because that rig's RAC is really suffering: it's still going up, but at a much slower rate relative to the rig's computational power than my 2x 7950s. That 5-7 s plus task switch time is adding up really fast on all these failed units :( |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
Please don't download masses of tasks, only to abort them later - that really is unkind to the project and other participants. Use the tools provided on MilkyWay@Home preferences so you only download tasks for the application(s) you're interested in. |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
They're getting downloaded and turned into errors anyway, and my other rig is running them fine. Is there a way to set this on a client-by-client basis? |
Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
[quote]They're getting downloaded and turned into errors anyway, and my other rig is running them fine. Is there a way to set this on a client-by-client basis?[/quote] Yes, there is. Go onto the MilkyWay webpage under your account, and under preferences for this project you will see the default settings. If you scroll down that page you will see that you can set up settings for other groups too, i.e. work, home and school, each with its own settings. After you set them up to your liking (you can use one or all of the groups), go back into your account, then to computers on this account, and click on the details of each PC. At the bottom of that page you will see a small drop-down box with those same home, work and school options. The dash means default; choose the one you set up for that PC, and that PC will start using those settings as of its next connection to the project. |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
Awesome. I set it so it's running like this. Been busy catching up on things I had to do that piled up while I was sick; hoping to look into why it's actually failing tonight. Over on AnandTech we have a working model of the problem and some possible fixes laid out. Turns out not all versions of the AMD drivers include OpenCL, and we think that something required to run the units is missing from whatever driver version I'm currently using (and was destroyed in my initial reinstall, which would explain my 5870 no longer working with the units like it did). I'm going to do the footwork and find a driver version that works, figure out what's special about it if I can, and whether there are any others that also work. The 5870/6970 combo (or any combo of 58xx/69xx cards) is probably going to become really common, as anyone who didn't already upgrade to 79xx is probably looking at upgrading to an R9 card now, and there are a ton of cards up for sale in my area. Just the other day I saw someone locally selling 15 6970s for $100 a pop, and I was sitting there salivating over the WUs that many cards would produce for such a price. It'll be important to make sure people buying these old parts know which drivers are needed to run the tasks until the project finds a workaround or AMD stops slacking on driver support (assuming that's the issue). I'm also looking at possibly forgoing Windows computing and installing a Linux boot/VM to compute in, since performance and driver support both seem to be better there right now, especially for GPU compute, simply because of the lack of overhead. I'll take a look at this option later, though, since I have other things to fix and deal with before any upgrades like that can be made. |
Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
[quote]Awesome. I set it so it's running like this. Been busy catching up on things I had to do that piled up while I was sick; hoping to look into why it's actually failing tonight. Over on AnandTech we have a working model of the problem and some possible fixes laid out. Turns out not all versions of the AMD drivers include OpenCL, and we think that something required to run the units is missing from whatever driver version I'm currently using (and was destroyed in my initial reinstall, which would explain my 5870 no longer working with the units like it did). I'm going to do the footwork and find a driver version that works, figure out what's special about it if I can, and whether there are any others that also work. The 5870/6970 combo (or any combo of 58xx/69xx cards) is probably going to become really common, as anyone who didn't already upgrade to 79xx is probably looking at upgrading to an R9 card now, and there are a ton of cards up for sale in my area. Just the other day I saw someone locally selling 15 6970s for $100 a pop, and I was sitting there salivating over the WUs that many cards would produce for such a price. It'll be important to make sure people buying these old parts know which drivers are needed to run the tasks until the project finds a workaround or AMD stops slacking on driver support (assuming that's the issue). I'm also looking at possibly forgoing Windows computing and installing a Linux boot/VM to compute in, since performance and driver support both seem to be better there right now, especially for GPU compute, simply because of the lack of overhead. I'll take a look at this option later, though, since I have other things to fix and deal with before any upgrades like that can be made.[/quote] This page might help with the drivers. The best advice is to stay away from the beta drivers, as they are not always supported by all projects. Also, the latest and greatest drivers are usually made to benefit gaming, not crunching. For instance, the latest Nvidia drivers have been shown to be 20% slower for most GPUs on most projects. http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
Thanks a bunch, mikey. This is exactly what sunny and I needed over on the AnandTech forums to complete the picture for the fix he had worked up. Now I know exactly which drivers I can and can't use. |
Joined: 8 May 09 Posts: 3315 Credit: 519,943,276 RAC: 22,449 |
[quote]Thanks a bunch, mikey. This is exactly what sunny and I needed over on the AnandTech forums to complete the picture for the fix he had worked up. Now I know exactly which drivers I can and can't use.[/quote] No problem, glad I could help. Did you notice that waaay down at the bottom of the page are the actual download links? It took me a while to find those, which is why I am asking. I originally stopped at the bottom of the chart, but the download links are below that. |
Joined: 11 Nov 09 Posts: 17 Credit: 7,324,208 RAC: 0 |
I've got three work units that failed because of an undefined BACKGROUND_PROFILE. Is there any way I can fix this? My computer was hung one morning; maybe these were due to that?

    "C:\Users\Kyle\AppData\Local\Temp\OCLFFBA.tmp.cl", line 232: error: identifier "BACKGROUND_PROFILE" is undefined
    if (BACKGROUND_PROFILE == FAST_HERNQUIST)
    ^
    1 error detected in the compilation of "C:\Users\Kyle\AppData\Local\Temp\OCLFFBA.tmp.cl".
    Frontend phase failed compilation. |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
No, I didn't, actually. I JUST closed the link from the downloads. I went to guru3d because AMD's page wouldn't give me a link when I looked for 13.4, but guru3d, as always, had 13.4 WHQL at the top of Google :) |
Joined: 16 May 10 Posts: 15 Credit: 302,964,461 RAC: 0 |
OK, so I got the drivers working, everything is golden, and then the unit fails right as it hits 100%. I'm thinking the BOINC version might be the issue now, but the cc_config.xml fix won't load properly, and when I did update it, I let it run for a while but it never got any units to test whether the 6970 would pass or fail, so I reverted and I'm waiting for some questions to be answered. I'm guessing I use the default BOINC install folder (not the hidden ProgramData folder Microsoft has for all apps to store critical info in), but I put the file there and it didn't work. Did I do anything wrong? If not, I will uninstall BOINC cleanly, reboot and try again, or one of you can get on Skype or something with me and yell at me like I'm a teenager looking at porn and not an A+ certified tech who should be able to figure this out themselves /sobs. I'm so rusty lol. I need to focus on keeping up to date when I go away from tech for a while. |
Joined: 11 Nov 09 Posts: 17 Credit: 7,324,208 RAC: 0 |
cc_config.xml - doesn't that go into the data directory, C:\ProgramData\BOINC? What does the cc_config do? I ran a 6970 for a while and BOINC worked fine. I can't recall if I needed to edit the cc_config other than to tell BOINC not to use the CPU, but that option is now in the GUI. Edit: http://boinc.berkeley.edu/wiki/Client_configuration says data directory. |
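Since the question of where cc_config.xml lives comes up here, a small Python sketch may help: it builds a minimal file containing only the <use_all_gpus> option mentioned earlier in the thread and writes it into whatever directory you name. The data directory path is taken from the post above (C:\ProgramData\BOINC) and may differ on your install, so check it first. The function names `build_cc_config` and `write_cc_config` are invented for this example, and the sketch deliberately refuses to overwrite an existing file so it cannot clobber options you already have.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_cc_config(use_all_gpus=True):
    """Return the text of a minimal cc_config.xml with one option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # 1 asks the client to use every GPU, 0 keeps the default of only the 'better' card.
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")


def write_cc_config(data_dir, use_all_gpus=True):
    """Write cc_config.xml into data_dir; refuse to replace an existing file."""
    target = Path(data_dir) / "cc_config.xml"
    if target.exists():
        raise FileExistsError(f"{target} already exists; edit it by hand instead")
    target.write_text(build_cc_config(use_all_gpus), encoding="utf-8")
    return target


if __name__ == "__main__":
    # Print the file contents only; to write it for real, call
    # write_cc_config() with your own BOINC data directory.
    print(build_cc_config())
```

The client only notices the file when it re-reads its configuration, so after writing it you would restart BOINC or ask the Manager to re-read the config files.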