Message boards : News : GPU Issues Mega Thread
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Hey everyone,

Sorry for the week of silence. I was out of the office visiting my parents before the semester started. It may be a server issue, but updating the BOINC libraries in the client code (the program running on your computers) won't change anything on the server. I've already updated the server libraries and that didn't help. Anyway, I am going to work on compiling all of the binaries today. Hopefully there will be a new version of MilkyWay@home out by the end of tomorrow or Wednesday.

Jake
Joined: 10 Feb 09 Posts: 52 Credit: 16,291,993 RAC: 0
> I've already updated the server libraries and that didn't help.

Uh, that's strange.

> Anyway, I am going to work on compiling all of the binaries today. Hopefully there will be a new version of MilkyWay@home out by the end of tomorrow or Wednesday.

Have you thought about contacting D. Anderson and the BOINC dev team for help?
Joined: 6 Apr 12 Posts: 42 Credit: 3,215,609 RAC: 0
Hello. I have been number crunching for a while, and I'd like to comment on a few aspects of bug hunting that I've seen neglected in the past. Have people actually looked inside their computers lately? It's a good idea to reseat your graphics cards and CPUs every so often, say at 6- or 12-month intervals. The heat-sink compound between the chips and their heat sinks degrades (dries out) at a rate that depends on the grade of compound used, indicated by its colour (white is the lowest grade, grey is higher, and diamond compound is the best). Even chips mounted with thermal tape lose their ability to transfer heat to the heat sink over time. When number crunching, the chips are stressed as much as, or more than, they are by high-resolution games, so it is good practice to keep your crunching rigs well maintained. And please understand that, whether the card is AMD/ATI or NVIDIA, a graphics card generally lives about 5 years; manufacturers of computer peripherals rate their products for a lifetime of roughly 5 years in an office environment. If people considered these points and went over their equipment first, bug tracking would be more methodical and robust.
Joined: 6 Apr 12 Posts: 42 Credit: 3,215,609 RAC: 0
Oh, by the way: clean out the dust bunnies in your PSUs, CPUs and GPUs.
Joined: 8 Apr 09 Posts: 70 Credit: 11,027,167,827 RAC: 0
Hello Jake, thanks for hunting the bugs.

For the GPU detection issue, I suggest you contact Einstein@home; as far as I know, they have never had issues with GPU detection, and AMD GPUs work very well there with OpenCL.

I can test the older AMD GPUs (5870, or 6870/6970) with the latest driver available for them, and I can also bring them online for MilkyWay@home. For some reason, with drivers newer than Catalyst 13.1, MilkyWay@home aborts some tasks (I still have to check which ones).

Thanks to Joseph for reminding us about maintenance. But I have to object to the 5 years of GPU lifetime: the GPU chips might hold out that long, but the fans usually fail much earlier, and replacements are very hard to find.

Can anyone check how well Nvidia detection works? I can check with my GTX 980. If anyone has a 700-series card, or even a 1000-series card, it would be nice to see some feedback :)
Joined: 24 Jan 11 Posts: 715 Credit: 555,467,645 RAC: 38,663
I've always gotten the impression that if you want to crunch for MilkyWay, it is best to run Nvidia cards, because they never have any issues, either with completing and validating work or with getting detected properly by the project. With any kind of AMD card, on the other hand, you take your chances. I have had 400-, 600-, 900- and 1000-series cards working for MW with not one issue, ever.
Joined: 23 Feb 09 Posts: 28 Credit: 10,775,220 RAC: 0
I run 3 computers with NVIDIA graphics adapters; so far, no problems with the software here.

30.08.2016 20:49:42 | | CUDA: NVIDIA GPU 0: GeForce GT 540M (driver version 368.81, CUDA version 8.0, compute capability 2.1, 1024MB, 970MB available, 258 GFLOPS peak)
30.08.2016 20:49:42 | | OpenCL: NVIDIA GPU 0: GeForce GT 540M (driver version 368.81, device version OpenCL 1.1 CUDA, 1024MB, 970MB available, 258 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30.08.2016 23:03:12 | | CUDA: NVIDIA GPU 0: GeForce GT 650M (driver version 362.61, CUDA version 8.0, compute capability 3.0, 2048MB, 1681MB available, 730 GFLOPS peak)
30.08.2016 23:03:12 | | OpenCL: NVIDIA GPU 0: GeForce GT 650M (driver version 362.61, device version OpenCL 1.2 CUDA, 2048MB, 1681MB available, 730 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31.08.2016 21:19:06 | | CUDA: NVIDIA GPU 0: GeForce GTX TITAN (driver version 372.54, CUDA version 8.0, compute capability 3.5, 4096MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | CUDA: NVIDIA GPU 1: GeForce GTX TITAN (driver version 372.54, CUDA version 8.0, compute capability 3.5, 4096MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | CUDA: NVIDIA GPU 2: GeForce GTX TITAN (driver version 372.54, CUDA version 8.0, compute capability 3.5, 4096MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN (driver version 372.54, device version OpenCL 1.2 CUDA, 6144MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | OpenCL: NVIDIA GPU 1: GeForce GTX TITAN (driver version 372.54, device version OpenCL 1.2 CUDA, 6144MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | OpenCL: NVIDIA GPU 2: GeForce GTX TITAN (driver version 372.54, device version OpenCL 1.2 CUDA, 6144MB, 4010MB available, 4989 GFLOPS peak)
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Hey everyone,

Ran into a little snag getting the new BOINC libraries cross-compiling for Windows. Will hopefully have it all figured out by the end of the week for a Monday release. Sorry for the delays.

Jake
Joined: 13 Feb 09 Posts: 51 Credit: 72,772,698 RAC: 2,068
> Thanks to Joseph for reminding us about maintenance. But I have to object to the 5 years of GPU lifetime: the GPU chips might hold out that long, but the fans usually fail much earlier, and replacements are very hard to find.

Those fans are mostly crap to begin with . . . and more like impossible to find!

No problem here with NVIDIA cards being recognized -- I have used five different ones so far. The latest are a GT 730, a GTX 750 Ti and a GT 610 (in a low-profile machine).
Joined: 8 Apr 09 Posts: 70 Credit: 11,027,167,827 RAC: 0
Thanks for all the feedback so far :)

@Keith Myers
The problem with Nvidia cards is that they have almost no double-precision crunching power, except for the GTX Titan and Titan Black (both Kepler). I also had some driver issues with my Titan Blacks when the CUDA version was updated and that version contained a bug. And I am glad that Nvidia at least supports OpenCL 1.2, so we can run them here :)

@_heinz
Your Titans can do quite well on MilkyWay. If you want to run it a little, you have to enable double precision in the driver, and you have to use an app_config.xml file to run 6 to 8 workunits in parallel per card to get them fully loaded.
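For anyone who has not used one before, a minimal sketch of the kind of app_config.xml Sebastian is describing is shown below. It goes in the MilkyWay@home project folder inside the BOINC data directory and is picked up after using the Manager's "Read config files" option (or restarting the client). The application short name "milkyway" and the exact usage fractions here are assumptions to adapt to your own setup; a gpu_usage of 0.125 corresponds to 8 tasks sharing one card.

```xml
<!-- Sketch of an app_config.xml for running several MilkyWay@home
     GPU tasks per card; adjust the fractions to taste. -->
<app_config>
  <app>
    <!-- Application short name; assumed here to be "milkyway". -->
    <name>milkyway</name>
    <gpu_versions>
      <!-- Each task claims 1/8 of a GPU, so 8 tasks run per card. -->
      <gpu_usage>0.125</gpu_usage>
      <!-- Small CPU reservation per GPU task. -->
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

For 6 tasks per card instead, gpu_usage would be roughly 0.16; the same file also works for less capable cards by raising the fraction (e.g. 0.5 for 2 tasks).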
Joined: 10 Feb 09 Posts: 52 Credit: 16,291,993 RAC: 0
> No problem here with NVIDIA cards being recognized -- I have used five different ones so far.

That's the point. The BOINC manager recognizes my AMD GPU (and I crunch with it on POEM, SETI, Einstein, etc.), but MW@H does not recognize it.
Joined: 10 Feb 09 Posts: 52 Credit: 16,291,993 RAC: 0
> Sorry for the delays.

No problem. You are doing good work!
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0
I don't know if it counts as a "GPU issue", but I have noticed there is a shortage of GPU workunits again. When BOINC requests new work, I quite often get "0 new tasks" and the queue quickly runs empty. Admittedly, I have 4x 7970 and they need a lot of workunits for continuous crunching.
Joined: 23 Feb 09 Posts: 28 Credit: 10,775,220 RAC: 0
> Thanks for all the feedback so far :)

Really, it runs 8 per device in about ~4 min.
Joined: 24 Jan 11 Posts: 715 Credit: 555,467,645 RAC: 38,663
@Sebastian*
Oh, I realize the Nvidia consumer cards are very hamstrung with respect to their double-precision compute power. They are good enough to run MW if you also run multiple other projects that don't require double precision. If you want to run a single project that requires double precision, then an AMD card is your obvious choice for maximum efficiency and RAC -- provided, I suppose, that you have the will to constantly fight the driver-compatibility issues with operating systems and projects. I don't like to work that hard.
Joined: 5 Jun 08 Posts: 21 Credit: 245,803,013 RAC: 0
@Keith
Indeed. And moreover, recent AMD cards have less double-precision capacity than older ones. I was really surprised that my MilkyWay output dropped by a factor of almost 4 when moving from an HD 7950 to an R9 380. Everything is working fine for me, except for the number of available tasks.

Cheers
Joined: 5 Jun 08 Posts: 21 Credit: 245,803,013 RAC: 0
I forgot: on the R9 380, I needed to add an app_info.xml for the card to be recognized, and an app_config.xml to compute 4 tasks simultaneously.

Good night
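For reference, an app_info.xml uses BOINC's anonymous-platform mechanism, and a skeleton along the usual lines is sketched below. This is only an illustration under assumptions: the executable name, version number and plan class (opencl_ati_101) are placeholders that have to match the actual application files sitting in the MilkyWay@home project directory on your machine.

```xml
<!-- Skeleton app_info.xml (anonymous platform). The file name, version
     number and plan class below are placeholders; they must match the
     real application files in the project directory. -->
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_1.46_windows_x86_64__opencl_ati_101.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>146</version_num>
    <plan_class>opencl_ati_101</plan_class>
    <avg_ncpus>0.05</avg_ncpus>
    <coproc>
      <!-- Tell BOINC this version uses one AMD/ATI GPU. -->
      <type>ATI</type>
      <count>1</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_1.46_windows_x86_64__opencl_ati_101.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

The "4 tasks simultaneously" part still comes from app_config.xml (a gpu_usage of 0.25), as in the earlier sketch.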
Joined: 8 Apr 09 Posts: 70 Credit: 11,027,167,827 RAC: 0
Hey everyone, I set up an HD 5970 (dual 5870 GPUs on one card). I installed the newest non-beta driver on it, and it runs smoothly. If you have such cards (HD 5850 or 5870), you can use them without trouble for MW. I will test a 69xx card soon, but I expect the same results :)

@Keith Myers
Yep, I agree with your last post. Most projects have only a few people, often students, working on them (programming, maintaining the server), so they optimize for only one GPU brand. You have to pick your cards depending on which projects you like most. A GTX Titan Black has 1700 GFLOPS of double-precision power, but it can barely compare to a 7970 with 1000 GFLOPS here on MilkyWay; PrimeGrid is the other way around. I can imagine that even OpenCL is hard work for the project programmers: even if they only optimize for one manufacturer, they have to deal with different GPU architectures. And there is no help from the companies either, since there is no profit in it for them :(
Joined: 5 Oct 09 Posts: 1 Credit: 5,684,450 RAC: 0
Hi all. Just recently started crunching again for my favorite projects. Years and years ago I used Radeon HD 38xx and 48xx cards (if memory serves, lol), got work for my GPUs from MW@home, and they tore through it quickly. I have an R9 Fury now that I wanted to put to work here, but it doesn't receive work for MilkyWay. I have skimmed through a few threads over the last couple of days looking for an answer and thought this was a good spot to ask. SETI, Collatz, etc. all receive GPU work with it.

Thanks all for any help!

RG
Joined: 24 Jan 11 Posts: 715 Credit: 555,467,645 RAC: 38,663
@Sebastian*
I primarily crunch for SETI, since that is the first distributed computing project I got involved with, back in 2001 on OS/2 Warp. Single precision works fine there. I know exactly what you mean about the programming minefield that is OpenCL. I follow the threads of the main SETI OpenCL app volunteer developer, and the number of workarounds he has to employ to support all the generations of video cards going back ten years, just to get OpenCL working on SETI, is incredible. Things get considerably more difficult with each new generation of hardware. Every new app is done by volunteer developers now, since there is no funding for the project scientists anymore. We crunchers are very grateful for their hard work.