Welcome to MilkyWay@home

GPU Issues Mega Thread

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65076 - Posted: 29 Aug 2016, 14:01:36 UTC

Hey everyone,

Sorry for the week of silence. I was out of the office to visit my parents before the semester started.

It may be a server issue, but updating the BOINC libraries in the client code (the program running on your computers) won't change anything on the server. I've already updated the server libraries and that didn't help.

Anyway, I am going to work on compiling all of the binaries today. Hopefully there will be a new version of MilkyWay@home put out by the end of tomorrow or Wednesday.

Jake
ID: 65076
[VENETO] boboviz
Joined: 10 Feb 09
Posts: 52
Credit: 16,286,597
RAC: 0
Message 65078 - Posted: 29 Aug 2016, 16:10:17 UTC - in response to Message 65076.  

I've already updated the server libraries and that didn't help.

Uh, that's strange.

Anyway, I am going to work on compiling all of the binaries today. Hopefully there will be a new version of MilkyWay@home put out by the end of tomorrow or Wednesday.


Have you thought about contacting D. Anderson and the BOINC dev team for help?
ID: 65078
Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
Message 65080 - Posted: 30 Aug 2016, 14:39:22 UTC

Hello.
I have been number crunching for a while.
I'd like to comment on a few aspects of bug hunting that I have seen neglected in the past.
Have people actually looked inside their computer lately?
It's a good idea to reseat your graphics cards and CPUs every so often, say at 6- or 12-month intervals.
The thermal compound between chips and heat sinks degrades (dries out) over time, at a rate that depends on the grade of compound, which is indicated by its colour.
(White is the lowest grade, grey is higher, and diamond is the best.)
Even on chips that use thermal tape, the ability to transfer heat from the chip to the heat sink degrades.
Number crunching stresses the chips as much as, or more than, high-resolution games do, so it's good practice to maintain your crunching rigs carefully.
And please understand that, whether it is AMD/ATI or NVIDIA, a graphics card generally lives about 5 years. Manufacturers of computer peripherals generally design their products for a lifetime of 5 years in an office environment.
Perhaps if people took these comments into account and checked over their equipment, bug tracking would be more reliable and robust.
ID: 65080
Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
Message 65081 - Posted: 30 Aug 2016, 14:40:25 UTC - in response to Message 65080.  

Oh, by the way: clean the dust bunnies out of your PSUs, CPUs and GPUs.
ID: 65081
Sebastian*
Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65083 - Posted: 31 Aug 2016, 6:53:43 UTC
Last modified: 31 Aug 2016, 7:04:11 UTC

Hello Jake, thanks for hunting the bugs.

For the GPU detection issue, I suggest you contact Einstein@home; in my experience they have never had problems with GPU detection, and AMD GPUs work very well there with OpenCL.

I can test the older AMD GPUs there with the latest available driver for them (5870 or 6870 / 6970 GPUs).

I can also bring them online for Milkyway@home. For some reason, with drivers newer than Catalyst 13.1, Milkyway@home aborts some tasks. (I have to check which ones.)

Thanks to Joseph for reminding us about maintenance. But I have to object to the 5 years of GPU lifetime. The GPU chips might last that long, but the fans usually fail much earlier. And they are very hard to find.

Can anyone check how well the Nvidia detection works? I can check with my GTX 980. If anyone has a 700 series card, or even a 1000 series card, it would be nice to see some feedback :)
ID: 65083
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,005,521
RAC: 86,852
Message 65084 - Posted: 31 Aug 2016, 15:18:46 UTC - in response to Message 65083.  

I've always gotten the impression that if you want to crunch for MilkyWay, it is best to run Nvidia cards because they never have any issues, either with completing and validating work or with being detected properly by the project. With any kind of AMD card, on the other hand, you take your chances. I have now had 400, 600, 900 and 1000 series cards working for MW without a single issue, ever.
ID: 65084
_heinz
Joined: 23 Feb 09
Posts: 28
Credit: 10,775,220
RAC: 0
Message 65085 - Posted: 31 Aug 2016, 19:12:52 UTC
Last modified: 31 Aug 2016, 19:23:07 UTC

I run 3 computers with NVIDIA graphics adapters; so far, no problems with the software here.

30.08.2016 20:49:42 | | CUDA: NVIDIA GPU 0: GeForce GT 540M (driver version 368.81, CUDA version 8.0, compute capability 2.1, 1024MB, 970MB available, 258 GFLOPS peak)
30.08.2016 20:49:42 | | OpenCL: NVIDIA GPU 0: GeForce GT 540M (driver version 368.81, device version OpenCL 1.1 CUDA, 1024MB, 970MB available, 258 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
30.08.2016 23:03:12 | | CUDA: NVIDIA GPU 0: GeForce GT 650M (driver version 362.61, CUDA version 8.0, compute capability 3.0, 2048MB, 1681MB available, 730 GFLOPS peak)
30.08.2016 23:03:12 | | OpenCL: NVIDIA GPU 0: GeForce GT 650M (driver version 362.61, device version OpenCL 1.2 CUDA, 2048MB, 1681MB available, 730 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31.08.2016 21:19:06 | | CUDA: NVIDIA GPU 0: GeForce GTX TITAN (driver version 372.54, CUDA version 8.0, compute capability 3.5, 4096MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | CUDA: NVIDIA GPU 1: GeForce GTX TITAN (driver version 372.54, CUDA version 8.0, compute capability 3.5, 4096MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | CUDA: NVIDIA GPU 2: GeForce GTX TITAN (driver version 372.54, CUDA version 8.0, compute capability 3.5, 4096MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | OpenCL: NVIDIA GPU 0: GeForce GTX TITAN (driver version 372.54, device version OpenCL 1.2 CUDA, 6144MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | OpenCL: NVIDIA GPU 1: GeForce GTX TITAN (driver version 372.54, device version OpenCL 1.2 CUDA, 6144MB, 4010MB available, 4989 GFLOPS peak)
31.08.2016 21:19:06 | | OpenCL: NVIDIA GPU 2: GeForce GTX TITAN (driver version 372.54, device version OpenCL 1.2 CUDA, 6144MB, 4010MB available, 4989 GFLOPS peak)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
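Startup lines like the ones above follow a regular pattern, so they can be pulled apart with a short script when comparing hosts. The following is a minimal sketch in Python, assuming the line layout shown in this post. The pattern is inferred from these lines only, and `GpuLine` and `parse_gpu_line` are hypothetical names made up for illustration; they are not part of BOINC.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Pattern inferred from the log lines quoted in this thread (an assumption,
# not an official definition of the BOINC log format).
GPU_LINE = re.compile(
    r"(?P<api>CUDA|OpenCL): (?P<vendor>\w+) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>[\d.]+), .*?"
    r"(?P<total_mb>\d+)MB, (?P<free_mb>\d+)MB available, "
    r"(?P<gflops>\d+) GFLOPS peak\)"
)


@dataclass
class GpuLine:
    api: str        # "CUDA" or "OpenCL"
    vendor: str     # e.g. "NVIDIA"
    index: int      # device number reported by the client
    name: str       # e.g. "GeForce GT 540M"
    driver: str     # driver version string
    total_mb: int   # reported memory in MB
    free_mb: int    # memory reported as available, in MB
    gflops: int     # peak GFLOPS as printed in the log


def parse_gpu_line(line: str) -> Optional[GpuLine]:
    """Return the GPU details from one startup log line, or None if it does not match."""
    m = GPU_LINE.search(line)
    if m is None:
        return None
    return GpuLine(
        api=m["api"],
        vendor=m["vendor"],
        index=int(m["index"]),
        name=m["name"],
        driver=m["driver"],
        total_mb=int(m["total_mb"]),
        free_mb=int(m["free_mb"]),
        gflops=int(m["gflops"]),
    )


if __name__ == "__main__":
    sample = (
        "30.08.2016 20:49:42 | | CUDA: NVIDIA GPU 0: GeForce GT 540M "
        "(driver version 368.81, CUDA version 8.0, compute capability 2.1, "
        "1024MB, 970MB available, 258 GFLOPS peak)"
    )
    print(parse_gpu_line(sample))
```

Lines that do not describe a GPU simply return `None`, so the function can be run over a whole event log.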
ID: 65085
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65086 - Posted: 31 Aug 2016, 20:23:21 UTC

Hey Everyone,

Ran into a little snag getting the new BOINC libraries to cross-compile for Windows. Will hopefully have it all figured out by the end of the week for a Monday release.

Sorry for the delays.

Jake
ID: 65086
w1hue
Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65087 - Posted: 1 Sep 2016, 4:00:05 UTC - in response to Message 65083.  

Thanks to Joseph for reminding us about maintenance. But I have to object to the 5 years of GPU lifetime. The GPU chips might last that long, but the fans usually fail much earlier. And they are very hard to find.

Those fans are mostly crap to begin with. . . and more like impossible to find!

No problem here with NVIDIA cards being recognized -- I have used five different ones so far. The latest are: GT 730, GTX 750Ti and GT 610 (in a low-profile machine).
ID: 65087
Sebastian*
Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65088 - Posted: 1 Sep 2016, 8:12:00 UTC

Thanks for all the feedback so far :)

@Keith Myers
The problem with Nvidia cards is that they have almost no double-precision crunching power, except for the GTX Titan and Titan Black (both Kepler).
I also had some driver issues with my Titan Blacks when the CUDA version was updated and the new version had a bug.
And I am glad that Nvidia at least supports OpenCL 1.2, so we can run them here :)

@_heinz
Your Titans can do quite well on Milkyway. If you want to run it for a while, you have to enable double precision in the driver settings, and you have to use an app_config.xml file to run 6 to 8 workunits in parallel per card to get them fully loaded.
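
For anyone unsure what such a file looks like: BOINC reads an app_config.xml from the project's folder, and the `gpu_usage` value is the fraction of a GPU that each task claims, so 1/8 lets eight tasks share one card. Below is a minimal sketch in Python that generates the file contents. The app name `milkyway` and the `cpu_usage` value of 0.05 are assumptions (check the app name in your client_state.xml), and `build_app_config` is a hypothetical helper, not something shipped with BOINC.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 0.05) -> str:
    """Build the text of an app_config.xml that runs several tasks per GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    # Round the fraction DOWN to 4 decimals so that
    # tasks_per_gpu * gpu_usage never exceeds 1.0 (e.g. 6 tasks -> 0.1666).
    gpu_usage = (10000 // tasks_per_gpu) / 10000

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{gpu_usage:.4f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:.2f}"  # assumed value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Eight tasks per card, as suggested above for the Titans.
    print(build_app_config("milkyway", 8))
```

Save the output as app_config.xml in the MilkyWay project directory, then restart the client or use "Read config files" in BOINC Manager so the change is picked up.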
ID: 65088
[VENETO] boboviz
Joined: 10 Feb 09
Posts: 52
Credit: 16,286,597
RAC: 0
Message 65089 - Posted: 1 Sep 2016, 12:38:50 UTC - in response to Message 65087.  

No problem here with NVIDIA cards being recognized -- I have used five different ones so far.


That's the point. BOINC Manager recognizes my AMD GPU (and I crunch on POEM, SETI, Einstein, etc.), but MW@H does not recognize it.
ID: 65089
[VENETO] boboviz
Joined: 10 Feb 09
Posts: 52
Credit: 16,286,597
RAC: 0
Message 65090 - Posted: 1 Sep 2016, 15:35:52 UTC - in response to Message 65086.  

Sorry for the delays.

Jake


No problem. You are doing good work!
ID: 65090
Vortac
Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65091 - Posted: 1 Sep 2016, 18:09:57 UTC
Last modified: 1 Sep 2016, 18:20:04 UTC

Don't know if it counts as a "GPU issue", but I have noticed there is a shortage of GPU workunits again. When BOINC requests new work, plenty of times I get "0 new tasks" and the queue quickly goes empty. Admittedly, I have 4x7970 and they need a lot of workunits for continuous crunching.
ID: 65091
_heinz
Joined: 23 Feb 09
Posts: 28
Credit: 10,775,220
RAC: 0
Message 65092 - Posted: 1 Sep 2016, 19:10:39 UTC - in response to Message 65088.  

Thanks for all the feedback so far :)
@_heinz
Your Titans can do quite well on Milkyway. If you want to run it for a while, you have to enable double precision in the driver settings, and you have to use an app_config.xml file to run 6 to 8 workunits in parallel per card to get them fully loaded.


It really does run 8 per device, in about 4 minutes.
ID: 65092
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,005,521
RAC: 86,852
Message 65093 - Posted: 1 Sep 2016, 19:55:33 UTC - in response to Message 65088.  

@Sebastian*

Oh, I realize the Nvidia consumer cards are very hamstrung with respect to their double-precision compute power. They are good enough to run MW if you also run multiple other projects that don't require double precision. If you want to run a single project that requires double precision, then an AMD card is your obvious choice to get maximum efficiency and RAC. I suppose you then need the will to constantly fight driver compatibility issues with OSes and projects. I don't like to work that hard.
ID: 65093
[AF>EDLS]GuL
Joined: 5 Jun 08
Posts: 21
Credit: 245,803,013
RAC: 0
Message 65094 - Posted: 1 Sep 2016, 20:04:24 UTC - in response to Message 65093.  

@Keith

Indeed. Moreover, recent AMD cards have less double-precision capacity than older ones. I was really surprised to see my MilkyWay throughput drop by a factor of almost 4 when moving from an HD 7950 to an R9 380.
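
A drop of that order is roughly what the hardware ratios predict. As a rough illustration in Python: the single-precision figures and double-precision rate ratios below are approximate reference-clock values and should be treated as assumptions (they are not taken from this thread), and `dp_gflops` is a hypothetical helper.

```python
def dp_gflops(sp_gflops: float, dp_ratio: float) -> float:
    """Theoretical double-precision throughput from single-precision peak and DP:SP ratio."""
    return sp_gflops * dp_ratio


# Assumed approximate specs (reference clocks); adjust for your own cards.
HD_7950_SP_GFLOPS = 2867.0   # assumption
HD_7950_DP_RATIO = 1 / 4     # assumption: DP runs at 1/4 of SP rate
R9_380_SP_GFLOPS = 3476.0    # assumption
R9_380_DP_RATIO = 1 / 16     # assumption: DP runs at 1/16 of SP rate

hd_7950_dp = dp_gflops(HD_7950_SP_GFLOPS, HD_7950_DP_RATIO)
r9_380_dp = dp_gflops(R9_380_SP_GFLOPS, R9_380_DP_RATIO)
slowdown = hd_7950_dp / r9_380_dp

if __name__ == "__main__":
    print(f"HD 7950: {hd_7950_dp:.0f} DP GFLOPS")
    print(f"R9 380:  {r9_380_dp:.0f} DP GFLOPS")
    print(f"Theoretical slowdown: {slowdown:.1f}x")
```

Under these assumed numbers the newer card is faster in single precision but a bit over three times slower in double precision, which is the only figure that matters for a double-precision project.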

Everything is working fine for me, except for the number of available tasks.
Cheers
ID: 65094
[AF>EDLS]GuL
Joined: 5 Jun 08
Posts: 21
Credit: 245,803,013
RAC: 0
Message 65095 - Posted: 1 Sep 2016, 21:19:37 UTC - in response to Message 65094.  

I forgot: on the R9 380, I needed to add an app_info.xml for the card to be recognized, and an app_config.xml to compute 4 tasks simultaneously.

Good night
ID: 65095
Sebastian*
Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65096 - Posted: 2 Sep 2016, 5:52:05 UTC
Last modified: 2 Sep 2016, 5:53:30 UTC

Hey everyone, I set up an HD 5970 (dual 5870 on one card). I installed the newest non-beta driver for it, and it runs smoothly. If you have such cards (HD 5850 or 5870), you can use them without trouble for MW.
I will test a 69xx card soon, but I expect the same results :)

@Keith Myers

Yep, I agree with your last post. Most projects have only a few people working on them (programming, maintaining the server), some of them students, so the apps end up optimized for only one GPU brand. You have to pick your cards depending on which projects you like most. A GTX Titan Black has 1700 gigaflops of double-precision power, but here in Milkyway it can barely keep up with a 7970 at 1000 gigaflops. PrimeGrid is the other way around.

I can imagine that even OpenCL is hard work for the project programmers. Even if they optimize for only one manufacturer, they have to deal with different GPU architectures. And there is no help from the companies either, since there is no profit to be made from it :(
ID: 65096
Aetherius
Joined: 5 Oct 09
Posts: 1
Credit: 5,684,450
RAC: 0
Message 65097 - Posted: 2 Sep 2016, 17:18:19 UTC

Hi all. Just recently started crunching again for my favorite projects. Years and years ago I used Radeon HD 38xx and 48xx cards (if memory serves, lol) and got work for my GPUs from MW@home, and they tore through it quickly. I have an R9 Fury now that I wanted to put to work here, but it doesn't receive any work for Milkyway.
I had skimmed through a few threads over the last couple of days looking for an answer and thought this was a good spot to ask. SETI, Collatz, etc. all send GPU work to it. Thanks all for any help!

RG
ID: 65097
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,005,521
RAC: 86,852
Message 65098 - Posted: 3 Sep 2016, 4:06:40 UTC - in response to Message 65096.  

@Sebastian*
I primarily work for Seti, since that is the first distributed computing project I got involved with, back in 2001 with OS/2 Warp. Single precision works fine there. I know exactly what you mean about the programming minefield that is OpenCL. I follow the threads of the main Seti OpenCL app volunteer developer, and it is incredible how many workarounds he has to employ to support all the generations of video cards going back ten years just to get OpenCL to work on Seti. Things get more difficult with each new generation of hardware. Every new app is done by volunteer developers now, since there is no funding for the project scientists anymore. We crunchers are very grateful for their hard work.
ID: 65098