Welcome to MilkyWay@home

Nvidia driver 461.09 causes wu's to stall/run indefinitely, 460.79 works fine


Questions and Answers : Windows : Nvidia driver 461.09 causes wu's to stall/run indefinitely, 460.79 works fine
Gibbzy1991
Joined: 14 Apr 17
Posts: 5
Credit: 361
RAC: 0
Message 70388 - Posted: 18 Jan 2021, 10:52:14 UTC

As the title says, Nvidia's new GPU driver (461.09) is breaking MilkyWay OpenCL work units. Reinstalling the previous driver, version 460.79, fixes the issue. Another user's 1080 Ti and my 2070 Super both had the same problem and the same fix, so other Nvidia GPUs are likely affected by the bug too.

While confirming it was the driver, I noticed that CPU usage was identical between the two driver versions, but with the newer driver GPU load never went above idle. The work units don't error out, at least not overnight, so BOINC just shows an ever-increasing remaining time estimate.

I suppose it would be possible to see what is going on with the Nvidia Visual Profiler tool? I'm on mobile data at the moment, so I just want to check that the tool can profile OpenCL before downloading it and trying to figure out how it all works. Happy to do it if the logs could be helpful to the devs here.
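
The symptom described above (normal CPU usage, GPU load stuck at idle) can also be checked without a profiler by polling `nvidia-smi`, which ships with the Nvidia driver. The following is a minimal Python sketch, assuming `nvidia-smi` is on the PATH. The helper names, the `REPORTED_BAD_DRIVERS` set, and the 5% idle threshold are illustrative assumptions, not part of BOINC or MilkyWay@home.

```python
import subprocess

# Driver versions reported as stalling in this thread. Assumption: this is
# only what posters here observed, not an official compatibility list.
REPORTED_BAD_DRIVERS = {"461.09"}


def parse_gpu_line(line):
    """Parse one 'driver_version, utilization.gpu' CSV line from nvidia-smi.

    Returns (driver, utilization_percent); utilization is None when the
    field is not a plain number (for example '[N/A]').
    """
    driver, util = [field.strip() for field in line.split(",")]
    try:
        return driver, int(util)
    except ValueError:
        return driver, None


def looks_stalled(driver, util_percent, idle_threshold=5):
    """Heuristic: reported-bad driver plus near-idle GPU load while crunching."""
    if util_percent is None:
        return False
    return driver in REPORTED_BAD_DRIVERS and util_percent <= idle_threshold


def query_gpus():
    """Ask nvidia-smi for driver version and GPU load, one tuple per GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=driver_version,utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Run it while a MilkyWay task is active: `[looks_stalled(d, u) for d, u in query_gpus()]` gives one flag per GPU. A single reading can be misleading between work units, so sampling a few times over a minute is more telling.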
ID: 70388 · Rating: 0
Holdolin
Joined: 9 Dec 11
Posts: 33
Credit: 1,041,378,574
RAC: 4,847,204
Message 70398 - Posted: 19 Jan 2021, 16:46:41 UTC

Not the first time a driver update broke a DC project. I think it was last year that any NVIDIA driver past a certain point caused problems for F@H. You got it right: roll back the driver and keep truckin :)
ID: 70398 · Rating: 0
Gibbzy1991
Joined: 14 Apr 17
Posts: 5
Credit: 361
RAC: 0
Message 70408 - Posted: 20 Jan 2021, 5:02:34 UTC - in response to Message 70398.  

Yeah, I had a look into the F@H thing. Apparently there was a bug that went unfixed by Nvidia for 2+ years, and while the project had a workaround, it affected performance quite a bit.

I'm probably going to move on to SRBase. After reading a bit more, it seems a little silly to be using this GPU for MilkyWay when old AMD cards are just as fast or faster. And while this PC spends more time crunching than gaming, it is primarily a gaming PC, so I'd prefer to keep the drivers up to date.

Still happy to do some profiling or extra troubleshooting if it would be helpful. But I guess if most people are running older hardware, or only crunching, then the driver issues aren't really a problem for the project in general.
ID: 70408 · Rating: 0
johnnymc
Joined: 10 Mar 11
Posts: 8
Credit: 13,252,501
RAC: 520
Message 70468 - Posted: 29 Jan 2021, 13:46:35 UTC

The 2021/01/26 update to driver version 461.40 fixed the same issue for me.

But tonight I upgraded from Windows 10 Home to Pro, and all work units now end in a compute error.
Detaching and reattaching doesn't help; every work unit still errors out, so I've stopped the project until I find the solution.
Life's short; make fun of it!
ID: 70468 · Rating: 0
johnnymc
Joined: 10 Mar 11
Posts: 8
Credit: 13,252,501
RAC: 520
Message 70469 - Posted: 29 Jan 2021, 15:14:57 UTC - in response to Message 70468.  

In typical RTFM fashion, I uninstalled the nVidia drivers and reinstalled them, allowing BOINC to continue happily crunching once again.
Life's short; make fun of it!
ID: 70469 · Rating: 0
mikey
Joined: 8 May 09
Posts: 2500
Credit: 462,477,670
RAC: 1,410
Message 70472 - Posted: 30 Jan 2021, 0:22:04 UTC - in response to Message 70469.  

"In typical RTFM fashion, I uninstalled the nVidia drivers and reinstalled them, allowing BOINC to continue happily crunching once again."

You don't have to uninstall the old drivers if you are reinstalling the same version; the installer will just overwrite them, and you will be good to go after a reboot.
Windows has a VERY bad habit of messing things up for crunchers and gamers by thinking THEIR drivers are better. They aren't, but it doesn't matter to them.
ID: 70472 · Rating: 0
Werinbert
Joined: 30 Dec 12
Posts: 7
Credit: 10,011,100
RAC: 77
Message 70477 - Posted: 30 Jan 2021, 21:46:55 UTC

I converted one of my computers from Linux to Win 10 a couple of days ago and installed the 461.09 drivers at that time. My GPU has been happily crunching MW tasks since. The drivers were loaded right after installing Windows, and I have not let Windows update anything, so I suspect it was some Windows update causing the problem, not the Nvidia drivers.
ID: 70477 · Rating: 0
Spatzthecat
Joined: 1 Dec 10
Posts: 12
Credit: 3,597,784,476
RAC: 9,033,164
Message 70478 - Posted: 30 Jan 2021, 23:21:55 UTC - in response to Message 70477.  

It is definitely the nVidia 461.09 drivers.
I have 3 hosts, all on Win 10, and with this driver they downclock my overclock, and then the units take an age.
The units were taking about 1 min with the overclock until the glitch, which is triggered simply by being connected to a monitor/TV.
I use an HDMI switch for the 3 hosts. Once a reboot is started on one host, I switch to the next machine and repeat the process until all 3 machines have been done.
I can then turn the HDMI switch off as soon as the last host starts to reboot, and all 3 hosts behave.

The previous drivers and the 8 before them all work without the glitch, but the units take 1 min 15 secs.
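
A downclock like the one described above can be spotted by comparing the current SM clock with the maximum clock that `nvidia-smi` reports. The following is a minimal Python sketch, assuming `nvidia-smi` is on the PATH. The function names and the 15% tolerance are illustrative assumptions, and GPUs routinely run somewhat below their maximum under compute load, so treat a positive result as a hint rather than proof.

```python
import subprocess


def parse_clock_line(line):
    """Parse one 'clocks.sm, clocks.max.sm' CSV line (values in MHz)."""
    current, maximum = [int(field.strip()) for field in line.split(",")]
    return current, maximum


def is_downclocked(current_mhz, max_mhz, tolerance=0.15):
    """True when the current clock is more than `tolerance` below the maximum.

    The 15% default is an arbitrary assumption; tune it per card.
    """
    return current_mhz < max_mhz * (1 - tolerance)


def query_clocks():
    """Ask nvidia-smi for current and maximum SM clocks, one tuple per GPU."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=clocks.sm,clocks.max.sm",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_clock_line(line) for line in out.splitlines() if line.strip()]
```

Checking `[is_downclocked(cur, mx) for cur, mx in query_clocks()]` once with the monitor connected and once after the reboot procedure would show whether the clocks actually differ between the two states.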
ID: 70478 · Rating: 0
johnnymc
Joined: 10 Mar 11
Posts: 8
Credit: 13,252,501
RAC: 520
Message 70479 - Posted: 31 Jan 2021, 0:58:45 UTC

I was noticing something glitchy: some units were crunching for 4-12 hours and stuck at around 40-70% complete, so I realized there was a driver issue and came here to the forums for a solution.
Life's short; make fun of it!
ID: 70479 · Rating: 0
mikey
Joined: 8 May 09
Posts: 2500
Credit: 462,477,670
RAC: 1,410
Message 70480 - Posted: 31 Jan 2021, 10:46:48 UTC - in response to Message 70479.  

They also had a couple of bad batches of workunits; that could have figured in as well.
ID: 70480 · Rating: 0
Spatzthecat
Joined: 1 Dec 10
Posts: 12
Credit: 3,597,784,476
RAC: 9,033,164
Message 70483 - Posted: 31 Jan 2021, 13:03:53 UTC - in response to Message 70480.  

"They also had a couple of bad batches of workunits; that could have figured in as well."

These batches would have been problematic regardless of the nVidia driver used.
ID: 70483 · Rating: 0

©2021 Astroinformatics Group