Some GPU tasks failing
Author | Message |
---|---|
Joined: 17 May 10 Posts: 9 Credit: 724,501,229 RAC: 22,330 |
Noticed a fair amount of errors among my GPU tasks. Some work, others don't. The logs seem to indicate a problem with device detection. Success: https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=106001665 Fail: https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=106001938 This is a dual-GPU machine, but I can't figure out from the log which GPU the WU was using or trying to use. JJ |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Noticed a fair amount of errors among my GPU tasks. Some work, others don't." I looked at one of your invalid tasks and found this: Found 2 CL devices Device 'NVIDIA GeForce RTX 3090' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU) I looked at another of your invalid tasks and found this: Found 2 CL devices Device 'GeForce GTX 1080 Ti' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU) I also noticed that your 1080 Ti is using driver version 460.97 while the 3090 is using driver version 511.23. Are you perhaps trying to run more than one task at a time on the different GPUs? |
Joined: 17 May 10 Posts: 9 Credit: 724,501,229 RAC: 22,330 |
Well, yes. It's kind of hard to fully utilize a 3090, not to mention two of them, without running multiple WUs. The error might actually be caused by the WU not fitting in GPU memory, which would be silly since these cards have 24 GB each, but there seem to be limitations on GPU memory utilization with BOINC. Where on earth did that 1080 Ti reference come from? Those haven't been in use for over half a year. JJ |
Joined: 24 Dec 07 Posts: 33 Credit: 1,923,161,515 RAC: 44,276 |
I've got a couple of clients with failing WUs. They are WUs with names that begin with de_modfit_83_bundle5_3s_south_pt2_1643910122f EDIT: Admin posted this in the news: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4834#71697 |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
"Well, yes. It's kind of hard to fully utilize a 3090, not to mention two of them, without running multiple WUs." You may need to flush all the drivers from your system using the DDU utility, which you can download from the net, and then reinstall them so you only have one driver. Also, unless you game, I would stay away from the latest and greatest drivers, especially at projects that are pretty slow to upgrade their own software, like MilkyWay. Let someone else try a new driver first, then follow once you know it works. New software can mean new ways of doing things, and the BOINC projects need to test the drivers to see whether they work with their current tasks. |
Joined: 17 May 10 Posts: 9 Credit: 724,501,229 RAC: 22,330 |
I don't usually bother with DDU or its ilk. Several decades of experience have shown them to be of limited usefulness. I did, however, recently do a thorough driver cleanup due to an unrelated problem. I also went through the BOINC logs and can find no mention of older GPUs or drivers in the recent ones. I think you accidentally looked at a year-old invalid task. I don't do a lot of invalids ;-) I'm fully aware of the risks involved in using bleeding-edge software (or hardware, for that matter). Recent driver updates have, however, included certain fixes for issues I'm personally experiencing. Also, since I'm a very technical person by profession and hobby, I consider it my role to be a tester in these things. I dialed down the number of WUs allowed per GPU simultaneously. I will keep monitoring the situation, but no similar failures have happened since. JJ |
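
For anyone comparing a good and a bad task log as was done above, here is a minimal Python sketch that pulls the device count, device names, and driver versions out of a log's text. The patterns are modeled only on the excerpts quoted in this thread; the `summarize_log` helper, the sample text, and the exact layout of the driver line are assumptions, so treat it as a starting point rather than a parser for the real log format.

```python
import re

# Patterns modeled on the log excerpts quoted in this thread.
# The wording of real task logs is an assumption and may differ.
COUNT_RE = re.compile(r"Found (\d+) CL devices?")
DEVICE_RE = re.compile(r"Device '([^']+)'")
DRIVER_RE = re.compile(r"driver version:\s*([\w.]+)", re.IGNORECASE)


def summarize_log(text):
    """Return device count, device names, and driver versions seen in a log."""
    count = COUNT_RE.search(text)
    return {
        "device_count": int(count.group(1)) if count else None,
        "devices": DEVICE_RE.findall(text),
        "drivers": DRIVER_RE.findall(text),
    }


# Hypothetical sample assembled from the lines quoted in the thread.
sample = (
    "Found 2 CL devices\n"
    "Device 'NVIDIA GeForce RTX 3090' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)\n"
    "Driver version: 511.23\n"
)
summary = summarize_log(sample)
print(summary)
```

Running it over the saved text of both a successful and a failed result makes it easy to spot a stale device or driver entry, such as a card that is no longer installed.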
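
Dialing down the number of WUs per GPU, as described in the last post, is commonly done with a BOINC `app_config.xml` file in the project's folder, where `gpu_usage` is the fraction of a GPU each task claims. The thread does not say how it was done here, so the following Python sketch is only one possible way to generate such a file. The app name `milkyway` and the `build_app_config` helper are assumptions; check the app name your client actually reports before using it.

```python
import math
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Build app_config.xml text limiting how many tasks share one GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    # Round down so that tasks_per_gpu shares never add up to more than 1.0.
    gpu_usage = math.floor(10000 / tasks_per_gpu) / 10000

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    # 0.5 here means two tasks may share a GPU; 1.0 means one task per GPU.
    ET.SubElement(versions, "gpu_usage").text = f"{gpu_usage:.4f}"
    ET.SubElement(versions, "cpu_usage").text = f"{cpu_per_task:.2f}"

    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# "milkyway" is an assumed app name, used for illustration only.
config_text = build_app_config("milkyway", 2)
print(config_text)
```

Dropping `tasks_per_gpu` to 1 is the most conservative setting while testing whether the failures are related to running several tasks at once. The client needs to re-read its config files (or be restarted) before a changed file takes effect.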
©2024 Astroinformatics Group