Run Multiple WU's on Your GPU

Author	Message
Keith Myers Joined: 24 Jan 11 Posts: 709 Credit: 549,581,661 RAC: 56,250	Message 71165 - Posted: 28 Sep 2021, 18:48:58 UTC - in response to Message 71164. If I remember correctly, and assume I don't . . . . MW allows 300 tasks per card and a project maximum of 900 tasks.

Toby Broom Joined: 13 Jun 09 Posts: 24 Credit: 137,536,729 RAC: 0	Message 71167 - Posted: 28 Sep 2021, 20:27:35 UTC - in response to Message 71165. Yes, you're correct. Based on the other discussions, I'm trying to write a script that will work around the limit by pausing the networking.

Keith Myers Joined: 24 Jan 11 Posts: 709 Credit: 549,581,661 RAC: 56,250	Message 71170 - Posted: 29 Sep 2021, 1:18:37 UTC - in response to Message 71164. I also assume that it would be better for the project to create larger work units, or at least an option for larger ones? They already make larger units by bundling 4 or 5 tasks in the work unit. But they recently ran into failed work units: for some reason the devs haven't figured out, the scheduler/splitter bundled 7 tasks into the WU, and that exceeds the parameter length limit. So making larger tasks doesn't seem possible at this time.

Ryan Munro Joined: 22 Jun 09 Posts: 13 Credit: 75,835,284 RAC: 3,096	Message 71172 - Posted: 29 Sep 2021, 9:37:46 UTC - in response to Message 71157. Still won't go above 210 W. I timed it, and running 16 units at the same time does about as much work in the same time as running 4 units at once; no real difference.

Toby Broom Joined: 13 Jun 09 Posts: 24 Credit: 137,536,729 RAC: 0	Message 71182 - Posted: 29 Sep 2021, 20:12:58 UTC - in response to Message 71172. Last modified: 29 Sep 2021, 20:14:25 UTC I assume the 3090 just can't do enough work to get to full power. In the Ampere SMs only half of the SM can do FP64, so I can imagine the best case here is that the 3090 would be at about half of max power, which is about what you see, assuming you have a 400 W BIOS. I have a Founders card, so I have 350 W and see about 180 W with just one WU at a time. Even on my Titan V, it's not full bore with 7 WUs at once.

Keith Myers Joined: 24 Jan 11 Posts: 709 Credit: 549,581,661 RAC: 56,250	Message 71183 - Posted: 29 Sep 2021, 21:25:25 UTC - in response to Message 71182. There are other GPU projects that will use every watt available to them on Ampere: GPUGrid, Minecraft and Einstein, for example. We just need Petri to give the MW app a good work over like he has done with Seti and Einstein.

Wrend Joined: 4 Nov 12 Posts: 96 Credit: 251,528,484 RAC: 0	Message 72357 - Posted: 30 Mar 2022, 18:12:16 UTC Last modified: 30 Mar 2022, 18:22:17 UTC Bearing that in mind, I'm quite pleased that this project can make good use of the DP/FP64 capabilities of my Titan Black GPUs, whereas other projects can't. Yes, the project is more niche for it, but then aptly so are my GPUs. If anyone is to be held responsible for it, I think it would be best placed at Nvidia's feet for generally limiting the capabilities of their cards to profit more from segmenting their market. As an update to my previous post, I have applied new thermal paste to my GPUs and it made a quite significant difference, dropping full load temperatures over 10°C; and I say over since I was hitting thermal throttling levels before. It would seem the stock thermal paste had an effective lifespan of up to about 5 years for cards that are in active use. Currently I'm back to running two tasks per GPU to help keep room temperatures and fan speeds and noise down. It's nice to be having my cards doing some good work in the background again. Best regards.

Chooka Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 526	Message 72522 - Posted: 5 Apr 2022, 19:04:26 UTC - in response to Message 71160. No matter how many WU's I run, my card won't use more than 210 W when crunching, 2 / 4 / 8 units. It's a 3090, and under WCG running 8 of their GPU units at the same time it will pull the full 350 W. With the price of graphics cards skyrocketing, I would avoid running any card at its max rating. High temps cause thermal paste to harden, making removal non-trivial. Exact OEM fan replacements can be hard to find and one must be creative on occasion. I have a few really weird fan arrangements I can post if anyone is interested. BUT the problem comes down to MW not being able to supply you with enough tasks in one day to keep it going; with their 10 minute back-off between sending tasks, your 3090 will be doing something else for those 10 minutes even more often. That BOINC app I modded, 7.15.0, fixes the 10 minute wait. The latest official version is 7.16.11 and I assume the 10 minute problem still exists for that app. Due to temperatures recently dropping here in Texas, I started up a pair of garage "racks" to start crunching on Einstein and WCG. I have a 3rd rack for MilkyWay but the garage is still too hot to run that one. YES, your app did fix it, but unfortunately not everyone uses it. I was using the modded 7.15 version that Joseph created but I changed to another version for the PrimeGrid challenge... now I can't get it to change back to 7.15. Each time I try to install it as per the instructions, BOINC just won't connect. I delete just about every BOINC file I can find but no luck. I'm not sure if perhaps it's due to the project file not being deleted or if it's my cloud backup. Either way, I've tried it on 3 or 4 PCs but can't get it to work any more. I really can't believe that the 10 minute backoff thing still exists.
It's the single biggest issue with this project and because of it I crunch elsewhere, like Einstein. I REALLY wish someone would take the time to address this issue :( It's only going to get worse as cards get faster.

Wrend Joined: 4 Nov 12 Posts: 96 Credit: 251,528,484 RAC: 0	Message 72579 - Posted: 8 Apr 2022, 13:33:10 UTC Last modified: 8 Apr 2022, 13:44:16 UTC I'm wondering what the current names of the work unit types/apps are, for the purposes of this file. I think it's been years since I've updated them. I currently have this to run 4 tasks per GPU for 2 SLIed Titan Black cards:
...
<app>
  <name>milkyway</name>
  <max_concurrent>0</max_concurrent>
  <gpu_versions>
    <gpu_usage>0.25</gpu_usage>
    <cpu_usage>0.10</cpu_usage>
  </gpu_versions>
</app>
<app>
  <name>milkyway_nbody</name>
  <max_concurrent>0</max_concurrent>
  <gpu_versions>
    <gpu_usage>0.25</gpu_usage>
    <cpu_usage>0.10</cpu_usage>
  </gpu_versions>
</app>
<app>
  <name>milkyway_separation__modified_fit</name>
  <max_concurrent>0</max_concurrent>
  <gpu_versions>
    <gpu_usage>0.25</gpu_usage>
    <cpu_usage>0.10</cpu_usage>
  </gpu_versions>
</app>
...
If someone could let me know, I'd appreciate it. Thanks.

Keith Myers Joined: 24 Jan 11 Posts: 709 Credit: 549,581,661 RAC: 56,250	Message 72581 - Posted: 8 Apr 2022, 13:49:15 UTC You can drop the milkyway_separation_modified_fit section. It's just milkyway for the Separation tasks now.

Wrend Joined: 4 Nov 12 Posts: 96 Credit: 251,528,484 RAC: 0	Message 72583 - Posted: 8 Apr 2022, 13:56:25 UTC - in response to Message 72581. You can drop the milkyway_separation_modified_fit section. It's just milkyway for the Separation tasks now. Thanks.

Blake Joined: 1 Jul 12 Posts: 8 Credit: 351,094,054 RAC: 0	Message 73817 - Posted: 12 Jun 2022, 5:59:42 UTC Has anyone figured out the best <gpu_usage /> setting for an Nvidia RTX 2080? It is not the 'Super' or the 'Ti', just a plain-Jane RTX 2080 with 8 GB of dedicated VRAM. I've had it running only one work unit at a time so far.

Keith Myers Joined: 24 Jan 11 Posts: 709 Credit: 549,581,661 RAC: 56,250	Message 73819 - Posted: 12 Jun 2022, 16:35:55 UTC - in response to Message 73817. I use 0.5 for gpu_usage on my three 2080s. Running doubles is faster per task than running singles.
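For reference, here is a minimal sketch of an app_config.xml that would run two MilkyWay GPU tasks at once, as described in the post above. The gpu_usage value of 0.5 and the app name milkyway come from this thread; the cpu_usage value is only an illustrative assumption and should be tuned for the machine in question.

```xml
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <!-- Each task claims half a GPU, so BOINC schedules two per card. -->
      <gpu_usage>0.5</gpu_usage>
      <!-- Assumed value: fraction of a CPU core budgeted per GPU task. -->
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

By the same logic, a gpu_usage of 0.25 would allow four tasks per GPU. Whether doubling up actually helps depends on the card, so it is worth timing completed tasks before and after the change.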

mikey Joined: 8 May 09 Posts: 3334 Credit: 524,010,781 RAC: 962	Message 73828 - Posted: 13 Jun 2022, 10:20:11 UTC - in response to Message 57755. I have been trying to run the app_config file shown above and whenever I do the BOINC event log shows an error message saying "App file not found". Has anyone else had this problem? Yes, and haven't found the answer yet. Did try the ideas mentioned above. edit: Mine wanted <use_all_gpus>1</use_all_gpus> in the cc_config, apparently. One thing to be very careful of is to never use a word processing program to edit the file, as BOINC can't read all the stuff they leave when they save a file. ONLY use a text editor, i.e. Notepad in Windows. Then save the file, in Windows in the C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway folder, or in Linux in the /var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway folder. After you save the file in the right place, be sure to then go back to the BOINC Manager and click on Options, Read config files so BOINC will implement the changes right away. IF you changed the number of CPU cores per Nbody task, that won't take effect until the next time you get new tasks from the project. And YES, since the <use_all_gpus> command is a global BOINC command for all the projects to use, it must go in the cc_config.xml file in the main BOINC folder. Changing whether to run more than a single GPU task at a time, or even not using all your CPU cores for the Nbody tasks, is a MilkyWay-only rule and therefore it goes in the MilkyWay projects folder.
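To illustrate the split described above, here is a minimal sketch of a cc_config.xml containing only the global <use_all_gpus> option. It assumes no other client options are needed; an existing cc_config.xml should be edited in place rather than overwritten with this.

```xml
<cc_config>
  <options>
    <!-- Use every GPU in the machine, not only the most capable one. -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

This file lives in the main BOINC data folder, while the per-project app_config.xml stays in the MilkyWay project folder. Both are plain text and are picked up by re-reading the config files from the BOINC Manager.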

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,163,587 RAC: 0	Message 73867 - Posted: 19 Jun 2022, 4:08:13 UTC Last modified: 19 Jun 2022, 4:20:25 UTC Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average.

.clair. Joined: 3 Mar 13 Posts: 84 Credit: 779,527,712 RAC: 0	Message 73871 - Posted: 19 Jun 2022, 16:22:27 UTC - in response to Message 73867. Last modified: 19 Jun 2022, 16:22:57 UTC Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average Yes, you will see a good increase in throughput, BUT keep an eye on GPU temperature. Don't let it go above 70°C long term; 60°C is better, and less is even better.

HRFMguy Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0	Message 73872 - Posted: 19 Jun 2022, 19:32:29 UTC - in response to Message 73867. Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average Yep. Here is a little something I posted earlier: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4883&postid=73049#73049

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,163,587 RAC: 0	Message 73874 - Posted: 19 Jun 2022, 22:21:32 UTC - in response to Message 73872. Last modified: 19 Jun 2022, 22:50:05 UTC Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average Yep. Here is a little something I posted earlier: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4883&postid=73049#73049 Thanks for the guide; unfortunately, I couldn't get it to work. I have decided to only run 1 task at a time.

HRFMguy Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0	Message 73875 - Posted: 20 Jun 2022, 0:23:06 UTC - in response to Message 73874. Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average Yep. Here is a little something I posted earlier: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4883&postid=73049#73049 Thanks for the guide; unfortunately, I couldn't get it to work. I have decided to only run 1 task at a time. @Speedy51, OK, no problemo, as we say in Texas. I am a bit OCD about squeezing as much performance as I can out of this hardware. My GPU was running at about 50% utilization at the time I started my quest, so clearly there was room for improvement there. If you are hovering around 75%, then there is probably not much room for improvement. My second GPU is running at 95% utilization, so not much chance at all for any improvement. And testing actually showed a loss of performance at 2 tasks, so I went back to 1 task per GPU on that one. But I would encourage you to keep on trying.

Speedy51 Joined: 12 Jun 10 Posts: 57 Credit: 6,163,587 RAC: 0	Message 73876 - Posted: 20 Jun 2022, 0:51:07 UTC - in response to Message 73875. Last modified: 20 Jun 2022, 1:31:16 UTC So in a sense you have done exactly what I said above. Contrary to what I said, I managed to get two tasks to run; however, I actually think it is slower, because each task is taking around 3 minutes, so that's 6 minutes in total. However, if I were to run one at a time I could complete 2 tasks in approximately just over 4 minutes, so that is a speed increase running one at a time from where I am sitting. What are your thoughts? Currently running 0.5 CPU and 0.5 GPU.
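A rough way to compare the two setups is tasks completed per minute of wall-clock time. The sketch below uses the figures from the post above and assumes that the two tasks really do run side by side and both finish after about 3 minutes; if they were in fact running back to back, the 6 minute total would stand and the conclusion would flip.

```latex
\[
\text{two at once: } \frac{2\ \text{tasks}}{3\ \text{min}} \approx 0.67\ \text{tasks/min},
\qquad
\text{one at a time: } \frac{2\ \text{tasks}}{4\ \text{min}} = 0.50\ \text{tasks/min}
\]
```

Under that assumption, running two at once comes out ahead. Using the 102 second single-task average reported earlier in the thread instead of the 4 minute estimate, one at a time gives 2 tasks in about 3.4 minutes, or roughly 0.59 tasks/min, which narrows the gap but does not close it. Timing a batch of finished tasks in each mode is the only way to know for a particular card.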