Milkyway@Home Uses Only One of Three GPUs
Author | Message |
---|---|
Frank Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
I am relatively new to Milkyway@Home. I have installed the software on about 6 computers. Immediately after install everything runs well, all CPUs and all 3 GPUs. On some computers, when the first wave of tasks is complete, only one GPU receives a new task. On other computers all three GPUs continue to run new tasks. My computers are running Windows 10 Pro 1632 or Windows 10 Pro 1909. The processors are AMD 8320, 8350 and 6300. The GPUs are NVIDIA GTX 980 Ti, GTX 1070 Ti and RTX 2070, all using driver 432.00. I need some help in solving this problem. Please. I have been BOINCing for as long as there has been a BOINC; over in SETI@Home I was in the top 20 participants. |
Mikey Joined: 8 May 09 Posts: 3107 Credit: 518,141,787 RAC: 24,791 |
Frank wrote: "I am relatively new to Milkyway@Home. I have installed the software on about 6 computers. Immediately after install everything runs well, all CPUs and all 3 GPUs. On some computers, when the first wave of tasks is complete, only one GPU receives a new task. On other computers all three GPUs continue to run new tasks." You need to add a cc_config.xml file to BOINC to tell it to use all the GPUs in the machine, on the machines where it isn't already doing that: `<cc_config> <options> <use_all_gpus>1</use_all_gpus> </options> </cc_config>` In Windows, use Notepad to make the file, and be sure when you save it that it does not tack on the '.txt' file extension. Save the file in the hidden directory C:\ProgramData\BOINC. You will then have to either tell BOINC to read the config files or stop and restart BOINC to make it read the file. Telling it to read them is done through the BOINC Manager (down by the clock) by clicking Options, then Read config files. If that doesn't work, you may have to load the drivers again, once for each GPU; normally Windows 10 picks them all up, but not always. |
Frank Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
Thank you for the response. I use the cc_config use_all_gpus option on all my computers. I checked and it is there in the ProgramData folder. I tried updating the NVIDIA drivers and it didn't help. So, I'm still chasing my problem. |
Joseph Stateson Joined: 18 Nov 08 Posts: 291 Credit: 2,440,626,564 RAC: 4,038,492 |
Frank wrote: "Thank you for the response. I use the cc_config use all gpus on all my computers. I checked and it is there in ProgramData file. I tried updating the NVIDIA drivers and it didn't help. So, I'm still chasing my problem." Possibly the problem is how the server fills the queue. When the last task completes, there is usually a 10-minute wait before more tasks download. This is a well-known problem with two solutions, as discussed here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4532 However, your problem seems different. I assume you checked the server status and there are jobs available? |
Mikey Joined: 8 May 09 Posts: 3107 Credit: 518,141,787 RAC: 24,791 |
Frank wrote: "Thank you for the response. I use the cc_config use all gpus on all my computers. I checked and it is there in ProgramData file. I tried updating the NVIDIA drivers and it didn't help. So, I'm still chasing my problem." The file should be placed in the BOINC folder under the ProgramData folder. When you first start BOINC, open the Event Log and look near the top; you should see all 3 GPUs listed. If not, BOINC isn't seeing your GPUs, and until it does, BOINC won't even try to use them. As for drivers, the latest ones are NOT BOINC-friendly at every project; go back to a pre-400 version driver. 390 works for me. Nvidia had a problem with one of the 4?? series, fixed it, and then rolled out the 445 series WITHOUT the fix. Unless you are a gamer, newer is not always better for BOINC. Also, if you have any SLI cables on your cards, you can remove them unless you game; BOINC is not set up to use multiple GPUs on one task. |
Keith Joined: 24 Jan 11 Posts: 676 Credit: 533,114,836 RAC: 228,570 |
Post the first 30 lines of your BOINC startup from the Event Log. We need to see whether BOINC even sees your GPUs. If the cards are not detected with OpenCL drivers, you won't get any GPU tasks. The startup should have lines similar to this:<br>31-Mar-2020 14:44:58 [---] Data directory: /home/keith/Desktop/BOINC<br>31-Mar-2020 14:44:59 [---] CUDA: NVIDIA GPU 0: GeForce RTX 2080 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 7982MB, 7742MB available, 10598 GFLOPS peak)<br>31-Mar-2020 14:44:59 [---] CUDA: NVIDIA GPU 1: GeForce RTX 2080 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 7979MB, 7473MB available, 10598 GFLOPS peak)<br>31-Mar-2020 14:44:59 [---] CUDA: NVIDIA GPU 2: GeForce GTX 1080 (driver version 440.64, CUDA version 10.2, compute capability 6.1, 8120MB, 7891MB available, 9523 GFLOPS peak)<br>31-Mar-2020 14:44:59 [---] OpenCL: NVIDIA GPU 0: GeForce RTX 2080 (driver version 440.64, device version OpenCL 1.2 CUDA, 7982MB, 7742MB available, 10598 GFLOPS peak)<br>31-Mar-2020 14:44:59 [---] OpenCL: NVIDIA GPU 1: GeForce RTX 2080 (driver version 440.64, device version OpenCL 1.2 CUDA, 7979MB, 7473MB available, 10598 GFLOPS peak)<br>31-Mar-2020 14:44:59 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 1080 (driver version 440.64, device version OpenCL 1.2 CUDA, 8120MB, 7891MB available, 9523 GFLOPS peak) |
Frank Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
Joseph, thank you for your response to my problem. My work queues all seem to have plenty of tasks (like hundreds for GPUs). Over the past few days, two computers that had been running one GPU task each suddenly started running tasks on all three GPUs. Two days later they went back to using only one, while their queues still had plenty of GPU tasks. I didn't make any changes to settings; the switching was totally unexpected. I am still baffled. |
Frank Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
Frank wrote: "Thank you for the response. I use the cc_config use all gpus on all my computers. I checked and it is there in ProgramData file. I tried updating the NVIDIA drivers and it didn't help. So, I'm still chasing my problem." Mikey, cc_config is where it belongs. As for the log:<br>4/12/2020 09:08:29 AM | | Running under account FrankMeade<br>4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)<br>4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)<br>4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)<br>4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)<br>4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)<br>4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)<br>4/12/2020 09:08:31 AM | | Host name: CENTER-1<br>4/12/2020 09:08:31 AM | | Processor: 6 AuthenticAMD AMD FX(tm)-6300 Six-Core Processor [Family 21 Model 2 Stepping 0]<br>4/12/2020 09:08:31 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt fma4 tce tbm topx page1gb rdtscp bmi1<br>4/12/2020 09:08:31 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18363.00)<br>4/12/2020 09:08:31 AM | | Memory: 7.90 GB physical, 14.15 GB virtual<br>4/12/2020 09:08:31 AM | | Disk: 297.49 GB total, 249.18 GB free<br>4/12/2020 09:08:31 AM | | Local time is UTC -7 hours<br>4/12/2020 09:08:31 AM | | No WSL found.<br>4/12/2020 09:08:31 AM | | Config: use all coprocessors<br>As shown, the GPU driver is 432.00, which is below 441.21, which is where NVIDIA screwed the pooch. With version 18362 of Windows 10 Pro, Microsoft took control of display drivers, and 432.00 is what Microsoft provides. It seems to work well. I am not a "gamer", so there are no SLI setups. Do you have any further ideas? Frank |
Keith Joined: 24 Jan 11 Posts: 676 Credit: 533,114,836 RAC: 228,570 |
Strange problem. I can only think that you are running a lot of CPU tasks and there aren't enough spare CPU threads to support the running GPU tasks on all the cards. Or that you have a very restrictive memory limit in place, but then I would have expected to see "not enough memory" messages in the log when a GPU task tries to run. |
Frank Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
A while ago I reported that Milkyway was using only 1 of 3 available GPUs on several of my computers. I was baffled, so I asked for help. Very quickly I had a number of respondents who provided helpful information. Unfortunately, after checking configurations and settings as suggested, I still had the problem. It had become a major mystery. Well, the mystery is solved. I know what was preventing my idle GPUs from getting work. It all boils down to CPU load. MW captures all the CPU power available for its CPU tasks. If MW tries to start a GPU task, it will normally fit within the CPU domain, using the small chunks of time not committed to the running CPU tasks. When MW tries to start a second and third GPU task, it will stop a couple of CPU tasks to make room for the GPUs. Hooray. About 6 CPU tasks are running along with 3 GPU tasks. I'm a happy camper, but it is a house of cards. If the six CPU tasks happen to be individual Separation tasks and an Nbody task comes looking for a home, MW will kick all the Separation tasks into the waiting-to-run queue. I guess the tasks in the waiting-to-run queue count against available CPU time. MW will not tolerate more than 100% of CPU time being used, so it waits for the running GPU tasks to complete and then won't allow any new ones to start. You are down to 6 processors and 1 GPU. I know because I've been there and done that. To prevent this from happening to me and my guys (computers), I just went to the Milkyway@Home Preferences and selected only one of the CPU task types (Nbody or Separation) to be allowed. They don't seem to play well together. Now I have all five of the computers I have committed to Milkyway running 3 GPUs each. So, thanks to Mikey and Joseph Stateson for the help and guiding nudges. Without them I would still be wandering in the Never-Never. |
Keith Joined: 24 Jan 11 Posts: 676 Credit: 533,114,836 RAC: 228,570 |
When trying to control application resource usage, you can resort to individual restrictions in an app_config file for the project. For example, you could limit the number of cores that an MT task is allowed to commandeer so that the regular CPU tasks still get resources; if you also reduce max_concurrent for both CPU apps, you save enough CPU resources for all the GPU tasks to run. https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration |
Frank Joined: 2 Nov 10 Posts: 25 Credit: 1,894,269,109 RAC: 0 |
Keith, I plumb forgot to credit you for your assistance during my journey through the dark ages. You provided valuable insight during the journey and continue to do so now. I'll research your current suggestion and see if I can improve the stability of my setup. Thanks again! |
Keith Joined: 24 Jan 11 Posts: 676 Credit: 533,114,836 RAC: 228,570 |
For the MT (multi-threaded) CPU tasks, you can use the nthreads limit in the app_config for the app to control how many threads the task is allowed to use. The example is in the reference docs previously linked (square brackets mark optional elements): `[<app_version> <app_name>Application_Name</app_name> [<plan_class>mt</plan_class>] [<avg_ncpus>x</avg_ncpus>] [<ngpus>x</ngpus>] [<cmdline>--nthreads 7</cmdline>] </app_version>]` |
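
For anyone landing on this thread later, the app_config suggestions from the last few posts could be sketched as a single file along the following lines. Treat it as an untested illustration, not a known-good setup: the app name `milkyway_nbody`, the plan class `mt`, and the thread count of 2 are assumptions. The names BOINC actually uses on your machine are listed in client_state.xml in the BOINC data directory. The file is named app_config.xml and goes in the project's folder under the BOINC data directory.

```xml
<!-- app_config.xml: illustrative sketch only; all names and values are assumptions. -->
<app_config>
  <!-- Run at most one multi-threaded Nbody task at a time.
       "milkyway_nbody" is an assumed app name; confirm it in client_state.xml. -->
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>1</max_concurrent>
  </app>

  <!-- Cap each Nbody task at 2 threads so the remaining cores stay free
       for Separation CPU tasks and for feeding the GPU tasks.
       avg_ncpus tells the BOINC scheduler how many CPUs to budget;
       the nthreads option is passed to the application itself. -->
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>2</avg_ncpus>
    <cmdline>--nthreads 2</cmdline>
  </app_version>
</app_config>
```

After saving the file, use Options, Read config files in the BOINC Manager (or restart BOINC) so the client picks it up. One caution on max_concurrent: it limits all running tasks of the named app, so if a project's CPU and GPU tasks share one app name, a low value would also hold back the GPU tasks. That is why the sketch limits only the Nbody app.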
©2023 Astroinformatics Group