Milkyway@Home Uses Only One of Three GPUs

Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69633 - Posted: 31 Mar 2020, 23:40:26 UTC

I am relatively new to Milkyway@Home. I have installed the software on about 6 computers. Immediately after install everything runs well: all CPUs and all 3 GPUs. On some computers, once the first wave of tasks is complete, only one GPU receives a new task. On other computers all three GPUs continue to run new tasks.
My computers are running Windows 10 Pro 1632 or Windows 10 Pro 1909. The processors are AMD 8320, 8350 and 6300. The GPUs are NVIDIA GTX 980 Ti, GTX 1070 Ti and RTX 2070, all using driver 432.00.
I need some help in solving this problem. Please. I have been BOINCing for as long as there has been a BOINC; over in SETI@Home I was in the top 20 participants.
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,976
RAC: 22,667
Message 69643 - Posted: 3 Apr 2020, 10:27:35 UTC - in response to Message 69633.  

You need to add a cc_config.xml file to BOINC to tell it to use all the GPUs in the machine, on those machines where it isn't already doing that:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

In Windows use Notepad to make the file, and be sure when you save it that Notepad does not tack on the '.txt' file extension. Save the file in the hidden directory C:\ProgramData\BOINC.
You will have to either tell BOINC to read the config files or stop and restart BOINC to make it read the file. Telling it to read them is done through the BOINC Manager (down by the clock) by clicking on Options, then Read config files.

If that doesn't work you may have to load the drivers again, once for each GPU. Normally Windows 10 picks them all up, but not always.
Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69650 - Posted: 5 Apr 2020, 14:58:39 UTC - in response to Message 69643.  

Thank you for the response. I use the cc_config use_all_gpus option on all my computers. I checked and it is there in the ProgramData folder. I tried updating the NVIDIA drivers and it didn't help. So, I'm still chasing my problem.
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69651 - Posted: 5 Apr 2020, 16:19:27 UTC - in response to Message 69650.  

Possibly the problem is how the server fills the queue. When the last task completes there is usually a 10 minute wait before more tasks download. This is a well-known problem with 2 solutions, as discussed here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4532
However, your problem seems different. I assume you checked the server status and there are jobs?
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,939,976
RAC: 22,667
Message 69654 - Posted: 6 Apr 2020, 11:28:42 UTC - in response to Message 69650.  
Last modified: 6 Apr 2020, 11:34:45 UTC

The file should be placed in the BOINC folder under the ProgramData folder.

When you first start BOINC, open the Event Log and look near the top of it. You should see all 3 GPUs listed. If not, BOINC isn't seeing your GPUs, and until it does it won't even try to use them.

As for drivers, the latest ones are NOT BOINC friendly at every project; go back to a pre-400 version driver. 390 works for me. Nvidia had a problem with one of the 4?? series and fixed it, then rolled out the 445 series WITHOUT the fix. Unless you are a gamer, newer is not always better for BOINC.

Also, if you have any SLI cables on your cards you can remove them too, unless you game. BOINC is not set up to use multiple GPUs on one task.
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 539,995,222
RAC: 86,890
Message 69656 - Posted: 6 Apr 2020, 18:51:21 UTC

Post the first 30 lines of your BOINC startup from the Event Log. We need to see if BOINC even sees your gpus. If you don't have the cards detected with OpenCL drivers, then you won't get any gpu tasks. The startup should have lines similar to this:

31-Mar-2020 14:44:58 [---] Data directory: /home/keith/Desktop/BOINC
31-Mar-2020 14:44:59 [---] CUDA: NVIDIA GPU 0: GeForce RTX 2080 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 7982MB, 7742MB available, 10598 GFLOPS peak)
31-Mar-2020 14:44:59 [---] CUDA: NVIDIA GPU 1: GeForce RTX 2080 (driver version 440.64, CUDA version 10.2, compute capability 7.5, 7979MB, 7473MB available, 10598 GFLOPS peak)
31-Mar-2020 14:44:59 [---] CUDA: NVIDIA GPU 2: GeForce GTX 1080 (driver version 440.64, CUDA version 10.2, compute capability 6.1, 8120MB, 7891MB available, 9523 GFLOPS peak)
31-Mar-2020 14:44:59 [---] OpenCL: NVIDIA GPU 0: GeForce RTX 2080 (driver version 440.64, device version OpenCL 1.2 CUDA, 7982MB, 7742MB available, 10598 GFLOPS peak)
31-Mar-2020 14:44:59 [---] OpenCL: NVIDIA GPU 1: GeForce RTX 2080 (driver version 440.64, device version OpenCL 1.2 CUDA, 7979MB, 7473MB available, 10598 GFLOPS peak)
31-Mar-2020 14:44:59 [---] OpenCL: NVIDIA GPU 2: GeForce GTX 1080 (driver version 440.64, device version OpenCL 1.2 CUDA, 8120MB, 7891MB available, 9523 GFLOPS peak)
Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69690 - Posted: 12 Apr 2020, 15:42:08 UTC - in response to Message 69651.  

Joseph, Thank you for your response to my problem.

My work queues all seem to have plenty of tasks (hundreds for the GPUs). Over the past few days, two computers that had been running one GPU task suddenly started running three. Two days later they went back to using only one, while their queues still had plenty of GPU tasks. I didn't make any changes to settings; the switching was totally unexpected. I am still baffled.
Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69691 - Posted: 12 Apr 2020, 17:09:47 UTC - in response to Message 69654.  

Mikey,
cc_config.xml is where it belongs.
As for the log:
4/12/2020 09:08:29 AM | | Running under account FrankMeade
4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:31 AM | | Host name: CENTER-1
4/12/2020 09:08:31 AM | | Processor: 6 AuthenticAMD AMD FX(tm)-6300 Six-Core Processor [Family 21 Model 2 Stepping 0]
4/12/2020 09:08:31 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt fma4 tce tbm topx page1gb rdtscp bmi1
4/12/2020 09:08:31 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18363.00)
4/12/2020 09:08:31 AM | | Memory: 7.90 GB physical, 14.15 GB virtual
4/12/2020 09:08:31 AM | | Disk: 297.49 GB total, 249.18 GB free
4/12/2020 09:08:31 AM | | Local time is UTC -7 hours
4/12/2020 09:08:31 AM | | No WSL found.
4/12/2020 09:08:31 AM | | Config: use all coprocessors

As shown, the GPU driver is 432.00, which is below 441.21, where NVIDIA screwed the pooch. With version 18362 of Windows 10 Pro, Microsoft took control of display drivers, and 432.00 is what Microsoft provides. It seems to work well.
I am not a "gamer", so there are no SLI setups.
Do you have any further ideas?
Frank
Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69692 - Posted: 12 Apr 2020, 18:14:58 UTC - in response to Message 69656.  

Keith Myers!
We meet again. The King of Ubuntu and the possessor of fine hardware.

The log from one of my computers that is running only one GPU says:
4/12/2020 09:08:29 AM | | Running under account FrankMeade
4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | CUDA: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 432.00, CUDA version 10.1, compute capability 6.1, 4096MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:30 AM | | OpenCL: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 432.00, device version OpenCL 1.2 CUDA, 8192MB, 3554MB available, 8186 GFLOPS peak)
4/12/2020 09:08:31 AM | | Host name: CENTER-1
4/12/2020 09:08:31 AM | | Processor: 6 AuthenticAMD AMD FX(tm)-6300 Six-Core Processor [Family 21 Model 2 Stepping 0]
4/12/2020 09:08:31 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni ssse3 fma cx16 sse4_1 sse4_2 popcnt aes f16c syscall nx lm avx svm sse4a osvw ibs xop skinit wdt fma4 tce tbm topx page1gb rdtscp bmi1
4/12/2020 09:08:31 AM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18363.00)
4/12/2020 09:08:31 AM | | Memory: 7.90 GB physical, 14.15 GB virtual
4/12/2020 09:08:31 AM | | Disk: 297.49 GB total, 249.18 GB free
4/12/2020 09:08:31 AM | | Local time is UTC -7 hours
4/12/2020 09:08:31 AM | | No WSL found.
4/12/2020 09:08:31 AM | | Config: use all coprocessors
Not much to see, except that there is a lot of computing power going to waste.

My problem is weird. I have five computers running MilkyWay and only one is using all 3 GPUs. The other 4 computers run only one GPU (including the computer that provided the log data above). Every once in a while one of the lazy guys will start running all three GPUs and will do so for a day or two before reverting to one GPU. It's weird.
I'm wondering if my Internet connection could be at fault. It has been flaky for about a week. I'll get that checked out tomorrow.
Frank
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 539,995,222
RAC: 86,890
Message 69693 - Posted: 12 Apr 2020, 18:54:15 UTC

Strange problem. I can only think that you are running a lot of CPU tasks and there aren't enough spare CPU threads to support all the running GPU tasks on all the cards.

Or that you have a very restrictive memory limit in place, but then I would have expected to see "not enough memory" messages in the log when a GPU task tries to run.
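
One way to test the CPU-starvation theory is to have the client log its scheduling decisions. What follows is only a sketch of a cc_config.xml that keeps use_all_gpus and switches on two of the client's debug log flags. Picking cpu_sched_debug and coproc_debug as the useful pair is an assumption on my part, and both are chatty, so set them back to 0 once you have what you need.

```xml
<cc_config>
  <log_flags>
    <!-- assumption: these two flags are the relevant ones for this problem -->
    <!-- logs which tasks the client decides to run or preempt -->
    <cpu_sched_debug>1</cpu_sched_debug>
    <!-- logs details of coprocessor (GPU) assignment -->
    <coproc_debug>1</coproc_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```

After saving, use Options, Read config files in the BOINC Manager (or restart BOINC) and watch the Event Log while a machine is stuck on one GPU.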
Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69723 - Posted: 16 Apr 2020, 20:13:19 UTC
Last modified: 16 Apr 2020, 20:19:19 UTC

A while ago I reported that MilkyWay was using only 1 of 3 available GPUs on several of my computers. I was baffled so I asked for help. Very quickly I had a number of respondents who provided helpful information. Unfortunately, after checking configurations and settings as suggested, I still had the problem. It had become a major mystery.
Well, the mystery is solved. I know what was preventing my idle GPUs from getting work. It all boils down to CPU load. MW captures all the CPU power available for its CPU tasks. If MW tries to start a GPU task, it will normally fit within the CPU domain, using the small chunks of time not committed to the running CPU tasks. When MW tries to start a second and third GPU task, it will stop a couple of CPU tasks to make room for the GPUs. Hooray. About 6 CPU tasks are running along with 3 GPU tasks. I'm a happy camper, but it is a house of cards.
If the six CPU tasks happen to be individual Separation tasks and an Nbody task comes looking for a home, MW will kick all the Separation tasks into the waiting-to-run queue. I guess those tasks in the waiting-to-run queue count against available CPU time. MW will not tolerate more than 100% of CPU time being used, so it waits for the GPU tasks to complete, then won't allow new ones to start. You are down to 6 processors and 1 GPU. I know because I've been there and done that.
To prevent this from happening to me and my guys (computers), I just went to the Milkyway@Home preferences and selected only one of the CPU task types (Nbody or Separation) to be allowed. They don't seem to play well together. Now I have all five of the computers I have committed to MilkyWay running 3 GPUs each.
So, thanks to Mikey and Joseph Stateson for the help and guiding nudges. Without them I would still be wandering in the Never-Never.
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 539,995,222
RAC: 86,890
Message 69724 - Posted: 16 Apr 2020, 20:26:47 UTC - in response to Message 69723.  

When trying to control application resource usage, you can resort to individual restrictions in an app_config.xml file for the project.

For example, you could limit the number of cores that an MT task is allowed to commandeer so that the regular CPU tasks still get resources. If you also reduce max_concurrent for both CPU apps, you save enough CPU resources for all the GPU tasks to run.

https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
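
As a rough illustration of the max_concurrent idea, here is a minimal app_config.xml that lets only one N-body task run at a time. The app name milkyway_nbody is an assumption; check the app names in client_state.xml or the Event Log before using it. As far as I know, max_concurrent counts every running task of an app regardless of version, so a limit on the Separation app would also count its GPU tasks, which is why this sketch leaves Separation alone.

```xml
<app_config>
  <app>
    <!-- assumed app name for the N-body (multi-threaded) application -->
    <name>milkyway_nbody</name>
    <!-- run at most one N-body task at a time -->
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
```

The file goes in the MilkyWay project folder under the BOINC data directory, and is picked up with Options, Read config files in the BOINC Manager.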
Frank
Joined: 2 Nov 10
Posts: 25
Credit: 1,894,269,109
RAC: 0
Message 69725 - Posted: 17 Apr 2020, 14:51:25 UTC - in response to Message 69724.  

Keith,
I plumb forgot to credit you for your assistance during my journey through the dark ages. You provided valuable insight during the journey and continue to do so now. I'll research your current suggestion and see if I can improve the stability of my setup. Thanks again!
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 539,995,222
RAC: 86,890
Message 69728 - Posted: 17 Apr 2020, 21:29:18 UTC - in response to Message 69725.  

For the MT or multi-threaded cpu tasks, you can use the nthreads limit in the app_config for the app to control how many threads the task is allowed to use. The example is in the reference docs previously linked.
  [<app_version>
       <app_name>Application_Name</app_name>
       [<plan_class>mt</plan_class>]
       [<avg_ncpus>x</avg_ncpus>]
       [<ngpus>x</ngpus>]
       [<cmdline>--nthreads 7</cmdline>]
   </app_version>]
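
The square brackets in that template only mark optional elements and are not typed into the file. Filled in for a six-core machine, it might look like the sketch below. The app name milkyway_nbody and the value 3 are both assumptions, chosen here to leave three cores free for feeding three GPU tasks; adjust them to your own hardware.

```xml
<app_config>
  <app_version>
    <!-- assumed app name; confirm it in client_state.xml -->
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <!-- how many CPUs the client should budget for each task (assumed value) -->
    <avg_ncpus>3</avg_ncpus>
    <!-- how many threads the application itself is told to use -->
    <cmdline>--nthreads 3</cmdline>
  </app_version>
</app_config>
```

Keeping avg_ncpus and --nthreads at the same number means the client's bookkeeping matches what the task really uses.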
