Welcome to MilkyWay@home

BOINC sees 2 GPUs but only uses 1

Questions and Answers : Unix/Linux : BOINC sees 2 GPUs but only uses 1
Cat22
Joined: 26 May 20
Posts: 23
Credit: 668,470,070
RAC: 18,462
Message 74217 - Posted: 21 Sep 2022, 18:54:32 UTC

I just added a 2nd GPU. At BOINC startup it sees both GPUs, but it only uses one:
# lspci|grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2488 (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2060] (rev a1)

21-Sep-2022 09:49:35 [---] CUDA: NVIDIA GPU 0: NVIDIA GeForce RTX 3070 (driver version 515.65, CUDA version 11.7, compute capability 8.6, 4096MB, 3958MB available, 20314 GFLOPS peak)
21-Sep-2022 09:49:35 [---] CUDA: NVIDIA GPU 1: NVIDIA GeForce RTX 2060 (driver version 515.65, CUDA version 11.7, compute capability 7.5, 4096MB, 3970MB available, 12902 GFLOPS peak)
21-Sep-2022 09:49:35 [---] OpenCL: NVIDIA GPU 0: NVIDIA GeForce RTX 3070 (driver version 515.65.01, device version OpenCL 3.0 CUDA, 7979MB, 3958MB available, 20314 GFLOPS peak)
21-Sep-2022 09:49:35 [---] OpenCL: NVIDIA GPU 1: NVIDIA GeForce RTX 2060 (driver version 515.65.01, device version OpenCL 3.0 CUDA, 5935MB, 3970MB available, 12902 GFLOPS peak)
21-Sep-2022 09:49:35 [---] Processor: 16 GenuineIntel Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz [Family 6 Model 158 Stepping 13]
21-Sep-2022 09:49:35 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
21-Sep-2022 09:49:35 [---] OS: Linux openSUSE: openSUSE Leap 15.3 [5.3.18-59.27-preempt]
21-Sep-2022 09:49:35 [---] Memory: 15.54 GB physical, 4.00 GB virtual
21-Sep-2022 09:49:35 [---] Config: use all coprocessors

I tried doing a project reset but it didn't help.
Then I went to the MilkyWay "Your computers" page and it shows this:
7.8.3 	GenuineIntel
Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz [Family 6 Model 158 Stepping 13]  (16 processors) 	
[2] NVIDIA NVIDIA GeForce RTX 3070 (4095MB) driver: 515.65 OpenCL: 3.0 	
Linux openSUSE openSUSE Leap 15.3 [5.3.18-59.27-preempt] 	21 Sep 2022, 18:29:20 UTC

I have a cc_config:
<cc_config>
    <options>
       <use_all_gpus>1</use_all_gpus>
    </options>
<!--    <log_flags>
       <coproc_debug>1</coproc_debug>
       <task_debug>1</task_debug>
    </log_flags>
-->
</cc_config>

Here is my coproc_info.xml. I wanted to add it as an attachment, but I don't see how to do that.
 # cat coproc_info.xml
    <coprocs>
    <have_cuda>1</have_cuda>
    <cuda_version>11070</cuda_version>
<coproc_cuda>
   <count>1</count>
   <name>NVIDIA GeForce RTX 3070</name>
   <available_ram>4150263808.000000</available_ram>
   <have_cuda>1</have_cuda>
   <have_opencl>0</have_opencl>
   <peak_flops>20313600000000.000000</peak_flops>
   <cudaVersion>11070</cudaVersion>
   <drvVersion>51565</drvVersion>
   <totalGlobalMem>4294967295.000000</totalGlobalMem>
   <sharedMemPerBlock>49152.000000</sharedMemPerBlock>
   <regsPerBlock>65536</regsPerBlock>
   <warpSize>32</warpSize>
   <memPitch>2147483647.000000</memPitch>
   <maxThreadsPerBlock>1024</maxThreadsPerBlock>
   <maxThreadsDim>1024 1024 64</maxThreadsDim>
   <maxGridSize>2147483647 65535 65535</maxGridSize>
   <clockRate>1725000</clockRate>
   <totalConstMem>65536.000000</totalConstMem>
   <major>8</major>
   <minor>6</minor>
   <textureAlignment>512.000000</textureAlignment>
   <deviceOverlap>1</deviceOverlap>
   <multiProcessorCount>46</multiProcessorCount>
<pci_info>
   <bus_id>1</bus_id>
   <device_id>0</device_id>
   <domain_id>0</domain_id>
</pci_info>
</coproc_cuda>
<coproc_cuda>
   <count>1</count>
   <name>NVIDIA GeForce RTX 2060</name>
   <available_ram>4162846720.000000</available_ram>
   <have_cuda>1</have_cuda>
   <have_opencl>0</have_opencl>
   <peak_flops>12902400000000.000000</peak_flops>
   <cudaVersion>11070</cudaVersion>
   <drvVersion>51565</drvVersion>
   <totalGlobalMem>4294967295.000000</totalGlobalMem>
   <sharedMemPerBlock>49152.000000</sharedMemPerBlock>
   <regsPerBlock>65536</regsPerBlock>
   <warpSize>32</warpSize>
   <memPitch>2147483647.000000</memPitch>
   <maxThreadsPerBlock>1024</maxThreadsPerBlock>
   <maxThreadsDim>1024 1024 64</maxThreadsDim>
   <maxGridSize>2147483647 65535 65535</maxGridSize>
   <clockRate>1680000</clockRate>
   <totalConstMem>65536.000000</totalConstMem>
   <major>7</major>
   <minor>5</minor>
   <textureAlignment>512.000000</textureAlignment>
   <deviceOverlap>1</deviceOverlap>
   <multiProcessorCount>30</multiProcessorCount>
<pci_info>
   <bus_id>2</bus_id>
   <device_id>0</device_id>
   <domain_id>0</domain_id>
</pci_info>
</coproc_cuda>
   <nvidia_opencl>
      <name>NVIDIA GeForce RTX 3070</name>
      <vendor>NVIDIA Corporation</vendor>
      <vendor_id>4318</vendor_id>
      <available>1</available>
      <half_fp_config>0</half_fp_config>
      <single_fp_config>191</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd</extensions>
      <global_mem_size>8366784512</global_mem_size>
      <local_mem_size>49152</local_mem_size>
      <max_clock_frequency>1725</max_clock_frequency>
      <max_compute_units>46</max_compute_units>
      <nv_compute_capability_major>8</nv_compute_capability_major>
      <nv_compute_capability_minor>6</nv_compute_capability_minor>
      <amd_simd_per_compute_unit>0</amd_simd_per_compute_unit>
      <amd_simd_width>0</amd_simd_width>
      <amd_simd_instruction_width>0</amd_simd_instruction_width>
      <opencl_platform_version>OpenCL 3.0 CUDA 11.7.101</opencl_platform_version>
      <opencl_device_version>OpenCL 3.0 CUDA</opencl_device_version>
      <opencl_driver_version>515.65.01</opencl_driver_version>
      <device_num>0</device_num>
      <peak_flops>20313600000000.000000</peak_flops>
      <opencl_available_ram>4150263808.000000</opencl_available_ram>
      <opencl_device_index>0</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
   </nvidia_opencl>
   <nvidia_opencl>
      <name>NVIDIA GeForce RTX 2060</name>
      <vendor>NVIDIA Corporation</vendor>
      <vendor_id>4318</vendor_id>
      <available>1</available>
      <half_fp_config>0</half_fp_config>
      <single_fp_config>191</single_fp_config>
      <double_fp_config>63</double_fp_config>
      <endian_little>1</endian_little>
      <execution_capabilities>1</execution_capabilities>
      <extensions>cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info cl_khr_external_semaphore cl_khr_external_memory cl_khr_external_semaphore_opaque_fd cl_khr_external_memory_opaque_fd</extensions>
      <global_mem_size>6222970880</global_mem_size>
      <local_mem_size>49152</local_mem_size>
      <max_clock_frequency>1680</max_clock_frequency>
      <max_compute_units>30</max_compute_units>
      <nv_compute_capability_major>7</nv_compute_capability_major>
      <nv_compute_capability_minor>5</nv_compute_capability_minor>
      <amd_simd_per_compute_unit>0</amd_simd_per_compute_unit>
      <amd_simd_width>0</amd_simd_width>
      <amd_simd_instruction_width>0</amd_simd_instruction_width>
      <opencl_platform_version>OpenCL 3.0 CUDA 11.7.101</opencl_platform_version>
      <opencl_device_version>OpenCL 3.0 CUDA</opencl_device_version>
      <opencl_driver_version>515.65.01</opencl_driver_version>
      <device_num>1</device_num>
      <peak_flops>12902400000000.000000</peak_flops>
      <opencl_available_ram>4162846720.000000</opencl_available_ram>
      <opencl_device_index>1</opencl_device_index>
      <warn_bad_cuda>0</warn_bad_cuda>
   </nvidia_opencl>
<warning>NVIDIA library reports 2 GPUs</warning>
<warning>ATI: libaticalrt.so: cannot open shared object file: No such file or directory</warning>
    </coprocs>

I have plenty of GPU tasks, so that shouldn't be the issue.
I tried swapping slots on the motherboard and it does the same thing: whatever is in the first slot is what gets used, and the GPU in the other slot is ignored even though BOINC sees it on startup.
What am I missing?
ID: 74217

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 1
Message 74227 - Posted: 21 Sep 2022, 21:00:48 UTC - in response to Message 74217.  

I have used this in the past with good luck. It yields 3 parallel instances running on the AMD GPU and 1 instance on the NVIDIA GPU.

<app_config>
<app_version>
<app_name>milkyway</app_name>
<plan_class>opencl_ati_101</plan_class>
<avg_ncpus>0.866</avg_ncpus>
<ngpus>0.333</ngpus>
</app_version>
<app_version>
<app_name>milkyway</app_name>
<plan_class>opencl_nvidia_101</plan_class>
<avg_ncpus>0.866</avg_ncpus>
<ngpus>1</ngpus>
</app_version>
<!--Your comment-->
</app_config>
ID: 74227

Cat22
Joined: 26 May 20
Posts: 23
Credit: 668,470,070
RAC: 18,462
Message 74230 - Posted: 22 Sep 2022, 0:24:41 UTC - in response to Message 74227.  

Does this look about right?
<app_config>
<app_version>
<app_name>milkyway</app_name>
<plan_class>opencl_nvidia_101</plan_class>
<avg_ncpus>0.866</avg_ncpus>
<ngpus>0.5</ngpus>
</app_version>
<!--Your comment-->
</app_config>
ID: 74230

Cat22
Joined: 26 May 20
Posts: 23
Credit: 668,470,070
RAC: 18,462
Message 74231 - Posted: 22 Sep 2022, 0:40:16 UTC - in response to Message 74230.  

Just want to confirm: you are on Linux, correct? Not Windows or Mac.
ID: 74231

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 1
Message 74232 - Posted: 22 Sep 2022, 2:54:27 UTC - in response to Message 74230.  

I am no XML expert, but I think that gets you 2 instances running on one GPU; it does not address the second GPU at all.
ID: 74232

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 1
Message 74233 - Posted: 22 Sep 2022, 2:55:19 UTC - in response to Message 74231.  

Nope, Windows 10 Pro.
ID: 74233

Cat22
Joined: 26 May 20
Posts: 23
Credit: 668,470,070
RAC: 18,462
Message 74238 - Posted: 22 Sep 2022, 9:22:13 UTC - in response to Message 74232.  

I am no xml expert, but I think that gets you 2 instances running on one GPU, but does not address the second GPU at all.

So, how should it be done?
ID: 74238

mikey
Joined: 8 May 09
Posts: 3333
Credit: 524,010,781
RAC: 3,159
Message 74241 - Posted: 22 Sep 2022, 11:31:25 UTC - in response to Message 74238.  

I am no xml expert, but I think that gets you 2 instances running on one GPU, but does not address the second GPU at all.

So, how should it be done?


I believe this line makes it run 2 tasks at the same time: <ngpus>0.5</ngpus>

Changing that value to 1.0 instead of 0.5 would make it run 1 task at a time, while changing it to 0.33 would make it run 3 tasks at the same time, assuming the GPU can handle it.
ID: 74241

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 1
Message 74242 - Posted: 22 Sep 2022, 16:11:59 UTC - in response to Message 74238.  

I am no xml expert, but I think that gets you 2 instances running on one GPU, but does not address the second GPU at all.

So, how should it be done?
Within the app_config file, put in two app_version sections, like I did below. Mine just happened to be one each for AMD and NVIDIA, but yours will be two NVIDIA. Try this and see if it works. Of course you will need to adjust avg_ncpus and ngpus in each app_version to suit your particular situation. Just for grins, start with 1 CPU and 1 GPU for each app_version; if that works, fine-tune from there. Good luck!

<app_config>
<app_version>
<app_name>milkyway</app_name>
<plan_class>opencl_ati_101</plan_class>
<avg_ncpus>0.866</avg_ncpus>
<ngpus>0.333</ngpus>
</app_version>
<app_version>
<app_name>milkyway</app_name>
<plan_class>opencl_nvidia_101</plan_class>
<avg_ncpus>0.866</avg_ncpus>
<ngpus>1</ngpus>
</app_version>
<!--Your comment-->
</app_config>
ID: 74242

AndreyOR
Joined: 13 Oct 21
Posts: 44
Credit: 225,677,188
RAC: 17,614
Message 74251 - Posted: 23 Sep 2022, 12:09:38 UTC

It seems like you have an older version of BOINC; try updating to the latest one, and perhaps also check that your GPU drivers are up to date. Try shutting down BOINC, deleting coproc_info.xml, and having BOINC recreate it on restart. It seems to be reporting your GPUs to the website incorrectly.

The explanation and format of cc_config.xml and app_config.xml are here: https://boinc.berkeley.edu/wiki/Client_configuration. They're the same for any OS. I'm assuming the files are in the correct places and that you restart BOINC after changing them.

To use multiple GPUs you should only need the use_all_gpus flag in cc_config.xml. I believe app_config.xml only controls how many tasks of a given app (in general) or app version (more specifically) run concurrently; I don't think it controls the number of GPUs used. Also, if you have multiple app_version entries with the same app_name and plan_class in app_config.xml, I think the last one just overrides the first.

I don't think you can distinguish different GPUs of the same plan class without going the complicated route of an Anonymous Platform setup: https://boinc.berkeley.edu/wiki/Anonymous_platform. That would be the last-resort option for getting multiple GPUs used.
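The shut-down, delete, and restart steps above might look like this on a systemd-based distro (a sketch only; the service name boinc-client and the data-directory path /var/lib/boinc are common defaults, but both vary by install):

```shell
# Stop the client so it does not rewrite the file while we remove it
sudo systemctl stop boinc-client

# The data directory is commonly /var/lib/boinc or /var/lib/boinc-client
sudo rm -f /var/lib/boinc/coproc_info.xml

# On restart, BOINC re-probes the GPUs and recreates coproc_info.xml
sudo systemctl start boinc-client
```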
ID: 74251

Keith Myers
Joined: 24 Jan 11
Posts: 709
Credit: 548,896,591
RAC: 58,447
Message 74264 - Posted: 24 Sep 2022, 2:39:06 UTC
Last modified: 24 Sep 2022, 2:41:17 UTC

With dissimilar compute capability, BOINC will only use the most capable card, being the 3070. So you have to tell it to use the 2060 also with a <use_all_gpus>1</use_all_gpus> statement in the cc_config.xml file.

You seem to have done that but it doesn't look like it is getting picked up correctly. The coproc_info.xml file picks up both cards correctly.

The problem is likely that the cc_config.xml file contains extraneous characters that prevent it from being read, or that the filename is incorrect. This is common when the file was edited on Windows: the editor appends .txt to the xml filename, or it was not set to save the file as <all files> type. Windows file editors often write both <CR><LF> line endings and an <EOF> marker into the file.

Make sure to save the file with ANSI or UTF-8 encoding; otherwise BOINC may skip over the file instead of reading it correctly.
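On Linux you can check for (and strip) Windows line endings with standard tools. A sketch, using a throwaway demo file in /tmp so nothing real is touched:

```shell
# Make a demo config with Windows CRLF endings (illustrative only)
printf '<cc_config>\r\n  <options>\r\n    <use_all_gpus>1</use_all_gpus>\r\n  </options>\r\n</cc_config>\r\n' > /tmp/cc_config.xml

# `file` reports "with CRLF line terminators" if a Windows editor saved it
file /tmp/cc_config.xml

# Strip the carriage returns, then verify
tr -d '\r' < /tmp/cc_config.xml > /tmp/cc_config.fixed.xml
mv /tmp/cc_config.fixed.xml /tmp/cc_config.xml
file /tmp/cc_config.xml
```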

Recreate the cc_config.xml file and put it in the BOINC folder and make sure the filename is correct.

If the file is correct, it will be shown as being read in the BOINC Event Log startup.
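One way to check that without restarting (a sketch; boinccmd ships with the BOINC client, and depending on your setup it may need --host or --passwd arguments):

```shell
# Ask the running client to re-read cc_config.xml
boinccmd --read_cc_config

# Then look for the acknowledgement in the client's message stream;
# with use_all_gpus set you should see "Config: use all coprocessors"
boinccmd --get_messages | grep -i coprocessor
```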
ID: 74264

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 1
Message 74276 - Posted: 24 Sep 2022, 18:59:58 UTC

@cat22

any luck?
ID: 74276

Cat22
Joined: 26 May 20
Posts: 23
Credit: 668,470,070
RAC: 18,462
Message 74381 - Posted: 8 Oct 2022, 8:30:01 UTC

Well, I installed BOINC version 7.18.1 and now it only sees 1 GPU.
Is there some conflict between an RTX 3070 and an RTX 2060 that is causing this?
Do I need to build my own copy from source?
I did try building from source, but I can't get it to stop complaining that there are no widgets (I'm on Linux), and I sure wish I knew which widgets it lacks.
It seems like configure should say specifically what is missing, not just "oh, you don't have any widgets". What widgets? What version?
So I'm pretty well stuck on building from source.
I have 2 other systems running dual GPUs and they work fine: one has an RTX 2060 and a GTX 1660 Ti, and the other runs two GTX 1660 Ti's.
Neither one has a problem with that config.
I sure would appreciate some guidance.
TIA
ID: 74381

Keith Myers
Joined: 24 Jan 11
Posts: 709
Credit: 548,896,591
RAC: 58,447
Message 74398 - Posted: 10 Oct 2022, 7:09:50 UTC - in response to Message 74381.  
Last modified: 10 Oct 2022, 7:11:33 UTC

No, there is no issue running both the 3070 and 2060 together.
You really should build from a different, newer BOINC branch. The 7.18.1 branch was only intended for Android clients; use the latest 7.20.2 branch tag instead.

The widgets it is complaining about are the wxWidgets libraries, which are only needed to build the Manager. The client does not need them, and the client is the only part you should be building.

You need to adjust your build command to only build the client.

Review the build process:
https://boinc.berkeley.edu/trac/wiki/SoftwarePrereqsUnix
https://boinc.berkeley.edu/trac/wiki/BuildSystem

You only really need this:

./_autosetup -f
./configure --enable-client --disable-manager


But building the client is not necessary just to get two GPUs running. You have something else weird going on.
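Fleshing out the two commands above, a client-only build from a source checkout might look like this (a sketch; the repository URL is the official BOINC GitHub project, but the exact tag name is an assumption, so list the tags first and pick the current 7.20 release):

```shell
# Fetch the source and check out a release tag (tag name illustrative)
git clone https://github.com/BOINC/boinc.git
cd boinc
git tag | grep 7.20                        # find the exact release tag
git checkout client_release/7.20/7.20.2

# Configure for the client only; wxWidgets is not needed for this
./_autosetup -f
./configure --enable-client --disable-manager
make
```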
ID: 74398


©2024 Astroinformatics Group