1)
Questions and Answers :
Unix/Linux :
Nvidia Quadro K6000 Low Utilization - Slow Processing
(Message 69313)
Posted 26 Nov 2019 by Samyah93
Post: Hello,

I've tried swapping the two cards in my workstation, and it is still the K6000 that shows much lower utilization. The stderr file for the K6000 also still shows fewer blocks/chunk (14 vs. 146 on the Tesla K40) and more chunks (a chunk estimate of 10 and 11 actual chunks vs. 1 on the Tesla K40):

<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
  Name: NVIDIA CUDA
  Version: OpenCL 1.2 CUDA 10.2.95
  Vendor: NVIDIA Corporation
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addr$
  Profile: FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'Quadro K6000' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 440.33.01
Version: OpenCL 1.2 CUDA
Compute capability: 3.5
Max compute units: 15
Clock frequency: 901 Mhz
Global mem size: 11988566016
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Build log:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Estimated Nvidia GPU GFLOP/s: 865 SP GFLOP/s, 108 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 3840 with 14 blocks/chunk
Using clWaitForEvents() for polling with initial wait of 13 ms (mode 0)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 10
Num chunks: 11
Chunk size: 53760
Added area: 31360
Effective area: 591360
Initial wait: 13 ms
Integration time: 43.993598 s. Average time per iteration = 137.479993 ms
Integral 0 time = 44.114030 s
Running likelihood with 10 stars
Likelihood time = 0.000139 s
<background_integral> 0.000023991637849 </background_integral>
<stream_integral> 0.013916549538950 0.081085356418659 0.551135322056047 </stream_integral>
<background_likelihood> -3.108318245877400 </background_likelihood>
<stream_only_likelihood> -217.020095058481104 -169.073239934746482 -169.106244918774621 </stream_only_likelihood>
<search_likelihood> -1.633680829801995 </search_likelihood>

I also tried updating the drivers. The updated output for nvidia-smi is:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K6000        On   | 00000000:02:00.0 Off |                    0 |
| 35%   58C    P0    74W / 225W |    279MiB / 11433MiB |     15%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          On   | 00000000:03:00.0 Off |                    0 |
| 42%   71C    P0   168W / 235W |    180MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1905      G   /usr/lib/xorg/Xorg                            24MiB |
|    0      5520      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   241MiB |
|    1      5683      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   169MiB |
+-----------------------------------------------------------------------------+

I also used lspci to get information about the bus speed, and both cards are at PCIe 3.0 x16.
The output also shows that the Tesla K40 is in fact a GK110 (derivative) card:

02:00.0 VGA compatible controller: NVIDIA Corporation GK110GL [Quadro K6000] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation GK110GL [Quadro K6000]
    Physical Slot: 2
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 32
    NUMA node: 0
    Region 0: Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at d0000000 (64-bit, prefetchable) [size=32M]
    Region 5: I/O ports at 1000 [size=128]
    [virtual] Expansion ROM at f4080000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee16000 Data: 4022
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 256 bytes, MaxReadReq 1024 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <1us, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [420 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900 v1] #19
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

03:00.0 3D controller: NVIDIA Corporation GK110BGL [Tesla K40c] (rev a1)
    Subsystem: Hewlett-Packard Company GK110BGL [Tesla K40c]
    Physical Slot: 5
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 33
    NUMA node: 0
    Region 0: Memory at f2000000 (32-bit, non-prefetchable) [size=16M]
    Region 1: Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Region 3: Memory at f0000000 (64-bit, prefetchable) [size=32M]
    Capabilities: [60] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee35000 Data: 4021
    Capabilities: [78] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
        DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 256 bytes, MaxReadReq 1024 bytes
        DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend+
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <1us, L1 <4us
            ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
            EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
    Capabilities: [100 v1] Virtual Channel
        Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb: Fixed- WRR32- WRR64- WRR128-
        Ctrl: ArbSelect=Fixed
        Status: InProgress-
        VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
            Status: NegoPending- InProgress-
    Capabilities: [128 v1] Power Budgeting <?>
    Capabilities: [420 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
    Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900 v1] #19
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

I tried running SETI before I swapped the cards, and it showed near 100% utilization on both cards:

Mon Nov 25 16:53:34 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 00000000:02:00.0 Off |                    0 |
| 37%   73C    P0   154W / 235W |    270MiB / 11441MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K6000        On   | 00000000:03:00.0 Off |                    0 |
| 44%   73C    P0   148W / 225W |    296MiB / 11433MiB |     92%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2788      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_sah   258MiB |
|    1      2036      G   /usr/lib/xorg/Xorg                            24MiB |
|    1      2773      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_sah   258MiB |
+-----------------------------------------------------------------------------+
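A note on the "Estimated Nvidia GPU GFLOP/s" line in the K6000 stderr above: the logged figures can be reproduced with simple arithmetic. The sketch below is only a numerical observation, not a description of how milkyway_separation computes its estimate. The 32 lanes per compute unit and the 1/8 and 1/2 double-precision ratios are fitted assumptions chosen because they match the logs; the K40 values (745 MHz, 715 SP, 358 DP) are taken from the stderr in message 69293.

```python
def sp_gflops(compute_units, clock_mhz, lanes_per_unit=32, flops_per_lane=2):
    """Single-precision estimate in GFLOP/s.

    lanes_per_unit=32 is an assumption fitted to the logged numbers,
    not a value confirmed by the application or by NVIDIA.
    """
    return compute_units * lanes_per_unit * flops_per_lane * clock_mhz / 1000.0


# Device values as printed in the stderr output (15 compute units each).
k6000_sp = sp_gflops(15, 901)  # Quadro K6000 at 901 MHz
k40_sp = sp_gflops(15, 745)    # Tesla K40c at 745 MHz

# Assumed DP:SP ratios that happen to reproduce the logged DP figures.
k6000_dp = k6000_sp / 8
k40_dp = k40_sp / 2

if __name__ == "__main__":
    print(f"K6000: {k6000_sp:.0f} SP, {k6000_dp:.0f} DP (log: 865 SP, 108 DP)")
    print(f"K40c:  {k40_sp:.0f} SP, {k40_dp:.0f} DP (log: 715 SP, 358 DP)")
```

If this reading is right, the application estimates roughly 3.3 times less double-precision throughput for the K6000 than for the K40, which may be related to the smaller blocks/chunk value it picks for the K6000. The logs alone do not confirm that link.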
2)
Nvidia Quadro K6000 Low Utilization - Slow Processing
(Message 69305)
Posted 25 Nov 2019 by Samyah93
Post: Hello,

Thanks for the information! It is my understanding that the GK110 ended up being used in the Tesla K40 at launch, and that the GK180 was just a GK110 with some minor tweaks. Both cards are in PCIe x16 slots, and both work correctly for non-BOINC (quantum chemistry) calculations.

Is there a way to check whether the K6000 is being starved of resources? As was pointed out, the two cards also differ significantly in the number of blocks processed simultaneously.
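One rough way to look at the starvation question is to sample utilization, power draw, and performance state together for both cards. The sketch below is a minimal example, assuming `nvidia-smi` is on the PATH and supports the `--query-gpu` fields `index`, `name`, `utilization.gpu`, `power.draw`, and `pstate`. The `SAMPLE` text is hand-written from the figures in message 69301, not captured output, and the 50% threshold is arbitrary.

```python
import csv
import io
import subprocess

FIELDS = ["index", "name", "utilization.gpu", "power.draw", "pstate"]

# Hand-written sample built from the nvidia-smi figures in message 69301.
SAMPLE = "0, Tesla K40c, 100, 162.00, P0\n1, Quadro K6000, 17, 79.00, P0\n"


def query_gpus():
    """Run nvidia-smi once and return its CSV text (needs the NVIDIA driver)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse(text):
    """Turn the CSV text into one dict per GPU, keyed by field name."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if record:
            rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def underused(rows, threshold=50.0):
    """Names of GPUs whose utilization is below the (arbitrary) threshold."""
    return [r["name"] for r in rows if float(r["utilization.gpu"]) < threshold]


if __name__ == "__main__":
    try:
        text = query_gpus()
    except (FileNotFoundError, subprocess.CalledProcessError):
        text = SAMPLE  # fall back to the sample when nvidia-smi is unavailable
    for row in parse(text):
        print(row)
    print("Below threshold:", underused(parse(text)))
```

A card that sits in P0 with low utilization and low power draw is more likely waiting for work than being throttled, but this check cannot say why it is waiting.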
3)
Nvidia Quadro K6000 Low Utilization - Slow Processing
(Message 69304)
Posted 25 Nov 2019 by Samyah93
Post: Thanks! I will get back to you on whether SETI has this problem, since I have not tried it yet.
4)
Nvidia Quadro K6000 Low Utilization - Slow Processing
(Message 69301)
Posted 25 Nov 2019 by Samyah93
Post: Sorry about that. Here is the output from nvidia-smi while running jobs:

Sun Nov 24 21:27:57 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          On   | 00000000:02:00.0 Off |                    0 |
| 44%   75C    P0   162W / 235W |    125MiB / 11441MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K6000        On   | 00000000:03:00.0 Off |                    0 |
| 34%   61C    P0    79W / 225W |    210MiB / 11434MiB |     17%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     10982      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   114MiB |
|    1      3724      G   /usr/lib/xorg/Xorg                            19MiB |
|    1     10881      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   177MiB |
+-----------------------------------------------------------------------------+

Here is the event log up to the memory line:

Mon 25 Nov 2019 12:44:56 AM PST | | Starting BOINC client version 7.9.3 for x86_64-pc-linux-gnu
Mon 25 Nov 2019 12:44:56 AM PST | | log flags: file_xfer, sched_ops, task
Mon 25 Nov 2019 12:44:56 AM PST | | Libraries: libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
Mon 25 Nov 2019 12:44:56 AM PST | | Data directory: /var/lib/boinc-client
Mon 25 Nov 2019 12:44:56 AM PST | | CUDA: NVIDIA GPU 0: Tesla K40c (driver version 418.87, CUDA version 10.1, compute capability 3.5, 4096MB, 4007MB available, 4291 GFLOPS peak)
Mon 25 Nov 2019 12:44:56 AM PST | | CUDA: NVIDIA GPU 1: Quadro K6000 (driver version 418.87, CUDA version 10.1, compute capability 3.5, 4096MB, 4007MB available, 5193 GFLOPS peak)
Mon 25 Nov 2019 12:44:56 AM PST | | OpenCL: NVIDIA GPU 0: Tesla K40c (driver version 418.87.01, device version OpenCL 1.2 CUDA, 11441MB, 4007MB available, 4291 GFLOPS peak)
Mon 25 Nov 2019 12:44:56 AM PST | | OpenCL: NVIDIA GPU 1: Quadro K6000 (driver version 418.87.01, device version OpenCL 1.2 CUDA, 11434MB, 4007MB available, 5193 GFLOPS peak)
Mon 25 Nov 2019 12:44:57 AM PST | | [libc detection] gathered: 2.27, Ubuntu GLIBC 2.27-3ubuntu1
Mon 25 Nov 2019 12:44:57 AM PST | | Host name: albus
Mon 25 Nov 2019 12:44:57 AM PST | | Processor: 32 GenuineIntel Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz [Family 6 Model 63 Stepping 2]
Mon 25 Nov 2019 12:44:57 AM PST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d
Mon 25 Nov 2019 12:44:57 AM PST | | OS: Linux Ubuntu: Ubuntu 18.04.3 LTS [4.15.0-70-generic|libc 2.27 (Ubuntu GLIBC 2.27-3ubuntu1)]
Mon 25 Nov 2019 12:44:57 AM PST | | Memory: 62.82 GB physical, 980.00 MB virtual

I tried adding the line <cmdline>--verbose</cmdline> as follows to an app_config.xml, but the content of the stderr file does not seem to change... (I looked at the stderr files under boinc/slots/X/ and on the milkyway@home website. Is there somewhere else I should look?)

My app_config.xml looks like the following:

<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <gpu_usage>1</gpu_usage>
            <cpu_usage>1</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Thanks!
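The app_config.xml shown in this post contains no `<cmdline>` element. In the BOINC client configuration documentation, `<cmdline>` is a child of an `<app_version>` block, next to `<app_name>` and `<plan_class>`, rather than of `<app>`. The sketch below is a small check, using only the Python standard library, that lists any `<app_version>` command lines in a config. `POSTED` is the configuration from the post; whether milkyway_separation honors `--verbose` is not established here.

```python
import xml.etree.ElementTree as ET

# The app_config.xml exactly as posted (whitespace normalized).
POSTED = """<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""


def app_version_cmdlines(xml_text):
    """Return (app_name, plan_class, cmdline) for each <app_version> with a <cmdline>."""
    root = ET.fromstring(xml_text)
    found = []
    for app_version in root.iter("app_version"):
        if app_version.find("cmdline") is not None:
            found.append((
                app_version.findtext("app_name"),
                app_version.findtext("plan_class"),
                app_version.findtext("cmdline"),
            ))
    return found


def has_any_cmdline(xml_text):
    """True if a <cmdline> element appears anywhere in the document."""
    return next(ET.fromstring(xml_text).iter("cmdline"), None) is not None


if __name__ == "__main__":
    print("cmdline present anywhere:", has_any_cmdline(POSTED))
    print("app_version cmdlines:", app_version_cmdlines(POSTED))
```

Run against the posted file, both checks come back empty, which would be consistent with the stderr output not changing.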
5)
Nvidia Quadro K6000 Low Utilization - Slow Processing
(Message 69295)
Posted 24 Nov 2019 by Samyah93
Post: nvidia-smi shows around 15-25% utilization for the K6000 and 90-100% for the K40.
6)
Nvidia Quadro K6000 Low Utilization - Slow Processing
(Message 69293)
Posted 24 Nov 2019 by Samyah93
Post: Hello,

I have a workstation equipped with two Nvidia cards: a Quadro K6000 and a Tesla K40. I've noticed that the K6000 processes jobs much more slowly than the K40, even though the two should be comparable in performance (same chip). The K6000 takes around 220 seconds to complete a job, while the K40 finishes one in ~45 seconds. When I check GPU utilization with nvidia-smi, the K6000 never exceeds 25%, but the K40 runs at 90-100%.

When I check the workunit output, the only significant difference I notice is in "blocks/chunk" and the number of chunks. (For example, the K40 has 73 blocks/chunk with "num chunks: 1", while the K6000 shows 5 blocks/chunk with "num chunks: 15".) Is this the source of the problem? And is there a way to speed up the calculations on the K6000? Sample outputs are included below.

Thanks!

Quadro K6000:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 4 </number_WUs>
<number_params_per_WU> 26 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
  Name: NVIDIA CUDA
  Version: OpenCL 1.2 CUDA 10.1.236
  Vendor: NVIDIA Corporation
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
  Profile: FULL_PROFILE
Using device 1 on platform 0
Found 2 CL devices
Device 'Quadro K6000' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 418.87.01
Version: OpenCL 1.2 CUDA
Compute capability: 3.5
Max compute units: 15
Clock frequency: 901 Mhz
Global mem size: 11989549056
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Build log:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Estimated Nvidia GPU GFLOP/s: 865 SP GFLOP/s, 108 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 7680 with 5 blocks/chunk
Using clWaitForEvents() for polling with initial wait of 12 ms (mode 0)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 13
Num chunks: 15
Chunk size: 38400
Added area: 16000
Effective area: 576000
Initial wait: 12 ms
Integration time: 53.356348 s. Average time per iteration = 166.738586 ms
Integral 0 time = 53.476008 s
Running likelihood with 34614 stars
Likelihood time = 0.447050 s
<background_integral> 0.000059483224699 </background_integral>
<stream_integral> 161.051564875388237 19.295289056122250 17.373218361365936 0.453606010282199 </stream_integral>
<background_likelihood> -3.370756367290975 </background_likelihood>
<stream_only_likelihood> -3.414761260341190 -4.680083632057806 -4.476524432162782 -62.665198829490095 </stream_only_likelihood>
<search_likelihood> -2.788954249416149 </search_likelihood>

Tesla K40:

<core_client_version>7.9.3</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 4 </number_WUs>
<number_params_per_WU> 26 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
  Name: NVIDIA CUDA
  Version: OpenCL 1.2 CUDA 10.1.236
  Vendor: NVIDIA Corporation
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
  Profile: FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'Tesla K40c' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 418.87.01
Version: OpenCL 1.2 CUDA
Compute capability: 3.5
Max compute units: 15
Clock frequency: 745 Mhz
Global mem size: 11996954624
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Build log:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Estimated Nvidia GPU GFLOP/s: 715 SP GFLOP/s, 358 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 7680 with 73 blocks/chunk
Using clWaitForEvents() for polling with initial wait of 12 ms (mode 0)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 4
Num chunks: 1
Chunk size: 560640
Added area: 640
Effective area: 560640
Initial wait: 12 ms
Integration time: 9.972695 s. Average time per iteration = 31.164673 ms
Integral 0 time = 10.090335 s
Running likelihood with 38073 stars
Likelihood time = 0.683837 s
<background_integral> 0.000059045116913 </background_integral>
<stream_integral> 107.805354630900979 47.098348005207185 0.477909869639480 1.440862238706881 </stream_integral>
<background_likelihood> -3.395313980381030 </background_likelihood>
<stream_only_likelihood> -4.154406400186403 -3.352796560986827 -61.060047563529722 -101.940935169837473 </stream_only_likelihood>
<search_likelihood> -2.797258234549717 </search_likelihood>
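The chunk figures in the two stderr outputs above fit together arithmetically, which makes the comparison easier to read. The sketch below recomputes them from the logged values. The relations used (iteration area = mu_steps × r_steps, chunk size = block size × blocks/chunk, effective area = chunk size × num chunks, time per iteration = integration time / nu_steps) are inferred from the numbers themselves, not taken from the application source; the `Run` class and `summarize` function are illustrative names.

```python
from dataclasses import dataclass

# Range values printed in both logs.
NU_STEPS, MU_STEPS, R_STEPS = 320, 800, 700


@dataclass
class Run:
    label: str
    block_size: int
    blocks_per_chunk: int
    num_chunks: int
    integration_s: float


# Values copied from the stderr output of each card.
RUNS = [
    Run("Quadro K6000", 7680, 5, 15, 53.356348),
    Run("Tesla K40c", 7680, 73, 1, 9.972695),
]


def summarize(run):
    """Recompute the derived figures the log prints for one run."""
    iteration_area = MU_STEPS * R_STEPS
    chunk_size = run.block_size * run.blocks_per_chunk
    effective_area = chunk_size * run.num_chunks
    return {
        "iteration_area": iteration_area,
        "chunk_size": chunk_size,
        "effective_area": effective_area,
        "added_area": effective_area - iteration_area,
        "ms_per_iteration": 1000.0 * run.integration_s / NU_STEPS,
    }


if __name__ == "__main__":
    for run in RUNS:
        print(run.label, summarize(run))
```

Read this way, each of the 320 iterations is covered by 15 small chunks on the K6000 but by a single chunk on the K40, with about 2.9% padding (16000 of 560000) on the K6000 versus about 0.1% (640 of 560000) on the K40. The padding alone is too small to explain a fivefold difference in time per iteration.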