Welcome to MilkyWay@home

Posts by Šarūnas Burdulis

1) Message boards : Number crunching : GPU tasks with AMD ROCm (Message 69556)
Posted 19 Feb 2020 by Profile Šarūnas Burdulis
Post:
Open source amdgpu+rocm-opencl is still not usable by Milkyway@home.

However, the open source amdgpu driver (part of the stock Linux kernel) can be combined with the OpenCL libraries from the AMDGPU-PRO driver package (downloaded from AMD):

amdgpu-install --opencl=legacy,pal --headless --no-dkms

This works with kernels up to the latest one, which is 5.6-rc2 as of today.
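After an install like the one above, one quick sanity check is to see which OpenCL drivers are registered with the ICD loader. The sketch below assumes the conventional Linux ICD directory /etc/OpenCL/vendors; the function name list_icd_files is made up for illustration, and the file names AMDGPU-PRO actually installs are not assumed here.

```python
from pathlib import Path


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Map each *.icd file name to the library name or path it contains.

    The default directory is the conventional ICD loader location on Linux;
    treat it as an assumption if your distribution relocates it.
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}  # nothing registered in this location
    return {
        icd.name: icd.read_text().strip()
        for icd in sorted(directory.glob("*.icd"))
    }


if __name__ == "__main__":
    # Print one line per registered OpenCL driver, if any.
    for name, library in list_icd_files().items():
        print(f"{name}: {library}")
```

An empty result would suggest that no OpenCL library got registered in that directory, which is worth ruling out before looking at BOINC itself.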
2) Message boards : Number crunching : GPU tasks with AMD ROCm (Message 66753)
Posted 26 Oct 2017 by Profile Šarūnas Burdulis
Post:
Any ideas on how to debug this? The OpenCL platform seems to be there. Here is /var/log/boinc.log on boinc-client startup:

26-Oct-2017 13:38:23 [---] Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
26-Oct-2017 13:38:23 [---] log flags: file_xfer, sched_ops, task
26-Oct-2017 13:38:23 [---] Libraries: libcurl/7.55.1 OpenSSL/1.0.2g zlib/1.2.11 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) librtmp/2.3
26-Oct-2017 13:38:23 [---] Data directory: /var/lib/boinc-client
26-Oct-2017 13:38:23 [---] OpenCL: AMD/ATI GPU 0: gfx701 (driver version 1.1 (HSA,LC), device version OpenCL 1.2, 8192MB, 8192MB available, 3696 GFLOPS peak)
26-Oct-2017 13:38:23 [---] Host name: hilbert
26-Oct-2017 13:38:23 [---] Processor: 12 AuthenticAMD AMD Ryzen 5 1600 Six-Core Processor [Family 23 Model 1 Stepping 1]
26-Oct-2017 13:38:23 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
26-Oct-2017 13:38:23 [---] OS: Linux Ubuntu: Ubuntu 17.10 [4.11.0-kfd-compute-rocm-rel-1.6-180]
26-Oct-2017 13:38:23 [---] Memory: 15.67 GB physical, 15.95 GB virtual
26-Oct-2017 13:38:23 [---] Disk: 452.26 GB total, 283.99 GB free
26-Oct-2017 13:38:23 [---] Local time is UTC -4 hours
26-Oct-2017 13:38:23 [---] VirtualBox version: 5.1.30_Ubuntur118389
26-Oct-2017 13:38:23 [---] Config: GUI RPCs allowed from:
26-Oct-2017 13:38:23 [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 734143; resource share 100
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: from Milkyway@Home (last modified 16-Aug-2017 11:52:13)
26-Oct-2017 13:38:23 [Milkyway@Home] Host location: none
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: using your defaults
26-Oct-2017 13:38:23 [---] Reading preferences override file
26-Oct-2017 13:38:23 [---] Preferences:
26-Oct-2017 13:38:23 [---]    max memory usage when active: 8023.47 MB
26-Oct-2017 13:38:23 [---]    max memory usage when idle: 14442.24 MB
26-Oct-2017 13:38:23 [---]    max disk usage: 283.91 GB
26-Oct-2017 13:38:23 [---]    max CPUs used: 1
26-Oct-2017 13:38:23 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
26-Oct-2017 13:38:23 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
26-Oct-2017 13:38:23 Initialization completed
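To check quickly whether the client itself detected an OpenCL device, the startup log can be scanned for its "OpenCL:" lines. This is only a sketch: the regular expression is written against the single line format seen in the log above and may need adjusting for other client versions, and find_opencl_devices is a hypothetical helper name.

```python
import re

# Pattern modelled on the one "OpenCL:" line in the log above (an assumption
# about the format, not a documented BOINC specification).
OPENCL_LINE = re.compile(
    r"OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>.+?), "
    r"device version (?P<device_version>[^,]+),"
)


def find_opencl_devices(log_text):
    """Return one dict per OpenCL GPU line found in the BOINC startup log."""
    matches = (OPENCL_LINE.search(line) for line in log_text.splitlines())
    return [m.groupdict() for m in matches if m]


if __name__ == "__main__":
    sample = (
        "26-Oct-2017 13:38:23 [---] OpenCL: AMD/ATI GPU 0: gfx701 "
        "(driver version 1.1 (HSA,LC), device version OpenCL 1.2, "
        "8192MB, 8192MB available, 3696 GFLOPS peak)"
    )
    for device in find_opencl_devices(sample):
        print(device)
```

If this returns a device while the tasks still fail, the client's detection and the science application's own OpenCL initialisation are evidently seeing different things.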
3) Message boards : Number crunching : GPU tasks with AMD ROCm (Message 66704)
Posted 19 Oct 2017 by Profile Šarūnas Burdulis
Post:
I have used AMD GPUs with AMD's AMDGPU-PRO drivers. This works with Linux 4.4 and 4.10 (Ubuntu). The 4.10 kernel can also be used in the latest Ubuntu 17.10b, but although AMDGPU-PRO installs there, it causes some issues with desktop apps. So I switched to AMD's open source ROCm and its corresponding OpenCL implementation. Everything seems to work fine, including applications that use OpenCL, e.g. darktable. MW@home GPU tasks, however, are failing (task log below).

Did anyone try ROCm OpenCL? Any ideas on how to 'fix' this?

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
09:15:21 (13641): called boinc_finish(1)

</stderr_txt>
]]>
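The task log repeats the same failure for each work unit, so it can help to boil it down to the distinct OpenCL error codes. A minimal sketch, assuming the "(code): CL_NAME" layout seen in the stderr above; count_cl_errors is a hypothetical helper, and the application's own MW_CL_ERROR lines are deliberately not counted.

```python
import re
from collections import Counter

# Matches "(<number>): CL_SOMETHING"; lines such as "(1): MW_CL_ERROR" or
# "(13): Permission denied" do not match because the name must start with CL_.
CL_ERROR = re.compile(r"\((-?\d+)\): (CL_[A-Z0-9_]+)")


def count_cl_errors(stderr_text):
    """Count each (code, name) OpenCL error pair found in a task's stderr."""
    return Counter(
        (int(code), name) for code, name in CL_ERROR.findall(stderr_text)
    )


if __name__ == "__main__":
    sample = (
        "Using AVX path\n"
        "Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR\n"
        "Failed to get information about device\n"
        "Error getting device and context (1): MW_CL_ERROR\n"
        "Failed to calculate likelihood\n"
    )
    for (code, name), count in count_cl_errors(sample).items():
        print(f"{name} ({code}) x{count}")
```

For the log above this leaves a single code, -1001 (CL_PLATFORM_NOT_FOUND_KHR), which the ICD loader returns when it finds no OpenCL platform at all.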
4) Message boards : News : GPU Issues Mega Thread (Message 66703)
Posted 19 Oct 2017 by Profile Šarūnas Burdulis
Post:
I have been successfully running GPU tasks with both AMD (amdgpu-pro) and Nvidia devices, using their provided OpenCL libraries.

Yesterday I upgraded one of the AMD workstations to use ROCm and its OpenCL (amdgpu-pro doesn't work on Ubuntu 17.10). The GPU device is an RX 480 (Ellesmere/Polaris, or 'gfx803' in ROCm). Since then, MilkyWay@home GPU tasks have been failing. Below is what I see in the task log, followed by part of the clinfo output. Let me know if there is already a solution to this or if more info is needed.

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
09:15:21 (13641): called boinc_finish(1)

</stderr_txt>
]]>

clinfo|head -20
Number of platforms                     1
  Platform Name                         AMD Accelerated Parallel Processing
  Platform Vendor                       Advanced Micro Devices, Inc.
  Platform Version                      OpenCL 2.0 AMD-APP (2508.0)
  Platform Profile                      FULL_PROFILE
  Platform Extensions                   cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix   AMD

  Platform Name                         AMD Accelerated Parallel Processing
Number of devices                       1
  Device Name                           gfx803
  Device Vendor                         Advanced Micro Devices, Inc.
  Device Vendor ID                      0x1002
  Device Version                        OpenCL 1.2
  Driver Version                        1.1 (HSA,LC)
  Device OpenCL C Version               OpenCL C 2.0
  Device Type                           GPU
  Device Profile                        FULL_PROFILE
  Max compute units                     36
  Max clock frequency                   1288MHz
...
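Since clinfo run from a shell reports one platform while the task reports none, it may be worth comparing what clinfo prints in different environments, for example under the account the BOINC client runs as (which account that is depends on the installation, so treat it as an assumption). A small sketch for pulling the platform count out of captured clinfo text; platform_count is a hypothetical helper.

```python
import re


def platform_count(clinfo_text):
    """Return the integer after "Number of platforms", or None if absent."""
    match = re.search(
        r"^\s*Number of platforms\s+(\d+)", clinfo_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "Number of platforms 1\n"
        "Platform Name AMD Accelerated Parallel Processing\n"
    )
    print(platform_count(sample))
```

The text can come from a saved clinfo run or from subprocess.run(["clinfo"], capture_output=True, text=True).stdout; a differing count between two environments would point at permissions or library visibility rather than at the GPU itself.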



