GPU tasks with AMD ROCm
Author | Message |
---|---|
Joined: 27 Apr 15 Posts: 4 Credit: 427,409,763 RAC: 0 |
I have used AMD GPUs with the AMDGPU-PRO drivers. This works with Linux 4.4 and 4.10 (Ubuntu). The 4.10 kernel can also be used in the latest Ubuntu 17.10b, but although AMDGPU-PRO installs there, it causes some issues with desktop apps. So I switched to AMD's open source ROCm and its corresponding OpenCL implementation. Everything seems to work fine, including applications that use OpenCL, e.g. darktable. MW@home GPU tasks, however, are failing (task log below). Has anyone tried ROCm OpenCL? Any ideas on how to 'fix' this?
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
09:15:21 (13641): called boinc_finish(1)
</stderr_txt>
]]> |
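For anyone hitting the same -1001 error: CL_PLATFORM_NOT_FOUND_KHR is the code the OpenCL ICD loader returns when it finds no registered platform. On Linux the loader normally discovers vendor libraries through the .icd files in /etc/OpenCL/vendors, so one possible first check is to list what is registered there. The Python sketch below does only that. The function name `list_icd_entries` is made up for illustration, and whether the ROCm packages in this thread register an .icd file in that directory is an assumption to verify, not something the log confirms.

```python
from pathlib import Path


def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return (icd_file_name, library_path) pairs found in the ICD registry directory.

    Hypothetical helper for illustration; it only reads text files and does
    not load or test the vendor libraries themselves.
    """
    entries = []
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No registry directory: an ICD-loader-based app has nothing to load.
        return entries
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file is plain text; its first non-empty line names the vendor library.
        lines = icd_file.read_text(errors="replace").splitlines()
        library = next((ln.strip() for ln in lines if ln.strip()), "")
        entries.append((icd_file.name, library))
    return entries


if __name__ == "__main__":
    found = list_icd_entries()
    if not found:
        print("No .icd files found in /etc/OpenCL/vendors")
    for name, library in found:
        print(f"{name}: {library}")
```

It is worth running this as the account that runs BOINC tasks, since what one user can read is not necessarily what another can. An empty result would be consistent with the error above, but a non-empty one does not by itself prove the application can load the listed library.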
Joined: 27 Apr 15 Posts: 4 Credit: 427,409,763 RAC: 0 |
Any ideas on how to debug this? The OpenCL platform seems to be there. Here is /var/log/boinc.log on boinc-client startup:
26-Oct-2017 13:38:23 [---] Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
26-Oct-2017 13:38:23 [---] log flags: file_xfer, sched_ops, task
26-Oct-2017 13:38:23 [---] Libraries: libcurl/7.55.1 OpenSSL/1.0.2g zlib/1.2.11 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) librtmp/2.3
26-Oct-2017 13:38:23 [---] Data directory: /var/lib/boinc-client
26-Oct-2017 13:38:23 [---] OpenCL: AMD/ATI GPU 0: gfx701 (driver version 1.1 (HSA,LC), device version OpenCL 1.2, 8192MB, 8192MB available, 3696 GFLOPS peak)
26-Oct-2017 13:38:23 [---] Host name: hilbert
26-Oct-2017 13:38:23 [---] Processor: 12 AuthenticAMD AMD Ryzen 5 1600 Six-Core Processor [Family 23 Model 1 Stepping 1]
26-Oct-2017 13:38:23 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
26-Oct-2017 13:38:23 [---] OS: Linux Ubuntu: Ubuntu 17.10 [4.11.0-kfd-compute-rocm-rel-1.6-180]
26-Oct-2017 13:38:23 [---] Memory: 15.67 GB physical, 15.95 GB virtual
26-Oct-2017 13:38:23 [---] Disk: 452.26 GB total, 283.99 GB free
26-Oct-2017 13:38:23 [---] Local time is UTC -4 hours
26-Oct-2017 13:38:23 [---] VirtualBox version: 5.1.30_Ubuntur118389
26-Oct-2017 13:38:23 [---] Config: GUI RPCs allowed from:
26-Oct-2017 13:38:23 [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 734143; resource share 100
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: from Milkyway@Home (last modified 16-Aug-2017 11:52:13)
26-Oct-2017 13:38:23 [Milkyway@Home] Host location: none
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: using your defaults
26-Oct-2017 13:38:23 [---] Reading preferences override file
26-Oct-2017 13:38:23 [---] Preferences:
26-Oct-2017 13:38:23 [---] max memory usage when active: 8023.47 MB
26-Oct-2017 13:38:23 [---] max memory usage when idle: 14442.24 MB
26-Oct-2017 13:38:23 [---] max disk usage: 283.91 GB
26-Oct-2017 13:38:23 [---] max CPUs used: 1
26-Oct-2017 13:38:23 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
26-Oct-2017 13:38:23 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
26-Oct-2017 13:38:23 Initialization completed |
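The startup log shows that the BOINC client itself detects the GPU through OpenCL, even though the Milkyway application reports no platform. As a rough aid for comparing the two, here is a Python sketch that pulls the OpenCL device line out of a client log. The regular expression is written against the single line format quoted in this thread and is an assumption about other BOINC versions; `find_opencl_devices` is a hypothetical name.

```python
import re

# Pattern modeled on the one "OpenCL:" line quoted in this thread (an assumption
# about the general format, not a documented BOINC log grammar).
OPENCL_LINE = re.compile(
    r"\[---\] OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<name>.+?) "
    r"\(driver version (?P<driver>.+?), device version (?P<device_version>[^,]+),"
)


def find_opencl_devices(log_text):
    """Return one dict per OpenCL GPU line that the client wrote to its log."""
    devices = []
    for match in OPENCL_LINE.finditer(log_text):
        info = match.groupdict()
        info["index"] = int(info["index"])  # GPU index as a number
        devices.append(info)
    return devices
```

Feeding it the contents of /var/log/boinc.log (the path mentioned in the post) would list what the client saw at startup. A device appearing here while the task stderr shows CL_PLATFORM_NOT_FOUND_KHR suggests the client and the science application are not finding OpenCL the same way, though the sketch cannot say why.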
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hi, We currently do not support the open source AMDGPU drivers. Getting the client working on AMDGPU is on my list of things to do, but it has been on the back burner behind a few more important tasks recently. Sorry, Jake |
Joined: 11 Apr 11 Posts: 4 Credit: 7,186,274 RAC: 750 |
I thought I might come in on this one since I've also been getting 'Computation errors' with these ATI/GPU workunits. Perhaps if other users contribute their experiences, some clues can be found.
My setup:
Ubuntu 17.10 amd64
AMD APU A10 7700K KAVERI
BOINC 7.8.3
BOINC installation: boinc boinc-client boinc-client-opencl boinc-manager libboinc7 mesa-opencl-icd
The last package, with its dependencies, is needed because Ubuntu 17.10 comes with Mesa OpenCL, which works fine elsewhere. In my experience, keeping to Ubuntu packages wherever possible tends to produce better results than what I call 'outside stuff'. I'm not familiar with the drivers others are using. So here's what I'm now seeing in my error messages:
Stderr output
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
Name: Clover
Version: OpenCL 1.1 Mesa 17.2.2
Vendor: Mesa
Extensions: cl_khr_icd
Profile: FULL_PROFILE
Didn't find preferred platform
Using device 0 on platform 0
Found 1 CL device
Device 'AMD KAVERI (DRM 2.50.0 / 4.13.0-16-generic, LLVM 5.0.0)' (AMD:0x1002) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 17.2.2
Version: OpenCL 1.1 Mesa 17.2.2
Compute capability: 0.0
Max compute units: 6
Clock frequency: 720 Mhz
Global mem size: 2138722304
Local mem size: 32768
Max const buf size: 1497105612
Double extension: cl_khr_fp64
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Using AVX path...
...and so on.
Although it's still failing, I'm getting further than you, since the platform and device have been identified correctly. (If we're unsure what version we have, running clinfo will tell us.) So the problem seems to reduce to the CL_BUILD_PROGRAM_FAILURE error. I don't know whether this provides any clue, but it does imply that the Milkyway ATI/GPU units don't like the Mesa OpenCL drivers. Whether this helps or not, here it is anyway! As a footnote, my setup runs Einstein@home ATI/GPU workunits perfectly well, so an answer must be possible. |
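The two logs in this thread fail at different stages: one never finds an OpenCL platform, the other finds platform and device but fails when the kernel source is built. For anyone collecting reports from several hosts, a small Python sketch along these lines could sort stderr output by stage. The stage labels and the name `classify_stderr` are invented for illustration, and the matching relies only on the strings that appear in the logs quoted here.

```python
import re


def classify_stderr(stderr_text):
    """Classify a Milkyway task stderr by the stage at which OpenCL setup failed.

    Hypothetical helper: it looks only for the error names seen in this thread.
    """
    # Platform name, if the application got far enough to print one.
    platform = re.search(r"Platform \d+ information:\s*Name:\s*(\S+)", stderr_text)
    if "CL_PLATFORM_NOT_FOUND_KHR" in stderr_text:
        stage = "no_platform"      # ICD loader reported no platform (-1001)
    elif "CL_BUILD_PROGRAM_FAILURE" in stderr_text:
        stage = "build_failure"    # platform and device found, kernel build failed (-11)
    else:
        stage = "unknown"
    return {"stage": stage, "platform": platform.group(1) if platform else None}
```

Applied to the log in this post it should report the build stage with platform Clover, and applied to the first post it should report that no platform was found. It says nothing about the cause of either failure.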
Joined: 17 Dec 17 Posts: 1 Credit: 294,154 RAC: 0 |
I'm in the same situation! OpenCL works fine for Einstein@Home but fails instantly for Milkyway@Home, with errors like JohnRH's here.
...
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
...
I'm running AMDGPU Pro (proprietary) 17.50 on an AMD RX Vega 64. |
Joined: 27 Apr 15 Posts: 4 Credit: 427,409,763 RAC: 0 |
Open source amdgpu + rocm-opencl is still not usable by Milkyway@home. However, the open source amdgpu kernel driver (part of the stock Linux kernel) can be combined with the OpenCL libraries from the AMDGPU-PRO driver (download from AMD):
amdgpu-install --opencl=legacy,pal --headless --no-dkms
This works with kernels up to the latest one, 5.6-rc2 as of today. |
Joined: 22 Feb 09 Posts: 2 Credit: 1,057,955 RAC: 0 |
Does anyone know why ROCm doesn't work? Is it missing some proprietary library or something? |
Joined: 7 May 14 Posts: 57 Credit: 206,540,646 RAC: 0 |
Hi all, I made a YouTube video with instructions for running multiple instances at full load on a Radeon VII:
RADEON VII GIGABYTE // 3 Instances_ Milkyway@home WUs BOINC_ 3_instances
https://www.youtube.com/watch?v=4xKy9wGKmz4
All the best, and welcome to Earth. |
©2024 Astroinformatics Group