GPU tasks with AMD ROCm

Message boards : Number crunching : GPU tasks with AMD ROCm

Author Message
Profile Šarūnas Burdulis
Joined: 27 Apr 15
Posts: 3
Credit: 283,755,418
RAC: 340,847

Message 66704 - Posted: 19 Oct 2017, 13:49:57 UTC

I have used AMD GPUs with their AMDGPU-PRO drivers. This works with Linux 4.4 and 4.10 (Ubuntu). One can also use the 4.10 kernel in the latest Ubuntu 17.10b, but although AMDGPU-PRO installs there, it causes some issues with desktop apps. So I switched to AMD's open source ROCm and its corresponding OpenCL implementation. Everything seems to work fine, including applications that use OpenCL, e.g. darktable. MW@home GPU tasks, however, are failing (task log below).

Has anyone tried ROCm OpenCL? Any ideas on how to 'fix' this?

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
09:15:21 (13641): called boinc_finish(1)

</stderr_txt>
]]>
____________

Profile Šarūnas Burdulis
Joined: 27 Apr 15
Posts: 3
Credit: 283,755,418
RAC: 340,847

Message 66753 - Posted: 26 Oct 2017, 17:52:59 UTC

Any ideas on how to debug this? The OpenCL platform seems to be there. Here is /var/log/boinc.log on boinc-client startup:

26-Oct-2017 13:38:23 [---] Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
26-Oct-2017 13:38:23 [---] log flags: file_xfer, sched_ops, task
26-Oct-2017 13:38:23 [---] Libraries: libcurl/7.55.1 OpenSSL/1.0.2g zlib/1.2.11 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) librtmp/2.3
26-Oct-2017 13:38:23 [---] Data directory: /var/lib/boinc-client
26-Oct-2017 13:38:23 [---] OpenCL: AMD/ATI GPU 0: gfx701 (driver version 1.1 (HSA,LC), device version OpenCL 1.2, 8192MB, 8192MB available, 3696 GFLOPS peak)
26-Oct-2017 13:38:23 [---] Host name: hilbert
26-Oct-2017 13:38:23 [---] Processor: 12 AuthenticAMD AMD Ryzen 5 1600 Six-Core Processor [Family 23 Model 1 Stepping 1]
26-Oct-2017 13:38:23 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
26-Oct-2017 13:38:23 [---] OS: Linux Ubuntu: Ubuntu 17.10 [4.11.0-kfd-compute-rocm-rel-1.6-180]
26-Oct-2017 13:38:23 [---] Memory: 15.67 GB physical, 15.95 GB virtual
26-Oct-2017 13:38:23 [---] Disk: 452.26 GB total, 283.99 GB free
26-Oct-2017 13:38:23 [---] Local time is UTC -4 hours
26-Oct-2017 13:38:23 [---] VirtualBox version: 5.1.30_Ubuntur118389
26-Oct-2017 13:38:23 [---] Config: GUI RPCs allowed from:
26-Oct-2017 13:38:23 [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 734143; resource share 100
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: from Milkyway@Home (last modified 16-Aug-2017 11:52:13)
26-Oct-2017 13:38:23 [Milkyway@Home] Host location: none
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: using your defaults
26-Oct-2017 13:38:23 [---] Reading preferences override file
26-Oct-2017 13:38:23 [---] Preferences:
26-Oct-2017 13:38:23 [---] max memory usage when active: 8023.47 MB
26-Oct-2017 13:38:23 [---] max memory usage when idle: 14442.24 MB
26-Oct-2017 13:38:23 [---] max disk usage: 283.91 GB
26-Oct-2017 13:38:23 [---] max CPUs used: 1
26-Oct-2017 13:38:23 [---] (to change preferences, visit a project web site or select Preferences in the Manager)
26-Oct-2017 13:38:23 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
26-Oct-2017 13:38:23 Initialization completed
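
One possible first debugging step, sketched here in Python, is to list which OpenCL vendor libraries are registered with the ICD loader. This is only a sketch under an assumption: that the system uses the usual Khronos ICD loader convention on Linux, where each vendor installs a small .icd file under /etc/OpenCL/vendors whose first line names the vendor library. The function name list_icd_entries is made up for illustration and is not part of BOINC, ROCm or Milkyway@Home. An empty result would be consistent with CL_PLATFORM_NOT_FOUND_KHR, although it is not the only possible cause of that error.

```python
from pathlib import Path


def list_icd_entries(vendors_dir="/etc/OpenCL/vendors"):
    """Return (icd_file_name, library_name) pairs found in vendors_dir.

    Assumption: the standard ICD loader layout, where each *.icd file
    holds the name or path of one vendor OpenCL library on its first line.
    """
    entries = []
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No vendors directory at all: nothing is registered.
        return entries
    for icd_file in sorted(directory.glob("*.icd")):
        lines = icd_file.read_text(errors="replace").splitlines()
        library = lines[0].strip() if lines else ""
        entries.append((icd_file.name, library))
    return entries


if __name__ == "__main__":
    found = list_icd_entries()
    if not found:
        print("No ICD files found in the vendors directory")
    for name, library in found:
        print(f"{name}: {library}")
```

Running it as the same user the BOINC client runs under may be more informative than running it from a desktop session, since the two can see different files and permissions.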

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 501
Credit: 34,647,251
RAC: 224

Message 66754 - Posted: 26 Oct 2017, 19:16:17 UTC

Hi,

We currently do not support the open source AMDGPU drivers. Getting the client working on AMDGPU is on my list of things to do, but it has taken a back seat to a few more important tasks recently.

Sorry,

Jake

JohnRH
Joined: 11 Apr 11
Posts: 4
Credit: 2,973,270
RAC: 329

Message 66789 - Posted: 20 Nov 2017, 5:54:18 UTC - in response to Message 66704.

I thought I might come in on this one since I've also been getting 'Computation errors' with these ATI/GPU workunits. Perhaps if other users can contribute their experiences some clues can be found.
My setup:
Ubuntu 17.10 amd64
AMD APU A10 7700K KAVERI
BOINC 7.8.3

BOINC installation:
boinc
boinc-client
boinc-client-opencl
boinc-manager
libboinc7
mesa-opencl-icd

This last, with its dependencies, is needed because Ubuntu 17.10 comes with Mesa OpenCL and it works fine elsewhere. It's been my past experience that keeping to Ubuntu packages wherever possible tends to produce better results than what I call 'outside stuff'. I'm not familiar with the drivers others are using.

So here's what I'm now seeing in my error messages:
Stderr output

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
Name: Clover
Version: OpenCL 1.1 Mesa 17.2.2
Vendor: Mesa
Extensions: cl_khr_icd
Profile: FULL_PROFILE
Didn't find preferred platform
Using device 0 on platform 0
Found 1 CL device
Device 'AMD KAVERI (DRM 2.50.0 / 4.13.0-16-generic, LLVM 5.0.0)' (AMD:0x1002) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 17.2.2
Version: OpenCL 1.1 Mesa 17.2.2
Compute capability: 0.0
Max compute units: 6
Clock frequency: 720 Mhz
Global mem size: 2138722304
Local mem size: 32768
Max const buf size: 1497105612
Double extension: cl_khr_fp64
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Using AVX path...
...and so on.

Although it's still failing, I'm getting closer to the truth than you, since the platform and device have been identified correctly. (If we're unsure what version we have, running clinfo will tell us.) So it seems to reduce to the error CL_BUILD_PROGRAM_FAILURE. I don't know whether this provides any clue, but it does imply that Milkyway ATI/GPU units don't like Mesa OpenCL drivers.
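
The distinction between the two failure modes seen in this thread can be sketched as a small Python helper that scans a task's stderr text and reports the stage at which OpenCL setup stopped. It is a rough illustration only: the function name classify_stderr and the stage labels are invented here, and the only error names it knows are the ones that appear in the logs posted above.

```python
import re

# Matches the "(code): NAME" tail of lines such as
# "Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR".
ERROR_TAIL = re.compile(r"\((-?\d+)\): ((?:CL|MW)_[A-Z_]+)")


def classify_stderr(stderr_text):
    """Return a rough label for where OpenCL setup failed.

    Only the error names seen in this thread are recognised;
    anything else is reported as "unknown".
    """
    names = {match.group(2) for match in ERROR_TAIL.finditer(stderr_text)}
    if "CL_PLATFORM_NOT_FOUND_KHR" in names:
        # No OpenCL platform was visible to the application at all.
        return "platform discovery"
    if "CL_BUILD_PROGRAM_FAILURE" in names:
        # Platform and device were found, but the kernel did not compile.
        return "kernel build"
    return "unknown"
```

Applied to the first log in this thread it would return "platform discovery", and applied to the Mesa log above it would return "kernel build".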

Whether this helps or not, here it is anyway!

As a footnote, my setup runs Einstein@home ATI/GPU workunits perfectly well, so an answer must be possible.

jwalck
Joined: 17 Dec 17
Posts: 1
Credit: 294,154
RAC: 0

Message 66873 - Posted: 19 Dec 2017, 23:30:07 UTC

I'm in the same situation! OpenCL is working fine for Einstein@Home but failing instantly for Milkyway@Home, with errors like JohnRH's here.

...
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
...


Running on AMDGPU Pro (proprietary) 17.50 on an AMD RX Vega 64.




Copyright © 2018 AstroInformatics Group