Welcome to MilkyWay@home

GPU tasks with AMD ROCm



Message boards : Number crunching : GPU tasks with AMD ROCm
Šarūnas Burdulis
Joined: 27 Apr 15
Posts: 3
Credit: 352,133,379
RAC: 122,617
Message 66704 - Posted: 19 Oct 2017, 13:49:57 UTC

I have used AMD GPUs with their AMDGPU-PRO drivers. This works with Linux 4.4 and 4.10 (Ubuntu). The 4.10 kernel can also be used in the latest Ubuntu 17.10b, but although AMDGPU-PRO installs, it causes some issues with desktop apps. So I switched to AMD's open source ROCm and its corresponding OpenCL implementation. Everything seems to work fine, including applications that use OpenCL, e.g. darktable. MW@home GPU tasks, however, are failing (task log below).

Did anyone try ROCm OpenCL? Any ideas on how to 'fix' this?

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
09:15:21 (13641): called boinc_finish(1)

</stderr_txt>
]]>
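Error -1001 is CL_PLATFORM_NOT_FOUND_KHR, the code defined by the cl_khr_icd extension that clGetPlatformIDs returns when the OpenCL ICD loader ends up with no usable platform. On Linux the loader discovers platforms through one-line .icd files in /etc/OpenCL/vendors, each naming a vendor library. A minimal Python sketch, assuming that default directory, lists what the loader would see; running it as the same user the BOINC client runs under (typically boinc with the Ubuntu boinc-client package) is the check that matters. The function name list_icd_files is hypothetical, and an empty result is only one possible cause of -1001 (a listed library that fails to load is another).

```python
from pathlib import Path

# Assumption: default directory in which the Linux OpenCL ICD loader
# looks for vendor registration (.icd) files.
DEFAULT_VENDOR_DIR = "/etc/OpenCL/vendors"


def list_icd_files(vendor_dir=DEFAULT_VENDOR_DIR):
    """Map each *.icd file name to the vendor library it names.

    An empty result means the loader has no platform to load, which is one
    way clGetPlatformIDs ends up failing with -1001 (CL_PLATFORM_NOT_FOUND_KHR).
    """
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return {}
    entries = {}
    for icd in sorted(directory.glob("*.icd")):
        # Each .icd file holds one line: the name or path of the vendor library.
        entries[icd.name] = icd.read_text().strip()
    return entries


if __name__ == "__main__":
    found = list_icd_files()
    if not found:
        print(f"No .icd files in {DEFAULT_VENDOR_DIR}")
    for name, library in found.items():
        print(f"{name}: {library}")
```

If the list is non-empty for your own account but the task still fails, comparing the output when run as the boinc user would narrow things down.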
Šarūnas Burdulis
Joined: 27 Apr 15
Posts: 3
Credit: 352,133,379
RAC: 122,617
Message 66753 - Posted: 26 Oct 2017, 17:52:59 UTC

Any ideas on how to debug this? The OpenCL platform seems to be there. Here is /var/log/boinc.log from boinc-client startup:

26-Oct-2017 13:38:23 [---] Starting BOINC client version 7.8.3 for x86_64-pc-linux-gnu
26-Oct-2017 13:38:23 [---] log flags: file_xfer, sched_ops, task
26-Oct-2017 13:38:23 [---] Libraries: libcurl/7.55.1 OpenSSL/1.0.2g zlib/1.2.11 libidn2/2.0.2 libpsl/0.18.0 (+libidn2/2.0.2) librtmp/2.3
26-Oct-2017 13:38:23 [---] Data directory: /var/lib/boinc-client
26-Oct-2017 13:38:23 [---] OpenCL: AMD/ATI GPU 0: gfx701 (driver version 1.1 (HSA,LC), device version OpenCL 1.2, 8192MB, 8192MB available, 3696 GFLOPS peak)
26-Oct-2017 13:38:23 [---] Host name: hilbert
26-Oct-2017 13:38:23 [---] Processor: 12 AuthenticAMD AMD Ryzen 5 1600 Six-Core Processor [Family 23 Model 1 Stepping 1]
26-Oct-2017 13:38:23 [---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx hw_pstate vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 xsaves clzero irperf arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic overflow_recov succor smca
26-Oct-2017 13:38:23 [---] OS: Linux Ubuntu: Ubuntu 17.10 [4.11.0-kfd-compute-rocm-rel-1.6-180]
26-Oct-2017 13:38:23 [---] Memory: 15.67 GB physical, 15.95 GB virtual
26-Oct-2017 13:38:23 [---] Disk: 452.26 GB total, 283.99 GB free
26-Oct-2017 13:38:23 [---] Local time is UTC -4 hours
26-Oct-2017 13:38:23 [---] VirtualBox version: 5.1.30_Ubuntur118389
26-Oct-2017 13:38:23 [---] Config: GUI RPCs allowed from:
26-Oct-2017 13:38:23 [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 734143; resource share 100
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: from Milkyway@Home (last modified 16-Aug-2017 11:52:13)
26-Oct-2017 13:38:23 [Milkyway@Home] Host location: none
26-Oct-2017 13:38:23 [Milkyway@Home] General prefs: using your defaults
26-Oct-2017 13:38:23 [---] Reading preferences override file
26-Oct-2017 13:38:23 [---] Preferences:
26-Oct-2017 13:38:23 [---]    max memory usage when active: 8023.47 MB
26-Oct-2017 13:38:23 [---]    max memory usage when idle: 14442.24 MB
26-Oct-2017 13:38:23 [---]    max disk usage: 283.91 GB
26-Oct-2017 13:38:23 [---]    max CPUs used: 1
26-Oct-2017 13:38:23 [---]    (to change preferences, visit a project web site or select Preferences in the Manager)
26-Oct-2017 13:38:23 [---] gui_rpc_auth.cfg is empty - no GUI RPC password protection
26-Oct-2017 13:38:23 Initialization completed
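The relevant entry is the one containing "OpenCL: AMD/ATI GPU 0", which shows that the BOINC client itself found the ROCm platform (driver version 1.1 (HSA,LC), OpenCL 1.2) even though the science application later reports CL_PLATFORM_NOT_FOUND_KHR. When comparing logs from several hosts, a small parser can pull that entry out. This is a minimal Python sketch; the regular expression assumes the line layout shown in this BOINC 7.8.3 log, and parse_opencl_gpus is a hypothetical helper name.

```python
import re

# Assumption: layout of BOINC 7.8's startup line for an OpenCL GPU, taken
# from the log above; other client versions may format it differently.
OPENCL_LINE = re.compile(
    r"OpenCL: (?P<vendor>.+?) GPU (?P<index>\d+): (?P<device>.+?) "
    r"\(driver version (?P<driver>.+?), device version (?P<cl_version>OpenCL [\d.]+)"
)


def parse_opencl_gpus(log_text):
    """Return one dict per 'OpenCL: ... GPU n:' line found in a BOINC log."""
    gpus = []
    for line in log_text.splitlines():
        match = OPENCL_LINE.search(line)
        if match:
            gpus.append(match.groupdict())
    return gpus
```

Applied to the log above, it yields a single entry with device gfx701 and driver 1.1 (HSA,LC).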
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 75,271,794
RAC: 270
Message 66754 - Posted: 26 Oct 2017, 19:16:17 UTC

Hi,

We currently do not support the open source AMDGPU drivers. Getting the client working on AMDGPU is on my list of things to do, but it has recently taken a back seat to a few more important tasks.

Sorry,

Jake
JohnRH

Joined: 11 Apr 11
Posts: 4
Credit: 3,033,029
RAC: 0
Message 66789 - Posted: 20 Nov 2017, 5:54:18 UTC - in response to Message 66704.  

I thought I'd join in, since I've also been getting 'Computation errors' with these ATI/GPU workunits. If other users contribute their experiences, perhaps some clues can be found.
My setup:
Ubuntu 17.10 amd64
AMD APU A10 7700K KAVERI
BOINC 7.8.3

BOINC installation:
boinc
boinc-client
boinc-client-opencl
boinc-manager
libboinc7
mesa-opencl-icd

This last package, with its dependencies, is needed because Ubuntu 17.10 ships Mesa OpenCL, and it works fine elsewhere. In my experience, sticking to Ubuntu packages wherever possible tends to produce better results than what I call 'outside stuff'. I'm not familiar with the drivers others are using.

So here's what I'm now seeing in my error messages:
Stderr output

<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
Name: Clover
Version: OpenCL 1.1 Mesa 17.2.2
Vendor: Mesa
Extensions: cl_khr_icd
Profile: FULL_PROFILE
Didn't find preferred platform
Using device 0 on platform 0
Found 1 CL device
Device 'AMD KAVERI (DRM 2.50.0 / 4.13.0-16-generic, LLVM 5.0.0)' (AMD:0x1002) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 17.2.2
Version: OpenCL 1.1 Mesa 17.2.2
Compute capability: 0.0
Max compute units: 6
Clock frequency: 720 Mhz
Global mem size: 2138722304
Local mem size: 32768
Max const buf size: 1497105612
Double extension: cl_khr_fp64
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Using AVX path...
...and so on.

Although it's still failing, I'm getting further than you, since the platform and device have been identified correctly. (If you're unsure which version you have, running clinfo will tell you.) So the problem seems to reduce to a CL_BUILD_PROGRAM_FAILURE error. I don't know whether this provides any clue, but it does suggest that MilkyWay ATI/GPU units don't work with the Mesa OpenCL drivers.

Whether this helps or not, here it is anyway!

As a footnote, my setup runs Einstein@Home ATI/GPU workunits perfectly well, so a solution must be possible.
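The distinction drawn in this thread, a platform that is never found (-1001, CL_PLATFORM_NOT_FOUND_KHR) versus one that is found but fails to compile the kernel (-11, CL_BUILD_PROGRAM_FAILURE), is easy to check mechanically when comparing task logs. This is a minimal Python sketch: the marker strings are copied from the stderr output posted here, while the helper name classify_stderr and the stage labels are hypothetical.

```python
# Hypothetical helper: marker strings come from the stderr logs in this
# thread; the stage labels are illustrative, not MilkyWay@home terminology.
FAILURE_MARKERS = [
    ("CL_PLATFORM_NOT_FOUND_KHR", "no OpenCL platform visible to the app"),
    ("CL_BUILD_PROGRAM_FAILURE", "platform found, kernel build failed"),
]


def classify_stderr(stderr_text):
    """Return (stage, count) for the first marker, in list order, that appears.

    count is the number of lines mentioning that marker; (None, 0) means
    none of the known markers were found.
    """
    lines = stderr_text.splitlines()
    for marker, stage in FAILURE_MARKERS:
        count = sum(marker in line for line in lines)
        if count:
            return stage, count
    return None, 0
```

On the first log in this thread it reports the platform stage five times (once per work unit attempt); on the Mesa log it reports the build stage.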
jwalck

Joined: 17 Dec 17
Posts: 1
Credit: 294,154
RAC: 0
Message 66873 - Posted: 19 Dec 2017, 23:30:07 UTC

I'm in the same situation! OpenCL works fine for Einstein@Home but fails instantly for Milkyway@Home, with errors like those posted above:

...
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using AVX path
...


Running AMDGPU-PRO (proprietary) 17.50 on an AMD RX Vega 64.


©2019 Astroinformatics Group