Welcome to MilkyWay@home

Yet another computation-error problem



Questions and Answers : Unix/Linux : Yet another computation-error problem

UnionJack
Joined: 8 Jan 10
Posts: 16
Credit: 11,329,166
RAC: 15,542
Message 67323 - Posted: 11 Apr 2018, 10:31:58 UTC
Last modified: 11 Apr 2018, 10:40:10 UTC

I stopped using my GPU in BOINC when a case fan failed from overwork, and now I've tried to resume. Every MW task fails immediately with a computation error, while Einstein@Home tasks run fine.

OS: Gentoo Linux, kernel 4.9.76-r1
GPU: Radeon Pro WX 5100 8GB GDDR5
Driver: amdgpu-pro-opencl-17.50.511655 using mesa-17.3.8
Typical error: Computation error (0.929 CPUs + AMD/ATI GPU) ... MilkyWay@Home 1.46 (opencl_ati_101) de_modfit_14_bundle5_NoConstraintsWithDisk...
(Also with de_modfit_23)
Toolkit: wxGTK-3.0.3-r300

stdoutdae.txt shows this (timestamps stripped):
OpenCL: AMD/ATI GPU 0: AMD Radeon (TM) Pro WX 5100 Graphics (POLARIS10 / DRM 3.8.0 / 4.9.76-gentoo-r1, LLVM 5.0.1) (driver version 17.3.8, device version OpenCL 1.1 Mesa 17.3.8, 16029MB, 16029MB available, 2433 GFLOPS peak)
[...]
Memory: 31.32 GB physical, 62.47 GB virtual
Disk: 39.12 GB total, 26.02 GB free
Local time is UTC +1 hours
VirtualBox version: 5.2.8_Gentoor120774
Config: don't compute while cc1 is running
Config: don't compute while cc1plus is running
Config: don't compute while cmake is running
[...]
Reading preferences override file
Preferences:
   max memory usage when active: 28859.99 MB
   max memory usage when idle: 30463.32 MB
   max disk usage: 37.06 GB
   max download rate: 2621440 bytes/sec
   max upload rate: 838861 bytes/sec

I've searched everywhere I can think of for clues, but found nothing recent or relevant.
I've tried downgrading to amdgpu-pro-opencl-17.40.492261, but it made no difference; those are the only two versions available in Gentoo.
What else can I try?
Rgds
Peter.
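
One thing that may be worth checking, offered as an assumption rather than a diagnosis: the BOINC log above reports the card through Mesa ("device version OpenCL 1.1 Mesa 17.3.8") rather than through the amdgpu-pro runtime, and the MilkyWay separation application is a double-precision build, so whichever OpenCL device it runs on needs to advertise a double-precision extension. The sketch below tests one device's extension string, such as the "Extensions" line that clinfo prints per device, for cl_khr_fp64 or AMD's older cl_amd_fp64. The function name and the second sample string are hypothetical.

```python
# Sketch: does an OpenCL device advertise double-precision support?
# Assumption: `extensions` is the space-separated extension list reported for
# one device (e.g. clinfo's "Extensions" line, or CL_DEVICE_EXTENSIONS).
from typing import Optional

FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def double_extension(extensions: str) -> Optional[str]:
    """Return the first double-precision extension found, or None."""
    available = set(extensions.split())  # exact names, not substrings
    for name in FP64_EXTENSIONS:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    # Abbreviated from the NVIDIA extension list posted later in this thread.
    nvidia = ("cl_khr_global_int32_base_atomics cl_khr_fp64 "
              "cl_khr_byte_addressable_store cl_khr_icd")
    # Hypothetical device that lacks any double-precision extension.
    no_fp64 = "cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"
    for label, ext in (("nvidia", nvidia), ("no_fp64", no_fp64)):
        print(label, double_extension(ext) or "no double-precision extension")
```

If the device BOINC picks reports neither extension, a double-precision MilkyWay kernel would have nothing to run on, whatever the rest of the driver stack does.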
UnionJack
Joined: 8 Jan 10
Posts: 16
Credit: 11,329,166
RAC: 15,542
Message 67330 - Posted: 14 Apr 2018, 9:47:30 UTC

I don't know what's going on, but something has changed since I wrote the above: I no longer get the computation errors. I have a different problem instead, which I'll ask about in a separate question if I can't solve it myself.
Rgds
Peter.
Keith Myers
Joined: 24 Jan 11
Posts: 253
Credit: 119,774,326
RAC: 74,091
Message 67493 - Posted: 19 May 2018, 0:57:49 UTC
Last modified: 19 May 2018, 0:58:45 UTC

I have upgraded an older system, and now the milkyway_separation 1.46 Linux x86_64 double OpenCL application does not run on Ubuntu 18.04: every task errors out instantly.

<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 4 </number_WUs>
<number_params_per_WU> 26 </number_params_per_WU>
stream sigma 0.0 is invalid
Failed to get stream constants
18:16:08 (11399): called boinc_finish(1)

</stderr_txt>
]]>

My other projects SETI and Einstein are running fine on this system with their respective OpenCL applications.

Could a developer please look into this problem? I would like to continue with MilkyWay if possible, and I will run into the same problem when I upgrade another old system with identical hardware and software.
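
A detail in the first stderr log above: it stops at "Failed to get stream constants" before any OpenCL platform or device is listed, so that task was rejected while its parameters were being checked, before the GPU was involved. The Lua error just before it also appears in the longer log below, which carries on past it after switching to the plain parameter-file format, so it looks like a fallback rather than the failure itself. The sketch below re-creates the kind of range checks the two rejection messages in this thread imply. The limits are inferred from the messages alone (a sigma of 0.0 is rejected; epsilon must lie in [0, 1]); treating every non-positive sigma as invalid is an assumption, and the function is hypothetical, not the application's code.

```python
# Hypothetical re-creation of the parameter checks whose messages appear in
# the stderr logs in this thread. Limits are inferred from the messages only:
#   "stream sigma 0.0 is invalid"                 -> assume sigma must be > 0
#   "Background Epsilon (...) must be >= 0, <= 1" -> 0 <= epsilon <= 1
from typing import List


def check_parameters(stream_sigmas: List[float],
                     background_epsilon: float) -> List[str]:
    """Return error messages in the style of the logs; empty means OK."""
    errors = []
    for sigma in stream_sigmas:
        if not sigma > 0.0:  # written this way so NaN is rejected too
            errors.append(f"stream sigma {sigma:.1f} is invalid")
    if not 0.0 <= background_epsilon <= 1.0:
        errors.append(
            f"Background Epsilon ({background_epsilon:f}) must be >= 0, <= 1"
        )
    return errors


if __name__ == "__main__":
    # Values taken from the two failing tasks' stderr output in this thread.
    for message in check_parameters([0.0], 61.8173):
        print(message)
```

If the work units themselves are sane, values like these would suggest the parameters are being misread on this host; that is one possible reading of the logs, not a confirmed diagnosis.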
Keith Myers
Joined: 24 Jan 11
Posts: 253
Credit: 119,774,326
RAC: 74,091
Message 67503 - Posted: 19 May 2018, 20:00:14 UTC

I took a closer look at the errored tasks and found one that ran longer and wrote much more output to stderr.txt. Maybe someone can look at it and offer a suggestion. It looks like the OpenCL wisdom file couldn't be created properly.

Stderr output
<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
Name: NVIDIA CUDA
Version: OpenCL 1.2 CUDA 9.2.101
Vendor: NVIDIA Corporation
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Profile: FULL_PROFILE
Using device 1 on platform 0
Found 3 CL devices
Device 'GeForce GTX 1070' (NVIDIA Corporation:0x10de) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 396.24
Version: OpenCL 1.2 CUDA
Compute capability: 6.1
Max compute units: 15
Clock frequency: 1683 Mhz
Global mem size: 8513978368
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
<kernel>:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
<kernel>:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
<kernel>:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
<kernel>:235:26: error: use of undeclared identifier 'inf'
tmp = mad((real) Q_INV_SQR, z * z, tmp); /* (q_invsqr * z^2) + (x^2 + y^2) */
^
<built-in>:35:19: note: expanded from here
#define Q_INV_SQR inf
^

--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Background Epsilon (61.817300) must be >= 0, <= 1
18:13:51 (10595): called boinc_finish(1)

</stderr_txt>
]]>
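
The build failure in this log is mechanical: the compiler was handed "#define Q_INV_SQR inf", and inf is not an identifier in OpenCL C. The kernel comment describes the constant as q_invsqr, so one plausible reading is that the host computed 1/q² with q = 0, got IEEE infinity, and formatted it with a printf-style conversion, which on glibc spells infinity as "inf". The "<built-in>" location suggests the macro arrived as a compiler option rather than from the kernel source. The sketch below shows that formatting step and a guard that refuses non-finite constants; all function names are hypothetical and this is not the MilkyWay host code.

```python
# Sketch (hypothetical names): turning a host-side constant into an OpenCL
# "-D" build option, and what happens when the constant is infinite.
import math


def q_inv_sqr(q: float) -> float:
    """1 / q^2. Python raises on division by zero, so the IEEE result a C
    host would get for q == 0 (+inf) is returned explicitly."""
    if q == 0.0:
        return math.inf
    return 1.0 / (q * q)


def naive_define(name: str, value: float) -> str:
    # printf-style "%g" formatting spells infinity as "inf", which OpenCL C
    # then parses as an undeclared identifier.
    return "-D%s=%.15g" % (name, value)


def checked_define(name: str, value: float) -> str:
    # Refuse to build a kernel from a non-finite constant and report the
    # bad input instead of producing an obscure compiler error.
    if not math.isfinite(value):
        raise ValueError(f"{name} is not finite ({value!r}); "
                         "check the input parameters")
    return "-D%s=%.15g" % (name, value)


if __name__ == "__main__":
    print(naive_define("Q_INV_SQR", q_inv_sqr(0.0)))
    print(naive_define("Q_INV_SQR", q_inv_sqr(0.9)))
    try:
        checked_define("Q_INV_SQR", q_inv_sqr(0.0))
    except ValueError as err:
        print("refused:", err)
```

The same log also rejects a background epsilon of 61.8, so the infinite constant may be a symptom of bad input values rather than a compiler or driver problem in its own right; that is an inference from the log, not a confirmed cause.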
alanb1951
Joined: 16 Mar 10
Posts: 45
Credit: 34,243,885
RAC: 35,122
Message 67512 - Posted: 20 May 2018, 0:39:14 UTC - in response to Message 67503.  

Keith,

I popped something in your "New Linux system trashes all tasks" thread in the Number Crunching forum which may or may not help...

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4288

Cheers - Al.

©2019 Astroinformatics Group