Welcome to MilkyWay@home

Posts by Tomas Brod

1) Message boards : News : [RESOLVED] Problems with Receiving Email for Resetting Passwords (Message 68776)
Posted 22 May 2019 by Tomas Brod
Post:
For users trying to reset their passwords: BOINC allows you to log in using a so-called strong account key. By default, this key is written to a file on computers attached to the project.
Never mind: it seems that this great option has been removed from BOINC.
2) Message boards : News : [RESOLVED] Problems with Receiving Email for Resetting Passwords (Message 68775)
Posted 22 May 2019 by Tomas Brod
Post:
The easiest option would probably be to set up a Google Mail account as an SMTP smart host for sending emails from the BOINC server. Google seems to enforce a stricter anti-spam policy; the cause could be missing or incorrect SPF records, signatures or reverse DNS records, or simply the server being on a spam blacklist.
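The smart-host idea can be sketched in Python with the standard smtplib module. This only illustrates relaying mail through an authenticated submission server; it is not how the BOINC server actually sends mail. The host and port (Gmail's smtp.gmail.com on 587 with STARTTLS), the subject text and the credentials are assumptions.

```python
import smtplib
from email.message import EmailMessage

# Assumed smart host: Gmail's submission endpoint, upgraded to TLS via STARTTLS.
SMART_HOST = "smtp.gmail.com"
SMART_PORT = 587


def build_reset_mail(sender: str, recipient: str, body: str) -> EmailMessage:
    """Build a plain-text password-reset message (subject text is illustrative)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Password reset"
    msg.set_content(body)
    return msg


def relay(msg: EmailMessage, user: str, password: str) -> None:
    """Hand the message to the smart host, which delivers it onward."""
    with smtplib.SMTP(SMART_HOST, SMART_PORT, timeout=30) as smtp:
        smtp.starttls()              # encrypt the session before sending credentials
        smtp.login(user, password)   # placeholder credentials of the relay account
        smtp.send_message(msg)
```

The point of the design is that receiving servers see mail coming from Google's well-reputed infrastructure instead of from a host that may lack SPF or reverse DNS records.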
3) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67995)
Posted 9 Jan 2019 by Tomas Brod
Post:
Maybe Jack or Eric are interested in updating the official application.
Hope they take a look at the forums. ;-)

The code has been merged into the official repository, but I do not know whether it has been deployed to the BOINC server.
4) Message boards : Number crunching : Database Unavailability (Message 67923)
Posted 3 Dec 2018 by Tomas Brod
Post:
I can confirm. The database outages occur every day in the late morning (9-12 UTC). It looks like someone is working on the DB server, or a scheduled backup task is taking too long.
Increasing the maximum number of tasks per host would really help keep our queues full. I might attach a backup project with a resource share of 0.
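A rough way to size the per-host limit is to ask how many tasks a host must already hold to stay busy through the 3-hour window. The helper below is a hypothetical back-of-the-envelope sketch; task runtime and the number of concurrently running tasks are inputs from your own host, not values taken from the MilkyWay server.

```python
import math


def tasks_to_buffer(outage_hours: float, task_minutes: float, concurrent: int) -> int:
    """Tasks a host needs queued to keep every slot busy for the whole outage."""
    if outage_hours < 0 or task_minutes <= 0 or concurrent < 1:
        raise ValueError("need outage_hours >= 0, task_minutes > 0, concurrent >= 1")
    # Each slot (GPU instance or CPU core) finishes one task every task_minutes.
    per_slot = math.ceil(outage_hours * 60 / task_minutes)
    return per_slot * concurrent
```

For example, with a hypothetical 10-minute task and two tasks running at once, tasks_to_buffer(3, 10, 2) returns 36.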
5) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67880)
Posted 6 Nov 2018 by Tomas Brod
Post:
I doubt this will be useful, but I have put my custom MilkyWay Separation binary here http://www.tbrada.eu/up/363f27ebac1a27b6715609c245555881 and the app_info.xml here http://www.tbrada.eu/up/6826f16895d61bf7e069dcd4acf33c21.xml.
6) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67879)
Posted 6 Nov 2018 by Tomas Brod
Post:
N-Body wasn't even supposed to work. That's why I could not get it to work.
7) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67874)
Posted 3 Nov 2018 by Tomas Brod
Post:
I got it to work, at least Separation. N-Body still does not work because it is stupid.
8) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67873)
Posted 3 Nov 2018 by Tomas Brod
Post:
That allowed me to run Separation jobs on Polaris. N-Body fails to load/compile on both Polaris and Tahiti, and Separation fails with "Failed to calculate integral 0" on Tahiti.
9) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67872)
Posted 3 Nov 2018 by Tomas Brod
Post:
I think I have fixed it. I will submit a PR once I finalize it.


Done in https://github.com/Milkyway-at-home/milkywayathome_client/pull/62!

The problem was that the inline kernel source size was declared with a different type than the one used in its definition. As a result, the size was read as a value on the order of terabytes, which crashed the compiler.
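The post does not name the exact types involved, so the following Python sketch assumes the definition stores the size as a 32-bit unsigned integer while the declaration reads it as a 64-bit one (the width of size_t on x86-64). Reading 8 bytes where only 4 were written pulls in whatever follows in memory; the neighbouring value 256 is made up. In C, such a declaration/definition mismatch across translation units is undefined behaviour that a normal compile usually cannot diagnose, because each file sees only one of the two types.

```python
import struct


def misread_length(stored_len: int, following: int, byteorder: str = "<") -> int:
    """Value seen when a 32-bit length is read through a 64-bit declaration.

    The definition writes `stored_len` as a 4-byte unsigned int; the 4 bytes
    that happen to follow it (`following`) get pulled into the 8-byte read.
    """
    raw = struct.pack(byteorder + "II", stored_len, following)  # what is in memory
    (value,) = struct.unpack(byteorder + "Q", raw)              # what the reader sees
    return value
```

With these made-up values, misread_length(12345, 256) is 1099511640121, roughly a terabyte, even though the real length is 12345.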

Also I had to do these changes to build non-static debug-enabled binaries https://github.com/gridcoin-community/milkywayathome_client/pull/1. I did not submit that one to MW.
10) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67871)
Posted 3 Nov 2018 by Tomas Brod
Post:
I think I have fixed it. I will submit a PR once I finalize it.
11) Message boards : Application Code Discussion : Running MilkyWay@Home GPU tasks on Linux with Mesa OpenCL (Message 67870)
Posted 3 Nov 2018 by Tomas Brod
Post:
This way, I have to overwrite the BOINC-downloaded precompiled application with the custom-built one every time I start BOINC. Is there a way to prevent BOINC from stubbornly re-downloading the malfunctioning prebuilt application?


Use an app_info.xml file; see BOINC's anonymous platform mechanism. You can find the values for the <app>, <file_info>, and <app_version> tags in your client_state.xml.
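A minimal app_info.xml has one <app>, one <file_info> per file, and one <app_version> tying them together. The Python sketch below generates such a skeleton with xml.etree.ElementTree. The app name, executable name, version number, plan class and the ATI coprocessor entry are placeholders; replace them with the values from your client_state.xml.

```python
import xml.etree.ElementTree as ET


def make_app_info(app_name: str, exe_name: str, version_num: int,
                  plan_class: str, coproc_type: str = "ATI") -> str:
    """Return an anonymous-platform app_info.xml skeleton for one GPU app version."""
    root = ET.Element("app_info")

    app = ET.SubElement(root, "app")                 # which project app this is
    ET.SubElement(app, "name").text = app_name

    fi = ET.SubElement(root, "file_info")            # the custom-built binary
    ET.SubElement(fi, "name").text = exe_name
    ET.SubElement(fi, "executable")

    av = ET.SubElement(root, "app_version")          # binds the file to the app
    ET.SubElement(av, "app_name").text = app_name
    ET.SubElement(av, "version_num").text = str(version_num)
    ET.SubElement(av, "plan_class").text = plan_class
    coproc = ET.SubElement(av, "coproc")             # one GPU per task (assumed)
    ET.SubElement(coproc, "type").text = coproc_type
    ET.SubElement(coproc, "count").text = "1"
    ref = ET.SubElement(av, "file_ref")
    ET.SubElement(ref, "file_name").text = exe_name
    ET.SubElement(ref, "main_program")

    return ET.tostring(root, encoding="unicode")


print(make_app_info("milkyway", "milkyway_separation_custom", 146, "opencl_ati_101"))
```

The binary named in <file_info> has to sit in the project directory; once app_info.xml is present, the client uses it instead of downloading the project's own executables.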
12) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67869)
Posted 3 Nov 2018 by Tomas Brod
Post:
Similar to what the Einstein users are having to do with Nvidia Turing cards.

Either the applications are performing an operation that crashes the driver, or the drivers have trouble performing a valid operation that only the MW and Einstein apps exercise.


On my system, Einstein apps work just fine. I do not have a Turing card, however.
13) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67867)
Posted 2 Nov 2018 by Tomas Brod
Post:
I have now created (or rather copied) a simple program that just compiles, loads and executes the OpenCL kernel with the same flags as MilkyWay uses. And that did not crash!
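A reproducer of this kind can be sketched in Python with pyopencl (an assumption; the original program is not shown). It passes the build options recorded in the gdb trace of message 67865 to a small, hypothetical stand-in kernel and covers only the compile step, since clBuildProgram is where the backtraces show the crash.

```python
import shlex

# Build options copied from the gdb trace of the Separation crash.
MW_OPTIONS = ("-D DOUBLEPREC=1 -cl-mad-enable -cl-no-signed-zeros -cl-finite-math-only "
              "-D BACKGROUND_PROFILE=1 -D AUX_BG_PROFILE=0 -D NSTREAM=4 -D CONVOLVE=120 "
              "-D R0=12 -D SUN_R0=8.5 -D Q_INV_SQR=3.69822485207101 -D BG_A=0 -D BG_B=0 "
              "-D BG_C=0 -D BACKGROUND_WEIGHT=0.99 -D THICK_DISK_WEIGHT=0.01 "
              "-D INNERPOWER=1 -D OUTERPOWER=1 -D ALPHA_DELTA_3=3")

# Hypothetical stand-in kernel; the real Separation kernel is far larger.
KERNEL_SRC = """
#if DOUBLEPREC
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
typedef double real;
#else
typedef float real;
#endif
__kernel void scale(__global real *x) {
    x[get_global_id(0)] *= (real) BACKGROUND_WEIGHT;
}
"""


def build_options() -> list:
    """Split the option string into the list form pyopencl accepts."""
    return shlex.split(MW_OPTIONS)


def try_build(src: str = KERNEL_SRC):
    """Compile `src` for the first available device with MilkyWay's options."""
    import pyopencl as cl  # imported lazily so build_options() works without OpenCL
    ctx = cl.create_some_context(interactive=False)
    return cl.Program(ctx, src).build(options=build_options())


if __name__ == "__main__":
    try_build()
    print("build finished without crashing")
```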
14) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67866)
Posted 2 Nov 2018 by Tomas Brod
Post:
I had the normal stable releases of mesa and opencl-mesa.

Here is a backtrace with a development debug build of Mesa (OpenCL 1.1 Mesa 18.3.0-devel (git-9007c0ed26)) installed. This is almost certainly an error in the driver, and it may even be a different issue:

Thread 24 "milkyway_s:sh0" received signal SIGSEGV, Segmentation fault.                                             
[Switching to Thread 0x7fff937fe700 (LWP 10873)]                                                                    
0x00007ffff7fa07b4 in gelf_getshdr () from /usr/lib/libelf.so.1                                                     
(gdb) bt                                                                                                            
#0  0x00007ffff7fa07b4 in gelf_getshdr () from /usr/lib/libelf.so.1                                                 
#1  0x00007fffefae3d87 in ac_elf_read (elf_data=<optimized out>, elf_size=<optimized out>,                          
    binary=binary@entry=0x55555611e118) at common/ac_binary.c:135                                                   
#2  0x00007fffefaebd97 in ac_compile_module_to_binary (p=p@entry=0x555555cb92d0,                                    
    module=module@entry=0x7fff84005de0, binary=binary@entry=0x55555611e118)                                         
    at /usr/include/llvm/ADT/StringRef.h:138                                                                        
#3  0x00007fffefaab48d in si_llvm_compile (M=M@entry=0x7fff84005de0, binary=binary@entry=0x55555611e118,            
    compiler=compiler@entry=0x555555cdea68, debug=debug@entry=0x55555611e018,                                       
    less_optimized=less_optimized@entry=false) at si_shader_tgsi_setup.c:103                                        
#4  0x00007fffefaa1137 in si_compile_llvm (sscreen=sscreen@entry=0x555555cde350,         
    binary=binary@entry=0x55555611e118, conf=conf@entry=0x55555611e168, compiler=compiler@entry=0x555555cdea68,     
    mod=0x7fff84005de0, debug=debug@entry=0x55555611e018, processor=5, name=0x7fffefb312d1 "Compute Shader",        
    less_optimized=false) at si_shader.c:5599                                                                       
#5  0x00007fffefaa2937 in si_compile_tgsi_shader (sscreen=0x555555cde350, compiler=0x555555cdea68,            
    shader=0x55555611e058, debug=0x55555611e018) at si_shader.c:6734                                                
#6  0x00007fffefaa3755 in si_shader_create (sscreen=sscreen@entry=0x555555cde350,                                
    compiler=compiler@entry=0x555555cdea68, shader=shader@entry=0x55555611e058, debug=debug@entry=0x55555611e018)   
    at si_shader.c:8045                                                                        
#7  0x00007fffefa7d125 in si_create_compute_state_async (job=job@entry=0x55555611dff0,                      
    thread_index=thread_index@entry=0) at si_compute.c:152                                                         
#8  0x00007fffefa43c79 in util_queue_thread_func (input=input@entry=0x555555cdd600) at u_queue.c:286                
#9  0x00007fffefa43937 in impl_thrd_routine (p=<optimized out>) at ../../include/c11/threads_posix.h:87             
#10 0x00007ffff7dbba9d in start_thread () from /usr/lib/libpthread.so.0                                        
#11 0x00007ffff7cebb23 in clone () from /usr/lib/libc.so.6
(gdb) frame 1
#1  0x00007fffefae3d87 in ac_elf_read (elf_data=<optimized out>, elf_size=<optimized out>,
    binary=binary@entry=0x55555611e118) at common/ac_binary.c:135
135                     if (gelf_getshdr(section, &section_header) != &section_header) {                          
(gdb) p section                                                                                                     
$1 = (Elf_Scn *) 0x7fff84043da8                                                                                    
(gdb) p section_header                                                                                             
$2 = {sh_name = 0, sh_type = 0, sh_flags = 0, sh_addr = 68719476736, sh_offset = 140735667996976,                   
  sh_size = 140737323335464, sh_link = 0, sh_info = 0, sh_addralign = 93825000247984, sh_entsize = 0}
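Several fields in the dumped section_header (for example sh_offset = 140735667996976, i.e. 0x7fff...) look like user-space pointers rather than file offsets, which is consistent with libelf being handed a bogus buffer or size. As an illustration only, here is a hypothetical Python check that rejects an ELF64 image whose section header table would extend past the end of the buffer; it is not Mesa's ac_elf_read.

```python
import struct

ELF_MAGIC = b"\x7fELF"
ELFCLASS64 = 2
ELFDATA2LSB, ELFDATA2MSB = 1, 2
# ELF64 header fields after the 16-byte e_ident: e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum,
# e_shentsize, e_shnum, e_shstrndx (64 bytes in total).
_EHDR64_TAIL = "HHIQQQIHHHHHH"


def section_table_fits(buf: bytes) -> bool:
    """True if buf is an ELF64 image whose section header table lies inside buf."""
    if len(buf) < 64 or buf[:4] != ELF_MAGIC or buf[4] != ELFCLASS64:
        return False
    order = {ELFDATA2LSB: "<", ELFDATA2MSB: ">"}.get(buf[5])
    if order is None:
        return False
    fields = struct.unpack_from(order + _EHDR64_TAIL, buf, 16)
    e_shoff, e_shentsize, e_shnum = fields[5], fields[10], fields[11]
    # Ignores the extended-numbering case where e_shnum == 0 and the real
    # count lives in the first section header.
    return e_shoff + e_shentsize * e_shnum <= len(buf)
```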
15) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67865)
Posted 2 Nov 2018 by Tomas Brod
Post:
Thanks. I definitely have OpenCL support installed. I tried to install the proprietary OpenCL component but ran into a dependency issue. I could have pushed on, but I took a different path instead.

I recompiled the MilkyWay Separation app from source with debugging symbols enabled and static linking disabled. First I just plugged my custom app into BOINC (via app_info.xml), but that crashed the same way!

Then I took a random Separation WU from BOINC and ran the app in a debugger. Unsurprisingly, it crashed again, but this time I got a backtrace. The crash appears to be in the `gelf_getshdr` function from libelf.so, reached from clBuildProgram in libOpenCL.so through libMesaOpenCL.so. This means that either

A) the OpenCL driver/compiler has a bug and crashes while trying to load the code, or

B) the MilkyWay OpenCL code and/or CL flags are problematic.

Trace follows:
Thread 1 "milkyway_separa" received signal SIGSEGV, Segmentation fault.
0x00007ffff7fb47b4 in gelf_getshdr () from /usr/lib/libelf.so.1
(gdb) bt
#0  0x00007ffff7fb47b4 in gelf_getshdr () from /usr/lib/libelf.so.1
#1  0x00007ffff7b3e8ab in ?? () from /usr/lib/libMesaOpenCL.so.1
#2  0x00007ffff7b39ed4 in ?? () from /usr/lib/libMesaOpenCL.so.1
#3  0x00007ffff7ae9c68 in ?? () from /usr/lib/libMesaOpenCL.so.1
#4  0x00007ffff7acce1b in ?? () from /usr/lib/libMesaOpenCL.so.1
#5  0x00007ffff7de5d9b in clBuildProgram () from /usr/lib/libOpenCL.so.1
#6  0x0000555555668f68 in mwBuildProgram (program=0x55555616dbe8, device=0x555555831078, 
    options=0x555556178310 "-D DOUBLEPREC=1 -cl-mad-enable -cl-no-signed-zeros -cl-finite-math-only -D BACKGROUND_PROFILE=1 -D AUX_BG_PROFILE=0 -D NSTREAM=4 -D CONVOLVE=120 -D R0=12 -D SUN_R0=8.5 -D Q_INV_SQR=3.69822485207101 -D BG_A=0 -D BG_B=0 -D BG_C=0 -D BACKGROUND_WEIGHT=0.99 -D THICK_DISK_WEIGHT=0.01 -D INNERPOWER=1 -D OUTERPOWER=1 -D ALPHA_DELTA_3=3 ") at /home/tomas/downloads/milkywayathome_client/milkyway/src/milkyway_cl_program.c:99
#7  0x00005555556693af in mwCreateProgramFromSrc (ci=0x7fffffffd3a0, srcCount=1, src=0x7fffffffd2c0, 
    lengths=0x7fffffffd2c8, 
    compileDefs=0x555556178310 "-D DOUBLEPREC=1 -cl-mad-enable -cl-no-signed-zeros -cl-finite-math-only -D BACKGROUND_PROFILE=1 -D AUX_BG_PROFILE=0 -D NSTREAM=4 -D CONVOLVE=120 -D R0=12 -D SUN_R0=8.5 -D Q_INV_SQR=3.69822485207101 -D BG_A=0 -D BG_B=0 -D BG_C=0 -D BACKGROUND_WEIGHT=0.99 -D THICK_DISK_WEIGHT=0.01 -D INNERPOWER=1 -D OUTERPOWER=1 -D ALPHA_DELTA_3=3 ") at /home/tomas/downloads/milkywayathome_client/milkyway/src/milkyway_cl_program.c:223
#8  0x00005555555e90f0 in setupSeparationCL (ci=0x7fffffffd3a0, ap=0x7fffffffdc80, ias=0x5555557e3280, 
    clr=0x7fffffffdc20) at /home/tomas/downloads/milkywayathome_client/separation/src/setup_cl.c:600
#9  0x00005555555deca6 in evaluate (results=0x5555557df930, ap=0x7fffffffdc80, ias=0x5555557e3280, 
    streams=0x7fffffffdbc0, sc=0x5555557e0180, likelihoodToText=0, starPointsFile=0x5555557d52b0 "stars.txt", 
    clr=0x7fffffffdc20, do_separation=0, ignoreCheckpoint=0x7fffffffdb9c, separation_outfile=0x0)
    at /home/tomas/downloads/milkywayathome_client/separation/src/evaluation.c:249
#10 0x00005555555de2c4 in worker (sf=0x7fffffffdd90)
    at /home/tomas/downloads/milkywayathome_client/separation/src/separation_main.c:688
#11 0x00005555555de572 in main (argc=3, argv=0x7fffffffe038)
    at /home/tomas/downloads/milkywayathome_client/separation/src/separation_main.c:784
(gdb) frame 8
(gdb) p compileFlags
$6 = 0x555556178310 "-D DOUBLEPREC=1 -cl-mad-enable -cl-no-signed-zeros -cl-finite-math-only -D BACKGROUND_PROFILE=1 -D AUX_BG_PROFILE=0 -D NSTREAM=4 -D CONVOLVE=120 -D R0=12 -D SUN_R0=8.5 -D Q_INV_SQR=3.69822485207101 -D BG_A=0 -D BG_B=0 -D BG_C=0 -D BACKGROUND_WEIGHT=0.99 -D THICK_DISK_WEIGHT=0.01 -D INNERPOWER=1 -D OUTERPOWER=1 -D ALPHA_DELTA_3=3 "
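To compare the options the app passes with those used in a standalone reproducer, the compileFlags string can be split into macro definitions and plain flags. A hypothetical Python helper:

```python
import shlex


def parse_compile_flags(flags: str):
    """Split an OpenCL build-option string into -D macros and other flags.

    Handles both "-D NAME=VALUE" (separate token) and "-DNAME=VALUE";
    a bare "-D NAME" is recorded as "1", as the C preprocessor would define it.
    """
    defines, others = {}, []
    tokens = shlex.split(flags)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-D" and i + 1 < len(tokens):
            macro, i = tokens[i + 1], i + 2
        elif tok.startswith("-D") and len(tok) > 2:
            macro, i = tok[2:], i + 1
        else:
            others.append(tok)
            i += 1
            continue
        name, sep, value = macro.partition("=")
        defines[name] = value if sep else "1"
    return defines, others
```

Applied to the compileFlags string above, it yields 16 macro definitions and the three -cl-* flags.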
16) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67862)
Posted 2 Nov 2018 by Tomas Brod
Post:
Keith, I am confused by your reply.
I already have OpenCL drivers, including the OpenCL headers, and can run tasks from Amicable, Einstein, PrimeGrid and Collatz without problems. The instructions talk about installing the OpenCL part of amdgpu. Do you think some file is missing from what I have already installed? Even in that case, the app should print an error rather than segfault.
I did try to install the amdgpu OpenCL part the way my distribution recommends, but it breaks all OpenCL because it requires a deprecated version of libdrm. I have not tried installing it directly yet, but I will, to see whether a file is missing.
17) Message boards : Number crunching : Segfault on Linux, AMD Radeon, open source Mesa drivers (Message 67860)
Posted 1 Nov 2018 by Tomas Brod
Post:
Hello. The app crashes shortly after initialization.

Kernel options related to VSYSCALL:

CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_X86_VSYSCALL_EMULATION=y
# CONFIG_LEGACY_VSYSCALL_EMULATE is not set
CONFIG_LEGACY_VSYSCALL_NONE=y
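The options above come from the kernel build configuration. On kernels built with CONFIG_IKCONFIG_PROC, that configuration is exposed as /proc/config.gz, so a small hypothetical Python helper can pull out the VSYSCALL lines when a report asks for them.

```python
import gzip


def vsyscall_config(lines):
    """Collect VSYSCALL-related options from kernel .config lines.

    "CONFIG_X=y" maps to "y"; "# CONFIG_X is not set" maps to None.
    """
    opts = {}
    for raw in lines:
        line = raw.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            if "VSYSCALL" in name:
                opts[name] = None
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if "VSYSCALL" in name:
                opts[name] = value
    return opts


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config (only present with CONFIG_IKCONFIG_PROC)."""
    with gzip.open(path, "rt") as f:
        return vsyscall_config(f)
```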


A bit of clinfo:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 18.2.3
  Platform Extensions                             cl_khr_icd
  Device Name                                     Radeon RX 560 Series (POLARIS11, DRM 3.26.0, 4.18.16-arch1-1-ARCH, LLVM 7.0.0)
  Device Version                                  OpenCL 1.1 Mesa 18.2.3
  Driver Version                                  18.2.3
  Device OpenCL C Version                         OpenCL C 1.1 
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.12
  ICD loader Profile                              OpenCL 2.2


Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4
' 
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 4 </number_WUs>
<number_params_per_WU> 26 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
  Name:       Clover
  Version:    OpenCL 1.1 Mesa 18.2.3
  Vendor:     Mesa
  Extensions: cl_khr_icd
  Profile:    FULL_PROFILE
Didn't find preferred platform
Using device 0 on platform 0
Found 2 CL devices
Device 'Radeon RX 560 Series (POLARIS11, DRM 3.26.0, 4.18.16-arch1-1-ARCH, LLVM 7.0.0)' (AMD:0x1002) (CL_DEVICE_TYPE_GPU)
Board: 
Driver version:      18.2.3
Version:             OpenCL 1.1 Mesa 18.2.3
Compute capability:  0.0
Max compute units:   16
Clock frequency:     1300 Mhz
Global mem size:     3221225472
Local mem size:      32768
Max const buf size:  2147483647
Double extension:    cl_khr_fp64
SIGSEGV: segmentation violation

Exiting...


I will provide details on request.
18) Message boards : Number crunching : Server low on work? (Message 64477)
Posted 18 Apr 2016 by Tomas Brod
Post:
You are lucky to get GPU work from SETI, unlike me. World Community Grid has no GPU work. They had some in the past, but when I checked two weeks ago they had none. Check your event log to see whether OpenCL is detected correctly.
19) Message boards : Number crunching : Setting MilkyWay@Home parameters (Message 64401)
Posted 22 Mar 2016 by Tomas Brod
Post:
On my system the parameters have no observable effect (my nVidia GPU is not used to render the desktop and I do not use a screensaver).

I believe "Frequency" controls the kernel size, and the rest control the screensaver.

I set the "Frequency" parameter to 1, because "Lower may be faster" and I have no lag.
20) Message boards : Number crunching : Wasting CPU time in syscalls (Message 64377)
Posted 12 Mar 2016 by Tomas Brod
Post:
I noticed this while experimenting with running two tasks on my single GPU: CPU usage of the GPU app increased from 5% to 60%. This made me check the milkyway process with htop's strace feature, and I found that it makes approximately 100 calls to clock_gettime(CLOCK_MONOTONIC_RAW...) within 100 ms, every 300 ms. This flood of syscalls also happens when running a SINGLE MilkyWay task with nothing else on the machine.
I do not think the app really needs to know the current time that often. Does the clock_gettime call somehow enable communication with the GPU? I would expect reads or ioctls for that.

I am talking about the apps named milkyway and milkyway_separation... with plan_class opencl_nvidia, on Linux (weed) 4.2.0-1-amd64, nVidia GeF840M, driver 352.79.

I noticed the same with PrimeGrid's Genefer. Einstein and Asteroids tasks, on the other hand, wait quietly on futex locks.
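How the MilkyWay app actually waits for its OpenCL kernels is not known from these observations. The Python sketch below only contrasts the two waiting styles described: a loop that reads the raw monotonic clock on every iteration, and a blocking wait on a lock-like primitive. The function names and the timer standing in for "GPU work finished" are hypothetical.

```python
import threading
import time

# CLOCK_MONOTONIC_RAW is Linux-specific; fall back to CLOCK_MONOTONIC elsewhere.
CLOCK = getattr(time, "CLOCK_MONOTONIC_RAW", time.CLOCK_MONOTONIC)


def poll_until(done: threading.Event, timeout: float) -> int:
    """Spin on `done`, reading the clock each iteration; return the clock-read count."""
    reads = 1
    deadline = time.clock_gettime(CLOCK) + timeout
    while not done.is_set():
        reads += 1
        if time.clock_gettime(CLOCK) >= deadline:
            break
    return reads


def block_until(done: threading.Event, timeout: float) -> bool:
    """Sleep until `done` is set or the timeout expires, with no polling loop."""
    return done.wait(timeout)


def demo(delay: float = 0.05) -> int:
    """Count clock reads while polling for a result that arrives after `delay` s."""
    done = threading.Event()
    threading.Timer(delay, done.set).start()  # stands in for "GPU work finished"
    return poll_until(done, timeout=1.0)
```

The polling version keeps a CPU core busy and issues a clock read per iteration for the whole wait, while the blocking version sleeps; on Linux, Event.wait typically ends up in a futex wait in the C library, which resembles what the Einstein and Asteroids tasks were observed doing.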




©2024 Astroinformatics Group