Welcome to MilkyWay@home

Posts by christophe

1) Message boards : Application Code Discussion : Signal 11 on x86_64 for milkyway_separation_1.02 (Message 60638)
Posted 21 Dec 2013 by christophe
Post:
Thanks.
I realized this is the same problem as the one reported in the thread "All Milkyway@Home 1.02 tasks ending in computation error on HD6950."
2) Message boards : Application Code Discussion : Signal 11 on x86_64 for milkyway_separation_1.02 (Message 60626)
Posted 19 Dec 2013 by christophe
Post:
Hi,

milkyway_separation_1.02_x86_64-pc-linux-gnu__opencl_amd_ati always fails on my machine, even though other applications such as milkyway_separation__modified_fit_1.28_x86_64-pc-linux-gnu__opencl_amd_ati run fine.

I run other GPU BOINC projects without any problems.

I was running on 13.4 drivers and just updated to 13.11 Beta V9.4, but nothing changed.

I have been investigating this with gdb.

Here is the error:
gdb ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.02_x86_64-pc-linux-gnu__opencl_amd_ati
> r -a astronomy_parameters.txt

.....

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff290b700 (LWP 5586)]
0x00007ffff496a7f5 in gpu::VirtualGPU::setActiveKernelDesc(amd::NDRangeContainer const&, gpu::Kernel const*) () from /usr/lib/libamdocl64.so
(gdb) bt
#0 0x00007ffff496a7f5 in gpu::VirtualGPU::setActiveKernelDesc(amd::NDRangeContainer const&, gpu::Kernel const*) () from /usr/lib/libamdocl64.so
#1 0x00007ffff496aabd in gpu::VirtualGPU::submitKernelInternal(amd::NDRangeContainer const&, amd::Kernel const&, unsigned char const*, bool) () from /usr/lib/libamdocl64.so
#2 0x00007ffff496f059 in gpu::VirtualGPU::submitKernel(amd::NDRangeKernelCommand&) () from /usr/lib/libamdocl64.so
#3 0x00007ffff4900810 in amd::CommandQueue::loop(device::VirtualDevice*) () from /usr/lib/libamdocl64.so
#4 0x00007ffff4901085 in amd::CommandQueue::Thread::run(void*) () from /usr/lib/libamdocl64.so
#5 0x00007ffff4917321 in amd::Thread::main() () from /usr/lib/libamdocl64.so
#6 0x00007ffff491487c in amd::Thread::entry(amd::Thread*) () from /usr/lib/libamdocl64.so
#7 0x00007ffff74bae0e in start_thread (arg=0x7ffff290b700) at pthread_create.c:311
#8 0x00007ffff71ef9ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


The crash clearly occurs inside the AMD library. Disassembling around the faulting instruction:
0x00007ffff496a7ea <+26>: sub $0x68,%rsp
0x00007ffff496a7ee <+30>: mov 0x188(%rdx),%rax
=> 0x00007ffff496a7f5 <+37>: mov 0x10(%rax),%rbx

(gdb) p $rax
$1 = 0


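-> $rax is 0, so the instruction at +37 tries to read from address 0x10 and faults: the pointer loaded from offset 0x188 of the object in %rdx is NULL. Something NULL is being passed into the OpenCL program/execution environment, very probably a pointer to a struct.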
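
To spell out that reasoning, here is a toy Python model of the two loads around the fault. It is only a sketch of what the disassembly shows, not AMD's code: the memory dictionary, the value 0x1000 for %rdx and the helper names are made up for illustration.

```python
# Toy model of the two instructions around the fault (NOT AMD's code).
#   mov 0x188(%rdx),%rax   ->  rax = mem[rdx + 0x188]
#   mov 0x10(%rax),%rbx    ->  rbx = mem[rax + 0x10]

class SegFault(Exception):
    """Raised when the model reads an address that is not mapped."""
    def __init__(self, addr):
        super().__init__(hex(addr))
        self.addr = addr

def load(mem, addr):
    # 'mem' is a hypothetical address -> value map standing in for RAM.
    if addr not in mem:
        raise SegFault(addr)
    return mem[addr]

def run(mem, rdx):
    rax = load(mem, rdx + 0x188)  # pointer field inside the object in %rdx
    rbx = load(mem, rax + 0x10)   # dereference of that pointer (the faulting load)
    return rbx

def faulting_address(mem, rdx):
    """Return the address that faults, or None if both loads succeed."""
    try:
        run(mem, rdx)
    except SegFault as fault:
        return fault.addr
    return None

# Hypothetical object at 0x1000 whose field at +0x188 is NULL, as in the crash.
print(hex(faulting_address({0x1188: 0}, 0x1000)))
```

With the field set to 0, the second load targets 0x0 + 0x10, which matches $rax = 0 at the faulting instruction. With a valid pointer in that field, both loads go through.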

The call stack on the milkyway side is as follows:
(gdb) info threads
Id Target Id Frame
* 3 Thread 0x7ffff290b700 (LWP 5586) "milkyway_separa" 0x00007ffff496aabd in gpu::VirtualGPU::submitKernelInternal(amd::NDRangeContainer const&, amd::Kernel const&, unsigned char const*, bool) () from /usr/lib/libamdocl64.so
2 Thread 0x7ffff7ff6700 (LWP 5585) "milkyway_separa" 0x00007ffff71c049d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
1 Thread 0x7ffff7fd5700 (LWP 5582) "milkyway_separa" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85


(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd5700 (LWP 5582))]

(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007ffff49167c0 in amd::Semaphore::wait() () from /usr/lib/libamdocl64.so
#2 0x00007ffff4912b1f in amd::Monitor::wait() () from /usr/lib/libamdocl64.so
#3 0x00007ffff48ff750 in amd::Event::awaitCompletion() () from /usr/lib/libamdocl64.so
#4 0x00007ffff49001bb in amd::CommandQueue::finish() () from /usr/lib/libamdocl64.so
#5 0x00007ffff48d87c7 in clFinish () from /usr/lib/libamdocl64.so
#6 0x000000000044c419 in ?? ()
#7 0x000000000044c6ed in ?? ()
#8 0x000000000044cbda in integrateCL ()
#9 0x0000000000445c95 in evaluate ()
#10 0x00000000004437d6 in main ()

Since the frames listed above integrateCL() and evaluate() (#6 and #7) have their symbols stripped, I can't pinpoint the cause of the crash.
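
For anyone with a non-stripped build of the application, the raw addresses of those unresolved frames are what would need to be looked up. As a rough sketch, the Python helper below pulls them out of a pasted gdb backtrace. The function name and the regular expression are my own, and it assumes frames are printed in the `#N 0xADDR in ?? ()` form shown above.

```python
import re

# Matches frames that gdb could not resolve, e.g. "#6 0x000000000044c419 in ?? ()".
UNRESOLVED = re.compile(r"#(\d+)\s+(0x[0-9a-f]+) in \?\? \(\)")

def unresolved_frames(backtrace):
    """Return (frame_number, address_string) for each '??' frame, in order."""
    found = []
    for line in backtrace.splitlines():
        match = UNRESOLVED.search(line)
        if match:
            found.append((int(match.group(1)), match.group(2)))
    return found

# Part of the thread 1 backtrace, pasted as plain text.
sample = """\
#5 0x00007ffff48d87c7 in clFinish () from /usr/lib/libamdocl64.so
#6 0x000000000044c419 in ?? ()
#7 0x000000000044c6ed in ?? ()
#8 0x000000000044cbda in integrateCL ()
"""

for number, address in unresolved_frames(sample):
    print(number, address)
```

The addresses only make sense against a binary built from exactly the same code and options as the distributed one, so this is a starting point and not a way to find the cause by itself.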

Am I the only x86_64 Linux user to have this error?

I'm running Debian stable with an HD6900.



