Signal 11 on x86_64 for milkyway_separation_1.02
Author | Message |
---|---|
Joined: 24 Dec 11 Posts: 2 Credit: 9,650,503 RAC: 0 |
Hi, milkyway_separation_1.02_x86_64-pc-linux-gnu__opencl_amd_ati always fails on my machine, even though other applications such as milkyway_separation__modified_fit_1.28_x86_64-pc-linux-gnu__opencl_amd_ati run fine. I run other GPU BOINC projects without any problems. I was running the 13.4 drivers and just updated to 13.11 Beta V9.4, but nothing changed. I have been investigating this with gdb. Here is the error:

gdb ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.02_x86_64-pc-linux-gnu__opencl_amd_ati
> r -a astronomy_parameters.txt
.....
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff290b700 (LWP 5586)]
0x00007ffff496a7f5 in gpu::VirtualGPU::setActiveKernelDesc(amd::NDRangeContainer const&, gpu::Kernel const*) () from /usr/lib/libamdocl64.so
(gdb) bt
#0 0x00007ffff496a7f5 in gpu::VirtualGPU::setActiveKernelDesc(amd::NDRangeContainer const&, gpu::Kernel const*) () from /usr/lib/libamdocl64.so
#1 0x00007ffff496aabd in gpu::VirtualGPU::submitKernelInternal(amd::NDRangeContainer const&, amd::Kernel const&, unsigned char const*, bool) () from /usr/lib/libamdocl64.so
#2 0x00007ffff496f059 in gpu::VirtualGPU::submitKernel(amd::NDRangeKernelCommand&) () from /usr/lib/libamdocl64.so
#3 0x00007ffff4900810 in amd::CommandQueue::loop(device::VirtualDevice*) () from /usr/lib/libamdocl64.so
#4 0x00007ffff4901085 in amd::CommandQueue::Thread::run(void*) () from /usr/lib/libamdocl64.so
#5 0x00007ffff4917321 in amd::Thread::main() () from /usr/lib/libamdocl64.so
#6 0x00007ffff491487c in amd::Thread::entry(amd::Thread*) () from /usr/lib/libamdocl64.so
#7 0x00007ffff74bae0e in start_thread (arg=0x7ffff290b700) at pthread_create.c:311
#8 0x00007ffff71ef9ed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

The crash clearly occurs inside the AMD library. Disassembling around the faulting instruction gives:

0x00007ffff496a7ea <+26>: sub $0x68,%rsp
0x00007ffff496a7ee <+30>: mov 0x188(%rdx),%rax
=> 0x00007ffff496a7f5 <+37>: mov 0x10(%rax),%rbx
(gdb) p $rax
$1 = 0

The instruction at <+30> loads a pointer from offset 0x188 of whatever %rdx points to, and the faulting instruction at <+37> then reads offset 0x10 through that pointer. Since $rax is 0, something NULL is being passed in the OpenCL program/execution environment, very probably a pointer to a struct.

The call stack on the Milkyway side is as follows:

(gdb) info threads
Id Target Id Frame
* 3 Thread 0x7ffff290b700 (LWP 5586) "milkyway_separa" 0x00007ffff496aabd in gpu::VirtualGPU::submitKernelInternal(amd::NDRangeContainer const&, amd::Kernel const&, unsigned char const*, bool) () from /usr/lib/libamdocl64.so
2 Thread 0x7ffff7ff6700 (LWP 5585) "milkyway_separa" 0x00007ffff71c049d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
1 Thread 0x7ffff7fd5700 (LWP 5582) "milkyway_separa" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd5700 (LWP 5582))]
(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007ffff49167c0 in amd::Semaphore::wait() () from /usr/lib/libamdocl64.so
#2 0x00007ffff4912b1f in amd::Monitor::wait() () from /usr/lib/libamdocl64.so
#3 0x00007ffff48ff750 in amd::Event::awaitCompletion() () from /usr/lib/libamdocl64.so
#4 0x00007ffff49001bb in amd::CommandQueue::finish() () from /usr/lib/libamdocl64.so
#5 0x00007ffff48d87c7 in clFinish () from /usr/lib/libamdocl64.so
#6 0x000000000044c419 in ?? ()
#7 0x000000000044c6ed in ?? ()
#8 0x000000000044cbda in integrateCL ()
#9 0x0000000000445c95 in evaluate ()
#10 0x00000000004437d6 in main ()

The main thread is blocked in clFinish() while the AMD command-queue thread crashes. Because the symbols for the frames between integrateCL() and clFinish() (#6 and #7) are stripped, I can't pinpoint the cause of the crash.

Am I the only x86_64 Linux user to have this error? I'm running Debian stable with an HD6900. |
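For anyone triaging a similar backtrace, the split between driver frames and stripped application frames can be pulled out mechanically. The following is only a rough sketch: `parse_frames`, `frames_in_lib`, and `unresolved_frames` are hypothetical helper names, and the regular expression assumes the one-frame-per-line `bt` layout shown in the post above rather than every format gdb can emit.

```python
import re

# Thread 1 backtrace, copied from the post above.
SAMPLE_BT = """\
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007ffff49167c0 in amd::Semaphore::wait() () from /usr/lib/libamdocl64.so
#2 0x00007ffff4912b1f in amd::Monitor::wait() () from /usr/lib/libamdocl64.so
#3 0x00007ffff48ff750 in amd::Event::awaitCompletion() () from /usr/lib/libamdocl64.so
#4 0x00007ffff49001bb in amd::CommandQueue::finish() () from /usr/lib/libamdocl64.so
#5 0x00007ffff48d87c7 in clFinish () from /usr/lib/libamdocl64.so
#6 0x000000000044c419 in ?? ()
#7 0x000000000044c6ed in ?? ()
#8 0x000000000044cbda in integrateCL ()
#9 0x0000000000445c95 in evaluate ()
#10 0x00000000004437d6 in main ()
"""

# "#N [0xADDR in ]WHAT[ from LIB]" (assumed layout, see lead-in).
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*?)(?: from (\S+))?$"
)


def parse_frames(bt_text):
    """Return one dict per frame line: index, description, library (or None)."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append({
                "index": int(match.group(1)),
                "what": match.group(2),
                "lib": match.group(3),
            })
    return frames


def frames_in_lib(frames, lib_name):
    """Frames that gdb attributes to a shared library ending in lib_name."""
    return [f for f in frames if f["lib"] and f["lib"].endswith(lib_name)]


def unresolved_frames(frames):
    """Frames gdb could not name (shown as '??'), i.e. stripped symbols."""
    return [f for f in frames if f["what"].startswith("??")]


if __name__ == "__main__":
    frames = parse_frames(SAMPLE_BT)
    print("AMD driver frames:", [f["index"] for f in frames_in_lib(frames, "libamdocl64.so")])
    print("Stripped frames:  ", [f["index"] for f in unresolved_frames(frames)])
```

On the backtrace above this separates the frames inside `libamdocl64.so` from the two unnamed frames in the application binary, which are the ones that would need an unstripped build to identify.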
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Hi, my suggestion would be to post this in the News section; they talk a lot about the Separation runs there. Just make sure you get the right thread, or your responses could be limited. |
Joined: 24 Dec 11 Posts: 2 Credit: 9,650,503 RAC: 0 |
Thanks. I realized this is the same problem as the one reported in the thread "All Milkyway@Home 1.02 tasks ending in computation error on HD6950." |
©2024 Astroinformatics Group