Always immediate segfault on MilkyWay@Home N-Body Simulation v1.62 (mt)

Message boards : Number crunching : Always immediate segfault on MilkyWay@Home N-Body Simulation v1.62 (mt)

Carsten Milkau

Joined: 10 Feb 13
Posts: 6
Credit: 1,994,863
RAC: 0
Message 65038 - Posted: 17 Aug 2016, 14:42:44 UTC

N-Body sim always crashes immediately (0 secs CPU) with a segmentation fault.
    - I wasn't able to force a non-mt version for testing.
    - Other Milkyway@home apps run fine.
    - I ran a memtest on all CPUs in parallel, just to make sure it's not hardware. All fine.

For instance, see:
[1] http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1731129208
[2] http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1270553147

Sidd
Project developer
Project tester
Project scientist

Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
Message 65039 - Posted: 17 Aug 2016, 18:59:06 UTC - in response to Message 65038.  

Hey,

Thanks for letting us know. I looked into it, and it seems that this workunit ran successfully on other systems. However, on a couple of systems there were "maximum disk usage exceeded" errors. I am not sure why that happened and am looking into it. I think this may also have been the issue on yours, but for some reason it threw a different error. It might be due to the difference in operating systems, but I am not sure. I will continue looking into it.

If this continues with other workunits, please be sure to let us know.

Thanks,
Sidd

Carsten Milkau

Joined: 10 Feb 13
Posts: 6
Credit: 1,994,863
RAC: 0
Message 65040 - Posted: 17 Aug 2016, 20:05:53 UTC
Last modified: 17 Aug 2016, 20:16:26 UTC

Erm, as I mentioned, this happens with *every* WU (of this app) for me.
I have more than 100 failed tasks. It's always an immediate segfault.

I checked many of them; most belong to WUs with both failed and successful runs (for other users). But I didn't see many other crashes, mostly the disk usage problem.

So the problem looks rather specific to my setup. Unfortunately, I don't know how to obtain more information.

P.S. I noticed the app is statically linked. Do you use different libraries or a different compiler for nbody? The segfaults happen so early that they are likely still during initialization. I recently disabled kernel support for some very old compilers / C libraries.
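Whether a given binary is statically linked can be checked directly: a dynamically linked ELF executable carries a PT_INTERP program header naming its loader, while a classic static build has none. The `file` utility reports the same thing; the sketch below does it by hand with `<elf.h>`. It is a minimal example, assuming a 64-bit ELF file in the host's byte order (e.g. an x86-64 binary inspected on x86-64); the function name `elf_has_interp` is hypothetical.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the ELF64 file at `path` has a PT_INTERP
 * program header (it asks for a dynamic loader), 0 if it has none (as in a
 * statically linked executable), or -1 if the file cannot be read or is not
 * a 64-bit ELF executable. Assumes the file uses the host's byte order. */
int elf_has_interp(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    Elf64_Ehdr eh;

    /* Read and validate the ELF header. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr)) {
        fclose(f);
        return -1;
    }

    /* Walk the program header table looking for PT_INTERP. */
    int result = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1) {
            result = -1;
            break;
        }
        if (ph.p_type == PT_INTERP) {
            result = 1;
            break;
        }
    }

    fclose(f);
    return result;
}
```

A return value of 0 for the nbody executable would be consistent with a static build. Note that static-pie executables also lack PT_INTERP, so 0 does not tell those apart from classic static binaries.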

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65041 - Posted: 17 Aug 2016, 23:23:43 UTC

Hey Carsten,

We statically compile with very old libraries to support some older systems which run our project. Maybe that is causing the issues.

Jake

Carsten Milkau

Send message
Joined: 10 Feb 13
Posts: 6
Credit: 1,994,863
RAC: 0
Message 65043 - Posted: 18 Aug 2016, 8:59:07 UTC

I identified three possibly related kernel settings:
    Disabled vsyscall (breaks glibc < 2.14)
    Enabled heap randomization (breaks libc5)
    Disabled uselib syscall (breaks libc5)
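These three settings can also be inspected from user space on a running system. The sketch below is a minimal example, assuming an x86-64 Linux host with /proc mounted; the helper names are hypothetical. It looks for a `[vsyscall]` entry in `/proc/self/maps`, reads `/proc/sys/kernel/randomize_va_space` (2 means the heap start is randomized as well), and probes `uselib(2)` with a NULL argument. The settings listed most likely correspond to the kernel options CONFIG_LEGACY_VSYSCALL_NONE, CONFIG_COMPAT_BRK and CONFIG_USELIB. Inside a container, the results may reflect the container's seccomp policy rather than the kernel itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* vsyscall: 1 if /proc/self/maps lists a "[vsyscall]" mapping (vsyscall=emulate,
 * or xonly on newer kernels), 0 if it does not (vsyscall=none), -1 on error. */
int vsyscall_mapped(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, "[vsyscall]") != NULL) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Heap randomization: value of /proc/sys/kernel/randomize_va_space, or -1 on
 * error. 2 means the heap (brk) start is randomized too; 1 leaves it fixed. */
int read_randomize_va_space(void)
{
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (f == NULL)
        return -1;
    int v;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* uselib(2): 1 if the kernel implements it, 0 if it reports ENOSYS, -1 if
 * unknown (no syscall number on this architecture, or the call was blocked,
 * e.g. with EPERM by a seccomp filter). */
int uselib_available(void)
{
#ifdef SYS_uselib
    /* With a NULL path, a kernel that implements uselib fails with EFAULT. */
    if (syscall(SYS_uselib, NULL) == -1) {
        if (errno == EFAULT)
            return 1;
        if (errno == ENOSYS)
            return 0;
    }
#endif
    return -1;
}
```

Printing the three values from a small `main` before and after changing the kernel configuration gives a quick way to confirm which settings are actually in effect.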

I'll post the results. As these are security-related settings, I'll change them only temporarily and otherwise keep nbody sim disabled.

The standard milkyway and milkyway opencl apps run just fine. Are you using newer libs for those?

Carsten Milkau

Send message
Joined: 10 Feb 13
Posts: 6
Credit: 1,994,863
RAC: 0
Message 65044 - Posted: 18 Aug 2016, 12:36:05 UTC - in response to Message 65043.  

I successfully ran MilkyWay@Home N-Body Simulation v1.62 (mt) with the following setup:
    ENABLED vsyscall emulation (supports glibc < 2.14)
    Enabled heap randomization (breaks libc5)
    Disabled uselib syscall (breaks libc5)

So nbody seems to use a lib requiring vsyscall, likely some glibc version prior to 2.14.
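This would explain the immediate crash: on x86-64, code built against older glibc calls functions such as time() and gettimeofday() at fixed addresses in the legacy vsyscall page. With vsyscall disabled that page does not exist, so the very first such call faults, while booting with `vsyscall=emulate` makes it work again. The sketch below, assuming an x86-64 Linux host, makes the same kind of fixed-address call, but only after checking `/proc/self/maps`, so it never jumps to an unmapped address. The function names are hypothetical; the addresses are the fixed legacy vsyscall entry points.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Fixed x86-64 vsyscall entry points: gettimeofday at 0xffffffffff600000,
 * time at 0xffffffffff600400, getcpu at 0xffffffffff600800. */
#define VSYSCALL_TIME_ADDR 0xffffffffff600400ULL

/* 1 if /proc/self/maps lists a "[vsyscall]" mapping, 0 if not, -1 on error. */
int vsyscall_page_present(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, "[vsyscall]") != NULL) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Hypothetical demo: fetch the time through the legacy vsyscall entry, roughly
 * as old glibc code did on x86-64. Returns 0 on success, or -1 without calling
 * anything if there is no [vsyscall] mapping (or the host is not x86-64);
 * calling the fixed address in that case would raise SIGSEGV. */
int legacy_vsyscall_time(time_t *out)
{
#if defined(__x86_64__)
    if (vsyscall_page_present() != 1)
        return -1;
    time_t (*vtime)(time_t *) = (time_t (*)(time_t *))(uintptr_t)VSYSCALL_TIME_ADDR;
    *out = vtime(NULL);
    return 0;
#else
    (void)out;
    return -1;
#endif
}
```

On a kernel with vsyscall disabled, `legacy_vsyscall_time` returns -1; a binary that makes the call unconditionally would instead crash at that point.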

As the other milkyway apps don't require vsyscall (and there's a small security impact in emulating it), I'll just disable nbody for now, and occasionally check back to see if a newer version works.

©2024 Astroinformatics Group