Welcome to MilkyWay@home

Something changed with linux crunching

nairb

Joined: 17 Feb 09
Posts: 24
Credit: 3,432,392
RAC: 23
Message 42758 - Posted: 11 Oct 2010, 17:57:44 UTC

Started up a dual Intel machine running Fedora Core 10 Linux. This machine has done MW w/u's fine before. Not this time. It downloaded the first w/u, which errored straight away with "SIGILL: illegal instruction".
This machine is running SETI and Einstein fine.
It's fc10 (2.6.27.5-117.fc10.i686).
Same thing with a Fedora Core 8 Linux box with dual AMD CPUs (2.6.23.1-42.fc8). This has run SETI fine for ages.

Both machines used to run MW fine. Has anything changed?

Nairb
Bob Greenwood

Joined: 29 Sep 10
Posts: 1
Credit: 6,548
RAC: 0
Message 42760 - Posted: 11 Oct 2010, 18:24:49 UTC - in response to Message 42758.  
Last modified: 11 Oct 2010, 18:27:55 UTC

Seeing a similar issue this morning on my Linux box. Found the BOINC client down... on attempted restart I got the following:
...
11-Oct-2010 11:10:48 [Milkyway@home] Restarting task de_11_2s_5_609775_1286475414_2 using milkyway version 4
SIGSEGV: segmentation violation
Stack trace (10 frames):
/usr/lib64/libboinc.so.6(boinc_catch_signal+0x4d)[0x7f23f6d8bd2d]
/lib64/libc.so.6[0x3015032a20]
/usr/lib64/libboinc.so.6(_ZN14APP_CLIENT_SHM10reset_msgsEv+0x26)[0x7f23f6d87356]
boinc[0x411f0f]
boinc[0x412633]
boinc[0x42908c]
boinc[0x418304]
boinc[0x4490ec]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x301501ec5d]
boinc[0x4076f9]
Exiting...

Running: Linux 2.6.34.7-56.fc13.x86_64

Bob
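
For anyone reading a trace like the one above, here is a minimal sketch of how the frames could be split into module, symbol, offset, and address, and how a simple mangled C++ name such as `_ZN14APP_CLIENT_SHM10reset_msgsEv` could be turned back into `APP_CLIENT_SHM::reset_msgs`. The function names `parse_frame` and `demangle_nested` are made up for illustration and are not part of BOINC. The demangler only handles the plain nested-name form, so treat it as a reading aid rather than a replacement for a real tool such as `c++filt`.

```python
import re

# One frame looks like: module(symbol+0xoffset)[0xaddress]
# The "(symbol+offset)" part is missing when no symbol was resolved.
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one backtrace line into its parts, or return None if it does not match."""
    match = FRAME_RE.match(line.strip())
    return match.groupdict() if match else None


def demangle_nested(symbol):
    """Demangle only the simple `_ZN<len><name>...E` form; return None otherwise."""
    if not symbol or not symbol.startswith("_ZN"):
        return None
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        name = symbol[end:end + length]
        if len(name) != length:
            return None  # truncated name
        parts.append(name)
        pos = end + length
    if not parts or pos >= len(symbol) or symbol[pos] != "E":
        return None  # not the plain nested-name form
    return "::".join(parts)


if __name__ == "__main__":
    # Two frames copied from the trace in this thread.
    frames = [
        "/usr/lib64/libboinc.so.6(_ZN14APP_CLIENT_SHM10reset_msgsEv+0x26)[0x7f23f6d87356]",
        "boinc[0x411f0f]",
    ]
    for raw in frames:
        frame = parse_frame(raw)
        readable = demangle_nested(frame["symbol"]) or frame["symbol"]
        print(frame["module"], readable, frame["address"])
```

Read this way, the frame just below the signal handler is in `APP_CLIENT_SHM::reset_msgs` inside `libboinc.so.6`, called from the `boinc` client binary itself, which suggests the crash is in the client and its library rather than in the science application.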
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42761 - Posted: 11 Oct 2010, 18:33:55 UTC - in response to Message 42758.  

This is probably my fault. It looks like I accidentally built the 32-bit Linux application with SSE2, and the old processors don't support it.
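
For anyone who wants to check whether a particular CPU reports SSE2, here is a minimal sketch. It assumes a Linux x86 machine where `/proc/cpuinfo` contains a `flags` line; the function names `cpu_flags` and `supports` are illustrative only and do not come from BOINC or the MilkyWay@home application.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (for example, a non-x86 machine)


def supports(cpuinfo_text, flag):
    """True if the given flag (e.g. 'sse2') is listed as a whole word."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        print("could not read /proc/cpuinfo (is this a Linux machine?)")
    else:
        print("sse2 supported:", supports(text, "sse2"))
```

The flags are compared as whole words rather than by substring search, so a CPU that lists only `sse` is not mistaken for one that has `sse2`. A binary that uses SSE2 instructions on a CPU without them would be expected to die with an illegal-instruction signal, which matches the SIGILL reported above.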
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42762 - Posted: 11 Oct 2010, 18:35:41 UTC - in response to Message 42760.  

Not sure about this one.
nairb

Joined: 17 Feb 09
Posts: 24
Credit: 3,432,392
RAC: 23
Message 42764 - Posted: 11 Oct 2010, 19:24:21 UTC - in response to Message 42761.  
Last modified: 11 Oct 2010, 19:25:26 UTC

Haaaaa, I thought something must have changed. 2 out of 2 machines with a fault looked a bit iffy.
Are we going to have a new app?

I only have 1 windoz PC. The others are all Linux of one sort or another.

Ta
Nairb
nairb

Joined: 17 Feb 09
Posts: 24
Credit: 3,432,392
RAC: 23
Message 43114 - Posted: 23 Oct 2010, 18:38:13 UTC
Last modified: 23 Oct 2010, 18:39:20 UTC

Ummmm, is this a Linux problem or an old CPU problem? I don't have a "modern" CPU running Linux to check, and I don't have an old CPU running Windows XP to check.
If it's an SSE2 issue, it's not a Linux problem, is it?

Just checking

Ta
Nairb
mhhall21207

Joined: 9 May 10
Posts: 3
Credit: 1,589,065
RAC: 0
Message 44115 - Posted: 23 Nov 2010, 14:16:48 UTC

I seem to be seeing a problem on my CentOS 5.5 x64 machine running an NVIDIA GTX 275. The system appears to have received a kernel update over the weekend. As a result, I also upgraded the GPU device drivers from NVIDIA (dated 11/11/10). Since then, I have been seeing frequent compute errors (machine ID 194981). Most of the task info shows the following error:
Error executing gpu__integral_kernel3 error message: unknown error


Any suggestions for repair?
mhhall21207

Joined: 9 May 10
Posts: 3
Credit: 1,589,065
RAC: 0
Message 44433 - Posted: 30 Nov 2010, 1:32:34 UTC - in response to Message 44115.  

I have since moved my NVIDIA driver back to the prior release (Oct 2010), and the driver and GPU seem to be performing well on this project and others.
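
When comparing behaviour before and after a driver change like this, it can help to note the installed driver version alongside the error reports. The sketch below is an assumption-laden example: it assumes the proprietary NVIDIA Linux driver exposes a text file at `/proc/driver/nvidia/version` whose first line names the kernel module followed by a dotted version number. The helper names are hypothetical, and the exact line layout may differ between driver releases.

```python
import re

# Assumed location of the version file for the proprietary NVIDIA Linux driver.
VERSION_PATH = "/proc/driver/nvidia/version"

# Assumed layout: "... Kernel Module  <dotted version>  <build date>".
VERSION_RE = re.compile(r"Kernel Module(?:\s+for\s+\S+)?\s+(\d+(?:\.\d+)+)")


def parse_driver_version(text):
    """Return the dotted version string found in the text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def installed_driver_version(path=VERSION_PATH):
    """Read the version file if present; return None when it cannot be read."""
    try:
        with open(path) as handle:
            return parse_driver_version(handle.read())
    except OSError:
        return None


if __name__ == "__main__":
    version = installed_driver_version()
    print("NVIDIA driver version:", version if version else "not found")
```

Recording this value before and after an upgrade or rollback makes it easier to tie a run of compute errors to a specific driver release.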
