Welcome to MilkyWay@home

"Computation error" - only on ati gpu WUs

LiSrt
Joined: 21 Mar 09
Posts: 4
Credit: 203,656
RAC: 0
Message 50612 - Posted: 9 Aug 2011, 17:00:02 UTC

I think there's something wrong with my set-up, because I have (so far) been unable to complete any GPU WUs - they error after a few seconds, every time.

BOINC appears to detect the GPUs OK; this shows up after it starts:
Tue 09 Aug 2011 17:35:41 BST		ATI GPU 0: ATI unknown (CAL version 1.4.1457, 2048MB, 3379 GFLOPS peak)
Tue 09 Aug 2011 17:35:41 BST		ATI GPU 1: ATI unknown (CAL version 1.4.1457, 2048MB, 3379 GFLOPS peak)

My drivers are up to date - 11.7.
I also installed the 2.5 SDK but I don't know if it's actually being used...

Is there anything I can do to fix this?

(OS is Linux - 64 bit)

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 50617 - Posted: 9 Aug 2011, 22:47:19 UTC

I wonder if the scheduler sent you the 32-bit binaries instead of the 64-bit version; that would be one cause of the error you are seeing.

If you are comfortable with changing the application, you might try the 64-bit version with an app_info.
http://www.arkayn.us/forum/index.php?action=downloads;sa=view;down=23

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 50618 - Posted: 9 Aug 2011, 23:12:47 UTC - in response to Message 50612.  

Are you using app_info? Could you try running the "file" command on the actual binary? It should be somewhere like /var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_0.82_x86_64-pc-linux-gnu__ati14
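
A minimal sketch of that check, assuming the "file" utility is installed. The helper name binary_arch is made up for illustration, and the path on your system may differ from the one above:

```shell
#!/bin/sh
# Hypothetical helper (not part of BOINC or MilkyWay@home): classify a
# binary as 32-bit or 64-bit ELF from the description printed by file(1).
binary_arch() {
    # -b omits the file name from the output, -L follows symlinks.
    case "$(file -bL "$1" 2>/dev/null)" in
        *"ELF 64-bit"*) echo "64-bit" ;;
        *"ELF 32-bit"*) echo "32-bit" ;;
        *)              echo "unknown" ;;
    esac
}
```

Calling binary_arch on the milkyway_separation_0.82_x86_64-pc-linux-gnu__ati14 file in the project directory should print 64-bit if the scheduler sent the right build.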

LiSrt
Joined: 21 Mar 09
Posts: 4
Credit: 203,656
RAC: 0
Message 50627 - Posted: 10 Aug 2011, 11:34:29 UTC

arkayn wrote:
> I wonder if the scheduler sent you the 32-bit binaries instead of the 64-bit version; that would be one cause of the error you are seeing.
>
> If you are comfortable with changing the application, you might try the 64-bit version with an app_info.
> http://www.arkayn.us/forum/index.php?action=downloads;sa=view;down=23

I just tried this - the computation error happened like before.


Matt Arsenault wrote:
> Are you using app_info? Could you try running file on the actual binary? It should be somewhere like /var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_0.82_x86_64-pc-linux-gnu__ati14

I'm not really sure what to do there. I have the original executable file that was in the milkyway folder, but I don't know how to run it on a work unit.

LiSrt
Joined: 21 Mar 09
Posts: 4
Credit: 203,656
RAC: 0
Message 50628 - Posted: 10 Aug 2011, 13:52:06 UTC

If it helps, this is the error I'm getting:

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
execv: No such file or directory

</stderr_txt>
]]>
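
For what it's worth, execv can report "No such file or directory" even when the binary itself exists: the kernel returns the same error when the dynamic loader named inside the binary is missing. A rough sketch for checking that, assuming readelf from binutils is installed (elf_interp and check_interp are made-up helper names):

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Print the dynamic loader ("program interpreter") an ELF binary requests.
# Prints nothing for static binaries, non-ELF files, or if readelf is absent.
elf_interp() {
    readelf -l "$1" 2>/dev/null |
        sed -n 's/.*Requesting program interpreter: \(.*\)\]$/\1/p'
}

# Report whether the requested loader actually exists on this system.
check_interp() {
    interp=$(elf_interp "$1")
    if [ -z "$interp" ]; then
        echo "no interpreter found (static binary, not ELF, or readelf missing)"
    elif [ -e "$interp" ]; then
        echo "loader present: $interp"
    else
        echo "loader missing: $interp"
    fi
}
```

If check_interp prints "loader missing" for the MilkyWay binary, the work units will fail at launch no matter what the GPU driver is doing.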

FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50646 - Posted: 11 Aug 2011, 7:45:37 UTC

I don't use Linux so I can't say much - but have you checked the user permissions on BOINC's working directory?

LiSrt
Joined: 21 Mar 09
Posts: 4
Credit: 203,656
RAC: 0
Message 50648 - Posted: 11 Aug 2011, 8:05:33 UTC - in response to Message 50646.  

FruehwF wrote:
> I don't use Linux so I can't say much - but have you checked the user permissions on BOINC's working directory?

I don't think that's a problem, I've created a separate user called "boinc" and everything runs in the home folder /home/boinc (which is owned by that user).

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 50649 - Posted: 11 Aug 2011, 12:11:14 UTC

Maybe this thread will help?

FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50653 - Posted: 11 Aug 2011, 15:15:44 UTC - in response to Message 50648.  

LiSrt wrote:
> FruehwF wrote:
> > I don't use Linux so I can't say much - but have you checked the user permissions on BOINC's working directory?
>
> I don't think that's a problem, I've created a separate user called "boinc" and everything runs in the home folder /home/boinc (which is owned by that user).


And does this user have execute rights (r-x) on this folder?
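
One way to check this, as a sketch: exec_status is a hypothetical helper, and the "boinc" user name is taken from the post quoted above.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user may execute a file
# (or, for a directory, search it).
exec_status() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -x "$1" ]; then
        echo "executable"
    else
        echo "not executable"
    fi
}
```

Run it as the boinc user against both the project folder and the application binary; ls -l on the same paths shows the actual permission bits.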

Mattia Verga
Joined: 30 Nov 09
Posts: 4
Credit: 287,953
RAC: 0
Message 50685 - Posted: 14 Aug 2011, 11:12:08 UTC

I have the same problem... if I try to run
$ cd BOINC/projects/milkyway.cs.rpi.edu_milkyway/
$ ./milkyway_separation_0.82_x86_64-pc-linux-gnu__ati14 
bash: ./milkyway_separation_0.82_x86_64-pc-linux-gnu__ati14: /lib/ld-linux-x86-64.so.2: bad ELF interpreter: No such file or directory


The problem is that ld-linux-x86-64.so.2 is in /lib64/ and not in /lib/.
I resolved it with
ln -s /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2


Now the GPU app is working :-)
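
A slightly more careful version of the same fix, as a sketch only: link_loader is a made-up helper that refuses to overwrite anything already at the destination. It needs root to write into /lib.

```shell
#!/bin/sh
# Hypothetical helper: create a symlink from dst to src, but only if src
# exists and nothing (file, directory, or dangling link) is already at dst.
link_loader() {
    src=$1
    dst=$2
    if [ ! -e "$src" ]; then
        echo "source missing: $src" >&2
        return 1
    fi
    if [ -e "$dst" ] || [ -L "$dst" ]; then
        echo "already present: $dst"
        return 0
    fi
    ln -s "$src" "$dst" && echo "linked: $dst -> $src"
}
```

For the case above the call would be link_loader /lib64/ld-linux-x86-64.so.2 /lib/ld-linux-x86-64.so.2, run as root.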

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 50742 - Posted: 18 Aug 2011, 22:57:48 UTC - in response to Message 50685.  

Same error where tasks fail with a computation error after 2 seconds in XP Pro (32-bit) & HD3850.
Just started this project running GPU tasks on my new(!) AGP card with CCC 10.5.
Collatz & DNETC working 100% OK ???
Have set Milkyway to not request new tasks on that PC until this is fixed.

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 50744 - Posted: 19 Aug 2011, 3:44:07 UTC

Ray_GTI-R wrote:
> Same error where tasks fail with a computation error after 2 seconds in XP Pro (32-bit) & HD3850.

Highly likely it's the driver - you are on 1.4.636, try updating it and post again if still not working

Regards
Zy

BladeD
Joined: 2 Nov 10
Posts: 731
Credit: 131,536,342
RAC: 0
Message 50749 - Posted: 19 Aug 2011, 8:38:52 UTC - in response to Message 50612.  

LiSrt wrote:
> I think there's something wrong with my set-up, because I have (so far) been unable to complete any GPU WUs - they error after a few seconds, every time.
>
> BOINC appears to detect the GPUs ok, this shows up after it starts:
> Tue 09 Aug 2011 17:35:41 BST		ATI GPU 0: ATI unknown (CAL version 1.4.1457, 2048MB, 3379 GFLOPS peak)
> Tue 09 Aug 2011 17:35:41 BST		ATI GPU 1: ATI unknown (CAL version 1.4.1457, 2048MB, 3379 GFLOPS peak)
>
> My drivers are up to date - 11.7.
> I also installed the 2.5 SDK but I don't know if it's actually being used...
>
> Is there anything I can do to fix this?
>
> (OS is Linux - 64 bit)

I know nothing about Linux, but some users have problems with the latest drivers.
Have you tried older ones?

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 50753 - Posted: 19 Aug 2011, 12:12:13 UTC - in response to Message 50744.  

Zydor wrote:
> Ray_GTI-R wrote:
> > Same error where tasks fail with a computation error after 2 seconds in XP Pro (32-bit) & HD3850.
>
> Highly likely it's the driver - you are on 1.4.636, try updating it and post again if still not working
>
> Regards
> Zy


Cat 10.5 is clearly too old to work with the ATI app.
I know it was working with cat 10.10.
So you should start from there and see what the latest cat version supporting your HD3850(AGP) is.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 50758 - Posted: 19 Aug 2011, 21:08:37 UTC

I'm using 11.3 with the AGP hotfix on my 3850 with no problems (XP Pro 32bit).

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 51536 - Posted: 29 Oct 2011, 16:32:59 UTC - in response to Message 50758.  

Just got back to this after some non-graphics card hardware updates.

OK, have just completed a task that validated OK.

Run time (sec): 462.31 (7.7 minutes)
CPU time (sec): 20.30

WOW that's fast! (For this rig, anyway.)

CCC details:-
Driver Packaging Version 8.881-110728a-122947E-ATI
Catalyst™ Version 11.8
Provider AMD Technologies Inc.
2D Driver Version 6.14.10.7213
Direct3D Version 6.14.10.0855
OpenGL Version 6.14.10.11005
Catalyst™ Control Center Version 2011.0728.1723.29300
AIW/VIVO WDM Driver Version 6.14.10.6238
AIW/VIVO WDM SP Driver Version 6.14.10.6238

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 51539 - Posted: 30 Oct 2011, 0:51:06 UTC - in response to Message 51536.  
Last modified: 30 Oct 2011, 0:55:37 UTC

... PS. Yes this PC (Computer [ID] 231173) now shows in BOINC as having 'Coprocessors':-

"CAL ATI Radeon HD 3800 (RV670) (512MB) driver: 1.4.1523"

although I have no idea where it gets the driver reference 1.4.1523 ... that exact driver reference only appears in BOINC files on this PC.

Can someone point me in the right direction so that I can cross-reference these utterly confusing GPU driver references? Please?

(FWIW if I Google e.g., "1.4.1523" all I get is a string of links related to BOINC.)

Alternatively, I respectfully suggest that references to whatever BOINC calls the driver be replaced, or at least accompanied, by the correct ATI driver reference in ATI's own format.

Otherwise the BOINC driver reference appears meaningless (with the greatest respect, if I'm simply missing something obvious here).

Summary:-

In (XP) Computer/Manage/Device Manager/Display adapters/ATI Radeon HD 3850 AGP the driver shows up as ... 8.881.0.0

In BOINC the driver shows as ... 1.4.1523
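
Going only by the pairs reported in this thread, the CAL version BOINC prints seems to track the Catalyst release. A small lookup sketch built from those pairs alone (cal_to_catalyst is a hypothetical helper, and the 1.4.636 entry is inferred from two separate posts, so treat it with caution):

```shell
#!/bin/sh
# Hypothetical lookup: map a CAL version string, as shown by BOINC, to the
# Catalyst release reported alongside it in this thread. Not an official or
# complete table.
cal_to_catalyst() {
    case "$1" in
        1.4.636)  echo "10.5" ;;   # inferred: CCC 10.5 plus "you are on 1.4.636"
        1.4.1457) echo "11.7" ;;   # Linux, 64-bit, as first reported
        1.4.1523) echo "11.8" ;;   # XP Pro 32-bit, driver 8.881
        *)        echo "unknown" ;;
    esac
}
```

Anything not mentioned in the thread falls through to "unknown" rather than a guess.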

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 51621 - Posted: 7 Nov 2011, 2:06:36 UTC - in response to Message 51539.  
Last modified: 7 Nov 2011, 2:07:25 UTC

OK, I updated everything as detailed before.
All was OK for a while, and the PC processed a ton of WUs, e.g. WU 14197156.
Then I updated a few BOINC preferences.
Now all MW@H tasks fail after 4 seconds with the old "Computation Error" problem.

HELP!

FWIW, this problem does not occur when running Collatz GPU tasks on the same PC with the same (updated) BOINC preferences, e.g. WU 42570881.

Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 51634 - Posted: 8 Nov 2011, 0:10:52 UTC - in response to Message 51621.  

Well, this PC is now processing GPU tasks OK

ps_separation_13_3s_fix20_2_5993594_1 finished
ps_separation_13_3s_fix20_2_593217_1 78% complete etc

Very strange.
