Welcome to MilkyWay@home

Is the de_separation_17_3s_fix_4_* series broken?

Jesse Viviano

Joined: 4 Feb 11
Posts: 86
Credit: 60,913,150
RAC: 0
Message 47330 - Posted: 9 Apr 2011, 23:42:37 UTC

I noticed a large number of work units in my work queue whose names start with "de_separation_17_3s_fix_4_", and they seem to generate plenty of compute errors. Could someone look into these work units? My GPU has spent twelve minutes on one of these and still is stuck at 0% progress, when it normally crunches a task in one to two minutes.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 47333 - Posted: 10 Apr 2011, 0:57:39 UTC - in response to Message 47330.  

I noticed a large number of work units in my work queue whose names start with "de_separation_17_3s_fix_4_", and they seem to generate plenty of compute errors. Could someone look into these work units? My GPU has spent twelve minutes on one of these and still is stuck at 0% progress, when it normally crunches a task in one to two minutes.


If you're using an older version of one of the optimized CPU applications or ATI GPU applications (0.23 and lower), those have been deprecated, in part to deal with some of our server issues. The old applications required an input and an output file, while the new ones do not. While the server is down, we're taking the time to update the code so that it no longer generates those input files or accepts those output files, which should help improve the stability and performance of the server.

If you're using the current ATI application supplied by the server (0.57), then please report your error in this news thread:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2333#47329

That way Matt A. has one common place to see what issues are happening with the application, so he can debug it and get it running smoothly for everyone.
Jesse Viviano

Joined: 4 Feb 11
Posts: 86
Credit: 60,913,150
RAC: 0
Message 47338 - Posted: 10 Apr 2011, 1:11:11 UTC

I am using stock applications, and the problem seems to affect every application I have seen try to chew on these work units and then choke on them (like former president Bush choking on a pretzel), including the CPU and the ATI GPU clients. Therefore, I doubt that this is specific to the ATI GPU client, since CPU clients are choking as well.

Surprisingly, none of these that have landed in my queue have been sent to an Nvidia GPU yet.

I will cross-post this in the thread you indicated, though.
Jesse Viviano

Joined: 4 Feb 11
Posts: 86
Credit: 60,913,150
RAC: 0
Message 47382 - Posted: 10 Apr 2011, 2:32:52 UTC

I finally found a work unit showing an Nvidia client choking on a work unit from this series along with an ATI client: work unit 1133. Now, we have work units from this series that fail on ATI/AMD GPUs (most of them from this series that I have seen), CPUs (work unit 999 which also includes ATI/AMD clients), and Nvidia GPUs (work unit 1133). Therefore, I think that this set of errors has nothing to do with the new GPU client. I also have noticed malloc errors and/or errors reading parameters in the stderr output of all of these work units in this series.
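
To make that reasoning concrete, here is a minimal Python sketch that tallies which kinds of client failed on the two work units mentioned above. The dictionary layout, the function name, and the platform labels are made up for illustration; they are not anything the project or BOINC produces.

```python
# Failures reported in this thread, keyed by work unit number.
# The platform labels are informal names chosen for this sketch.
failures = {
    999: {"CPU", "ATI GPU"},
    1133: {"Nvidia GPU", "ATI GPU"},
}


def platforms_affected(failures):
    """Return every platform that failed on at least one work unit."""
    affected = set()
    for platforms in failures.values():
        affected |= platforms
    return affected


affected = platforms_affected(failures)

# If more than one kind of client fails, a single client
# is unlikely to be the sole cause of the errors.
spans_multiple_clients = len(affected) > 1

print(sorted(affected), spans_multiple_clients)
```

Since the failures span CPU, ATI, and Nvidia clients, the work units themselves look like the more likely culprit, though two work units are a small sample.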
John Clark

Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 47429 - Posted: 10 Apr 2011, 9:55:44 UTC

Sorry, I believe I should have asked over here rather than in Chris_S's thread.

I restarted my HD5970 on Milkyway, and it picked up the new client (milkyway_0.57_windows_intelx86_ati14). This PC is progressing the work units well (averaging 129-223 seconds). NOTE: I also see that the project folder still has the client milkyway_0.23_windows_intelx86_ati13ati present.

I am having problems starting my HD3850 back on Milkyway. I have both reset the project and detached-reattached (via BAM) and downloaded the same client as highlighted above for the HD5970. The result has been computational errors all the way.

I would say the difference in ATI drivers between the two GPUs is this: for the HD5970 I run 10.11 (1.4.880) with APP, and for the HD3850 it is 1.4.556 with no APP drivers included.

Does anyone have an inkling as to why the HD3850 is trashing the new work? Or are we in a new position where the older-series ATI GPUs are now considered redundant for Milkyway?
Go away, I was asleep


Jesse Viviano

Joined: 4 Feb 11
Posts: 86
Credit: 60,913,150
RAC: 0
Message 47495 - Posted: 10 Apr 2011, 15:04:48 UTC - in response to Message 47429.  

You need to update the driver on the HD 3850 to a driver that includes the APP driver. This provides the libraries needed to do GPGPU.


©2024 Astroinformatics Group