Welcome to MilkyWay@home

Problem Clients

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 47901 - Posted: 15 Apr 2011, 16:20:56 UTC

There are many CPU clients spewing out invalid results at a rate of ~1/second. This is causing a lot of WUs to be flagged as invalid, as the number of retries exceeds what's allowed. The worst offenders seem to be:

Error while computing 0.00 0.00 --- MilkyWay@Home v0.50
Error while computing 0.00 0.00 --- MilkyWay@Home v0.50 (sse2)

Because finished WUs are cleared from the database so quickly this may not be obvious. Can't the server be set NOT to send WUs to these clients that are throwing massive numbers of errors?

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 47902 - Posted: 15 Apr 2011, 16:26:11 UTC
Last modified: 15 Apr 2011, 16:27:35 UTC

Interesting .... I thought the BOINC server had an automatic choke, testing the rate of invalids and restricting the supply of new ones for a set time frame until good valids return from the errant host, to prevent runaways.

Maybe that parameter has been knocked a bit during the recent hassles?

Regards
Zy

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 47903 - Posted: 15 Apr 2011, 16:50:02 UTC - in response to Message 47902.  

Interesting .... I thought the BOINC server had an automatic choke, testing the rate of invalids and restricting the supply of new ones for a set time frame until good valids return from the errant host, to prevent runaways. Zy

It does, but maybe it's not set or not set aggressively enough at the moment.
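
To make the idea of such a choke concrete, here is a rough sketch of a per-host daily quota that shrinks with every error and recovers on a valid result. This is not BOINC's actual code: the class name `HostQuota`, its fields, and the numbers are all hypothetical, chosen only to illustrate the throttling behaviour discussed above.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Hypothetical per-host daily quota; names and numbers are illustrative only."""
    max_quota: int = 100   # assumed ceiling on tasks per day
    quota: int = 100       # current allowance, shrinks as errors come in
    sent_today: int = 0    # tasks already handed out today

    def record_result(self, valid: bool) -> None:
        if valid:
            # A good result lets the host recover quickly (capped at the ceiling).
            self.quota = min(self.max_quota, self.quota * 2)
        else:
            # Each error costs one task of allowance, but never drops below 1,
            # so the host can still prove itself with a single good result.
            self.quota = max(1, self.quota - 1)

    def send_task(self) -> bool:
        """Return True if the scheduler may hand this host another task today."""
        if self.sent_today >= self.quota:
            return False
        self.sent_today += 1
        return True
```

With a scheme like this, a host that errors out on everything is quickly reduced to one task per day, which is the kind of "more aggressive" setting being asked for here.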

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 47956 - Posted: 17 Apr 2011, 12:35:53 UTC

Kashi also posted about this problem here:

Still getting "Completed, can't validate" invalids. This is due to wingmen exceeding the maximum number of errors of 3.

Some of these errors are due to ATI GPUs using older Catalyst drivers that are not compatible with the new application. Others are from CPUs which either do not work with the new application or are using old optimised applications that will no longer work without a parameter file.

These computers are trashing a lot of tasks and some of the owners are just letting them run.

Here are 4 such computers from the last "Completed, can't validate" I had before it was rapidly cleared from the database:
hostid=264221
hostid=200293
hostid=102667
hostid=211500

Hostid 102667 in particular currently has over 7,000 tasks listed. It is still using v0.18 speedimic_sse3_64.

As I requested a week ago, could the maximum number of errors please be increased from 3, either until the number of computers producing these errors decreases or until they are blocked from receiving work when they return many invalid results?

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2353&nowrap=true#47949
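
As a rough illustration of why a good result can end up as "Completed, can't validate", here is a small sketch of the error-limit logic described above. It is a simplification under stated assumptions, not the server's real code: the function name `workunit_outcome` is hypothetical, and the quorum of 2 is assumed purely for the example (only the error limit of 3 comes from the post).

```python
def workunit_outcome(results, max_errors=3, quorum=2):
    """Walk through results in the order they were returned.

    results: iterable of "valid" or "error" strings.
    max_errors: error limit from the post (3); exceeding it kills the WU.
    quorum: assumed number of agreeing valid results needed (hypothetical).
    """
    errors = 0
    valid = 0
    for outcome in results:
        if outcome == "error":
            errors += 1
            if errors > max_errors:
                # The WU is abandoned; any valid result already returned
                # can no longer be validated.
                return "too many errors"
        elif outcome == "valid":
            valid += 1
            if valid >= quorum:
                return "validated"
    return "pending"
```

For example, `workunit_outcome(["valid", "error", "error", "error", "error"])` gives "too many errors", while the same sequence with `max_errors=6` stays "pending" and can still be rescued by a second good wingman.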

David Bohn
Joined: 16 Aug 10
Posts: 15
Credit: 32,160,978
RAC: 0
Message 47982 - Posted: 18 Apr 2011, 3:19:34 UTC

I have downloaded and installed the latest drivers from the AMD web site and still get errors on all GPU calculations. I have disabled GPU until I can find a resolution to this issue. Any ideas besides the latest drivers?

ATI 4870
ATI 3200

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 47983 - Posted: 18 Apr 2011, 3:53:03 UTC - in response to Message 47982.  

I have downloaded and installed the latest drivers from the AMD web site and still get errors on all GPU calculations. I have disabled GPU until I can find a resolution to this issue. Any ideas besides the latest drivers?

ATI 4870
ATI 3200

Can you unhide your computers so we can have a look at their details and the WU errors?

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 47993 - Posted: 18 Apr 2011, 18:56:48 UTC


David Bohn
Joined: 16 Aug 10
Posts: 15
Credit: 32,160,978
RAC: 0
Message 47999 - Posted: 18 Apr 2011, 19:16:58 UTC - in response to Message 47983.  

Not sure if I did what you requested, but see if you can see what you needed now.

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 48002 - Posted: 18 Apr 2011, 19:26:45 UTC - in response to Message 47999.  

Not sure if I did what you requested, but see if you can see what you needed now.


It's about unhiding the computers so that the details of the WUs can be seen.

Go to the bottom of this page and click on Account.

In Account, halfway down, you will see an option "Preferences for this project"; click on the blue words "MilkyWay@home preferences".

Inside preferences, go to the sixth option line from the top, "Should MilkyWay@home show your computers on its web site?", and change that to "yes". You do that by clicking the blue words "Edit MilkyWay@home preferences" and following your nose. Save the changes afterwards, or any change made is lost.

Once you've done that, update your BAM client, and we can then see the details, which may point at what is happening.

Regards
Zy

David Bohn
Joined: 16 Aug 10
Posts: 15
Credit: 32,160,978
RAC: 0
Message 48011 - Posted: 18 Apr 2011, 22:17:57 UTC - in response to Message 48002.  

I had done everything but the BAM client update as of my last post. I just completely shut down BOINC and restarted it, which I assume will update the BAM client.

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 48012 - Posted: 18 Apr 2011, 22:42:16 UTC - in response to Message 48011.  
Last modified: 18 Apr 2011, 22:44:50 UTC

All failed WUs are showing the same errors:

Device does not support double precision
Device failed capability check
Failed to end timer resolution
Failed to setup CAL
17:12:46 (7056): called boinc_finish


MW needs Double Precision capable cards - it will not work on single precision cards.
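
If you want to check a batch of failed tasks for the same cause, a small script can scan the stderr text for the messages quoted above. This is only a sketch: the function name `diagnose` and the hint wording are my own, and the marker strings are simply the ones from this log, so other failure modes will not be recognised.

```python
# Marker strings copied from the stderr output quoted above; the hints are
# informal explanations, not official error descriptions.
KNOWN_MARKERS = [
    ("Device does not support double precision", "GPU has no double precision support"),
    ("Device failed capability check", "GPU failed the application's capability check"),
    ("Failed to setup CAL", "CAL (ATI's compute layer) could not be set up"),
]


def diagnose(stderr_text):
    """Return a hint for every known marker found in the task's stderr text."""
    lowered = stderr_text.lower()
    return [hint for marker, hint in KNOWN_MARKERS if marker.lower() in lowered]
```

Feeding it the five lines above returns all three hints, with the double precision one first, which is the root cause here.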

What is the card type number of the AMD GPU card(s) you are using?

Did you load the APP Driver set as well as the main drivers when you loaded 11.3?

Regards
Zy

David Bohn
Joined: 16 Aug 10
Posts: 15
Credit: 32,160,978
RAC: 0
Message 48020 - Posted: 19 Apr 2011, 2:40:54 UTC - in response to Message 48012.  

I have an ATI 4875 (Gigabyte branded) as my primary display, and yes, the APP driver set was loaded. I am wondering if the ATI 3200 built into the 780G chipset is the problem and, short of disabling it completely (I use it to drive a secondary monitor), whether there is a means to tell MilkyWay not to use it. That would also explain why one GPU process seems to run OK while the other GPU work units quickly error out: the 4875 is processing correctly, while the 3200 errors out over and over, quickly consuming all the WUs. The $64 question is why, since up until the recent server issues both GPUs ran fine; in fact, they ran at similar speeds.

Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 48021 - Posted: 19 Apr 2011, 2:53:07 UTC - in response to Message 48020.  

I have an ATI 4875 (Gigabyte branded) as my primary display, and yes, the APP driver set was loaded. I am wondering if the ATI 3200 built into the 780G chipset is the problem and, short of disabling it completely (I use it to drive a secondary monitor), whether there is a means to tell MilkyWay not to use it. That would also explain why one GPU process seems to run OK while the other GPU work units quickly error out: the 4875 is processing correctly, while the 3200 errors out over and over, quickly consuming all the WUs. The $64 question is why, since up until the recent server issues both GPUs ran fine; in fact, they ran at similar speeds.


You can disable the 3200 in boinc to prevent it from trashing more WUs.

Shut down boinc and after that, create a file called cc_config.xml (using notepad) with the following content:

<cc_config>
    <options>
        <ignore_ati_dev>1</ignore_ati_dev>
    </options>
</cc_config>


copy that file to "C:\Documents and Settings\All Users\Application Data\BOINC"

start boinc again.
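
If you would rather not type the file by hand, the same content can be generated with a few lines of Python. Treat this as a sketch under assumptions: the helper names `build_cc_config` and `write_cc_config` are made up for this example, the device index 1 is taken from the file above (check which number BOINC assigns to the 3200 in its startup messages), and writing the file this way overwrites any existing cc_config.xml in that folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def build_cc_config(ignored_ati_device):
    """Build the cc_config XML text that tells BOINC to ignore one ATI device."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # Device index as BOINC numbers it; 1 is the value used in the post above.
    ET.SubElement(options, "ignore_ati_dev").text = str(ignored_ati_device)
    return ET.tostring(root, encoding="unicode")


def write_cc_config(data_dir, ignored_ati_device=1):
    """Write cc_config.xml into the BOINC data directory (overwrites the file)."""
    path = Path(data_dir) / "cc_config.xml"
    path.write_text(build_cc_config(ignored_ati_device), encoding="utf-8")
    return path
```

Shut down boinc first, call `write_cc_config(r"C:\Documents and Settings\All Users\Application Data\BOINC")`, then start it again. The output is written on a single line without indentation, which is still well-formed XML.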






Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 48027 - Posted: 19 Apr 2011, 6:49:38 UTC - in response to Message 48020.  

..... The $64 question is that up until the recent server issues both GPUs ran fine, in fact they ran at similar speeds.


The new application will not work on 3XXX GPUs; the minimum level is 4XXX. That's why it started falling over from the start of the changes on the server.

Crunch3r's post disables the 3200 for BOINC only (it will still work OK as a display adapter), so you can work as usual, except that the 3200 can no longer crunch MW.

Regards
Zy

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 48035 - Posted: 19 Apr 2011, 11:06:09 UTC

That's not true. See my hosts. I have a 3850 working...

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 48037 - Posted: 19 Apr 2011, 11:20:37 UTC - in response to Message 48035.  
Last modified: 19 Apr 2011, 11:22:44 UTC

Oops - my error - apologies, 38XX will work .....

Keep your eye on development though, as the writing is on the wall for 38XX at MW in the not too distant future. See Matt's post re the future development trend.

Regards
Zy

Chris S
Joined: 20 Sep 08
Posts: 1391
Credit: 203,563,566
RAC: 0
Message 48038 - Posted: 19 Apr 2011, 11:54:09 UTC

Keep your eye on development though, as the writing is on the wall for 38XX at MW in the not too distant future.


I have found that it is the number of shaders that gives ATI cards their power.

3850 cards have 320 shaders
4770 cards have 640 shaders

4770 cards are available on eBay for very little more than 3850 ones.

David Bohn
Joined: 16 Aug 10
Posts: 15
Credit: 32,160,978
RAC: 0
Message 48042 - Posted: 19 Apr 2011, 13:57:36 UTC - in response to Message 48021.  

Thank you, the de_separation WUs are running again without erroring out.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48070 - Posted: 20 Apr 2011, 14:36:53 UTC - in response to Message 47993.  


Bif74 [Lombardia]
Joined: 25 May 09
Posts: 6
Credit: 23,350,701
RAC: 1,332
Message 48078 - Posted: 20 Apr 2011, 19:16:20 UTC - in response to Message 48035.  

Hi.
Apologies for this silly question: how?

I have a 3850 AGP (with ATI hotfix 11.3, BOINC Manager 6.12.22 and XP SP2), but WUs with MW application 0.62 always finish with a Compute Error.

That GPU crunched a lot of WUs (for me... :-)) with the optimized application by Gipsel, up to version 0.23.

Can you help me?

Thanks a lot, regards.

Marco.