Welcome to MilkyWay@home

Lots of crunching errors since today

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 55881 - Posted: 20 Oct 2012, 15:17:12 UTC - in response to Message 55874.  

right, i understand that much...i guess i should have more specifically asked "how do you know which v1.4.xxxx corresponds to which v11.x or v12.x"?


Use the cheat sheet.
http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 55883 - Posted: 20 Oct 2012, 15:25:40 UTC - in response to Message 55881.  
Last modified: 20 Oct 2012, 15:28:34 UTC

right, i understand that much...i guess i should have more specifically asked "how do you know which v1.4.xxxx corresponds to which v11.x or v12.x"?


Use the cheat sheet.
http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm

do i hear angels singing? seriously, you have no idea how long i've been looking for something like this! even when i open the hardware info page in the Catalyst Control Center, it doesn't specify the "v1.4.xxxx" driver versions that we see on our personal DC project web pages. thank you so much!
Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 55885 - Posted: 20 Oct 2012, 16:08:47 UTC - in response to Message 55862.  
Last modified: 20 Oct 2012, 16:32:23 UTC

Link - it seems as though your BOINC app is an old version.

You mean the BOINC client or the Milkyway application?

As to the Milkyway application, I have to use that since my GPU is not OpenCL capable.

The same applies to the Catalyst drivers: AFAIK AMD removed CAL support from the most recent versions, so I have to stay with an older one, though I'm not sure which was the last one with CAL support.

As I have written in the other thread in the news section, if you look at my error tasks, you'll see that the tasks that error out for me also error out for everybody else (except for one I've seen so far), even on CPUs. I'm not talking here about some invalid tasks I had, which might indeed have something to do with my old hardware/software (those with missing output in std_err every now and then are nothing unusual with the CAL app). But unless someone updates the CAL app and AMD reintroduces CAL support in their drivers, there's not much I can do, since I'm not really planning any hardware upgrades anytime soon.

OTOH, I had just one error today (so far) and yesterday just 3, I think I can live with that...


EDIT: now that we have the "cheat sheet", here is a wingman from my most recent error: 454149. Most recent drivers, most recent stable BOINC version, Milkyway OpenCL app v1.02, i.e. everything up to date. It has about the same error rate as me (or everybody else).

The same applies to 446199, 430325, 60480, 442268, 400218. And those are just the up-to-date hosts with ATI cards that I found in my 9 error tasks; I didn't check nVidia and CPU-only hosts, otherwise the list would probably be longer.
Miklos M
Joined: 29 Dec 11
Posts: 26
Credit: 1,456,736,094
RAC: 0
Message 55892 - Posted: 20 Oct 2012, 21:52:21 UTC - in response to Message 55879.  

Thanks for the help. I noticed with my errors, all error out too on the same units.
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 55893 - Posted: 20 Oct 2012, 22:00:53 UTC
Last modified: 20 Oct 2012, 22:26:49 UTC

while i haven't checked all 116 of my errors lol, i did check the first 20, and all the wingmen on every single one of them have errors too.

*EDIT* - also, i'm having trouble calculating my current error rate b/c i'm at that balance point where my number of errors remains constant b/c older errors are being flushed from the server just as fast as new errors show up (it hovers around 115 tasks). i need to know how long errors stay on the server before they're flushed.

thanks,
Eric
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,352
RAC: 22,547
Message 55898 - Posted: 21 Oct 2012, 11:25:18 UTC - in response to Message 55883.  

right, i understand that much...i guess i should have more specifically asked "how do you know which v1.4.xxxx corresponds to which v11.x or v12.x"?


Use the cheat sheet.
http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm

do i hear angels singing? seriously, you have no idea how long i've been looking for something like this! even when i open the hardware info page in the Catalyst Control Center, it doesn't specify the "v1.4.xxxx" driver versions that we see on our personal DC project web pages. thank you so much!


I have bookmarked the page too!!!
THANK YOU VERY MUCH!!!
Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 55904 - Posted: 21 Oct 2012, 21:53:09 UTC - in response to Message 55885.  

EDIT: now that we have the "cheat sheet", here is a wingman from my most recent error: 454149. Most recent drivers, most recent stable BOINC version, Milkyway OpenCL app v1.02, i.e. everything up to date. It has about the same error rate as me (or everybody else).

The same applies to 446199, 430325, 60480, 442268, 400218. And those are just the up-to-date hosts with ATI cards that I found in my 9 error tasks; I didn't check nVidia and CPU-only hosts, otherwise the list would probably be longer.

Here are some more up-to-date hosts with ATI cards: 273487, 468295, 445489, 469338, 448818, 345828, 367472, 113627, 426569.

Here is one up-to-date host with an nVidia card: 453936.

CPU-only hosts with the current BOINC version and the current Milkyway application: 449984, 460801, 302236, 326792, 264807, 473076, 468253, 443035.

None of those hosts could successfully complete the WUs on which I got computation errors; they got errors as well, and their valid/error ratio is not much different from mine or anybody else's who posted here.
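
The reasoning in this post (a task that errors on every host that tried it points at the workunit, not at one host's setup) can be sketched in Python. This is only an illustration: the function name, the host labels and the "valid"/"error" strings are hypothetical, not the project's schema or any BOINC API, and the outcomes would have to be copied by hand from the workunit pages.

```python
from typing import Dict


def classify_workunit(outcomes: Dict[str, str], my_host: str) -> str:
    """Classify one workunit from the per-host outcomes of its tasks.

    outcomes maps a host label to "valid" or "error" (hypothetical labels).
    """
    if outcomes.get(my_host) != "error":
        return "no error on my host"
    # Outcomes reported by the other hosts (wingmen) on the same workunit.
    others = [result for host, result in outcomes.items() if host != my_host]
    if not others:
        return "unknown (no wingmen yet)"
    if all(result == "error" for result in others):
        return "bad workunit (fails for everyone)"
    return "possibly host-specific (at least one wingman succeeded)"


# Made-up example: my host plus two wingmen, all of which errored.
print(classify_workunit({"me": "error", "host_a": "error", "host_b": "error"}, "me"))
```

A single wingman success is enough to move a workunit out of the "bad workunit" bucket, which matches the distinction drawn in this thread between tasks that fail for everyone and tasks that fail on one host only.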
TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 55931 - Posted: 23 Oct 2012, 23:20:48 UTC

Last week I updated the ATI drivers as well as BOINC to the latest version for my cards (5800 series). Errors at MilkyWay arose then, but that was a coincidence (I hope).
Travis mentioned they would take care of the error batch, but errors still occur.
So I guess updating to the latest is not necessary.
Greetings from,
TJ
Miklos M
Joined: 29 Dec 11
Posts: 26
Credit: 1,456,736,094
RAC: 0
Message 55939 - Posted: 24 Oct 2012, 13:44:27 UTC - in response to Message 55931.  

My error numbers are steady, about 5 a day. I noticed that all my wingmen have the same errors. Hoping that they soon run out of the error-filled batches. It ruins about 5% of my computer time.
dskagcommunity
Joined: 26 Feb 11
Posts: 170
Credit: 205,557,553
RAC: 0
Message 55940 - Posted: 24 Oct 2012, 15:03:03 UTC

Yes, let's hope they are through soon. I have around 7-20 error units per day at the moment.
DSKAG Austria Research Team: http://www.research.dskag.at



Dataman
Joined: 5 Sep 08
Posts: 28
Credit: 245,585,043
RAC: 0
Message 55942 - Posted: 24 Oct 2012, 18:55:52 UTC

Is anyone from the project actively working on this problem or are we going to waste crunchers' resources until it magically goes away? I moved to Einstein but would rather be back here. Oh well ...

TJ
Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 55944 - Posted: 24 Oct 2012, 21:30:04 UTC - in response to Message 55942.  

Is anyone from the project actively working on this problem or are we going to waste crunchers' resources until it magically goes away? I moved to Einstein but would rather be back here. Oh well ...


Einstein always steady, a genius after all ;-)
Greetings from,
TJ
Ray_GTI-R
Joined: 5 Nov 10
Posts: 69
Credit: 15,064,831
RAC: 0
Message 55947 - Posted: 25 Oct 2012, 0:45:15 UTC

1 fail in 13 (so 12 OK) in the past hour using a good old HD3850 AGP card.
IMHO it's not the driver (unless you've updated, in which case ... good luck ...). It's faulty WUs, which seem to appear irrespective of hardware???
JHMarshall
Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 55949 - Posted: 25 Oct 2012, 5:20:24 UTC - in response to Message 55947.  

Ray,

Absolutely NOT a driver problem. I have several instances where a WU failed on 4 completely different configurations (mine and wingmen) (Intel CPU, AMD CPU, Nvidia Card, and AMD card). These are definitely bad workunits.

Joe
Miklos M
Joined: 29 Dec 11
Posts: 26
Credit: 1,456,736,094
RAC: 0
Message 55959 - Posted: 26 Oct 2012, 0:05:33 UTC

26 errors today and 1 invalid unit. One more day like this one and I will go back to doing Einstein, until they clean this mess up. My computer can do more good there.
^..^~~
Joined: 22 Oct 11
Posts: 23
Credit: 71,023,220
RAC: 0
Message 55963 - Posted: 26 Oct 2012, 3:10:48 UTC

One of my computers that I cannot ever remember doing a "computational error" is now doing so...

And to those of you who thought to change drivers, I did the same thing; it was to no avail.

Is somebody going to answer the "magically disappear" question that was asked here? I'm also anxious to hear what the eta is on normal computer work.

^..^~~ nyteshade!
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,352
RAC: 22,547
Message 55965 - Posted: 26 Oct 2012, 10:25:26 UTC - in response to Message 55963.  

One of my computers that I cannot ever remember doing a "computational error" is now doing so...

And to those of you who thought to change drivers, I did the same thing; it was to no avail.

Is somebody going to answer the "magically disappear" question that was asked here? I'm also anxious to hear what the eta is on normal computer work.

^..^~~ nyteshade!


Check out the NEWS thread. Matthew, a forum admin and project scientist, IS replying in there.
Blurf
Volunteer moderator
Project administrator
Joined: 13 Mar 08
Posts: 804
Credit: 26,380,161
RAC: 0
Message 55972 - Posted: 26 Oct 2012, 20:18:19 UTC

Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 55976 - Posted: 27 Oct 2012, 8:27:50 UTC

A new milestone for me (one that I didn't necessarily want to reach): 20 bad WUs* on a single day (26 Oct).

That's out of 206 crunched that day, so almost 10%.

*) Those that fail for everyone. I had 5 more that failed just for me; the CAL app seems to get wrong results on some of the new WUs, but that's another matter.
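
As a quick check of the arithmetic in this post, here is a minimal Python sketch. The three figures come from the post; the function name is made up for illustration, and the second percentage assumes the 5 host-specific failures are counted within the 206 tasks crunched that day, which the post does not state explicitly.

```python
def error_rate(errors: int, total: int) -> float:
    """Return errors as a percentage of all tasks crunched."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


bad_for_everyone = 20  # WUs that failed for every host (from the post)
failed_only_here = 5   # WUs that failed on this host only (from the post)
crunched = 206         # tasks crunched on 26 Oct (from the post)

# Share of the day's work lost to bad workunits alone.
print(f"{error_rate(bad_for_everyone, crunched):.1f}%")
# Share lost overall, assuming the 5 extra failures are part of the 206.
print(f"{error_rate(bad_for_everyone + failed_only_here, crunched):.1f}%")
```

The first figure comes out at about 9.7%, in line with the "almost 10%" above; with the host-specific failures added it is about 12.1%.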
^..^~~
Joined: 22 Oct 11
Posts: 23
Credit: 71,023,220
RAC: 0
Message 55986 - Posted: 28 Oct 2012, 11:22:33 UTC - in response to Message 55976.  

I brought a new (used) computer online yesterday thinking to run just the CPU, work units that haven't been so touchy. With two cores and taking almost 5 hours to run... what does it do? It "computational errors out" on the second piece of work it manages to run... only after running the entire 5 hours.

What a waste of my time, my energy, everything.

^..^~~

nyteshade!
©2024 Astroinformatics Group