Welcome to MilkyWay@home

Infinite GPU Task Time (Sometimes)


Message boards : Number crunching : Infinite GPU Task Time (Sometimes)

Purple Rabbit
Joined: 9 Nov 08
Posts: 44
Credit: 109,426,841
RAC: 7,745
Message 46727 - Posted: 26 Mar 2011, 4:17:09 UTC
Last modified: 26 Mar 2011, 5:00:58 UTC

I have been pulling my hair out (I don't have any to spare) since 3 March, when Tomato (AMD64 3800+, 32-bit Win XP, HD 4770, Catalyst 11.1; it was Catalyst 10.8 when all this began) suddenly started producing errors. I didn't do nothing, honest.

Tomato is running SIMAP (CPU) and MW (GPU). I had errors on both projects. I detached and reinstalled both projects, reinstalled the ATI drivers, defragged the HD, stood on my left foot, twirled my right index finger, and yelled various phrases that I can't (shouldn't) reproduce here.

All of my efforts made things better, but MW still has errors every day or so. The miscreant task either runs for 6 hours and then locks up, or declares an error and moves on to the next MW task. Occasionally it decides to give up right at the beginning of the task. In every case the error is an attempt to access a memory location it shouldn't.

I haven't been able to localize it to a particular WU type. The GPU isn't overclocked and the temperatures are within spec.

My current thought is that Tomato has become possessed by powers beyond knowing. A more likely reason is that I'm doing something stupid. Perhaps I have a hardware problem that I haven't found yet?

Has anyone else seen this or do I have to battle the dark powers alone?
sandor
Joined: 7 May 10
Posts: 8
Credit: 39,603,036
RAC: 0
Message 46730 - Posted: 26 Mar 2011, 6:05:06 UTC
Last modified: 26 Mar 2011, 6:05:40 UTC

That's a pretty old system now. I would suspect a failing motherboard or perhaps the power supply. Many motherboards of that era had capacitor issues. To check for that, open your case and look at the capacitors (the little pop/soda-can-looking things). If any of their tops are bulging (they should be perfectly flat) or leaking ooze, it's only a matter of time before the system fails completely. That could be a long time off, but the errors will continue, both in your projects and likely in the OS and other applications as well.
Keith Myers
Joined: 24 Jan 11
Posts: 323
Credit: 186,747,590
RAC: 340,203
Message 46733 - Posted: 26 Mar 2011, 6:31:36 UTC - in response to Message 46730.  

I see an occasional work unit come through and end up in the error column. It's rare, maybe 1 out of 100 or so. I figure it is just probability, a poorly formed work unit, or the CPU or GPU taking a cosmic ray hit. Believe me, there really is a lot of cosmic flux raining down on us all the time. Every long-exposure picture I take with the CCD camera on my telescope proves it every night.

The comment about the caps is certainly valid. I just built a new system to replace my ten-year-old one, which died after I took it out to the garage for its quarterly dust removal; it never came back to life. A quick examination showed multiple bulged and failed caps, about ready to split open along the X-shaped strain relief creases on their tops. Since newer motherboards have moved on to solid caps, the current generation of computers will likely last for 20 years. I hope so, at least.

I think errors will creep in occasionally, but an error every day calls for an inspection of the hardware.
Purple Rabbit
Joined: 9 Nov 08
Posts: 44
Credit: 109,426,841
RAC: 7,745
Message 46755 - Posted: 27 Mar 2011, 12:40:11 UTC

Thanks, guys. I was guessing some kind of hardware problem, but the easy checks didn't find any. Why can't it just fail with a bang and a puff of smoke?! I could find that failure. There haven't been any task failures for 3 days, which is the best it's done since March 3. I think it's lonely in the basement and wants attention :)

©2020 Astroinformatics Group