Welcome to MilkyWay@home

GPU units hanging


Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17638 - Posted: 5 Apr 2009, 15:50:10 UTC

OK, where to start.

I'm running Vista 64-bit, BOINC version 6.4.7, Catalyst 9.2. Trying to get the GPU version 19e running. I have edited the system32 folder as the readme file says. I haven't altered the app_info.xml file, just inserted it. The machine runs fine for a few hours, but when I go back the units are just hanging. Other projects are running fine but MilkyWay seems to have just frozen.

Any clues as to what I have done wrong, please? Thanks for reading.

I have tried going to Catalyst 8.12, as I read that sometimes this is the only one that works on some boxes.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 17640 - Posted: 5 Apr 2009, 16:08:54 UTC - in response to Message 17638.  

I remember others had a problem with hanging a while back. Not sure of the solution. It should be in one of the topics about GPUs in the App. section of the boards.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

[XTBA>XTC] ZeuZ
Joined: 27 Dec 07
Posts: 14
Credit: 5,089,974
RAC: 0
Message 17641 - Posted: 5 Apr 2009, 16:14:21 UTC

Hi

Try suspending and resuming MilkyWay when this happens.



bobgoblin
Joined: 8 Dec 07
Posts: 60
Credit: 67,028,931
RAC: 0
Message 17642 - Posted: 5 Apr 2009, 16:14:26 UTC - in response to Message 17638.  

> OK, where to start.
>
> I'm running Vista 64-bit, BOINC version 6.4.7, Catalyst 9.2. Trying to get the GPU version 19e running. I have edited the system32 folder as the readme file says. I haven't altered the app_info.xml file, just inserted it. The machine runs fine for a few hours, but when I go back the units are just hanging. Other projects are running fine but MilkyWay seems to have just frozen.
>
> Any clues as to what I have done wrong, please? Thanks for reading.
>
> I have tried going to Catalyst 8.12, as I read that sometimes this is the only one that works on some boxes.



I'm running 6.4.7 & 9.2 on an i7. I set <cmdline>n8</cmdline> in the app_info.xml to limit the number of units that were actually processing. I found that if too many were running concurrently, they would lock up with only a 512 MB card. Then I would have to close BOINC and restart it to clear out the frozen WUs.

But I get so few WUs for the GPU these days that I'm mainly running ABC on this particular machine.
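
For anyone who would rather script the app_info.xml change described above than edit the file by hand, here is a minimal Python sketch that sets the <cmdline> text on every <app_version> entry. Several things in it are assumptions: the sample XML is a hypothetical, stripped-down stand-in (the real file that ships with the optimised app has more elements, and the app name and version number shown are placeholders), the helper name set_cmdline is made up for this example, and the option strings such as n8 or n3 are simply the ones quoted in this thread. The readme that comes with the app remains the reference for what each option does.

```python
import xml.etree.ElementTree as ET


def set_cmdline(app_info_xml: str, cmdline: str) -> str:
    """Return a copy of the app_info.xml text with <cmdline> set to `cmdline`.

    Hypothetical helper: every <app_version> element gets its <cmdline>
    child updated, and the child is created if it is missing.
    """
    root = ET.fromstring(app_info_xml)
    for app_version in root.iter("app_version"):
        node = app_version.find("cmdline")
        if node is None:
            node = ET.SubElement(app_version, "cmdline")
        node.text = cmdline
    return ET.tostring(root, encoding="unicode")


# Hypothetical, stripped-down sample; not the real file from the readme.
SAMPLE = """<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>19</version_num>
    <cmdline></cmdline>
  </app_version>
</app_info>"""

# "n3" is one of the option strings mentioned in this thread.
updated = set_cmdline(SAMPLE, "n3")
print(updated)
```

To use it on a real installation you would read the file's text, pass it through set_cmdline, and write the result back while BOINC is shut down; the client would typically need a restart to pick up the change. Keep a backup copy of the original file first.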

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17644 - Posted: 5 Apr 2009, 16:46:20 UTC

Thanks for the replies people. Just trying it with the 8.12 at the minute and will report back with the findings.

Brickhead
Joined: 20 Mar 08
Posts: 108
Credit: 2,562,515,681
RAC: 0
Message 17658 - Posted: 5 Apr 2009, 19:21:37 UTC - in response to Message 17644.  

As you've posted this to two threads, you might want to check both for replies ;)

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17659 - Posted: 5 Apr 2009, 20:11:01 UTC

Yeah, I think the other thread was the wrong place but there doesn't seem to be a delete button. I have made some alterations from the thread that Banditwolf kindly pointed me in the direction of: changed some resource priorities and added n3 and w 1.01 to my apps file. Seems OK at the minute. But then it always does, lol. {:o)

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17663 - Posted: 5 Apr 2009, 20:43:45 UTC

Still hanging. Tried to suspend and restart and nothing. Resource share is 10000 for milkyway with 1000 for Climate. I have no idea what to do, lol.

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17666 - Posted: 5 Apr 2009, 20:49:15 UTC

Upped the virtual memory from 2334 MB to 3069 MB, which is what the box that opened up recommended, and it started running again. Will keep an eye on it.

Wish me luck.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 17667 - Posted: 5 Apr 2009, 21:22:30 UTC

Good luck!

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17670 - Posted: 5 Apr 2009, 21:57:25 UTC
Last modified: 5 Apr 2009, 21:59:19 UTC

Still hanging. I didn't have to restart the computer to get it running again though, just the client.

verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 17673 - Posted: 5 Apr 2009, 22:15:23 UTC

>I'm running Vista...
Well there's your first problem... :)
I've had the 'can't even delete my own posts on this board' problem too.
Cheers,

PeterV

.

DoctorNow
Joined: 28 Aug 07
Posts: 146
Credit: 10,276,862
RAC: 0
Message 17716 - Posted: 6 Apr 2009, 9:35:05 UTC
Last modified: 6 Apr 2009, 9:36:04 UTC

The GPU app failing to resume could be a problem with the client version; some of the newer ones have a bad bug. Try the following:
Check whether the "leave apps in memory" option is ticked.
If it is, untick it; that should immediately resume the GPU app!

I use version 6.3.21 and have this problem every time I crunch GPUGrid with the memory option activated.
Member of BOINC@Heidelberg and ATA!

My BOINCstats

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17722 - Posted: 6 Apr 2009, 11:59:00 UTC

Thanks Doc, will check when I get home.

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17740 - Posted: 6 Apr 2009, 17:57:38 UTC
Last modified: 6 Apr 2009, 18:04:17 UTC

Nope, still not going. I started to get problems when I upgraded the client, actually. It seems the old client actually ran the units it said were running, whereas this one says it's running them all when actually it's running 4. Here are the preferences:

Shout if something needs changing, please.

Suspend work while computer is on battery power? (matters only for portable computers): no
Suspend work while computer is in use?: no
'In use' means mouse/keyboard activity in last: 1 minutes
Suspend work if no mouse/keyboard activity in last (needed to enter low-power mode on some computers; enforced by version 5.10.14+): --- minutes
Do work only between the hours of: (no restriction)
Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes'): no
Switch between applications every (recommended: 60 minutes): 60 minutes
On multiprocessors, use at most (enforced by version 5.10 and earlier): 4 processors
On multiprocessors, use at most (enforced by version 6.1+): 100 % of the processors
Use at most (can be used to reduce CPU heat; enforced by version 5.6+): 60 percent of CPU time

Disk and memory usage
Use at most: 10 GB disk space
Leave at least (values smaller than 0.001 are ignored): 1 GB disk space free
Use at most: 100% of total disk space
Write to disk at most every: 180 seconds
Use at most: 99% of page file (swap space)
Use at most (enforced by version 5.8+): 99% of memory when computer is in use
Use at most (enforced by version 5.8+): 99% of memory when computer is not in use

Network usage
Computer is connected to the Internet about every (leave blank or 0 if always connected; BOINC will try to maintain at least this much work): 0.5 days
Maintain enough work for an additional (enforced by version 5.10+): 1 days
Confirm before connecting to Internet? (matters only if you have a modem, ISDN or VPN connection): no
Disconnect when done? (matters only if you have a modem, ISDN or VPN connection): no
Maximum download rate: no limit
Maximum upload rate: no limit
Use network only between the hours of (enforced by version 4.46+): (no restriction)
Skip image file verification? (check this ONLY if your Internet provider modifies image files, as UMTS does, for example; skipping verification reduces the security of BOINC): no

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 17741 - Posted: 6 Apr 2009, 18:30:48 UTC

Can you take a screen shot of BOINC and post it?

[KWSN]John Galt 007
Joined: 12 Dec 08
Posts: 56
Credit: 269,889,439
RAC: 0
Message 17750 - Posted: 6 Apr 2009, 21:14:38 UTC
Last modified: 6 Apr 2009, 21:16:50 UTC


Is this what you are talking about? 4 running, 2 actually computing?

EDIT: I'm using BOINC 6.4.7, 8.12 drivers, and a 3850 ATI card.
Click to help Seti City.




verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 17768 - Posted: 6 Apr 2009, 22:58:19 UTC

The latest GPU optimised client, 0.19e, defaults to only running 3 WUs simultaneously. The readme explains this and how you can change it [and some other options] via commands in app_info.xml
Cheers,

PeterV

.

Neal Chantrill
Joined: 17 Jan 09
Posts: 98
Credit: 72,182,367
RAC: 0
Message 17956 - Posted: 8 Apr 2009, 17:21:06 UTC



This is from today. You can see that the top unit has been running for 13 hours and none of the others have crunched anything. I have set it with n4 and w1.1 in the app_info.xml file.

Any hints welcome. Thanks to everyone again for taking the time to try and help us out.

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 17971 - Posted: 8 Apr 2009, 18:44:40 UTC - in response to Message 17956.  

> [IMG ]snip[ /IMG]
>
> This is from today. You can see that the top unit has been running for 13 hours and none of the others have crunched anything. I have set it with n4 and w1.1 in the app_info.xml file.
>
> Any hints welcome. Thanks to everyone again for taking the time to try and help us out.

Try suspending everything but MW, shut down BOINC, restart it again, pound the update for MW a couple of times, resume other projects.

If this doesn't work, send the WUs to me and I'll see if they can help my RAC any ;)

