Welcome to MilkyWay@home

MW is killing my machine!

Message boards : Number crunching : MW is killing my machine!

w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65952 - Posted: 24 Nov 2016, 3:31:07 UTC

I have a WinXP32 machine with an AMD64 X2 CPU and an NVIDIA GTX 750 Ti. When I run MilkyWay GPU tasks, the machine gets sluggish over a day or two and finally gets to the point that there is no response to keyboard or mouse input, requiring a manual reboot (the equivalent of the old "three-finger salute"). I also run Einstein, SETI, Asteroids and GPUGRID GPU tasks. When those run without any MW tasks, the machine just keeps on running and running and running. . .

I am running two GPU tasks at a time for SETI and Einstein, but only one for MW and the other two. I was running two MW tasks at a time for a while (with the same result -- machine unresponsive after 2-3 days) until I noted that two tasks were using 10X more CPU time than a single task!

Now I am considering not running any MW tasks on that machine at all. (I have a couple of Win10 64bit machines running MW gpu tasks with no problems.)

I remember seeing some mention of a "memory leak" in the 32bit app -- perhaps that is what is causing the problem.
ID: 65952 · Rating: 0
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65954 - Posted: 24 Nov 2016, 9:51:34 UTC

w1hue -

For starters, when your machine gets sluggish, I would open the Task Manager (Ctrl-Shift-ESC) and look at all of the processes for abnormally high CPU usage or memory usage.

Also, on the Performance tab (I think that is what it is labeled), where it shows CPU usage, network usage, and disk usage, look for anything strange, especially "red" in the CPU bar graph on the left (it should be mostly green).

When your machine gets sluggish, if you suspend all Milkyway activity, does the sluggishness go away?
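
One way to act on this without watching Task Manager by hand is to log per-process memory at intervals and compare snapshots. The following is only a minimal sketch: it assumes Python is installed and that the `tasklist` command is available on the machine (it may be missing on some XP editions). The function names `parse_tasklist_csv`, `top_processes`, and `snapshot` are made up for illustration.

```python
import csv
import io
import subprocess


def parse_tasklist_csv(text):
    """Return {image name: total memory in KB} from `tasklist /FO CSV /NH` output."""
    usage = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        name, mem = row[0], row[4]
        # The Mem Usage column looks like "238,000 K"; keep only the digits.
        digits = "".join(ch for ch in mem if ch.isdigit())
        if digits:
            # Several processes can share one image name, so add them up.
            usage[name] = usage.get(name, 0) + int(digits)
    return usage


def top_processes(usage, count=5):
    """Return the `count` largest memory users as (name, KB) pairs."""
    return sorted(usage.items(), key=lambda item: item[1], reverse=True)[:count]


def snapshot():
    """Run tasklist (Windows only) and return the parsed memory usage."""
    out = subprocess.check_output(["tasklist", "/FO", "CSV", "/NH"])
    return parse_tasklist_csv(out.decode("ascii", errors="replace"))
```

Calling `top_processes(snapshot())` every few minutes and writing the result to a file would show whether any one process keeps climbing before the machine locks up. It would not show memory held by the GPU or by a driver.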
ID: 65954 · Rating: 0
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,778
RAC: 22,440
Message 65955 - Posted: 24 Nov 2016, 12:38:54 UTC - in response to Message 65952.  

I have a WinXP32 machine with an AMD64 X2 CPU and an NVIDIA GTX 750 Ti. When I run MilkyWay GPU tasks, the machine gets sluggish over a day or two and finally gets to the point that there is no response to keyboard or mouse input, requiring a manual reboot (the equivalent of the old "three-finger salute"). I also run Einstein, SETI, Asteroids and GPUGRID GPU tasks. When those run without any MW tasks, the machine just keeps on running and running and running. . .

I am running two GPU tasks at a time for SETI and Einstein, but only one for MW and the other two. I was running two MW tasks at a time for a while (with the same result -- machine unresponsive after 2-3 days) until I noted that two tasks were using 10X more CPU time than a single task!

Now I am considering not running any MW tasks on that machine at all. (I have a couple of Win10 64bit machines running MW gpu tasks with no problems.)

I remember seeing some mention of a "memory leak" in the 32bit app -- perhaps that is what is causing the problem.


Part of the problem is that each of your valid units, on both machines, says this:
"<number_WUs> 5 </number_WUs>"

It looks like each of your GPUs is trying to run 5 workunits at the same time. With the XP machine being only 32-bit, and memory disadvantaged on top of that, that could be your problem. You might try setting up a different 'venue' (i.e. home, work, or school) for each PC and lowering the number of workunits the XP machine runs at one time until the machine stops locking up.
ID: 65955 · Rating: 0
Ulrich Metzner

Joined: 11 Apr 15
Posts: 58
Credit: 63,291,127
RAC: 0
Message 65957 - Posted: 24 Nov 2016, 14:32:55 UTC - in response to Message 65955.  

(...)
Part of the problem is that each of your valid units, on both machines, says this:
"<number_WUs> 5 </number_WUs>"
(...)

Ähem - no!
That marks just the new type of WU bundling 5 traditional WUs in one download. The 5 WUs will be separately crunched, one after the other, just like before...
Aloha, Uli

ID: 65957 · Rating: 0
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65958 - Posted: 24 Nov 2016, 17:21:30 UTC

But I think Mikey is on the right track. Try reducing the workload (CPU and/or GPU) one workunit at a time and see what happens.

What I tried to write previously was to use the Task Manager to see if the computer is "overloaded". GPU-Z is a good tool to check your video card load.
ID: 65958 · Rating: 0
w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65959 - Posted: 24 Nov 2016, 20:07:39 UTC - in response to Message 65958.  
Last modified: 24 Nov 2016, 20:43:17 UTC

But I think Mikey is on the right track. Try reducing the workload (CPU and/or GPU) one workunit at a time and see what happens.

As stated in my original post, the XP machine is running ONE MW GPU work unit at a time. GPU-Z shows GPU load is 90-97% when running one MW WU. Average is ~95%. The GPU load reaches 99-100% for some of the other projects' WUs with no apparent ill effects.

The GPU temp runs 50-55C (depending on room temp) when running MW WUs. It gets up to 65C or so when running GPUGRID WUs. CPU temp typically runs 50-55C pretty independent of what is running in the GPU.

So I do not believe that anything is being stressed by MW WUs.

I have a simple utility that monitors memory usage, but I doubt it would show memory made unusable by a leak. According to the monitor, memory usage rarely gets above 75% or so and is usually 35-50%. Task Manager currently shows 1.76GB PF usage and 2.88GB physical memory, with 1.34GB in the system cache and 1.56GB available; the monitor utility is showing 46% usage. Firefox is currently using the most memory at 238kB.
ID: 65959 · Rating: 0
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65960 - Posted: 24 Nov 2016, 20:39:52 UTC

w1hue -

If you are only running one MW GPU task and no other tasks (CPU), then maybe MW just doesn't work on that machine.

I have a mixture of Linux and Windows (XP and 10) machines with various NVIDIA cards. Some machines won't run MW and some won't run Einstein (system lockups). I have never been able to figure out why.
ID: 65960 · Rating: 0
w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65961 - Posted: 24 Nov 2016, 22:44:20 UTC
Last modified: 24 Nov 2016, 22:53:31 UTC

OK, I may be a little slow (not to mention old...), but I believe that I finally realize what is going on: The memory leak (if that is, indeed, the problem) is in the GPU code, NOT the CPU part! I doubt that GPU-Z would recognize that.

When the machine "hangs", it appears that something is still running on the CPU (and yes, I am running CPU-only BOINC tasks) judging from the disk activity LED -- but the display does NOT update. So, I think that the GPU finally runs short of memory and hangs up! Reboot needed...

Since others haven't reported the problem (though there was some comment a while back about a memory leak in the 32bit app), it is probably specific to the 32bit NVIDIA app. Apparently not many of you "hot shot nerds" are running MW on low to mid range NVIDIA GPUs, much less on old XP machines! :-)

I'm gonna go eat my turkey now. . . and then do a "pre-emptive" reboot!

Edit: Up until a couple of months ago, I did not have the "hang" problem when running MW tasks on the XP machine. Not sure exactly when it started. . .
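
The GPU-memory theory in this post could in principle be checked by sampling the card's memory use over a day or two. The sketch below is a rough heuristic only. It assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH and that it reports `memory.used` for this card and driver under XP, which is not confirmed anywhere in this thread. The helper names are hypothetical.

```python
import subprocess

# Ask nvidia-smi for used and total memory in MiB, one line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(text):
    """Parse 'used, total' lines into a list of (used, total) integer pairs."""
    readings = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        # Lines with non-numeric fields (unsupported queries) are skipped.
        if len(parts) == 2 and all(p.isdigit() for p in parts):
            readings.append((int(parts[0]), int(parts[1])))
    return readings


def read_gpu_memory():
    """Run nvidia-smi once and return the parsed readings."""
    out = subprocess.check_output(QUERY)
    return parse_gpu_memory(out.decode("ascii", errors="replace"))


def keeps_growing(samples, min_samples=4):
    """True if used memory never drops across the samples and ends higher."""
    if len(samples) < min_samples:
        return False
    rising = all(b >= a for a, b in zip(samples, samples[1:]))
    return rising and samples[-1] > samples[0]
```

Samples taken at comparable moments (for example, just after each MW task finishes) that never come back down would be consistent with a leak on the GPU side. A flat or sawtooth pattern would point elsewhere.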
ID: 65961 · Rating: 0
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65962 - Posted: 25 Nov 2016, 1:44:00 UTC

w1hue -

Just thought of something that probably doesn't apply. About 4-6 weeks ago I had two XP machines that started hanging up for no apparent new reason. I finally figured out it was AVAST anti-virus software.
ID: 65962 · Rating: 0
w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65963 - Posted: 25 Nov 2016, 3:34:23 UTC - in response to Message 65962.  

Just thought of something that probably doesn't apply. About 4-6 weeks ago I had two XP machines that started hanging up for no apparent new reason. I finally figured out it was AVAST anti-virus software.

Possibly -- but the machine ran OK 7 days without any MW work and froze twice over a 5 day period running MW. But I am using AVAST. . . so I'll keep that in mind.

However, I don't think that it could be due to AVAST scanning the project files, since I put C:\Documents and Settings\All Users\Application Data\BOINC\projects\* on the exclusions list after AVAST falsely tagged a BOINC project file as infected some time ago.
ID: 65963 · Rating: 0
w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65964 - Posted: 25 Nov 2016, 3:48:23 UTC - in response to Message 65954.  
Last modified: 25 Nov 2016, 3:59:20 UTC

When your machine gets sluggish, if you suspend all Milkyway activity, does the sluggishness go away?

No -- once it starts getting sluggish, suspending MW makes no difference. If the problem is due to a memory leak, the damage would already have been done and the only way to recover is to reboot.

I'm leaning more and more in favor of a memory leak in the MW GPU code as being the culprit (see third post above). Once things come to a screeching halt (display-wise, at least), the CPU appears to be doing something (as indicated by disk activity) but the display does not update -- except to show the mouse cursor moving. Anything else showing on the display (a clock, for example) at the time does not update.

In case anyone is not clear about what I mean by "memory leak", that refers to memory that is allocated by a program for its own use and not released when it exits -- making the unreleased memory unavailable for further use. See https://en.wikipedia.org/wiki/Memory_leak for a discussion of the problem.

(Yeah, I know all you techno-computer jocks know what that means, but not everyone runs in the fast lane. . .)
ID: 65964 · Rating: 0
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65965 - Posted: 25 Nov 2016, 14:25:14 UTC

w1hue -

I would think if there was a memory leak in the MW code other users would also notice the issue. But, who knows.
ID: 65965 · Rating: 0
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65966 - Posted: 25 Nov 2016, 19:37:53 UTC

w1hue -

Here is another idea if you want to experiment. Maybe the driver you are using is "too new". Sometimes older drivers work better on older machines.

You are using 368.81 on your XP machine. If you want to experiment, try 350.12 or 352.68 (the two I am using on my XP machines).
ID: 65966 · Rating: 0
w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65967 - Posted: 25 Nov 2016, 20:31:54 UTC - in response to Message 65965.  

I would think if there was a memory leak in the MW code other users would also notice the issue. But, who knows.

A search of the forums for the past year turned up one report, associated with a Linux system: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3919&postid=64456#64456 and that was a CPU memory leak. But nothing for XP.

I'll try an older driver and see if that makes any difference. But like you said: strange no one else has reported a similar problem with XP -- and I am sure that I am not the only one using the 368.81 driver (which, I believe, is the latest one for XP). And I don't appear to be having any problems running other GPU BOINC apps.

Ahhhh... the joys of modern technology!!
ID: 65967 · Rating: 0
w1hue

Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 65982 - Posted: 2 Dec 2016, 8:06:43 UTC - in response to Message 65966.  

Sometimes older drivers work better on older machines

I reverted back to a driver that I had no problems with for several months -- didn't help.

I still believe that the problem is with the 1.43 32bit app since it started happening fairly recently and only when running MW GPU WUs.
ID: 65982 · Rating: 0
Cliff

Joined: 28 Nov 14
Posts: 51
Credit: 86,696,721
RAC: 0
Message 65990 - Posted: 8 Dec 2016, 4:06:35 UTC - in response to Message 65966.  

Hi wb8ili.


Here is another idea if you want to experiment. Maybe the driver you are using is "too new". Sometimes older drivers work better on older machines.

You are using 368.81 on your XP machine. If you want to experiment, try 350.12 or 352.68 (the two I am using on my XP machines).


Good thing he's not running Win10. Had one of my rigs mugged by Microsoft yesterday: the damn Windows upgrade also upgraded my GPU driver. I thought I'd disabled that some time ago, but apparently another MS upgrade removed that restriction.

I only saw by chance that there was suddenly a shedload of 'computational errors' from all GPU tasks across 3 projects :-( Had to shut down and re-install the driver 3 times, once per GPU.

Anyway, driver update is now disabled on that rig and I'll be re-checking to make sure it stays disabled :-)
Regards,
Cliff.
--
Been there Done That, still no Damn T-Shirt
ID: 65990 · Rating: 0


©2024 Astroinformatics Group