Welcome to MilkyWay@home

Posts by Traills

1) Message boards : Number crunching : GPU Requirements [OLD] (Message 49130)
Posted 31 May 2011 by Traills
Post:
Post mortem on the NVidia GT 420 ... the second WU behaved the same as the first, running "forever" with 0% progress ... I then found / downloaded / installed a more recent driver, v 270.61, after which all the MW cuda WUs failed with a computational error within seconds. I am now guessing the same thing happened with the old driver, but that driver did not send the expected signal and BAM! thought the WUs were still running. I have with regret disabled GPU usage for MW. Will stay active with CPU jobs.

Meanwhile, the GPU does run SETI cuda fermi WUs correctly so the card isn't entirely pathological...
2) Message boards : Number crunching : GPU Requirements [OLD] (Message 49114)
Posted 30 May 2011 by Traills
Post:
Thanks, Sunny & Robert, for your comments. I figured anything OEM was going to be relatively slow. Now, though, it is up to 10 hrs elapsed and still showing 0% Progress, so I have to think something is amiss besides the GT 420 using an old GPU design. I found something in an earlier thread about some conditions under Win7 that cause the GPU clock rate to slow way down when a WU is restarted after a power-saver interruption, and then not recover (message pasted below). I suspect that may be what is going on here. Being new to Win7, I didn't expect it to shut down running processes just because the keyboard was inactive overnight, but of course it did, so the MW cuda WU was put to sleep not long after I started it last night. Today I discovered that problem and reset Win7 not to shut anything off, period, but if that is what happened to this WU then the "damage" has been done. As a test I have suspended the WU and started another to see if it behaves closer to expectations (~2 hrs). If it ends up with the same problem I will probably have to disable MW use of the GPU. Too bad, but I didn't spec the machine mainly for Boinc; the GPU would be nice if it will behave, but if it doesn't I'll be content with all the CPUs.

(Message 48986)
Posted 4 days ago by Tex1954
--------------------------------------------------------------------------------
There is an ongoing problem with CUDA tasks and the Clock Rates being dropped in Nvidia cards. I've written it up, and the problem is with both Vista and Win7.

What happens is the clock rate gets dropped to conserve power/heat etc. and never returns to high speed again. This always happens with DUAL Nvidia cards installed, and it seems only magic prevents it from happening on its own most of the time. It doesn't matter what power settings are set; performance mode seems to help, but does not totally correct it. Snoozing or Suspending tasks is a 95% guarantee the clocks will drop and never regain full speed again.
3) Message boards : Number crunching : GPU Requirements [OLD] (Message 49107)
Posted 29 May 2011 by Traills
Post:
I set up a new machine this week and started MW on it. The WUs that have run on the CPUs have been fine, but the one that has started on the GPU looks to be in trouble. Before I abort it, I want to ask whether there is something that can be done to save the other 11 cuda WUs that are queued.

Host machine is an i7 with 8 CPUs running Windows 7 SP1. Graphics card is an OEM nVidia GeForce GT 420 with driver 267.24, cuda driver is 3.2.1. The WU having problems is 29477371, and on my machine the MW app is 0.52 cuda_opencl.

Trouble symptom: The WU has just passed 3 hrs run time and shows 0% Progress in the BOINC Manager. (Is the progress meter supposed to update when a GPU job is running?) It and 3 others of the queued cuda WUs downloaded with an expected time of 2:00:49. On the other hand, someone else's machine running this WU with XP and an ATI card finished in 238 sec (11 CPU sec).

Thank you
4) Message boards : News : Issues with the Milkyway@home Support Server (Message 46277)
Posted 18 Feb 2011 by Traills
Post:
I don't know if this event is one of those "crazy things" ... the symptoms look to me more like a problem with the local MW executable, but ... yesterday 2/16 my computer did an n-body task, cleaned it from my computer, and 21 hours later the server still acts as though it does not know about it. My local BAM "Messages" tab records the task completion:

2/16/2011 9:16:36 PM Milkyway@home Computation for task de_nbody_model6_4_24159_1297838514_0 finished

but it does not show any communication with the server - no upload message for the results, no completed task report message, also no log of any problem communicating with the server. The Tasks list on my account at the MW website still shows this task as In Progress:

313634391 | 237231713 | 16 Feb 2011 7:02:51 UTC | 24 Feb 2011 7:02:51 UTC | In progress | --- | --- | --- | --- | MilkyWay@Home N-Body Simulation v0.21 (sse2)

I don't compute for MW often: last fall I was engaging MW regularly but backed off after the version update around November-December slowed all my tasks down to ~half the previous speed. I'd rather put my FLOPS where they're more productive, since my machine is just a pre-CUDA dual cpu. Every so often I try a task to see if things have improved ... things haven't, and stuff like this vanishing task isn't much encouragement either.
5) Message boards : News : a fix for the output file issue (Message 42953)
Posted 19 Oct 2010 by Traills
Post:
Thank you, Travis. Sorry about the flu, that's never fun. I will cancel out of the affected WUs, which I had suspended, and try a new batch of MW after my slug of Cosmology finishes in about a week.
6) Message boards : News : updated the CPU applications (Message 42775)
Posted 12 Oct 2010 by Traills
Post:
I think it has a bug. The first task my CPU ran after the update ran to completion, and then the upload failed with the message "Output file ... absent". The task was # 216429150, work unit 163715939, de_11_3s_5_1447892_1286681319_0. It was also predicted to take about 16:14:-- of CPU time, but it took close to 23 hrs. (This is the first time that one of my Milky Way tasks has been more than a little bit over the prediction, and the first error case.) That caused me to monitor the second work unit after it started. It has been running for 16 hrs now (16:14:-- predicted initially) and is only 68.8% done. There is no output file accumulating that I can find, and MW disk usage has not increased during this task. I suspect it is headed for the same fate as the first, so I am suspending it and will not allow it or the other 3 in the set of 5 I received to start until the problem is resolved. I received no credit for the failed job.
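
For what it's worth, the numbers quoted for the second work unit (16 hrs elapsed, 68.8% done) can be extrapolated to a rough finish time. The sketch below is only an illustration: it assumes progress grows linearly with elapsed time, which may not hold for these tasks, and the function name `projected_total_hours` is made up for this example rather than anything from BOINC.

```python
def projected_total_hours(elapsed_hours, fraction_done):
    """Estimate total run time, assuming progress is linear in elapsed time."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures taken from the post: 16 hrs elapsed, 68.8% complete.
elapsed = 16.0
progress = 0.688

total = projected_total_hours(elapsed, progress)
remaining = total - elapsed

print(f"projected total: {total:.1f} h, remaining: {remaining:.1f} h")
# prints "projected total: 23.3 h, remaining: 7.3 h"
```

Under that linear assumption the second task projects to about 23 hrs, in line with the "close to 23 hrs" the first task took against the same 16:14 prediction.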



