Welcome to MilkyWay@home

Abnormal WU time

Message boards : Number crunching : Abnormal WU time

[XTBA>TSA] Biour
Joined: 14 Feb 09
Posts: 5
Credit: 157,994,413
RAC: 0
Message 37408 - Posted: 16 Mar 2010, 9:59:44 UTC

I'm facing a computation problem on my CrossFireX system (2x HD5970).

Often (when I am AFK) my computer becomes very slow.
CPU load increases dramatically (near 100% on each core) and the GPU becomes idle or sits at only 50% load.

WU time also increases dramatically, with a record of 10,000 s for some (normally 100 s, or 150 s for de_13).



It only happens on the new long WUs (de_13_3s_const, for example):
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=75789084
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=75789083
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=75788633

A screen capture of the problem; it happens at the beginning of the WU.
I can use pause/resume or stop/start to correct it, but I'm not always at my computer.


Normally I can expect around 30,000 credits per hour,
but because of this, on some days it is less than 10,000.



I have tried:
App 0.20b, 0.21, 0.22
BOINC 6.10.18/35/37

My config:
Win 7 Pro 64-bit
Q9450 @ 3.2 GHz
Quad CrossFire HD5970 @ stock
ATI 10.2


Is there some way to fix it?

David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 37409 - Posted: 16 Mar 2010, 10:10:44 UTC - in response to Message 37408.  

I, too, am having the same problems.

I have three machines, and they all do it.
One: BOINC 6.10.34, Cal 10.1
Two: BOINC 6.10.34, Cal 10.1
Three: BOINC 6.10.24, Cal 10.2

I have tried different versions of BOINC and Cal drivers to no avail.
The screen refreshes VERY slowly, so aborting the offending files takes several minutes each. After whichever file is slow, things return to normal.

Shutting BOINC down and then restarting it sometimes fixes it, sometimes not. Likewise, shutting down the whole computer and rebooting sometimes fixes it, sometimes not.

These days I just abort when I notice the Afterburner monitors sticking at 30% or flatlining.

A major nuisance and very hands on.

[AF>HFR>RR] Jim PROFIT
Joined: 19 Mar 08
Posts: 5
Credit: 232,926,469
RAC: 0
Message 37412 - Posted: 16 Mar 2010, 10:54:49 UTC

Same problem for me.
Same effect.

[AF>HFR>RR] Jim PROFIT
Joined: 19 Mar 08
Posts: 5
Credit: 232,926,469
RAC: 0
Message 37414 - Posted: 16 Mar 2010, 11:32:09 UTC
Last modified: 16 Mar 2010, 11:35:30 UTC

Maybe a lead to follow.
In the WUs that have the problem and take a long time to finish, I found this:
'dividing each iteration in 6 parts'
With normal WUs it is 'dividing each iteration in 5 parts'.

Edit: my conclusion was wrong. There are other WUs with 'dividing each iteration in 6 parts' that show no problem.

Nuadormrac
Joined: 11 Sep 08
Posts: 22
Credit: 9,081,761
RAC: 0
Message 37416 - Posted: 16 Mar 2010, 12:42:56 UTC

I'm not sure if this is related or not, but of late I've been noticing WUs that never seem to progress beyond 0.000% completion. I've aborted them, thinking they were bad WUs, but as it has happened a bit more, I'm wondering what is going on with WUs that appear to hang or not progress for a fair amount of time. By all appearances we're not talking slow, but rather stalled.

bob.smith
Joined: 10 Mar 10
Posts: 1
Credit: 28,822,623
RAC: 0
Message 37417 - Posted: 16 Mar 2010, 13:14:00 UTC - in response to Message 37416.  
Last modified: 16 Mar 2010, 13:39:04 UTC

I've also had a few work units that sit at 0% for ages (some after 28 minutes), which I've canceled. Others sit at 0% for a bit, then jump to over 100%. Right now I have one at 128% and still running, after 7 minutes. Average WU time was 4-6 minutes before this started happening (I've only been running for a week...)

Edit: whoa, I just sat here watching it work. That same WU stayed at 128% until around 17 minutes, then started dropping percentage points until it read 100% (which took just over a minute), then uploaded normally...

Nuadormrac
Joined: 11 Sep 08
Posts: 22
Credit: 9,081,761
RAC: 0
Message 37420 - Posted: 16 Mar 2010, 13:53:57 UTC

Perhaps checkpointing got messed up, not sure. But I'm having to run it on a CPU, not a GPU, and I've got 2 more sitting here at 0% for what has now been over 1.5 hours. I can let these sit if that's what's being seen, but I have no idea. When the thing doesn't budge for hours, it really does look stuck.

David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 37423 - Posted: 16 Mar 2010, 15:09:07 UTC - in response to Message 37420.  

Yep, that is a different problem, and it has been fixed. Basically, the WU sits at 0%, then jumps to 143%, works backwards to 100%, and then uploads.

You can download a manual app (0.22) if it bothers you; otherwise just wait a few days, and normality will be restored.

[XTBA>TSA] Biour
Joined: 14 Feb 09
Posts: 5
Credit: 157,994,413
RAC: 0
Message 37428 - Posted: 16 Mar 2010, 18:11:58 UTC

I'm trying to use this command line in an automated task that runs every 10 minutes:

boinccmd.exe --set_run_mode never 1

It will stop BOINC for 1 s and then start it again, every 10 min.
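
For anyone who wants to try the same workaround without a scheduled task, here is a minimal Python sketch of the idea. It is only an illustration, not a tested fix: the function names are made up for this example, it assumes boinccmd is on the PATH (pass the full path to boinccmd.exe otherwise), and it relies on my understanding that the number after the run mode is a duration in seconds, after which BOINC reverts to its previous mode.

```python
import subprocess
import time

# Assumption: boinccmd is reachable on PATH. On Windows you may need the
# full path to boinccmd.exe inside the BOINC install directory.
BOINCCMD = "boinccmd"


def build_nudge_command(boinccmd=BOINCCMD, pause_seconds=1):
    """Return the argv for the command quoted above: suspend for pause_seconds."""
    if pause_seconds <= 0:
        # Assumption: a zero duration would make "never" permanent,
        # which would stop crunching altogether.
        raise ValueError("pause_seconds must be positive")
    return [boinccmd, "--set_run_mode", "never", str(pause_seconds)]


def nudge(run=subprocess.run, **kwargs):
    """Issue one pause/resume nudge and return the command's exit code."""
    return run(build_nudge_command(**kwargs), check=False).returncode


def nudge_forever(interval_seconds=600, sleep=time.sleep, **kwargs):
    """Nudge every interval_seconds (600 s matches the 10 minutes above)."""
    while True:
        nudge(**kwargs)
        sleep(interval_seconds)
```

To use it, call nudge_forever() from a script left running in the background, for example nudge_forever(boinccmd=r"C:\path\to\boinccmd.exe") with your own install path. A shorter interval should catch a stuck WU sooner, at the cost of interrupting healthy WUs more often.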

[XTBA>TSA] Biour
Joined: 14 Feb 09
Posts: 5
Credit: 157,994,413
RAC: 0
Message 37431 - Posted: 16 Mar 2010, 19:36:27 UTC - in response to Message 37428.  
Last modified: 16 Mar 2010, 19:36:53 UTC

It works, but ...
in the worst case I will lose around 10,000 points per hour (which is in fact better than 0)

so ...

Nuadormrac
Joined: 11 Sep 08
Posts: 22
Credit: 9,081,761
RAC: 0
Message 37433 - Posted: 16 Mar 2010, 19:51:31 UTC
Last modified: 16 Mar 2010, 19:52:04 UTC

Both new ones have now been over 6 hours, and still at 0%

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 37436 - Posted: 16 Mar 2010, 19:58:42 UTC

It would be nice if someone would give an update on what is going on and what is being done, as well as on whether the WUs are fine or should be aborted.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 37444 - Posted: 16 Mar 2010, 22:33:25 UTC
Last modified: 16 Mar 2010, 22:57:47 UTC

You can fix the progress issue now; see here: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1600&nowrap=true#37333

And here for the apps: http://www.arkayn.us/milkyway/index.html

As to the other issue, the slowdown, I have noticed that it only happens on my Windows 7 boxes as well, and not on the Vista one.