Welcome to MilkyWay@home

maximum time limit elapsed bug

FruehwF

Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49889 - Posted: 4 Jul 2011, 23:01:51 UTC

Well, it seems this affects all kinds of OS.


Could it be instability caused by too much heat?

I'll run some tests tomorrow on one of my Win32 single-GPU systems; maybe I can provoke this error.
(I hope I won't burn out my card.)
ID: 49889 · Rating: 0
cristipurdel

Joined: 1 Jul 11
Posts: 10
Credit: 422,543
RAC: 0
Message 49894 - Posted: 5 Jul 2011, 4:15:54 UTC

I'm also experiencing the same problem on Win 7 x64 with Catalyst 11.6 on a 6950.
How do you reset or increase the DCF (duration correction factor)?
ID: 49894 · Rating: 0
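For readers with the same question, here is a minimal sketch of one way to reset the DCF. It rests on assumptions that are not confirmed in this thread: that the BOINC client stores a per-project <duration_correction_factor> element inside client_state.xml, and that the file is only edited while the client is stopped and after making a backup. The sample XML, the example.org URLs, and the function name set_dcf are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for client_state.xml.
# ASSUMPTION: each <project> carries its own <duration_correction_factor>.
CLIENT_STATE = """<client_state>
  <project>
    <master_url>http://milkyway.example.org/</master_url>
    <duration_correction_factor>0.010000</duration_correction_factor>
  </project>
  <project>
    <master_url>http://other.example.org/</master_url>
    <duration_correction_factor>1.250000</duration_correction_factor>
  </project>
</client_state>"""


def set_dcf(state_xml, url_fragment, new_dcf):
    """Set the DCF of every project whose master URL contains url_fragment.

    Returns the modified XML text and the number of projects changed.
    """
    root = ET.fromstring(state_xml)
    changed = 0
    for project in root.iter("project"):
        master_url = project.findtext("master_url", default="")
        if url_fragment not in master_url:
            continue
        node = project.find("duration_correction_factor")
        if node is not None:
            node.text = f"{new_dcf:.6f}"
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


new_state, count = set_dcf(CLIENT_STATE, "milkyway", 1.0)
print(f"projects changed: {count}")
```

On a real installation one would read client_state.xml from the BOINC data directory, apply the function, and write the result back. This only changes the runtime estimate, so it may not cure the underlying bug.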
Confusius

Joined: 31 Mar 10
Posts: 12
Credit: 13,722,511
RAC: 0
Message 49895 - Posted: 5 Jul 2011, 6:33:29 UTC

Could it be instability caused by too much heat?


Maybe in some cases, but surely not in mine. My 6950 is running at stock settings. I have avoided overclocking my card right from the beginning.
ID: 49895 · Rating: 0
Confusius

Joined: 31 Mar 10
Posts: 12
Credit: 13,722,511
RAC: 0
Message 49896 - Posted: 5 Jul 2011, 6:35:50 UTC

Maybe a debug version with verbose logging could be provided? I would gladly take part in systematically testing for this bug.
ID: 49896 · Rating: 0
davebodger

Joined: 3 Jul 11
Posts: 8
Credit: 66,330,086
RAC: 0
Message 49899 - Posted: 5 Jul 2011, 6:55:30 UTC - in response to Message 49887.  

I have this problem too - every unit seems to return this error, exactly as described in this thread.

Sapphire HD4890 - WinXP 32bit SP3 - Q6600 @ 3.43GHz with 4Gb DDR2(800).

I run an overclock so was concerned in case this was indicating an instability, however I have run this system at this overclock on Folding @ Home for several years without issue.


To make it clear - it is the CPU that is overclocked, 2.66 -> 3.43GHz. It is water-cooled (CPU and full chipset), so no overheating - in fact it runs 3 or 4 degrees cooler running BOINC than it did running F@H.

The GPU is an "overclocked" one, but the "overclocking" was done by the manufacturer and I have not overclocked it further from its standard settings (901MHz GPU/1000MHz RAM). It runs at 60C as standard with the fan at 25% and ramps up to 81C under full 100% load with the fan at 41%. I have already tried increasing the fan speed manually to 60% continuous, which drops temps by at least 20C at all loads, but this did not fix anything.

These GPU chips are OK up to 90C+ so I am well within limits and the system is also running Rosetta and Climate Prediction OK (but they do not use the GPU). It also runs MilkyWay OK if I disable the GPU in BOINC.

I don't know where to start debugging this, as I am a new contributor and have not had time to look through all the config files to find out whether there are any settings I can change that might help. Of course, if this is a general application problem then there is nothing I can do to fix it. :-(

I did think of replacing the GPU card with a HD6950 that I have, but others here have already reported this problem on that card too, so there seems little point in trying that.

Regards.

Dave.
ID: 49899 · Rating: 0
davebodger

Joined: 3 Jul 11
Posts: 8
Credit: 66,330,086
RAC: 0
Message 49900 - Posted: 5 Jul 2011, 7:27:41 UTC - in response to Message 49887.  

I have this problem too - every unit seems to return this error, exactly as described in this thread.

Sapphire HD4890 - WinXP 32bit SP3 - Q6600 @ 3.43GHz with 4Gb DDR2(800).


Sorry, forgot to say - I am running the latest Catalyst 11.6 drivers also.
ID: 49900 · Rating: 0
buck

Joined: 1 Jul 11
Posts: 1
Credit: 1,178,742
RAC: 0
Message 49906 - Posted: 5 Jul 2011, 9:34:27 UTC

All my WUs are aborting due to the elapsed time limit.

HD 6950, Catalyst 11.6, Windows 64-bit

http://i.imgur.com/l5JYa.png
ID: 49906 · Rating: 0
Dan

Joined: 17 May 09
Posts: 5
Credit: 25,350,789
RAC: 0
Message 49907 - Posted: 5 Jul 2011, 9:38:59 UTC

Just curious. How is it this bug has continued now for a couple months? It comes and it goes and then it comes back. I'm a software developer and if I allowed this to happen, I'd be crucified...

Have you considered beating your developers more often?
ID: 49907 · Rating: 0
dskagcommunity

Joined: 26 Feb 11
Posts: 170
Credit: 205,557,553
RAC: 0
Message 49911 - Posted: 5 Jul 2011, 10:30:43 UTC

Hmm, very interesting. I've had no problems with the time elapsed bug for the last few days. I only have a 3850, so perhaps the low power helps in this case.
DSKAG Austria Research Team: http://www.research.dskag.at



ID: 49911 · Rating: 0
Beyond

Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 49913 - Posted: 5 Jul 2011, 13:28:02 UTC

I've seen nothing of this bug since BOINC 6.12.33 (alpha) was released a while ago. I've asked many times but have never gotten an answer: is MW trying to use the new credit system? Could this be the cause of the TE bug? One of the admins mentioned such a connection when the bug first appeared but there's been silence on the subject since.

Someone above mentioned OCing: since the weather has turned hot I've noticed an increase in invalid WUs. None of my GPUs ever hit 70C so heat probably shouldn't be a problem. Lowering the OC seemed to help on all but my 5870 which is now crunching Collatz & Moo! where it has zero errors. I've been suspecting a problem with .82 since I never saw this with .62 (cooler temps during the .62 days though). With .62 the OCs were set considerably higher. It's hard to diagnose anything because of the project's love of insta-purge though :( :(
ID: 49913 · Rating: 0
cristipurdel

Joined: 1 Jul 11
Posts: 10
Credit: 422,543
RAC: 0
Message 49915 - Posted: 5 Jul 2011, 15:27:18 UTC - in response to Message 49913.  

Let's try a different approach. Is there anyone with a 6950 or another ATI card who doesn't experience this problem and runs a Catalyst version older than 11.6?
ID: 49915 · Rating: 0
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 49917 - Posted: 5 Jul 2011, 17:24:29 UTC - in response to Message 49915.  

Let's try a different approach. Is there anyone with a 6950 or another ATI card who doesn't experience this problem and runs a Catalyst version older than 11.6?

Have a scroll through the top computers list...
ID: 49917 · Rating: 0
cristipurdel

Joined: 1 Jul 11
Posts: 10
Credit: 422,543
RAC: 0
Message 49918 - Posted: 5 Jul 2011, 18:00:10 UTC - in response to Message 49917.  

Let's try a different approach. Is there anyone with a 6950 or another ATI card who doesn't experience this problem and runs a Catalyst version older than 11.6?

Have a scroll through the top computers list...

An easier approach... use the card on something that works for me, like SETI and Collatz.
ID: 49918 · Rating: 0
The Gas Giant

Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 49922 - Posted: 5 Jul 2011, 21:05:18 UTC - in response to Message 49918.  

Let's try a different approach. Is there anyone with a 6950 or another ATI card who doesn't experience this problem and runs a Catalyst version older than 11.6?

Have a scroll through the top computers list...

An easier approach... use the card on something that works for me, like SETI and Collatz.

Whatever...just trying to help.
ID: 49922 · Rating: 0
davebodger

Joined: 3 Jul 11
Posts: 8
Credit: 66,330,086
RAC: 0
Message 49923 - Posted: 5 Jul 2011, 21:59:53 UTC

New insight - in BOINC, if I select the Properties of a MilkyWay work unit, it shows "Estimated app speed = 31110.33 GFLOPs/sec".

This seems a little high to me (by a factor of around 100).

PrimeGrid workunits report 322.12 GFLOPs/sec and Collatz workunits show 354.13 GFLOPs/sec.

It looks to me like the MilkyWay app is mis-estimating the speed of the GPU.

When it starts up, the BOINC event log shows my video card as capable of a peak of 1422 GFLOPs/sec. We know ATI cards seldom reach their peak throughput, so ~300 GFLOPs/sec seems a reasonable estimate of achievable throughput on my HD4890.

Regards.

Dave.
ID: 49923 · Rating: 0
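The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the post. The time-limit formula (limit = fpops bound / estimated speed) is an assumption about how the BOINC client derives its maximum elapsed time, and the fpops bound value is made up for illustration.

```python
# Figures quoted in the post above, all in GFLOPs/sec.
REPORTED_GFLOPS = {
    "MilkyWay": 31110.33,
    "PrimeGrid": 322.12,
    "Collatz": 354.13,
}
PEAK_GFLOPS = 1422.0  # peak reported in the BOINC event log for the HD4890

# ASSUMPTION: made-up operation bound for one work unit, for illustration only.
HYPOTHETICAL_FPOPS_BOUND = 1.0e17


def overestimate_factor(estimate, reference):
    """How many times larger one speed estimate is than another."""
    return estimate / reference


def time_limit_seconds(fpops_bound, speed_gflops):
    """ASSUMPTION: the client aborts a task once elapsed time exceeds
    fpops_bound divided by the estimated speed in FLOPs/sec."""
    return fpops_bound / (speed_gflops * 1e9)


for name, speed in REPORTED_GFLOPS.items():
    factor = overestimate_factor(REPORTED_GFLOPS["MilkyWay"], speed)
    limit = time_limit_seconds(HYPOTHETICAL_FPOPS_BOUND, speed)
    print(f"{name}: MilkyWay estimate is {factor:.1f}x this, time limit {limit:.0f} s")
```

Under that assumption, a speed estimate that is roughly 100 times too high gives a time limit roughly 100 times too short, which would fit the symptom described in this thread. The MilkyWay estimate is also well above the card's reported peak.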
kashi

Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 49930 - Posted: 6 Jul 2011, 3:30:59 UTC - in response to Message 49923.  

Perhaps you could try the optimised application then. In it the flops value is specified as <flops>1.0e11</flops>, and in Task Properties in BOINC Manager I see Estimated app speed 100.00 GFLOPs/sec. Duration correction factor shows 0.0987 so that is close to the ideal of 1. HD 5870 @ 940/500 shows 3008 GFLOPS peak at start in the BOINC event log.

I installed the optimised application to run 2 tasks concurrently but I am unable to do that because everything becomes unresponsive, so I am just running a single task now but I left the app_info.xml and .DS_Store files in place.
ID: 49930 · Rating: 0
arkayn

Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 49931 - Posted: 6 Jul 2011, 3:44:53 UTC

The .DS_Store file is a byproduct of zipping up the archives on my Mac; you can safely delete it.
ID: 49931 · Rating: 0
kashi

Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 49933 - Posted: 6 Jul 2011, 4:07:49 UTC - in response to Message 49931.  
Last modified: 6 Jul 2011, 4:14:19 UTC

Thanks for letting me know. I wasn't sure what it was for, I thought it may be some kind of replacement for the brook file that used to be required.

Thanks for hosting the files also.
ID: 49933 · Rating: 0
kashi

Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 49934 - Posted: 6 Jul 2011, 6:20:06 UTC - in response to Message 49930.  

....Duration correction factor shows 0.0987 so that is close to the ideal of 1....


Oops, added an extra zero by mistake, duration correction factor was 0.9870 and is now 0.9928.
ID: 49934 · Rating: 0
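To illustrate how the <flops> value and the duration correction factor fit together, here is a small sketch. It assumes the client estimates runtime as (estimated floating-point operations / flops) multiplied by the DCF, which is an understanding of BOINC 6.x clients rather than something stated in this thread. The fpops estimate is a made-up number.

```python
FLOPS = 1.0e11  # from <flops>1.0e11</flops> in the optimised app's app_info.xml

# ASSUMPTION: made-up estimate of operations per work unit, for illustration only.
HYPOTHETICAL_FPOPS_EST = 2.0e13


def estimated_runtime_seconds(fpops_est, flops, dcf):
    """ASSUMPTION: raw estimate (fpops_est / flops) scaled by the DCF."""
    return fpops_est / flops * dcf


# DCF values quoted in the posts above (the corrected ones).
for dcf in (0.9870, 0.9928):
    seconds = estimated_runtime_seconds(HYPOTHETICAL_FPOPS_EST, FLOPS, dcf)
    print(f"DCF {dcf:.4f}: estimated runtime {seconds:.1f} s")
```

A DCF close to 1 means the raw estimate already matches the observed runtimes, which is why a <flops> value that yields such a DCF looks well chosen for that card.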
FruehwF

Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49939 - Posted: 6 Jul 2011, 8:27:07 UTC

I couldn't find the link to the optimized apps in this forum anymore.
For everybody who has the same problem:

Milkyway optimized Apps on Arkayn's download page

Maybe somebody with the TE bug could try them out.

@kashi: have you tried the command-line parameter in app_info.xml that improves responsiveness:

<cmdline>--gpu-target-frequency 120 </cmdline>
120 is a rather high value (the default is 30), but it works on my old machine with a rather old processor (Intel(R) Pentium(R) 4 CPU 3.20GHz) and one HD4850 without any noticeable loss of performance (GPU utilization 99%).

About my experiments yesterday with higher temperatures:
I raised the temperature from 72 C to 91 C by increasing the core clock from 725 MHz to 755 MHz (that is about 120% of the original 625 MHz!) and reducing the fan speed. But nothing happened, except that the WUs finished faster ;-).

I also tried the stock app without app_info.xml (1 WU instead of 2 WUs running at the same time). This reduces my output by about 5%! (This comes from the slow CPU: the CPU time at the end of a WU is no longer masked by the other WU.)

Well, next week I'll try to upgrade my Catalyst from 10.2 to 11.6.
I hope I can easily downgrade again if I have to; I have never done this before.
Does anybody know the best way to do that (are there any pitfalls)?

greetings

Franz
ID: 49939 · Rating: 0
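Here is a sketch of how the <cmdline> setting quoted above could be added to an app_info.xml with a script instead of by hand. The trimmed-down XML, the app name milkyway, and the helper set_cmdline are placeholders for illustration; a real app_info.xml also lists the executable files and version number, and as far as I know the client has to be restarted to pick up changes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down app_info.xml; names are placeholders.
APP_INFO = """<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <app_version>
    <app_name>milkyway</app_name>
    <flops>1.0e11</flops>
  </app_version>
</app_info>"""


def set_cmdline(app_info_xml, cmdline):
    """Set (or create) the <cmdline> element of every <app_version>."""
    root = ET.fromstring(app_info_xml)
    for version in root.iter("app_version"):
        node = version.find("cmdline")
        if node is None:
            node = ET.SubElement(version, "cmdline")
        node.text = cmdline
    return ET.tostring(root, encoding="unicode")


# 120 is the value from the post above; the post gives 30 as the default.
updated = set_cmdline(APP_INFO, "--gpu-target-frequency 120")
print(updated)
```

Existing elements such as <flops> are left untouched, so the same file can carry both the speed estimate and the responsiveness setting.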

©2024 Astroinformatics Group