Welcome to MilkyWay@home

Massive "exceeded elapsed time limit" errors

Message boards : Number crunching : Massive "exceeded elapsed time limit" errors

Confusius
Joined: 31 Mar 10
Posts: 12
Credit: 13,722,511
RAC: 0
Message 49816 - Posted: 3 Jul 2011, 6:03:09 UTC

Hi,
after months of not being able to participate due to missing hardware, I finally got my new 6950.

I updated to the latest BOINC client, started MilkyWay, and was amazed: error after error.

Every MW WU I have received since starting yesterday afternoon fails with "exceeded elapsed time limit".

Other projects run perfectly.
Drivers, BOINC, and the MW app are up to date; no app_config is used.

So here is the question: am I the only one wasting GPU time, or is this a common problem?

Thanks in advance

André
ID: 49816
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 49817 - Posted: 3 Jul 2011, 7:06:58 UTC - in response to Message 49816.  

Can you unhide your computer?
ID: 49817
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49821 - Posted: 3 Jul 2011, 9:53:16 UTC
Last modified: 3 Jul 2011, 9:53:37 UTC

I have found two machines (or rather, users) that seem to have a configuration very similar to yours.

http://milkyway.cs.rpi.edu/milkyway/hosts_user.php?userid=80670

and
http://milkyway.cs.rpi.edu/milkyway/hosts_user.php?userid=2765

The difference is that they run the newer beta BOINC client (6.12.33).

Maybe it's worth a try.

Best regards

franz
ID: 49821
Confusius
Joined: 31 Mar 10
Posts: 12
Credit: 13,722,511
RAC: 0
Message 49829 - Posted: 3 Jul 2011, 12:20:40 UTC

I have unhidden the machine.

ID: 49829
Link
Joined: 19 Jul 10
Posts: 588
Credit: 18,918,826
RAC: 5,139
Message 49831 - Posted: 3 Jul 2011, 12:40:50 UTC - in response to Message 49829.  

As a workaround, set the DCF (duration correction factor) in your client_state.xml to 100; that should help until the server-side DCF kicks in:

<project>
    <master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
    (...)
    <duration_correction_factor>100.000000</duration_correction_factor>
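
For anyone who would rather not edit the file by hand, here is a minimal Python sketch of the same change. It assumes client_state.xml parses as ordinary XML and that each <project> element holds <master_url> and <duration_correction_factor> as direct children, as in the fragment above. The function name set_dcf and the sample text are made up for illustration. As far as I know the client rewrites client_state.xml while it is running, so BOINC should be shut down before the real file is touched.

```python
import xml.etree.ElementTree as ET

MILKYWAY_URL = "http://milkyway.cs.rpi.edu/milkyway/"


def set_dcf(state_xml: str, master_url: str, dcf: float) -> str:
    """Return state_xml with the DCF of the project at master_url set to dcf.

    set_dcf is a hypothetical helper, not part of BOINC.
    """
    root = ET.fromstring(state_xml)
    for project in root.iter("project"):
        url = project.findtext("master_url", default="").strip()
        if url != master_url:
            continue
        node = project.find("duration_correction_factor")
        if node is None:
            # Assumption: adding the element is acceptable if it is missing.
            node = ET.SubElement(project, "duration_correction_factor")
        node.text = f"{dcf:.6f}"
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no <project> with master_url {master_url!r}")


# Invented, heavily shortened stand-in for a real client_state.xml.
SAMPLE = """<client_state>
<project>
    <master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
    <duration_correction_factor>1.000000</duration_correction_factor>
</project>
</client_state>"""

updated = set_dcf(SAMPLE, MILKYWAY_URL, 100)
print(updated)
```

To use it on a real file, read the file into a string, pass it through set_dcf, and write the result back while the client is stopped. Keep a backup copy of the original first.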

ID: 49831
Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 49832 - Posted: 3 Jul 2011, 12:41:03 UTC

I've seen two of your WUs fail with error code -177.
There are ongoing discussions in other threads about this error (it's a timeout error); other crunchers face the same problem, so it might not be a problem with your computer or setup.
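
For background on what this timeout is, here is a rough sketch in Python. As far as I understand the BOINC client, error -177 is ERR_RSC_LIMIT_EXCEEDED, and the elapsed time limit is the workunit's rsc_fpops_bound divided by the speed (flops) the client assumes for the application version. The function names and all numbers below are invented for illustration; they are not taken from any real MilkyWay WU.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, assumed_flops: float) -> float:
    """Elapsed time limit in seconds: fpops bound divided by assumed speed."""
    if assumed_flops <= 0:
        raise ValueError("assumed_flops must be positive")
    return rsc_fpops_bound / assumed_flops


def exceeds_limit(elapsed_seconds: float, rsc_fpops_bound: float,
                  assumed_flops: float) -> bool:
    """True if a task that has run elapsed_seconds would be aborted."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, assumed_flops)


# Hypothetical workunit bound and two hypothetical speed estimates.
BOUND = 1.0e14
for flops in (2.0e12, 1.0e11):
    limit = max_elapsed_seconds(BOUND, flops)
    aborted = exceeds_limit(65.0, BOUND, flops)
    print(f"assumed {flops:.1e} flops -> limit {limit:.0f} s, "
          f"65 s task aborted: {aborted}")
```

If the assumed speed is much higher than what the hardware really delivers, the limit comes out too short and a healthy task is aborted before it finishes. A lower speed estimate lengthens the limit.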
ID: 49832
Confusius
Joined: 31 Mar 10
Posts: 12
Credit: 13,722,511
RAC: 0
Message 49833 - Posted: 3 Jul 2011, 12:41:25 UTC

I gave it a try and installed the BOINC beta client, but this did not change anything.

As far as I can see, the calculations run fine until 1:05 (about 60%), then the error shows up. Nevertheless, the calculation continues until 100% is reached (the progress bar advances as usual), while (sometimes) a new calculation thread gets started before the last one has finished (progress 100%).

As mentioned before, this only happens with MilkyWay tasks. All others run smoothly, as they should.
ID: 49833
Confusius
Joined: 31 Mar 10
Posts: 12
Credit: 13,722,511
RAC: 0
Message 49834 - Posted: 3 Jul 2011, 12:46:20 UTC - in response to Message 49831.  

Thanks for the advice, but changing the DCF did not change anything. I still have the problem.
ID: 49834
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49835 - Posted: 3 Jul 2011, 13:19:29 UTC

Maybe you want to try the older 0.62 application.
I can upload it to a web server for you.

I run it on my dual HD4850 machine without problems, so let me know if you want it.

(You can send me a PM)
ID: 49835
Simplex0
Joined: 11 Nov 07
Posts: 232
Credit: 178,229,009
RAC: 0
Message 49836 - Posted: 3 Jul 2011, 14:03:56 UTC - in response to Message 49835.  

I don't think that is a good idea. The validation system here seems to work only as long as we all use the right application. Even if you use the right application and produce a correct output file, it can be marked 'Invalid' if it is validated against two other output files produced by an incorrect application, as long as those two are identical.

We have had that problem here before, and I believe the validation procedure still works the same way.
ID: 49836
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 49838 - Posted: 3 Jul 2011, 14:31:41 UTC - in response to Message 49834.  

Changing the DCF will have no effect on the problem - DCF has nothing to do with the elapsed time bug.

There is a thread dealing with this at:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2468

Regards
Zy
ID: 49838
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49845 - Posted: 3 Jul 2011, 17:12:30 UTC - in response to Message 49836.  

You are quite right! If you use "opti" apps with an app_info.xml, you have to be very careful, because you must keep an eye on your system every day. You get no automatic updates etc., and, as Simplex0 says, you have to be sure that you don't produce a mess.
In this case I watched many WUs to see whether they validate against the 0.82 application. This takes some time and is not easy, because you have to wait until the other machine has finished. Then you have only a short time to check before the WU is removed from the database.
I know that, but I also know that the 0.62 app doesn't mess up the system.
ID: 49845
gamer007
Joined: 29 Aug 07
Posts: 4
Credit: 9,480,721
RAC: 0
Message 50016 - Posted: 8 Jul 2011, 22:03:30 UTC
Last modified: 8 Jul 2011, 22:04:40 UTC

I reformatted my computer around 2 days ago and rejoined this project, and all my WUs have gotten this error. I've tried the latest BOINC version and even downgraded to 6.12.27.

Is there a solution to this problem?
ID: 50016
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50023 - Posted: 9 Jul 2011, 8:02:44 UTC

For Win64 systems, an app_info.xml file has helped in several cases:

<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_separation_0.82_windows_x86_64__ati14.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>82</version_num>
    <flops>1.0e11</flops>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <plan_class>ati14ati</plan_class>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
    <cmdline>--gpu-target-frequency 100 --gpu-disable-checkpointing</cmdline>
    <file_ref>
      <file_name>milkyway_separation_0.82_windows_x86_64__ati14.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>


It has to be placed in your project directory.
On Win7 that is something like:

C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway
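
Since a hand-written app_info.xml gets no automatic checking, a quick consistency test before restarting BOINC may save some failed WUs. The Python sketch below only verifies that every <app_version> names a declared <app> and that every <file_ref> points at a declared <file_info>. These are basic sanity checks I chose for illustration, not what BOINC itself does, and check_app_info is a made-up name. The sample is a shortened copy of the file above.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text: str) -> list:
    """Return a list of problems found in an app_info.xml string.

    check_app_info is a hypothetical helper, not part of BOINC.
    """
    root = ET.fromstring(xml_text)
    problems = []
    app_names = {a.findtext("name", "").strip() for a in root.findall("app")}
    file_names = {f.findtext("name", "").strip()
                  for f in root.findall("file_info")}
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name", "").strip()
        if app_name not in app_names:
            problems.append(f"app_version refers to undeclared app {app_name!r}")
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name", "").strip()
            if file_name not in file_names:
                problems.append(f"file_ref refers to undeclared file {file_name!r}")
    return problems


EXE = "milkyway_separation_0.82_windows_x86_64__ati14.exe"
SAMPLE = f"""<app_info>
  <app><name>milkyway</name></app>
  <file_info><name>{EXE}</name><executable/></file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>82</version_num>
    <file_ref><file_name>{EXE}</file_name><main_program/></file_ref>
  </app_version>
</app_info>"""

for line in check_app_info(SAMPLE) or ["no problems found"]:
    print(line)
```

The check does not look at the disk. The executable named in <file_info> still has to be present in the project directory, next to app_info.xml.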

HTH

Franz
ID: 50023
gamer007
Joined: 29 Aug 07
Posts: 4
Credit: 9,480,721
RAC: 0
Message 50033 - Posted: 9 Jul 2011, 19:40:29 UTC - in response to Message 50023.  

Thanks. Looks like it's working.

Any idea what's causing it? Before I reformatted, MilkyWay worked perfectly without the app_info. The errors started afterwards, and I didn't change any hardware. I also noticed the Notices tab says "Your app_info.xml file doesn't have a usable version of MilkyWay@Home N-Body Simulation."
ID: 50033
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50034 - Posted: 9 Jul 2011, 20:44:50 UTC

"Reformatted": did you change the OS, e.g. from Win32 to Win64?
Maybe there is a problem here that is "repaired" by a parameter in the app_info.xml.
But this has to be looked at by a programmer (-> Matt), and I think they are on holiday or something.
Regarding the N-body message: do you also use your CPU?

If so, the app_info.xml has to be extended for the N-body application, but I have no valid example.
But I think you don't, so you can ignore this notice.
Regards

franz
ID: 50034
gamer007
Joined: 29 Aug 07
Posts: 4
Credit: 9,480,721
RAC: 0
Message 50035 - Posted: 9 Jul 2011, 23:23:37 UTC - in response to Message 50034.  

Nope. I've always been using Windows 7 Ultimate 64-bit. Thanks for your help though, I'm happy that it's now working.
ID: 50035

©2024 Astroinformatics Group