another change for the maximum time limit elapsed bug


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 48626 - Posted: 9 May 2011, 10:24:30 UTC

I've tried yet another fix (the rsc_fpops_bound is now 10000 times higher than our estimate). I'm really hoping this covers almost everyone who is still having workunits immediately error out. Let me know if it works.
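
Background on why raising rsc_fpops_bound should help: the BOINC client appears to abort a task with "Maximum elapsed time exceeded" once its run time passes roughly rsc_fpops_bound divided by the speed (in FLOPS) the client assumes for the application. A minimal Python sketch of that check follows. The function names and every number in it are made up for illustration; they are not the project's real values or the client's actual code.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, assumed_flops: float) -> float:
    """Approximate time limit: operation bound divided by assumed speed."""
    if assumed_flops <= 0:
        raise ValueError("assumed_flops must be positive")
    return rsc_fpops_bound / assumed_flops


def would_abort(elapsed_seconds: float, rsc_fpops_bound: float,
                assumed_flops: float) -> bool:
    """True if a task running this long would exceed the limit."""
    return elapsed_seconds > max_elapsed_seconds(rsc_fpops_bound, assumed_flops)


# Hypothetical workunit: the estimate and host speed are illustrative only.
rsc_fpops_est = 1e12   # estimated floating point operations for the task
assumed_flops = 2e9    # speed the client assumes for the app on this host

tight_bound = rsc_fpops_est * 10       # a small safety factor
loose_bound = rsc_fpops_est * 10_000   # the factor mentioned in this post

print(max_elapsed_seconds(tight_bound, assumed_flops))
print(max_elapsed_seconds(loose_bound, assumed_flops))
print(would_abort(6000.0, tight_bound, assumed_flops))
print(would_abort(6000.0, loose_bound, assumed_flops))
```

Under these made-up numbers the larger factor raises the limit a thousandfold, so a task that overruns the tight limit is no longer aborted. Tasks that fail within seconds are unlikely to be hitting this limit at all.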
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 48656 - Posted: 9 May 2011, 21:32:14 UTC
Last modified: 9 May 2011, 21:40:21 UTC

The latest tweaks are still not working for 2 of the 4 hosts I have running MW CPU apps which have more than one core.

The two which run them successfully are a Dell Latitude notebook with an Intel CD T2400 processor running XPP x86 SP3 and the Ph II X4 955 running W7U x64. One note about the 955: since it's running in protected app mode (service), BOINC cannot currently detect the fact that the IGP is enabled. Also, the IGP is the only GPU it has.

The other Ph II X4's fail N-Body instantly with a 'code 128; No child processes to wait for' error. One is a 945 with the IGP enabled as its only GPU, and the other is the 955 with the HD 4850 (both enabled in BIOS, IGP primary, but the IGP disabled for BOINC in cc_config). Both hosts are running XPP x64 SP2, and all the hosts with ATI graphics are running Cat 11.3.

I was able to have Process Monitor running just before the 955 in front of me tried to run the last N-Body it got, but I haven't looked over the capture file yet. So there might be some more intel to be gained.
ChrisS

Joined: 10 Feb 09
Posts: 1
Credit: 1,771,086
RAC: 0
Message 48659 - Posted: 9 May 2011, 22:43:42 UTC

I've just had approximately 10 work-units fail due to 'computational errors' after approx. 5-15 mins. Is this a related issue?
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 48660 - Posted: 9 May 2011, 22:54:08 UTC - in response to Message 48659.  

It would seem there might be an issue with some systems not related to the resource bounds problems Travis has been working on.

Obviously it helped some folks based on other posts I've seen (code 177's), but I'd almost bet good money it hasn't made any difference for hosts throwing code 128's, 185's and 226's.
Jesse Viviano

Joined: 4 Feb 11
Posts: 83
Credit: 40,628,976
RAC: 27
Message 48661 - Posted: 9 May 2011, 23:17:29 UTC

You might need to double the deadline. I just got a computation error on an N-body work unit after getting around 63% of the work unit done. I am using a Core i7 980X with hyper-threading enabled. Nothing is overclocked.
Gumpokc

Joined: 7 Sep 09
Posts: 3
Credit: 147,071
RAC: 0
Message 48672 - Posted: 10 May 2011, 6:10:11 UTC

5/10/2011 1:04:36 AM Milkyway@home Starting de_nbody_orphan_test_2model_4_20631_1304909701_1
5/10/2011 1:04:36 AM Milkyway@home Starting task de_nbody_orphan_test_2model_4_20631_1304909701_1 using milkyway_nbody version 40
5/10/2011 1:04:41 AM Milkyway@home Computation for task de_nbody_orphan_test_2model_4_20631_1304909701_1 finished


It has been doing the exact same thing ever since nbody 40 came out.
I've tried setting it so I don't even get nbody tasks anymore, but it keeps downloading them.
I'm going to drop MW@H; maybe I'll check back in 6 months and see if things have been worked out.
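
The three log lines above show the task being reported as finished five seconds after it started. Here is a small Python sketch for pulling that run time out of a BOINC event log in this format. It assumes the timestamps are month/day/year with a 12-hour clock, as they appear here, and the helper names are hypothetical.

```python
from datetime import datetime

LOG = """\
5/10/2011 1:04:36 AM Milkyway@home Starting de_nbody_orphan_test_2model_4_20631_1304909701_1
5/10/2011 1:04:36 AM Milkyway@home Starting task de_nbody_orphan_test_2model_4_20631_1304909701_1 using milkyway_nbody version 40
5/10/2011 1:04:41 AM Milkyway@home Computation for task de_nbody_orphan_test_2model_4_20631_1304909701_1 finished
"""


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'M/D/YYYY H:MM:SS AM' stamp (assumed format)."""
    date_part, time_part, am_pm = line.split()[:3]
    return datetime.strptime(f"{date_part} {time_part} {am_pm}",
                             "%m/%d/%Y %I:%M:%S %p")


def task_runtime_seconds(lines, task_name):
    """Seconds between 'Starting task' and 'finished' for one task, or None."""
    started = finished = None
    for line in lines:
        if task_name not in line:
            continue
        if "Starting task" in line:
            started = parse_timestamp(line)
        elif "Computation for task" in line and line.rstrip().endswith("finished"):
            finished = parse_timestamp(line)
    if started is None or finished is None:
        return None
    return (finished - started).total_seconds()


task = "de_nbody_orphan_test_2model_4_20631_1304909701_1"
print(task_runtime_seconds(LOG.splitlines(), task))
```

A run time of a few seconds like this points to the task failing at startup rather than running into a time limit.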
olav

Joined: 31 Mar 10
Posts: 2
Credit: 248,732
RAC: 0
Message 48674 - Posted: 10 May 2011, 7:20:58 UTC

Hi!

It seems to work on my machine now. The latest n_body task is at 8% progress now, which quite a few before never reached due to some software error. Good job!


Cheers,

Olav
olav

Joined: 31 Mar 10
Posts: 2
Credit: 248,732
RAC: 0
Message 48675 - Posted: 10 May 2011, 7:30:02 UTC - in response to Message 48674.  

Sorry, white smoke too early... It crashed again - after completing roughly 60% of the unit.
dskagcommunity

Joined: 26 Feb 11
Posts: 170
Credit: 183,085,176
RAC: 0
Message 48733 - Posted: 13 May 2011, 6:59:47 UTC

Let the MW Computer run overnight, all seems fine again :)
DSKAG Austria Research Team: http://www.research.dskag.at



Anton Rang

Joined: 25 Feb 11
Posts: 2
Credit: 1,353,635
RAC: 0
Message 48737 - Posted: 13 May 2011, 15:00:40 UTC - in response to Message 48626.  

On my Intel Mac, I’m still seeing the nbody computations error out (though after about 19 seconds rather than 3 seconds of elapsed time), but now the separation jobs have estimated completion times which are two orders of magnitude more time than they actually take.

(It looks like those nbody jobs were received yesterday and today, May 12 & 13.)
POPSIE

Joined: 25 Jan 11
Posts: 12
Credit: 3,915,453
RAC: 4,616
Message 48845 - Posted: 18 May 2011, 6:04:26 UTC

Since May 7, n-body produces errors.

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
<search_application>milkywayathome nbody 0.40 Windows x86_64 double OpenMP Crlibm</search_application>
07:19:24: Using OpenMP 4 max threads on a system with 4 processors

</stderr_txt>
]]>


For more Info look at this
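
The task output above carries the key facts in three places: the <message> text, the <search_application> line, and the OpenMP thread report. Here is a rough Python sketch for extracting them with regular expressions, assuming output shaped like this sample. The function name and dictionary keys are hypothetical.

```python
import re

SAMPLE = """\
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
Maximum elapsed time exceeded
</message>
<stderr_txt>
<search_application>milkywayathome nbody 0.40 Windows x86_64 double OpenMP Crlibm</search_application>
07:19:24: Using OpenMP 4 max threads on a system with 4 processors

</stderr_txt>
]]>
"""


def first_group(pattern, text, flags=0):
    """Return the first capture group of the first match, or None."""
    match = re.search(pattern, text, flags)
    return match.group(1) if match else None


def summarize_task_output(text):
    """Collect the error message, app banner, and OpenMP counts if present."""
    omp = re.search(
        r"Using OpenMP (\d+) max threads on a system with (\d+) processors",
        text)
    return {
        "client_version": first_group(
            r"<core_client_version>(.*?)</core_client_version>", text),
        "message": first_group(
            r"<message>\s*(.*?)\s*</message>", text, re.DOTALL),
        "application": first_group(
            r"<search_application>(.*?)</search_application>", text),
        "omp_threads": int(omp.group(1)) if omp else None,
        "processors": int(omp.group(2)) if omp else None,
    }


summary = summarize_task_output(SAMPLE)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Comparing these fields across failing hosts (for example, thread count against processor count, or 32-bit against 64-bit builds) may help narrow down which configurations still hit the time limit.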
