Welcome to MilkyWay@home

Computation Errors

Message boards : Number crunching : Computation Errors

pstehno
Joined: 16 Jun 10
Posts: 6
Credit: 7,402,186
RAC: 0
Message 42829 - Posted: 13 Oct 2010, 22:48:11 UTC

Tired of computation errors... aborting all work units.
ID: 42829 · Rating: 0
Anthony Liggins

Joined: 18 Jul 09
Posts: 1
Credit: 144,463
RAC: 0
Message 42836 - Posted: 14 Oct 2010, 4:18:47 UTC - in response to Message 42822.  

Yes, I agree. All of my MW@H 0.04 units are ending in a computation error, each one after taking about 42 hours to run. Something has gone very wrong somewhere and needs to be corrected. This is one of mine:

Name de_14_3s_5_1295023_1286658690_2
Workunit 163563070
Created 9 Oct 2010 22:27:41 UTC
Sent 9 Oct 2010 22:35:01 UTC
Received 14 Oct 2010 3:25:26 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 90687
Report deadline 17 Oct 2010 22:35:01 UTC
Run time 144876.90625
CPU time 136111.3
stderr out
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
10:53:51: Checkpoint exists. Attempting to resume from it
10:53:51: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
13:25:02: Checkpoint exists. Attempting to resume from it
13:25:02: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
00:28:49: Checkpoint exists. Attempting to resume from it
00:28:49: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
16:56:33: Checkpoint exists. Attempting to resume from it
16:56:33: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
21:01:23: Checkpoint exists. Attempting to resume from it
21:01:23: Successfully resumed checkpoint
<background_integral> 0.00076736511344802007 </background_integral>
<stream_integrals> 1910.83401000542110000000 1123.56796570525830000000 117.93797255286349000000 </stream_integrals>
<background_only_likelihood> -3.29763314280799860000 </background_only_likelihood>
<stream_only_likelihood> -11.15931313169090300000 -3.69598905249152660000 -4.08219071179394350000 </stream_only_likelihood>
<search_likelihood> -2.96516921866002560000 </search_likelihood>
02:34:36 (3652): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_14_3s_5_1295023_1286658690_2_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
Validate state Invalid
Claimed credit 382.400815366014
Granted credit 0
application version MilkyWay@Home v0.04
ID: 42836 · Rating: 0
John Black

Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42837 - Posted: 14 Oct 2010, 4:46:20 UTC

Hi Crunch3r

Looks like you've spotted the problem. Does anyone here know if the guys in the back office have figured it out yet? Am I being stupid, or why don't they just go back to the previous version of the software, since it worked?

OK, I have a WU that should have taken 16 hours to run. It has been running for 32 hours and has 18 hours left, so I guess that it will end in a computation error as well. I can't remember when I last got a WU to run successfully on MW@H.
ID: 42837 · Rating: 0
Dr. Ronald C. Spencer

Joined: 16 Apr 08
Posts: 13
Credit: 699,460
RAC: 222
Message 42841 - Posted: 14 Oct 2010, 12:58:15 UTC - in response to Message 42837.  
Last modified: 14 Oct 2010, 13:00:03 UTC

I just suspended work for Milkyway@home and aborted my last work unit, which is v0.04. I think it best to use my computer cycles for my other astronomy research projects until such time as Travis can correct the problem. I will keep checking daily so that I can crunch again for this fine project.

Dr. Ronald C. Spencer
Emeritus Member: The American Astronomical Society
Member: The Division for Planetary Sciences of the AAS
Member: The American Association of Variable Star Observers
Member: The Astronomical Society of the Pacific
Member: The Planetary Society
ID: 42841 · Rating: 0
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 42865 - Posted: 15 Oct 2010, 11:17:51 UTC - in response to Message 42837.  

Hi Crunch3r

Looks like you've spotted the problem. Does anyone here know if the guys in the back office have figured it out yet? Am I being stupid, or why don't they just go back to the previous version of the software, since it worked?


Something else seems to be fishy.
Just for the fun of it, I tested with the last optimized CPU version:
- no progress bar
- still running for hours after time was counted down to zero
- missing output file
ID: 42865 · Rating: 0
Paul Forsdick

Joined: 19 Feb 09
Posts: 29
Credit: 5,452,255
RAC: 0
Message 42867 - Posted: 15 Oct 2010, 11:47:21 UTC

The solution I use is as follows.

I use a PC with only 2 CPUs and have the SSE3 running.

I notice that work units should show a percentage complete of 0.062 within about 2 minutes. If they do, they run OK for me, finish within about 4 to 6 hours, and I get credits.

If they still show 0.00% complete after a few minutes, I abort them and get another download to try, as these ones never seem to start at all.

It is not easy to spot those when you are asleep, so I make sure that about 3 or 4 of them have started running correctly, to prove they are all going to work during the night, and then set the project to no new tasks until I am back on the computer. The following morning, with 3 or 4 ready to submit, I allow new tasks again.
I hope this helps.
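
The check described in this post can be written down as a small rule. This is only a sketch of that manual routine, not anything the BOINC client provides: the `TaskSnapshot` type, the `should_abort` function, the 300-second grace period (standing in for "a few minutes") and the example task names are all made up for illustration, and the elapsed time and percentage would have to be read off the BOINC Manager by hand or by some other means.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical record of what the task list shows for one work unit."""
    name: str
    elapsed_seconds: float
    percent_done: float


def should_abort(task: TaskSnapshot, grace_seconds: float = 300.0) -> bool:
    """Return True when a task is past the grace period and still shows 0% done.

    The 300-second default is an assumption standing in for "a few minutes".
    """
    return task.elapsed_seconds >= grace_seconds and task.percent_done <= 0.0


# Made-up example readings, not real work units.
tasks = [
    TaskSnapshot("wu_a", elapsed_seconds=120.0, percent_done=0.062),  # progressing
    TaskSnapshot("wu_b", elapsed_seconds=600.0, percent_done=0.0),    # stalled
    TaskSnapshot("wu_c", elapsed_seconds=60.0, percent_done=0.0),     # too early to tell
]

stalled = [t.name for t in tasks if should_abort(t)]
print(stalled)  # ['wu_b']
```

A task that has not yet reached the grace period is left alone, which matches waiting a few minutes before deciding.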
ID: 42867 · Rating: 0
banditwolf

Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42868 - Posted: 15 Oct 2010, 12:07:46 UTC - in response to Message 42867.  

The solution I use is as follows.

I notice that work units should show a percentage complete of 0.062 within about 2 minutes. If they do, they run OK for me, finish within about 4 to 6 hours, and I get credits.

If they still show 0.00% complete after a few minutes, I abort them and get another download to try, as these ones never seem to start at all.


That is what I do.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 42868 · Rating: 0
sonthakit

Joined: 13 Jan 08
Posts: 2
Credit: 1,100,134
RAC: 0
Message 42884 - Posted: 16 Oct 2010, 3:16:06 UTC

Me too!

13/10/2553 20:14:37 Milkyway@home Starting de_12_3s_5_1369434_1286974525_0
13/10/2553 20:14:37 Milkyway@home Starting task de_12_3s_5_1369434_1286974525_0 using milkyway version 4
16/10/2553 9:14:00 Milkyway@home Computation for task de_12_3s_5_1369434_1286974525_0 finished
16/10/2553 9:14:00 Milkyway@home Output file de_12_3s_5_1369434_1286974525_0_0 for task de_12_3s_5_1369434_1286974525_0 absent

Task 218961821

Name de_12_3s_5_1369434_1286974525_0
Workunit 165611481
Created 13 Oct 2010 12:55:33 UTC
Sent 13 Oct 2010 13:03:15 UTC
Received 16 Oct 2010 2:15:06 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 220214
Report deadline 21 Oct 2010 13:03:15 UTC
Run time 218064.71875
CPU time 206519.2
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<background_integral> 0.00035138456830742249 </background_integral>
<stream_integrals> 1092.14776775121570000000 0.15442448218105154000 292.03734846070904000000 </stream_integrals>
<background_only_likelihood> -3.36708912498443880000 </background_only_likelihood>
<stream_only_likelihood> -4.43237294000859270000 -1.#IND0000000000000000 -5.11089042641821490000 </stream_only_likelihood>
<search_likelihood> -3.19042820882796500000 </search_likelihood>
09:13:58 (3252): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_12_3s_5_1369434_1286974525_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid
Claimed credit 541.049792898229
Granted credit 0
application version MilkyWay@Home v0.04
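
One detail in the stderr above: the second `<stream_only_likelihood>` value is `-1.#IND...`, which is how older Microsoft C runtimes print an indeterminate NaN. Whether that is related to the missing output file is not established in this thread. A rough sketch of how such values could be picked out of a pasted stderr block follows; the function names `parse_value` and `non_finite_fields` are invented here, and the sample text is copied from the lines above.

```python
import math
import re

# Matches <tag> body </tag> where the body contains no nested tags.
TAG_RE = re.compile(r"<(\w+)>\s*([^<]*?)\s*</\1>")


def parse_value(token: str) -> float:
    """Convert one printed number to a float.

    Tokens containing '#' are treated as MSVC-style special values:
    '#INF' becomes +/- infinity, anything else (e.g. '#IND', '#QNAN') NaN.
    """
    if "#" in token:
        if "#INF" in token:
            return -math.inf if token.startswith("-") else math.inf
        return math.nan
    return float(token)


def non_finite_fields(stderr_text: str) -> list:
    """Return (tag, index) pairs for every NaN or infinite value found."""
    bad = []
    for tag, body in TAG_RE.findall(stderr_text):
        for index, token in enumerate(body.split()):
            try:
                value = parse_value(token)
            except ValueError:
                break  # not a numeric field, e.g. <search_application>
            if not math.isfinite(value):
                bad.append((tag, index))
    return bad


sample = """
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<background_only_likelihood> -3.36708912498443880000 </background_only_likelihood>
<stream_only_likelihood> -4.43237294000859270000 -1.#IND0000000000000000 -5.11089042641821490000 </stream_only_likelihood>
<search_likelihood> -3.19042820882796500000 </search_likelihood>
"""

print(non_finite_fields(sample))  # [('stream_only_likelihood', 1)]
```

The index is zero-based, so `1` points at the second stream.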
ID: 42884 · Rating: 0
Ricky@SETI.USA

Joined: 4 Feb 08
Posts: 19
Credit: 179,971
RAC: 0
Message 42913 - Posted: 17 Oct 2010, 12:54:46 UTC

Don't get me wrong... I didn't mean to say I would NOT run MW anymore. I am just waiting for them to get the problem fixed, and then I will return (if I am not in the middle of a PG race!) :)
ID: 42913 · Rating: 0
John Black

Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42931 - Posted: 18 Oct 2010, 11:06:51 UTC

Check out NEWS "a fix for output file" by Travis at 03:00 this morning. This looks like the start of a fix. I am going to download and try the new WUs
ID: 42931 · Rating: 0
sonthakit

Joined: 13 Jan 08
Posts: 2
Credit: 1,100,134
RAC: 0
Message 42932 - Posted: 18 Oct 2010, 14:06:29 UTC

Why isn't any credit given for all the failed WUs?
ID: 42932 · Rating: 0
John Black

Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42935 - Posted: 18 Oct 2010, 14:48:33 UTC - in response to Message 42932.  

I do not think that the BOINC software is sophisticated enough to award credits for failed tasks even though the failure is not of our making. As it is the same for everybody it should not make any difference to any competition that you are involved with.
ID: 42935 · Rating: 0
John Black

Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42999 - Posted: 20 Oct 2010, 6:00:44 UTC
Last modified: 20 Oct 2010, 6:02:18 UTC

I think that I spoke too soon, as my first two WUs are running way over time. One has been running for 17 hours and shows a time to completion of 28 hours. This is way over the estimate of 18 hours, so I think that I have either got the last of the bad ones or something else is going wrong.

Is anybody else having this problem?

For full details, check out News > "a fix for the output file issue" by Travis, my post 42998.
ID: 42999 · Rating: 0
Gary Sell

Joined: 28 Sep 10
Posts: 3
Credit: 20,534,099
RAC: 0
Message 43022 - Posted: 20 Oct 2010, 18:55:58 UTC

my WUs are now processing to completion... woot!!
ID: 43022 · Rating: 0
Brian Priebe

Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 43023 - Posted: 20 Oct 2010, 19:45:22 UTC - in response to Message 42999.  

Is anybody else having this problem?

Yes. I have two 0.40 WUs claiming they will take 52 hours to complete. The initial estimate was 29 hours before they started running. Other WUs, though, are completing in 18-20 hours.
ID: 43023 · Rating: 0
John Black

Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 43030 - Posted: 20 Oct 2010, 22:45:50 UTC

I dumped these WUs after 60 hours without producing a result. I then reset the project and got WUs showing 4:31 to completion. The first of these has been running for 2:41 and now shows 7:55 to completion, and rising. This is very weird and I still don't understand it.

Quite a few people are reporting things being back to normal, but I don't see it that way yet.
ID: 43030 · Rating: 0
Michael

Joined: 27 Sep 10
Posts: 1
Credit: 98,790
RAC: 0
Message 43033 - Posted: 21 Oct 2010, 0:38:56 UTC - in response to Message 43030.  

I'm having the same problem. When a task starts it shows 4-5 hours, then the estimate starts to increase to 20 plus. I just aborted one that had run for 10 hours but now showed 20+ more to complete.
ID: 43033 · Rating: 0

©2024 Astroinformatics Group