Computation Errors
Author | Message |
---|---|
Joined: 16 Jun 10 Posts: 6 Credit: 7,402,186 RAC: 0 |
tired of computational errors ... aborting all work units. |
Joined: 18 Jul 09 Posts: 1 Credit: 144,463 RAC: 0 |
Yes, I agree. All of my mw@h 0.04 units are ending in a computation error, each one taking about 42 hours to run. Something has gone very wrong somewhere and needs to be corrected. This is one of mine:
Name de_14_3s_5_1295023_1286658690_2
Workunit 163563070
Created 9 Oct 2010 22:27:41 UTC
Sent 9 Oct 2010 22:35:01 UTC
Received 14 Oct 2010 3:25:26 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 90687
Report deadline 17 Oct 2010 22:35:01 UTC
Run time 144876.90625
CPU time 136111.3
stderr out
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
10:53:51: Checkpoint exists. Attempting to resume from it
10:53:51: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
13:25:02: Checkpoint exists. Attempting to resume from it
13:25:02: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
00:28:49: Checkpoint exists. Attempting to resume from it
00:28:49: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
16:56:33: Checkpoint exists. Attempting to resume from it
16:56:33: Successfully resumed checkpoint
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
21:01:23: Checkpoint exists. Attempting to resume from it
21:01:23: Successfully resumed checkpoint
<background_integral> 0.00076736511344802007 </background_integral>
<stream_integrals> 1910.83401000542110000000 1123.56796570525830000000 117.93797255286349000000 </stream_integrals>
<background_only_likelihood> -3.29763314280799860000 </background_only_likelihood>
<stream_only_likelihood> -11.15931313169090300000 -3.69598905249152660000 -4.08219071179394350000 </stream_only_likelihood>
<search_likelihood> -2.96516921866002560000 </search_likelihood>
02:34:36 (3652): called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_14_3s_5_1295023_1286658690_2_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
Validate state Invalid
Claimed credit 382.400815366014
Granted credit 0
application version MilkyWay@Home v0.04 |
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
Hi Crunch3r. Looks like you've spotted the problem. Does anyone here know if the guys in the back office have figured it out yet? Am I being stupid, or why don't they just go back to the previous version of the software, since it worked? OK, I have a WU that should have taken 16 hours to run; it has been running for 32 hours and has 18 hours left, so I guess it will be a computation error as well. I can't remember when I last got a WU to run successfully on MW@H. |
Joined: 16 Apr 08 Posts: 13 Credit: 718,465 RAC: 0 |
I just suspended work for Milkyway@home and aborted my last work unit, which is v.04. I think it best to use my computer cycles for my other astronomy research projects until such time as Travis can correct the problem. I will keep checking daily so that I can crunch again for this fine project. Dr. Ronald C. Spencer Emeritus Member: The American Astronomical Society Member: The Division for Planetary Sciences of the AAS Member: The American Association of Variable Star Observers Member: The Astronomical Society of the Pacific Member: The Planetary Society |
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
Hi Crunch3r,
Something else seems to be fishy. Just for the fun of it, I tested with the last optimized CPU version:
- no progress bar
- still running for hours after time was counted down to zero
- missing output file |
Joined: 19 Feb 09 Posts: 29 Credit: 5,452,691 RAC: 2 |
The solution I use is as follows. I use a PC with only 2 CPUs and have SSE3 running. I have noticed that a good task should show a percentage complete of 0.062 within about 2 minutes. If it does, it runs OK for me, finishes within about 4 to 6 hours, and I get credit. If a task still shows 0.00% complete after a few minutes, I abort it and get another download to try, as these ones never seem to start at all. It is not easy to spot those while you are asleep, so I make sure that 3 or 4 tasks have started running correctly, which shows they are going to work during the night, and then set the project to no new tasks until I am back at the computer. The following morning I have 3 or 4 results ready to submit, and then I allow new tasks again. I hope this helps. |
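The check described in the post above can be sketched roughly as follows. This is only an illustration, not anything the project or the BOINC client provides: `TaskSnapshot`, `looks_stalled`, `tasks_to_abort`, and the 180-second grace period are hypothetical names and values chosen here, and the elapsed time and fraction done are assumed to have been read off the BOINC Manager by hand or by some other means.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Hypothetical record of one running task (not a BOINC data structure)."""
    name: str
    elapsed_seconds: float
    fraction_done: float  # 0.0 means 0.00% complete, 1.0 means 100%


def looks_stalled(task, grace_seconds=180.0):
    """Return True when a task has run past the grace period with no progress.

    The 180 s default is an assumption standing in for "a few minutes".
    """
    return task.elapsed_seconds >= grace_seconds and task.fraction_done <= 0.0


def tasks_to_abort(tasks, grace_seconds=180.0):
    """Names of the tasks that the rule of thumb would abort."""
    return [t.name for t in tasks if looks_stalled(t, grace_seconds)]


if __name__ == "__main__":
    snapshots = [
        TaskSnapshot("task_a", 240.0, 0.0),      # no progress after 4 minutes
        TaskSnapshot("task_b", 240.0, 0.00062),  # already showing progress
        TaskSnapshot("task_c", 60.0, 0.0),       # too early to judge
    ]
    print(tasks_to_abort(snapshots))
```

A task that has not yet reached the grace period is left alone, which matches waiting a few minutes before deciding to abort.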
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"the solution I use is as follows" That is what I do. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Joined: 13 Jan 08 Posts: 2 Credit: 1,100,134 RAC: 0 |
Me too!
13/10/2553 20:14:37 Milkyway@home Starting de_12_3s_5_1369434_1286974525_0
13/10/2553 20:14:37 Milkyway@home Starting task de_12_3s_5_1369434_1286974525_0 using milkyway version 4
16/10/2553 9:14:00 Milkyway@home Computation for task de_12_3s_5_1369434_1286974525_0 finished
16/10/2553 9:14:00 Milkyway@home Output file de_12_3s_5_1369434_1286974525_0_0 for task de_12_3s_5_1369434_1286974525_0 absent
Task 218961821
Name de_12_3s_5_1369434_1286974525_0
Workunit 165611481
Created 13 Oct 2010 12:55:33 UTC
Sent 13 Oct 2010 13:03:15 UTC
Received 16 Oct 2010 2:15:06 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 0 (0x0)
Computer ID 220214
Report deadline 21 Oct 2010 13:03:15 UTC
Run time 218064.71875
CPU time 206519.2
stderr out
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<background_integral> 0.00035138456830742249 </background_integral>
<stream_integrals> 1092.14776775121570000000 0.15442448218105154000 292.03734846070904000000 </stream_integrals>
<background_only_likelihood> -3.36708912498443880000 </background_only_likelihood>
<stream_only_likelihood> -4.43237294000859270000 -1.#IND0000000000000000 -5.11089042641821490000 </stream_only_likelihood>
<search_likelihood> -3.19042820882796500000 </search_likelihood>
09:13:58 (3252): called boinc_finish
</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_12_3s_5_1369434_1286974525_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>
</message>
]]>
Validate state Invalid
Claimed credit 541.049792898229
Granted credit 0
application version MilkyWay@Home v0.04 |
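One detail in the stderr output above is the `-1.#IND...` entry in `<stream_only_likelihood>`, which is how older Microsoft C runtimes print an indeterminate NaN. Whether that value is related to the missing output file is not established anywhere in this thread. The sketch below only shows one way to scan a pasted stderr block for such non-finite values; the helper names `parse_value`, `extract_values`, and `has_non_finite` are made up for this example and are not part of BOINC or the MilkyWay@Home application.

```python
import math
import re


def parse_value(token):
    """Convert one printed number to a float.

    Tokens containing '#' are treated as Windows-style special values,
    e.g. '-1.#IND' or '1.#QNAN' (NaN) and '1.#INF' (infinity).
    """
    if "#" in token:
        if "#INF" in token:
            return -math.inf if token.startswith("-") else math.inf
        return math.nan
    return float(token)


def extract_values(stderr_text, tag):
    """Return the numbers found between <tag> and </tag>, or [] if absent."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, stderr_text, re.DOTALL)
    if match is None:
        return []
    return [parse_value(tok) for tok in match.group(1).split()]


def has_non_finite(stderr_text, tag="stream_only_likelihood"):
    """True if any value inside the given tag is NaN or infinite."""
    return any(not math.isfinite(v) for v in extract_values(stderr_text, tag))


if __name__ == "__main__":
    sample = (
        "<stream_only_likelihood> -4.43237294000859270000 "
        "-1.#IND0000000000000000 -5.11089042641821490000 "
        "</stream_only_likelihood>"
    )
    print(has_non_finite(sample))
```

The same function can be pointed at other tags, such as `stream_integrals`, by passing a different `tag` argument.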
Joined: 4 Feb 08 Posts: 19 Credit: 179,971 RAC: 0 |
|
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
Check out NEWS "a fix for output file" by Travis at 03:00 this morning. This looks like the start of a fix. I am going to download and try the new WUs |
Joined: 13 Jan 08 Posts: 2 Credit: 1,100,134 RAC: 0 |
Why don't they give any credit for all the failed WUs? |
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
I do not think that the BOINC software is sophisticated enough to award credits for failed tasks even though the failure is not of our making. As it is the same for everybody it should not make any difference to any competition that you are involved with. |
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
I think that I spoke too soon, as my first two WUs are running way over time. One has been running for 17 hours and shows a time to completion of 28 hours. This is way over the estimate of 18 hours, so I think that I have either got the last of the bad ones or something else is going wrong. Is anybody else having this problem? For full details, see my post 42998 under News > "a fix for the output file issue" by Travis. |
Joined: 28 Sep 10 Posts: 3 Credit: 20,534,099 RAC: 0 |
my WUs are now processing to completion... woot!! |
Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
"Is anybody else having this problem?" Yes. I have two 0.40 WUs claiming they will take 52 hours to complete. The initial estimate was 29 hours before they started running. Other WUs, though, are completing in 18-20 hours. |
Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
I dumped these WUs after 60 hours without producing a result. I then reset the project and got WUs showing 4:31 to completion. The first of these has been running for 2:41 and now shows 7:55 to completion, and rising. This is very weird and I still don't understand it. Quite a few people are reporting things being back to normal, but I don't see it that way yet. |
Joined: 27 Sep 10 Posts: 1 Credit: 98,790 RAC: 0 |
I'm having the same problem. When a task starts it shows 4-5 hours, then the estimate starts to increase to 20 plus. I just aborted one that had 10 hours on it but now showed 20+ more to complete. |