Welcome to MilkyWay@home

Computation Errors

Message boards : Number crunching : Computation Errors

Ricky@SETI.USA
Joined: 4 Feb 08
Posts: 19
Credit: 179,971
RAC: 0
Message 42763 - Posted: 11 Oct 2010, 18:47:09 UTC

I had 4 WUs error out after 55+ hours. Here is one:

http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=214612877

Any idea what caused this?

Gary Sell
Joined: 28 Sep 10
Posts: 3
Credit: 20,534,099
RAC: 0
Message 42774 - Posted: 12 Oct 2010, 3:23:33 UTC - in response to Message 42763.  

I have also had three separate packets error out with computational errors in the last few days. It is a little worrisome.

James Sotherden
Joined: 3 Jan 09
Posts: 139
Credit: 50,066,562
RAC: 0
Message 42781 - Posted: 12 Oct 2010, 12:25:54 UTC
Last modified: 12 Oct 2010, 12:26:18 UTC

I've had that same problem with my Mac and the i7. They crunch for 32 hours and are almost done, then get a comp error. Kind of ticks me off. So I have set both for NNT and will let them die.

prairie69
Joined: 2 Nov 08
Posts: 11
Credit: 169,476
RAC: 0
Message 42783 - Posted: 12 Oct 2010, 14:33:03 UTC - in response to Message 42781.  

I've had that same problem with my Mac and the i7. They crunch for 32 hours and are almost done, then get a comp error. Kind of ticks me off. So I have set both for NNT and will let them die.


Ditto. My last two units computed for over twenty hours each and then were called 'computational errors' and my computer got no credit for the time. Pisses me off, to be honest. I'm seriously thinking of abandoning Milky Way if this keeps up.

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42786 - Posted: 12 Oct 2010, 15:19:43 UTC

In the last few days I have had four computational errors with MW@H 0.04, with the following properties on the latest two.

Task       de_16_2s_5_265369_1286519163_1    de_12_2s_5_269934_1286519188_0
CPU time   24:35:08                          25:42:16
Elapsed    25:23:30                          27:26:07

From what I read here, this is not an isolated event. What is happening with MW@H?
I am also having issues with the SETI people and their server problems and this leads me to question the whole BOINC effort that I am making with my meagre resources. What is the point of me wasting power for something that leads to an error? If they have sent out a batch of WUs that cannot be calculated then they should tell us that that is what they are doing.

Does anybody know what is going on?

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42787 - Posted: 12 Oct 2010, 15:20:44 UTC
Last modified: 12 Oct 2010, 15:23:09 UTC

Ignore this.

Ricky@SETI.USA
Joined: 4 Feb 08
Posts: 19
Credit: 179,971
RAC: 0
Message 42795 - Posted: 12 Oct 2010, 21:11:36 UTC
Last modified: 12 Oct 2010, 21:18:38 UTC

My last 2 WUs also ended with a Computation Error. BOINC reports the output file absent.




stderr out <core_client_version>6.10.43</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<background_integral> 0.00021293239583262686 </background_integral>
<stream_integrals> 863.45934626397388000000 120.64898908315530000000 1520.57420027483390000000 </stream_integrals>
<background_only_likelihood> -3.28895426621550650000 </background_only_likelihood>
<stream_only_likelihood> -4.30389524353796380000 -4.63177136354859620000 -4.25227405661039270000 </stream_only_likelihood>
<search_likelihood> -3.08599847228174930000 </search_likelihood>
14:23:53 (3668): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_11_3s_5_97691_1286493870_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42796 - Posted: 12 Oct 2010, 21:36:05 UTC - in response to Message 42786.  


From what I read here, this is not an isolated event. What is happening with MW@H?
I am also having issues with the SETI people and their server problems and this leads me to question the whole BOINC effort that I am making with my meagre resources. What is the point of me wasting power for something that leads to an error? If they have sent out a batch of WUs that cannot be calculated then they should tell us that that is what they are doing.

Does anybody know what is going on?

I believe that SETI is another issue: a server went bad. At MW a portion of the latest WUs are erroring out. It is not all of them, but a good number. This last weekend I had 5 out of 18 that were bad. I guess it isn't considered bad enough to cancel and start a new batch.

As for BOINC, it is what you make of it. Some projects have done some good, others are just basically busy work.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42797 - Posted: 12 Oct 2010, 22:00:49 UTC - in response to Message 42796.  

Fair enough, Banditwolf. None of us do this for anything but the satisfaction of donating our spare computing time to a project. It's not that I, with a tiny E4700 Core Duo processor, am chasing WUs. It's just very annoying to see 100+ hours of effort go to waste. But it will be the same for everyone, and sometimes a null result can steer the investigators in another direction.

I am not complaining overmuch; it's just that I have had bad luck recently with both my BOINC projects. If it continues I may try some different projects which are not so error prone, as I wasted a lot of time chasing around to see whether the error was at my end.

Ricky@SETI.USA
Joined: 4 Feb 08
Posts: 19
Credit: 179,971
RAC: 0
Message 42799 - Posted: 12 Oct 2010, 23:36:58 UTC
Last modified: 12 Oct 2010, 23:39:21 UTC

3 more WU's with same error on another laptop running Windows 7.

From BOINC:
10/10/2010 10:00:40 PM Milkyway@home Starting de_15_2s_5_1354922_1286667234_0
10/10/2010 10:00:40 PM Milkyway@home Starting task de_15_2s_5_1354922_1286667234_0 using milkyway version 4
10/10/2010 11:00:42 PM Milkyway@home Restarting task de_11_2s_5_1353996_1286667228_0 using milkyway version 4
10/11/2010 5:07:42 AM Milkyway@home Computation for task de_11_2s_5_1353996_1286667228_0 finished
10/11/2010 5:07:42 AM Milkyway@home Output file de_11_2s_5_1353996_1286667228_0_0 for task de_11_2s_5_1353996_1286667228_0 absent
10/11/2010 5:07:42 AM Milkyway@home Restarting task de_15_2s_5_1354922_1286667234_0 using milkyway version 4
10/11/2010 5:18:46 AM Milkyway@home Computation for task de_11_2s_5_1353986_1286667228_0 finished
10/11/2010 5:18:46 AM Milkyway@home Output file de_11_2s_5_1353986_1286667228_0_0 for task de_11_2s_5_1353986_1286667228_0 absent
10/11/2010 5:18:46 AM Milkyway@home Starting de_12_2s_5_1355679_1286667238_0
10/11/2010 5:18:46 AM Milkyway@home Starting task de_12_2s_5_1355679_1286667238_0 using milkyway version 4
10/11/2010 6:18:46 AM Milkyway@home Restarting task de_11_2s_5_1353997_1286667228_0 using milkyway version 4
10/11/2010 1:18:53 PM Milkyway@home Starting de_11_2s_5_1353999_1286667228_0
10/11/2010 1:18:53 PM Milkyway@home Starting task de_11_2s_5_1353999_1286667228_0 using milkyway version 4
10/12/2010 3:25:04 PM Milkyway@home Computation for task de_11_2s_5_1353997_1286667228_0 finished
10/12/2010 3:25:04 PM Milkyway@home Output file de_11_2s_5_1353997_1286667228_0_0 for task de_11_2s_5_1353997_1286667228_0 absent
10/12/2010 3:25:04 PM Milkyway@home Restarting task de_15_2s_5_1354922_1286667234_0 using milkyway version 4
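
For anyone who wants to check their own event log for the same pattern, here is a minimal Python sketch. It assumes every log line has the layout shown in the excerpt above (date, time, AM/PM, project name, message). The function name `absent_outputs` and the `sample` string are made up for illustration and are not part of BOINC.

```python
import re

# Assumed line layout, copied from the excerpt above:
#   <date> <time> <AM|PM> <project> <message>
LINE = re.compile(r"^(\S+ \S+ [AP]M) (\S+) (.*)$")
# Message text seen in the excerpt when a finished task has no output file.
ABSENT = re.compile(r"^Output file (\S+) for task (\S+) absent$")


def absent_outputs(log_text):
    """Return (timestamp, task_name) pairs for every 'output file absent' line."""
    hits = []
    for raw in log_text.splitlines():
        line = LINE.match(raw.strip())
        if line is None:
            continue  # not in the assumed layout, so skip it
        absent = ABSENT.match(line.group(3))
        if absent is not None:
            hits.append((line.group(1), absent.group(2)))
    return hits


# Three lines taken from the log above, used as sample input.
sample = """\
10/11/2010 5:07:42 AM Milkyway@home Computation for task de_11_2s_5_1353996_1286667228_0 finished
10/11/2010 5:07:42 AM Milkyway@home Output file de_11_2s_5_1353996_1286667228_0_0 for task de_11_2s_5_1353996_1286667228_0 absent
10/11/2010 5:18:46 AM Milkyway@home Starting de_12_2s_5_1355679_1286667238_0
"""

for when, task in absent_outputs(sample):
    print(when, task)
```

If the list comes back non-empty, those are the tasks that finished computing but had no output file for the client to upload.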

One WU is showing 2 hours from completion.... I'm willing to bet it errors out too!!!


stderr out <core_client_version>6.10.56</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
<search_application> milkywayathome separation 0.4 Windows x86 double </search_application>
23:00:42: Checkpoint exists. Attempting to resume from it
23:00:42: Successfully resumed checkpoint
<background_integral> 0.00017088569561192552 </background_integral>
<stream_integrals> 1655.37187629284520000000 34.80227440087782000000 </stream_integrals>
<background_only_likelihood> -8.31211169977413890000 </background_only_likelihood>
<stream_only_likelihood> -3.35327952852759830000 -20.76988236791163100000 </stream_only_likelihood>
<search_likelihood> -3.35320004403327280000 </search_likelihood>
05:07:39 (8176): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_11_2s_5_1353996_1286667228_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>

Validate state Invalid
Claimed credit 328.639601708414
Granted credit 0
application version MilkyWay@Home v0.04

Ricky@SETI.USA
Joined: 4 Feb 08
Posts: 19
Credit: 179,971
RAC: 0
Message 42802 - Posted: 13 Oct 2010, 2:44:58 UTC

Yes, the one that had 2 hours to go also ended up with a comp error. So I aborted the remaining 4 WUs... no point wasting any more time. I switched back to WCG. Maybe someone can find out what the problem is or was!

Dr. Ronald C. Spencer
Joined: 16 Apr 08
Posts: 13
Credit: 699,460
RAC: 222
Message 42803 - Posted: 13 Oct 2010, 2:48:47 UTC - in response to Message 42783.  

Errors will occur at times, but your contribution to this research is important. Ignore any negative comments about the fact that you mentioned you didn't get credit. It happens to all of us. The credits give us an idea of our contribution to the research. Those who couldn't care less about credit should not insult those who like to see it. I know I would have been concerned if, after 14 years or more of studying, I didn't receive my degrees in Astronomy and Education and a professor said, "Hey, you're not here for the credit, but thanks for all those hard years of work for nothing." That would have made no sense.
I'm sure Travis will work the bugs out so keep crunching and don't get discouraged because your time is appreciated I'm sure. :)

Dr. Ronald C. Spencer

Emeritus Member: The American Astronomical Society
Member: The Division for Planetary Sciences of the AAS
Member: The American Association of Variable Star Observers
Member: The Astronomical Society of the Pacific
Member: The Planetary Society

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42806 - Posted: 13 Oct 2010, 7:14:42 UTC - in response to Message 42803.  

I'm sure Travis will work the bugs out so keep crunching and don't get discouraged because your time is appreciated I'm sure.

I'm sure he will. But until he does, MilkyWay@Home is no longer getting my CPU time. (GPU work units still work and these are still enabled.)

This is not merely a minor annoyance that aborts a small percentage of the CPU-only work units. It's all of them (for me) and it wastes CPU weeks that could be productively used for other research projects. To add insult to injury, the new work units (for reasons unknown) typically run twice as long on Windows as the old version 0.19 did before they now blow up trying to return results to the server.

Gary Sell
Joined: 28 Sep 10
Posts: 3
Credit: 20,534,099
RAC: 0
Message 42808 - Posted: 13 Oct 2010, 12:22:37 UTC - in response to Message 42806.  

At this point all my WUs are getting computational errors. Boy, I hope this is fixed soon.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42811 - Posted: 13 Oct 2010, 12:47:38 UTC - in response to Message 42797.  

Fair enough, Banditwolf. None of us do this for anything but the satisfaction of donating our spare computing time to a project. It's not that I, with a tiny E4700 Core Duo processor, am chasing WUs. It's just very annoying to see 100+ hours of effort go to waste. But it will be the same for everyone, and sometimes a null result can steer the investigators in another direction.

I am not complaining overmuch; it's just that I have had bad luck recently with both my BOINC projects. If it continues I may try some different projects which are not so error prone, as I wasted a lot of time chasing around to see whether the error was at my end.


I understand. I get annoyed when I lose a few hours or a whole day when a WU goes bad. I don't have much processing power either, with a Pentium 4. I stick with MW and Rosetta, and both have had some problems lately. Last weekend both ran out of work. In the last month it seems that if one has problems the other does too. I'm sure you'll have better luck in the future.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

prairie69
Joined: 2 Nov 08
Posts: 11
Credit: 169,476
RAC: 0
Message 42814 - Posted: 13 Oct 2010, 13:27:27 UTC - in response to Message 42811.  

Maybe I missed it, but I don't see anything from the folks who run this project explaining what's going on. It's beginning to remind me of how the airlines don't give out information even after your flight has been delayed for hours or canceled altogether. I'm taking another flight (Einstein, Spinhenge, etc.), so to speak, but checking in here from time to time to see what's happening.

How about some information? And in language that us non-computer experts can understand?

Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 42815 - Posted: 13 Oct 2010, 13:35:54 UTC - in response to Message 42814.  

Maybe I missed it, but I don't see anything from the folks who run this project explaining what's going on.


That's because the folks don't have a clue either, which is one major part of the issue itself... None of them is really running/testing any of their own apps.

As for the latest screw-up... If only one of those developers had taken a look at one of those error messages from the latest incarnation of the MW app... it would have been interesting to know why a CPU app suddenly identifies itself as a "stock_win32_gpu"...



Join Support science! Joinc Team BOINC United now!

James Sotherden
Joined: 3 Jan 09
Posts: 139
Credit: 50,066,562
RAC: 0
Message 42816 - Posted: 13 Oct 2010, 13:37:34 UTC

Just had another comp error on the Mac. Big bummer. I have 8 on the i7 but SETI won't let them run. I can only hope they don't error out.

Clive
Joined: 14 Mar 10
Posts: 4
Credit: 360,186
RAC: 0
Message 42822 - Posted: 13 Oct 2010, 17:35:20 UTC

Everything was fine with Milkyway until MW@H 0.04 was rolled out. Since then all of my WUs have been ending up with computational errors, with no credits awarded for the CPU time invested.

I am fed up with this situation, so I have aborted all of the WUs in my queue. If I am not going to get any credits for the time invested, then I might as well donate my CPU time to a BOINC project where I will be rewarded for the CPU time that I have donated.

Clive Hunt
Edmonton, Alberta
Canada

Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 42823 - Posted: 13 Oct 2010, 17:56:37 UTC - in response to Message 42816.  

Just had another comp error on the mac. Big bummer.


That's quite interesting. If we take a look at one of those results :

<message>
<file_xfer_error>
<file_name>de_14_2s_5_1054160_1286929219_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

So that means that the app itself expected to have a result file <file_name>de_14_2s_5_1054160_1286929219_0_0</file_name>...

Now considering that the new code logs its results directly into "stderr" and doesn't create the usual result file associated with it, the BOINC client now "thinks" that the app didn't finish properly (missing result file) and reports the WU back as a transfer error, although everything finished just fine.
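
To make that concrete, here is a rough Python sketch that reads a stderr dump like the ones posted in this thread and flags the case described: the app called boinc_finish and printed a search likelihood, yet the result carries a file transfer error -161. This is only an illustration under those assumptions, not the project's validator. The names `summarize` and `computed_but_rejected` are hypothetical, and reading -161 as "output file missing" follows the explanation in this post.

```python
import re

# Tag layout assumed from the stderr dumps posted in this thread.
LIKELIHOOD = re.compile(
    r"<search_likelihood>\s*(-?\d+(?:\.\d+)?)\s*</search_likelihood>"
)
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*<file_name>(.*?)</file_name>\s*"
    r"<error_code>(-?\d+)</error_code>",
    re.S,
)


def summarize(stderr_dump):
    """Pull the final likelihood and any file transfer error out of a dump."""
    likelihood = LIKELIHOOD.search(stderr_dump)
    xfer = XFER_ERROR.search(stderr_dump)
    return {
        "finished": "called boinc_finish" in stderr_dump,
        "search_likelihood": float(likelihood.group(1)) if likelihood else None,
        "missing_file": xfer.group(1).strip() if xfer else None,
        "error_code": int(xfer.group(2)) if xfer else None,
    }


def computed_but_rejected(summary):
    """True when the science result is in stderr but the output file was not found."""
    return (
        summary["finished"]
        and summary["search_likelihood"] is not None
        and summary["error_code"] == -161
    )


# Trimmed copy of a dump posted earlier in this thread.
sample = """\
<stderr_txt>
<search_likelihood> -3.35320004403327280000 </search_likelihood>
05:07:39 (8176): called boinc_finish

</stderr_txt>
<message>
<file_xfer_error>
<file_name>de_11_2s_5_1353996_1286667228_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
"""

summary = summarize(sample)
print(summary)
print("computed but rejected:", computed_but_rejected(summary))
```

A dump that passes this check matches the situation described above: the number the project wants is sitting in stderr, and only the expected output file is missing.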









Join Support science! Joinc Team BOINC United now!

©2024 Astroinformatics Group