Welcome to MilkyWay@home

A 'crashed' that actually completed

Message boards : Number crunching : A 'crashed' that actually completed

JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1033 - Posted: 13 Dec 2007, 4:54:16 UTC
Last modified: 13 Dec 2007, 5:05:03 UTC

For some unknown reason, BOINC appeared to freeze while I was reattaching to another project (SAH-Beta); this, of course, caused the current MAH (MilkyWay@home) task to crash with the Windows error pop-up. However, after using Task Manager to force-close BOINC and then rebooting, I got a nice surprise:

12/12/2007 10:36:05 PM|Milkyway@home|Restarting task ps_32_1197440276_5292_0 using astronomy version 112

This was for Result 1064638, and a looooong Stderr file:

Sent 12 Dec 2007 5:24:53 UTC
Received 13 Dec 2007 4:42:41 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 2100
Report deadline 17 Dec 2007 5:24:53 UTC
CPU time 879.625
stderr out

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
aZ9 C > 8B 5A 46 5C D3 1A 4C 40 17 61 5A 39 D3 96 43 C0
c:\research\boinc_samples\astronomy\star_points.c(68) : {75534} normal block at 0x011D5240, 24 bytes long.
Data: </ 4 g@ H `C > 2F CE 82 CA 34 8A 67 40 B8 48 B1 0E 0D 60 43 C0
c:\research\boinc_samples\astronomy\star_points.c(68) : {75533} normal block at 0x011D51F8, 24 bytes long.
Data: < #J@ og ? > 9F F6 84 BA 82 23 4A 40 03 6F 67 B0 C0 C0 3F C0

... About 6 out of 7 pages for the memory dump according to Print Preview ...

</stderr_txt>
]]>

Validate state Initial
Claimed credit 1.98410154736025
Granted credit 2

application version 1.12



[Edit]Shortened cut-&-paste[/Edit]

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1049 - Posted: 14 Dec 2007, 1:16:20 UTC - in response to Message 1033.  

Now that's an interesting one :) Luckily it didn't break anything on our end.

JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1054 - Posted: 14 Dec 2007, 4:59:26 UTC - in response to Message 1049.  

And today I accidentally paused one (I forget which one, or I'd link it) by pausing/snoozing BOINC, and it completed after unpausing. So that breaks my personal streak of breaking WUs by pausing.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1069 - Posted: 14 Dec 2007, 18:49:40 UTC - in response to Message 1054.  

That's good news :) I wonder if people are still having this problem with the new binaries.

Jord
Joined: 30 Aug 07
Posts: 125
Credit: 207,206
RAC: 0
Message 1075 - Posted: 14 Dec 2007, 23:18:38 UTC - in response to Message 1069.  

Yup, still got problems.

Just stopped the BOINC service while 1.13 was running.
Program Error. astronomy_1.13_.exe has generated errors and will be closed by Windows. You will need to restart the program. An error log is being created.

Let me see if I can find it then. ;-)
stderr.txt says:

Data: <[ S ,J@fASSERT: output.c(1688) : Assertion failed: ("'n' format specifier disabled", 0)

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x0047E246

Engaging BOINC Windows Runtime Debugger...

And that's all.
Jord.

The BOINC FAQ Service.
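
The assertion text in the stderr.txt excerpt above matches the Microsoft C runtime's check that rejects the `%n` printf conversion, which is disabled by default there. The thread does not say which call triggered it; one possibility (an assumption, not confirmed anywhere in this thread) is that dump data containing `%n` reached a printf-style function as the format string. A minimal C sketch of the usual defence follows. `log_text` is a hypothetical helper and not part of the astronomy application: it passes the text as an argument to a fixed `"%s"` format and takes the character count from the return value instead of from `%n`.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the MilkyWay@home source): copy untrusted
 * text into a buffer without ever treating it as a format string.
 * Returns the number of characters the full text needs, as snprintf does.
 */
static int log_text(char *out, size_t out_size, const char *text)
{
    /* "%s" is the only format string; any '%' inside text stays literal. */
    return snprintf(out, out_size, "%s", text);
}
```

Calling `log_text(buf, sizeof buf, line)` on a line that happens to contain `%n` copies it through unchanged, and the return value gives the length that `%n` would otherwise have been used to obtain.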

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1077 - Posted: 14 Dec 2007, 23:33:50 UTC - in response to Message 1075.  

Well, that's not a very helpful error message. I'll try to track that down.

Jord
Joined: 30 Aug 07
Posts: 125
Credit: 207,206
RAC: 0
Message 1080 - Posted: 15 Dec 2007, 0:27:29 UTC - in response to Message 1077.  

There were the usual Memory Leak messages before it. I just thought the full 1,049 KB was a bit too big to post here, so I have zipped it and emailed it to you. :-)
Jord.

The BOINC FAQ Service.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1097 - Posted: 15 Dec 2007, 20:04:24 UTC - in response to Message 1080.  

thanks :)

JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1108 - Posted: 17 Dec 2007, 2:32:48 UTC

I had another case of having to force-close BOINC (version 5.10.30) in the middle of a MAH WU. The good news: BOINC restarted the task, there was no error pop-up for MAH (but there was one for BOINC; figures ;-p), and since I was using the <checkpoint_debug> tag, I can rest easier KNOWING that MAH checkpoints. (PrimeGrid doesn't, at least on the tasks I run.)
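
For readers wondering what "checkpoints" means in practice, here is a minimal C sketch of the general idea: save progress to a file every so often, and on start-up resume from the saved state if one exists. It is an illustration only, not the MilkyWay@home code; the file name, the work loop, and the function names are all hypothetical. A real BOINC application would normally coordinate with the client through the BOINC API (for example `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`), which this sketch leaves out.

```c
#include <stdio.h>

/* Hypothetical checkpoint file name; not taken from the MilkyWay@home source. */
#define CHECKPOINT_FILE "checkpoint_sketch.txt"

/* Load saved progress. Returns 1 on success, 0 if no usable checkpoint exists. */
static int load_checkpoint(long *next_i, double *sum)
{
    FILE *f = fopen(CHECKPOINT_FILE, "r");
    long i;
    double s;
    int ok;

    if (!f)
        return 0;
    ok = (fscanf(f, "%ld %lf", &i, &s) == 2);
    fclose(f);
    if (ok) {
        *next_i = i;
        *sum = s;
    }
    return ok;
}

/* Write the state to a temporary file first, then swap it into place, so a
 * kill in the middle of the write never leaves a half-written checkpoint. */
static int save_checkpoint(long next_i, double sum)
{
    FILE *f = fopen(CHECKPOINT_FILE ".tmp", "w");

    if (!f)
        return -1;
    fprintf(f, "%ld %.17g\n", next_i, sum);
    if (fclose(f) != 0)
        return -1;
    /* rename() does not replace an existing file on Windows, so remove the
     * old one first; this leaves a brief window with no checkpoint at all. */
    remove(CHECKPOINT_FILE);
    return rename(CHECKPOINT_FILE ".tmp", CHECKPOINT_FILE);
}

/* Toy work unit: sum the integers 0..total-1, checkpointing every 100 steps.
 * stop_at simulates a forced close at that iteration (use -1 for "never"). */
static double run_work(long total, long stop_at, int *finished)
{
    long i = 0;
    double sum = 0.0;

    load_checkpoint(&i, &sum);              /* resume if a checkpoint exists */
    for (; i < total; i++) {
        if (i == stop_at) {                 /* simulated crash or force-close */
            *finished = 0;
            return sum;
        }
        sum += (double)i;
        if ((i + 1) % 100 == 0)
            save_checkpoint(i + 1, sum);
    }
    *finished = 1;
    remove(CHECKPOINT_FILE);                /* work unit done, nothing to resume */
    return sum;
}
```

Calling `run_work` once with a `stop_at` value and then again with `-1` mimics the force-close and restart described in this post: the second call picks up from the last saved multiple of 100 rather than from zero.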

popandbob
Joined: 13 Nov 07
Posts: 1
Credit: 4,176,140
RAC: 29
Message 1147 - Posted: 21 Dec 2007, 6:23:44 UTC - in response to Message 1108.  

All PrimeGrid apps do checkpoint. I believe most don't report it to BOINC, though, as they run in a wrapper app.

~BoB

(sorry for off topic)

JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1148 - Posted: 21 Dec 2007, 6:31:07 UTC - in response to Message 1147.  

Considering I'm the thread author... I'll forgive you. ;-)
"Wrapper?"

Jord
Joined: 30 Aug 07
Posts: 125
Credit: 207,206
RAC: 0
Message 1150 - Posted: 21 Dec 2007, 10:53:52 UTC - in response to Message 1148.

[trac]wiki:WrapperApp[/trac]

It's a program that handles all communications between the application and BOINC, when the application cannot do this itself by default as it is not specifically written to run under BOINC.
Jord.

The BOINC FAQ Service.
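
To make the wrapper idea a little more concrete, here is a toy C sketch of the pattern: a small program launches an unmodified application and does the reporting on its behalf. This is not the actual BOINC wrapper; `report_to_client` and `run_wrapped` are hypothetical names, and the "reporting" is just a line written to stderr standing in for whatever a real wrapper would tell the client.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for telling the BOINC client what is happening. */
static void report_to_client(const char *what, const char *command)
{
    fprintf(stderr, "[wrapper] %s: %s\n", what, command);
}

/*
 * Run an unmodified program and do the reporting on its behalf.
 * Returns 0 if the command ran and exited with status 0, 1 otherwise.
 */
static int run_wrapped(const char *command)
{
    int status;

    report_to_client("starting", command);
    status = system(command);   /* launches the command via the system shell */
    report_to_client(status == 0 ? "finished" : "failed", command);
    return status == 0 ? 0 : 1;
}
```

The wrapped program needs no knowledge of BOINC at all, which would also be consistent with its checkpoints going unreported: the client only sees what the wrapper chooses to pass along.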

JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1167 - Posted: 23 Dec 2007, 5:11:27 UTC - in response to Message 1033.  
Last modified: 23 Dec 2007, 5:12:56 UTC

A non-crasher, but... another long stderr file (for Result 1454872).

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>
ch\boinc_samples\astronomy\star_points.c(68) : {214334} normal block at 0x01B6CE50, 24 bytes long.
Data: < g e@ ZcH > 8C F0 9B 67 D1 F9 65 40 13 9F B7 88 5A 63 48 C0
c:\research\boinc_samples\astronomy\star_points.c(68) : {214333} normal block at 0x01B6CE08, 24 bytes long.
Data: < q L'f@ J G > CD E6 71 18 4C 27 66 40 8D 97 F2 4A A7 C5 47 C0
About 9 pages of memory dump before...

c:\research\boinc_samples\astronomy\star_points.c(68) : {213990} normal block at 0x01B66D88, 24 bytes long.
Data: < #7|tb@ "jN > E3 01 23 37 7C 74 62 40 89 BF FD EB 22 6

**********
**********

Memory Leaks Detected!!!
Memory Statistics:
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 6054569 bytes.
Total allocations: -1679216043 bytes.

Dumping objects ->
c:\research\boinc\api\boinc_api.c(155) : {55} normal block at 0x009C2A58, 4 bytes long.
Data: < > 00 00 AD 00
c:\research\boinc\lib\parse.c(142) : {54} normal block at 0x009C29D0, 86 bytes long.
Data: < <color_scheme>T> 0A 3C 63 6F 6C 6F 72 5F 73 63 68 65 6D 65 3E 54
{47} normal block at 0x009C2958, 4 bytes long.
Data: <P@ > 50 40 9C 00
Object dump complete.


</stderr_txt>
]]>

Validate state Initial
Claimed credit 4.37430041789452
Granted credit 6.5
application version 1.13
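
A note on reading those "Data:" lines: each leaked block from star_points.c is 24 bytes long, which would fit three 8-byte doubles, and the dump appears to show only the first 16 bytes of each block. Assuming (this is a guess, not something the thread confirms) that the bytes are little-endian IEEE-754 doubles, a small C sketch like the one below turns the hex back into numbers. `le_bytes_to_double` is a hypothetical helper, and the byte array is copied from the first "Data:" line in this post.

```c
#include <stdint.h>
#include <string.h>

/* The 16 bytes shown on the first "Data:" line of the dump in this post. */
static const unsigned char dump_bytes[16] = {
    0x8C, 0xF0, 0x9B, 0x67, 0xD1, 0xF9, 0x65, 0x40,
    0x13, 0x9F, 0xB7, 0x88, 0x5A, 0x63, 0x48, 0xC0
};

/*
 * Interpret 8 bytes, stored least significant byte first, as a double.
 * Assumes the host uses 64-bit IEEE-754 doubles with the same byte order
 * as its 64-bit integers, which holds on mainstream platforms.
 */
static double le_bytes_to_double(const unsigned char *b)
{
    uint64_t bits = 0;
    double value;
    int i;

    for (i = 7; i >= 0; i--)
        bits = (bits << 8) | b[i];      /* b[7] ends up as the top byte */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

`le_bytes_to_double(dump_bytes)` and `le_bytes_to_double(dump_bytes + 8)` give the first two values stored in that block; the third is not recoverable from the dump because its bytes are not shown.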


JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1747 - Posted: 22 Feb 2008, 4:23:00 UTC

Result 4046990.
Closed BOINC in preparation for a restart... M@H crashed, of course. Now it's working again.

©2024 Astroinformatics Group