Torrents of Invalid WU's
Author | Message |
---|---|
Send message Joined: 3 May 10 Posts: 74 Credit: 1,532,760 RAC: 0 |
Check out Crunch3r's post 42823 at Number Crunching>Computation errors thread. Unless someone comes up with a better answer then I am convinced that this is the root of the problem. All we need is someone to write the corrective code. |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Yes it is likely the problem. For me all 2s are bad and I was trying to add a little more info if it would help. It may depend on the system or batch they came from then. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 22 Apr 09 Posts: 38 Credit: 27,377,932 RAC: 0 |
|
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Had a batch of 6 that were bad. All 2s again. de_separation_82_2s_30_1_1858744_1287955760 de_separation_82_2s_30_1_1858743_1287955760 de_separation_82_2s_30_1_1858742_1287955760 de_separation_82_2s_30_1_1858737_1287955760 de_separation_82_2s_30_1_1858736_1287955760 de_separation_82_2s_30_1_1858722_1287955760 Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
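The pattern reported here (every failing work unit carries the `2s` tag) can be checked mechanically. The following is only a rough sketch: it assumes the tag is always the fourth underscore-separated field, as it is in the six names listed in the post, and the helper names `stream_tag` and `count_by_tag` are made up for illustration.

```python
from collections import Counter

# Work unit names copied from the post above.
BAD_WORK_UNITS = [
    "de_separation_82_2s_30_1_1858744_1287955760",
    "de_separation_82_2s_30_1_1858743_1287955760",
    "de_separation_82_2s_30_1_1858742_1287955760",
    "de_separation_82_2s_30_1_1858737_1287955760",
    "de_separation_82_2s_30_1_1858736_1287955760",
    "de_separation_82_2s_30_1_1858722_1287955760",
]


def stream_tag(name: str) -> str:
    """Return the fourth underscore-separated field, e.g. '2s' or '3s'.

    Assumption: the tag always sits at index 3, as in the names above.
    """
    return name.split("_")[3]


def count_by_tag(names):
    """Count how many work unit names carry each tag."""
    return Counter(stream_tag(name) for name in names)


if __name__ == "__main__":
    print(count_by_tag(BAD_WORK_UNITS))
```

Running the same count over the names of tasks that validated would show whether the `2s` tag really separates the bad batch from the good one.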
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Lost another hour due to no progress. 7 more bad wus. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
I looked at your most recently finished task and found: "Running Milkyway@home version 0.20 (Win32, SSE2) by Gipsel". The latest CPU version from Gipsel is v0.21, from the end of March, in which he fixed a progress bar issue that cropped up around that time. It should be worth a try to see whether it fixes at least part of your problem. |
Send message Joined: 21 Jun 09 Posts: 1 Credit: 7,224 RAC: 0 |
I keep having many, many errors - seems like every work unit. So I have to give up on Milkyway. FWIW I am having 100% errors also on ABC@home, but other projects work fine. My GPU isn't good enough so these are CPU tasks. I might not have given up if there were at least an acknowledgement from the people running this project that there are problems but there seems to be no peep, and bad work units keep getting sent. Alan Taylor. |
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
I keep having many, many errors - seems like every work unit. So I have to give up on Milkyway. FWIW I am having 100% errors also on ABC@home, but other projects work fine. My GPU isn't good enough so these are CPU tasks. I might not have given up if there were at least an acknowledgement from the people running this project that there are problems but there seems to be no peep, and bad work units keep getting sent. Judging from the errors in your work units, these are out-of-resource errors. Your settings might be too strict. Check your BOINC settings for the maximum allowed disk space and memory. |
Send message Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
Probably disc space - extract from stderr_txt file (my bolding): 18:22:11: Checkpoint exists. Attempting to resume from it 18:22:11: Successfully resumed checkpoint Opening checkpoint temp: Not enough space Write checkpoint failed 18:27:23 (3672): called boinc_finish The other one in the task list shows something similar - it can't write the checkpoint file. Regards Zy |
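For anyone wanting to check their own tasks for the same symptom, here is a small sketch that scans saved stderr text for the two failure messages quoted in the extract. The line breaks in `SAMPLE_STDERR` are a reconstruction, and the pattern list and the name `find_checkpoint_failures` are assumptions for illustration, not part of BOINC or the Milkyway@home application.

```python
import re

# The extract quoted above, with line breaks reconstructed (an assumption).
SAMPLE_STDERR = """\
18:22:11: Checkpoint exists. Attempting to resume from it
18:22:11: Successfully resumed checkpoint
Opening checkpoint temp: Not enough space
Write checkpoint failed
18:27:23 (3672): called boinc_finish
"""

# Hypothetical patterns, taken only from the messages in the extract.
FAILURE_PATTERNS = [
    re.compile(r"Not enough space"),
    re.compile(r"Write checkpoint failed"),
]


def find_checkpoint_failures(stderr_text: str) -> list:
    """Return the lines that match any of the known failure messages."""
    hits = []
    for line in stderr_text.splitlines():
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    for hit in find_checkpoint_failures(SAMPLE_STDERR):
        print(hit)
```

A task whose output produces no hits is probably failing for some other reason than the checkpoint write.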
Send message Joined: 16 Dec 07 Posts: 37 Credit: 26,025,089 RAC: 6,495 |
I think I've been handed a batch of invalids: 'Completed, marked as invalid' CPU-only MW 0.45 SSE2 tasks. I can't list them, as they've already been refreshed out of reach, but I've returned 12-16 and probably only had 3 or 4 awarded full marks over the last two days. It's kind of annoying to lose full work units. The thing is, they ran for the 6-7 hours that was the average under 0.19 and looked error free when I sent them. I will take careful note of the 15 I've got on the client (*cross fingers* I won't have to announce them as 'Marked as Invalid'). BTW, any idea how long an n-body is supposed to run? I've had 4 run 30-60 minutes and then one run for 20 hours. |
Send message Joined: 24 Jul 09 Posts: 32 Credit: 18,139,650 RAC: 10 |
BTW any Idea how long an n-body is supposed to run I've had 4 run 30-60 minutes and then one run for 20 hours?? I've seen N-body WUs run anywhere from 20 minutes to 40 hours (on CPU). There seems to be no rhyme or reason to how long they will run. It is really throwing the time estimates in my BOINC manager for a loop. It is forced to assume that each of them is going to take forever and a day to complete, so it is constantly shifting into High Priority mode to get them done in time, when in reality it has plenty of time. |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
The 2s WUs still don't like my computer. I have let them run past the time it takes for 3s and they don't finish or show a %. It seems some error still exists. All 3s WUs run just fine when I get them. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
I think I've been handed a batch of Invalids 'Completed Marked as Invalid'...CPU Onlys MW 0.45 sse2's. I'm getting the same thing after some have run for 10+ hours. It is a puzzlement... |
Send message Joined: 28 Mar 09 Posts: 68 Credit: 1,003,982,681 RAC: 0 |
I think I've been handed a batch of Invalids 'Completed Marked as Invalid'...CPU Onlys MW 0.45 sse2's. Same here. Three in succession now (SSE2). |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Are you guys using optimized applications? They probably won't work with most of the new WUs we're sending out. If not, there's a good chance it was the corrupted disk (which is now fixed). |
Send message Joined: 31 Oct 10 Posts: 137 Credit: 3,755,067 RAC: 0 |
I checked, and my wingmen have lots of them as well. |
Send message Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
Are you guys using optimized applications? These are 0.45 SSE2 plain vanilla applications failing here. |
Send message Joined: 21 Mar 09 Posts: 25 Credit: 11,410,869 RAC: 0 |
Hello all. It appears that there are some problems with some MW@H tasks: 267728727, 264256459, 263423802. Those tasks correspond to my system crash, so I downloaded the MS debugger and came up with this report. History: for the past two weeks my system would flash a BSOD too quickly to read and just shut down. I took out RAM and the GFX card and reconfigured hardware and BIOS (all this before I used the MS debugger).
Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\WINDOWS\Minidump\Mini120910-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
WARNING: Whitespace at end of path element
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (4 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 2600.xpsp.080413-2111
Machine Name:
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055d720
Debug session time: Thu Dec 9 05:20:22.796 2010 (UTC - 5:00)
System Uptime: 0 days 0:02:35.393
Loading Kernel Symbols
Loading User Symbols
Loading unloaded module list
Bugcheck Analysis
Use !analyze -v to get detailed debugging information.
BugCheck 9C, {0, b833c050, b66b4000, 2c000135}
Unable to load image RtkHDAud.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for RtkHDAud.sys
*** ERROR: Module load completed but symbols could not be loaded for RtkHDAud.sys
Probably caused by : RtkHDAud.sys ( RtkHDAud+49f068 )
Followup: MachineOwner ---------
1: kd> !analyze -v
Bugcheck Analysis
MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters; x86 Processors If the processor has ONLY MCE feature available (For example Intel Pentium), the parameters are: 1 - Low 32 bits of P5_MC_TYPE MSR 2 - Address of MCA_EXCEPTION structure 3 - High 32 bits of P5_MC_ADDR MSR 4 - Low 32 bits of P5_MC_ADDR MSR If the processor also has MCA feature available (For example Intel Pentium Pro), the parameters are: 1 - Bank number 2 - Address of MCA_EXCEPTION structure 3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error 4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error IA64 Processors 1 - Bugcheck Type 1 - MCA_ASSERT 2 - MCA_GET_STATEINFO SAL returned an error for SAL_GET_STATEINFO while processing MCA. 3 - MCA_CLEAR_STATEINFO SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA. 4 - MCA_FATAL FW reported a fatal MCA. 5 - MCA_NONFATAL SAL reported a recoverable MCA and we don't support currently support recovery or SAL generated an MCA and then couldn't produce an error record. 0xB - INIT_ASSERT 0xC - INIT_GET_STATEINFO SAL returned an error for SAL_GET_STATEINFO while processing INIT event. 0xD - INIT_CLEAR_STATEINFO SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event. 0xE - INIT_FATAL Not used. 2 - Address of log 3 - Size of log 4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO AMD64 Processors 1 - Bank number 2 - Address of MCA_EXCEPTION structure 3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error 4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error Arguments: Arg1: 00000000 Arg2: b833c050 Arg3: b66b4000 Arg4: 2c000135 Debugging Details: ------------------ NOTE: This is a hardware error. This error was reported by the CPU via Interrupt 18. This analysis will provide more information about the specific error. Please contact the manufacturer for additional information about this error and troubleshooting assistance. 
This error is documented in the following publication: - Bios and Kernel Developers Guid for AMD Athlon(r) 64 and AMD Opteron(r) Processors
Bit Mask (bits 63 to 0; per the column labels, bits 63 to 57 are VAL, OVER, UC, EN, MISCV, ADDRV, PCC, followed by the Other Information, Model Specific Error Code and MCA Error Code fields):
1011011001101011010000000000000000101100000000000000000100110101
VAL - MCi_STATUS register is valid: Indicates that the information contained within the IA32_MCi_STATUS register is valid. When this flag is set, the processor follows the rules given for the OVER flag in the IA32_MCi_STATUS register when overwriting previously valid entries. The processor sets the VAL flag and software is responsible for clearing it.
UC - Error Uncorrected: Indicates that the processor did not or was not able to correct the error condition. When clear, this flag indicates that the processor was able to correct the error condition.
EN - Error Enabled: Indicates that the error was enabled by the associated EEj bit of the IA32_MCi_CTL register.
ADDRV - IA32_MCi_ADDR register valid: Indicates that the IA32_MCi_ADDR register contains the address where the error occurred.
PCC - Processor Context Corrupt: Indicates that the state of the processor might have been corrupted by the error condition detected and that reliable restarting of the processor may not be possible.
MEMHIRERR - Memory Hierarchy Error {TT}CACHE{LL}_{RRRR}_ERR: These errors match the format 0000 0001 RRRR TTLL
Concatenated Error Code: _VAL_UC_EN_ADDRV_PCC_MEMHIRERR_35
This error code can be reported back to the manufacturer. They may be able to provide additional information based upon this error. All questions regarding STOP 0x9C should be directed to the hardware manufacturer.
BUGCHECK_STR: 0x9C_AuthenticAMD
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: milkyway_nbody_
LAST_CONTROL_TRANSFER: from 806e9bfb to 804f9f33
SYMBOL_ON_RAW_STACK: 1
STACK_ADDR_RAW_STACK_SYMBOL: ffffffffb833c1ec
STACK_COMMAND: dds B833C1EC-0x20 ; kb
STACK_TEXT: b833c1cc 7f40f3fd b833c1d0 0400ffff b833c1d4 0000f200 b833c1d8 00000000 b833c1dc 00000000 b833c1e0 b0800068 b833c1e4 b8008933 b833c1e8 b0f00068 RtkHDAud+0x49f068 b833c1ec b8008933 b833c1f0 2f40ffff b833c1f4 00009302 b833c1f8 80003fff b833c1fc 0000920b b833c200 700003ff b833c204 ff0092ff b833c208 0000ffff b833c20c 80009a40 b833c210 0000ffff b833c214 80009240 b833c218 00000000 b833c21c 00009200 b833c220 00000000 b833c224 00000000 b833c228 00000000 b833c22c 00000000 b833c230 12d00068 b833c234 89008be3 b833c238 00000000 b833c23c 00000000 b833c240 00000000 b833c244 00000000 b833c248 00000000
FOLLOWUP_IP: RtkHDAud+49f068 b0f00068 ?? ???
SYMBOL_NAME: RtkHDAud+49f068
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: RtkHDAud
IMAGE_NAME: RtkHDAud.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4ccff31d
FAILURE_BUCKET_ID: 0x9C_AuthenticAMD_RtkHDAud+49f068
BUCKET_ID: 0x9C_AuthenticAMD_RtkHDAud+49f068
Followup: MachineOwner
So if I am wrong and/or need to give this to someone else, please e-mail me. This is all happening on my AMD quad. My AMD dual laptop also gets 50 to 60 error messages on the desktop, "Milkyway stopped trying to restart", but no BSOD or minidumps, so could it be AMD related? My single-core AMD has no problems. No optimized app, no OC, all stock. Every driver is up to date, the BIOS is up to date, and I have the latest BOINC app. |
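The bugcheck arguments in the report above can be decoded by hand. According to the parameter description in the dump, Arg3 (`b66b4000`) and Arg4 (`2c000135`) are the high and low 32 bits of the MCi_STATUS register. The sketch below rebuilds the 64-bit value and reads off the flags and the memory hierarchy error code in the `0000 0001 RRRR TTLL` format the dump mentions. The flag bit positions and the TT/LL/RRRR encodings are not in the post; they are taken from the published x86 machine-check architecture documentation and should be treated as assumptions for this particular CPU. The function name `decode_mci_status` is made up for illustration.

```python
# Flag bits at the top of the 64-bit MCi_STATUS value, as named in the dump.
FLAG_BITS = [
    (63, "VAL"), (62, "OVER"), (61, "UC"), (60, "EN"),
    (59, "MISCV"), (58, "ADDRV"), (57, "PCC"),
]

# Assumed field encodings for the 0000 0001 RRRR TTLL format.
TT = {0b00: "I", 0b01: "D", 0b10: "G"}  # instruction, data, generic
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}  # cache level
RRRR = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}


def decode_mci_status(high: int, low: int) -> dict:
    """Decode the two 32-bit halves of MCi_STATUS from a 0x9C bugcheck."""
    status = (high << 32) | low
    flags = [name for bit, name in FLAG_BITS if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits hold the MCA error code
    decoded = {"flags": flags, "error_code": code, "mnemonic": None}
    if code >> 8 == 0x01:  # matches 0000 0001 RRRR TTLL
        tt = TT.get((code >> 2) & 0b11, "?")
        ll = LL.get(code & 0b11, "?")
        rrrr = RRRR.get((code >> 4) & 0b1111, "?")
        decoded["mnemonic"] = f"{tt}CACHE{ll}_{rrrr}_ERR"
    return decoded


if __name__ == "__main__":
    # Arg3 and Arg4 from the dump above.
    print(decode_mci_status(0xB66B4000, 0x2C000135))
```

For the values in the dump this gives the flags VAL, UC, EN, ADDRV and PCC, which matches the concatenated error code in the report, and the mnemonic `DCACHEL1_DRD_ERR`. If the assumed encodings apply to this processor, that would point at a data read from the level 1 data cache rather than at the audio driver named in the bucket ID.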
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
Hello all. It appears that there are some problems with some MW@H tasks: 267728727, 264256459, 263423802. Those tasks correspond to my system crash. According to that error, it's a hardware problem. |
Send message Joined: 21 Mar 09 Posts: 25 Credit: 11,410,869 RAC: 0 |
What hardware is failing? When this started I pulled out my GTS250, pulled out 1 of 2 sticks of memory, even put 1 in the "B" channel, took out 1 of 2 optical drives, and took out my PCI ATA card. I plugged my tower into an unused outlet. When none of that worked I COMPLETELY disassembled the entire tower, board and all. Any other debuggers I can use? Details Product: Windows Operating System ID: 7023 Source: Service Control Manager Version: 5.2 Symbolic Name: EVENT_SERVICE_EXIT_FAILED Message: The %1 service terminated with the following error: %2 I have a lot of these in my Event Viewer, and I'm going to dig into that. Also some of these: Product: Windows Operating System ID: 26 Source: Application Popup Version: 5.2 Symbolic Name: STATUS_LOG_HARD_ERROR Message: Application popup: %1 : %2 The only thing I have not done yet is install an Ethernet card to eliminate my onboard Ethernet. My computer has only produced a minidump once since the complete teardown, which is when I used the MS debugger. If it happens again I will have to stop running MW@H. This problem only started when I started MW@H, 2 or 3 weeks ago. Thanks for the input; I will dig into this, in case it is my machine. Oh, also, this is a fresh OS install, done when nothing else seemed to work. And I have also performed a deep and intensive virus, Trojan and spyware scan. Any help or direction would be great; let me know if anyone needs more info from me to help. |
©2025 Astroinformatics Group