Welcome to MilkyWay@home

Torrents of Invalid WU's

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42873 - Posted: 15 Oct 2010, 19:57:24 UTC - in response to Message 42872.  

Check out Crunch3r's post 42823 in the Number crunching > Computation errors thread. Unless someone comes up with a better answer, I am convinced that this is the root of the problem. All we need is someone to write the corrective code.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42874 - Posted: 15 Oct 2010, 20:27:37 UTC

Yes, it is likely the problem. For me all 2s are bad, and I was trying to add a little more info in case it would help. It may depend on the system or the batch they came from, then.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Fred J. Verster
Joined: 22 Apr 09
Posts: 38
Credit: 27,377,932
RAC: 0
Message 42929 - Posted: 18 Oct 2010, 10:12:58 UTC - in response to Message 42874.  
Last modified: 18 Oct 2010, 10:14:04 UTC

Exception proves the rule?
Again
Also.
So, it has to be something else.

Knight Who says Ni

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 43143 - Posted: 25 Oct 2010, 2:19:55 UTC

Had a batch of 6 that were bad. All 2s again.

de_separation_82_2s_30_1_1858744_1287955760
de_separation_82_2s_30_1_1858743_1287955760
de_separation_82_2s_30_1_1858742_1287955760
de_separation_82_2s_30_1_1858737_1287955760
de_separation_82_2s_30_1_1858736_1287955760
de_separation_82_2s_30_1_1858722_1287955760
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
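
To check whether failures really cluster on one kind of workunit, the names listed above can be tallied by variant. This is a minimal sketch that assumes the fourth underscore-separated field of a name (here `2s`) identifies the variant; that field layout is inferred from the names in this post and is not documented by the project. The helper `variant_of` is hypothetical.

```python
from collections import Counter


def variant_of(workunit_name: str) -> str:
    """Return the fourth underscore-separated field, e.g. '2s'.

    Assumption: names look like de_separation_82_2s_30_1_<id>_<number>.
    """
    fields = workunit_name.split("_")
    if len(fields) < 4:
        raise ValueError(f"unexpected workunit name: {workunit_name!r}")
    return fields[3]


# The six workunits reported as bad in this post.
bad_workunits = [
    "de_separation_82_2s_30_1_1858744_1287955760",
    "de_separation_82_2s_30_1_1858743_1287955760",
    "de_separation_82_2s_30_1_1858742_1287955760",
    "de_separation_82_2s_30_1_1858737_1287955760",
    "de_separation_82_2s_30_1_1858736_1287955760",
    "de_separation_82_2s_30_1_1858722_1287955760",
]

# Count how many bad workunits fall into each variant.
counts = Counter(variant_of(name) for name in bad_workunits)
print(counts)
```

Feeding in the names of valid results as well would show whether 3s workunits ever fail on the same host.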

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 43286 - Posted: 30 Oct 2010, 0:22:47 UTC

Lost another hour due to no progress. 7 more bad wus.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 43290 - Posted: 30 Oct 2010, 1:55:23 UTC

Looked into the last finished one of your tasks and found:

Running Milkyway@home version 0.20 (Win32, SSE2) by Gipsel


The latest CPU version from Gipsel is v0.21, from the end of March, in which he fixed a progress bar issue that came up at that time.
It should be worth a try to see if it fixes at least part of your problem.

Alan
Joined: 21 Jun 09
Posts: 1
Credit: 7,224
RAC: 0
Message 43601 - Posted: 8 Nov 2010, 23:49:14 UTC

I keep having many, many errors - seems like every work unit. So I have to give up on Milkyway. FWIW I am having 100% errors also on ABC@home, but other projects work fine. My GPU isn't good enough so these are CPU tasks. I might not have given up if there were at least an acknowledgement from the people running this project that there are problems but there seems to be no peep, and bad work units keep getting sent.

Alan Taylor.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 43602 - Posted: 9 Nov 2010, 0:10:49 UTC - in response to Message 43601.  

I keep having many, many errors - seems like every work unit. So I have to give up on Milkyway. FWIW I am having 100% errors also on ABC@home, but other projects work fine. My GPU isn't good enough so these are CPU tasks. I might not have given up if there were at least an acknowledgement from the people running this project that there are problems but there seems to be no peep, and bad work units keep getting sent.

Alan Taylor.

From the errors in your workunits, these are out-of-resource errors. Your settings might be too strict. Check your BOINC settings for the maximum allowed disk space and memory.
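
As a rough way to act on this advice, the sketch below reports how much space is free on the volume that holds the BOINC data directory. It is only an illustration: the path passed in is a placeholder, `free_space_report` is a hypothetical helper, and the check looks at the volume itself, not at the disk and memory limits set in the BOINC preferences, which can be stricter than what the volume has left.

```python
import shutil


def free_space_report(path: str = ".") -> dict:
    """Summarise total and free space on the volume containing `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "total_gb": usage.total / 1e9,
        "free_gb": usage.free / 1e9,
        "free_fraction": free_fraction,
    }


# Replace "." with the BOINC data directory on the affected machine.
report = free_space_report(".")
print(f"{report['free_gb']:.1f} GB free of {report['total_gb']:.1f} GB "
      f"({report['free_fraction']:.0%})")
```

If the volume has plenty of room, the limit being hit is more likely one of the preference values than the physical disk.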

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43608 - Posted: 9 Nov 2010, 4:04:57 UTC - in response to Message 43601.  
Last modified: 9 Nov 2010, 4:07:16 UTC

Probably disc space - extract from stderr_txt file (my bolding):

18:22:11: Checkpoint exists. Attempting to resume from it
18:22:11: Successfully resumed checkpoint
Opening checkpoint temp: Not enough space
Write checkpoint failed
18:27:23 (3672): called boinc_finish

The other one in the task list shows something similar: it can't write the checkpoint file.

Regards
Zy
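
The same check can be run over the stderr text of several tasks at once. The sketch below is an assumption-laden illustration: the two marker strings are taken from the extract quoted in this post, and the helper `checkpoint_failures` is hypothetical; other application versions may word these messages differently.

```python
# Marker strings copied from the stderr extract quoted above.
FAILURE_MARKERS = ("Not enough space", "Write checkpoint failed")


def checkpoint_failures(stderr_text: str) -> list:
    """Return the lines of a task's stderr that mention a failed checkpoint write."""
    return [
        line.strip()
        for line in stderr_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


sample = """18:22:11: Checkpoint exists. Attempting to resume from it
18:22:11: Successfully resumed checkpoint
Opening checkpoint temp: Not enough space
Write checkpoint failed
18:27:23 (3672): called boinc_finish
"""

for line in checkpoint_failures(sample):
    print(line)
```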

Cameron
Joined: 16 Dec 07
Posts: 37
Credit: 24,358,733
RAC: 5,896
Message 43799 - Posted: 13 Nov 2010, 11:36:02 UTC

I think I've been handed a batch of invalids ('Completed Marked as Invalid').

CPU-only MW 0.45 SSE2 tasks.

I can't list them as they've already been refreshed out of reach, but I've returned 12-16 and probably only had 3 or 4 awarded full marks over the last two days.
It's kind of annoying to lose full workunits.

The thing is, they ran for the 6-7 hours that was the average under 0.19 and looked error-free when I sent them.

I will take careful note of the 15 I've got on the client (*cross fingers* I won't have to announce them 'Marked as Invalid').

BTW, any idea how long an n-body is supposed to run? I've had 4 run 30-60 minutes and then one run for 20 hours.

White Mountain Wes
Joined: 24 Jul 09
Posts: 32
Credit: 18,088,324
RAC: 377
Message 43968 - Posted: 19 Nov 2010, 3:07:41 UTC - in response to Message 43799.  

BTW, any idea how long an n-body is supposed to run? I've had 4 run 30-60 minutes and then one run for 20 hours.

I've seen N-body WUs run anywhere from 20 minutes to 40 hours (on CPU). There seems to be no rhyme or reason to how long they will run. It is really throwing the time estimates in my BOINC Manager for a loop. It is forced to assume that each of them is going to take forever and a day to complete, so it is constantly shifting into High Priority mode to get them done in time, when in reality it has plenty of time.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 43969 - Posted: 19 Nov 2010, 3:12:11 UTC

The 2s WUs still don't like my computer. I have let them run past the time it takes for 3s and they don't finish or show a percentage. It seems some error still exists. All 3s WUs run just fine when I get them.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 43980 - Posted: 19 Nov 2010, 23:01:47 UTC - in response to Message 43799.  

I think I've been handed a batch of Invalids 'Completed Marked as Invalid'...CPU Onlys MW 0.45 sse2's.

I'm getting the same thing after some have run for 10+ hours. It is a puzzlement...

Toppie*
Joined: 28 Mar 09
Posts: 68
Credit: 1,003,982,681
RAC: 0
Message 44009 - Posted: 21 Nov 2010, 8:17:16 UTC - in response to Message 43980.  

I think I've been handed a batch of Invalids 'Completed Marked as Invalid'...CPU Onlys MW 0.45 sse2's.

I'm getting the same thing after some have run for 10+ hours. It is a puzzlement...


Same here. Three in succession now (SSE2).

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 44096 - Posted: 23 Nov 2010, 0:38:18 UTC - in response to Message 44009.  
Last modified: 23 Nov 2010, 0:38:38 UTC

Are you guys using optimized applications? They probably won't work with most of the new WUs we're sending out.

If not, there's a good chance it was the corrupted disk (which is now fixed).

Mike
Joined: 31 Oct 10
Posts: 137
Credit: 3,755,067
RAC: 0
Message 44110 - Posted: 23 Nov 2010, 8:32:50 UTC


I checked it out and my wingmen have lots of them as well.


Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 44113 - Posted: 23 Nov 2010, 11:55:57 UTC - in response to Message 44096.  
Last modified: 23 Nov 2010, 11:56:18 UTC

Are you guys using optimized applications?

These are 0.45 SSE2 plain vanilla applications failing here.

Cannibal Corpse
Joined: 21 Mar 09
Posts: 25
Credit: 11,410,869
RAC: 0
Message 44862 - Posted: 10 Dec 2010, 0:10:34 UTC
Last modified: 10 Dec 2010, 0:17:42 UTC

Hello all. It appears that there are problems with some MW@H tasks: 267728727, 264256459, 263423802. Those tasks correspond to my system crashes.
So I downloaded the MS debugger and came up with this report. History: for the past two weeks my system would flash a BSOD too quickly to read and just shut down. I took out RAM and the GFX card and reconfigured hardware and BIOS (all this before I used the MS debugger).

Microsoft (R) Windows Debugger Version 6.12.0002.633 X86
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [C:\WINDOWS\Minidump\Mini120910-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
WARNING: Whitespace at end of path element
Symbol search path is: SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 (Service Pack 3) MP (4 procs) Free x86 compatible
Product: WinNt, suite: TerminalServer SingleUserTS Personal
Built by: 2600.xpsp.080413-2111
Machine Name:
Kernel base = 0x804d7000 PsLoadedModuleList = 0x8055d720
Debug session time: Thu Dec 9 05:20:22.796 2010 (UTC - 5:00)
System Uptime: 0 days 0:02:35.393
Loading Kernel Symbols
Loading User Symbols
Loading unloaded module list
Bugcheck Analysis
Use !analyze -v to get detailed debugging information.
BugCheck 9C, {0, b833c050, b66b4000, 2c000135}
Unable to load image RtkHDAud.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for RtkHDAud.sys
*** ERROR: Module load completed but symbols could not be loaded for RtkHDAud.sys
Probably caused by : RtkHDAud.sys ( RtkHDAud+49f068 )
Followup: MachineOwner
---------
1: kd> !analyze -v
Bugcheck Analysis
MACHINE_CHECK_EXCEPTION (9c)
A fatal Machine Check Exception has occurred.
KeBugCheckEx parameters;
x86 Processors
If the processor has ONLY MCE feature available (For example Intel
Pentium), the parameters are:
1 - Low 32 bits of P5_MC_TYPE MSR
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of P5_MC_ADDR MSR
4 - Low 32 bits of P5_MC_ADDR MSR
If the processor also has MCA feature available (For example Intel
Pentium Pro), the parameters are:
1 - Bank number
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
IA64 Processors
1 - Bugcheck Type
1 - MCA_ASSERT
2 - MCA_GET_STATEINFO
SAL returned an error for SAL_GET_STATEINFO while processing MCA.
3 - MCA_CLEAR_STATEINFO
SAL returned an error for SAL_CLEAR_STATEINFO while processing MCA.
4 - MCA_FATAL
FW reported a fatal MCA.
5 - MCA_NONFATAL
SAL reported a recoverable MCA and we don't support currently
support recovery or SAL generated an MCA and then couldn't
produce an error record.
0xB - INIT_ASSERT
0xC - INIT_GET_STATEINFO
SAL returned an error for SAL_GET_STATEINFO while processing INIT event.
0xD - INIT_CLEAR_STATEINFO
SAL returned an error for SAL_CLEAR_STATEINFO while processing INIT event.
0xE - INIT_FATAL
Not used.
2 - Address of log
3 - Size of log
4 - Error code in the case of x_GET_STATEINFO or x_CLEAR_STATEINFO
AMD64 Processors
1 - Bank number
2 - Address of MCA_EXCEPTION structure
3 - High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
4 - Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arguments:
Arg1: 00000000
Arg2: b833c050
Arg3: b66b4000
Arg4: 2c000135
Debugging Details:
------------------
NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.
This error is documented in the following publication:
- Bios and Kernel Developers Guid for AMD Athlon(r) 64 and AMD Opteron(r) Processors
Bit Mask:

    MA                           Model Specific        MCA
 O  ID     Other Information       Error Code      Error Code
VV  SDP ___________|____________ _______|_______ _______|______
AEUECRC|                        |               |              |
LRCNVVC|                        |               |              |
^^^^^^^|                        |               |              |
   6         5         4         3         2         1
3210987654321098765432109876543210987654321098765432109876543210
----------------------------------------------------------------
1011011001101011010000000000000000101100000000000000000100110101
VAL - MCi_STATUS register is valid
Indicates that the information contained within the IA32_MCi_STATUS
register is valid. When this flag is set, the processor follows the
rules given for the OVER flag in the IA32_MCi_STATUS register when
overwriting previously valid entries. The processor sets the VAL
flag and software is responsible for clearing it.
UC - Error Uncorrected
Indicates that the processor did not or was not able to correct the
error condition. When clear, this flag indicates that the processor
was able to correct the error condition.
EN - Error Enabled
Indicates that the error was enabled by the associated EEj bit of the
IA32_MCi_CTL register.
ADDRV - IA32_MCi_ADDR register valid
Indicates that the IA32_MCi_ADDR register contains the address where
the error occurred.
PCC - Processor Context Corrupt
Indicates that the state of the processor might have been corrupted
by the error condition detected and that reliable restarting of the
processor may not be possible
MEMHIRERR - Memory Hierarchy Error {TT}CACHE{LL}_{RRRR}_ERR
These errors match the format 0000 0001 RRRR TTLL
Concatenated Error Code:
--------------------------
_VAL_UC_EN_ADDRV_PCC_MEMHIRERR_35
This error code can be reported back to the manufacturer.
They may be able to provide additional information based upon
this error. All questions regarding STOP 0x9C should be
directed to the hardware manufacturer.
BUGCHECK_STR: 0x9C_AuthenticAMD
CUSTOMER_CRASH_COUNT: 1
DEFAULT_BUCKET_ID: DRIVER_FAULT
PROCESS_NAME: milkyway_nbody_
LAST_CONTROL_TRANSFER: from 806e9bfb to 804f9f33
SYMBOL_ON_RAW_STACK: 1
STACK_ADDR_RAW_STACK_SYMBOL: ffffffffb833c1ec
STACK_COMMAND: dds B833C1EC-0x20 ; kb
STACK_TEXT:
b833c1cc 7f40f3fd
b833c1d0 0400ffff
b833c1d4 0000f200
b833c1d8 00000000
b833c1dc 00000000
b833c1e0 b0800068
b833c1e4 b8008933
b833c1e8 b0f00068 RtkHDAud+0x49f068
b833c1ec b8008933
b833c1f0 2f40ffff
b833c1f4 00009302
b833c1f8 80003fff
b833c1fc 0000920b
b833c200 700003ff
b833c204 ff0092ff
b833c208 0000ffff
b833c20c 80009a40
b833c210 0000ffff
b833c214 80009240
b833c218 00000000
b833c21c 00009200
b833c220 00000000
b833c224 00000000
b833c228 00000000
b833c22c 00000000
b833c230 12d00068
b833c234 89008be3
b833c238 00000000
b833c23c 00000000
b833c240 00000000
b833c244 00000000
b833c248 00000000
FOLLOWUP_IP:
RtkHDAud+49f068
b0f00068 ?? ???
SYMBOL_NAME: RtkHDAud+49f068
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: RtkHDAud
IMAGE_NAME: RtkHDAud.sys
DEBUG_FLR_IMAGE_TIMESTAMP: 4ccff31d
FAILURE_BUCKET_ID: 0x9C_AuthenticAMD_RtkHDAud+49f068
BUCKET_ID: 0x9C_AuthenticAMD_RtkHDAud+49f068
Followup: MachineOwner

So if I am wrong, or need to give this to someone else, please e-mail me.
This is all happening on my AMD quad. My AMD dual laptop will also show 50 to 60 error messages on the desktop ("Milkyway stopped trying to restart") but no BSOD or minidumps, so could it be AMD related? My single-core AMD has no problems. No optimized app, no OC, all stock. Every driver is up to date, the BIOS is up to date, and I run the latest BOINC app.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44865 - Posted: 10 Dec 2010, 1:10:41 UTC - in response to Message 44862.  

Hello all. It appears that there are problems with some MW@H tasks: 267728727, 264256459, 263423802. Those tasks correspond to my system crashes.
So I downloaded the MS debugger and came up with this report. History: for the past two weeks my system would flash a BSOD too quickly to read and just shut down. I took out RAM and the GFX card and reconfigured hardware and BIOS (all this before I used the MS debugger).

NOTE: This is a hardware error. This error was reported by the CPU
via Interrupt 18. This analysis will provide more information about
the specific error. Please contact the manufacturer for additional
information about this error and troubleshooting assistance.

According to that error it's a hardware problem.
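
For readers who want to see what the bugcheck arguments encode, the sketch below decodes Arg3 and Arg4 from the dump as the high and low halves of the MCi_STATUS register, which is how the debugger output describes them. The flag bit positions and the RRRR/TT/LL tables follow the machine-check layout as I understand it from the processor manuals; treat them as assumptions and verify against the vendor documentation. `decode_mci_status` is a hypothetical helper, not part of any debugger.

```python
# Flag bit positions in the 64-bit MCi_STATUS register (assumed layout).
FLAG_BITS = {"VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
             "MISCV": 59, "ADDRV": 58, "PCC": 57}

# Sub-fields of a memory hierarchy error code: 0000 0001 RRRR TTLL (assumed tables).
RRRR = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
        5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TT = {0: "instruction", 1: "data", 2: "generic"}
LL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def decode_mci_status(high32: int, low32: int) -> dict:
    """Combine the two 32-bit halves and pick out flags and the MCA error code."""
    status = (high32 << 32) | low32
    flags = [name for name, bit in FLAG_BITS.items() if (status >> bit) & 1]
    code = status & 0xFFFF  # low 16 bits hold the MCA error code
    result = {"status": status, "flags": flags, "mca_error_code": code}
    if code >> 8 == 0x01:  # matches the memory hierarchy error format
        result["request"] = RRRR.get((code >> 4) & 0xF, "unknown")
        result["transaction"] = TT.get((code >> 2) & 0x3, "unknown")
        result["level"] = LL[code & 0x3]
    return result


# Arg3 and Arg4 from the dump in the quoted post.
decoded = decode_mci_status(0xB66B4000, 0x2C000135)
print(decoded["flags"], hex(decoded["mca_error_code"]))
print(decoded.get("request"), decoded.get("transaction"), decoded.get("level"))
```

Under these assumptions the flags come out as VAL, UC, EN, ADDRV and PCC, matching the debugger's concatenated error code, and the low byte 0x35 reads as a data read in the L1 data cache. That would point at the CPU or its cooling, power or memory timing rather than at the Realtek audio driver the debugger names.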

Cannibal Corpse
Joined: 21 Mar 09
Posts: 25
Credit: 11,410,869
RAC: 0
Message 44875 - Posted: 10 Dec 2010, 3:58:19 UTC - in response to Message 44865.  

What hardware is failing? When this started I pulled out my GTS250, pulled out 1 of 2 sticks of memory, even put 1 in the "B" channel, took out 1 of 2 optical drives, and took out my PCI ATA card. I plugged my tower into an unused outlet. When none of that worked I COMPLETELY disassembled the entire tower, board and all. Are there any other debuggers I can use?

Details
Product: Windows Operating System
ID: 7023
Source: Service Control Manager
Version: 5.2
Symbolic Name: EVENT_SERVICE_EXIT_FAILED
Message: The %1 service terminated with the following error:
%2
I have a lot of these in my event viewer and am going to dig into that.
Also some of these:
Product: Windows Operating System
ID: 26
Source: Application Popup
Version: 5.2
Symbolic Name: STATUS_LOG_HARD_ERROR
Message: Application popup: %1 : %2
The only thing I have not done yet is install an Ethernet card to eliminate my onboard Ethernet. My computer has only produced a minidump once since the complete teardown, which is when I used the MS debugger. If it happens again I will have to stop running MW@H. This problem only started when I started MW@H, 2-3 weeks ago. Thanks for the input; I will dig into this, in case it is my machine. Oh, also, this is a fresh OS install, done when everything else seemed not to work. I have also performed a deep and intensive virus, trojan and spyware scan. Any help or direction would be great, and let me know if anyone needs more info from me to help.



©2024 Astroinformatics Group