Welcome to MilkyWay@home

Torrents of Invalid WU's

Message boards : Number crunching : Torrents of Invalid WU's

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42613 - Posted: 6 Oct 2010, 3:17:39 UTC
Last modified: 6 Oct 2010, 3:22:20 UTC

Over Oct 5-6, at least 100 of my WU's were marked "Validate Error", "Completed, marked as invalid", or "Completed, can't validate".

"Validate Error" seems to be caused by dozens of CPU-only WU's that 'completed' after only 4-8sec.

One WU marked as invalid (161366443) had two other CPU-only wingmen complete it in only 5sec yet their results were accepted. In scrolling through my own 'valid' WU's, I also see dozens of 4sec WU results accepted as gospel by the validator.

The "Completed, can't validate" appears to be a quorum failure where hundreds of other wingmen are returning invalid CPU results in the same sub-5sec period.

Something seems to be very broken here...

nenym
Joined: 16 Jan 09
Posts: 5
Credit: 400,164,802
RAC: 0
Message 42619 - Posted: 6 Oct 2010, 5:05:39 UTC - in response to Message 42613.  
Last modified: 6 Oct 2010, 5:06:57 UTC


Fred J. Verster
Joined: 22 Apr 09
Posts: 38
Credit: 27,377,932
RAC: 0
Message 42621 - Posted: 6 Oct 2010, 8:27:31 UTC - in response to Message 42619.  

The same here http://milkyway.cs.rpi.edu/milkyway/results.php?userid=23358&offset=0&show_names=0&state=4.



I can't access your results; see whether you can access this task.
Last validated result.



Knight Who says Ni

nenym
Joined: 16 Jan 09
Posts: 5
Credit: 400,164,802
RAC: 0
Message 42624 - Posted: 6 Oct 2010, 11:26:43 UTC - in response to Message 42621.  
Last modified: 6 Oct 2010, 11:28:09 UTC

The same as me. But something has happened, as I now have only one invalid task (1 GPU x 2 Linux CPU).

rphstout
Joined: 11 Feb 10
Posts: 8
Credit: 11,459,648
RAC: 0
Message 42631 - Posted: 6 Oct 2010, 14:03:44 UTC

The river of invalid WU's unfortunately still hasn't dried out, as the following report shows.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=213541708
Any clues?

rphstout
Joined: 11 Feb 10
Posts: 8
Credit: 11,459,648
RAC: 0
Message 42632 - Posted: 6 Oct 2010, 14:09:50 UTC - in response to Message 42631.  

The river of invalid WU's unfortunately still hasn't dried out, as the following report shows.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=213541708
Any clues?


The same goes for these too:
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=213541688
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=213541706
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=213541707

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 42633 - Posted: 6 Oct 2010, 14:13:07 UTC - in response to Message 42631.  

Yesterday evening (GMT+2), after Travis' post, I aborted all MW WU's and loaded the new ones. My main system has crunched several hundred since then, and not a single one has failed. Currently I crunch only the ATI app; I stopped the N-body tasks since they also produced errors.

Alexander

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42634 - Posted: 6 Oct 2010, 15:14:36 UTC - in response to Message 42631.  

The river of invalid WU's unfortunately still hasn't dried out...

Travis apparently did something with the GPU WU's around 04:22UTC today. This so far has fixed the GPU validation problems for me.

But 6 of 17 CPU-only WU's sent out after that time have run for only a few seconds and been rejected with "Validate Error" or "Completed, marked as invalid". This includes several from only 2 hours ago. The remainder look like they will run for the expected time.


rphstout
Joined: 11 Feb 10
Posts: 8
Credit: 11,459,648
RAC: 0
Message 42636 - Posted: 6 Oct 2010, 15:31:07 UTC - in response to Message 42634.  

The remainder look like they will run for the expected time.

Affirmative. After half a dozen invalid WUs, the valid ones finally rushed down the line!
We're back in CPU-business.

Haris Dublas
Joined: 25 Feb 10
Posts: 49
Credit: 10,137,837
RAC: 0
Message 42640 - Posted: 6 Oct 2010, 16:13:00 UTC
Last modified: 6 Oct 2010, 16:19:02 UTC

Task ID | Computer | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
213355053 | 201854 | 6 Oct 2010 7:36:33 UTC | 6 Oct 2010 7:54:19 UTC | Completed and validated | 83.27 | 5.83 | 0.04 | 213.76 | MilkyWay@Home v0.23 (ati13ati)
213369417 | 182806 | 6 Oct 2010 8:06:35 UTC | 6 Oct 2010 13:42:27 UTC | Completed, marked as invalid | 1,294.64 | 406.41 | 2.15 | 0.00 | Anonymous platform
213538977 | 104894 | 6 Oct 2010 13:47:35 UTC | 6 Oct 2010 14:12:59 UTC | Completed and validated | 0.00 | 9.05 | 0.05 | 213.76 | Anonymous platform


The underclocked 3850 WU is invalid, while the CPU WU that has 0 sec run time and 9 sec CPU time is valid.
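For anyone who wants to scan their own results for the same pattern, here is a rough sketch. It uses only the three rows quoted in this post, and it assumes the first two timing figures in each row are run time and CPU time in seconds, which matches the reading above. The 10-second cutoff and the names `Result` and `suspicious` are invented for illustration and are not anything the project's validator uses.

```python
from dataclasses import dataclass


@dataclass
class Result:
    task_id: int
    status: str
    run_time_s: float   # assumed: first timing figure in the row
    cpu_time_s: float   # assumed: second timing figure in the row


# Values copied from the three rows quoted in this post.
RESULTS = [
    Result(213355053, "Completed and validated", 83.27, 5.83),
    Result(213369417, "Completed, marked as invalid", 1294.64, 406.41),
    Result(213538977, "Completed and validated", 0.00, 9.05),
]

# Assumed cutoff; anything validated below this looks too quick to be real work.
MIN_PLAUSIBLE_RUN_TIME_S = 10.0


def suspicious(results, min_run_time_s=MIN_PLAUSIBLE_RUN_TIME_S):
    """Return validated results whose reported run time is below the cutoff."""
    return [
        r for r in results
        if r.status == "Completed and validated" and r.run_time_s < min_run_time_s
    ]


for r in suspicious(RESULTS):
    print(r.task_id, r.run_time_s, r.cpu_time_s)
```

With these three rows, only the 0-second CPU result is flagged; the GPU result passes because its run time is above the cutoff.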

This is getting out of hand, I'm gonna move my host to collatz until this problem is fixed.

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42654 - Posted: 7 Oct 2010, 1:00:22 UTC - in response to Message 42636.  
Last modified: 7 Oct 2010, 1:12:33 UTC

We're back in CPU-business.

Alas, we are not. I still have CPU WU's that are completing in a few seconds. These latest were sent out from 10:53UTC October 6 through 01:09UTC October 7.

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 42662 - Posted: 7 Oct 2010, 17:53:20 UTC

Travis posted over in the other thread that he was going to update the CPU apps as he knew what was going on.
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1953&nowrap=true#42642

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42675 - Posted: 8 Oct 2010, 10:10:59 UTC - in response to Message 42662.  

Version 0.40 CPU app seems to have done the trick.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42737 - Posted: 10 Oct 2010, 22:06:18 UTC

Not sure where this fits best. This task ran for 3 hours with no progress and I aborted it; 4 of 5 didn't seem to show any progress either. Yet one task I had ran just fine.

task de_14_2s_5_1344965_1286666350_1
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42739 - Posted: 10 Oct 2010, 23:02:37 UTC - in response to Message 42737.  
Last modified: 10 Oct 2010, 23:03:59 UTC

This task ran for 3 hours with no progress and I aborted it; 4 of 5 didn't seem to show any progress either.

All of my 0.40 or 0.04 WU's failed on error -161 after up to 37 CPU hours. All of them were running at least 50% slower than version 0.19. They don't yet know why it runs so slow under Windows...

harryb
Joined: 23 May 10
Posts: 1
Credit: 43,238,758
RAC: 0
Message 42746 - Posted: 11 Oct 2010, 0:01:42 UTC

Hi, I just started getting the message below. I am no longer receiving new WU's.
What should I do? Thanks.

10/10/2010 7:53:30 PM Milkyway@home Message from server: No work sent
10/10/2010 7:53:30 PM Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.
10/10/2010 7:54:26 PM Milkyway@home Computation for task de_15_2s_5_1888795_1286742812_1 finished
10/10/2010 7:54:28 PM Milkyway@home Started upload of de_15_2s_5_1888795_1286742812_1_0
10/10/2010 7:54:29 PM Milkyway@home Finished upload of de_15_2s_5_1888795_1286742812_1_0
10/10/2010 7:54:36 PM Milkyway@home Sending scheduler request: To fetch work.
10/10/2010 7:54:36 PM Milkyway@home Reporting 2 completed tasks, requesting new tasks for CPU
10/10/2010 7:54:38 PM Milkyway@home Scheduler request completed: got 0 new tasks
10/10/2010 7:54:38 PM Milkyway@home Message from server: No work sent
10/10/2010 7:54:38 PM Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation.
10/10/2010 7:54:57 PM Milkyway@home Computation for task de_15_2s_5_1948709_1286751518_0 finished
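
The server message says the local app_info.xml declares no version of the N-Body application. A quick way to see which applications a file does declare is sketched below. The element names follow the usual BOINC anonymous-platform layout as far as I know it; the sample file, the app name "milkyway", and the guessed name "milkyway_nbody" are placeholders, not values taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down app_info.xml. The app name "milkyway" and the
# version number are placeholders, not values taken from the project.
SAMPLE_APP_INFO = """
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>23</version_num>
  </app_version>
</app_info>
"""


def declared_app_versions(xml_text):
    """Return the set of app names that have at least one <app_version> entry."""
    root = ET.fromstring(xml_text.strip())
    names = set()
    for app_version in root.iter("app_version"):
        name = app_version.findtext("app_name", default="").strip()
        if name:
            names.add(name)
    return names


def has_version_for(xml_text, app_name):
    """True if the file declares any version of the given app."""
    return app_name in declared_app_versions(xml_text)


# "milkyway_nbody" is a guessed name for the N-body app, used only for illustration.
print(has_version_for(SAMPLE_APP_INFO, "milkyway"))
print(has_version_for(SAMPLE_APP_INFO, "milkyway_nbody"))
```

To check a real file, read its contents and pass the text to `declared_app_versions`; the exact application name the server expects would have to come from the project itself.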

Paul Forsdick
Joined: 19 Feb 09
Posts: 29
Credit: 5,452,255
RAC: 0
Message 42752 - Posted: 11 Oct 2010, 11:07:58 UTC - in response to Message 42737.  

Not sure where this fits best. This task ran for 3 hours with no progress and I aborted it; 4 of 5 didn't seem to show any progress either. Yet one task I had ran just fine.
task de_14_2s_5_1344965_1286666350_1


I have had similar ones

What I notice is that if a task does not start within 2 minutes, the time remaining on the right just counts down with no progress until it reaches zero, i.e. after 6 or 7 hours, and then the task keeps running instead of finishing in the normal 4 to 5 hours.
The good ones that work usually show 0.62% progress within 2 minutes.
So when I get them, if a task shows the 0.62% before 2 minutes I let it run, and if not I abort it.
Over the weekend about 40% were good, but the other 60% would not have started, so these were aborted after 2 minutes.
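
The rule of thumb above can be written down as a small check. This is only a sketch of the manual rule, with the 2-minute grace period taken from the description; the function name and parameters are made up, and it does not talk to the BOINC client.

```python
def should_abort(elapsed_s, progress_pct, grace_s=120.0):
    """Rule of thumb: abort a task that still shows no progress after the grace period.

    elapsed_s    -- seconds since the task started running
    progress_pct -- progress shown by the client, in percent (e.g. 0.62)
    grace_s      -- how long to wait before judging (2 minutes, per the post)
    """
    return elapsed_s >= grace_s and progress_pct <= 0.0


# A task that shows 0.62% inside the first two minutes is left alone,
# while one still at 0% after two minutes is a candidate for aborting.
print(should_abort(90, 0.62))
print(should_abort(120, 0.0))
```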

Paul

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42819 - Posted: 13 Oct 2010, 15:47:40 UTC
Last modified: 13 Oct 2010, 15:51:03 UTC

All of these were bad and were aborted; same issue as the first.

de_15_2s_5_1435953_1286983876
de_15_2s_5_1435954_1286983876
de_15_2s_5_1435955_1286983876
de_15_2s_5_1435956_1286983876
de_15_2s_5_1435957_1286983876
de_15_2s_5_1435958_1286983876

Added these; same thing.

de_16_2s_5_1437807_1286984701
de_16_2s_5_1437817_1286984701
de_16_2s_5_1437818_1286984701
de_16_2s_5_1437819_1286984701
de_16_2s_5_1437821_1286984701
de_16_2s_5_1437822_1286984701

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 42871 - Posted: 15 Oct 2010, 18:04:07 UTC

My finding is that all 2s units are bad, and all 3s units run fine.


(so far)
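
To tally bad tasks by kind, the names can be split on that tag. A minimal sketch, assuming the third underscore-separated field of a task name (2s, 3s) is the tag being discussed; what the tag stands for is not stated in this thread, and `unit_kind` and `group_by_kind` are made-up helper names.

```python
import re
from collections import defaultdict

# Matches names like de_15_2s_5_1435953_1286983876 and captures the "2s" part.
TASK_RE = re.compile(r"^de_\d+_(\d+s)_")

# Task names copied from earlier posts in this thread.
NAMES = [
    "de_14_2s_5_1344965_1286666350_1",
    "de_15_2s_5_1435953_1286983876",
    "de_16_2s_5_1437807_1286984701",
]


def unit_kind(name):
    """Return the tag such as '2s' or '3s', or None if the name does not match."""
    match = TASK_RE.match(name)
    return match.group(1) if match else None


def group_by_kind(names):
    """Group task names by their tag, preserving input order within each group."""
    groups = defaultdict(list)
    for name in names:
        groups[unit_kind(name)].append(name)
    return dict(groups)


for kind, members in group_by_kind(NAMES).items():
    print(kind, len(members))
```

Feeding in the names of aborted and completed tasks separately would show at a glance whether the bad ones really all share one tag.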

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 42872 - Posted: 15 Oct 2010, 18:25:59 UTC - in response to Message 42871.  

My finding is that all 2s units are bad, and all 3s units run fine.

Unfortunately, if you search the message board for "_3s_", you'll find some people reporting that their 3s units have errored out on this -161 file transfer error, just like the 2s units did.

©2024 Astroinformatics Group