Welcome to MilkyWay@home

bad wus in the database

Message boards : Number crunching : bad wus in the database

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3842 - Posted: 19 Jun 2008, 6:28:49 UTC

there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working.

thanks!
Profile DistroMan

Joined: 28 Dec 07
Posts: 26
Credit: 1,161,815
RAC: 0
Message 3843 - Posted: 19 Jun 2008, 6:45:52 UTC - in response to Message 3842.  

there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working.

thanks!


All seems fine here. 3737's are running down and I'm filling up with 602's. Thanks for the info and the time spent getting it up again. It is appreciated.
Join Cruncher Junkies on MilkyWay!
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3844 - Posted: 19 Jun 2008, 7:06:05 UTC - in response to Message 3843.  

there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working.

thanks!


All seems fine here. 3737's are running down and I'm filling up with 602's. Thanks for the info and the time spent getting it up again. It is appreciated.


you're getting 602s? that's a bit odd because that search should be over... but if you're getting them and they're working then that seems to be ok :) i'm a little worried because the number of unsent WUs seems to be holding steady around 15000.... and i'm not generating any new ones.
Profile Gavin Shaw

Joined: 16 Jan 08
Posts: 98
Credit: 1,371,299
RAC: 0
Message 3845 - Posted: 19 Jun 2008, 7:12:05 UTC

I have a 602 unit as well. It is a resend due to a host failing to report back on time and has been reissued to me.

Never surrender and never give up. In the darkest hour there is always hope.

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3846 - Posted: 19 Jun 2008, 7:16:13 UTC - in response to Message 3845.  

What i've done is removed all the WUs currently in the database -- because people shouldn't be getting 602s, especially the ones that are just going to error out. I've set the server to cancel any reported bad WUs, so hopefully it will automatically cancel the bad ones for everyone out there. Let me know if this works.

Right now, everyone should only be getting gs_606 and your BOINC client should be automatically canceling anything else. If this isn't working please let me know.
Profile alijay

Joined: 15 Apr 08
Posts: 55
Credit: 24,047
RAC: 0
Message 3847 - Posted: 19 Jun 2008, 7:16:13 UTC - in response to Message 3844.  

there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working.

thanks!


All seems fine here. 3737's are running down and I'm filling up with 602's. Thanks for the info and the time spent getting it up again. It is appreciated.


you're getting 602s? that's a bit odd because that search should be over... but if you're getting them and they're working then that seems to be ok :) i'm a little worried because the number of unsent WUs seems to be holding steady around 15000.... and i'm not generating any new ones.

It appears that all the wus my computer managed to crunch Thank you Travis
DarkStar

Joined: 4 Apr 08
Posts: 3
Credit: 8,104,710
RAC: 0
Message 3849 - Posted: 19 Jun 2008, 7:20:32 UTC

My quad core cancelled a few WU's, but now it isn't getting any new ones! Server status page mentions over 15,000 WU's available, so it's odd I ain't getting them. Running 64bit Linux btw. Any ideas?
Profile alijay

Joined: 15 Apr 08
Posts: 55
Credit: 24,047
RAC: 0
Message 3850 - Posted: 19 Jun 2008, 7:27:54 UTC - in response to Message 3849.  

My quad core cancelled a few WU's, but now it isn't getting any new ones! Server status page mentions over 15,000 WU's available, so it's odd I ain't getting them. Running 64bit Linux btw. Any ideas?

Got one then nothing

All the work done yesterday cancelled with no credits what a farce
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3853 - Posted: 19 Jun 2008, 7:43:05 UTC - in response to Message 3849.  

My quad core cancelled a few WU's, but now it isn't getting any new ones! Server status page mentions over 15,000 WU's available, so it's odd I ain't getting them. Running 64bit Linux btw. Any ideas?


I think this was part of the problem with WU generation. For whatever reason the server was thinking more WUs were available than actually were. Everything looks like it should be running smoothly now and all the WUs related to any search other than gs_606 and gs_607 should be automatically canceled client-side (i hope).

I also found a few heavily loaded tables in the database that might have been slowing things down even more, and cleaned these out as well. So now I'm hoping work should start flowing smoothly and there shouldn't be any more assimilator or validator backups -- which means you'll be getting credit in a timely fashion and not losing out on any.
Profile DistroMan

Joined: 28 Dec 07
Posts: 26
Credit: 1,161,815
RAC: 0
Message 3858 - Posted: 19 Jun 2008, 8:21:39 UTC - in response to Message 3853.  

-- which means you'll be getting credit in a timely fashion and not losing out on any.


Looks like all pending is cleared but without credits being awarded. :(
Join Cruncher Junkies on MilkyWay!
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3859 - Posted: 19 Jun 2008, 8:23:12 UTC - in response to Message 3858.  

-- which means you'll be getting credit in a timely fashion and not losing out on any.


Looks like all pending is cleared but without credits being awarded. :(


I cleared out all the pending because the vast majority of these were gs_602 -- which would only error out anyways, so no one was going to be getting credit for them. I got the DB back to a clean slate, and now that it's constantly running cleanup daemons i think we should have smooth sailing from here on out.
Profile RAMen

Joined: 8 Apr 08
Posts: 45
Credit: 161,943,995
RAC: 0
Message 3860 - Posted: 19 Jun 2008, 8:27:18 UTC - in response to Message 3859.  

-- which means you'll be getting credit in a timely fashion and not losing out on any.


Looks like all pending is cleared but without credits being awarded. :(


I cleared out all the pending because the vast majority of these were gs_602 -- which would only error out anyways, so no one was going to be getting credit for them. I got the DB back to a clean slate, and now that it's constantly running cleanup daemons i think we should have smooth sailing from here on out.



All 3 computers XP and linux have downloaded gs_607 and gs_606 in batches of twenty and are processing

OWN every thing I need
EARN.. enough to live !!!
WANT a solar array on the roof so I can run a BOINC farm( DREAM on!!)
NO wife
NO kids
NO troubles

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3862 - Posted: 19 Jun 2008, 8:50:53 UTC - in response to Message 3860.  

-- which means you'll be getting credit in a timely fashion and not losing out on any.


Looks like all pending is cleared but without credits being awarded. :(


I cleared out all the pending because the vast majority of these were gs_602 -- which would only error out anyways, so no one was going to be getting credit for them. I got the DB back to a clean slate, and now that it's constantly running cleanup daemons i think we should have smooth sailing from here on out.



All 3 computers XP and linux have downloaded gs_607 and gs_606 in batches of twenty and are processing


well that's good news :) please keep me updated about any errors these WUs might have.
Profile Mac-Nic

Joined: 26 Mar 08
Posts: 15
Credit: 2,045,502
RAC: 0
Message 3863 - Posted: 19 Jun 2008, 9:21:05 UTC

XP Pro SP3 32BIT & XP home SP3 32BIT

gs_606 & gs_607 validated and granted credit.

Well done, Travis.
Profile RAMen

Joined: 8 Apr 08
Posts: 45
Credit: 161,943,995
RAC: 0
Message 3865 - Posted: 19 Jun 2008, 11:17:02 UTC

It's 19.08 Western Australian Time here (UTC +8)

132 units completed without missing a beat, maintaining the 20-unit allocation. That includes a headless command-line Linux installation of Ubuntu 8.04.

Might have overcome the server problems!

Well done Travis !!

OWN every thing I need
EARN.. enough to live !!!
WANT a solar array on the roof so I can run a BOINC farm( DREAM on!!)
NO wife
NO kids
NO troubles

Odd-Rod

Joined: 7 Sep 07
Posts: 444
Credit: 5,712,451
RAC: 0
Message 3867 - Posted: 19 Jun 2008, 11:34:12 UTC

Successfully completed 606 and 607's on my various hosts.

Great stuff, Travis.

Rod
C

Joined: 22 Nov 07
Posts: 36
Credit: 1,224,316
RAC: 0
Message 3876 - Posted: 19 Jun 2008, 14:29:43 UTC

606 and 607 are downloading to all my machines - no "problems", other than the one Travis mentioned about WU being deleted almost as fast as they receive credit. Personally, I'd rather it be this way, than to have the DB slow and cluttered so I don't receive any new WU...

C
Team MacNN
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 3877 - Posted: 19 Jun 2008, 16:37:33 UTC - in response to Message 3859.  

-- which means you'll be getting credit in a timely fashion and not losing out on any.


Looks like all pending is cleared but without credits being awarded. :(


I cleared out all the pending because the vast majority of these were gs_602 -- which would only error out anyways, so no one was going to be getting credit for them. I got the DB back to a clean slate, and now that it's constantly running cleanup daemons i think we should have smooth sailing from here on out.


Hmmm...

I had caught the bad 602's early on and NNT'ed my hosts, and canned them at my end when you made the announcement about the problem. I only had one of them which had run and was in 'orphaned' Pending limbo, so I just purged it from my local personal task database manually.

OTOH, I had 38 pending of the batch which was being sent out after the 602's and before the latest work cancellation.

Needless to say when I went to log data earlier, it was a bit distressing to see all the pending poofed (apparently), as well as the almost 60 in progress at the time! ;-)

I hadn't looked at the message boards before I started logging, and my first reaction was "!@#$..., WTF is going on here!!??" :-D

As it turned out, when I reconciled each of my host accounts I found that only 2 of the 38 pendings didn't get their credit granted. So it would seem that the loss of credit for the pendings might not be as bad as some folks have suggested. I guess the degree of lost credit would depend mostly on individual circumstances.

One suggestion about setting up the database purging processes though. You might want to consider setting some delay to allow participants to examine their recent completed work before poofing it. Something like 12 to 24 hours usually is adequate, although some other production projects hold them for longer before purging. One would think even with the somewhat unique 'on the fly' work generation from the ongoing output here, you should be able to delay the purging of completed work for a short interval without bringing everything to its knees.

Regards,

Alinator
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 3879 - Posted: 19 Jun 2008, 17:46:05 UTC - in response to Message 3877.  


One suggestion about setting up the database purging processes though. You might want to consider setting some delay to allow participants to examine their recent completed work before poofing it. Something like 12 to 24 hours usually is adequate, although some other production projects hold them for longer before purging. One would think even with the somewhat unique 'on the fly' work generation from the ongoing output here, you should be able to delay the purging of completed work for a short interval without bringing everything to its knees.

Regards,

Alinator



That's what we were trying to do in the past. Unfortunately, after a whole day of crunching there's enough in the database that when the purge starts, it's so expensive that it slows down the rest of the system and can't catch up, which pretty much makes everything slow and unresponsive. So to get the purge done we have to stop the assimilator and work generation.
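
The cost described above comes from letting a full day of rows pile up and then deleting them in one pass. One common way to soften that, sketched here purely as an illustration, is to delete in small batches and commit between them, which may give other work a chance to interleave. The table name `completed_wu`, the `reported_at` column, and the `purge_in_batches` helper are all hypothetical and are not BOINC's actual schema or tooling; SQLite stands in for the project's database only to keep the sketch self-contained.

```python
import sqlite3


def purge_in_batches(conn, cutoff, batch_size=500):
    """Delete rows older than `cutoff` a few hundred at a time.

    Hypothetical helper: `completed_wu` and `reported_at` are assumed
    names for illustration, not taken from the real project database.
    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM completed_wu WHERE rowid IN ("
            "SELECT rowid FROM completed_wu WHERE reported_at < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        # Commit after every batch so each delete is a short transaction.
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing older than the cutoff is left
        total += cur.rowcount
    return total
```

Whether this would actually keep the assimilator and work generator responsive depends on the real database and load, which the thread does not describe in detail.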
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 3880 - Posted: 19 Jun 2008, 18:58:32 UTC - in response to Message 3879.  
Last modified: 19 Jun 2008, 19:04:15 UTC

... that when the purge starts, it's so expensive that it slows down the rest of the system and can't catch up, which pretty much makes everything slow and unresponsive. So to get the purge done we have to stop the assimilator and work generation.

Can't you automatically delete any WU that's been done for 24 hours or more? So rather than doing a full purge every 24 hours you'd be purging WUs continuously at roughly the same rate that they're coming in..

Perhaps keep a log of completed WUs (sorted by reported completion time by definition), and purge old (>24h) entries whenever you add a new one (i.e. a WU is reported)

if(thisWU was completed more than 24 hours ago) delete and examine next item;
else exit;

To put it in pseudo-code <.< (had a brain-fart, so that was the best I could do)
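
The pseudo-code above can be made concrete along these lines. This is only a minimal sketch of the idea in the post, not anything the project runs: the `CompletedLog` class, its method names, and the in-memory deque are assumptions made for illustration, and the 24-hour window is the figure suggested in the post.

```python
from collections import deque
from datetime import datetime, timedelta

RETENTION = timedelta(hours=24)  # retention window suggested in the post


class CompletedLog:
    """Log of completed WUs kept in report order (hypothetical helper)."""

    def __init__(self):
        self._entries = deque()  # (reported_at, wu_name), oldest first

    def report(self, wu_name, reported_at):
        """Record a newly reported WU, then purge expired entries.

        Returns the names of the WUs purged by this call.
        """
        self._entries.append((reported_at, wu_name))
        return self._purge(reported_at)

    def _purge(self, now):
        purged = []
        # Entries are sorted by report time, so stop at the first one
        # that is still inside the retention window.
        while self._entries and now - self._entries[0][0] > RETENTION:
            purged.append(self._entries.popleft()[1])
        return purged

    def __len__(self):
        return len(self._entries)
```

Because the log is already ordered by report time, each call only touches the entries that have actually expired, so the purge work is spread across incoming reports instead of arriving as one large job.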
