bad wus in the database
Author | Message |
---|---|
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working. thanks! |
Joined: 28 Dec 07 Posts: 26 Credit: 1,161,815 RAC: 0 | "there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working." All seems fine here. 3737's are running down and I'm filling up with 602's. Thanks for the info and the time spent getting it up again. It is appreciated. Join Cruncher Junkies on MilkyWay! |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | "there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working." you're getting 602s? that's a bit odd because that search should be over... but if you're getting them and they're working then that seems to be ok :) i'm a little worried because the number of unsent WUs seems to be holding steady around 15000... and i'm not generating any new ones. |
Joined: 16 Jan 08 Posts: 98 Credit: 1,371,299 RAC: 0 | |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | What i've done is removed all the WUs currently in the database -- because people shouldn't be getting 602s, especially the ones that are just going to error out. I've set the server to cancel any reported bad WUs, so hopefully it will automatically cancel the bad ones for everyone out there. Let me know if this works. Right now, everyone should only be getting gs_606 and your BOINC client should be automatically canceling anything else. If this isn't working please let me know. |
Joined: 15 Apr 08 Posts: 55 Credit: 24,047 RAC: 0 | "there were some bad WUs in the database -- i had to remove all gs_373282. right now the server says there are many available WUs to download, so let me know if you're getting these and they're working." It appears that all the wus my computer managed to crunch Thank you Travis |
Joined: 4 Apr 08 Posts: 3 Credit: 8,104,710 RAC: 0 | My quad core cancelled a few WU's, but now it isn't getting any new ones! Server status page mentions over 15,000 WU's available, so it's odd I ain't getting them. Running 64bit Linux btw. Any ideas? |
Joined: 15 Apr 08 Posts: 55 Credit: 24,047 RAC: 0 | "My quad core cancelled a few WU's, but now it isn't getting any new ones! Server status page mentions over 15,000 WU's available, so it's odd I ain't getting them. Running 64bit Linux btw. Any ideas?" Got one, then nothing. All the work done yesterday was cancelled with no credits. What a farce. |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | "My quad core cancelled a few WU's, but now it isn't getting any new ones! Server status page mentions over 15,000 WU's available, so it's odd I ain't getting them. Running 64bit Linux btw. Any ideas?" I think this was part of the problem with WU generation. For whatever reason the server was thinking more WUs were available than actually were. Everything looks like it should be running smoothly now and all the WUs related to any search other than gs_606 and gs_607 should be automatically canceled client-side (i hope). I also found a few heavily loaded tables in the database that might have been slowing things down even more and cleaned these out as well. So now I'm hoping work should start flowing smoothly and there shouldn't be any more assimilator or validator backups -- which means you'll be getting credit in a timely fashion and not losing out on any. |
Joined: 28 Dec 07 Posts: 26 Credit: 1,161,815 RAC: 0 | "-- which means you'll be getting credit in a timely fashion and not losing out on any." Looks like all pending is cleared but without credits being awarded. :( Join Cruncher Junkies on MilkyWay! |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | "-- which means you'll be getting credit in a timely fashion and not losing out on any." I cleared out all the pending because the vast majority of these were gs_602 -- which would only error out anyway, so no one was going to be getting credit for them. I got the DB back to a clean slate, and now that it's constantly running cleanup daemons i think we should have smooth sailing from here on out. |
Joined: 8 Apr 08 Posts: 45 Credit: 161,943,995 RAC: 0 | "-- which means you'll be getting credit in a timely fashion and not losing out on any." All 3 computers, XP and Linux, have downloaded gs_607 and gs_606 in batches of twenty and are processing. OWN every thing I need EARN.. enough to live !!! WANT a solar array on the roof so I can run a BOINC farm( DREAM on!!) NO wife NO kids NO troubles |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | "-- which means you'll be getting credit in a timely fashion and not losing out on any." well that's good news :) please keep me updated about any errors these WUs might have. |
Joined: 26 Mar 08 Posts: 15 Credit: 2,045,502 RAC: 0 | XP Pro SP3 32-bit & XP Home SP3 32-bit: gs_606 & gs_607 validated and granted credit. Well done, Travis. |
Joined: 8 Apr 08 Posts: 45 Credit: 161,943,995 RAC: 0 | It's 19:08 Western Australian Time here (UTC +8). 132 units completed without missing a beat, maintaining the 20 unit allocation, including on a headless command-line Linux installation of Ubuntu 8.04. Might have overcome the server problems! Well done Travis !! OWN every thing I need EARN.. enough to live !!! WANT a solar array on the roof so I can run a BOINC farm( DREAM on!!) NO wife NO kids NO troubles |
Joined: 7 Sep 07 Posts: 444 Credit: 5,712,523 RAC: 0 | Successfully completed 606 and 607's on my various hosts. Great stuff, Travis. Rod |
Joined: 22 Nov 07 Posts: 36 Credit: 1,224,316 RAC: 0 | 606 and 607 are downloading to all my machines - no "problems", other than the one Travis mentioned about WUs being deleted almost as fast as they receive credit. Personally, I'd rather it be this way than have the DB slow and cluttered so I don't receive any new WUs... C Team MacNN |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 | "-- which means you'll be getting credit in a timely fashion and not losing out on any." Hmmm... I had caught the bad 602's early on and NNT'ed my hosts, and canned them at my end when you made the announcement about the problem. I only had one of them which had run and was in 'orphaned' Pending limbo, so I just purged it from my local personal task database manually. OTOH, I had 38 pending of the batch which was being sent out after the 602's and before the latest work cancellation. Needless to say, when I went to log data earlier, it was a bit distressing to see all the pending poofed (apparently), as well as the almost 60 in progress at the time! ;-) I hadn't looked at the message boards before I started logging, and my first reaction was "!@#$..., WTF is going on here!!??" :-D As it turned out, when I reconciled each of my host accounts I found that only 2 of the 38 pendings didn't get their credit granted. So it would seem that the loss of credit for the pendings might not be as bad as some folks have suggested. I guess the degree of lost credit would depend mostly on individual circumstances. One suggestion about setting up the database purging processes though. You might want to consider setting some delay to allow participants to examine their recently completed work before poofing it. Something like 12 to 24 hours usually is adequate, although some other production projects hold them for longer before purging. One would think even with the somewhat unique 'on the fly' work generation from the ongoing output here, you should be able to delay the purging of completed work for a short interval without bringing everything to its knees. Regards, Alinator |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 | That's what we were trying to do in the past. unfortunately, after a whole day of crunching there's enough in the database that when the purge starts, it's so expensive that it slows down the rest of the system and can't catch up, which pretty much makes everything slow and unresponsive. so to get the purge done we have to stop the assimilator and work generation. |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 | "... that when the purge starts, it's so expensive that it slows down the rest of the system and can't catch up, which pretty much makes everything slow and unresponsive. so to get the purge done we have to stop the assimilator and work generation." Can't you automatically delete any WU that's been done for 24 hours or more? So rather than doing a full purge every 24 hours you'd be purging WUs continuously at roughly the same rate that they're coming in. Perhaps keep a log of completed WUs (sorted by reported completion time by definition), and purge old (>24h) entries whenever you add a new one (i.e. a WU is reported). To put it in pseudo-code <.< (had a brain-fart, so that was the best I could do): `if (thisWU was completed more than 24 hours ago) { delete it and examine the next item; } else { exit; }` |
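
For illustration, the rolling purge suggested in the last post could be sketched roughly as below. This is only a sketch under stated assumptions, not how the project's server actually works: the `CompletedLog` class, its method names, and the in-memory `deque` are all hypothetical stand-ins for the real database tables and daemons, and the 24-hour window is simply the figure proposed in the thread.

```python
from collections import deque
from datetime import datetime, timedelta

# Assumed retention window, taken from the 24-hour figure suggested above.
RETENTION = timedelta(hours=24)


class CompletedLog:
    """Hypothetical log of completed WUs, kept in report order (oldest first)."""

    def __init__(self, retention=RETENTION):
        self.retention = retention
        self._log = deque()  # entries are (reported_at, wu_name)

    def report(self, wu_name, reported_at):
        """Record a newly reported WU, then purge expired entries.

        Returns the names of the WUs purged by this call.
        """
        self._log.append((reported_at, wu_name))
        return self._purge(reported_at)

    def _purge(self, now):
        purged = []
        # The log is ordered by report time, so we can stop at the first
        # entry that is still inside the retention window.
        while self._log and now - self._log[0][0] > self.retention:
            _, name = self._log.popleft()
            purged.append(name)
        return purged

    def __len__(self):
        return len(self._log)


if __name__ == "__main__":
    start = datetime(2008, 6, 1, 12, 0)  # arbitrary example timestamp
    log = CompletedLog()
    log.report("wu_a", start)
    log.report("wu_b", start + timedelta(hours=23))
    # Reporting a third WU 25 hours after the first expires only "wu_a".
    print(log.report("wu_c", start + timedelta(hours=25)))
```

Because each report only removes the few entries that have just expired, the purge cost is spread across the day instead of arriving as one expensive batch. Whether that holds up against a real database would still depend on indexing and on how the deletes interact with the assimilator and validator.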