Welcome to MilkyWay@home

Insta-Purge

Message boards : Number crunching : Insta-Purge

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 10300 - Posted: 11 Feb 2009, 16:59:25 UTC

Well, it has been around 12 hours since the last set of repairs to the project, and everything still seems to be just as snappy as it was after the last backend restart. The server status looks like the backend is holding its own as well.

So can we get off insta-purge and go back to something a little more user friendly?

I would suggest 12 hours to start with. Since you increased the in-progress quota by 50%, decreasing the purge interval by the same percentage from what it was before the disaster would seem appropriate.

Of course, this assumes that the patch allowing purge increments of less than a whole day wasn't part of the problem after the big meltdown. ;-)

Alinator
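
To make the arithmetic behind the 12-hour suggestion concrete, here is a minimal sketch. It assumes the purge interval before the outage was 24 hours, which the thread never states, and the function names are purely illustrative. It also shows a second possible reading, under the further assumption that the number of rows waiting to be purged scales with quota times retention time.

```python
# Assumption (not stated in the thread): the purge interval before the outage
# was one whole day. The 50% figure comes from the post above.
PREVIOUS_PURGE_HOURS = 24.0
QUOTA_INCREASE = 0.50


def cut_by_same_percentage(hours: float, fraction: float) -> float:
    """Shrink the interval by the same percentage the quota grew."""
    return hours * (1.0 - fraction)


def keep_backlog_constant(hours: float, quota_increase: float) -> float:
    """Alternative reading: keep (tasks in flight) x (retention time) constant."""
    return hours / (1.0 + quota_increase)


suggested_hours = cut_by_same_percentage(PREVIOUS_PURGE_HOURS, QUOTA_INCREASE)
constant_backlog_hours = keep_backlog_constant(PREVIOUS_PURGE_HOURS, QUOTA_INCREASE)

print(f"Cut by the same percentage: {suggested_hours:.0f} h")
print(f"Constant backlog:           {constant_backlog_hours:.0f} h")
```

Under the 24-hour assumption, the first calculation gives the 12 hours suggested in the post; the second, stricter reading would give 16 hours.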

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 10301 - Posted: 11 Feb 2009, 17:08:43 UTC

It seems to be stable again. At least up it in increments if needed: 1 hour, 3, 6, 12, 24. 6 would be the minimum I'd want for checking on results; 24-48 is preferable.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10315 - Posted: 11 Feb 2009, 19:20:38 UTC - in response to Message 10301.  

I'm still tracking down the leaky workunit issue, so it'll be on insta-purge until I figure that out. Afterwards I'll start increasing it and see how the server handles it.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 10316 - Posted: 11 Feb 2009, 19:48:08 UTC

Use a stick of gum and plug the hole! ;p
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 10319 - Posted: 11 Feb 2009, 20:05:08 UTC - in response to Message 10315.  
Last modified: 11 Feb 2009, 20:05:44 UTC

OK...

If it helps, I have a Zombie on 20335 and 23908.

You're on your own for the stealth ones! ;-)

Alinator

Gavin Shaw
Joined: 16 Jan 08
Posts: 98
Credit: 1,371,299
RAC: 0
Message 10357 - Posted: 12 Feb 2009, 0:08:31 UTC

Seems like my laptop is somehow involved in this.

Laptop has many units assigned to it that have not been reported back, but they are not on my system. They will time out and expire. Wonder what is happening here?

I noticed that it seems to happen when I am asleep in the early morning.

Never surrender and never give up. In the darkest hour there is always hope.


DaveSun
Joined: 10 Nov 07
Posts: 28
Credit: 2,549,231
RAC: 0
Message 10358 - Posted: 12 Feb 2009, 0:17:04 UTC

Here are some leakers for you: 7933 and 7968, found on my task list and not purged yet... strange.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 10368 - Posted: 12 Feb 2009, 2:09:29 UTC

I have quite a few from 8 Feb 09 still showing up. Task IDs/Result IDs 2911 to 2923, 2953 to 2956 and 15663, 15664, 15672 and 15675.

Plus quite a few from 9 Feb as well.

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 10375 - Posted: 12 Feb 2009, 5:01:52 UTC
Last modified: 12 Feb 2009, 5:03:23 UTC

Hmmm...

Due to pilot error, I allowed a couple of my hosts which aren't currently supported to contact the project and they downloaded. Since they would have all errored out anyway, I aborted the tasks.

Interestingly, this seems to have resulted in the WUs becoming Zombified. A number of them have been reissued, run, returned, and validated, yet remain unpurged hours later.

So it appears when a WU goes to extra replications, the DB loses track of the WU state, at least as far as flagging it for purging purposes.

Alinator

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10448 - Posted: 13 Feb 2009, 2:22:34 UTC - in response to Message 10375.  

I think I found it. These weren't having a result set as canonical, so, for whatever reason, that would make their transition time += 10 days. I think this was the cause of the zombies.
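
The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `WorkUnit` class, its field names, and the `process` function are hypothetical and are not taken from the project's server code. The only detail drawn from the post is that a workunit with no canonical result has its transition time pushed out by 10 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

DEFER = timedelta(days=10)  # the "+= 10 days" mentioned in the post


@dataclass
class WorkUnit:
    """Hypothetical stand-in for a workunit row; not the real schema."""
    wu_id: int
    transition_time: datetime
    canonical_result_id: Optional[int] = None  # None: no result set as canonical


def process(wu: WorkUnit, now: datetime) -> bool:
    """Toy transition step. Returns True if the WU may be purged now."""
    if wu.canonical_result_id is None:
        # No canonical result: the WU is deferred instead of purged,
        # so it lingers in the task list as a "zombie".
        wu.transition_time += DEFER
        return False
    return wu.transition_time <= now


now = datetime(2009, 2, 13, 2, 0, 0)
healthy = WorkUnit(wu_id=1, transition_time=now, canonical_result_id=42)
zombie = WorkUnit(wu_id=2, transition_time=now)  # validated, but no canonical set

healthy_purged = process(healthy, now)
zombie_purged = process(zombie, now)
print(healthy_purged, zombie_purged, zombie.transition_time)
```

In this model the workunit without a canonical result is never purged on the current pass and is not looked at again for another 10 days, which would match tasks remaining visible hours after validation.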

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 10451 - Posted: 13 Feb 2009, 3:25:01 UTC - in response to Message 10448.  

LOL...

Yes, all of my Zombies have gone poof. :-)

Funny you should mention the part about no canonical being shown. I was going to ask about that, but figured you left it out on purpose when you integrated the validator and assimilator (give self slap on head). ;-)

Next time I see something like that I will mention it right away, instead of being stupid! Sheesh... I've been playing this game since before they even called them PC's. You'd think I'd know better than that. :-D

Alinator

Phil
Joined: 13 Feb 08
Posts: 1124
Credit: 46,740
RAC: 0
Message 10476 - Posted: 13 Feb 2009, 11:46:26 UTC - in response to Message 10448.  

Yep, all gone!

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 10538 - Posted: 13 Feb 2009, 22:03:33 UTC
Last modified: 13 Feb 2009, 22:26:18 UTC

OK...

What I'm noticing now from my end is that a canonical result is shown for completed tasks, but the Granted Credit shows as zero in the WU info section, even though it displays the granted credit in the detail section of the WU summary page.

Alinator

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10545 - Posted: 13 Feb 2009, 22:26:46 UTC - in response to Message 10538.  

Think I fixed this just now.

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 10617 - Posted: 14 Feb 2009, 2:45:05 UTC - in response to Message 10545.  

Yep, everything looks OK from here.

Alinator

WimTea
Joined: 16 Nov 07
Posts: 23
Credit: 4,774,710
RAC: 0
Message 10725 - Posted: 14 Feb 2009, 21:10:00 UTC

Since the probs seem to be solved, it would be nice if insta-purge were converted to, say, in-an-hour-purge to start with. Or are we not there yet?

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10726 - Posted: 14 Feb 2009, 21:23:51 UTC - in response to Message 10725.  

I've fixed most of the zombie WU problem, but I'm not quite sure if it's all fixed yet.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 10728 - Posted: 14 Feb 2009, 21:25:57 UTC

Shouldn't that be enough to extend the time to at least an hour?
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10729 - Posted: 14 Feb 2009, 21:33:40 UTC - in response to Message 10728.  

It makes it a lot harder (or impossible) to tell if the WUs I'm looking at in the database are ones that will be deleted after the hour or not at all.

With the insta-purge, if something sticks around I know it's problematic.
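
The debugging argument above can be sketched in a few lines. This is a toy model rather than the project's actual purge logic: the function name and the two status labels are made up for illustration, and it ignores how often the purge job actually runs.

```python
from datetime import datetime, timedelta


def status_if_still_present(completed_at: datetime, now: datetime,
                            purge_delay: timedelta) -> str:
    """Classify a finished WU that is still sitting in the database."""
    if now - completed_at >= purge_delay:
        return "suspect"    # past its purge time, so it should be gone already
    return "ambiguous"      # could be purged later, or could be a leak


now = datetime(2009, 2, 14, 21, 30)
completed_at = now - timedelta(minutes=20)  # hypothetical WU finished 20 min ago

insta = status_if_still_present(completed_at, now, timedelta(0))
hourly = status_if_still_present(completed_at, now, timedelta(hours=1))
print(insta, hourly)
```

With a zero delay, every finished WU that is still present is immediately suspect. With a one-hour delay, the same WU cannot be told apart from one that is simply waiting its turn.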

msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 10730 - Posted: 14 Feb 2009, 21:43:21 UTC - in response to Message 10729.  

And are you still seeing things hanging about?
I am the Kittyman.

Please visit and give a Click for Seti City.



