Welcome to MilkyWay@home

Work fetch poor

Message boards : Number crunching : Work fetch poor
Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 49034 - Posted: 26 May 2011, 12:05:58 UTC

Hi,

Since yesterday my systems have needed to be updated manually from time to time.
The whole batch of WUs is uploaded and ... nothing more happens for an hour or so unless I intervene manually.
In 'earlier days' there were never more than three WUs uploaded and not picked up automatically.
Or, when only two or three WUs were left uncrunched, the finished WUs were picked up and new ones delivered.
I'm not sure, but maybe this came along with the new BOINC version, which I installed two days ago.

Are there workarounds?
ID: 49034 · Rating: 0
Berserk_Tux
Joined: 2 Jan 08
Posts: 79
Credit: 365,471,675
RAC: 0
Message 49037 - Posted: 26 May 2011, 12:39:36 UTC - in response to Message 49034.  

I have the same problem.

ID: 49037 · Rating: 0
cncguru
Joined: 11 Jun 10
Posts: 329
Credit: 1,166,222,661
RAC: 0
Message 49039 - Posted: 26 May 2011, 12:47:05 UTC

I can second that!
This morning (for me, at least) they had the server down a couple of times doing who knows what, and I have already had to update manually 3 times; otherwise my guys sit spinning their wheels, going nowhere.
This has only been happening lately, but collectively I have lost approx. 60K RAC, which has previously only happened after extended outages.
So what's up with these new WUs?
Are they somehow affecting work fetch??
I have seen it happen on ALL the BOINC clients. I switched between them to see whether it was only one, but it happens no matter which client I use.
ID: 49039 · Rating: 0
Sunny129
Joined: 25 Jan 11
Posts: 271
Credit: 346,072,284
RAC: 0
Message 49047 - Posted: 26 May 2011, 15:11:45 UTC - in response to Message 49039.  

So what's up with these new WUs?
Are they somehow affecting work fetch??

i too thought that it was the new "test" WU's that Travis started sending out yesterday. but after investigating the problem further, i'm not so sure that the test WU's are directly responsible for our work fetch issues. sure, at first it seemed as though the initial errors produced by the first round of test WU's coincided with the work fetch outages i was experiencing. then my host started seeing the 4th round of test WU's and processed several of them without error. but even after a few hours of error-free crunching, my host saw another work fetch outage that lasted approx. 60 minutes. then it happened again this morning, even though my host's results had been error-free since yesterday afternoon.

i looked at BOINC's message log and discovered that my host didn't get any new work because it tried to fetch while the server was "down for maintenance." so i tried to manually contact the server, only to find that the project was either still down for maintenance or down for maintenance again. i clicked on the projects tab and found that the countdown had begun, and that BOINC would not attempt the next server contact/work fetch for another 60 minutes.

that was my moment of clarity - the project is designed to offset the next server contact/work fetch by 60 minutes for any host that attempts to contact the server while it is down for maintenance, obviously to prevent the server from being overloaded with host queries. and i have no problem with this, particularly if the server is going to be down for an hour or longer. but, as was the case this morning, the server was only down for a few minutes at a time, and my host just happened to try and make contact with the server during one of those small windows of down time, automatically bumping my next work fetch back 60 minutes, even if the server came back online a minute later (again, unless i'm there to manually update the project).
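
a rough sketch of that arithmetic in python, assuming the 60-minute deferral described above and a made-up outage timeline. the function names are hypothetical and are not anything from BOINC itself:

```python
from datetime import datetime, timedelta

# Assumption taken from the post: a contact attempt that hits a maintenance
# window pushes the next automatic attempt back by 60 minutes.
MAINTENANCE_BACKOFF = timedelta(minutes=60)


def next_contact_after_failure(attempt_time, backoff=MAINTENANCE_BACKOFF):
    """Return when the host would next contact the server on its own."""
    return attempt_time + backoff


def idle_after_outage(attempt_time, outage_end, backoff=MAINTENANCE_BACKOFF):
    """Time the host keeps waiting after the server is already back."""
    next_attempt = next_contact_after_failure(attempt_time, backoff)
    # If the outage outlasts the deferral, no time is wasted by the deferral.
    return max(next_attempt - outage_end, timedelta(0))


# Hypothetical timeline: the server goes down at 12:00 for five minutes,
# and the host happens to ask for work at 12:02.
outage_start = datetime(2011, 5, 26, 12, 0)
outage_end = outage_start + timedelta(minutes=5)
attempt = outage_start + timedelta(minutes=2)

wasted = idle_after_outage(attempt, outage_end)
print(f"next automatic contact: {next_contact_after_failure(attempt):%H:%M}")
print(f"idle after server returned: {wasted}")
```

with these made-up numbers the next automatic contact lands at 13:02, so the host waits 57 minutes after the server is already back, unless someone updates the project manually.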

i figure this was also the case yesterday when we started seeing the first round of test WU's error out. i would imagine implementation of the next round of test WU's would have required a short window of project down time. some folks were seeing errors with the 1st, 2nd, and 3rd rounds of test WU's, and things didn't settle down until the 4th round of test WU's was distributed. that probably means that Travis had to shut down some of the project servers at least a few times yesterday, causing any host that tried to contact the server during that time to have to wait 60 minutes for another server contact/work fetch.

so it seems that, while the error rate of the test WU's that initially went out yesterday may have been an indirect cause of all our work fetch issues, the direct cause was simply the project being "down for maintenance."
ID: 49047 · Rating: 0
cncguru
Joined: 11 Jun 10
Posts: 329
Credit: 1,166,222,661
RAC: 0
Message 49048 - Posted: 26 May 2011, 15:36:29 UTC

Thank you Sunny129!!
ID: 49048 · Rating: 0