Reverting Change to Remove Unreliable Hosts
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Hey everyone, I am going to revert the change that enabled BOINC's built-in reliable-hosts option. It seems to be having unintended consequences for the project's usability for some users. In the future, Sidd and I will look into manually removing the worst offenders, the hosts that keep sending back workunits that error out. To anyone affected, I apologize if you haven't been able to crunch for us recently. Jake
Joined: 31 Oct 10 Posts: 83 Credit: 38,632,375 RAC: 0
I was kind of hoping that once those offenders were removed, the number of WUs in our cache could be increased to carry us through any server downtime. I know my machine isn't anywhere near the fastest, but after any outage longer than a few hours I start to run out of work.
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Sadly, the issue could not be resolved using the current BOINC defaults for removing unreliable hosts. Enabling BOINC's reliable-host detection actually limited the workunits sent to hosts that were trying to crunch both CPU and GPU applications: the server would, seemingly at random, send work for one or the other, but not both. This was causing problems for a lot of users, so we had to turn the option off. I am going to look into implementing a custom version of reliable-host detection in the near future to help reduce the invalid workunits caused by the many hosts that return nothing but errors. It might be a while before I get it up and running, though. Jake
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
Hey everyone, I finally figured out how to start banning hosts with very high error rates. I will be spending some time with Sidd and the other MilkyWay@home developers to work out a good way to decide which hosts deserve to be banned and for how long. Jake
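The post leaves the ban criteria open, so the following is only a minimal Python sketch of one possible rule, assuming the server can count each host's recent results and errors. `HostStats`, `ban_duration`, and every threshold below are hypothetical illustrations, not MilkyWay@home or BOINC code.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Hypothetical thresholds: the thread leaves the real criteria open.
MIN_RESULTS = 20             # too little history to judge below this
MAX_ERROR_RATE = 0.9         # ban only hosts that almost never succeed
BASE_BAN = timedelta(days=1)
MAX_BAN = timedelta(days=30)


@dataclass
class HostStats:
    host_id: int
    recent_results: int      # results returned in the evaluation window
    recent_errors: int       # how many of those errored or were invalid
    prior_bans: int = 0      # earlier bans for this host


def ban_duration(stats: HostStats) -> Optional[timedelta]:
    """Return how long to ban the host, or None if it should stay active."""
    if stats.recent_results < MIN_RESULTS:
        return None
    error_rate = stats.recent_errors / stats.recent_results
    if error_rate <= MAX_ERROR_RATE:
        return None
    # Double the ban for each repeat offence, capped at MAX_BAN.
    return min(BASE_BAN * (2 ** stats.prior_bans), MAX_BAN)
```

Under these assumed values, a host with 95 errors out of 100 recent results and two earlier bans would get a four-day ban, while a host with only a handful of results is never judged. The escalating duration is one way to address "how long they should be banned for" without permanently excluding a host.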
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
Hey everyone, to me, sending them one to five WUs per day, which they then trash, would be enough to establish that the host is still unreliable. If it returns a valid WU, then send it a handful more, and if it returns those as valid, open the pipe and welcome it back. This does mean being pretty sure that the host IS NOT reliable in the first place, though, and those criteria are up to you guys. To me, some cases are pretty obvious; others are more iffy. I don't think any of us want a PC banned that's going through a 'rough patch', or that crashed while its user is working on fixing it, but none of us want to go back to having a PC as our wingman that hasn't returned a valid WU in six months or more. I personally wouldn't mind having a few WUs like that, but pages of them would not be a good thing.
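The retest scheme suggested in this post can be read as a small state machine: banned hosts get a trickle of probe workunits, one valid result earns a small retest batch, and a fully valid batch restores normal work. This is a minimal Python sketch under stated assumptions: `Host`, `HostState`, `work_limit`, and both quota constants are hypothetical, and a real scheduler would also have to track work still in progress.

```python
from dataclasses import dataclass
from enum import Enum


class HostState(Enum):
    BANNED = "banned"        # receives only a trickle of probe workunits
    RETESTING = "retesting"  # earned a small batch after a valid probe
    TRUSTED = "trusted"      # normal work allocation restored


# Hypothetical values: the post suggests "one to 5" per day and
# "a handful more", so these numbers are placeholders.
PROBE_WUS_PER_DAY = 2
RETEST_BATCH_SIZE = 5


@dataclass
class Host:
    host_id: int
    state: HostState = HostState.BANNED
    retest_valid: int = 0  # valid results returned while RETESTING

    def work_limit(self, normal_limit: int) -> int:
        """Maximum workunits the scheduler should send this host per day."""
        if self.state is HostState.BANNED:
            return PROBE_WUS_PER_DAY
        if self.state is HostState.RETESTING:
            return RETEST_BATCH_SIZE
        return normal_limit

    def record_result(self, valid: bool) -> None:
        """Advance the probation state machine on one returned result."""
        if self.state is HostState.BANNED:
            if valid:
                # A valid probe earns the host a retest batch.
                self.state = HostState.RETESTING
                self.retest_valid = 0
        elif self.state is HostState.RETESTING:
            if not valid:
                # Any error during the retest sends the host back to probes.
                self.state = HostState.BANNED
                self.retest_valid = 0
            else:
                self.retest_valid += 1
                if self.retest_valid >= RETEST_BATCH_SIZE:
                    self.state = HostState.TRUSTED
        # TRUSTED hosts are not changed here; re-banning them is left to
        # whatever error-rate rule the project adopts.
```

Keeping the probe quota small means a still-broken host only wastes a few workunits per day, while a repaired host works its way back without manual intervention, which matches the concern about not punishing hosts going through a "rough patch".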
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0
That's a good idea. I agree that we want some automated way to retest hosts that have been sending errors. Additionally, I have added a way for people to see whether their hosts are currently suspended. That way, if people are actively trying to fix their hosts, we can work with them to unsuspend those hosts while they do. Jake
Joined: 2 Mar 18 Posts: 9 Credit: 457,043,383 RAC: 0
Don't forget that a beginner can't post on the forum because of the credit limit. If a new user's computer starts producing errors, it's impossible for them to get in contact with you; I had this problem last week.
Joined: 2 Oct 16 Posts: 167 Credit: 1,008,062,758 RAC: 3,336
These hosts return all or nearly all errors:
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=512556
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=606991
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=191911
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=737902
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=718669
©2024 Astroinformatics Group