Hosts with only invalid results

Author	Message
Brickhead Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0	Message 66204 - Posted: 18 Feb 2017, 15:06:56 UTC Do the BOINC servers throttle the flow of new work to hosts that produce nothing but validate errors? If not, should they? Not a big deal, but it's a bit annoying to see good results invalidated by a few wingmen that seem to do nothing else. Example: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=606779

Arif Mert Kapicioglu Joined: 14 Dec 09 Posts: 161 Credit: 589,318,064 RAC: 0	Message 66209 - Posted: 20 Feb 2017, 18:55:09 UTC It keeps getting new WUs and there seems to be no check on this rig. Strange that there's no official response yet.

Brickhead Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0	Message 66210 - Posted: 20 Feb 2017, 21:08:02 UTC CPDN has had (for more than a decade) a dynamic value per computer: "Maximum daily WU quota per CPU". To me, this indicates that the throttling mechanism might already be present in the BOINC server software.

ritterm Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0	Message 66942 - Posted: 8 Jan 2018, 2:41:02 UTC Last modified: 8 Jan 2018, 2:42:23 UTC I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump... I've only returned a few results so far, but all of my inconclusives for the MilkyWay@Home app (not the N-body) include a wingman whose host (ID 643627) has been returning nothing but errors for at least the last 4 days. Can't something be done to halt the sending of new work to unreliable hosts?

Keith Myers Joined: 24 Jan 11 Posts: 708 Credit: 544,031,872 RAC: 125,580	Message 66971 - Posted: 17 Jan 2018, 18:58:04 UTC Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids? Over at Seti, we have an ongoing thread devoted to hosts that produce nothing but garbage. BOINC has a mechanism to punish hosts that produce errors but does not punish invalids. It also depends on each project whether to implement the default BOINC mechanism. We also know the mechanism is mostly broken and doesn't work well.

mikey Joined: 8 May 09 Posts: 3321 Credit: 520,600,015 RAC: 30,536	Message 66973 - Posted: 18 Jan 2018, 11:46:32 UTC - in response to Message 66942. ritterm wrote: "I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump... I've only returned a few results so far, but all of my inconclusives for the MilkyWay@Home app (not the N-body) include a wingman whose host (ID 643627) has been returning nothing but errors for at least the last 4 days. Can't something be done to halt the sending of new work to unreliable hosts?" The problem seems to be a lack of people power at the project level. There's more than one Admin that has quit crunching; I don't know if that's because they are 'former' Admins or because of some other reason, but when the Admins stop contributing to their own project, that's a problem to me. It means they have less and less contact with the daily goings-on at OUR level and depend more and more on messages, when most of us have no clue who to contact or how. In a thread last week one of the Senior Admins asked for a list of "unsent" workunits on our PCs. If I can see them for my account, can't they run a small script and get the list for everyone themselves? Was he taking the easy way out and just trying to be helpful? I don't know, but he never replied back in the thread. My "unsent" workunits were sent out to other computers by the next day though.

ritterm Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0	Message 66974 - Posted: 18 Jan 2018, 15:39:23 UTC - in response to Message 66971. Keith Myers wrote: "Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?" In my case, I did but got no response. The host continues to generate nothing but errors. At least it's being limited to 80 tasks/day and is contacting the project only once every 24 hours. So, it seems like some kind of restriction is being imposed. The user has another host and it's returning valid results.

mikey Joined: 8 May 09 Posts: 3321 Credit: 520,600,015 RAC: 30,536	Message 66978 - Posted: 19 Jan 2018, 11:44:59 UTC - in response to Message 66974. Keith Myers wrote: "Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?" ritterm wrote: "In my case, I did but got no response. The host continues to generate nothing but errors. At least it's being limited to 80 tasks/day and is contacting the project only once every 24 hours. So, it seems like some kind of restriction is being imposed. The user has another host and it's returning valid results." 80 workunits is the max any one GPU can get at a time; they do restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits.

ritterm Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0	Message 66979 - Posted: 19 Jan 2018, 14:07:17 UTC - in response to Message 66978. mikey wrote: "80 workunits is the max any one GPU can get at a time; they do restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits." Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later.

ritterm Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0	Message 66983 - Posted: 21 Jan 2018, 15:40:13 UTC Last modified: 21 Jan 2018, 15:40:32 UTC On Workunit 1563422500, all three of my wingmen returned computation errors which caused my result to be marked invalid. Each of their respective hosts (629667, 551062, and 740662) is returning errors, almost entirely.

ritterm Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0	Message 66984 - Posted: 21 Jan 2018, 16:13:38 UTC I've sent a PM to admins Sidd and Jake Weiss asking them to review this thread.

Keith Myers Joined: 24 Jan 11 Posts: 708 Credit: 544,031,872 RAC: 125,580	Message 66986 - Posted: 21 Jan 2018, 20:21:58 UTC - in response to Message 66979. mikey wrote: "80 workunits is the max any one GPU can get at a time; they do restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits." ritterm wrote: "Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later." That is the BOINC mechanism in play, and it is working as designed with the 24-hour backoff. I have seen many comments in various projects that bad hosts need to be expunged by the admins. But I have not once ever seen that action imposed. Seems that all admins are afraid of the publicity/recrimination of banning someone. Moderators ban users at will and frequently for violating posting policy. Why would banning an incorrectly configured host be any different?

mikey Joined: 8 May 09 Posts: 3321 Credit: 520,600,015 RAC: 30,536	Message 66987 - Posted: 21 Jan 2018, 22:58:07 UTC - in response to Message 66986. mikey wrote: "80 workunits is the max any one GPU can get at a time; they do restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits." ritterm wrote: "Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later." Keith Myers wrote: "That is the BOINC mechanism in play, and it is working as designed with the 24-hour backoff. I have seen many comments in various projects that bad hosts need to be expunged by the admins. But I have not once ever seen that action imposed. Seems that all admins are afraid of the publicity/recrimination of banning someone. Moderators ban users at will and frequently for violating posting policy. Why would banning an incorrectly configured host be any different?" One thing other projects do, though, is reduce the number of workunits a bad PC can get, and MW does NOT do that currently. For instance, if a PC returns 80 bad workunits today, it only gets 5 workunits tomorrow; if it returns those as invalid too, it only gets one workunit per day until it starts returning valid workunits again.
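The stepped reduction described in the post above could be sketched roughly as follows. This is only an illustration of the 80, 5, 1 example from the post, not how any BOINC project actually implements it: the tier list, the function name `next_daily_limit`, and the rule that a good day moves the host back up one tier are all assumptions made for the sketch.

```python
# Hypothetical tiers taken from the post's example: 80 -> 5 -> 1 tasks per day.
TIERS = [80, 5, 1]


def next_daily_limit(current: int, all_invalid: bool) -> int:
    """Return tomorrow's task limit given today's limit and today's outcome."""
    i = TIERS.index(current)  # raises ValueError if current is not a known tier
    if all_invalid:
        # A day of nothing but bad results drops the host one tier (floor: 1/day).
        i = min(i + 1, len(TIERS) - 1)
    else:
        # Assumed recovery rule: a day with valid results moves it back up a tier.
        i = max(i - 1, 0)
    return TIERS[i]


# A host that never returns a valid result ends up at one task per day.
limit = 80
for day in range(1, 5):
    print(f"day {day}: limit {limit}")
    limit = next_daily_limit(limit, all_invalid=True)
```

With a scheme like this, a permanently broken host would waste 80 tasks on the first day, 5 on the second, and one per day after that.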

Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 66989 - Posted: 22 Jan 2018, 15:39:11 UTC Hey Everyone, I just tried turning on some options to fix this problem. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4227 Let me know if there are any issues there. Jake

Keith Myers Joined: 24 Jan 11 Posts: 708 Credit: 544,031,872 RAC: 125,580	Message 66991 - Posted: 22 Jan 2018, 17:36:04 UTC - in response to Message 66989. That's great, Jake. Should help immensely with the turnaround time and the inconclusives. It might also cause users with badly performing hosts to ask for help or inquire why something seems to have changed with their account. But there likely won't be any increase in requests for help, as the misbehaving hosts are probably never looked at in the first place.

Koppany Joined: 11 Jul 08 Posts: 4 Credit: 1,357,818 RAC: 0	Message 66995 - Posted: 23 Jan 2018, 6:07:35 UTC - in response to Message 66991. I haven't received any PMs on this issue, but I have noticed a change in my account since January 4. I'm sure I have not made any configuration changes in 2018, so if someone could help me debug this, I'd be most grateful. Thanks.

Koppany Joined: 11 Jul 08 Posts: 4 Credit: 1,357,818 RAC: 0	Message 66996 - Posted: 23 Jan 2018, 7:07:43 UTC - in response to Message 66995. Replying to my own post, I resolved my problem with a lack of WUs by detaching from the project, then attaching. Not very elegant, but effective in this case. Apologies for the noise.

mikey Joined: 8 May 09 Posts: 3321 Credit: 520,600,015 RAC: 30,536	Message 66998 - Posted: 23 Jan 2018, 11:10:05 UTC - in response to Message 66996. Koppany wrote: "Replying to my own post, I resolved my problem with a lack of WUs by detaching from the project, then attaching. Not very elegant, but effective in this case. Apologies for the noise." We all try different things; some work and some don't. Your way worked and you are back to crunching again!! WOO HOO!!!

mikey Joined: 8 May 09 Posts: 3321 Credit: 520,600,015 RAC: 30,536	Message 67015 - Posted: 30 Jan 2018, 13:00:15 UTC - in response to Message 66989. Jake Weiss wrote: "Hey Everyone, I just tried turning on some options to fix this problem. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4227 Let me know if there are any issues there. Jake" Maybe you can look at this host and see why your fix isn't working on it? http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=763378 798 workunits and NOT ONE is valid, yet you guys keep sending them!!! Maybe THIS is why it takes soooo fricking long to get credits here. How many of these hosts are out there just clogging up the process?

Henk Haneveld Joined: 15 Aug 14 Posts: 6 Credit: 1,623,008 RAC: 296	Message 67016 - Posted: 31 Jan 2018, 9:21:20 UTC I think the problem is the high default setting (10000) of the "max tasks per day" parameter in the application details of a host. The value goes up with a valid return and down when a result is invalid. If the default value started lower (perhaps 100), a bad host would block itself real quick.
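To get a rough feel for how much the starting value matters, here is a small back-of-the-envelope model. It is a sketch under stated assumptions, not the BOINC server's actual logic: it assumes each invalid result lowers the per-host value by 1 (the post only says the value "goes down"), that the bad host returns one batch of 80 invalid tasks every 24 hours (the pattern reported earlier in the thread), and that the value never drops below 1. The function name `days_until_throttled` is made up for this example.

```python
def days_until_throttled(start_quota: int, batch: int = 80,
                         step: int = 1, floor: int = 1) -> int:
    """Count the days a host returning only invalid results still gets a full batch.

    Assumptions (not taken from BOINC itself): the quota drops by `step` per
    invalid result, the host burns one batch of `batch` tasks per day, and the
    quota never falls below `floor`.
    """
    quota = start_quota
    days = 0
    while quota >= batch:
        # The host receives a full batch and every result comes back invalid.
        quota = max(floor, quota - step * batch)
        days += 1
    return days


for start in (10000, 100):
    print(f"start={start}: full batches for {days_until_throttled(start)} day(s)")
```

Under these assumptions, a host starting at 10000 keeps drawing full batches for 125 days, while one starting at 100 is cut back after a single day. The real numbers depend on how the project's server actually adjusts the value.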