Welcome to MilkyWay@home

Hosts with only invalid results

Message boards : Number crunching : Hosts with only invalid results

Brickhead
Joined: 20 Mar 08
Posts: 108
Credit: 2,607,924,860
RAC: 0
Message 66204 - Posted: 18 Feb 2017, 15:06:56 UTC

Do the BOINC servers throttle the flow of new work to hosts that produce nothing but validation errors? If not, should they?

Not a big deal, but it's a bit annoying to see good results invalidated by a few wingmen that seem to do nothing else.

Example:
https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=606779

Arif Mert Kapicioglu
Joined: 14 Dec 09
Posts: 161
Credit: 589,318,064
RAC: 0
Message 66209 - Posted: 20 Feb 2017, 18:55:09 UTC

It keeps getting new WUs, and there seems to be no check on this rig. Strange that there's no official response yet.

Brickhead
Joined: 20 Mar 08
Posts: 108
Credit: 2,607,924,860
RAC: 0
Message 66210 - Posted: 20 Feb 2017, 21:08:02 UTC

For more than a decade, CPDN has had a dynamic per-computer value: "Maximum daily WU quota per CPU". To me, this suggests that a throttling mechanism may already be present in the BOINC server software.
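A per-CPU daily quota like the one described above could be enforced roughly as follows. This is a minimal Python sketch and not the BOINC server code. The class `HostQuota`, its fields, and the day-number reset are hypothetical, and the quota here is fixed rather than dynamic.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Hypothetical per-host state for a daily quota scaled by CPU count."""
    ncpus: int
    quota_per_cpu: int    # like CPDN's "Maximum daily WU quota per CPU"
    sent_today: int = 0
    day: int = -1         # day number of the last counter reset

    def grant(self, requested: int, today: int) -> int:
        """Return how many of `requested` tasks may be sent on day `today`."""
        if today != self.day:
            # First request of a new day: reset the counter.
            self.day = today
            self.sent_today = 0
        limit = self.quota_per_cpu * self.ncpus
        granted = max(0, min(requested, limit - self.sent_today))
        self.sent_today += granted
        return granted


host = HostQuota(ncpus=4, quota_per_cpu=10)  # 40 tasks per day in total
print(host.grant(30, today=1))  # 30
print(host.grant(30, today=1))  # 10, the rest of today's quota
print(host.grant(5, today=2))   # 5, the counter was reset for day 2
```

A "dynamic" version would also adjust `quota_per_cpu` based on how the host's results turn out.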

ritterm
Joined: 16 Jun 08
Posts: 93
Credit: 366,882,323
RAC: 0
Message 66942 - Posted: 8 Jan 2018, 2:41:02 UTC
Last modified: 8 Jan 2018, 2:42:23 UTC

I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump...

I've only returned a few results so far, but all of my inconclusives for the MilkyWay@Home app (not the N-body) include a wingman whose host (ID 643627) has been returning nothing but errors for at least the last 4 days.

Can't something be done to halt the sending of new work to unreliable hosts?

Keith Myers
Joined: 24 Jan 11
Posts: 712
Credit: 553,294,280
RAC: 55,426
Message 66971 - Posted: 17 Jan 2018, 18:58:04 UTC

Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?

Over at SETI@home, we have an ongoing thread devoted to hosts that produce nothing but garbage. BOINC has a mechanism to punish hosts that produce errors, but it does not punish invalids. It is also up to each project whether to implement the default BOINC mechanism.
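The difference between punishing errors and punishing invalids can be shown with a hypothetical Python sketch of a per-host quota update. The rule itself is an assumption for illustration: subtract one per penalized result, double on a valid result, and never drop below 1. The `penalize_invalid` switch is also hypothetical. Neither describes what the BOINC scheduler actually does.

```python
from enum import Enum


class Outcome(Enum):
    VALID = "valid"
    INVALID = "invalid"  # finished, but failed validation against the wingmen
    ERROR = "error"      # crashed or reported a computation error


def update_quota(quota: int, outcome: Outcome, *, max_quota: int,
                 penalize_invalid: bool) -> int:
    """Hypothetical daily-quota update applied after each returned result."""
    if outcome is Outcome.VALID:
        return min(max_quota, quota * 2)  # reward: double, up to the cap
    if outcome is Outcome.ERROR or penalize_invalid:
        return max(1, quota - 1)          # penalty: one unit, floor of 1
    return quota                          # invalid result, no penalty


def run(outcomes, start: int, **kwargs) -> int:
    """Apply a sequence of outcomes and return the final quota."""
    quota = start
    for outcome in outcomes:
        quota = update_quota(quota, outcome, **kwargs)
    return quota


bad_host = [Outcome.INVALID] * 500  # a host that returns nothing but invalids
print(run(bad_host, 100, max_quota=100, penalize_invalid=False))  # 100
print(run(bad_host, 100, max_quota=100, penalize_invalid=True))   # 1
```

With `penalize_invalid=False`, a host that only ever returns invalids keeps its full quota indefinitely.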

We also know the mechanism is mostly broken and doesn't work well.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 1
Message 66973 - Posted: 18 Jan 2018, 11:46:32 UTC - in response to Message 66942.  

> I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump...

> I've only returned a few results so far, but all of my inconclusives for the MilkyWay@Home app (not the N-body) include a wingman whose host (ID 643627) has been returning nothing but errors for at least the last 4 days.

> Can't something be done to halt the sending of new work to unreliable hosts?


The problem seems to be a lack of people power at the project level. More than one Admin has quit crunching. I don't know if that's because they are 'former' Admins or for some other reason, but when the Admins stop contributing to their own project, that's a problem to me. It means they have less and less contact with the daily goings-on at OUR level, and they depend more and more on messages from us, while most of us have no clue who to contact or how.

In a thread last week, one of the Senior Admins asked for a list of "unsent" workunits on our PCs. If I can see them for my account, can't they run a small script and get the list for everyone themselves? Was he taking the easy way out, or just trying to be helpful? I don't know, but he never replied in the thread. My "unsent" workunits were sent out to other computers by the next day, though.

ritterm
Joined: 16 Jun 08
Posts: 93
Credit: 366,882,323
RAC: 0
Message 66974 - Posted: 18 Jan 2018, 15:39:23 UTC - in response to Message 66971.  

> Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?

In my case, I did but got no response.

The host continues to generate nothing but errors. At least it's being limited to 80 tasks/day and is contacting the project only once every 24 hours. So it seems like some kind of restriction is being imposed. The user has another host, and it's returning valid results.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 1
Message 66978 - Posted: 19 Jan 2018, 11:44:59 UTC - in response to Message 66974.  

>> Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?

> In my case, I did but got no response.

> The host continues to generate nothing but errors. At least it's being limited to 80 tasks/day and is contacting the project only once every 24 hours. So it seems like some kind of restriction is being imposed. The user has another host, and it's returning valid results.


80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one, we can get another. So as that PC returns an invalid workunit, gets another, and trashes it, it could go through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits.

ritterm
Joined: 16 Jun 08
Posts: 93
Credit: 366,882,323
RAC: 0
Message 66979 - Posted: 19 Jan 2018, 14:07:17 UTC - in response to Message 66978.  

> 80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one, we can get another. So as that PC returns an invalid workunit, gets another, and trashes it, it could go through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits.

Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later.
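The arithmetic in the last two posts can be written as a rough upper bound. A host that trashes every task before its next scheduler contact can burn through at most its in-progress cap once per contact. This Python sketch assumes the 80-task cap mentioned above and treats the contact interval as the only other variable. The function name is hypothetical.

```python
def max_tasks_per_day(in_progress_cap: int, contact_interval_hours: float) -> float:
    """Rough upper bound on tasks a failing host can churn through per day.

    Assumes the host errors out every task before its next scheduler
    contact, reports them all at once, and is refilled to the cap.
    """
    contacts_per_day = 24 / contact_interval_hours
    return in_progress_cap * contacts_per_day


print(max_tasks_per_day(80, 24))  # 80.0: the 24-hour backoff cycle described above
print(max_tasks_per_day(80, 1))   # 1920.0: the same host contacting every hour
```

Under these assumptions, the backoff interval limits the damage far more than the in-progress cap does.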

ritterm
Joined: 16 Jun 08
Posts: 93
Credit: 366,882,323
RAC: 0
Message 66983 - Posted: 21 Jan 2018, 15:40:13 UTC
Last modified: 21 Jan 2018, 15:40:32 UTC

On Workunit 1563422500, all three of my wingmen returned computation errors which caused my result to be marked invalid. Each of their respective hosts (629667, 551062, and 740662) is returning errors, almost entirely.

ritterm
Joined: 16 Jun 08
Posts: 93
Credit: 366,882,323
RAC: 0
Message 66984 - Posted: 21 Jan 2018, 16:13:38 UTC

I've sent a PM to admins Sidd and Jake Weiss asking them to review this thread.

Keith Myers
Joined: 24 Jan 11
Posts: 712
Credit: 553,294,280
RAC: 55,426
Message 66986 - Posted: 21 Jan 2018, 20:21:58 UTC - in response to Message 66979.  

>> 80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one, we can get another. So as that PC returns an invalid workunit, gets another, and trashes it, it could go through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits.

> Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later.

That is the BOINC mechanism in play, and it is working as designed with the 24-hour backoff.

I have seen many comments in various projects that bad hosts need to be expunged by the admins, but I have never once seen that action taken. It seems that admins are afraid of the publicity or recrimination of banning someone. Yet moderators ban users at will, and frequently, for violating posting policy. Why would banning an incorrectly configured host be any different?

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 1
Message 66987 - Posted: 21 Jan 2018, 22:58:07 UTC - in response to Message 66986.  

>>> 80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one, we can get another. So as that PC returns an invalid workunit, gets another, and trashes it, it could go through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits.

>> Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later.

> That is the BOINC mechanism in play, and it is working as designed with the 24-hour backoff.

> I have seen many comments in various projects that bad hosts need to be expunged by the admins, but I have never once seen that action taken. It seems that admins are afraid of the publicity or recrimination of banning someone. Yet moderators ban users at will, and frequently, for violating posting policy. Why would banning an incorrectly configured host be any different?


One thing other projects do, though, is reduce the number of workunits a bad PC can get, and MW does NOT do that currently. For instance, if a PC returns 80 bad workunits today, it only gets 5 workunits tomorrow. If it returns those as invalid too, it only gets one workunit per day until it starts returning valid workunits again.
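The scheme described above can be sketched as a small ladder of daily allowances. The allowances 80, 5 and 1 come from the post. The rule that a single day with a valid result restores the full allowance is an assumption, since the post only says the limit lasts "until it starts returning valid workunits again". All names are hypothetical.

```python
LADDER = (80, 5, 1)  # daily allowances taken from the example in the post


def next_rung(rung: int, returned_valid: bool) -> int:
    """Index into LADDER for tomorrow, based on today's returns."""
    if returned_valid:
        return 0  # assumption: any valid result restores the full allowance
    return min(rung + 1, len(LADDER) - 1)  # step down, but never below 1 task


def simulate(valid_by_day: list[bool]) -> list[int]:
    """Allowance for each day, given whether that day's returns included a valid result."""
    rung, allowances = 0, []
    for returned_valid in valid_by_day:
        allowances.append(LADDER[rung])
        rung = next_rung(rung, returned_valid)
    return allowances


print(simulate([False, False, False, True, True]))  # [80, 5, 1, 1, 80]
```

A bad host drops to one task per day after two bad days and stays there until it produces a valid result.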

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66989 - Posted: 22 Jan 2018, 15:39:11 UTC

Hey Everyone,

I just tried turning on some options to fix this problem.

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4227

Let me know if there are any issues there.

Jake

Keith Myers
Joined: 24 Jan 11
Posts: 712
Credit: 553,294,280
RAC: 55,426
Message 66991 - Posted: 22 Jan 2018, 17:36:04 UTC - in response to Message 66989.  

That's great Jake. Should help immensely with the turnaround time and the inconclusives. It might also cause users with bad performing hosts to ask for help or inquire why something seems to have changed with their account.

But I don't expect much of an increase in requests for help, since the misbehaving hosts are likely never looked at in the first place.

Koppany
Joined: 11 Jul 08
Posts: 4
Credit: 1,357,960
RAC: 0
Message 66995 - Posted: 23 Jan 2018, 6:07:35 UTC - in response to Message 66991.  

I haven't received any PMs on this issue, but I have noticed a change in my account since January 4. I'm sure I have not made any configuration changes in 2018, so if someone could help me debug this, I'd be most grateful. Thanks.

Koppany
Joined: 11 Jul 08
Posts: 4
Credit: 1,357,960
RAC: 0
Message 66996 - Posted: 23 Jan 2018, 7:07:43 UTC - in response to Message 66995.  

Replying to my own post, I resolved my problem with a lack of WUs by detaching from the project, then attaching. Not very elegant, but effective in this case. Apologies for the noise.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 1
Message 66998 - Posted: 23 Jan 2018, 11:10:05 UTC - in response to Message 66996.  

> Replying to my own post, I resolved my problem with a lack of WUs by detaching from the project, then attaching. Not very elegant, but effective in this case. Apologies for the noise.


We all try different things; some work and some don't. Your way worked, and you are back to crunching again!! WOO HOO!!!

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 1
Message 67015 - Posted: 30 Jan 2018, 13:00:15 UTC - in response to Message 66989.  

> Hey Everyone,

> I just tried turning on some options to fix this problem.

> http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4227

> Let me know if there are any issues there.

> Jake


Maybe you can look at this host and see why your fix isn't working on it?

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=763378

798 workunits and NOT ONE is valid, yet you guys keep sending them!!!

Maybe THIS is why it takes soooo fricking long to get credits here. How many of these hosts are out there just clogging up the process?
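Finding hosts like this one is mostly a counting exercise over returned results. The Python sketch below assumes a hypothetical list of (host ID, outcome) records rather than the real BOINC database schema. The `min_finished` threshold is an arbitrary choice to avoid flagging brand-new hosts.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Result(NamedTuple):
    hostid: int
    outcome: str  # hypothetical labels: "valid", "invalid", "error", "pending"


def hosts_with_no_valid(results: Iterable[Result], min_finished: int = 20) -> list[int]:
    """IDs of hosts with at least `min_finished` finished results and none valid."""
    finished: Counter = Counter()
    valid: Counter = Counter()
    for r in results:
        if r.outcome == "pending":
            continue  # not validated yet, so it says nothing about the host
        finished[r.hostid] += 1
        if r.outcome == "valid":
            valid[r.hostid] += 1
    return sorted(h for h, n in finished.items()
                  if n >= min_finished and valid[h] == 0)


# Made-up sample data: host 1 only errors, host 2 has one valid result,
# host 3 has too few finished results to judge.
sample = ([Result(1, "error")] * 30
          + [Result(2, "invalid")] * 29 + [Result(2, "valid")]
          + [Result(3, "error")] * 5)
print(hosts_with_no_valid(sample))  # [1]
```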

Henk Haneveld
Joined: 15 Aug 14
Posts: 6
Credit: 1,664,288
RAC: 6
Message 67016 - Posted: 31 Jan 2018, 9:21:20 UTC

I think the problem is the high default setting (10000) of the "max tasks per day" parameter in the application details of a host.

The value goes up with a valid return and down when a result is invalid.

If the default value started lower (perhaps 100), a bad host would block itself real quick.
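The point about the starting value can be checked with a quick simulation. This Python sketch rests on several hypothetical assumptions:

- Each invalid result lowers the quota by one.
- The host is sent at most min(quota, 80) tasks per day.
- Every task it receives comes back invalid.

The real BOINC rule may differ.

```python
def days_until_throttled(start_quota: int, tasks_per_day: int = 80, floor: int = 1) -> int:
    """Days an always-invalid host needs to drive its daily quota down to `floor`."""
    quota, days = start_quota, 0
    while quota > floor:
        sent = min(quota, tasks_per_day)  # tasks the host can receive today
        quota = max(floor, quota - sent)  # each one comes back invalid
        days += 1
    return days


print(days_until_throttled(10000))  # 125 days with the current default
print(days_until_throttled(100))    # 2 days with the suggested default
```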

©2024 Astroinformatics Group