Hosts with only invalid results
Author | Message |
---|---|
Send message Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0 |
Do the BOINC servers throttle the flow of new work to hosts that produce nothing but validate errors? If not, should they? Not a big deal, but it's a bit annoying to see good results invalidated by a few wingmen that seem to do nothing else. Example: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=606779 |
Send message Joined: 14 Dec 09 Posts: 161 Credit: 589,318,064 RAC: 0 |
It keeps getting new WUs, and there seems to be no check on this rig. Strange that there's no official response yet. |
Send message Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0 |
CPDN has had (for more than a decade) a dynamic value per computer: "Maximum daily WU quota per CPU". To me, this indicates that the throttling mechanism might already be present in the BOINC server software. |
Send message Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump... I've only returned a few results so far, but all of my inconclusives for the MilkyWay@Home app (not the N-body) include a wingman whose host (ID 643627) has been returning nothing but errors for at least the last 4 days. Can't something be done to halt the sending of new work to unreliable hosts? |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 556,392,887 RAC: 53,916 |
Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids? Over at Seti, we have an ongoing thread devoted to hosts that produce nothing but garbage. BOINC has a mechanism to punish hosts that produce errors but does not punish invalids. It also depends on each project whether to implement the default BOINC mechanism. We also know the mechanism is mostly broken and doesn't work well. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump..." The problem seems to be a lack of people power at the project level. More than one Admin has quit crunching. I don't know whether that's because they are 'former' Admins or for some other reason, but when the Admins stop contributing to their own project, that's a problem to me. It means they have less and less contact with the daily goings-on at OUR level and depend more and more on messages, when most of us have no clue who or how to contact. In a thread last week one of the Senior Admins asked for a list of "unsent" workunits on our PCs. If I can see them for my account, can't they run a small script and get the list for everyone themselves? Was he taking the easy way out, or just trying to be helpful? I don't know, but he never replied back in the thread. My "unsent" workunits were sent out to other computers by the next day, though. |
Send message Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
Quote: "Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?" In my case, I did but got no response. The host continues to generate nothing but errors. At least it's being limited to 80 tasks/day and is contacting the project only once every 24 hours. So, it seems like some kind of restriction is being imposed. The user has another host and it's returning valid results. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?" 80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits. |
Send message Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
Quote: "80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits." Right. But there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later. |
Send message Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
On Workunit 1563422500, all three of my wingmen returned computation errors, which caused my result to be marked invalid. Each of their respective hosts (629667, 551062, and 740662) is returning almost nothing but errors. |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 556,392,887 RAC: 53,916 |
Quote: "80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits." That is the BOINC mechanism in play, and it is working as designed with the 24-hour backoff. I have seen many comments in various projects that bad hosts need to be expunged by the admins, but I have never once seen that action imposed. It seems that all admins are afraid of the publicity/recrimination of banning someone. Moderators ban users at will, and frequently, for violating posting policy. Why would banning an incorrectly configured host be any different? |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "80 workunits is the max any one GPU can get at a time; they restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it too, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits." One thing other projects do, though, is reduce the number of workunits a bad PC can get, and MW does NOT do that currently. For instance, if a PC returns 80 bad workunits today, it only gets 5 workunits tomorrow; if it returns those as invalid too, it only gets one workunit per day until it starts returning valid workunits again. |
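The stepped reduction described in the post above can be sketched roughly as follows. This is only an illustration, not BOINC server code: the step values (80, 5, 1) are taken from the post, while the names `QUOTA_STEPS`, `next_step`, and `simulate_quota`, and the rule that a single day with valid results restores the full quota, are assumptions made for the example.

```python
from typing import List

# Daily quotas a host steps through while it keeps returning bad work.
# The values come from the forum post; the structure is hypothetical.
QUOTA_STEPS = [80, 5, 1]


def next_step(step: int, any_valid: bool) -> int:
    """Return the index into QUOTA_STEPS to use on the following day."""
    if any_valid:
        # Assumption: any valid result restores the full quota.
        return 0
    # Otherwise move one step down, staying on the last (lowest) step.
    return min(step + 1, len(QUOTA_STEPS) - 1)


def simulate_quota(days_valid: List[bool]) -> List[int]:
    """Return the quota granted on each day.

    days_valid[i] says whether the host returned any valid result on day i.
    """
    step = 0
    quotas = []
    for any_valid in days_valid:
        quotas.append(QUOTA_STEPS[step])
        step = next_step(step, any_valid)
    return quotas


if __name__ == "__main__":
    # Three bad days, one good day, then another bad day.
    print(simulate_quota([False, False, False, True, False]))
```

Under these assumptions, a host that never validates is held to one workunit per day from the third day onward, instead of cycling through 80 every 24 hours.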
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, I just tried turning on some options to fix this problem. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4227 Let me know if there are any issues there. Jake |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 556,392,887 RAC: 53,916 |
That's great, Jake. It should help immensely with the turnaround time and the inconclusives. It might also cause users with badly performing hosts to ask for help or inquire why something seems to have changed with their account. But there likely won't be any increase in requests for help, since the misbehaving hosts are probably never looked at in the first place. |
Send message Joined: 11 Jul 08 Posts: 4 Credit: 1,357,960 RAC: 0 |
I haven't received any PMs on this issue, but I have noticed a change in my account since January 4. I'm sure I have not made any configuration changes in 2018, so if someone could help me debug this, I'd be most grateful. Thanks. |
Send message Joined: 11 Jul 08 Posts: 4 Credit: 1,357,960 RAC: 0 |
Replying to my own post, I resolved my problem with a lack of WUs by detaching from the project, then attaching. Not very elegant, but effective in this case. Apologies for the noise. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "Replying to my own post, I resolved my problem with a lack of WUs by detaching from the project, then attaching. Not very elegant, but effective in this case. Apologies for the noise." We all try different things; some work and some don't. Your way worked and you are back to crunching again!! WOO HOO!!! |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Hey Everyone, Maybe you can look at this host and see why your fix isn't working on it? http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=763378 798 workunits and NOT ONE is valid, yet you guys keep sending them!!! Maybe THIS is why it takes soooo fricking long to get credits here. How many of these hosts are out there just clogging up the process? |
Send message Joined: 15 Aug 14 Posts: 6 Credit: 1,664,288 RAC: 0 |
I think the problem is the high default setting (10000) of the "max tasks per day" parameter in the application details of a host. The value goes up with a valid return and down when a result is invalid. If the default value started lower (perhaps 100), a bad host would block itself very quickly. |
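To make the point about the starting value concrete, here is a minimal sketch of a quota that rises on valid results and falls on invalid ones. The post only says the value goes up and down; the specific rule used here (double on a valid result up to a ceiling, subtract one on an invalid result down to a floor of 1) is an assumption for illustration, and the function names `update_quota` and `invalids_until` are hypothetical. The threshold of 80 is the per-GPU task limit mentioned earlier in the thread.

```python
def update_quota(quota: int, valid: bool, ceiling: int, floor: int = 1) -> int:
    """Return the new 'max tasks per day' value after one result.

    Assumed rule (not confirmed by the thread): double on a valid
    result, subtract one on an invalid result, clamped to [floor, ceiling].
    """
    if valid:
        return min(quota * 2, ceiling)
    return max(quota - 1, floor)


def invalids_until(start: int, target: int, floor: int = 1) -> int:
    """Count consecutive invalid results needed to bring the quota
    from `start` down to `target` or below."""
    if target < floor:
        raise ValueError("target can never be reached below the floor")
    quota = start
    count = 0
    while quota > target:
        quota = update_quota(quota, valid=False, ceiling=start, floor=floor)
        count += 1
    return count


if __name__ == "__main__":
    # Compare the current default with the lower value suggested above.
    for start in (10000, 100):
        print(start, invalids_until(start, target=80))
```

With this assumed rule, a host starting at 10000 needs 9920 consecutive invalid results before its quota drops to the 80-task limit, while a host starting at 100 needs only 20, which is the effect the post is describing.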