Some "strange" crunchers

TJ · Message 62468 · Posted 4 Oct 2014, 22:41:17 UTC (last modified 22:42:06 UTC)
I see several "Completed, can't validate" messages in my results, and while checking them I found at least two wing(wo)men with about 4,000 results, all "Aborted by user". This is not helping the project. Can it be investigated?
PS: If the moderators have a better name for this thread, please change it.
Greetings from, TJ

Blurf (Volunteer moderator, Project administrator) · Message 62469 · Posted 5 Oct 2014, 0:48:03 UTC, in response to Message 62468
Please link some examples. Thanks

TJ · Message 62475 · Posted 5 Oct 2014, 12:13:28 UTC, in response to Message 62469
Hello Blurf,
Here are some links to computers; there are many more:
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=557095
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=555648
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=590516
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=285657
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=453518
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=588454
If you click through to their tasks, you will see. The third one is outrageous!
Greetings from, TJ

ronnyhugo · Message 62482 · Posted 5 Oct 2014, 20:43:11 UTC
Ah, I think I know what it is: they load up weeks' or months' worth of tasks because their preferences are not set properly. I'm speaking specifically about the "minimum work buffer" setting. Does 0.00 mean infinity there? Is there a hard cap of, let's say, 7 days, or can unknowing people enter 999999?

TJ · Message 62485 · Posted 5 Oct 2014, 22:14:26 UTC, in response to Message 62482
I don't know. Does "Aborted by user" mean a manual action by the user, or not?
Greetings from, TJ

swiftmallard · Message 62487 · Posted 5 Oct 2014, 22:17:29 UTC, in response to Message 62482
I thought there was a limit of 40 WUs per system.

RaymondFO* · Message 62488 · Posted 6 Oct 2014, 1:56:38 UTC (last modified 1:56:52 UTC)
There is a limit on how many tasks can be downloaded. Whether you set your cache to 1 day, 5 days or 50 days, you are maxed out. A cache setting of zero ("0") means just that: no other tasks pending to be crunched, or just a task or two ready to go because another task is almost completed.
Some possible explanations (not an exhaustive list) are:
1) A potential spammer trying to build up enough credit, or just trying to make the account look "real";
2) Someone downloaded the tasks who should not have, and was forced or had to abort them; or
3) Someone trying to disrupt the project's operations.
This appears to be systemic and needs to be investigated.
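RaymondFO's point, that the buffer setting stops mattering once the per-host limit is reached, can be sketched as a small calculation. This is a simplified model, not BOINC's scheduler: the 40-task cap comes from swiftmallard's recollection earlier in the thread, and the function name, the half-hour task runtime and the idea that the client asks for exactly enough tasks to fill its buffer are illustrative assumptions.

```python
import math

# Hypothetical cap: swiftmallard recalls about 40 tasks per system;
# the project's real server-side limit may differ.
PER_HOST_LIMIT = 40


def tasks_to_request(buffer_days: float, task_hours: float, in_progress: int,
                     limit: int = PER_HOST_LIMIT) -> int:
    """Extra tasks a client might receive for a given work-buffer setting.

    Simplified model (an assumption, not BOINC's real work-fetch logic):
    the client wants enough tasks to fill `buffer_days`, but the server
    never lets it hold more than `limit` tasks at once.
    """
    wanted = math.ceil(buffer_days * 24 / task_hours)
    room = max(limit - in_progress, 0)  # free slots under the cap
    return min(wanted, room)


# A 1-day, 5-day or 50-day buffer all hit the same ceiling.
for days in (0.1, 1, 5, 50):
    print(days, tasks_to_request(days, task_hours=0.5, in_progress=0))
```

Under these assumptions only the smallest buffer stays below the cap; every larger setting returns the same 40 tasks, which is why entering 999999 would not load up months of work.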

alanb1951 · Message 62490 · Posted 6 Oct 2014, 5:26:13 UTC
As I've been irritated by this once or twice, when a job that got thrown away because it could not be validated happened to include one of my [long-running] CPU results, I keep looking at the errors...
There's a pattern to quite a lot of the "huge numbers of jobs" machines. The hosts run Windows 7 with ATI graphics and are failing with "MISSING COPROCESSOR" errors. The failing jobs are (typically) returned a couple of minutes after receipt, and the latest versions of BOINC will quite happily go looking for more work at once. The end result is that if the server doesn't realize the machine is sending back huge numbers of failed tasks, any "40 tasks" limit will be futile (as it's sending them back almost as soon as it gets them!)
So, perhaps as a variant of RaymondFO's category 2 in his message 62488, there's an explanation for aborted tasks: misconfiguration, (accidental) mismanagement or mis-allocation of resources on the host machine. (Is some other Windows application denying BOINC applications access to the GPU?) (And I know how easily the mismanagement one can happen on Linux with the packaged BOINC: if I forgot to suspend all GPU-related projects before a reboot, I'd end up doing the same as these Windows machines... However, I thought that shouldn't happen on Windows boxes?)
Another one I see quite often is Apple machines failing jobs because the GPU reportedly can't do double precision. Whether this is a driver error or, perhaps, they shouldn't be sent the jobs in the first place (!) is the question in those cases.
Hopefully this can get sorted out. In particular, it would be nice to have something done about those "MISSING COPROCESSOR" jobs to stop them behaving like errors!
As a [retired] programmer, I can think of solutions, but I suspect that the fairly prescriptive nature of the BOINC server software may well prevent some of the solutions we might think obvious :-(
Al.
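Al's argument about why a "40 tasks" limit is futile against fast-failing hosts comes down to simple arithmetic: an in-progress cap limits how many tasks a host holds at once, not how many it can cycle through in a day. The sketch below assumes the host refills to the cap the moment it returns a batch. The 40-task cap (swiftmallard's recollection) and the two-minute turnaround (Al's "couple of minutes") come from the thread; `daily_quota` is a hypothetical extra limit added only for comparison.

```python
from typing import Optional

MINUTES_PER_DAY = 24 * 60


def max_tasks_per_day(in_progress_cap: int, turnaround_minutes: float,
                      daily_quota: Optional[int] = None) -> int:
    """Upper bound on tasks one host can pull in a day.

    Assumes (hypothetically) that every batch comes back after
    `turnaround_minutes` and the host immediately refills to the cap.
    """
    # Number of full batches the host can receive within one day.
    refills = int(MINUTES_PER_DAY // turnaround_minutes)
    bound = in_progress_cap * refills
    if daily_quota is not None:
        # A per-day limit bounds throughput no matter how fast tasks return.
        bound = min(bound, daily_quota)
    return bound
```

With those figures, `max_tasks_per_day(40, 2)` gives 28,800 tasks a day, while adding a hypothetical daily quota of 200 caps the same host at 200. That is the gap Al describes between limiting tasks held and limiting tasks consumed.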

mikey · Message 62491 · Posted 6 Oct 2014, 10:03:26 UTC, in response to Message 62490
One possible explanation is that the 'default' BOINC version is still being used. The newer 7.3.? versions, and above, support multiple people using the PC with the GPU still working no matter who is logged on, while the current 'default' version does not. So while BOINC would keep crunching, the "missing coprocessor" note would keep coming up until the original person was active again.

ronnyhugo · Message 62493 · Posted 6 Oct 2014, 11:09:45 UTC
So basically we need to remove these computers from the WU queue without running the risk of removing too many computers that had errors that were fixed quite promptly (like my 6950 spitting out a few dozen errors for a few days).

mikey · Message 62502 · Posted 7 Oct 2014, 10:59:22 UTC, in response to Message 62493
It used to be that the server side of BOINC could reduce the number of units each PC receives if it returned a bad unit. For most people that makes no difference, but for some it would mean the end of getting MW units, as all they do is return junk anyway. The server would then automatically adjust upwards the units you could get as you returned valid units, but at twice the rate, meaning your recovery from a few bad units was pretty quick. I don't see anything under my account anymore about how many units per day I can receive, so that may not be possible anymore.
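The scheme mikey describes (shrink a host's allowance when it returns bad results, grow it back faster when it returns valid ones) can be sketched as a per-host counter. This is a hypothetical model of his description, not BOINC's actual server code: the class name, the starting quota of 40, the "minus one per error, plus two per valid result" reading of "twice the rate", and the floor of one task are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Hypothetical per-host daily task quota modelled on mikey's
    description; not taken from BOINC's server source."""
    max_quota: int = 40
    quota: int = 40

    def record_error(self) -> None:
        # Each bad result costs one slot, but never drop below 1 so the
        # host can still prove it has been fixed.
        self.quota = max(1, self.quota - 1)

    def record_valid(self) -> None:
        # Assumed reading of "twice the rate": each valid result earns
        # back two slots, capped at the maximum.
        self.quota = min(self.max_quota, self.quota + 2)


host = HostQuota()
for _ in range(30):
    host.record_error()   # quota falls from 40 to 10
for _ in range(5):
    host.record_valid()   # quota climbs back to 20
```

Under these assumptions, a host that only ever returns errors bottoms out at one task per day, while a host like ronnyhugo's, which threw a few dozen errors and was then fixed, needs at most twenty valid results to get back to the full quota.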

Blurf (Volunteer moderator, Project administrator) · Message 62510 · Posted 8 Oct 2014, 1:43:33 UTC
Jake Weiss says he's made some changes that should stop a lot of the aborted WUs mentioned above in this thread.

TJ · Message 62513 · Posted 8 Oct 2014, 7:34:58 UTC, in response to Message 62510
Thanks, Blurf, for letting us know. This was the answer I was looking for.
Greetings from, TJ

SuperSluether · Message 62643 · Posted 28 Oct 2014, 20:14:35 UTC, in response to Message 62513
I know this is an old thread, but has anyone else noticed that all of the aborted tasks are GPU tasks? All of the CPU tasks complete, but every single GPU task, whether Nvidia or ATI, has been aborted by the user. Maybe these users don't want GPU tasks for this project and don't know how to get rid of them?