Some "strange" crunchers
Author | Message |
---|---|
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
I see several "Completed, can't validate" messages on my results, and while checking them I found at least two wing(wo)men with about 4000 results, all "Aborted by user". This is not helping the project. Can it be investigated? PS: if moderators have a better name for this thread, please change it. Greetings from, TJ |
Joined: 13 Mar 08 Posts: 804 Credit: 26,380,161 RAC: 0 |
Please link some examples. Thanks |
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
Hello Blurf, Here are some links to computers, there are many more... http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=557095 http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=555648 http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=590516 http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=285657 http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=453518 http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=588454 If you then click on the tasks you will see. The third one is outrageous! Greetings from, TJ |
Joined: 16 Aug 12 Posts: 16 Credit: 33,573,431 RAC: 0 |
|
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
I don't know. Does "Aborted by user" mean a manual action by the user or not? Greetings from, TJ |
Joined: 18 Jul 09 Posts: 300 Credit: 303,562,776 RAC: 0 |
Ahh, I think I know what it is. I think they load up weeks or months worth of tasks because they don't have proper preferences set. I'm speaking specifically about "minimum work buffer". Does 0.00 mean infinity for that setting? Is there a hard cap of, let's say, 7 days, or can unknowing people put in 999999? I thought there was a limit of 40 WU per system. |
Joined: 24 Dec 12 Posts: 6 Credit: 2,720,018,100 RAC: 0 |
There is a limit on how many tasks can be downloaded. Whether you set your cache to 1 day, 5 days or 50 days, you are maxed out. A cache setting of zero ("0") means just that: no other tasks pending and waiting to be crunched, or only a task or two ready to go because another task is almost completed. Some possible explanations (not an exhaustive list): 1) Potential spammer trying to build up enough credits or just trying to make the account look "real"; 2) Someone downloaded the tasks and should not have, and was forced or had to "abort" the tasks; or 3) Someone trying to disrupt the project's operations. This appears to be systemic and needs to be investigated. |
Joined: 16 Mar 10 Posts: 210 Credit: 106,030,150 RAC: 24,413 |
As I've been irritated by this once or twice when the job that got thrown away by inability to validate happened to have one of my [long-running] CPU results included, I keep looking at the errors... There's a pattern to quite a lot of the "huge numbers of jobs" machines... The host machines are Windows 7 with ATI graphics and are failing out with "MISSING COPROCESSOR" errors. The failing jobs are (typically) returned a couple of minutes after receipt, and the latest versions of BOINC will quite happily go looking for more work at once. The end result is that if the server doesn't realize that the machine is sending back huge numbers of failed tasks then any "40 tasks" limits will be futile (as it's sending them back almost as soon as it gets them!) So, in perhaps a variant of RaymondFO's category 2 in his message 62488, there's an explanation for aborted tasks - misconfiguration, (accidental) mismanagement or mis-allocation of resources on the host machine. (Is some other Windows application denying BOINC applications access to the GPU?) (And I know how easily the mismanagement one can happen in Linux using the packaged BOINC - if I forgot to suspend all GPU-related projects before a reboot, I'd end up doing the same as these Windows machines... - however, I thought that shouldn't happen on Windows boxes?) Another one I see quite often is Apple machines failing jobs because it says the GPU can't do double precision. Whether this is a driver error or, perhaps, they shouldn't be being sent the jobs in the first place (!) is the question in those cases. Hopefully this can get sorted out. Especially, it would be nice to have something done about those "MISSING COPROCESSOR" jobs to stop them behaving like errors! As a [retired] programmer, I can think of solutions, but I suspect that the fairly prescriptive nature of BOINC server software may well prevent some of the solutions we might think obvious :-( Al. |
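The point above about the "40 tasks" limit can be put into rough numbers. The following is only a back-of-the-envelope sketch in Python, not anything taken from the BOINC server code: the function name `tasks_per_day` is made up for illustration, and it assumes the client refills its allowance immediately after reporting, ignoring any request backoff the real client or server may apply.

```python
def tasks_per_day(in_progress_limit: int, turnaround_minutes: float) -> int:
    """Rough upper bound on how many tasks one host can cycle through in 24 hours.

    Assumption (hypothetical model): the only cap is the number of tasks a host
    may hold at once, and every returned task is replaced straight away.
    """
    cycles_per_day = (24 * 60) / turnaround_minutes
    return int(in_progress_limit * cycles_per_day)


# A host that fails each task about two minutes after receiving it:
failing_host = tasks_per_day(in_progress_limit=40, turnaround_minutes=2)

# A host that holds each task for a full day before returning it:
slow_host = tasks_per_day(in_progress_limit=40, turnaround_minutes=24 * 60)

print(failing_host, slow_host)
```

Under these assumptions a limit on tasks held at once says very little about how many tasks a fast-failing host can burn through per day, which is consistent with hosts showing thousands of failed or aborted results.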
Joined: 8 May 09 Posts: 3320 Credit: 520,465,122 RAC: 25,983 |
One possible explanation is the 'default' BOINC version still being used. The newer 7.3.? versions and above support multiple people using the PC with the GPU still working no matter who is logged on, while the current 'default' version does not. So while BOINC would keep crunching, the "missing coprocessor" note would continue to come up until the original person was active again. |
Joined: 16 Aug 12 Posts: 16 Credit: 33,573,431 RAC: 0 |
|
Joined: 8 May 09 Posts: 3320 Credit: 520,465,122 RAC: 25,983 |
So basically we need to remove these computers from the WU queue without running the risk of removing too many computers that had errors that were fixed quite promptly (like my 6950 spitting out a few dozen errors for a few days). It used to be that the server side of BOINC could reduce the units each PC receives if it returns a bad unit. For most people that makes no difference, but for some it would mean the end of getting MW units, as all they do is return junk anyway. The server would then automatically adjust upwards the units you could get as you returned valid units, but at twice the rate, meaning your recovery from a few bad units was pretty quick. I don't see anything reflected under my account anymore about how many units per day I can receive, so that may not be possible anymore. |
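The quota behaviour described above can be sketched in a few lines of Python. This is only an illustration of the description in the post, not the actual BOINC server logic: the names `HostQuota` and `MAX_DAILY_QUOTA`, the cap of 100, the step of one task per bad result, and reading "twice the rate" as doubling on each valid result are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical project-wide cap; not a real MilkyWay@home setting.
MAX_DAILY_QUOTA = 100


@dataclass
class HostQuota:
    """Per-host daily task allowance that shrinks on errors and recovers on successes."""

    daily_quota: int = MAX_DAILY_QUOTA

    def report(self, valid: bool) -> int:
        if valid:
            # Assumed recovery rule: double the allowance, up to the cap.
            self.daily_quota = min(MAX_DAILY_QUOTA, self.daily_quota * 2)
        else:
            # Assumed penalty rule: lose one task per bad result, never below 1.
            self.daily_quota = max(1, self.daily_quota - 1)
        return self.daily_quota


# A host that returns nothing but junk ends up at one task per day.
junk_host = HostQuota()
for _ in range(200):
    junk_host.report(valid=False)

# Once fixed, it climbs back to the cap after only a handful of valid results.
valid_results_needed = 0
while junk_host.daily_quota < MAX_DAILY_QUOTA:
    junk_host.report(valid=True)
    valid_results_needed += 1

print(valid_results_needed)
```

With these assumed rules, a host that only returns errors is throttled to a trickle, while a host that had a brief problem (a few dozen errors) loses only a little allowance and recovers it within a few valid results.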
Joined: 13 Mar 08 Posts: 804 Credit: 26,380,161 RAC: 0 |
Jake Weiss says he's made some changes that should stop a lot of the aborted WUs mentioned above in this thread. |
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
Thanks Blurf for letting us know. This was the answer I was looking for. Greetings from, TJ |
Joined: 2 Jul 14 Posts: 15 Credit: 20,991,384 RAC: 4 |
I know this is an old thread, but has anyone else noticed that all of the aborted tasks are for a GPU? All of the CPU tasks complete, but each and every GPU task, whether Nvidia or ATI, has been aborted by the user. Maybe these users don't want GPU tasks for this project and don't know how to get rid of them? |
©2024 Astroinformatics Group