Welcome to MilkyWay@home

Some "strange" crunchers

TJ

Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 62468 - Posted: 4 Oct 2014, 22:41:17 UTC
Last modified: 4 Oct 2014, 22:42:06 UTC

I see several "Completed, can't validate" messages among my results, and while checking them I found at least two wing(wo)men with about 4000 results, all "Aborted by user". This is not helping the project. Can it be investigated?

PS: if moderators have a better name for this thread, please change it.
Greetings from,
TJ

Blurf
Volunteer moderator
Project administrator

Joined: 13 Mar 08
Posts: 804
Credit: 26,380,161
RAC: 0
Message 62469 - Posted: 5 Oct 2014, 0:48:03 UTC - in response to Message 62468.  

Please link some examples. Thanks

TJ

Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 62475 - Posted: 5 Oct 2014, 12:13:28 UTC - in response to Message 62469.  

Hello Blurf,

Here are some links to computers, there are many more...

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=557095

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=555648

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=590516

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=285657

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=453518

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=588454

If you then click on the tasks for each host, you will see what I mean. The third one is outrageous!
Greetings from,
TJ

ronnyhugo

Joined: 16 Aug 12
Posts: 16
Credit: 33,573,431
RAC: 0
Message 62482 - Posted: 5 Oct 2014, 20:43:11 UTC

Ahh, I think I know what it is: I think they load up weeks' or months' worth of tasks because they don't have proper preferences. I'm speaking specifically about the "minimum work buffer". Does 0.00 mean infinity there? Is there a hard cap of, let's say, 7 days, or can unknowing people enter 999999?

TJ

Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 62485 - Posted: 5 Oct 2014, 22:14:26 UTC - in response to Message 62482.  

I don't know. Does "Aborted by user" mean a manual action by the user, or not?
Greetings from,
TJ

swiftmallard

Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 62487 - Posted: 5 Oct 2014, 22:17:29 UTC - in response to Message 62482.  

I thought there was a limit of 40 WU per system.

RaymondFO*

Joined: 24 Dec 12
Posts: 6
Credit: 2,720,018,100
RAC: 0
Message 62488 - Posted: 6 Oct 2014, 1:56:38 UTC
Last modified: 6 Oct 2014, 1:56:52 UTC

There is a limit on how many tasks can be downloaded. Whether you set your cache to 1 day, 5 days or 50 days, you are maxed out at that limit. A cache setting of zero ("0") means just that: no other tasks waiting to be crunched, or at most a task ready to go because another one is almost completed.

Some possible explanations (not an exhaustive list):

1) Potential spammer trying to build up enough credits or just trying to make the account look "real";

2) Someone downloaded the tasks when they should not have, and was forced to "abort" them; or

3) Someone trying to disrupt the project's operations.


This appears to be systemic and needs to be investigated.

alanb1951

Joined: 16 Mar 10
Posts: 210
Credit: 105,911,223
RAC: 25,353
Message 62490 - Posted: 6 Oct 2014, 5:26:13 UTC

I've been irritated by this once or twice, when a job that was thrown away for failing to validate happened to include one of my [long-running] CPU results, so I keep looking at the errors...

There's a pattern to quite a lot of the "huge numbers of jobs" machines... The host machines are Windows 7 with ATI graphics and are failing out with "MISSING COPROCESSOR" errors.

The failing jobs are (typically) returned a couple of minutes after receipt, and the latest versions of BOINC will quite happily go looking for more work at once. The end result is that if the server doesn't realize that the machine is sending back huge numbers of failed tasks then any "40 tasks" limits will be futile (as it's sending them back almost as soon as it gets them!)
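
A rough sketch of why a cap on in-progress tasks alone does not help, assuming (hypothetically) that the server tops the host up to the cap on every request and that the host returns the whole batch before asking again. The 2-minute turnaround follows the "couple of minutes" mentioned above; the 4-hour figure for a healthy host and the function name are illustrative assumptions, not measured BOINC behaviour.

```python
def tasks_issued(cap, turnaround_minutes, window_minutes):
    """Count tasks handed to one host when only in-progress tasks are capped.

    Assumption: the host is refilled to `cap` at each contact and returns
    the whole batch (finished or failed) after `turnaround_minutes`.
    """
    issued = 0
    elapsed = 0
    while elapsed < window_minutes:
        issued += cap                   # server refills the host up to the cap
        elapsed += turnaround_minutes   # host returns the batch and asks again
    return issued


DAY = 24 * 60  # minutes

# Hypothetical healthy host: a batch of 40 takes about 4 hours.
healthy = tasks_issued(cap=40, turnaround_minutes=240, window_minutes=DAY)

# Failing host: every batch errors out and comes back after about 2 minutes.
failing = tasks_issued(cap=40, turnaround_minutes=2, window_minutes=DAY)

print(healthy, failing)  # 240 28800
```

Under these assumptions the same "40 tasks" cap lets the failing host burn through over a hundred times as many tasks per day as the healthy one.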

So, in perhaps a variant of RaymondFO's category 2 in his message 62488, there's an explanation for aborted tasks - misconfiguration, (accidental) mismanagement or mis-allocation of resources on the host machine. (Is some other Windows application denying BOINC applications access to the GPU?)

(And I know how easily the mismanagement one can happen in Linux using the packaged BOINC - if I forgot to suspend all GPU-related projects before a reboot, I'd end up doing the same as these Windows machines... - however, I thought that shouldn't happen on Windows boxes?)

Another one I see quite often is Apple machines failing jobs because it says the GPU can't do double precision. Whether this is a driver error or, perhaps, they shouldn't be being sent the jobs in the first place (!) is the question in those cases.

Hopefully this can get sorted out. Especially, it would be nice to have something done about those "MISSING COPROCESSOR" jobs to stop them behaving like errors! As a [retired] programmer, I can think of solutions, but I suspect that the fairly prescriptive nature of BOINC server software may well prevent some of the solutions we might think obvious :-(

Al.

mikey

Joined: 8 May 09
Posts: 3319
Credit: 520,280,404
RAC: 19,499
Message 62491 - Posted: 6 Oct 2014, 10:03:26 UTC - in response to Message 62490.  

One possible explanation is that the 'default' BOINC version is still being used. The newer 7.3.? versions and above support multiple people using the PC with the GPU still working no matter who is logged on, while the current 'default' version does not. So while BOINC would keep crunching, the "missing coprocessor" note would keep coming up until the original person was active again.

ronnyhugo

Joined: 16 Aug 12
Posts: 16
Credit: 33,573,431
RAC: 0
Message 62493 - Posted: 6 Oct 2014, 11:09:45 UTC

So basically we need to remove these computers from the WU queue without running the risk of removing too many computers that had errors that were fixed quite promptly (like my 6950 spitting out a few dozen errors for a few days).
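
One way to picture that trade-off is the sketch below, which pauses a host only when its current run of failures is long, so that a short burst of errors followed by valid results is left alone. The threshold of 200 and both function names are hypothetical; nothing here is taken from the BOINC server code.

```python
def consecutive_failures(outcomes):
    """Length of the failure streak at the end of `outcomes` (True = valid)."""
    streak = 0
    for ok in reversed(outcomes):
        if ok:
            break          # the most recent valid result ends the streak
        streak += 1
    return streak


def should_pause_host(outcomes, threshold=200):
    """Pause a host only if its latest failure streak reaches `threshold`."""
    return consecutive_failures(outcomes) >= threshold


# A host that produced a few dozen errors and then recovered.
brief_glitch = [True] * 50 + [False] * 36 + [True] * 10

# A host that has returned nothing but failures for thousands of tasks.
stuck_host = [True] * 5 + [False] * 4000

print(should_pause_host(brief_glitch), should_pause_host(stuck_host))  # False True
```

A streak-based rule like this forgives the fixed machine as soon as it returns one valid result, at the cost of letting a broken host waste up to `threshold` tasks before it is paused.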

mikey

Joined: 8 May 09
Posts: 3319
Credit: 520,280,404
RAC: 19,499
Message 62502 - Posted: 7 Oct 2014, 10:59:22 UTC - in response to Message 62493.  

It used to be that the server side of BOINC could reduce the number of units each PC receives if it returned a bad unit. For most people that makes no difference, but for some it would mean the end of getting MW units, as all they do is return junk anyway. The server would then automatically adjust the number of units you could get back upwards as you returned valid units, but at twice the rate, meaning your recovery from a few bad units was pretty quick. I don't see anything under my account anymore about how many units per day I can receive, so that may no longer be possible.
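
A minimal sketch of a quota rule of this kind, read literally: one step down per bad unit, two steps up per valid unit. The step sizes, the ceiling of 100 and the floor of 1 are assumptions chosen for illustration; the rule the BOINC server actually applies may differ.

```python
def adjust_quota(quota, valid, max_quota=100, min_quota=1):
    """Return a host's new daily quota after one reported result."""
    if valid:
        # Recover two steps per valid result, never above the ceiling.
        return min(max_quota, quota + 2)
    # Lose one step per bad result, never below the floor.
    return max(min_quota, quota - 1)


def replay(quota, results):
    """Apply adjust_quota to a sequence of results (True = valid)."""
    for valid in results:
        quota = adjust_quota(quota, valid)
    return quota


after_errors = replay(100, [False] * 10)       # a brief run of bad units
recovered = replay(after_errors, [True] * 5)   # half as many valid units
junk_only = replay(100, [False] * 4000)        # a host returning only junk

print(after_errors, recovered, junk_only)  # 90 100 1
```

With these numbers, ten bad units are undone by five valid ones, while a host that only returns junk is squeezed down to one unit per day.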

Blurf
Volunteer moderator
Project administrator

Joined: 13 Mar 08
Posts: 804
Credit: 26,380,161
RAC: 0
Message 62510 - Posted: 8 Oct 2014, 1:43:33 UTC

Jake Weiss says he's made some changes that should stop a lot of the aborted WUs mentioned above in this thread.

TJ

Joined: 12 Aug 09
Posts: 262
Credit: 92,631,041
RAC: 0
Message 62513 - Posted: 8 Oct 2014, 7:34:58 UTC - in response to Message 62510.  

Thanks Blurf for letting us know.
This was the answer I was looking for.
Greetings from,
TJ

SuperSluether

Joined: 2 Jul 14
Posts: 15
Credit: 20,991,384
RAC: 7
Message 62643 - Posted: 28 Oct 2014, 20:14:35 UTC - in response to Message 62513.  

I know this is an old thread, but has anyone else noticed that all of the aborted tasks are for a GPU? All of the CPU tasks complete, but each and every GPU task, whether Nvidia or ATI, has been aborted by the user. Maybe these users don't want GPU tasks for this project and don't know how to get rid of them?

©2024 Astroinformatics Group