Users Auto-Aborting Work Units

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 8,946,432
RAC: 109,741

Message 60126 - Posted: 7 Oct 2013, 18:22:58 UTC

Richard,

I am working on fixing the plan class so that it ignores GPUs that do not meet a minimum OpenCL requirement for the applications that need it. This was rarely a problem until MWSMF, which cannot run on CAL, was released. Hopefully this will be resolved around the same time the small segfault in the MWSMF application is fixed.

Jake W
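
As a rough illustration of the kind of check described above, here is a minimal Python sketch. It is not the project's plan-class code: the `Gpu` type, the `eligible_for_opencl_app` function, the sample cards, and the (1, 1) minimum version are all hypothetical, chosen only to show the idea of skipping GPUs that report no OpenCL support or too old a version.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Gpu:
    """Hypothetical description of a GPU as reported by a client."""
    name: str
    # (major, minor) OpenCL version, or None if no OpenCL support is reported
    opencl_version: Optional[Tuple[int, int]]


def eligible_for_opencl_app(gpu: Gpu, min_version: Tuple[int, int]) -> bool:
    """Return True only if the GPU reports OpenCL at or above min_version."""
    if gpu.opencl_version is None:
        # CAL-only or unreported OpenCL: never send OpenCL-only work
        return False
    # Tuples compare element by element, so (1, 2) >= (1, 1) is True
    return gpu.opencl_version >= min_version


# Made-up hosts, for illustration only
hosts = [
    Gpu("CAL-only card", None),
    Gpu("older card", (1, 0)),
    Gpu("newer card", (1, 2)),
]

# Assumed minimum of OpenCL 1.1; the real requirement is not stated in the thread
eligible = [g.name for g in hosts if eligible_for_opencl_app(g, (1, 1))]
print(eligible)
```

With a filter like this, only the last of the three made-up cards would be offered OpenCL-only work.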

mikey
Joined: 8 May 09
Posts: 2032
Credit: 179,194,509
RAC: 208,626

Message 60128 - Posted: 8 Oct 2013, 12:13:47 UTC - in response to Message 60124.

@mikey:

Just an idea, which I have not tested myself: have you tried disabling every application in the default settings? Then the server should not send anything on the first request, and you could then assign the computer to the preferred venue before the second request.


No, I had not thought of that, but I will give it some thought. I have to go to the webpage anyway to assign it, so maybe it could work.
Thanks!

GCGZpfuy3zLYVrtDTUhmoccc7Kx4pGM6AH
Joined: 4 Oct 11
Posts: 1
Credit: 384,625
RAC: 0

Message 60132 - Posted: 9 Oct 2013, 10:47:30 UTC

In my case the (modified fit) WUs always abort themselves after 2 seconds of runtime.
I tried to opt out of them, but could not find where. It's bad that these checkboxes are not shown unless you are "editing Settings".
Now I have found it and disabled the modified fit.

Using a 5800 APU

Toby Broom
Joined: 13 Jun 09
Posts: 12
Credit: 59,384,069
RAC: 0

Message 60453 - Posted: 25 Nov 2013, 1:19:37 UTC

Thanks for the tips on the Titan config files.

I got sick of my ATI card crashing my computer all the time!

Karl De Ruyck
Joined: 2 Sep 12
Posts: 5
Credit: 4,609,628
RAC: 27,547

Message 60527 - Posted: 6 Dec 2013, 0:03:41 UTC

Hi everyone, I hope this is appropriate to post here...

I was previously manually aborting modfit work units because when I let them run, they would result in a computation error. After some discussion in another thread, it was determined that my C library was outdated.

I am running Debian 7.2, which comes with eglibc 2.13, while the modfit units require 2.14.

To solve the issue, I switched repos to jessie, updated libc6 & dependents, then switched repos back to wheezy. This allowed me to upgrade to eglibc 2.17, without breaking anything (yet).

All my modfit work units are now completing successfully. :-)
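
For anyone wanting to check their own system, here is a small Python sketch of the version comparison involved. The 2.14 minimum is the figure quoted above; the helper names `parse_version` and `libc_is_new_enough` are made up for this example. It uses the standard library's `platform.libc_ver()`, which may return empty strings on systems where it cannot detect the C library.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '2.14' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def libc_is_new_enough(installed, required="2.14"):
    """Compare versions numerically, so that 2.9 counts as older than 2.14."""
    return parse_version(installed) >= parse_version(required)


# platform.libc_ver() returns a (lib, version) pair; both may be empty strings
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print(f"{lib} {version}: new enough = {libc_is_new_enough(version)}")
else:
    print("Could not detect a glibc version on this system")
```

Comparing tuples of integers rather than raw strings matters here, since a plain string comparison would wrongly rank "2.9" above "2.14".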

mikey
Joined: 8 May 09
Posts: 2032
Credit: 179,194,509
RAC: 208,626

Message 60529 - Posted: 6 Dec 2013, 12:56:39 UTC - in response to Message 60527.

Hi everyone, I hope this is appropriate to post here...

I was previously manually aborting modfit work units because when I let them run, they would result in a computation error. After some discussion in another thread, it was determined that my C library was outdated.

I am running Debian 7.2, which comes with eglibc 2.13, while the modfit units require 2.14.

To solve the issue, I switched repos to jessie, updated libc6 & dependents, then switched repos back to wheezy. This allowed me to upgrade to eglibc 2.17, without breaking anything (yet).

All my modfit work units are now completing successfully. :-)


As a non-Linux user, I would think the project would recognize the missing files and provide them in a download package, making all that workaround stuff unnecessary. I am glad you are crunching the units successfully again though!!

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 339
Credit: 41,633,772
RAC: 5,927

Message 60819 - Posted: 26 Jan 2014, 22:16:44 UTC
Last modified: 26 Jan 2014, 22:17:04 UTC

So is the MW team going to do anything about WUs on the MilkyWay@Home v1.02 (opencl_amd_ati) app erroring out on Radeon 5800s & 6900s?
And it has affected at least one 7950 too.

There's a thread about it here: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3400 , but no answer yet :(.

We've had to switch off that app so as not to spew out errored WUs.

Maybe this is what GCGZpfuy3zLYVrtDTUhmoccc7Kx4pGM6AH was talking about? (daft name btw, oh & APU = audio processing unit ;) ).
____________
Team AnandTech - SETI@H, Muon1 DPAD, F@H, MW@H, A@H, LHC@H, POGS, R@H, Einstein@H.

Main rig - i7 4930k @4.1 GHz, HD 7970 3 GB, 16 GB DDR3 1866, Win 7 64bit, BOINC 7.6.22
2nd rig - Q9550 @3.6 GHz, HD 7870 XT 3GB(DS), 8 GB DDR2 1066, Win 7 64bit

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 339
Credit: 41,633,772
RAC: 5,927

Message 60828 - Posted: 27 Jan 2014, 17:46:05 UTC

Doh! or Accelerated Processing Units, but I bet you meant GPU.

Josiah - Images of Heaven
Joined: 4 Jan 14
Posts: 3
Credit: 112,409
RAC: 0

Message 60951 - Posted: 4 Feb 2014, 0:04:50 UTC

My issue is that the Nbody jobs come in and take over all 8 of my processors, thereby suspending all my other BOINC projects. The only one that doesn't do that is the flagship milkyway@home. Therefore I aborted all of the 'vampire' workunits that suck up all 8 processors and then unchecked them. Sorry folks, but I'm not letting workunits take over all 8 processors.

mikey
Joined: 8 May 09
Posts: 2032
Credit: 179,194,509
RAC: 208,626

Message 60953 - Posted: 4 Feb 2014, 13:05:07 UTC - in response to Message 60951.

My issue is that the Nbody jobs come in and take over all 8 of my processors, thereby suspending all my other BOINC projects. The only one that doesn't do that is the flagship milkyway@home. Therefore I aborted all of the 'vampire' workunits that suck up all 8 processors and then unchecked them. Sorry folks, but I'm not letting workunits take over all 8 processors.


Supposedly they are going to stop creating those units in the near future anyway, but they are no more intrusive than running 8 different units on your PC at the same time. And they gave some insight into how to truly share your 8-core processor while crunching a single unit: true supercomputer-style computing. I stopped them a while back too.

Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 2,427,229
RAC: 6,752

Message 60954 - Posted: 4 Feb 2014, 13:12:20 UTC - in response to Message 60953.
Last modified: 4 Feb 2014, 13:15:42 UTC

If I'm reading this correctly, you are referring to "MT" (multi-threaded) tasks in general, where they use multiple virtual cores to get the task done, instead of working as an "ST" (single-threaded) task which only uses 1 virtual core.

The thing is... BOINC is set up to handle this just fine. It won't overcommit your system (unless it must due to high-priority tasks), it won't undercommit your system, and it properly records REC (recent estimated credit) so that your RS (resource share) percentages are honored across your projects. Sure, other projects can't run concurrently with the MT task, but BOINC constantly keeps track of the work done to ensure RS is honored before the MT task and afterward.

There is nothing inherently wrong with MT tasks. They've just been designed to use multiple threads/cores to get the task done quicker.

I'm not sure if it is set up this way, but... if MilkyWay puts the MT tasks into their own application, then "disabling" them would be as easy as editing the project preferences to disable that application. Though I still don't see why you guys don't want to run MT tasks.

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 339
Credit: 41,633,772
RAC: 5,927

Message 60975 - Posted: 5 Feb 2014, 18:39:33 UTC - in response to Message 60954.

Probably because he wants to run more than 1 project at a time I'd guess.

Didn't know there were any DC projects that did true MT!

mikey
Joined: 8 May 09
Posts: 2032
Credit: 179,194,509
RAC: 208,626

Message 60977 - Posted: 5 Feb 2014, 19:10:59 UTC - in response to Message 60975.

Probably because he wants to run more than 1 project at a time I'd guess.

Didn't know there were any DC projects that did true MT!


Collatz is doing it as of today, but I do not know if they are doing it the same way or not.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,074,149
RAC: 0

Message 61703 - Posted: 7 May 2014, 14:13:43 UTC

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990&offset=0&show_names=0&state=6&appid=

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?

Link
Joined: 19 Jul 10
Posts: 327
Credit: 16,283,020
RAC: 0

Message 61704 - Posted: 7 May 2014, 17:20:38 UTC - in response to Message 61703.

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?

The quota system will do that in this case.

PS: I made your link clickable.
____________
.

Richard Haselgrove
Joined: 4 Sep 12
Posts: 218
Credit: 448,778
RAC: 0

Message 61705 - Posted: 7 May 2014, 18:08:07 UTC - in response to Message 61703.

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990&offset=0&show_names=0&state=6&appid=

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?

Note that the error message (every task I've looked at) is

201 (0xc9) EXIT_MISSING_COPROC

The card is an NVIDIA Quadro K1000M (2048MB), driver: 296.79, but OpenCL support isn't being reported by BOINC, though the card itself can run OpenCL 1.2.

I think the question is more - why does the project keep allocating OpenCL tasks to it?
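
To make the distinction concrete, here is a small Python sketch that formats an exit status the way the task pages show it and tallies how often each code appears. The code 201 (0xc9) and the name EXIT_MISSING_COPROC come from the message above; the helper names and the sample list of exit statuses are invented for illustration and are not real data from that host.

```python
from collections import Counter

EXIT_MISSING_COPROC = 201  # value quoted above; 0xc9 in hexadecimal


def format_exit_code(code):
    """Render a code in the 'decimal (hex)' style, e.g. '201 (0xc9)'."""
    return f"{code} ({code:#x})"


def summarize_exit_codes(codes):
    """Count how many tasks ended with each exit status."""
    return Counter(codes)


# Invented sample: three tasks with a missing coprocessor, one clean exit
sample = [201, 201, 0, 201]
summary = summarize_exit_codes(sample)
missing = summary[EXIT_MISSING_COPROC]

print(format_exit_code(EXIT_MISSING_COPROC))
print(f"tasks failing with a missing coprocessor: {missing}")
```

A tally like this would show at a glance whether a host's failures are genuine user aborts or the same device error repeated.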

mikey
Joined: 8 May 09
Posts: 2032
Credit: 179,194,509
RAC: 208,626

Message 61707 - Posted: 8 May 2014, 10:36:35 UTC - in response to Message 61703.

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990&offset=0&show_names=0&state=6&appid=

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?


Sometimes the words projects use aren't totally accurate. In this case 'aborted by user' can also mean 'timed out' or 'the server already has a valid return and your unit is not needed', and I am sure there are others, such as 'no device found'. The point is that the actual message is not always an accurate representation of what is going on, kind of like the blue screen in Windows: 'something is wrong'... no duh! There's no actual clue as to what caused the problem, just a generic message. The BOINC programmers have been 'accused' in the past of learning to write error messages from Microsoft: generic and meaning little. I THINK they are getting better though.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,074,149
RAC: 0

Message 61745 - Posted: 21 May 2014, 13:36:24 UTC

Then I would suggest setting:
Max tasks per day
to 100 at the beginning (or when it's reset due to being >100 & validate error).

Right now I can see PCs wasting over 10 000 tasks...
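
To show what such a cap could do, here is a toy Python sketch of a per-host daily quota. This is not the BOINC scheduler's code: the rule used here (drop by one on each error, double on each success, capped at 100 as suggested above) and the `update_quota` function are assumptions made purely for illustration.

```python
def update_quota(quota, result_ok, cap=100, floor=1):
    """Return the new daily task quota after one returned result.

    Assumed rule, for illustration only: a good result doubles the
    quota up to the cap, and an error lowers it by one down to the floor.
    """
    if result_ok:
        return min(cap, quota * 2)
    return max(floor, quota - 1)


# A host that starts at the suggested cap and returns 150 errors in a row
quota = 100
for _ in range(150):
    quota = update_quota(quota, result_ok=False)
print(quota)

# One good result begins to restore the allowance
recovered = update_quota(quota, result_ok=True)
print(recovered)
```

Under these assumed rules, a host that only produces errors ends up limited to one task per day, rather than burning through thousands.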


Copyright © 2017 AstroInformatics Group