Welcome to MilkyWay@home

Users Auto-Aborting Work Units

Message boards : News : Users Auto-Aborting Work Units

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 60126 - Posted: 7 Oct 2013, 18:22:58 UTC

Richard,

I am working on fixing the plan class so that it ignores GPUs that do not meet a certain minimum OpenCL requirement for the applications that need it. This was rarely a problem until MWSMF was released, which cannot run on CAL. Hopefully this will be resolved around the same time the small segfault in the MWSMF application is fixed.

Jake W
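
A rough illustration of the kind of plan-class filter described above, written in Python. This is only a sketch and not the project's scheduler code: the `Gpu` record, the `eligible_for_opencl_app` function, and the minimum of OpenCL 1.1 used in the example are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Version = Tuple[int, int]


@dataclass
class Gpu:
    name: str
    # OpenCL version reported by the client, or None if none is reported
    # (for example a card that is only usable through CAL).
    opencl_version: Optional[Version]


def eligible_for_opencl_app(gpu: Gpu, min_version: Version) -> bool:
    """Return True only if the GPU reports OpenCL at or above min_version."""
    if gpu.opencl_version is None:
        return False
    return gpu.opencl_version >= min_version


if __name__ == "__main__":
    # Hypothetical hosts and a hypothetical minimum of OpenCL 1.1.
    hosts = [
        Gpu("CAL-only card", None),
        Gpu("older OpenCL card", (1, 0)),
        Gpu("newer OpenCL card", (1, 2)),
    ]
    for gpu in hosts:
        print(gpu.name, eligible_for_opencl_app(gpu, (1, 1)))
```

Comparing versions as integer tuples rather than strings keeps the check correct if a minor version ever reaches two digits.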

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 60128 - Posted: 8 Oct 2013, 12:13:47 UTC - in response to Message 60124.  

@mikey:

Just an idea, which I have not tested myself: have you tried disabling every application in the default settings? Then the server should not send anything on the first request, and you could then assign the computer to the preferred venue before the second request.


No, I had not thought of that, but I will give it some thought. I have to go to the webpage anyway to assign it, so maybe it could work.
Thanks!

GCGZpfuy3zLYVrtDTUhmoccc7Kx4pG...
Joined: 4 Oct 11
Posts: 1
Credit: 1,397,192
RAC: 0
Message 60132 - Posted: 9 Oct 2013, 10:47:30 UTC

In my case the (modified fit) WUs always abort themselves after 2 seconds of runtime.
I tried to opt out of them, but could not find where. It's bad that these checkboxes are not shown unless you are "editing Settings".
Now I have found the setting and disabled the modified fit.

Using a 5800 APU

Toby Broom
Joined: 13 Jun 09
Posts: 24
Credit: 137,665,647
RAC: 2,232
Message 60453 - Posted: 25 Nov 2013, 1:19:37 UTC

Thanks for the tips on the Titan config files.

I got sick of my ATI card crashing my computer all the time!

Karl De Ruyck
Joined: 2 Sep 12
Posts: 5
Credit: 16,610,474
RAC: 0
Message 60527 - Posted: 6 Dec 2013, 0:03:41 UTC

Hi everyone, I hope this is appropriate to post here...

I was previously manually aborting modfit work units because when I let them run, they would result in a computation error. After some discussion in another thread, it was determined that my C library was outdated.

I am running Debian 7.2, which comes with eglibc 2.13, while the modfit units require 2.14.

To solve the issue, I switched repos to jessie, updated libc6 & dependents, then switched repos back to wheezy. This allowed me to upgrade to eglibc 2.17, without breaking anything (yet).

All my modfit work units are now completing successfully. :-)
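
For anyone who wants to check their own machine before running modfit units, here is a minimal Python sketch. The 2.14 minimum comes from the post above; the helper names `parse_version` and `libc_new_enough` are made up for this example, and `platform.libc_ver()` may return empty strings on systems where it cannot detect the C library.

```python
import platform

REQUIRED = (2, 14)  # minimum C library version quoted in the post above


def parse_version(text: str) -> tuple:
    """Turn '2.13' into (2, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def libc_new_enough(version_text: str, required: tuple = REQUIRED) -> bool:
    """Return True if the given version string is at least `required`."""
    return parse_version(version_text) >= required


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    if lib and version:
        status = "ok" if libc_new_enough(version) else "too old"
        print(lib, version, status)
    else:
        print("could not detect the C library version")
```

The versions are compared as integer tuples because a plain string comparison would wrongly rank "2.9" above "2.14".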

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 60529 - Posted: 6 Dec 2013, 12:56:39 UTC - in response to Message 60527.  

Hi everyone, I hope this is appropriate to post here...

I was previously manually aborting modfit work units because when I let them run, they would result in a computation error. After some discussion in another thread, it was determined that my C library was outdated.

I am running Debian 7.2, which comes with eglibc 2.13, while the modfit units require 2.14.

To solve the issue, I switched repos to jessie, updated libc6 & dependents, then switched repos back to wheezy. This allowed me to upgrade to eglibc 2.17, without breaking anything (yet).

All my modfit work units are now completing successfully. :-)


As a non-Linux user, I would have thought the project would recognize the missing files and provide them in a download package, making all that workaround stuff unnecessary. I am glad you are crunching the units successfully again though!!

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 375
Credit: 64,706,194
RAC: 1,403
Message 60819 - Posted: 26 Jan 2014, 22:16:44 UTC
Last modified: 26 Jan 2014, 22:17:04 UTC

So is the MW team going to do anything about WUs on the MilkyWay@Home v1.02 (opencl_amd_ati) erroring out on Radeon 5800s & 6900s??
And it's affected at least 1 7950 too.

There's this thread about it here: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3400 , but no answer yet :(.

We've had to switch off that app so as not to spew out errored WUs.

Maybe this is what GCGZpfuy3zLYVrtDTUhmoccc7Kx4pGM6AH was talking about? (daft name btw, oh & APU = audio processing unit ;) ).
Team AnandTech - SETI@H, DPAD, F@H, MW@H, A@H, LHC, POGS, R@H, Einstein@H, DHEP, WCG

Main rig - Ryzen 5 3600, MSI B450 G.Pro C. AC, RTX 3060Ti 8GB, 32GB DDR4 3200, Win 10 64bit
2nd rig - i7 4930k @4.1 GHz, HD 7870 XT 3GB(DS), 16GB DDR3 1866, Win7

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 375
Credit: 64,706,194
RAC: 1,403
Message 60828 - Posted: 27 Jan 2014, 17:46:05 UTC

Doh! or Accelerated Processing Units, but I bet you meant GPU.

Josiah - Images of Heaven
Joined: 4 Jan 14
Posts: 3
Credit: 140,563
RAC: 0
Message 60951 - Posted: 4 Feb 2014, 0:04:50 UTC

My issue is that I notice the Nbody jobs come in and take over all 8 of my processors, thereby suspending all my other BOINC projects. The only one that doesn't do that is the flagship milkyway@home. Therefore I aborted all of the 'vampire' workunits that suck up all 8 processors and then unchecked them. Sorry folks, but I'm not letting workunits take over all 8 processors.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 60953 - Posted: 4 Feb 2014, 13:05:07 UTC - in response to Message 60951.  

My issue is that I notice the Nbody jobs come in and take over all 8 of my processors, thereby suspending all my other BOINC projects. The only one that doesn't do that is the flagship milkyway@home. Therefore I aborted all of the 'vampire' workunits that suck up all 8 processors and then unchecked them. Sorry folks, but I'm not letting workunits take over all 8 processors.


Supposedly they are going to stop creating those units in the near future anyway, but they are no more intrusive than running 8 different units on your PC at the same time. AND they gave some insight into how to truly share your 8-core processor while crunching a single unit: true supercomputer-style computing. I stopped them a while back too.

Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 41,852,496
RAC: 0
Message 60954 - Posted: 4 Feb 2014, 13:12:20 UTC - in response to Message 60953.  
Last modified: 4 Feb 2014, 13:15:42 UTC

If I'm reading this correctly, you are referring to "MT" (multi-threaded) tasks in general, where they use multiple virtual cores to get the task done, instead of working as an "ST" (single-threaded) task which only uses 1 virtual core.

The thing is... BOINC is set up well enough to handle this just fine. It won't overcommit your system (unless it must due to high-priority tasks), it won't undercommit your system, and it properly records REC (recent estimated credit) such that your RS (resource share) percentages are honored across your projects. Sure, other projects can't run concurrently with the MT task, but BOINC constantly keeps track of the work done, to ensure RS is honored before the MT task and afterward.

There is nothing inherently wrong with MT tasks. They've just been designed to use multiple threads/cores to get the task done quicker.

I'm not sure if it is set up this way, but... if MilkyWay has the MT tasks in their own application, then "disabling" them would be as easy as editing the project preferences to disable that application. Though, I still don't see why you guys don't want to run MT tasks.

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 375
Credit: 64,706,194
RAC: 1,403
Message 60975 - Posted: 5 Feb 2014, 18:39:33 UTC - in response to Message 60954.  

Probably because he wants to run more than 1 project at a time I'd guess.

Didn't know there were any DC projects that did true MT!

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 60977 - Posted: 5 Feb 2014, 19:10:59 UTC - in response to Message 60975.  

Probably because he wants to run more than 1 project at a time I'd guess.

Didn't know there were any DC projects that did true MT!


Collatz is doing it as of today, but I do not know if they are doing it the same way or not.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 61703 - Posted: 7 May 2014, 14:13:43 UTC

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990&offset=0&show_names=0&state=6&appid=

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?

Link
Joined: 19 Jul 10
Posts: 623
Credit: 19,254,980
RAC: 0
Message 61704 - Posted: 7 May 2014, 17:20:38 UTC - in response to Message 61703.  

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?

The quota system will do that in this case.

PS: I made your link clickable.

Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 61705 - Posted: 7 May 2014, 18:08:07 UTC - in response to Message 61703.  

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990&offset=0&show_names=0&state=6&appid=

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?

Note that the error message (every task I've looked at) is

201 (0xc9) EXIT_MISSING_COPROC

The card is an NVIDIA Quadro K1000M (2048MB), driver: 296.79, but OpenCL support isn't being reported by BOINC - though the card itself can run OpenCL 1.2

I think the question is more - why does the project keep allocating OpenCL tasks to it?
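
To make the numbers in that error line concrete, here is a small Python sketch that formats an exit code the same way the task page shows it. Only the 201 / `EXIT_MISSING_COPROC` pair is taken from this thread; the `KNOWN_EXIT_CODES` table, the `describe_exit` helper, and the sample list of codes are illustrative assumptions, not BOINC code.

```python
from collections import Counter

# Only the code quoted in this thread is listed; other codes are
# deliberately left out rather than guessed.
KNOWN_EXIT_CODES = {
    201: "EXIT_MISSING_COPROC",
}


def describe_exit(code: int) -> str:
    """Format an exit code as decimal, hex, and symbolic name."""
    name = KNOWN_EXIT_CODES.get(code, "unknown")
    return f"{code} ({code:#x}) {name}"


def tally(codes):
    """Count how often each formatted exit code appears for a host."""
    return Counter(describe_exit(code) for code in codes)


if __name__ == "__main__":
    # Hypothetical exit codes collected from one host's failed tasks.
    sample = [201, 201, 201]
    for label, count in tally(sample).items():
        print(count, "x", label)
```

A tally like this makes it easier to see whether a host's failures all share one cause, such as a missing coprocessor, rather than being real user aborts.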

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 61707 - Posted: 8 May 2014, 10:36:35 UTC - in response to Message 61703.  

Hello,

Please look at this computer:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=573990&offset=0&show_names=0&state=6&appid=

Shouldn't he be "banned" for mass abort? Or at least banned from receiving GPU tasks?


Sometimes the words projects use aren't totally accurate. In this case 'aborted by user' can also mean 'timed out' or 'the server already has a valid return and your unit is not needed', and I am sure there are others, such as 'no device found'. The point of my message is that the displayed message is not always an accurate representation of what is going on, kind of like the blue screen in Windows: 'something is wrong'...no duh! There is no actual clue as to what caused the problem, just a generic message. The BOINC programmers have been 'accused' in the past of learning to write their error messages from Microsoft: generic and meaning little. I THINK they are getting better though.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 61745 - Posted: 21 May 2014, 13:36:24 UTC

Then I would suggest setting:
Max tasks per day
to 100 at the beginning (or when it's reset due to being >100 & validate error).

Right now I can see PCs wasting over 10 000 tasks...
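
A toy Python model of what such a per-host daily limit could look like. This is NOT how the BOINC server actually implements its quota: the `DailyQuota` class, the rule of dropping the limit by one per error and doubling it per valid result, and the floor and ceiling values are all assumptions for illustration. Only the starting value of 100 comes from the suggestion above.

```python
class DailyQuota:
    """Toy per-host daily task quota; not BOINC's actual implementation."""

    def __init__(self, start: int = 100, floor: int = 1, ceiling: int = 1000):
        self.limit = start      # tasks the host may receive per day
        self.floor = floor      # assumed lower bound on the limit
        self.ceiling = ceiling  # assumed upper bound on the limit
        self.sent_today = 0

    def can_send(self) -> bool:
        return self.sent_today < self.limit

    def record_sent(self) -> None:
        self.sent_today += 1

    def record_result(self, valid: bool) -> None:
        # Assumed policy: shrink slowly on errors, grow quickly on success.
        if valid:
            self.limit = min(self.ceiling, self.limit * 2)
        else:
            self.limit = max(self.floor, self.limit - 1)

    def new_day(self) -> None:
        self.sent_today = 0


def simulate_broken_host(quota: DailyQuota) -> int:
    """Send tasks to a host that fails every task; return tasks sent today."""
    while quota.can_send():
        quota.record_sent()
        quota.record_result(valid=False)
    return quota.sent_today


if __name__ == "__main__":
    print(simulate_broken_host(DailyQuota(start=100)))
```

Under these assumed rules a host that fails every task is cut off well before it can burn through thousands of tasks in a day, which is the point of starting with a low limit.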

©2024 Astroinformatics Group