Welcome to MilkyWay@home

Collatz and MW together

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 36583 - Posted: 16 Feb 2010, 16:27:06 UTC - in response to Message 36581.  

Hmm -- OK -- I may try that. One approach is to set 'no new work' on one of the computers and let the queues drain naturally. Then, change the resource shares for both Collatz and MW to a lower number.

Then delete the client on the workstation including the program and project data side.

Then reboot the computer

Then download the test client version.

Then join MW only -- let it run for an hour or so (i.e., don't force reporting).

If at that point, I still have no success, then I can be sure that the foible is personal and I'll no longer try to teach that pig to sing.






The thing is, I already ran what I thought would be a variable-killing test: a full uninstall of BOINC, a delete of the leftover program and project folders, and a clean new install with ONLY MW configured, as GPU only. I really don't know what more I can do there to set up a fresh start.

That should do it on the client side. The only question I have is if the computer trashed so many tasks that its daily quota is down in the dirt. That is the last thing I can think of. But my recollection is that you say that the tasks are completed and validated so that does not make much sense either.

My last suggestion is to try 6.10.28 to see if anything changes... Sadly, I have not been impressed with the work fetch and Resource Scheduler in the post-GPU era. UCB did not really consider the impact of so many design choices on other issues, and they have been very reluctant to address those issues or even to acknowledge that they exist.


Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 36613 - Posted: 17 Feb 2010, 14:23:22 UTC - in response to Message 36583.  
Last modified: 17 Feb 2010, 14:24:15 UTC

Remember the RS are relative numbers so there is no difference between 100/200 and 1000/2000 ...

The only time that the shares become important is when you have something like 100/2000 in which case one project will be 20 times the share of the other. I am sure you know that ... :)
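To make the point about relative shares concrete, here is a minimal sketch of the arithmetic. The function name and the assignment of numbers to MW and Collatz are only for illustration; this is not BOINC code, just the normalization that "relative numbers" implies.

```python
def share_fractions(shares):
    """Convert raw resource-share numbers into fractions of the total."""
    total = sum(shares.values())
    return {project: value / total for project, value in shares.items()}


# Hypothetical assignments: which project gets which number is an assumption.
small = share_fractions({"MW": 100, "Collatz": 200})
large = share_fractions({"MW": 1000, "Collatz": 2000})
skewed = share_fractions({"MW": 100, "Collatz": 2000})

print(small == large)                    # 100/200 and 1000/2000 normalize identically
print(skewed["Collatz"] / skewed["MW"])  # about 20: one project has 20 times the share
```

Only the ratio between the numbers matters, so scaling every share by the same factor changes nothing.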

I think you mean that it would be to reduce the percentage time ...

In my case I have all GPU projects at shares of 100 with the sole exception of PG at 50 (on CUDA only at the moment), so I spend a day or so on one project before switching to do a day on another project. For example, I am running MW on my dual ATI system and expect it to switch to Collatz late today or early tomorrow.

My Nvidia systems do MW, Collatz, PG, and GPU Grid so they take longer to make the rounds... though I have gotten my first silver badge on PG on AP26 from running the CUDA application (the pay stinks though).

To be honest, I am still concerned though about the version of BOINC you are using as the version I have seen (.18) is somewhat long in the tooth and there have been quite a few Work Fetch and Resource Scheduling changes made in the later versions. I don't suggest .32 for the simple reason that UCB added a feature that can cause extra suspensions because of a "fix" to apparent lag that can cause an unloaded system to suspend randomly for about 10 seconds or so several times a day (I was in bed for most occurrences so I know it was not something I did, AV was not running, and nothing else is set to run unattended).

Anyway, in your test case I would also set it to report results immediately to see if it can get into a cycle of do work, report it, then get more ... I have seen the 6.10.x versions allow idle instances for considerable time periods as it futilely looks for work on the wrong projects...
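One way to set immediate reporting is through the client's cc_config.xml file. The sketch below only builds the XML text; it assumes the client version in use reads a report_results_immediately option from cc_config.xml in the BOINC data directory, which is worth checking against the client configuration documentation for that version. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def build_cc_config(report_immediately=True):
    """Build cc_config.xml text with the immediate-reporting option set.

    Assumption: the installed client honors <report_results_immediately>.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "report_results_immediately")
    flag.text = "1" if report_immediately else "0"
    return ET.tostring(root, encoding="unicode")


config_xml = build_cc_config()
print(config_xml)
```

The printed text would be saved as cc_config.xml and the client told to re-read its config file (or restarted) for the change to take effect.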

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 36623 - Posted: 17 Feb 2010, 18:42:30 UTC - in response to Message 36613.  

I don't have the other interim versions to play with - just 6.10.18 and 6.10.32. I know there is a site that collects all of them --- do you have that link handy?

I expect to run thru my cache on Collatz and the remaining POEM workunits on the XP system today. Then I will detach all the projects, uninstall 6.10.18, delete the leftover program and project data, and install either .32 (as it is all that I have) or one of the other interim versions if I can download them.

I will then attach ONLY to MW and give it a chance (say an hour or so) to 'play righteous'. If that doesn't work, I'll revert to the projects which do play nice.

By the way, the ATI driver version I'm running on the XP system is 9.12


dnolan
Joined: 26 Oct 09
Posts: 55
Credit: 352,166,802
RAC: 0
Message 36625 - Posted: 17 Feb 2010, 18:49:02 UTC - in response to Message 36623.  

I don't have the other interim versions to play with - just 6.10.18 and 6.10.32. I know there is a site that collects all of them --- do you have that link handy?



Boinc version archive


I expect to run thru my cache on Collatz and the remaining POEM workunits on the XP system today. Then I will detach all the projects, uninstall 6.10.18, delete the leftover program and project data, and install either .32 (as it is all that I have) or one of the other interim versions if I can download them.

I will then attach ONLY to MW and give it a chance (say an hour or so) to 'play righteous'. If that doesn't work, I'll revert to the projects which do play nice.

By the way, the ATI driver version I'm running on the XP system is 9.12



I run 6.10.25 on all of mine: one currently has 2 x HD 4770 and three have 1 x HD 4850 in them (and a 2 x GTX 260). I am using driver 9.12 on the 4770 system and a Win 7 4850 system; the others are using 10.1. Haven't seen your problem, but I do allow both MW and Collatz on all systems. On a couple of them the Collatz share is really low. The only thing I've anecdotally noticed is that it doesn't seem to respect resource share, but when it's doing too much Collatz, the MW cache is full. Good luck!

-Dave

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 36626 - Posted: 17 Feb 2010, 19:57:26 UTC - in response to Message 36625.  

Thanks for the report back along with the link. I will give 6.10.25 a shot in a few minutes and report back later today.

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 36633 - Posted: 18 Feb 2010, 1:49:39 UTC - in response to Message 36626.  

OK -- that works -- at least for the XP workstation. The default install sets a 0.25-day cache. I attached MW and it grabbed the initial one work unit, then it filled the cache. I set up Spinhenge and POEM and Collatz and they appear to be grabbing work properly. (I set up my 'Work' group with all the lower shares, 20 and 40, to keep things balanced.)

Currently I'm letting the other workstation 'drain' the caches, then I will see if MW will grab work (using 6.10.18) having set the cache back to .25 days.

If that doesn't do the trick, then I'll do the same thing for that workstation -- drain all work out via completion, then detach all projects, then delete BOINC, then clear out leftover project and program folders, then reboot and then install 6.10.25.
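The drain-then-detach part of that sequence can also be done from the command line with boinccmd. The sketch below only builds the command lines rather than running them; the project URLs are placeholders, not the real MW or Collatz addresses, and the nomorework and detach operations should be checked against the boinccmd that ships with the client version in use. Uninstalling BOINC, clearing the leftover folders, rebooting and installing 6.10.25 remain manual steps.

```python
def drain_and_detach_plan(project_urls):
    """Return boinccmd command lines: stop new work first, detach afterwards."""
    plan = []
    # Step 1: 'no new work' on every project so the queues drain via completion.
    for url in project_urls:
        plan.append(["boinccmd", "--project", url, "nomorework"])
    # (Wait here until all remaining tasks have finished and been reported.)
    # Step 2: detach every project before uninstalling the client.
    for url in project_urls:
        plan.append(["boinccmd", "--project", url, "detach"])
    return plan


# Placeholder URLs (assumptions for illustration only).
urls = ["http://example.org/milkyway/", "http://example.org/collatz/"]
plan = drain_and_detach_plan(urls)
for command in plan:
    print(" ".join(command))
```

Keeping the two phases separate matters: detaching before the queue has drained throws away the unfinished tasks.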


Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 36645 - Posted: 18 Feb 2010, 15:46:30 UTC

The "Better" Link as it sorts the newest to the top so you don't have to hunt...

It looks like you were running into some odd version issue ...

Note that the strict FIFO rule, though JM VII says it does not equal EDF (true), does mean that you cannot really have the queue work as it should. My ATI systems yo-yo between MW and Collatz, and only part of the time do I get a list of tasks from both projects, most usually when I am in the Collatz part of the cycle.

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 36674 - Posted: 19 Feb 2010, 21:22:44 UTC - in response to Message 36645.  

So far it seems I need a bit of manual intervention - elsewise Collatz seems to run all the time. So I can either 'no new work' on Collatz to let it drain and then MW will kick in, or I can suspend one or the other to force the switch.

That works for now since these are two fairly closely watched workstations.

At some point I may simply let one workstation run Collatz only and the other run MW only (by using 'no new work' to let the other GPU project drain out for a while).

It would be nice if the resource share and queue handling actually worked though.


Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 36681 - Posted: 20 Feb 2010, 15:53:56 UTC - in response to Message 36674.  

It would be nice if the resource share and queue handling actually worked though.

As I just pointed out on Collatz, with the strict FIFO rule you will see BOINC do one GPU project for the length of time you have the queue set for and then it will switch over to the next GPU project. Only at rare moments do I see work from alternative projects on the system in the queue.

As an example I am looking at W4 and it shows only Collatz tasks in the queue. Late today or early tomorrow if it can get some it will switch over to MW and then I will only see MW tasks for a day or so until we lurch back to Collatz processing.
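A toy model of that behavior, as a sketch only: it assumes each work fetch fills the queue with a batch from a single project, and it is not the BOINC scheduler. Under strict FIFO the tasks run in arrival order, so each batch runs to completion before the next project gets a turn. The interleaved function is a hypothetical alternative shown purely for contrast.

```python
from itertools import chain, zip_longest


def fifo_order(batches):
    """Strict FIFO: run tasks in arrival order, one fetched batch after another."""
    return list(chain.from_iterable(batches))


def interleaved_order(batches):
    """Hypothetical alternative: take one task from each batch in turn."""
    mixed = chain.from_iterable(zip_longest(*batches))
    return [task for task in mixed if task is not None]


# Assumed arrival pattern: a full batch of Collatz work, then a full batch of MW.
batches = [["Collatz"] * 3, ["MW"] * 3]

print(fifo_order(batches))         # all Collatz tasks first, then all MW tasks
print(interleaved_order(batches))  # Collatz and MW alternate
```

With a one-day queue, each batch in the FIFO case stands for roughly a day of work, which matches the day-on, day-off pattern described above.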

One workaround is to have a short queue, but then ISP outages leave you vulnerable to running out of work. Of course MW's short queue fills also leave you equally vulnerable ...

Sadly the last tester that UCB listens to on a few rare occasions runs SAH pretty much full time and does not seem to notice this condition. At least not yet...

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 36685 - Posted: 20 Feb 2010, 21:59:43 UTC

My 4830 primarily does Milkyway, but when we go down I let it load up with Collatz work and then set NNW again.

I am almost done with the 105 units from there and will get back to MW in about 15 minutes or so.

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 15
Message 36689 - Posted: 21 Feb 2010, 2:40:01 UTC - in response to Message 36681.  

Right -- I know that the handling of multiple GPU projects by the client is 'suboptimal'. I had seen that already when running 9800 CUDA cards with Collatz, GPUGrid, and SETI. Pretty much a case of forcing which project I wanted to run by suspending the others.

So for reasonably 'close supervision' -- that workaround is what I do here -- basically suspending MW or Collatz for 8 to 12 hours a day and letting the other project run when I want it to.

With two workstations running double precision GPUs, the solution if I'm going to be away for some time would be to clear the queue for MW on one, letting Collatz go auto-pilot, and do the reverse on the other workstation.

Since most of my farm runs single precision GPUs -- the choice is simpler, especially for the ATI GPUs -- no choice but Collatz anyway.

I may get around to adding more 4850s into the mix, and then at a certain point, avoid the management hassle and run 100% MW on some and 100% Collatz on others (simply suspending the other project with a cleared queue).




It would be nice if the resource share and queue handling actually worked though.

As I just pointed out on Collatz, with the strict FIFO rule you will see BOINC do one GPU project for the length of time you have the queue set for and then it will switch over to the next GPU project. Only at rare moments do I see work from alternative projects on the system in the queue.

As an example I am looking at W4 and it shows only Collatz tasks in the queue. Late today or early tomorrow if it can get some it will switch over to MW and then I will only see MW tasks for a day or so until we lurch back to Collatz processing.

One workaround is to have a short queue, but then ISP outages leave you vulnerable to running out of work. Of course MW's short queue fills also leave you equally vulnerable ...

Sadly the last tester that UCB listens to on a few rare occasions runs SAH pretty much full time and does not seem to notice this condition. At least not yet...


Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 36701 - Posted: 21 Feb 2010, 23:30:48 UTC - in response to Message 36689.  

Right -- I know that the handling of multiple GPU projects by the client is 'suboptimal'. I had seen that already when running 9800 CUDA cards with Collatz, GPUGrid, and SETI. Pretty much a case of forcing which project I wanted to run by suspending the others.

BOINC does eventually get around to honoring RS over the long haul, meaning in my case several days to a week to get through them all. For the ATI cards I flip-flop, with MW and Collatz changing places, and with the CUDA cards I also add in GPU Grid and PG (except the Mac, on which PG causes too much lag and makes it unusable)...

With more machines you can do the alternative with some dedicated to one project and others to the other. Bad news is you have to watch them closely for those cases where MW runs out of work. In my case I usually then do an extra turn with Collatz ...

Still, I wish UCB would remove the "fix" of strict FIFO which was originally put in to "cure" the starting of task after task after task and not completing any of them ... especially in that it is likely that this problem also showed up on the CPU side and other code changes may have gotten the problem solved.

Sadly, the FIFO rule means that queuing is effectively not allowed: you cannot fetch work from multiple projects when it is available and run it alongside work from other projects, which would avoid the boom and bust cycles on projects like MW and Collatz which can ill afford them ...
