Problem with tiny cache in MW

Beyond · Joined: 15 Jul 08 · Posts: 384 · Credit: 744,068,851 · RAC: 32,697
Message 31679 - Posted: 29 Sep 2009, 7:12:03 UTC
There has to be a better way to assign work here. The WU cache on a dual core is 12 WUs, which lasts a total of 12 minutes. The frequent outages and maintenance cause the GPU to run dry, and the client is then told not to check back for an hour, so the GPU sits idle for at least 48 minutes. Not good. In addition, this tiny WU queue makes it impossible for the BOINC client to properly schedule the ATI GPU with other projects such as Collatz. It also just doesn't make sense to assign WUs to a GPU based on how many CPU cores are present. Can we please have another system that works? Maybe at least 60 WUs per machine? That would be an hour of work, less for faster cards. Thanks!
PS: Excuse me if I sound frustrated, but I have been trying to find a way to make this work for days. I am frustrated, and it takes a lot. This system of assigning work makes no sense whatsoever.
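For readers checking the numbers in this post, the sketch below reproduces the arithmetic. It assumes, as the post implies, a limit of 6 WUs per CPU core (12 on a dual core), about one minute of GPU time per WU, and a one-hour back-off after a failed request. The constant and function names are illustrative, not MilkyWay server settings.

```python
# Sketch of the cache arithmetic in the post above.
# Assumptions (illustrative, not actual MilkyWay server settings):
#   - the server allows WUS_PER_CORE tasks per reported CPU core
#   - each WU takes about MINUTES_PER_WU minutes on the GPU

WUS_PER_CORE = 6      # implied by "12 WUs on a dual core"
MINUTES_PER_WU = 1.0  # implied by "12 WUs ... lasts 12 minutes"


def cache_minutes(cores: int) -> float:
    """How long a full cache keeps the GPU busy."""
    return cores * WUS_PER_CORE * MINUTES_PER_WU


def idle_minutes(cores: int, backoff_minutes: float = 60.0) -> float:
    """GPU idle time if the cache drains while the client waits out a
    back-off before it may ask for work again."""
    return max(0.0, backoff_minutes - cache_minutes(cores))


if __name__ == "__main__":
    print(cache_minutes(2))     # 12.0
    print(idle_minutes(2))      # 48.0
    # A 60-WU cache, as requested, would cover an hour at this rate.
    print(60 * MINUTES_PER_WU)  # 60.0
```

Under these assumptions, a host would need to report at least 10 cores (60 WUs) before a one-hour back-off stops idling the GPU.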

Paul D. Buck · Joined: 12 Apr 08 · Posts: 621 · Credit: 161,934,067 · RAC: 0
Message 31680 - Posted: 29 Sep 2009, 8:58:18 UTC
You will also run afoul of the FIFO bug, where downloaded tasks are run in strict FIFO order ... since 6.6.? I forget when ... Most funnily, they did not get that working right until 6.10.7 ... :) Of course, when they made the change that made FIFO work right, they fixed the bug (as near as I can tell) that drove them to start running tasks in strict FIFO order on the GPU in the first place ...

Thamir Ghaslan · Joined: 31 Mar 08 · Posts: 61 · Credit: 18,325,284 · RAC: 0
Message 31681 - Posted: 29 Sep 2009, 10:50:44 UTC
WHICH IS PRECISELY why I've completely detached from MW and have made my GPU run Collatz exclusively. Instead of whining and getting frustrated, the better alternative was Collatz. Collatz offers 120 tasks, each of which runs 15 minutes, and the queue seldom runs dry. In the long run I'm sure my RAC on Collatz will be higher than it is on MW.

Berserk_Tux · Joined: 2 Jan 08 · Posts: 79 · Credit: 365,471,675 · RAC: 0
Message 31683 - Posted: 29 Sep 2009, 12:21:29 UTC - in response to Message 31679.
I second that.

Crunch3r · Volunteer developer · Joined: 17 Feb 08 · Posts: 363 · Credit: 258,227,990 · RAC: 0
Message 31685 - Posted: 29 Sep 2009, 12:57:47 UTC - in response to Message 31679.
Theoretically it should be easy to assign more jobs to GPUs. The code is already there in sched_send.cpp: max_jobs_on_host_cpu and max_jobs_on_host_gpu. But then again, this requires a 6.10.x BOINC client and a server software update... Now, given that RPI's lab staff is paranoid and the MW admins are not allowed to upgrade the server software on their own, I'd say it'll take a while before that could happen.
Support science! Join Team BOINC United now!
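A minimal Python sketch of the difference Crunch3r describes. It assumes that the current limit is a single per-core cap shared by CPU and GPU tasks, and that max_jobs_on_host_cpu / max_jobs_on_host_gpu act as separate per-host totals. It is not a transcription of sched_send.cpp; the Host class and both functions are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical model of a host as the scheduler sees it.
@dataclass
class Host:
    ncpus: int
    ngpus: int
    cpu_jobs_in_progress: int = 0
    gpu_jobs_in_progress: int = 0


def jobs_allowed_shared(host: Host, per_core: int = 6, max_cores: int = 8) -> int:
    """One limit derived from the CPU core count, shared by CPU and GPU
    tasks (the behaviour described in this thread)."""
    cap = per_core * min(host.ncpus, max_cores)
    in_progress = host.cpu_jobs_in_progress + host.gpu_jobs_in_progress
    return max(0, cap - in_progress)


def jobs_allowed_split(host: Host, max_jobs_on_host_cpu: int,
                       max_jobs_on_host_gpu: int) -> tuple[int, int]:
    """Separate per-host limits for CPU and GPU tasks.
    Returns (new CPU jobs, new GPU jobs) the scheduler may still send."""
    cpu = max(0, max_jobs_on_host_cpu - host.cpu_jobs_in_progress)
    gpu = max(0, max_jobs_on_host_gpu - host.gpu_jobs_in_progress) if host.ngpus else 0
    return cpu, gpu


# Dual-core host with one GPU that already holds 12 GPU tasks.
host = Host(ncpus=2, ngpus=1, gpu_jobs_in_progress=12)
print(jobs_allowed_shared(host))          # 0
print(jobs_allowed_split(host, 12, 200))  # (12, 188)
```

In this model the shared cap leaves a dual-core GPU host with nothing more to fetch, while the split limits let the GPU queue grow independently of the CPU core count.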

cornel · Joined: 28 Feb 09 · Posts: 38 · Credit: 10,200,014 · RAC: 0
Message 31686 - Posted: 29 Sep 2009, 12:59:54 UTC · Last modified: 29 Sep 2009, 13:04:31 UTC
Very good idea. A much bigger WU cache would definitely lower network usage, and the current hardware could be kept longer. Suppose I use a dial-up connection: wouldn't it be much cheaper to update MW only once every few hours? Travis, tell us what you think about this idea.

Berserk_Tux · Joined: 2 Jan 08 · Posts: 79 · Credit: 365,471,675 · RAC: 0
Message 31689 - Posted: 29 Sep 2009, 15:31:36 UTC - in response to Message 31686.
I will not crunch any more MW until the cache gets bigger.

Brian Silvers · Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0
Message 31692 - Posted: 29 Sep 2009, 16:53:16 UTC
This project's work is generated differently from other projects. Other projects, by and large, have a predefined set of work to go through and divide it into chunks. Here, by contrast, tasks are generated based on the results of tasks that have recently been returned.
I understand the desire for competition and having the most points, but please think about the following: why do you think the project ran out of work when the validator was down the other day? The validator validates tasks, and then other server-side daemons move the valid task along. New tasks are then created from the results of those validated tasks. When the validator went down, no new tasks could be generated, because this project relies on a continual incoming data feed from the validator, except when they start up NEW searches.
Every time you all demand a larger cache, you ignore the possibility that those of us running slow CPUs might grab a large batch of work and then sit on it for longer than normal. That would have exactly the same effect as the validator going down: if the tasks are sitting on our computers, they can't be validated, and if they can't be validated, new tasks can't be created, which creates a work stall / shortage...
Crunch3r's comment about the separate setting for the GPUs is all well and good; however, what will likely happen is that you all with GPUs will completely outrun those of us with CPUs, making this for all intents and purposes a GPU project. Either that, or those of us with CPUs only would still get enough work and sit on it for "too long" for the liking of those of you with GPUs, and we'd create work shortages again because the search would have to wait for us to turn results in before it could progress further...
The proper way to fix this issue was what they came up with a long time ago: a separate GPU project doing more complex calculations. That idea was abandoned, though... It shouldn't have been...
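The dependency described in this post can be illustrated with a toy model. Assumptions (all hypothetical, not MilkyWay's actual work generator): a fixed pool of tasks, hosts that take a fixed number of tasks per tick and return them within the same tick, and exactly one new task generated per validated result. In this model a validator outage drains the pool; results held for a long time on slow hosts would stall generation in the same way.

```python
def simulate(validator_up: list[bool], pool_size: int = 10, demand: int = 4) -> list[int]:
    """Toy model of result-driven work generation.

    Returns the number of unsent tasks left at the end of each tick.
    """
    unsent = pool_size  # tasks ready to be sent to hosts
    awaiting = 0        # results returned but not yet validated
    history = []
    for up in validator_up:
        sent = min(demand, unsent)
        unsent -= sent
        awaiting += sent  # toy assumption: hosts return work within the tick
        if up:
            unsent += awaiting  # each validated result spawns one new task
            awaiting = 0
        history.append(unsent)
    return history


# Validator down for three ticks in the middle.
print(simulate([True, True, False, False, False, True, True]))
# [10, 10, 6, 2, 0, 10, 10]
```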

Beyond · Joined: 15 Jul 08 · Posts: 384 · Credit: 744,068,851 · RAC: 32,697
Message 31697 - Posted: 29 Sep 2009, 18:13:46 UTC - in response to Message 31692. · Last modified: 29 Sep 2009, 18:16:22 UTC
Quote (Brian Silvers): Every time you all demand a larger cache, you ignore the possibility that those of us running slow CPUs might grab a large batch of work and then sit on it for longer than normal. That would have exactly the same effect as the validator going down: if the tasks are sitting on our computers, they can't be validated, and if they can't be validated, new tasks can't be created, which creates a work stall / shortage... Crunch3r's comment about the separate setting for the GPUs is all well and good; however, what will likely happen is that you all with GPUs will completely outrun those of us with CPUs, making this for all intents and purposes a GPU project. Either that, or those of us with CPUs only would still get enough work and sit on it for "too long" for the liking of those of you with GPUs, and we'd create work shortages again because the search would have to wait for us to turn results in before it could progress further...
If fast turnaround time is of such great importance, why even allow CPUs? The GPUs are so much faster that it makes no sense to waste a CPU here anyway, IMO, and I moved my CPUs elsewhere long ago. The tiny cache DOES have the effect of motivating GPU users to move elsewhere (like Collatz) now that other alternatives for ATI GPUs are available. Collatz does not have the tiny cache problem that MW has and, for that matter, is a lot less prone to crashes and work stoppages. Personally, I'd like to run a balance of both, but the tiny MW cache limitation also effectively keeps BOINC's strict FIFO scheduling from working, so even that is not an option. Right now the only effective way to run MilkyWay is to babysit it, and that's just a ridiculous solution.

GalaxyIce · Joined: 6 Apr 08 · Posts: 2018 · Credit: 100,142,856 · RAC: 0
Message 31698 - Posted: 29 Sep 2009, 18:24:20 UTC - in response to Message 31697.
Quote (Beyond): If fast turnaround time is of such great importance, why even allow CPUs?
You can't shut down MW for CPUs. All the work that Cluster Physik did on optimized CPU apps would be lost, and that would be inexcusable. Without that work you would never have had optimized apps for MW GPUs.

PeteS · Joined: 19 Mar 09 · Posts: 27 · Credit: 117,670,452 · RAC: 0
Message 31699 - Posted: 29 Sep 2009, 18:37:17 UTC · Last modified: 29 Sep 2009, 18:42:03 UTC
I totally agree with making this CAL/CUDA only. Then you can rely on fast enough response times, set the due date quite aggressively, and have a longer WU queue. There are a lot of CPU-only projects where CPUs are needed; here, using a CPU seems like quite a waste when exactly the same can be achieved with much faster GPUs.
Ice, if you really think about it, that is a really bad excuse for keeping CPU apps around <a lot of examples here>.
I am also thinking of moving totally to Collatz, since things progress really slowly here and the project is often down. I just wish projects like WCG had GPU apps, since I never have problems there. I have also tried setting primary and secondary projects, but with no success.
- Promise to set up GPU project -> never happened
- Promise to set higher queues -> never happened
- Promise to make much longer WUs -> never happened
and the list goes on. I'm a real fan of sci-fi and wish to help humankind progress more rapidly, and that's why I pay big electricity bills and have many computers running BOINC projects.

Crunch3r · Volunteer developer · Joined: 17 Feb 08 · Posts: 363 · Credit: 258,227,990 · RAC: 0
Message 31700 - Posted: 29 Sep 2009, 18:39:22 UTC - in response to Message 31698.
Quote (Beyond): If fast turnaround time is of such great importance, why even allow CPUs?
Quote (GalaxyIce): You can't shut down MW for CPUs. All the work that Cluster Physik did on optimized CPU apps would be lost, and that would be inexcusable. Without that work you would never have had optimized apps for MW GPUs.
No one wants to single out the CPUs. What we need is max_jobs_on_host_gpu=200 ... that should reduce the server load a bit, since the clients won't hammer it every 60 seconds.
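The load argument can be illustrated with a toy work-fetch loop. It assumes (hypothetically; BOINC's real work-fetch policy is more involved) that the client asks for work whenever it holds fewer than `low` WUs, asks to refill to `high` WUs, may contact the server at most once per minute, and that the GPU finishes one WU per minute. When the server cap is below `low`, the client is never satisfied and asks at every opportunity.

```python
def scheduler_requests(cap: int, low: int, high: int,
                       minutes: int = 600, min_rpc_interval: int = 1) -> int:
    """Count scheduler contacts over `minutes` in a hypothetical client model."""
    queued = 0                     # WUs on hand
    last_rpc = -min_rpc_interval   # allow a request at t = 0
    requests = 0
    for t in range(minutes):
        if queued < low and t - last_rpc >= min_rpc_interval:
            # Server tops the host up, but never beyond its per-host cap.
            grant = min(high, cap) - queued
            if grant > 0:
                queued += grant
            requests += 1
            last_rpc = t
        if queued > 0:
            queued -= 1            # GPU finishes one WU per minute
    return requests


print(scheduler_requests(cap=12, low=60, high=120))   # 600 (every minute)
print(scheduler_requests(cap=200, low=60, high=120))  # 10 (about once an hour)
```

Under these assumptions, a 12-WU cap produces one request per minute for 10 hours, while a 200-WU cap lets the same client go about an hour between requests.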

Beyond · Joined: 15 Jul 08 · Posts: 384 · Credit: 744,068,851 · RAC: 32,697
Message 31701 - Posted: 29 Sep 2009, 18:50:52 UTC - in response to Message 31700.
Quote (Crunch3r): No one wants to single out the CPUs. What we need is max_jobs_on_host_gpu=200 ... that should reduce the server load a bit, since the clients won't hammer it every 60 seconds.
Perfect. Bravo!

banditwolf · Joined: 12 Nov 07 · Posts: 2425 · Credit: 524,164 · RAC: 0
Message 31702 - Posted: 29 Sep 2009, 18:52:41 UTC - in response to Message 31699.
Quote (PeteS): I have also tried setting primary and secondary projects, but with no success. Promise to set up GPU project -> never happened; promise to set higher queues -> never happened; promise to make much longer WUs -> never happened; and the list goes on. I'm a real fan of sci-fi and wish to help humankind progress more rapidly, and that's why I pay big electricity bills and have many computers running BOINC projects.
Go back another year and a half, before you joined. The list is much longer. The most recent promise/lie was to never cut the credits; they were to be recalculated whenever any changes were made. For most of the time I have been here, the lack of communication has been huge. What the users want, or what makes sense, doesn't happen most of the time. So a bigger cache is unlikely at best.
Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.

banditwolf · Joined: 12 Nov 07 · Posts: 2425 · Credit: 524,164 · RAC: 0
Message 31703 - Posted: 29 Sep 2009, 18:54:47 UTC - in response to Message 31701.
Quote (Crunch3r): No one wants to single out the CPUs. What we need is max_jobs_on_host_gpu=200 ... that should reduce the server load a bit, since the clients won't hammer it every 60 seconds.
Quote (Beyond): Perfect. Bravo!
Well, the solution was the GPU project and WUs hundreds of times longer for GPUs.

Brian Silvers · Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0
Message 31708 - Posted: 29 Sep 2009, 20:01:36 UTC - in response to Message 31697.
Quote (Beyond): The tiny cache DOES have the effect of motivating GPU users to move elsewhere (like Collatz) now that other alternatives for ATI GPUs are available.
This was the specific design purpose of BOINC: to have alternatives available. When there was no such thing as doing this on a GPU, the complaining back then was essentially the same. A poster on another project's forum even stated that if you didn't fork over cash in the form of "donations" to the project, you should not be able to get work, so that more work would be available for those who wanted all they could get... (a "pay-to-crunch" plan)
Quote (Beyond): Collatz does not have the tiny cache problem that MW has and, for that matter, is a lot less prone to crashes and work stoppages.
Theirs is a different project. New work there does not depend on the outcome of current work in the pipeline. Also, their web page mentions their server running out of disk space VERY recently... and MySQL problems... As the user base grows, so will their problems, and they will likely be very similar to the ones here. Every single BOINC project eventually runs into problems. For a very long time, Einstein was considered the most stable project on the server side, but over the past 1-2 years they've had numerous crashes.
Quote (Beyond): Right now the only effective way to run MilkyWay is to babysit it, and that's just a ridiculous solution.
Only if you are strictly interested in the most points. If you leave it alone, it does eventually get work, just not on the timetable you'd want for getting the most points. Again, the real solution would've been to get that GPU project going and have the GPUs do the more complex work, as was initially mentioned. Instead, a poor design decision has brought us back to the same complaining that happened months ago... :sigh:

Brian Silvers · Joined: 21 Aug 08 · Posts: 625 · Credit: 558,425 · RAC: 0
Message 31709 - Posted: 29 Sep 2009, 20:04:49 UTC - in response to Message 31703.
Quote (banditwolf): Well, the solution was the GPU project and WUs hundreds of times longer for GPUs.
Yep, and that's what got abandoned, and it should not have been. We're basically back to where we were just before they had that idea. Maybe this time they'll try it?

Paul D. Buck · Joined: 12 Apr 08 · Posts: 621 · Credit: 161,934,067 · RAC: 0
Message 31710 - Posted: 29 Sep 2009, 20:17:52 UTC
I guess I am going to have to expose my multi-project prejudice here ... but ... though I would prefer a more rational internal operation (eliminate strict FIFO for the GPU), I have been running reasonably stably with MW and Collatz on my one ATI-equipped machine. At the moment I have about 2 hours of Collatz work, after running for hours with MW like I had yesterday (I think I hit a bobble in the MW workflow). Bottom line: though it is certainly not exactly as I would wish, I seem to be running almost continually ... THAT said, I would wish for more stability here and better workflow ... but I am grateful that I can get work and stay busy most of the time with work from MW ... cup 3/4 full, I guess ... Communication could be better at almost all projects, and though MW is better than most, it is not at all where I would like to see it ...

The Gas Giant · Joined: 24 Dec 07 · Posts: 1947 · Credit: 240,884,648 · RAC: 0
Message 31712 - Posted: 29 Sep 2009, 20:40:00 UTC
Communications used to be much better. Things have dropped off since the server problems started many months ago and the GPU-only project was suggested as a way of moving forward. I'd love to see the WU limit raised, but I doubt that 200 would be set. Maybe we could take small steps, starting at 50 on Monday and then increasing by 10 every other day?
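The ramp proposed here is simple arithmetic. A sketch, assuming day 0 is the first Monday and that the final step is clamped at the 200 figure mentioned earlier in the thread; the function name is hypothetical:

```python
def ramp(start: int = 50, step: int = 10, every_days: int = 2,
         target: int = 200) -> list[tuple[int, int]]:
    """(day, WU limit) pairs for a stepwise increase, clamped at target."""
    schedule = []
    day, limit = 0, start
    while limit < target:
        schedule.append((day, limit))
        day += every_days
        limit += step
    schedule.append((day, target))  # last step never overshoots the target
    return schedule


s = ramp()
print(s[0], s[-1], len(s))  # (0, 50) (30, 200) 16
```

At 10 WUs every other day, going from 50 to 200 takes 15 steps, i.e. about a month.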

ExtraTerrestrial Apes · Joined: 1 Sep 08 · Posts: 204 · Credit: 219,354,537 · RAC: 0
Message 31717 - Posted: 29 Sep 2009, 21:06:10 UTC
My solution to keep the GPU busy is to use a BOINC core client modified by Twodee, one of the good guys from planet3dnow.de. It's actually very easy: you get a separate section for MW in cc_config.xml where you can tell it to report and fetch work every x seconds (timeintervall), keep a cache (if possible) of y seconds (requestworktime), and report enough cores that you get the maximum 8 x 6 = 48 WU cache (hostinfo_ncpus). To activate the functionality, you need to set "connect every xx hours" to 0.
This way I can get a cache of 34 minutes for my GPU, if I don't get any short ones. I set the update interval to 10 minutes, which should be somewhat reasonable and much better than a stock BOINC client with "return results immediately" on.
MrS
Scanning for our furry friends since Jan 2002
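The numbers in this post can be checked with a short sketch. It assumes the server counts at most 8 cores at 6 WUs each (the "8 x 6 = 48" above) and that the modified client refills the cache completely at each successful contact. `timeintervall` is taken from the post; the function names are hypothetical and say nothing about how Twodee's client is actually implemented.

```python
WUS_PER_CPU = 6  # per-core limit implied in this thread (12 WUs on a dual core)
MAX_CPUS = 8     # "8 x 6 = 48": at most 8 cores are counted


def server_cap(reported_ncpus: int) -> int:
    """WUs the server will let a host hold, given the core count it reports."""
    return WUS_PER_CPU * min(reported_ncpus, MAX_CPUS)


def worst_case_margin(cache_minutes: float, timeintervall_s: float) -> float:
    """Minutes of GPU work left when a scheduled contact fails, assuming the
    previous contact (one interval earlier) filled the cache completely."""
    return max(0.0, cache_minutes - timeintervall_s / 60)


print(server_cap(2))               # 12
print(server_cap(8))               # 48
print(worst_case_margin(34, 600))  # 24.0
```

With a 34-minute cache and a 10-minute update interval, the GPU in this model still has about 24 minutes of work when a scheduled contact fails.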