Problem with tiny cache in MW

Author	Message
verstapp Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0	Message 33751 - Posted: 26 Nov 2009, 10:25:46 UTC The 'variable cache size depending on turnaround time for that PC' idea, presented elsewhere, seems to have merit. Of course that would require more code at the server end... Cheers, PeterV.

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33799 - Posted: 27 Nov 2009, 6:35:52 UTC - in response to Message 33749. Last modified: 27 Nov 2009, 6:47:31 UTC The project is addicted to the fast turn around times that GPU crunching gives them. I don't think they care if GPU crunchers can't cache many wu's. For our project to do the science we're doing, we need fast turnaround times on workunits. I hope there's still the sticky describing why we really need that. On top of this, we've found that when we increase the workunit cache, the amount of WUs out in the system increases to a point where the database can't handle it's queries fast enough (because a lot of it is dependent on the result table). So right now, this project needs a low cache, partly because of the server and partly because of the scientific needs of the project. If you guys appreciate the science we're doing here, then we hope you'll put up with having a small WU cache. Part of what's nice about BOINC is that you can also be part of other projects that will fill out your cache when you don't have work from us. But you seem to forget that cpu crunching gets you the same cache and yet takes 50 times longer to complete a wu. So your argument tends to fall in a heap at that.... This is where a lot of you just tune out some of what he said. I have bolded and underlined the important part that you're tuning out for you. I have told many of you this over and over and over and over, and now someone from the project is supporting what I have said. The project cannot keep up with you all by giving you more tasks. If they give you all more tasks, the entire project comes to a halt eventually because the server cannot handle the load. Please, read...without selectively tuning out things... ID: 33799 · Rating: 0 · rate: / Reply Quote

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33800 - Posted: 27 Nov 2009, 6:41:32 UTC - in response to Message 33751. The 'variable cache size depending on turnaround time for that PC' idea, presented elsewhere, seems to have merit. Of course that would require more code at the server end... Getting you GPU folks your own project or work that is significantly more complex than the current work has the most merit and is the most likely to produce long-term relief. If there were any relief by giving you more tasks, you'd merely demand more and more due to the drive to accumulate credits until the point where we'd grind to a halt again. ID: 33800 · Rating: 0 · rate: / Reply Quote

Bralle Joined: 5 Sep 08 Posts: 5 Credit: 14,228,045 RAC: 0	Message 33804 - Posted: 27 Nov 2009, 8:08:23 UTC - in response to Message 33800. Some numbers for you. I'm running quads on my 2 main crunchers. One quad needs about 3 hours per core for one Milkyway WU. With a cache of 24 WUs, that's 6 WUs per core, or 18 hours of work in total for that batch. My ATI 3870x2, which isn't fast compared to the 4000 and 5000 series, can do a WU in just under 2 minutes; with 2 cores that adds up to 1 WU per minute. So with 24 WUs in cache, that's a staggering 24 minutes of work. Now, some of you say that we need that steady flow of returned WUs to generate more work. Well, can't we at least get "up to" a 3-hour cache on GPUs? Or even better, the full 18 hours? If a CPU takes 18 hours to complete its batch, a GPU shouldn't slow anything down by having a cache 4, 6 or 8 times larger. If the problem is not enough work, then I would like to know whether it's possible to generate more work, 2, 4 or 6 times the current amount. Or is this humanly impossible? And stop using the "credit whore" comments all the time. Of course credits are a big driver, but in the end we want loads of work to do. We're mainly here for the science. And it's your job to keep us fed with WUs :)
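
A quick check of the cache numbers in the post above, as a minimal sketch. The helper name `cache_hours` is made up for illustration, and the inputs are the post's own round figures (about 3 hours per WU per CPU core, about 2 minutes per WU per GPU core, 24 cached WUs), not measured project data.

```python
def cache_hours(cached_wus, hours_per_wu, parallel_units):
    """Wall-clock hours for a host to work through its cached work units."""
    return cached_wus * hours_per_wu / parallel_units

# Assumed round figures from the post: 24 cached WUs per host.
cpu_hours = cache_hours(24, 3.0, 4)      # quad-core CPU, about 3 h per WU per core
gpu_hours = cache_hours(24, 2 / 60, 2)   # 3870x2, about 2 min per WU on each of 2 GPU cores

print(f"CPU cache lasts about {cpu_hours:.0f} hours")
print(f"GPU cache lasts about {gpu_hours * 60:.0f} minutes")
print(f"The CPU cache lasts roughly {cpu_hours / gpu_hours:.0f} times longer")
```

With these inputs the same 24-WU limit keeps the quad busy for most of a day and the GPU card for well under half an hour, which is the gap the post is complaining about.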

Paul D. Buck Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0	Message 33807 - Posted: 27 Nov 2009, 9:19:54 UTC - in response to Message 33804. Last modified: 27 Nov 2009, 9:23:29 UTC Some numbers for you. And some back ... Now, some of you say that we need that steady flow of returned WUs to generate more work. Well, can't we at least get "up to" a 3-hour cache on GPUs? Or even better, the full 18 hours? If a CPU takes 18 hours to complete its batch, a GPU shouldn't slow anything down by having a cache 4, 6 or 8 times larger. On my "fastest" system, which is not the fastest out there, MW tasks run in about 55 seconds on each of two ATI GPUs. Rounding that to one task per minute per GPU, an 18-hour cache (18 * 60 = 1080 minutes) means 2 GPUs * 1080 = 2160 tasks for that one machine. Multiply by, say, 5,000 participants (er, computers) and you get 10,800,000 tasks. Since the tasks issued depend on the results of the tasks issued before them, this is, at least to me, clearly not possible. It has nothing to do with credit; it has to do with the fact that MW, like GPU Grid, is a "flow oriented" project, where new tasks are based on the results returned by participants.
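
The estimate in the post above can be reproduced in a few lines. This is only a sketch: `tasks_for_cache` and `HOSTS` are illustrative names, the 5,000-host figure is the post's own guess, and the second call swaps the rounded one-minute task time for the quoted 55 seconds to show how little the conclusion depends on the rounding.

```python
def tasks_for_cache(cache_hours, minutes_per_task, gpus):
    """Tasks one host must hold to keep all its GPUs busy for cache_hours."""
    return round(cache_hours * 60 / minutes_per_task * gpus)

HOSTS = 5_000  # "say 5,000" computers, the post's own guess

per_host_rounded = tasks_for_cache(18, 1.0, 2)    # post's rounding: 1 task per minute per GPU
per_host_55s = tasks_for_cache(18, 55 / 60, 2)    # using the quoted 55 s per task

print(f"rounded: {per_host_rounded} per host, {per_host_rounded * HOSTS:,} project-wide")
print(f"55 s:    {per_host_55s} per host, {per_host_55s * HOSTS:,} project-wide")
```

Either way the total lands above ten million tasks in flight, which is the order of magnitude the post calls clearly not possible for a project whose new work depends on returned results.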

The Gas Giant Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0	Message 33808 - Posted: 27 Nov 2009, 10:10:14 UTC - in response to Message 33799. The project is addicted to the fast turn around times that GPU crunching gives them. I don't think they care if GPU crunchers can't cache many wu's. For our project to do the science we're doing, we need fast turnaround times on workunits. I hope there's still the sticky describing why we really need that. On top of this, we've found that when we increase the workunit cache, the amount of WUs out in the system increases to a point where the database can't handle it's queries fast enough (because a lot of it is dependent on the result table). So right now, this project needs a low cache, partly because of the server and partly because of the scientific needs of the project. If you guys appreciate the science we're doing here, then we hope you'll put up with having a small WU cache. Part of what's nice about BOINC is that you can also be part of other projects that will fill out your cache when you don't have work from us. But you seem to forget that cpu crunching gets you the same cache and yet takes 50 times longer to complete a wu. So your argument tends to fall in a heap at that.... This is where a lot of you just tune out some of what he said. I have bolded and underlined the important part that you're tuning out for you. I have told many of you this over and over and over and over, and now someone from the project is supporting what I have said. The project cannot keep up with you all by giving you more tasks. If they give you all more tasks, the entire project comes to a halt eventually because the server cannot handle the load. Please, read...without selectively tuning out things... Something you've never done before... I just wish Travis would acknowledge that once the new server is installed that he will look into increasing the number of wu's allowed to be cached. Not by CPU but by GPU! 

Brian Silvers Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33815 - Posted: 27 Nov 2009, 12:01:25 UTC - in response to Message 33808. I just wish Travis would acknowledge that once the new server is installed that he will look into increasing the number of wu's allowed to be cached. Not by CPU but by GPU! So, you want no relief for him with regard to server upkeep duties? You want him to have no room for growth and to stay constantly on edge, making sure everything here is done exactly right, which diverts his time away from training other people to help him and from getting a more permanent solution in place?

The Gas Giant Send message Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0	Message 33818 - Posted: 27 Nov 2009, 12:27:37 UTC - in response to Message 33815. I just wish Travis would acknowledge that once the new server is installed that he will look into increasing the number of wu's allowed to be cached. Not by CPU but by GPU! So, you want no relief for him in regards to server upkeep duties? You want him to have to have no room for growth and for him to have to keep being on edge, having to constantly make sure things are done just exactly right on here, thus diverting his time away from training other people to help him and taking time away from getting a more permanent solution in place? Collatz can run their database with over 300k wu's in progress and yet MW can only handle 90k...give me a break. With an updated server there'd be plenty of capability for both more cached wu's and room to grow. ID: 33818 · Rating: 0 · rate: / Reply Quote

Brian Silvers Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33820 - Posted: 27 Nov 2009, 13:01:03 UTC - in response to Message 33818. Last modified: 27 Nov 2009, 13:05:26 UTC
I just wish Travis would acknowledge that once the new server is installed that he will look into increasing the number of wu's allowed to be cached. Not by CPU but by GPU!
So, you want no relief for him in regards to server upkeep duties? You want him to have to have no room for growth and for him to have to keep being on edge, having to constantly make sure things are done just exactly right on here, thus diverting his time away from training other people to help him and taking time away from getting a more permanent solution in place?
Collatz can run their database with over 300k wu's in progress and yet MW can only handle 90k...give me a break. With an updated server there'd be plenty of capability for both more cached wu's and room to grow.
Three issues:
1) Collatz is not dependent upon current work being completed in a timely fashion for generation of new work.
2) The Project Administrator here has said that they don't have the capability of increasing to meet your credit gathering demands. If it were to help scientifically, I'm sure he'd be all about giving you all you and the server could handle. Oh, wait, he has said he is giving you all you and the server can handle (within reason) right now.
3) As I told ETA, increasing work for just those of you with GPUs would end up starving out everyone else. To cater to you, to have just a 3-hour cache you'd have to have 20x the current workload, from what I remember. Maybe more. By the time we add up all of you with GPUs, you would outnumber the rest of us if that 20:1 ratio got applied. Out goes everyone with a CPU and this becomes a GPU-only project. In your short-sighted moment of cheering, you'd neglect to consider that eventually the low-end GPUs would end up being considered to be "CPU-like" in nature, meaning that they'd turn in tasks slower than the newer GPUs, thus creating a never-ending cycle of having to have people on the lower end forced out to placate people on the high end and their drive for the highest number of credits.
Edit: It's also unlikely that a 3-hour cache would satisfy people, or if it did, only very temporarily, so there'd be yet another round of wanting a larger cache sometime shortly after the 20x increase to get to 3-hour caches. How about trying to advocate something that has real long-term benefits for everyone instead of this scenario?

STE\/E Send message Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0	Message 33822 - Posted: 27 Nov 2009, 13:56:37 UTC - in response to Message 33820. Last modified: 27 Nov 2009, 14:00:20 UTC Edit: It's also unlikely that a 3-hour cache would satisfy people, or if it did, only very temporarily, so there'd be yet another round of wanting a larger cache sometime shortly after the 20x increase to get to 3-hour caches. How about trying to advocate something that has real long-term benefits for everyone instead of this scenario? Increasing the size of the GPU Wu's by 5-10 times they are now or even more & adjusting the Credits accordingly would be one solution to the Low Caches but then there would be a lot of complaining about wasting Processing Power. Funny though that's what Slicker did to help ease the burden on his Server, he doubled the WU Length or something close to that & I haven't seen 1 Post complaining about it, but then that's Collatz and not Milkyway. ID: 33822 · Rating: 0 · rate: / Reply Quote

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33825 - Posted: 27 Nov 2009, 14:31:01 UTC - in response to Message 33822. Edit: It's also unlikely that a 3-hour cache would satisfy people, or if it did, only very temporarily, so there'd be yet another round of wanting a larger cache sometime shortly after the 20x increase to get to 3-hour caches. How about trying to advocate something that has real long-term benefits for everyone instead of this scenario? Increasing the size of the GPU Wu's by 5-10 times they are now or even more & adjusting the Credits accordingly would be one solution to the Low Caches but then there would be a lot of complaining about wasting Processing Power. Funny though that's what Slicker did to help ease the burden on his Server, he doubled the WU Length or something close to that & I haven't seen 1 Post complaining about it, but then that's Collatz and not Milkyway. That's exactly what I've been advocating for a long time now. It was the whole purpose behind the MW_GPU project. We've already seen what short-sighted thinking does, as when the tweak that was found about the feeder/shared memory segment was made, the project backed away from the separate project. I'd rather not see another short-sighted move if it can be avoided. I do not understand how it would be "wasting processing power" though. If more actual science is being done, then I do not see how that is possible. What I think would be a very good thing to do is to restart work into getting MW_GPU going and provide those of you with GPUs some very intense work there. To help turnaround time here, they could also move all of the 3-stream (3s) tasks over to MW_GPU, leaving the 1-stream and 2-stream tasks here, which are well within the capabilities of CPUs. 
This type of plan would have a much longer-lasting effect and would benefit far more people, not least the project itself, which could get a lot more work done in less time and with fewer continual hassles, whether from the equipment or from the volunteers.

STE\/E Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0	Message 33826 - Posted: 27 Nov 2009, 14:39:18 UTC I have no problems with longer Wu's for my GPU's here; in fact, the Short Wu's & Low Cache are the Main reason I switched to Collatz & haven't looked back once. I just got tired of the Constant Idleness of my GPU's here at Milkyway & all the Hassle of continually having to change App's to keep up. Slicker's got it so that now all you have to do is run fairly new Drivers & Clients & everything is downloaded automatically for you & you're off to the races. Once that becomes the norm here & the Caches increase I may come back here to run more, but until then I'm very comfortable at Collatz & will stay there as long as they have work anyway ...

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33827 - Posted: 27 Nov 2009, 15:08:28 UTC - in response to Message 33826. I have no problems with longer Wu's for my GPU's here, in fact the Short Wu's & Low Cache is the Main reason I switched to Collatz & haven't looked back once. I just got tired of the Constant Idleness of my GPU's here at Milkyway & all the Hassle of continually having to change App's to keep up. Then you, as someone who has GPUs, need to talk to your fellow GPU users and convince them that it's not an evil plan and would actually work better in the long run than just having more of the shorter tasks, because I can't do it. People tend to shoot me first, rather than the message, then proceed to shoot the message. ID: 33827 · Rating: 0 · rate: / Reply Quote

STE\/E Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0	Message 33830 - Posted: 27 Nov 2009, 15:37:27 UTC - in response to Message 33827. Last modified: 27 Nov 2009, 15:49:47 UTC I have no problems with longer Wu's for my GPU's here; in fact, the Short Wu's & Low Cache are the Main reason I switched to Collatz & haven't looked back once. I just got tired of the Constant Idleness of my GPU's here at Milkyway & all the Hassle of continually having to change App's to keep up. Then you, as someone who has GPUs, need to talk to your fellow GPU users and convince them that it's not an evil plan and would actually work better in the long run than just having more of the shorter tasks, because I can't do it. People tend to shoot me first, rather than the message, then proceed to shoot the message. No, it's the Participants that are actually running the project that need to do it if that's what they want. I'm not running it ATM, so I don't really care what they do with the Wu's. We complained a little at the Collatz Project, so finally Slicker proposed to make the WU's Longer. After a bit, when nobody objected, he did it, and nobody complained after he did it either. The Project has to want to do it & the Participants actually running it have to want to do it; then what's the problem with doing it? Obviously one or the other has an Objection to it or it would have been done already.

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33868 - Posted: 27 Nov 2009, 22:19:33 UTC - in response to Message 33830. Last modified: 27 Nov 2009, 22:31:05 UTC I have no problems with longer Wu's for my GPU's here, in fact the Short Wu's & Low Cache is the Main reason I switched to Collatz & haven't looked back once. I just got tired of the Constant Idleness of my GPU's here at Milkyway & all the Hassle of continually having to change App's to keep up. Then you, as someone who has GPUs, need to talk to your fellow GPU users and convince them that it's not an evil plan and would actually work better in the long run than just having more of the shorter tasks, because I can't do it. People tend to shoot me first, rather than the message, then proceed to shoot the message. No it's the Participants that are actually running the project that need to do it if that's what they want, I'm not running it ATM so I don't really care what they do with the Wu's. We complained a little at the Collatz Project so finally Slicker proposed to make the WU's Longer. After a bit when nobody objected he did it and nobody complained after he did it either. The Project has to want to do it & the Participating Participants have to want to do it then whats the problem with doing it. Obviously one or the other has an Objection to it or it would have been done already. MW_GPU was designed to have significantly more complex tasks so that it gave GPUs something more complex to work on because they can handle it. The project gave up on it after they discovered a setting change on the server side of things that relieved the pressure for the moment. The participants didn't (and likely still don't) want to wait the amount of time it would take for it to be implemented. 
They tend to want things NOW, so that their credit keeps going up, which is what Travis is noticing and posted about in this thread: many participants are primarily concerned with getting credits, not with what would make the most sense scientifically for the project. The real irony is that a better long-term plan would let their credit keep going up more smoothly, as there would be fewer interruptions in total. Problem is, people are impatient... Already, with the mere mention of a newer server, people are asking for it to be ramped to full capacity right from the start. Without a long-term plan, doing that will eventually just create the same problems for the new server. What a new server needs is to be left alone, so that Travis doesn't have to tend to it as much and work can be done on getting other people up to speed on how to run the project and on making MW_GPU or something similar happen. As far as I know, the project hasn't been focused on implementing more complex tasks for GPUs lately, although I don't know if they are thinking about it again now that server problems related to constant heavy I/O loads have come up again...

STE\/E Send message Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0	Message 33870 - Posted: 27 Nov 2009, 23:31:30 UTC - in response to Message 33868. I have no problems with longer Wu's for my GPU's here, in fact the Short Wu's & Low Cache is the Main reason I switched to Collatz & haven't looked back once. I just got tired of the Constant Idleness of my GPU's here at Milkyway & all the Hassle of continually having to change App's to keep up. Then you, as someone who has GPUs, need to talk to your fellow GPU users and convince them that it's not an evil plan and would actually work better in the long run than just having more of the shorter tasks, because I can't do it. People tend to shoot me first, rather than the message, then proceed to shoot the message. No it's the Participants that are actually running the project that need to do it if that's what they want, I'm not running it ATM so I don't really care what they do with the Wu's. We complained a little at the Collatz Project so finally Slicker proposed to make the WU's Longer. After a bit when nobody objected he did it and nobody complained after he did it either. The Project has to want to do it & the Participating Participants have to want to do it then whats the problem with doing it. Obviously one or the other has an Objection to it or it would have been done already. MW_GPU was designed to have significantly more complex tasks so that it gave GPUs something more complex to work on because they can handle it. The project gave up on it after they discovered a setting change on the server side of things that relieved the pressure for the moment. The participants didn't (and likely still don't) want to wait the amount of time it would take for it to be implemented. 
They tend to want things NOW, so that their credit keeps going up, which is what Travis is noticing and posted about in this thread: many participants are primarily concerned with getting credits, not with what would make the most sense scientifically for the project. The real irony is that a better long-term plan would let their credit keep going up more smoothly, as there would be fewer interruptions in total. Problem is, people are impatient... Already, with the mere mention of a newer server, people are asking for it to be ramped to full capacity right from the start. Without a long-term plan, doing that will eventually just create the same problems for the new server. What a new server needs is to be left alone, so that Travis doesn't have to tend to it as much and work can be done on getting other people up to speed on how to run the project and on making MW_GPU or something similar happen. As far as I know, the project hasn't been focused on implementing more complex tasks for GPUs lately, although I don't know if they are thinking about it again now that server problems related to constant heavy I/O loads have come up again... It's probably too simple of a solution for them to implement ... ;)

Brian Silvers Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33872 - Posted: 27 Nov 2009, 23:59:59 UTC - in response to Message 33870. As far as I know, the project hasn't been focused on implementing more complex tasks for GPUs lately, although I don't know if they are thinking about it again now that server problems related to constant heavy I/O loads have come up again... It's probably too simple of a solution for them to implement ... ;) More likely it was a difficult solution to implement. They appear to have struggled with doing the CUDA code themselves. After that, they didn't have a way to implement the ATI application natively, so an app_info override would still be required for ATI cards. I wish they had not broadcast that they were looking at the new credit plan that DA has cooked up along with the announcement that they were looking at making the server-side changes to natively support ATI cards. That was a mistake. I'm not saying they shouldn't have disclosed it at all; it's just that, given how sensitive people are, linking the two together was not the best thing to do. Anyway, this is all moot unless a new server gets installed or more complex scientific work is made available for GPU users. I wish that in the meantime people would settle down and realize that we've hit a hard limit on the capacity, and that adding strain to a server that is having problems will not help matters any...

STE\/E Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0	Message 33874 - Posted: 28 Nov 2009, 1:27:34 UTC Anyway, this is all moot unless a new server gets installed or more complex scientific work is made available for GPU users. I wish that in the meantime people would settle down and realize that we've hit a hard limit on the capacity, and that adding strain to a server that is having problems will not help matters any... I don't think that will happen anytime soon, as more & more Participants are jumping on the GPU Bandwagon because that's where the Credit is at and, in some sense, the Future of BOINC at many Projects as they get on Board with GPU Work. So if anything it's just going to get worse unless the Project does something to lessen the Strain on the Server. Putting a new Controller in isn't going to solve the problem either; it might make things a little smoother, but there are still going to be the constant complaints about Low Caches & Caches being run out before new work is issued. So until the Cache Problem is solved, the Project is still going to have a problem too, with people leaving & complaints about the Caches ... IMO

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 33876 - Posted: 28 Nov 2009, 2:18:45 UTC - in response to Message 33874. Last modified: 28 Nov 2009, 2:25:13 UTC Anyway, this is all moot unless a new server gets installed or more complex scientific work is made available for GPU users. I wish that in the meantime people would settle down and realize that we've hit a hard limit on the capacity, and that adding strain to a server that is having problems will not help matters any I don't think that will happen anytime soon as more & more Participants are jumping on the GPU Bandwagon because that's where the Credit is at and in some sense the Future of BOINC at many Projects as they get on Board with GPU Work. So if anything it's just going to get worse unless the Project does something to lessen the Strain on the Server. Putting a new Controller in isn't going to solve the problem either, it might make things a little smoother but there's still going to be the constant complaints about Low Caches & Caches being run out before new work is issued. So until the Cache Problem is solved the Project still going to have a problem too with people leaving & complaints about the Caches ... IMO Perhaps, but there will still be enough people that will stay. The credits are higher here, so they come back, even if it is only as a "filler" for the other project. Not only that, but having fewer GPUs for a little while would help give him a break with regards to having to continually stay on top of the server. That's really what they need, is some time to do things other than constantly deal with support. The team I was with at a prior job had that same problem - too much support to do. We got so far behind on other things because of having to devote so much time to support tasks. At one point in time we had 250-300 new problem tickets every hour, with only 1 person working the inbound tickets. 
Even diverting one other person meant that just to keep pace, you'd have to address 125-150 tickets an hour, so less than 30 seconds for each issue. It just wasn't doable...especially when the weak computer they gave several of us to use took 90 seconds just to open one of the tickets...and another 90 to close it. 3 minutes were taken up just in opening and closing, not even counting the work that needed to be done, which sometimes would take 2-3 hours (if there was a file corruption that needed to be dealt with).
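
The ticket arithmetic in the post above works out as follows. This is a small sketch with made-up names (`seconds_per_ticket`, `OPEN_CLOSE_OVERHEAD`); the rates, the two-person split and the 90-second open and close times are the figures given in the post.

```python
def seconds_per_ticket(tickets_per_hour, workers):
    """Average time each worker can spend per ticket without falling behind."""
    return 3600 * workers / tickets_per_hour

OPEN_CLOSE_OVERHEAD = 90 + 90  # seconds just to open and then close one ticket

budgets = {rate: seconds_per_ticket(rate, 2) for rate in (250, 300)}
for rate, budget in budgets.items():
    print(f"{rate}/hour with 2 people: {budget:.1f} s per ticket, "
          f"overhead alone is {OPEN_CLOSE_OVERHEAD / budget:.1f}x that")
```

So even before any real work on a ticket, the open and close time alone is several times the available budget, which is the point of the anecdote.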

Arion Joined: 10 Aug 08 Posts: 218 Credit: 41,846,854 RAC: 0	Message 33885 - Posted: 28 Nov 2009, 9:57:00 UTC - in response to Message 33876. Brian, it seems one solution that has been overlooked is cutting the credits in half again. In that case all the GPUs would go looking at other projects that pay more, and those with CPUs who are only here for the science would be left. I'd think that would be a long-term solution to an overtaxed infrastructure. <smile>