Message boards : Number crunching : Problem with tiny cache in MW
Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0

Sorry, but you can't get rid of the CPU side. As Ice said, if not for that, no one would be using GPU or optimised apps. And you people who are whingeing about the work queue can go elsewhere.
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0

My solution to keep the GPU busy is to use a BOINC core client modified by Twodee, one of the good guys from planet3dnow.de. It's actually very easy: you've got a separate section for MW in the cc_config.xml where you can tell it to report & fetch work every x seconds (timeintervall), keep a cache (if possible) for y seconds (requestworktime), and report enough cores so you can get the maximum 8 x 6 = 48 WU cache (hostinfo_ncpus).

Thanks ET for the heads-up. Will this alternative BOINC even allow an X2 to DL 48 WUs, or will it still be limited to 12? Also, I tried this last night and it didn't allow me to DL WUs from either MW or Collatz; it said there was no ATI GPU. I'm probably missing something simple...

On another note, can we keep this thread more or less on the topic of the tiny MW cache, solutions for the problem, and ways to get around it to make the client work? Thanks.
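[Editor's note: to make the mechanism described above concrete, here is a sketch of what such a cc_config.xml section might look like. Only the parameter names timeintervall, requestworktime, and hostinfo_ncpus come from the post; the tag nesting, placement, and values are assumptions about Twodee's modified client, not its documented format.]

```xml
<!-- Sketch only: structure and values are assumed, not taken from
     Twodee's documentation. Parameter names are from the post above. -->
<cc_config>
  <options>
    <!-- Report 8 cores so the 6-WUs-per-core server limit
         yields the maximum 8 x 6 = 48 WU cache -->
    <hostinfo_ncpus>8</hostinfo_ncpus>
    <!-- Report and fetch MW work every x seconds -->
    <timeintervall>60</timeintervall>
    <!-- Try to keep y seconds of work cached, if possible -->
    <requestworktime>900</requestworktime>
  </options>
</cc_config>
```

If you try this, compare it against the sample config shipped with the modified client before relying on it.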
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0

We sure could, but I still think you haven't differentiated between a type of project whose work is independent of the current workflow and one whose work depends on it. This project's work generation is linked to work that is currently being processed. The more tasks sitting in a "waiting" state on computers (regardless of CPU, GPU, credit competition, or true scientific dedication), the more tasks are being held by fewer systems. As such, if those systems reduce their participation for whatever reason (power outage, floods, needing to go out of town, needing to do other work on the computer, etc.), then the total input back into the project for new work generation could decrease, reducing the amount of new work available and possibly creating a period where no work is available at all.

The real solution is getting the GPU project going to do the more complex work that they claimed they wanted to do, since the existing work can still be done in a reasonable amount of time with CPUs...
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0

Actually I did, but my suggestion created an uproar that's not the purpose of this thread. I'd like to let that part of the subject drop so we can concentrate on the OT.

The real solution is getting the GPU project going to do the more complex work that they claimed they wanted to do, since the existing work can still be done in a reasonable amount of time with CPUs...

That may well be, and I didn't pay much attention to that controversy, as it just doesn't make sense for ME to use my CPUs here when they can do useful work elsewhere. The subject of whether or not there should be separate projects would make a great new topic, though, if someone wants to address it again. I'd like to stay away from that issue here.
Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0

Will this alternative BOINC even allow an X2 to DL 48 WUs or will it still be limited to 12?

Yes. If you set BOINC to use 100% of your CPUs, set the number to 8, or more if you use less than 100% (I need 75% on my quad for maximum GPU performance, so I set it to 12 CPUs).

Also, I tried this last night and it didn't allow me to DL WUs from either MW or Collatz, said there was no ATI GPU. I'm probably missing something simple...

Probably should have stated that as well: it's still based on 6.6.23, so it neither knows about ATIs nor can report them to the server. So you still need the anonymous platform with the manually installed clients, for both MW and Collatz. Hope that solves it!

MrS
Scanning for our furry friends since Jan 2002
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0

The problem is, it is relevant to the OT. If more people are holding onto work for longer, whether on GPU or CPU, the throughput back into the project that drives new work creation could drop. I say "could" because it may not drop; what it will not do is increase. There are enough of us CPU-only folks still here to practically ensure that won't happen, as we'll also get the increased number of tasks in cache...

The real solution is getting the GPU project going to do the more complex work that they claimed they wanted to do, since the existing work can still be done in a reasonable amount of time with CPUs...

Continued efforts by those of you with GPUs, or even CPUs for that matter (although the impact is bigger with GPUs), to use scripts or other methods to grab more work will end up with people complaining about no work being available at all. It's what happened before, and it will almost certainly happen again. The solution you should be concentrating on is getting the GPU project back onto the front burner for Travis and Dave, not trying to hammer the existing server. If people start hammering on it to the point where there's no work available, what good will that have done?

The GPU-only project would make everyone much happier all around... at least as far as having a steady flow of work for their CPUs or GPUs. People may not like the different stats, which is part of what the large outcry was before when the separate projects were brought up...
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0

Probably should have stated that as well: it's still based on 6.6.23, so it neither knows about ATIs nor can report them to the server. So you still need the anonymous platform with the manually installed clients, for both MW and Collatz. Hope that solves it!

Saw that and DLed 6.6.23, then replaced the boinc.exe and app_info.xml. Briefly tried to get it going, but saw on the forum that Twodee will probably be bringing out an enhanced version based on 6.10.x. Most likely I'll give it another try when the update hits the streets; the native ATI support is more than I want to give up at this point.

I've got it switching, it's just that Collatz is getting MUCH, MUCH MORE time than MW regardless of the project settings. Since this is due to the silly MW 12-24 minute cache limit, and judging from the difference in the responsiveness of the project admins, it seems fitting that Collatz gets the lion's share of the GPU time anyway. Maybe that will change, but I'd advise against holding your breath during the wait :-(
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0

Probably should have stated that as well: it's still based on 6.6.23, so it neither knows about ATIs nor can report them to the server. So you still need the anonymous platform with the manually installed clients, for both MW and Collatz. Hope that solves it!

It is not solely due to the cache size; it is also due to UCB deciding that GPU work will be done in strict FIFO order. That rule was put in place to handle the chaotic behavior of the resource scheduler, partly caused by an inappropriate internal design that re-shuffles the task list at the drop of a hat on any change in state (I forget the count of things that can cause this, but it is over 12 IIRC). Anyway, with that instability and some other bugs in the code, this FIFO thing was imposed... and now resource share is ignored. You will see other bad behavior if you mix Collatz with, say, GPU Grid, with Collatz probably dominating there too... I have not yet tried that test to see... but if it evens out, it is only because GPU Grid tasks take so long...
Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0

If people start hammering on it to the point where there's no work available, what good will that have done?

You're right that a general cache increase will increase average result turnaround times, which is not what the project wants. On the other hand, the very small cache causes GPUs to run dry whenever anything gets out of order for even the shortest amount of time. This reduces overall result throughput, which is not what the project should want either. The ideal solution would be a perfect balance between latency and throughput, and that is something only the project staff can determine: if they'd like to run more searches in parallel, they need higher throughput; if they're waiting for results to return, they need faster turnaround times / lower latencies.

That leads me to the conclusion that it would be a good idea to tie the allowed number of concurrent WUs to a host's average turnaround time, a value easily accessible to the server. The project could set the desired value quite flexibly / dynamically (within sane limits), depending on the current search state. The allowed WUs per host could be updated daily to keep the number of additional database requests low.

Imagine the situation the OP is in: he's got a dual core, so currently a maximum of 12 WUs. He says for him that lasts about 12 min. For me it would be 8.6 min. For a 5870 it should last just 4 min. This is actually causing us to hammer the server or to run idle, neither of which does the project any good. If the cache size was increased for fast hosts only, the number of server requests / database accesses could be reduced.

MrS
Scanning for our furry friends since Jan 2002
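[Editor's note: the scheme MrS proposes can be sketched in a few lines of server-side pseudologic. Everything below is hypothetical (the function name, the target turnaround, and the limits are invented for illustration); it is not MilkyWay or BOINC server code, only a demonstration of tying the per-host WU quota to average turnaround time, which the BOINC server already tracks per host.]

```python
def allowed_wus(avg_turnaround_days: float,
                target_days: float = 0.05,
                base_limit: int = 6,
                max_limit: int = 48) -> int:
    """Scale the per-host WU quota by how fast the host returns results.

    avg_turnaround_days: the host's average turnaround, a value the
    BOINC server already stores. target_days: the turnaround the
    project wants (a knob the staff could tune per search state).
    All specific numbers here are illustration values, not MW's.
    """
    if avg_turnaround_days <= 0:          # no history yet: be conservative
        return base_limit
    # Hosts faster than the target get proportionally more WUs;
    # slow hosts never drop below the base limit or exceed the cap.
    scale = target_days / avg_turnaround_days
    return max(base_limit, min(max_limit, int(base_limit * scale)))

# A GPU host returning work in ~0.01 days gets a much larger cache,
# while a CPU host at ~0.3 days stays at the base limit.
print(allowed_wus(0.01))   # fast GPU host -> 30
print(allowed_wus(0.30))   # slow CPU host -> 6
```

The clamp to base_limit / max_limit is the "sane limits" point from the post: even a pathological turnaround value can't starve a host or blow up the cache.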
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0

That leads me to the conclusion that it would be a good idea to tie the allowed number of concurrent WUs to a host's average turnaround time, a value easily accessible to the server. The project could set the desired value quite flexibly / dynamically (within sane limits), depending on the current search state. The allowed WUs per host could be updated daily to keep the number of additional database requests low.

Didn't know that this was possible, but it sounds like a great solution that would benefit the project as well as relieve some user frustration. It would help keep slow hosts from holding unfinished WUs for long periods and allow hosts with fast turnaround times to process WUs more optimally. Good idea.

Another question: do the admins participate in the forum, or are we just talking to ourselves?
Joined: 31 Mar 08 Posts: 61 Credit: 18,325,284 RAC: 0

Talking to ourselves.
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0

Yes. 99.999999999999...% ourselves. Previous posts were made by Travis along the lines of 'discuss the credit cut', and then he won't show up again for 6-8 weeks and will say he was snowboarding or sick again.

Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0

If people start hammering on it to the point where there's no work available, what good will that have done?

While this might seem like a wonderful idea on the surface, what happens when hosts start getting denied work? Do we then have the loud complaining about how their systems "aren't good enough" and how the project is "unfairly biased towards certain people"? The single best solution is a totally separate project for you GPU folks... It's either that, or figure out how to increase the work done by GPUs by a factor of about 100, keeping the credit ratio the same. Doing the above would make you all happy, and would make those of us not fortunate enough to have money to spend on such items not feel like we're being told we're expendable.

It's looking like we took most of the credit cut anyway: average credit levels for CPUs dropped, while for ATI GPUs switched to the 0.20 app, the average credit per unit time has remained relatively constant, or possibly even gone up some (leaving work-fetch struggles aside).

...and if GPUs were separated from CPUs or had their tasks made significantly longer, you all could have a huge playground all to yourselves... and both groups could be happy...
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0

Unfortunately we have a collision of capabilities and limitations. The BOINC server code has any number of compromises, along with other segments of the architecture. Add in the limited capacity of the project and our unfortunate ability to process work quickly, as rapidly as tens of seconds per task, and you have a perfect storm. Make an adjustment here and the system breaks there...

One of the ways to allow more work in flight is to increase the number of parallel searches. Problem is, more searches in flight means more rows in the database, and more rows means slower access...

I know most want to ignore history... but we nearly killed LHC for the same reasons. They stopped taking new participants several times because of the impact on the server... similar things happened on a couple of other projects that were overly popular, and it took time to settle down.

One of the hopes is that GPU Grid will get an OpenCL version of their app going, or that more projects natively port their applications from CUDA to Stream...
Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0

While this might seem like a wonderful idea on the surface, what happens when hosts start getting denied work?

Right now everyone is being denied work, if the work request is large enough. I don't think it would be any worse if my suggestion was put into place, except for people with relatively slow CPUs who used to "hog" a couple of WUs. But, as you correctly explained previously, that's something the project would want to avoid anyway to get results back quickly, isn't it? What changes, though, is that everyone could keep enough work for a set amount of time. IMO that's actually more fair than the current system.

The single best solution is a totally separate project for you GPU folks...

At first glance it is, ignoring any doubt about whether Travis will or can do this. However, I honestly think my suggestion would be better: it could bring the average turnaround time across all computing resources (CPU, GPU) to the optimal value. There's more: if the server gets overloaded by the number of WUs, the number of searches within each WU (I think it's called "multi stream" and was introduced in spring) could be adapted. Once a few lines of code are written, this balancing could happen automatically (again, one must not forget to set sane limits, otherwise something *will* go wrong). There would be no need to fine-tune the project settings manually, and there'd be only one project to manage (already hard enough, apparently).

Let's assume the separate GPU project was created and the tasks were made significantly longer. That would result in enough work for everyone: fine. But how would you decide how large the allowed cache should be? By now we've already got at least a factor of 10 in speed between the slowest GPU (3850) and the fastest (5870). There could be hosts with multiple GPUs, requiring even more WUs. How could you balance things, keeping the latency low while supplying everyone with enough work? I dare say you can't, not with static settings, be it 1 or 2 projects.

MrS
Scanning for our furry friends since Jan 2002
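[Editor's note: the "few lines of code" for the automatic balancing mentioned in the post above might look roughly like this. The function name, the row-count thresholds, and the clamps are all invented for illustration; "searches per WU" stands in for the "multi stream" bundling the post refers to, and nothing here reflects the project's actual server code.]

```python
def streams_per_wu(db_wu_rows: int,
                   soft_limit: int = 500_000,
                   min_streams: int = 1,
                   max_streams: int = 8) -> int:
    """Bundle more searches into each WU as the workunit table grows,
    so total throughput stays up while the database row count stays
    bounded. All thresholds are made-up illustration values.
    """
    if db_wu_rows <= soft_limit:
        return min_streams
    # Grow the bundle size with load past the soft limit, clamped to
    # a sane maximum as the post warns ("something *will* go wrong"
    # without limits).
    factor = db_wu_rows // soft_limit
    return min(max_streams, max(min_streams, factor))

print(streams_per_wu(100_000))    # light load: one search per WU
print(streams_per_wu(2_000_000))  # heavy load: bundle 4 searches per WU
```

Run daily alongside the quota update, this would keep WU size and WU count trading off against each other without manual retuning.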
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0

Except for people with relatively slow CPUs who used to "hog" a couple of WUs.

What is "relatively slow"? Pentium II? Pentium III? Pentium 4? AthlonXP? Athlon64? Bottom-end Core2? At this point in time, the project itself is not mentioning that it needs results any faster than the rate they are coming in at now. My guess is that their infrastructure couldn't deal with the project running faster than it is right now... I think the competitiveness needs to be mingled with just a modicum of patience... to understand that there is a fine line between being competitive and coming off as greedy.

The single best solution is a totally separate project for you GPU folks...

The idea is far too short-sighted. It aims to quell the complaint of the moment. The separate project or separate types of workunits have much more benefit long-term. You could still employ the ideas you mentioned as well. However, nobody has the patience for such long-term planning, it seems...
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0

At this point in time, the project itself is not mentioning that it needs results any faster than the rate they are coming in at now. My guess is that their infrastructure couldn't deal with the project being faster than what it is right now...

I remember some time ago (months), before at least two good speed-ups were implemented, that Travis and bunch didn't want to make the WUs any more complex, and that data was coming in fast enough. I'm sure by now the data is only building up and will take longer to put together than to crunch.
Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0

What is "relatively slow"? Pentium II? Pentium III? Pentium 4? AthlonXP? Athlon64? Bottom-end Core2?

Doesn't really matter: anything that yields a "larger than average" turnaround time.

At this point in time, the project itself is not mentioning that it needs results any faster than the rate they are coming in at now.

That's true. And actually it was your argument against increased cache sizes that made me consider average turnaround time as an important parameter ;)

I think the competitiveness needs to be mingled with just a modicum of patience... to understand that there is a fine line between being competitive and coming off as greedy.

Sure, I'm not screaming for immediate action. I'm seeing a situation which is less than ideal and I'd like to give the project staff some help. And I hope you still know from our discussions during spring that I'm in it for efficiency, not personal gain or greediness. (BTW, I do have enough WUs for my GPU now, until the weekly server breakdowns.)

So what are the other long-term benefits of separated projects, exactly? Individual credit adjustment? (That may well be needed, but goes against current BOINC policy.) The possibility to have functionally different clients for both? (Then you have to maintain both code bases, and I think it's not needed.) More satisfied users overall? Combining my suggestion with the separate projects is indeed a very good suggestion.

nobody has the patience for such long-term planning it seems...

Personally I don't think that's the reason. It's more about being sceptical whether it will be done properly, in a proper time frame. Remember spring, when WU availability had been really bad for 2 months and approached the point of non-existence? I remember you were one of the guys saying "relax, the separate project's on the way". I wanted a hot fix and a long-term solution, as I didn't believe the CUDA app would be out for some months to come. I think this shows best why I'm sceptical about the separate projects. And, BTW, we finally did get the hot fix: the WU generator was set to run more often...

MrS
Scanning for our furry friends since Jan 2002
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0

At this point in time, the project itself is not mentioning that it needs results any faster than the rate they are coming in at now. My guess is that their infrastructure couldn't deal with the project being faster than what it is right now...

Right, and that's where VOLUNTEERS need to understand that if the scientists are happy with the processing rate, then demanding to be catered to more is a bit selfish. Yeah, it's people's right to do with their time and resources whatever they want, and if a large group of users wants to "boycott" the project because their competitive nature isn't being satisfied, that's their right, but there will still be enough people processing the data here, I'd imagine...

Look at Cosmology, for example. The admins there haven't fixed problems in going on 7 months now (since the server crash), and there are problems that have existed longer than that too. Obviously the data coming back in is "good enough" for them. They don't seem to care that the memory requirements have choked out anyone with systems that have less than 1 GB of memory. They don't seem to care that even people with 4-8 GB of memory are stating that they're seeing performance problems. They don't seem to care that large groups of new users simply abandon tasks when they find out how long they take or how much memory they need, so those of us that remain go through periods of download failures because they don't keep the parameter files on the server after the first person has downloaded them. You can't edit your profile. On and on and on goes the list of the problems there. The only reason I still participate a little is because I like what the Planck mission is doing, and supposedly what we're processing helps that mission.

I dunno... I think I need to either give up on BOINC or move to a project like CPDN, where tasks can just run for really long amounts of time and where the competitive sorts don't comprise a large percentage of the user base...
Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0

What is "relatively slow"? Pentium II? Pentium III? Pentium 4? AthlonXP? Athlon64? Bottom-end Core2?

Then odds are both of my systems will be disallowed, as they both take over an hour for each task and are single-core, non-hyperthreaded. My AMD is at 0.26 days, while my P4 is at 0.34 days. Your slowest GPU system is at 0.03 days; the other is at 0.01 days. So by that metric I'm at best 8.6 times slower than your slowest system, and at worst 34 times slower than your fastest system.

At this point in time, the project itself is not mentioning that it needs results any faster than the rate they are coming in at now.

Well, I appreciate that, but if something "must be done", I'd rather it be the longer tasks or the separate project. There is just too much processing power available for the current tasks, and clearly those of us with CPUs get hit harder by more complex tasks due to less parallelism...

It really depends on whether or not the project wishes to do more complex work. They had mentioned making work a lot more complex for GPUs early on, when the MilkyWay_GPU site got created. Then that died off when whatever setting change it was happened that allowed work to flow better... I felt at the time we'd run into problems again once CUDA came on board, and, well, here we are... I dunno. Without the more complex work, it is not worth doing anything suggested. Unless something changes, the project isn't going to be around much longer anyway, IMO (the request for donations and the funding problems).

Remember spring, when WU availability had been really bad for 2 months and approached the point of non-existence? I remember you were one of the guys saying "relax, the separate project's on the way". I wanted a hot fix and a long-term solution, as I didn't believe the CUDA app would be out for some months to come.

Yep, and as I said, I was not happy that they did an about-face with the separated projects. It was too short-sighted. However, if they just don't want to do more complex tasks, then like I said to banditwolf, if the scientists are happy with the processing rate, then demanding to be catered to more is a bit selfish. People can huff and leave if they want... it's their right to do so... However, I think demanding more so that they can have more points from a broken credit system, especially when it comes to likely pushing me out of participation in the project due to having "slow hosts", is a bit much...
©2024 Astroinformatics Group