Milky Way, Project unfriendly.....
Author | Message |
---|---|
YoDude.SETI.USA [TopGun] Joined: 29 May 09 Posts: 37 Credit: 34,016,951 RAC: 0 |
In my personal opinion, it really shouldn't matter what the cache size is. 1. The code should be smart enough to distinguish between CPU and GPU WU's. They should be considered separate entities. 2. The code should also utilize the "Switch between Projects" option and actually switch between the projects like it says it should. 3. The cache size should be set for each project and not all projects as a whole. To me, it seems like that should take care of it. Yo- |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 175 |
Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem. That would certainly help the situation, and I agree, with the exception that the cache size should be more than doubled for GPUs. On the fastest GPUs the long WUs take 80 seconds, so even with only one of those GPUs on a dual core processor the cache amounts to 16 minutes. That's just not enough to allow BOINC to do any kind of management. The end result is that many get frustrated and dump the project altogether. Only 1 in 100 dissatisfied users post here or even understand why it's not working correctly. We need to get the queue size up to a minimum of several hours in order for BOINC manager to schedule properly. |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 175 |
In my personal opinion, it really shouldn't matter what the cache size is. That can easily be implemented at the project level if the admins here choose to do so. 3. The cache size should be set for each project and not all projects as a whole. That would be VERY NICE but DA and the BOINC team do not want to do it. This change has been suggested for years but DA et al. have shot it down repeatedly. 2. The code should also utilize the "Switch between Projects" option and actually switch between the projects like it says it should. Under the current BOINC design parameters it does do that. The problem is that BOINC always tries to fill the CPU or GPU queue to the level that it's set to, so if a project doesn't allow that amount BOINC goes to the next project for work. Here's an example: Assume you set BOINC to queue even a minimal amount, let's say .1 day of work. That's 2.4 hours. Also assume for the sake of argument that you activate MilkyWay and Collatz and set MilkyWay to run 90% and Collatz 10%. What happens is that BOINC will DL its allowed limit of 16 minutes of MilkyWay WUs. Then to fill that 2.4 hours it will DL over 2 hours' worth of Collatz. Since BOINC is currently designed to process GPU WUs in a FIFO manner it will run the 16 minutes of MilkyWay and then proceed to do the 2+ hours of Collatz. The process then repeats. You can set an even shorter queue than .1 days but then if there's any kind of a network outage your machines will be totally dead in the water. The bottom line is that under the current BOINC design the only way to get a project to work properly is to allow a reasonable queue size, probably in the neighborhood of at least several hours. As far as I know MilkyWay is the only project that won't allow this. Thus the problem here. |
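Editor's sketch of the arithmetic in the example above. It assumes a deliberately simplified model (fill the whole buffer once, then drain it in FIFO order, then repeat) and is not taken from the BOINC client; the function name is hypothetical.

```python
def fifo_cycle_share(buffer_minutes, capped_minutes):
    """Fraction of one fill-and-drain cycle spent on the capped project.

    Simplified model (an assumption, not BOINC's real scheduler): the
    client downloads up to `capped_minutes` of work from the capped
    project, fills the rest of the buffer from the other project, and
    then runs everything to completion in download order.
    """
    capped = min(capped_minutes, buffer_minutes)
    return capped / buffer_minutes


buffer_minutes = 0.1 * 24 * 60   # 0.1 day of work = 144 minutes (2.4 hours)
milkyway_minutes = 16            # per-host limit quoted in the post
collatz_minutes = buffer_minutes - milkyway_minutes
share = fifo_cycle_share(buffer_minutes, milkyway_minutes)

print(f"Collatz work fetched per cycle: {collatz_minutes:.0f} minutes")
print(f"MilkyWay share of GPU time: {share:.1%} (requested: 90%)")
```

Under these assumptions MilkyWay ends up with roughly a ninth of the GPU time regardless of the 90/10 resource share, which is the mismatch the post describes.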
Chris S Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
AFAIK, from the very beginning, it was envisaged that all projects would be equal in that they would give the same machine the same credit, cache, workload etc. no matter which project they crunched for. That would allow people to choose the projects to support, based purely on the merits of the project's work itself. A very laudable and community oriented goal, but a bit naive. But we now have a new breed of cruncher who is more interested in the credit awarded than the worth of the project itself, not to mention the GPU onslaught. Neither of which were envisaged 10 years ago. Boinc and its senior people would greatly benefit from reviewing where they came from, where they are now, and where they want to go. Otherwise projects could consider transferring from Boinc to other distributed computing enterprises, such as World Community Grid, or setting up their own outfit, away from the current restrictive practices we are hearing about here. Boinc has been a remarkable academic success, but it doesn't seem to have transitioned very well from a university research project to real life in the 21st century. Let no one denigrate the successes that have been achieved so far, but the way forward needs to be more open and clearly defined. To take one example, there are people on the boards here and elsewhere that between them have an immeasurable amount of technical knowledge that could be utilised by Boinc to everyone's advantage. And yet we hear of an entrenched head in the sand attitude to proffered advice. I'm sticking around for the future, but I really would like to see some positive changes...... Don't drink water, that's the stuff that rusts pipes |
Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0 |
And yet we hear of an entrenched head in the sand attitude to proffered advice. Maybe the academics don't want the proffered advice. They're not likely to get their doctorates by copying all the ATI stuff that already works. They have to rediscover all the pitfalls and put us all through the wringer again, surely? |
Brian Silvers Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem. Same problem exists as before. Runtimes were increased by 4. To be effective in handling you all with GPUs without totally crushing the server, it needs to be increased by another factor of 10, perhaps 20, particularly if you're wanting "minimum of several hours". Allegedly 6 tasks was like 5 minutes back then, so 6 tasks are only 20 minutes now. To get to 3 hours, that means 9 times...or 54 tasks, but again, it's only 3 hours. I have the feeling most of you won't be happy unless it goes up to 8 hours, so that means an increase in cache of 24 times, or 144 tasks. Bumping your caches up by factors of that much will only cause problems. I do not know if the current type of tasks have that much room for expansion in their scientific value. That's why this whole time, I've said that the real long-term fix is a separate GPU project or separate type of work. |
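Editor's sketch of the scaling arithmetic in the post above, using only the figures it quotes (6 tasks lasting about 20 minutes on a fast GPU). The function name and defaults are illustrative, not project settings.

```python
import math


def cache_scaling(target_hours, tasks_now=6, minutes_now=20):
    """Return (multiplier, task_count) needed to reach `target_hours` of cache.

    `tasks_now` and `minutes_now` are the figures quoted in the post;
    they are treated here as assumptions, not measured values.
    """
    target_minutes = target_hours * 60
    multiplier = target_minutes / minutes_now
    # Multiply before dividing so the quoted figures stay exact in floats.
    tasks = math.ceil(target_minutes * tasks_now / minutes_now)
    return multiplier, tasks


for hours in (3, 8):
    multiplier, tasks = cache_scaling(hours)
    print(f"{hours} h of cache: {multiplier:g}x the current limit, {tasks} tasks")
```

For 3 and 8 hours this reproduces the 9x / 54 tasks and 24x / 144 tasks figures given in the post.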
YoDude.SETI.USA [TopGun] Joined: 29 May 09 Posts: 37 Credit: 34,016,951 RAC: 0 |
Assume you set BOINC to queue even a minimal amount, let's say .1 day of work. That's 2.4 hours. Also assume for the sake of argument that you activate MilkyWay and Collatz and set MilkyWay to run 90% and Collatz 10%. What happens is that BOINC will DL its allowed limit of 16 minutes of MilkyWay WUs. Then to fill that 2.4 hours it will DL over 2 hours' worth of Collatz. Since BOINC is currently designed to process GPU WUs in a FIFO manner it will run the 16 minutes of MilkyWay and then proceed to do the 2+ hours of Collatz. Let's also assume that I have my "switch between projects" set to 60 mins. (default). In your example, the MW WU's get completed first, in 16 minutes, then the manager starts running Collatz because all the MW WU's are done and continues to run Collatz until there are none left. When the manager starts running the 2 hours' worth of Collatz WU's, it should stop after 60 minutes, upload all the results, report all the results, switch back to MW, D/L 16 more minutes of work and run the WU's until they are all completed and then switch back to Collatz. Ideally, if the cache runs out or even runs low on WU's before the 60 min timer is up, the manager should be uploading and downloading (at a reasonably slow rate) new WU's to keep its cache full, up to the point the 60 minutes is up and then report any remaining completed WU's and switch to Collatz and run that project in the exact same manner for 60 minutes. The way I see it, if I have my manager set to switch between projects every 60 minutes, it should do that and run a project for 60 minutes or until the cache runs dry and there are no WU's available. If the cache does in fact run dry, then it should switch to the other project, reset the timer to 60 minutes and go. Again CPU and GPU projects should be considered separate. As far as the "resource share" thing goes, I haven't seen it make any difference whatsoever in how the manager works, no matter what I change the settings to.
So far as FIFO goes, yeah well, wouldn't it be better to have the manager run WU's with the closest due date rather than "X"-project got the WU's DL'd first, so we'll run them to completion? This is just my personal take on what I've seen and the way things "seem" they should be. Yo- |
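Editor's sketch contrasting the two orderings mentioned at the end of the post above: FIFO (download order) versus running the WU with the closest due date first. The `Task` type, the deadlines, and both function names are hypothetical and only illustrate the idea; this is not how the BOINC client is implemented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str
    arrival: int        # position in download order (0 = downloaded first)
    deadline_h: float   # hours until the result is due (made-up values)


def fifo_order(tasks):
    """Run tasks strictly in the order they were downloaded."""
    return sorted(tasks, key=lambda t: t.arrival)


def earliest_deadline_order(tasks):
    """Run the task with the closest due date first; ties fall back to FIFO."""
    return sorted(tasks, key=lambda t: (t.deadline_h, t.arrival))


queue = [
    Task("Collatz", arrival=0, deadline_h=72),
    Task("MilkyWay", arrival=1, deadline_h=24),
    Task("Collatz", arrival=2, deadline_h=48),
]

print("FIFO:             ", [t.project for t in fifo_order(queue)])
print("Earliest deadline:", [t.project for t in earliest_deadline_order(queue)])
```

With these made-up deadlines the MilkyWay task moves to the front under the deadline ordering, while FIFO leaves it behind the Collatz task that was downloaded first.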
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
I'd be happy with an hour of work cached for MW so that when a project maintenance back off occurs I can keep crunching or lose very little time. But my hour is different to your hour or his hour as we have different set ups. For example I have a 4850 and a 4870 in my Q9450 box. The longer wu's take approx 3.5 min and 3 min respectively. So that would mean I'd need to have cached 38 wu's, currently I get 24 which is about 39 minutes. With a few shorties of 55 seconds, that drops down to less than 30 minutes. In my old P4/HT with the 3850 in it, the long wu's take 10 minutes so I get closer to 2 hrs of cache or less with a few shorties. Imagine a box with 3 or 4 5870's in it....holy cow! |
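Editor's sketch of the multi-GPU arithmetic in the post above, assuming every card pulls long WUs from one shared cache and runs one WU at a time. The function names are hypothetical; the 3.5 and 3 minute runtimes are the ones quoted for the 4850 and 4870.

```python
import math


def combined_rate(minutes_per_wu):
    """WUs completed per minute across all GPUs in the host."""
    return sum(1 / m for m in minutes_per_wu)


def wus_needed(minutes_per_wu, target_minutes):
    """Smallest number of cached WUs that keeps all GPUs busy for the target time."""
    return math.ceil(combined_rate(minutes_per_wu) * target_minutes)


def cache_minutes(minutes_per_wu, cached_wus):
    """How long a given number of cached WUs lasts."""
    return cached_wus / combined_rate(minutes_per_wu)


gpus = [3.5, 3.0]  # minutes per long WU on the 4850 and 4870
print("WUs needed for 60 minutes:", wus_needed(gpus, 60))
print(f"24 WUs last about {cache_minutes(gpus, 24):.0f} minutes")
```

This reproduces the 38 WUs and roughly 39 minutes quoted in the post; adding faster or additional cards to the list shows how quickly a fixed WU limit shrinks in wall-clock terms.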
Brian Silvers Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
I'd be happy with an hour of work cached for MW so that when a project maintenance back off occurs I can keep crunching or lose very little time. But my hour is different to your hour or his hour as we have different set ups. For example I have a 4850 and a 4870 in my Q9450 box. The longer wu's take approx 3.5 min and 3 min respectively. So that would mean I'd need to have cached 38 wu's, currently I get 24 which is about 39 minutes. With a few shorties of 55 seconds, that drops down to less than 30 minutes. The previously mentioned 5 minute -> 20 minute cache thing was on a 4850 or a 4870, can't remember which. At any rate, the same logistical problem is in place - these tasks were originally designed to be processed by CPUs, not GPUs. Giving you all 100% of the 3-stream tasks is just a band-aid. It will not address the root cause, which is that the tasks just aren't complex enough. Allegedly the MW_GPU project was going to provide tasks with perhaps 100 times the complexity. For whatever reason, that idea was tossed out. It needs to be brought to the front burner again... |
YoDude.SETI.USA [TopGun] Joined: 29 May 09 Posts: 37 Credit: 34,016,951 RAC: 0 |
That would be VERY NICE but DA and the BOINC team do not want to do it. This change has been suggested for years but DA et al. have shot it down repeatedly. Then DA and the BOINC team should consider getting some new management, because if they won't fix the obvious, then it's time for whoever has the responsibility to step down. .....And here is where the problems ultimately reside. Yo- |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 175 |
That would be VERY NICE but DA and the BOINC team do not want to do it. This change has been suggested for years but DA et al. have shot it down repeatedly. It's a monarchy. The BOINC employees (programmers) have expressed their frustration about the situation on several occasions. They either do exactly what the monarch dictates or they're gone. User requests, particularly from power users who want some manual controls (like individual queue control, processor affinity, ability to set a backup project that DLs 1 WU at a time as needed, etc.), are simply ignored. |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 175 |
Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem. I agree that the situation could be essentially solved by sending much longer WUs to GPUs. That can easily be done by BOINC. No need for separate projects, just set the server to send whatever type of WU they want to whatever type of client they choose. The GPU users have the same goals that you have, it shouldn't be much of a problem to make this work well for everyone and greatly improve the overall throughput of the project. |
Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0 |
At CPDN we have the choice, in project preferences, from a selection of 5 types of WUs that we would prefer to crunch. We are only sent WUs of the type[s] we select. I don't know how this would be done at the server end, ask Tolu or Milo over there if you wish to implement such a thing here, Travis. A simple option of long/short WU in MW could reduce the amount of whingeing done in this thread. Cheers, PeterV |
Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0 |
I second this approach, as it gives the user some control over what they do. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Rosetta offers a choice of 1 to 24 hours per wu. @Verstapp: I know I mentioned that before. It would be nice to choose which size to run. It would cater a bit more to each person's abilities to run Boinc & MW. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 175 |
At CPDN we have the choice, in project preferences, from a selection of 5 types of WUs that we would prefer to crunch. We are only sent WUs of the type[s] we select. Another good idea. Some other projects that give the user a choice of WUs: PrimeGrid, Yoyo, SETI and NFS. Most also offer multiple types and also offer an option to receive other WUs if the chosen type is unavailable. I don't think it's about eliminating whining, but about respect for the needs of people doing the project's work. |
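Editor's sketch of the preference scheme described in the last few posts: send only the WU types a user selected, with an optional fallback to other types when none of the chosen ones are available. Every name here (`pick_work`, the `"long"`/`"short"` labels, the tuple layout) is hypothetical and does not correspond to any real BOINC server function or setting.

```python
def pick_work(available, preferred, accept_other=False):
    """Choose which WUs to send to a host.

    available    -- list of (wu_id, wu_type) tuples currently ready to send
    preferred    -- set of wu_type values the user ticked in preferences
    accept_other -- user opted in to other types if the chosen ones run out
    """
    chosen = [wu for wu in available if wu[1] in preferred]
    if not chosen and accept_other:
        chosen = list(available)
    return chosen


ready = [(101, "short"), (102, "short"), (103, "long")]

print(pick_work(ready, {"long"}))                       # only the long WU
print(pick_work(ready[:2], {"long"}))                   # nothing to send
print(pick_work(ready[:2], {"long"}, accept_other=True))  # falls back to short WUs
```

A simple long/short switch like this would let GPU users opt out of the short WUs while leaving CPU users free to take either kind.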
©2023 Astroinformatics Group