Welcome to MilkyWay@home

Milky Way, Project unfriendly.....


YoDude.SETI.USA [TopGun]
Joined: 29 May 09
Posts: 37
Credit: 34,016,951
RAC: 0
Message 35155 - Posted: 7 Jan 2010, 15:45:20 UTC

In my personal opinion, it really shouldn't matter what the cache size is.

1. The code should be smart enough to distinguish between CPU and GPU WU's. They should be considered separate entities.

2. The code should also utilize the "Switch between Projects" option and actually switch between the projects like it says it should.

3. The cache size should be set for each project and not all projects as a whole.

To me, it seems like that should take care of it.

Yo-

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 503,272,888
RAC: 26,834
Message 35156 - Posted: 7 Jan 2010, 16:35:33 UTC - in response to Message 35153.  

Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem.

I would think / hope that the server would be able to differentiate between a GPU and a CPU, so once all is in place, in theory the 3-stream tasks could go to those of you with GPUs and perhaps increase your caches as well, up to perhaps double what they are now. After that, the 1-stream and 2-stream tasks can go to CPU participants, again with double the cache (from 6 to 12).

That would certainly help the situation and I agree with the exception that the cache size should be more than doubled for GPUs. On the fastest GPUs the long WUs take 80 seconds, so even with only one of those GPUs on a dual core processor the cache amounts to 16 minutes. That's just not enough to allow BOINC to do any kind of management. The end result is that many get frustrated and dump the project altogether. It's only 1 in a 100 dissatisfied users that post here or even understand why it's not working correctly. We need to get the queue size up to a minimum of several hours in order for BOINC manager to schedule properly.
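For anyone who wants to check the 16-minute figure, here is a minimal sketch of the arithmetic. It assumes the limit is 6 tasks per CPU core (inferred from the "from 6 to 12" remark quoted above) and that a single GPU works through the whole cache; the function and parameter names are made up for illustration.

```python
def cache_minutes(tasks_per_core, cpu_cores, seconds_per_task, gpus=1):
    """Minutes of GPU work held in the cache.

    Assumption: the server allows `tasks_per_core` tasks for every CPU core,
    and the GPUs share the cached tasks evenly.
    """
    total_tasks = tasks_per_core * cpu_cores
    return total_tasks * seconds_per_task / gpus / 60


# Dual-core host, one fast GPU, 80-second long WUs
print(cache_minutes(6, 2, 80))  # 16.0
```

Doubling the per-core limit to 12 would only raise this to 32 minutes on the same host, which is why a bigger increase is being asked for on GPUs.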


Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 503,272,888
RAC: 26,834
Message 35160 - Posted: 7 Jan 2010, 17:50:36 UTC - in response to Message 35155.  
Last modified: 7 Jan 2010, 18:14:49 UTC

In my personal opinion, it really shouldn't matter what the cache size is.

1. The code should be smart enough to distinguish between CPU and GPU WU's. They should be considered separate entities.

That can easily be implemented at the project level if the admins here choose to do so.

3. The cache size should be set for each project and not all projects as a whole.

That would be VERY NICE, but DA and the BOINC team do not want to do it. This change has been suggested for years, but DA et al. have shot it down repeatedly.

2. The code should also utilize the "Switch between Projects" option and actually switch between the projects like it says it should.

Under the current BOINC design parameters it does do that. The problem is that BOINC always tries to fill the CPU or GPU queue to the level that it's set to do, so if a project doesn't allow that amount BOINC goes to the next project for work. Here's an example:

Assume you set BOINC to queue even a minimal amount, let's say 0.1 day of work. That's 2.4 hours. Also assume for the sake of argument that you activate MilkyWay and Collatz and set MilkyWay to run 90% and Collatz 10%. What happens is that BOINC will DL its allowed limit of 16 minutes of MilkyWay WUs. Then, to fill that 2.4 hours, it will DL over 2 hours' worth of Collatz. Since BOINC is currently designed to process GPU WUs in a FIFO manner, it will run the 16 minutes of MilkyWay and then proceed to do the 2+ hours of Collatz. The process then repeats. You can set an even shorter queue than 0.1 days, but then if there's any kind of network outage your machines will be totally dead in the water.

The bottom line is that under the current BOINC design the only way to get a project to work properly is to allow a reasonable queue size, probably in the neighborhood of at least several hours. As far as I know MilkyWay is the only project that won't allow this. Thus the problem here.
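The example above can be put into numbers with a rough sketch. This is a simplified model of one fill-and-run cycle, not how the BOINC client actually computes anything: it assumes the capped project hands out a fixed number of minutes, the other project fills whatever is left of the buffer, and everything then runs in download order. All names are hypothetical.

```python
def fifo_cycle(buffer_days, capped_project_minutes):
    """Return (capped_minutes, filler_minutes, capped_share) for one cycle.

    Simplifying assumption: the buffer is refilled once per cycle and the
    GPU runs the capped project's work first, then the filler project's.
    """
    buffer_minutes = buffer_days * 24 * 60
    capped = min(capped_project_minutes, buffer_minutes)
    filler = buffer_minutes - capped
    return capped, filler, capped / buffer_minutes


# 0.1 day buffer, MilkyWay capped at 16 minutes, Collatz fills the rest
mw, collatz, share = fifo_cycle(0.1, 16)
print(f"MilkyWay: {mw:.0f} min, Collatz: {collatz:.0f} min")
print(f"MilkyWay share of GPU time: {share:.0%} (requested: 90%)")
```

Under these assumptions MilkyWay ends up with roughly one ninth of the GPU time even though it was given 90% of the resource share.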

Chris S
Joined: 20 Sep 08
Posts: 1391
Credit: 203,068,793
RAC: 21,811
Message 35163 - Posted: 7 Jan 2010, 18:38:42 UTC
Last modified: 7 Jan 2010, 18:40:48 UTC

AFAIK, from the very beginning, it was envisaged that all projects would be equal in that they would give the same machine the same credit, cache, workload etc. no matter which project they crunched for. That would allow people to choose the projects to support based purely on the merits of the project's work itself. A very laudable and community-oriented goal, but a bit naive.

But we now have a new breed of cruncher who is more interested in the credit awarded than the worth of the project itself, not to mention the GPU onslaught. Neither of which were envisaged 10 years ago. Boinc and its senior people would greatly benefit from reviewing where they came from, where they are now, and where they want to go.

Otherwise projects could consider transferring from Boinc to other distributed computing enterprises, such as World Community Grid, or setting up their own outfit, away from the current restrictive practices we are hearing about here. Boinc has been a remarkable academic success, but it doesn't seem to have transitioned very well from a university research project to real life in the 21st century.

Let no one denigrate the successes that have been achieved so far, but the way forward needs to be more open and clearly defined. To take one example, there are people on the boards here and elsewhere who between them have an immeasurable amount of technical knowledge that could be utilised by Boinc to everyone's advantage. And yet we hear of an entrenched head-in-the-sand attitude to proffered advice.

I'm sticking around for the future, but I really would like to see some positive changes......
Don't drink water, that's the stuff that rusts pipes

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 35171 - Posted: 7 Jan 2010, 22:52:28 UTC - in response to Message 35163.  
Last modified: 7 Jan 2010, 23:36:10 UTC

And yet we hear of an entrenched head in the sand attitude to proffered advice.

Maybe the academics don't want the proffered advice. They're not likely to get their doctorates by copying all the ATI stuff that already works. They have to rediscover all the pitfalls and put us all through the wringer again, surely?


Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 35178 - Posted: 8 Jan 2010, 1:16:16 UTC - in response to Message 35156.  
Last modified: 8 Jan 2010, 1:17:27 UTC

Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem.

I would think / hope that the server would be able to differentiate between a GPU and a CPU, so once all is in place, in theory the 3-stream tasks could go to those of you with GPUs and perhaps increase your caches as well, up to perhaps double what they are now. After that, the 1-stream and 2-stream tasks can go to CPU participants, again with double the cache (from 6 to 12).

That would certainly help the situation and I agree with the exception that the cache size should be more than doubled for GPUs. On the fastest GPUs the long WUs take 80 seconds, so even with only one of those GPUs on a dual core processor the cache amounts to 16 minutes. That's just not enough to allow BOINC to do any kind of management. The end result is that many get frustrated and dump the project altogether. It's only 1 in a 100 dissatisfied users that post here or even understand why it's not working correctly. We need to get the queue size up to a minimum of several hours in order for BOINC manager to schedule properly.



Same problem exists as before. Runtimes were increased by 4. To be effective in handling you all with GPUs without totally crushing the server, it needs to be increased by another factor of 10, perhaps 20, particularly if you're wanting "minimum of several hours". Allegedly 6 tasks was like 5 minutes back then, so 6 tasks are only 20 minutes now. To get to 3 hours, that means 9 times...or 54 tasks, but again, it's only 3 hours. I have the feeling most of you won't be happy unless it goes up to 8 hours, so that means an increase in cache of 24 times, or 144 tasks. Bumping your caches up by factors of that much will only cause problems. I do not know if the current type of tasks have that much room for expansion in their scientific value. That's why this whole time, I've said that the real long-term fix is a separate GPU project or separate type of work.
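The scaling in the paragraph above can be sketched as follows, taking the "6 tasks is about 20 minutes" figure at face value (it is described as alleged, so treat it as an assumption). The function name and defaults are made up for illustration.

```python
import math


def tasks_for_target(target_hours, tasks_now=6, minutes_now=20):
    """Return (scale_factor, task_count) needed to hold `target_hours` of work.

    Assumption: the current cache of `tasks_now` tasks lasts `minutes_now`
    minutes, and run time per task stays the same.
    """
    factor = target_hours * 60 / minutes_now
    return factor, math.ceil(tasks_now * factor)


print(tasks_for_target(3))  # (9.0, 54)
print(tasks_for_target(8))  # (24.0, 144)
```

The same function shows the alternative: making each task 24 times longer would reach 8 hours with the cache still at 6 tasks, which is the argument for a different type of work rather than a bigger cache.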

YoDude.SETI.USA [TopGun]
Joined: 29 May 09
Posts: 37
Credit: 34,016,951
RAC: 0
Message 35183 - Posted: 8 Jan 2010, 2:27:24 UTC - in response to Message 35160.  

Assume you set BOINC to queue even a minimal amount, let's say 0.1 day of work. That's 2.4 hours. Also assume for the sake of argument that you activate MilkyWay and Collatz and set MilkyWay to run 90% and Collatz 10%. What happens is that BOINC will DL its allowed limit of 16 minutes of MilkyWay WUs. Then, to fill that 2.4 hours, it will DL over 2 hours' worth of Collatz. Since BOINC is currently designed to process GPU WUs in a FIFO manner, it will run the 16 minutes of MilkyWay and then proceed to do the 2+ hours of Collatz.


Let's also assume that I have my "switch between projects" set to 60 minutes (the default). In your example, the MW WU's get completed first, in 16 minutes, then the manager starts running Collatz because all the MW WU's are done and continues to run Collatz until there are none left.

When the manager starts running the 2 hours worth of Collatz WU's, it should stop after 60 minutes, upload all the results, report all the results, switch back to MW, D/L 16 more minutes of work and run the WU's until they are all completed and then switch back to Collatz.

Ideally, if the cache runs out of WU's, or even runs low, before the 60-minute timer is up, the manager should be uploading results and downloading new WU's (at a reasonably slow rate) to keep its cache full until the 60 minutes are up. Then it should report any remaining completed WU's, switch to Collatz, and run that project in the exact same manner for 60 minutes.

The way I see it, if I have my manager set to switch between projects every 60 minutes, it should do that and run a project for 60 minutes or until the cache runs dry and there are no WU's available. If the cache does in fact run dry, then it should switch to the other project, reset the timer to 60 minutes and go. Again CPU and GPU projects should be considered separate.

As far as the "resource share" thing goes, I haven't seen it make any difference whatsoever in how the manager works, no matter what I change the settings to.

So far as FIFO goes, yeah well, wouldn't it be better to have the manager run WU's with the closest due date rather than "X"-project got the WU's DL'd first, so we'll run them to completion?
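To make the difference between the two orderings concrete, here is a small sketch. It is not BOINC's scheduler code; the task records, field names, and numbers are all invented for the example, and it only shows how the run order changes when tasks are sorted by deadline instead of by download time.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str
    downloaded_at: int  # minutes since the client started (hypothetical)
    deadline: int       # minutes since the client started (hypothetical)


def fifo_order(tasks):
    """Run tasks in the order they were downloaded."""
    return sorted(tasks, key=lambda t: t.downloaded_at)


def edf_order(tasks):
    """Run the task with the closest due date first."""
    return sorted(tasks, key=lambda t: t.deadline)


tasks = [
    Task("Collatz", downloaded_at=0, deadline=10_000),
    Task("MilkyWay", downloaded_at=5, deadline=4_000),
]

print([t.project for t in fifo_order(tasks)])  # ['Collatz', 'MilkyWay']
print([t.project for t in edf_order(tasks)])   # ['MilkyWay', 'Collatz']
```

With made-up deadlines like these, the shorter-deadline MilkyWay task would jump ahead of Collatz work that happened to arrive first.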

This is just my personal take on what I've seen and the way things "seem" they should be.

Yo-

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 35184 - Posted: 8 Jan 2010, 2:27:53 UTC

I'd be happy with an hour of work cached for MW so that when a project maintenance back off occurs I can keep crunching or lose very little time. But my hour is different to your hour or his hour as we have different set ups. For example I have a 4850 and a 4870 in my Q9450 box. The longer wu's take approx 3.5 min and 3 min respectively. So that would mean I'd need to have cached 38 wu's, currently I get 24 which is about 39 minutes. With a few shorties of 55 seconds, that drops down to less than 30 minutes.
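Those numbers can be reproduced with a short sketch, assuming both cards crunch in parallel and only long WUs are in the cache. The function names are made up for illustration.

```python
import math


def combined_rate(minutes_per_task_by_gpu):
    """Tasks per minute finished by all GPUs working in parallel."""
    return sum(1 / m for m in minutes_per_task_by_gpu)


def cache_minutes(cached_tasks, minutes_per_task_by_gpu):
    """How long a cache of long WUs lasts across all GPUs."""
    return cached_tasks / combined_rate(minutes_per_task_by_gpu)


def tasks_for_minutes(target_minutes, minutes_per_task_by_gpu):
    """Smallest whole number of tasks covering `target_minutes` of work."""
    return math.ceil(target_minutes * combined_rate(minutes_per_task_by_gpu))


gpus = [3.5, 3.0]  # 4850 and 4870, minutes per long WU
print(round(cache_minutes(24, gpus)))  # 39
print(tasks_for_minutes(60, gpus))     # 38
```

Any shorties in the mix shorten the cache further, so these figures are a best case.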

In my old P4/HT with the 3850 in it, the long wu's take 10 minutes so I get closer to 2 hrs of cache or less with a few shorties.

Imagine a box with 3 or 4 5870's in it....holy cow!

Brian Silvers
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 35186 - Posted: 8 Jan 2010, 4:16:06 UTC - in response to Message 35184.  

I'd be happy with an hour of work cached for MW so that when a project maintenance back off occurs I can keep crunching or lose very little time. But my hour is different to your hour or his hour as we have different set ups. For example I have a 4850 and a 4870 in my Q9450 box. The longer wu's take approx 3.5 min and 3 min respectively. So that would mean I'd need to have cached 38 wu's, currently I get 24 which is about 39 minutes. With a few shorties of 55 seconds, that drops down to less than 30 minutes.

In my old P4/HT with the 3850 in it, the long wu's take 10 minutes so I get closer to 2 hrs of cache or less with a few shorties.

Imagine a box with 3 or 4 5870's in it....holy cow!


The previously mentioned 5 minute -> 20 minute cache thing was on a 4850 or a 4870, can't remember which. At any rate, the same logistical problem is in place - these tasks were originally designed to be processed by CPUs, not GPUs. Giving you all 100% of the 3-stream tasks is just a band-aid. It will not address the root cause, which is that the tasks just aren't complex enough. Allegedly the MW_GPU project was going to provide tasks perhaps 100 times the complexity. For whatever reason, that idea was tossed out. It needs to be brought to the front burner again...

YoDude.SETI.USA [TopGun]
Joined: 29 May 09
Posts: 37
Credit: 34,016,951
RAC: 0
Message 35187 - Posted: 8 Jan 2010, 5:24:20 UTC - in response to Message 35160.  

That would be VERY NICE, but DA and the BOINC team do not want to do it. This change has been suggested for years, but DA et al. have shot it down repeatedly.


Then DA and the BOINC team should consider getting some new management because if they won't fix the obvious, then it's time for whoever has the responsibility, to step down. .....And here is where the problems ultimately reside.

Yo-

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 503,272,888
RAC: 26,834
Message 35190 - Posted: 8 Jan 2010, 10:05:36 UTC - in response to Message 35187.  

That would be VERY NICE, but DA and the BOINC team do not want to do it. This change has been suggested for years, but DA et al. have shot it down repeatedly.

Then DA and the BOINC team should consider getting some new management because if they won't fix the obvious, then it's time for whoever has the responsibility, to step down. .....And here is where the problems ultimately reside.

Yo-

It's a monarchy. The BOINC employees (programmers) have expressed their frustration about the situation on several occasions. They either do exactly what the monarch dictates or they're gone. Users' requests, particularly from power users who want some manual controls (like individual queue control, processor affinity, ability to set a backup project that DLs 1 WU at a time as needed, etc.), are simply ignored.


Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 503,272,888
RAC: 26,834
Message 35192 - Posted: 8 Jan 2010, 10:16:04 UTC - in response to Message 35178.  

Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem.

I would think / hope that the server would be able to differentiate between a GPU and a CPU, so once all is in place, in theory the 3-stream tasks could go to those of you with GPUs and perhaps increase your caches as well, up to perhaps double what they are now. After that, the 1-stream and 2-stream tasks can go to CPU participants, again with double the cache (from 6 to 12).

That would certainly help the situation and I agree with the exception that the cache size should be more than doubled for GPUs. On the fastest GPUs the long WUs take 80 seconds, so even with only one of those GPUs on a dual core processor the cache amounts to 16 minutes. That's just not enough to allow BOINC to do any kind of management. The end result is that many get frustrated and dump the project altogether. It's only 1 in a 100 dissatisfied users that post here or even understand why it's not working correctly. We need to get the queue size up to a minimum of several hours in order for BOINC manager to schedule properly.

Same problem exists as before. Runtimes were increased by 4. To be effective in handling you all with GPUs without totally crushing the server, it needs to be increased by another factor of 10, perhaps 20, particularly if you're wanting "minimum of several hours". Allegedly 6 tasks was like 5 minutes back then, so 6 tasks are only 20 minutes now. To get to 3 hours, that means 9 times...or 54 tasks, but again, it's only 3 hours. I have the feeling most of you won't be happy unless it goes up to 8 hours, so that means an increase in cache of 24 times, or 144 tasks. Bumping your caches up by factors of that much will only cause problems. I do not know if the current type of tasks have that much room for expansion in their scientific value. That's why this whole time, I've said that the real long-term fix is a separate GPU project or separate type of work.

I agree that the situation could be essentially solved by sending much longer WUs to GPUs. That can easily be done by BOINC. No need for separate projects, just set the server to send whatever type of WU they want to whatever type of client they choose. The GPU users have the same goals that you have, it shouldn't be much of a problem to make this work well for everyone and greatly improve the overall throughput of the project.


verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 35197 - Posted: 8 Jan 2010, 11:57:09 UTC

At CPDN we have the choice, in project preferences, from a selection of 5 types of WUs that we would prefer to crunch. We are only sent WUs of the type[s] we select.
I don't know how this would be done at the server end, ask Tolu or Milo over there if you wish to implement such a thing here, Travis.
A simple option of long/short WU in MW could reduce the amount of whingeing done in this thread.

Cheers,

PeterV

.

David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 35200 - Posted: 8 Jan 2010, 12:40:15 UTC - in response to Message 35197.  

I second this approach, as it gives the user some control over what they do.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 35206 - Posted: 8 Jan 2010, 13:26:11 UTC - in response to Message 35197.  

Rosetta offers a choice of 1 to 24 hours per wu.

@Verstapp: I know I mentioned that before. It would be nice to choose which size to run. It would cater a bit more to each person's abilities to run Boinc & MW.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 503,272,888
RAC: 26,834
Message 35211 - Posted: 8 Jan 2010, 17:22:56 UTC - in response to Message 35197.  

At CPDN we have the choice, in project preferences, from a selection of 5 types of WUs that we would prefer to crunch. We are only sent WUs of the type[s] we select.
I don't know how this would be done at the server end, ask Tolu or Milo over there if you wish to implement such a thing here, Travis.
A simple option of long/short WU in MW could reduce the amount of whingeing done in this thread.

Another good idea. Some other projects that give the user a choice of WUs: PrimeGrid, Yoyo, SETI and NFS.
Most also offer multiple types and also offer an option to receive other WUs if the chosen type is unavailable.
I don't think it's about eliminating whining, but about respect for the needs of people doing the project's work.

©2021 Astroinformatics Group