Welcome to MilkyWay@home

Host with WAY too many tasks.



Message boards : Number crunching : Host with WAY too many tasks.
Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 134,966,966
RAC: 9,433
Message 45879 - Posted: 30 Jan 2011, 9:11:52 UTC - in response to Message 45875.



If you hoard 50 WUs, by the time the 50th WU gets done, it might not make much of a difference anymore, as the system has moved on; aka, you missed the boat.

In other words, we need the WUs to come back completed as fast as possible. The longer a WU sits on a host, the less chance it will benefit the project.



Cartoonman,
I totally agree with what you posted except on one point: what is the reason, then, to run CPU apps? By definition they may take up to 8 days to complete, which means they missed the boat, as you put it. And if they validate against a GPU WU, that one is wasted as well.
So maybe it makes sense to consider stopping the separation CPU WUs and using the CPU only for the N-body WUs.

Link
Joined: 19 Jul 10
Posts: 357
Credit: 16,320,358
RAC: 0
Message 45880 - Posted: 30 Jan 2011, 10:49:34 UTC - in response to Message 45875.

However, I don't believe BOINC Manager has the ability to see the GPU as a separate variable in terms of caching WUs just yet; correct me if I'm wrong.

Yes, it works. They used that feature not so long ago at SETI after the 3-day outages. There were different "in progress" limits per CPU and per GPU, for example 8/CPU and 40/GPU; that's what they usually started with after an outage.

So, for example, a host with a dual-core CPU and one GPU would get 2*8 + 40 = 56 WUs, while a host with a single-core CPU and 4 GPUs would get 8 + 4*40 = 168 WUs.
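
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the per-device rule as described in this post; the function name and its defaults are made up for the example and are not taken from the BOINC server code.

```python
def in_progress_limit(n_cpus: int, n_gpus: int,
                      per_cpu: int = 8, per_gpu: int = 40) -> int:
    """Total tasks a host may hold under separate per-CPU and per-GPU limits.

    Hypothetical helper: the defaults (8 per CPU core, 40 per GPU) are the
    example values from the post, not official project settings.
    """
    if n_cpus < 0 or n_gpus < 0:
        raise ValueError("device counts must be non-negative")
    # Each CPU core and each GPU contributes its own allowance.
    return n_cpus * per_cpu + n_gpus * per_gpu


if __name__ == "__main__":
    print(in_progress_limit(n_cpus=2, n_gpus=1))  # dual core + 1 GPU
    print(in_progress_limit(n_cpus=1, n_gpus=4))  # single core + 4 GPUs
```

The point of keeping the two allowances separate is that a project could tighten one (say, the CPU limit) without starving fast GPUs.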

I think something like that should work here as well, and I'm actually surprised that it has not been implemented yet.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 45882 - Posted: 30 Jan 2011, 13:29:16 UTC - in response to Message 45879.



If you hoard 50 WUs, by the time the 50th WU gets done, it might not make much of a difference anymore, as the system has moved on; aka, you missed the boat.

In other words, we need the WUs to come back completed as fast as possible. The longer a WU sits on a host, the less chance it will benefit the project.



Cartoonman,
I totally agree with what you posted except on one point: what is the reason, then, to run CPU apps? By definition they may take up to 8 days to complete, which means they missed the boat, as you put it. And if they validate against a GPU WU, that one is wasted as well.
So maybe it makes sense to consider stopping the separation CPU WUs and using the CPU only for the N-body WUs.



Generally most take 4-8 hours. A P4 takes 5-5.5 hours with current WUs. I haven't seen any that take 8 days. I'm not sure a computer that slow could even run MW. They seem to validate fine; I have not had any come back invalid. Initially the CPU vs. GPU results were inconclusive and needed another run, but now they seem to validate just fine.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 134,966,966
RAC: 9,433
Message 45884 - Posted: 30 Jan 2011, 14:19:04 UTC - in response to Message 45882.



Generally most take 4-8 hours. A P4 takes 5-5.5 hours with current WUs. I haven't seen any that take 8 days. I'm not sure a computer that slow could even run MW. They seem to validate fine; I have not had any come back invalid. Initially the CPU vs. GPU results were inconclusive and needed another run, but now they seem to validate just fine.


For what we are discussing here, we need to talk about turnaround time, not crunching time. Turnaround may take some days on a host that is not crunching 24/7. Sure, they validate, but do they still help the project (as the system has moved on; aka, you missed the boat)?
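
To make the distinction concrete, here is a rough back-of-the-envelope sketch. It assumes a very simple model (tasks run in queue order, one per core, and the host crunches only a fixed fraction of each day); the function and parameter names are invented for the illustration, and the 6 hours/day figure used below is an assumption, not a measurement.

```python
def turnaround_hours(queue_position: int, crunch_hours: float,
                     uptime_fraction: float, cores: int = 1) -> float:
    """Estimate wall-clock hours until the task at queue_position is returned.

    Simplified model (assumption): tasks are processed in order, `cores` at a
    time, and the host crunches only `uptime_fraction` of each day.
    """
    if queue_position < 1 or cores < 1:
        raise ValueError("queue_position and cores must be >= 1")
    if not 0 < uptime_fraction <= 1:
        raise ValueError("uptime_fraction must be in (0, 1]")
    # Sequential rounds needed before this task finishes (ceiling division).
    rounds = -(-queue_position // cores)
    # Crunching time is stretched by the share of the day the host is idle.
    return rounds * crunch_hours / uptime_fraction


if __name__ == "__main__":
    # 5.5 h per WU (the P4 figure quoted above), host crunching 6 h/day.
    print(turnaround_hours(1, 5.5, uptime_fraction=0.25))
    print(turnaround_hours(8, 5.5, uptime_fraction=0.25) / 24)  # in days
```

Under these assumptions a single 5.5-hour WU already takes 22 hours to come back, and the eighth WU in a single-core queue takes 176 hours, a bit over 7 days, even though the crunching time per WU never changed.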

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 502,204,694
RAC: 940
Message 45885 - Posted: 30 Jan 2011, 14:40:39 UTC

The whole "limit GPU queue size because of turnaround time" argument is invalidated by allowing CPUs to run the WUs. We've suggested several times that the old WUs be limited to GPUs and that CPUs be allowed to run N-Body WUs. What's the downside?

1) Even with an increased GPU WU cache, the turnaround time would be FAR faster.

2) The larger cache would result in more WUs being run, by smoothing out the workflow gaps caused by the many outages.

3) More N-Body WUs would be run because all CPUs would be doing them.

4) Fewer people would dump the project due to frustration.

I suspect the problem is that the admins don't know how to do this. After all, they're scientists, not programmers. There was recently a good article (I believe in Nature) about this problem. I'm sure Slicker at Collatz would be willing to help. He's a wizard at such things.

mdhittle*
Joined: 25 Jun 10
Posts: 284
Credit: 260,490,091
RAC: 0
Message 45890 - Posted: 31 Jan 2011, 1:10:13 UTC - in response to Message 45885.
Last modified: 31 Jan 2011, 1:13:25 UTC

I'm sure Slicker at Collatz would be willing to help. He's a wizard at such things.


And Slicker has been using different cache sizes for CPU/GPU at Collatz for some time now.

And DNETC has even subdivided it by GPU kind (Nvidia/ATI) and model (HD 5XXX or not HD 5XXX) to decide the type of workunit that is sent.

So everything we are asking for has already been accomplished at other projects.

©2020 Astroinformatics Group