Message boards : Number crunching : Host with WAY too many tasks
Joined: 19 Feb 08 · Posts: 350 · Credit: 141,284,369 · RAC: 0
Cartoonman, I totally agree with what you posted except on one point: what is the reason, then, to run CPU apps at all? By definition they may take up to 8 days to complete, which means they missed the boat, as you posted. And if one validates against a GPU WU, that GPU WU is also wasted. So maybe it makes sense to think about stopping Separation WUs on CPUs and using the CPU only for the N-Body WUs.
Joined: 19 Jul 10 · Posts: 597 · Credit: 18,982,369 · RAC: 5,800
"However, I don't believe BOINC manager has the ability to see the GPU as a separate variable in terms of caching WUs just yet; correct me if I'm wrong."

Yes, it works. They used that feature not so long ago at SETI after the 3-day outages. There were different "in progress" limits per CPU and per GPU, for example 8 per CPU core and 40 per GPU; that's what they usually started with after an outage. So, for example, a host with a dual-core CPU and one GPU would get 2*8 + 40 = 56 WUs, while a host with a single-core CPU and 4 GPUs would get 8 + 4*40 = 168 WUs. I think something like that should work here as well, and I'm actually surprised that it has not been implemented yet.
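For reference, a stock BOINC server exposes exactly this kind of per-resource limit in the project's config.xml. A minimal sketch, assuming the documented max_wus_in_progress options; the numbers are the SETI-style examples from above, not actual Milkyway settings:

```xml
<!-- Illustrative excerpt from a BOINC project's config.xml.
     Each limit is multiplied by the host's CPU core / GPU count. -->
<boinc>
  <config>
    <max_wus_in_progress>8</max_wus_in_progress>           <!-- per CPU core -->
    <max_wus_in_progress_gpu>40</max_wus_in_progress_gpu>  <!-- per GPU -->
  </config>
</boinc>
```

With those values, the scheduler itself enforces the 2*8 + 40 = 56 and 8 + 4*40 = 168 caps worked out above.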
Joined: 12 Nov 07 · Posts: 2425 · Credit: 524,164 · RAC: 0
Generally most take 4-8 hours; a P4 takes 5-5.5 hours with the current WUs. I haven't seen any that take 8 days, and I'm not sure a computer slow enough for that could even run MW. They seem to validate fine; I have not had any come back invalid. Initially the CPU vs. GPU results were inconclusive and needed another run, but now they seem to validate just fine.

Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Joined: 19 Feb 08 · Posts: 350 · Credit: 141,284,369 · RAC: 0
For what we are discussing here, we need to talk about turnaround time, not crunching time. Turnaround may take several days on a host that isn't a 24/7 cruncher. Sure, they validate, but do they still help the project once the system has moved on (a.k.a. they missed the boat)?
Joined: 15 Jul 08 · Posts: 383 · Credit: 729,293,740 · RAC: 0
The whole "limit GPU queue size" because of turn around time is invalidated by allowing CPUs to run the WUs. We've suggested several times that the old WUs be limited to GPUs and allow CPUs to run N-Body WUs. What's the downside? 1) Even with an increased GPU WU cache the turn around time would be FAR faster. 2) The larger cache would result in more WUs being run via smoothing out workflow caused by the many outages. 3) More N-Body WUs would be run because all CPUs would be doing them. 4) Fewer people would dump the project due to frustration. I suspect the problem is that the admins don't know how to do this. After all they're scientists, not programmers. There was recently a good article (I believe in Nature) regarding this problem. I'm sure Slicker at Collatz would be willing to help. He's a wizard at such things. |
Joined: 25 Jun 10 · Posts: 284 · Credit: 260,490,091 · RAC: 0
"I'm sure Slicker at Collatz would be willing to help. He's a wizard at such things."

And Slicker has been using different cache sizes for CPU and GPU at Collatz for some time now. And DNETC has even subdivided it by vendor (NVIDIA/ATI) and model (HD 5xxx vs. non-HD 5xxx) for the type of work unit being sent. So everything we are asking for has already been accomplished at other projects.