Welcome to MilkyWay@home

Getting more than 18 work units at a time

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43508 - Posted: 5 Nov 2010, 15:04:28 UTC

I doubt anyone would disagree that larger caches are needed for GPU crunching. All the main issues of moving to larger caches have been outlined, bar one.

They can't; it would be a nightmare. They can only set the cache as a function of the CPUs; no ability exists inside the BOINC server software to express it as a number tied to GPUs. That means that if the number went to (say) 50 per CPU core, ending up as a 100-unit cache for a dual CPU, slower CPU machines would take forever and a day to clear that cache; in some projects a cache of 100 could run to months of crunching for a dual CPU. If the project insisted on validation of WUs via a second cruncher, it would grind to a halt with people yelling about delayed credit.

Should they introduce a GPU value to set the cache? For sure, but the reality is that at present BOINC does not provide a GPU-related parameter. There is nothing the Project Staffs can do about that; it's fixed in the BOINC server software and needs BOINC/Berkeley intervention. I wholly support a GPU-related parameter, but yelling at Project Staffs is the wrong target; it's BOINC/Berkeley that needs the pressure heaped on. Project Staffs are powerless on this one, which is a common issue across all BOINC projects; it's not a purely MW issue.
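
For reference, that CPU-based limit is a scheduler setting in the project's config.xml. A minimal sketch, assuming the standard scheduler option of the era; the value of 6 per core matches the limit discussed in this thread:

    <config>
      <!-- Cap on unfinished results a host may hold, counted per CPU core. -->
      <!-- A quad core could hold at most 4 x 6 = 24 results in progress;   -->
      <!-- nothing here lets the count scale with the number of GPUs.       -->
      <max_wus_in_progress>6</max_wus_in_progress>
    </config>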

Regards
Zy
ID: 43508
Arif Mert Kapicioglu
Joined: 14 Dec 09
Posts: 161
Credit: 589,318,064
RAC: 0
Message 43518 - Posted: 5 Nov 2010, 17:25:48 UTC
Last modified: 5 Nov 2010, 17:26:56 UTC

I'm also a long-time cruncher and support the idea of a higher cache for GPU crunchers, but considering the obstacles, my idea just stays an idea. BOINC intervention is harder than extending the database; both require a great deal of brainstorming, and the outcome may be unwanted (like a host grabbing a huge number of WUs and returning garbage).

So I will not argue it; I just want to state that it would be beneficial for dedicated GPU crunchers. Best of luck to Travis and Matthew :)

Stabilizing the server is already one of the major tasks of this project, and once that is done there will be no need for a higher cache for GPU crunchers, and everyone will be truly happy :)

It's fine by me to discuss it, as this is the board for performance discussions.

Regards.
ID: 43518
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 43520 - Posted: 5 Nov 2010, 18:50:08 UTC - in response to Message 43508.  
Last modified: 5 Nov 2010, 18:52:35 UTC

Not quite true. If you look at your computer details, it says how many GPUs you have, so a maximum cache size could easily be set via that number. For example, my computers have 3 and 1 GPUs respectively, listed as coprocessors on the detail page and as GPUs on this page. BOINC can then ensure there aren't too many WUs downloaded for the deadline via its standard practices. It's just that MW will not allow BOINC to fill the desired cache.

I doubt anyone would disagree that larger caches are needed for GPU crunching. All the main issues of moving to larger caches have been outlined, bar one.

They can't; it would be a nightmare. They can only set the cache as a function of the CPUs; no ability exists inside the BOINC server software to express it as a number tied to GPUs. That means that if the number went to (say) 50 per CPU core, ending up as a 100-unit cache for a dual CPU, slower CPU machines would take forever and a day to clear that cache; in some projects a cache of 100 could run to months of crunching for a dual CPU. If the project insisted on validation of WUs via a second cruncher, it would grind to a halt with people yelling about delayed credit.

Should they introduce a GPU value to set the cache? For sure, but the reality is that at present BOINC does not provide a GPU-related parameter. There is nothing the Project Staffs can do about that; it's fixed in the BOINC server software and needs BOINC/Berkeley intervention. I wholly support a GPU-related parameter, but yelling at Project Staffs is the wrong target; it's BOINC/Berkeley that needs the pressure heaped on. Project Staffs are powerless on this one, which is a common issue across all BOINC projects; it's not a purely MW issue.

Regards
Zy
ID: 43520
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43521 - Posted: 5 Nov 2010, 18:58:21 UTC - in response to Message 43520.  

Knowing the number of GPUs is one thing, and clearly the server does know that. However, Project Staffs have no server function they can use to pass the number of GPUs to the scheduler in any instruction regarding cache size, as per the CPU method. In passing the cache size they only have the ability to set the cache against the number of CPUs; there is no parameter inside that function that allows GPUs to be used as a baseline for the cache.

Hence, the cache cannot be set by Project Staffs using GPU numbers as the baseline; they can only use the number of CPUs.

Regards
Zy

ID: 43521
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 43522 - Posted: 5 Nov 2010, 19:52:45 UTC - in response to Message 43521.  

How do you know that? If they have access to the number of CPUs and limit the number of WUs per CPU, then the same can be done for the number of GPUs.

Knowing the number of GPUs is one thing, and clearly the server does know that. However, Project Staffs have no server function they can use to pass the number of GPUs to the scheduler in any instruction regarding cache size, as per the CPU method. In passing the cache size they only have the ability to set the cache against the number of CPUs; there is no parameter inside that function that allows GPUs to be used as a baseline for the cache.

Hence, the cache cannot be set by Project Staffs using GPU numbers as the baseline; they can only use the number of CPUs.

Regards
Zy


ID: 43522
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43526 - Posted: 5 Nov 2010, 21:17:34 UTC - in response to Message 43522.  

...If they have access to the number of CPUs and limit the number of WUs per CPU, then the same can be done for the number of GPUs.


The function in the server software that passes the parameter to the scheduler was written in the days of CPU-only applications, and it was written to pass the number of CPUs, not the number of GPUs. To pass the number of GPUs to the scheduler, the function needs to be rewritten; that has not been done, and it can only be done by BOINC/Berkeley. Project Staffs cannot do it: any BOINC server core code amendments made by Project Staffs are liable to be overwritten by the next BOINC server update, so they have to wait for BOINC to amend the server software.

This is also wrapped up in the ongoing debate about machine-level versus application-level caches. At present only a blanket cache per machine can be set; if cache levels were set per application, the whole CPU/GPU debate would become moot. Which way they will finally go is not yet known. Wait and see.
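
Purely as an illustration of what such a rewrite might expose: a hypothetical GPU-aware parameter alongside the existing CPU one. The GPU tag below is invented for this sketch and does not exist in the server software being discussed:

    <config>
      <!-- existing: results in progress, counted per CPU core -->
      <max_wus_in_progress>6</max_wus_in_progress>
      <!-- hypothetical: a separate cap counted per GPU, so a fast
           multi-GPU host could carry a deeper cache -->
      <max_wus_in_progress_gpu>80</max_wus_in_progress_gpu>
    </config>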

Regards
Zy
ID: 43526
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44713 - Posted: 5 Dec 2010, 21:19:27 UTC - in response to Message 43526.  

Possible workaround:

This is my current idea: http://www.overclock.net/blogs/blox/2046-bloxcache-boinc-caching-batch-file.html

The main problem atm is that the --exit_after_app_start switch doesn't seem to be honored all the time, and the max value seems to be 90.

Also, any half-processed WUs are restarted at 0, so a bit of a waste.

Any ideas?

Cheers,
Rick
ID: 44713
Chaul
Joined: 29 Dec 09
Posts: 1
Credit: 6,495,477
RAC: 0
Message 44726 - Posted: 6 Dec 2010, 10:00:40 UTC

The packet size is just too small. 15 minutes on a 5870 would sound about right, instead of the one and a half minutes each with the 6-WU limit. I believe this packet size would lower the load on the servers and give better results. My single Radeon 5870 goes through the 6-unit queue in about 10 minutes. Less, if I overclocked it.

Oh, and BOINC should allow setting separate limits for the GPU and CPU work queues. I'm only crunching GPU units, so the number of threads my CPU can process (8) shouldn't apply here.
ID: 44726
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44735 - Posted: 6 Dec 2010, 18:50:07 UTC - in response to Message 44726.  

Chaul - did you check out my idea?

It's mostly working atm, but it's not cycling through the instances properly.

The idea is: have multiple BOINC data dirs and switch between them, so each maintains its own cache (see the sketch below).

My current problem is that when I kill the instance of BOINC it doesn't resume a WU where it left off; it starts again. So increased time per instance is beneficial, but not too much or the deadline is passed.
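
A minimal sketch of that cycling loop as a Windows batch file, in the spirit of the bloxcache guide (paths, directory names, and the timing value are illustrative assumptions; --dir and --exit_after_app_start are the stock client switches mentioned above):

    @echo off
    rem Cycle one BOINC client through several data directories so that
    rem each directory maintains its own cache of MilkyWay work units.
    set BOINC="C:\Program Files\BOINC\boinc.exe"
    :loop
    for %%D in (C:\bcache1 C:\bcache2 C:\bcache3) do (
        rem --dir selects the data directory to run against.
        rem --exit_after_app_start N quits the client about N seconds
        rem after the first task starts; 90 seems to be the usable max.
        %BOINC% --dir %%D --exit_after_app_start 90
    )
    goto loop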
ID: 44735
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44781 - Posted: 7 Dec 2010, 16:49:35 UTC - in response to Message 44735.  

I've written a guide, and allowed non-members to view (I hope):

http://www.overclock.net/blogs/blox/2050-bloxcache-boinc-caching-batch-file-initialisation.html

http://www.overclock.net/blogs/blox/2051-bloxcache-boinc-caching-batch-file-running.html

ID: 44781
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 44806 - Posted: 7 Dec 2010, 22:51:42 UTC - in response to Message 44781.  

ID: 44806
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44829 - Posted: 8 Dec 2010, 19:04:48 UTC - in response to Message 44806.  

In the last 3 days I have done about double my usual AC :D
ID: 44829
GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 44832 - Posted: 8 Dec 2010, 19:55:29 UTC

I assume that with an i7 quad core with Hyper-Threading, and thus 8 processing threads, I would get 48 work units (8 × 6) at one time.


ID: 44832
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44849 - Posted: 9 Dec 2010, 17:12:28 UTC - in response to Message 44832.  

Yes, but you can use multiple data dirs, one after the other.
ID: 44849
mdhittle*
Joined: 25 Jun 10
Posts: 284
Credit: 260,490,091
RAC: 0
Message 44850 - Posted: 9 Dec 2010, 17:49:02 UTC - in response to Message 44849.  
Last modified: 9 Dec 2010, 17:50:00 UTC

Yes, but you can use multiple data dirs, one after the other.


This may work; I haven't tried it. One problem with it is that it makes it look like you have more systems running MW than you actually do. Each data dir is assigned a new and unique CPID. Right now you appear to have 87 systems running Milkyway with identical hardware, except one.

I didn't look at all 86 instances of your i7 980x computer, but it appears that you are averaging 72 workunits per instance. That means you have around 6,192 (86 × 72) workunits that you are (or will be) returning as "invalid".

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=242131&offset=0&show_names=0&state=4

You are wasting workunits in large quantities.

-Mike
ID: 44850
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44928 - Posted: 11 Dec 2010, 17:47:42 UTC - in response to Message 44850.  

It mostly works but with some problems.

I had a look at that link, and that was the only host that had any invalid WUs.

Today there are 2 invalid WUs, both from the same host, and none for any other host. Perhaps it is crashing; I'm using 6.12.

The biggest problem I am having atm is that most of my saved WUs abort with the message "aborted by project", which seems to mean that the WU was already successful on 2 hosts, and if I started the WU it would be wasted.

So with too big a cache there is no point in caching, as the WUs have already been done; it usually leads to just 1 task being freshly downloaded and my other 3 GPUs idling for a few minutes.

Perhaps there is some way to report a larger number of CPUs to BOINC...
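
One client-side knob that may do exactly that is the <ncpus> override in cc_config.xml; a sketch, assuming the BOINC 6.x client options (note it inflates every per-CPU limit across projects, so use with care):

    <cc_config>
      <options>
        <!-- Report 16 CPUs to projects instead of the real count, so a
             per-CPU server cap yields a proportionally larger cache. -->
        <ncpus>16</ncpus>
      </options>
    </cc_config>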

ID: 44928