Welcome to MilkyWay@home

Getting more than 18 work units at a time

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43508 - Posted: 5 Nov 2010, 15:04:28 UTC

I doubt anyone would disagree that larger caches are needed for GPU crunching. All the main issues of moving to larger caches have been outlined, bar one.

They can't; it would be a nightmare. They can only set the cache as a function of the CPUs; no ability exists inside the BOINC server software to express it as a number tied to GPUs. That means that if the number went to (say) 50 per CPU core, ending up as a 100-unit cache for a dual CPU, slower CPU machines would take forever and a day to clear that cache; in some projects a cache of 100 could run to months of crunching for a dual CPU. If the project insisted on validation of WUs via a second cruncher, it would grind to a halt with people yelling about delayed credit.

Should they introduce a GPU value to set the cache? For sure, but the reality is that at present BOINC does not provide a GPU-related parameter. There is nothing the Project Staffs can do about that; it's fixed in the BOINC server software and needs BOINC/Berkeley intervention. I wholly support a GPU-related parameter, but yelling at Project Staffs is the wrong target; it's BOINC/Berkeley that needs the pressure heaped on. Project Staffs are powerless on this one, which is a common issue across all BOINC projects; it's not a purely MW issue.
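
For reference, that CPU-based limit is a scheduler setting in the project's config.xml. A minimal sketch, assuming the standard scheduler option of the era; the value of 6 per core matches the limit discussed in this thread:

    <config>
      <!-- Cap on unfinished results a host may hold, counted per CPU core. -->
      <!-- A quad core could hold at most 4 x 6 = 24 results in progress;   -->
      <!-- nothing here lets the count scale with the number of GPUs.       -->
      <max_wus_in_progress>6</max_wus_in_progress>
    </config>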

Regards
Zy
ID: 43508
Arif Mert Kapicioglu
Joined: 14 Dec 09
Posts: 161
Credit: 589,318,064
RAC: 0
Message 43518 - Posted: 5 Nov 2010, 17:25:48 UTC
Last modified: 5 Nov 2010, 17:26:56 UTC

I'm also a long-time cruncher and support the idea of a higher cache for GPU crunchers, but considering the obstacles, my idea just stays an idea. BOINC intervention is harder than extending the database; both require a great deal of brainstorming, and the outcome may be unwanted (like a host grabbing a huge number of WUs and returning garbage).

So I will not argue it; I just want to state that it would be beneficial for dedicated GPU crunchers. Best of luck to Travis and Matthew :)

Stabilizing the server is already one of the major tasks of this project, and once that is done there will be no need for a higher cache for GPU crunchers, and everyone will be truly happy :)

It's fine by me to discuss it, as this is the board for performance discussions.

Regards.
ID: 43518
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 43520 - Posted: 5 Nov 2010, 18:50:08 UTC - in response to Message 43508.  
Last modified: 5 Nov 2010, 18:52:35 UTC

Not quite true. If you look at your computer details, it says how many GPUs you have, so a maximum cache size could easily be set via that number. For example, my computers have 3 and 1 GPUs respectively, listed as coprocessors on the detail page and as GPUs on this page. BOINC can then ensure there aren't too many WUs downloaded for the deadline via its standard practices. It's just that MW will not allow BOINC to fill the desired cache.

I doubt anyone would disagree that larger caches are needed for GPU crunching. All the main issues of moving to larger caches have been outlined, bar one.

They can't; it would be a nightmare. They can only set the cache as a function of the CPUs; no ability exists inside the BOINC server software to express it as a number tied to GPUs. That means that if the number went to (say) 50 per CPU core, ending up as a 100-unit cache for a dual CPU, slower CPU machines would take forever and a day to clear that cache; in some projects a cache of 100 could run to months of crunching for a dual CPU. If the project insisted on validation of WUs via a second cruncher, it would grind to a halt with people yelling about delayed credit.

Should they introduce a GPU value to set the cache? For sure, but the reality is that at present BOINC does not provide a GPU-related parameter. There is nothing the Project Staffs can do about that; it's fixed in the BOINC server software and needs BOINC/Berkeley intervention. I wholly support a GPU-related parameter, but yelling at Project Staffs is the wrong target; it's BOINC/Berkeley that needs the pressure heaped on. Project Staffs are powerless on this one, which is a common issue across all BOINC projects; it's not a purely MW issue.

Regards
Zy
ID: 43520
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43521 - Posted: 5 Nov 2010, 18:58:21 UTC - in response to Message 43520.  

Knowing the number of GPUs is one thing, and clearly the server does know that. However, Project Staffs have no server function they can use to pass the number of GPUs to the scheduler in any instruction regarding cache size, as per the CPU method. In passing the cache size they only have the ability to set the cache against the number of CPUs; there is no parameter inside that function that allows GPUs to be used as a baseline for the cache.

Hence, the cache cannot be set by Project Staffs using GPU numbers as the baseline; they can only use the number of CPUs.

Regards
Zy

ID: 43521
The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 43522 - Posted: 5 Nov 2010, 19:52:45 UTC - in response to Message 43521.  

How do you know that? If they have access to the number of CPUs and limit the number of WUs per CPU, then the same can be done for the number of GPUs.

Knowing the number of GPUs is one thing, and clearly the server does know that. However, Project Staffs have no server function they can use to pass the number of GPUs to the scheduler in any instruction regarding cache size, as per the CPU method. In passing the cache size they only have the ability to set the cache against the number of CPUs; there is no parameter inside that function that allows GPUs to be used as a baseline for the cache.

Hence, the cache cannot be set by Project Staffs using GPU numbers as the baseline; they can only use the number of CPUs.

Regards
Zy


ID: 43522
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 43526 - Posted: 5 Nov 2010, 21:17:34 UTC - in response to Message 43522.  

...If they have access to the number of CPUs and limit the number of WUs per CPU, then the same can be done for the number of GPUs.


The function in the server software that passes the parameter to the scheduler was written in the days of CPU-only applications, and it was written to pass the number of CPUs, not the number of GPUs. To pass the number of GPUs to the scheduler, the function needs to be rewritten; that has not been done, and it can only be done by BOINC/Berkeley. Project Staffs cannot do it: any BOINC server core code amendments made by Project Staffs are liable to be overwritten by the next BOINC server update, so they have to wait for BOINC to amend the server software.

This is also wrapped up in the ongoing debate about machine-level versus application-level caches. At present only a blanket cache per machine can be set; if cache levels were set per application, the whole CPU/GPU debate would become moot. Which way they will finally go is not yet known. Wait and see.
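
Purely as an illustration of what such a rewrite might expose: a hypothetical GPU-aware parameter alongside the existing CPU one. The GPU tag below is invented for this sketch and does not exist in the server software being discussed:

    <config>
      <!-- existing: results in progress, counted per CPU core -->
      <max_wus_in_progress>6</max_wus_in_progress>
      <!-- hypothetical: a separate cap counted per GPU, so a fast
           multi-GPU host could carry a deeper cache -->
      <max_wus_in_progress_gpu>80</max_wus_in_progress_gpu>
    </config>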

Regards
Zy
ID: 43526
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44713 - Posted: 5 Dec 2010, 21:19:27 UTC - in response to Message 43526.  

Possible workaround:

This is my current idea: http://www.overclock.net/blogs/blox/2046-bloxcache-boinc-caching-batch-file.html

The main problem atm is that the --exit_after_app_start switch doesn't seem to be honored all the time, and the max value seems to be 90.

Also, any half-processed WUs are restarted at 0, so a bit of a waste.

Any ideas?

Cheers,
Rick
ID: 44713
Chaul
Joined: 29 Dec 09
Posts: 1
Credit: 6,495,477
RAC: 0
Message 44726 - Posted: 6 Dec 2010, 10:00:40 UTC

The packet size is just too small. 15 minutes on a 5870 would sound about right, instead of the one and a half minutes each with the 6-WU limit. I believe this packet size would lower the load on the servers and give better results. My single Radeon 5870 goes through the 6-unit queue in about 10 minutes. Less, if I overclocked it.

Oh, and BOINC should allow setting separate limits for the GPU and CPU work queues. I'm only crunching GPU units, so the number of threads my CPU can process (8) shouldn't apply here.
ID: 44726
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44735 - Posted: 6 Dec 2010, 18:50:07 UTC - in response to Message 44726.  

Chaul - did you check out my idea?

It's mostly working atm, but it's not cycling through the instances properly.

The idea is: have multiple BOINC data dirs and switch between them, so each maintains its own cache (see the sketch below).

My current problem is that when I kill the instance of BOINC it doesn't resume a WU where it left off; it starts again. So increased time per instance is beneficial, but not too much or the deadline is passed.
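
A minimal sketch of that cycling loop as a Windows batch file, in the spirit of the bloxcache guide (paths, directory names, and the timing value are illustrative assumptions; --dir and --exit_after_app_start are the stock client switches mentioned above):

    @echo off
    rem Cycle one BOINC client through several data directories so that
    rem each directory maintains its own cache of MilkyWay work units.
    set BOINC="C:\Program Files\BOINC\boinc.exe"
    :loop
    for %%D in (C:\bcache1 C:\bcache2 C:\bcache3) do (
        rem --dir selects the data directory to run against.
        rem --exit_after_app_start N quits the client about N seconds
        rem after the first task starts; 90 seems to be the usable max.
        %BOINC% --dir %%D --exit_after_app_start 90
    )
    goto loop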
ID: 44735
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44781 - Posted: 7 Dec 2010, 16:49:35 UTC - in response to Message 44735.  

I've written a guide, and allowed non-members to view (I hope):

http://www.overclock.net/blogs/blox/2050-bloxcache-boinc-caching-batch-file-initialisation.html

http://www.overclock.net/blogs/blox/2051-bloxcache-boinc-caching-batch-file-running.html

ID: 44781
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 44806 - Posted: 7 Dec 2010, 22:51:42 UTC - in response to Message 44781.  

ID: 44806
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44829 - Posted: 8 Dec 2010, 19:04:48 UTC - in response to Message 44806.  

In the last 3 days I have done about double my usual AC :D
ID: 44829
GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 44832 - Posted: 8 Dec 2010, 19:55:29 UTC

I assume that with an i7 quad core with Hyper-Threading, and thus 8 processing threads, I would get 48 work units (8 × 6) at one time.


ID: 44832
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44849 - Posted: 9 Dec 2010, 17:12:28 UTC - in response to Message 44832.  

Yes, but you can use multiple data dirs, one after the other.
ID: 44849
mdhittle*
Joined: 25 Jun 10
Posts: 284
Credit: 260,490,091
RAC: 0
Message 44850 - Posted: 9 Dec 2010, 17:49:02 UTC - in response to Message 44849.  
Last modified: 9 Dec 2010, 17:50:00 UTC

Yes, but you can use multiple data dirs, one after the other.


This may work; I haven't tried it. One problem with it is that it makes it look like you have more systems running MW than you actually do. Each data dir is assigned a new and unique CPID. Right now you appear to have 87 systems running Milkyway with identical hardware, except one.

I didn't look at all 86 instances of your i7 980x computer, but it appears that you are averaging 72 workunits per instance. That means you have around 6,192 (86 × 72) workunits that you are (or will be) returning as "invalid".

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=242131&offset=0&show_names=0&state=4

You are wasting workunits in large quantities.

-Mike
ID: 44850
blox
Joined: 22 Apr 10
Posts: 14
Credit: 149,472,464
RAC: 0
Message 44928 - Posted: 11 Dec 2010, 17:47:42 UTC - in response to Message 44850.  

It mostly works but with some problems.

I had a look at that link, and that was the only host that had any invalid WUs.

Today there are 2 invalid WUs, both from the same host, and none for any other host. Perhaps it is crashing; I'm using 6.12.

The biggest problem I am having atm is that most of my saved WUs abort with the message "aborted by project", which seems to mean that the WU was already successful on 2 hosts, and if I started the WU it would be wasted.

So with too big a cache there is no point in caching, as the WUs have already been done; it usually leads to just 1 task being freshly downloaded and my other 3 GPUs idling for a few minutes.

Perhaps there is some way to report a larger number of CPUs to BOINC...
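
One client-side knob that may do exactly that is the <ncpus> override in cc_config.xml; a sketch, assuming the BOINC 6.x client options (note it inflates every per-CPU limit across projects, so use with care):

    <cc_config>
      <options>
        <!-- Report 16 CPUs to projects instead of the real count, so a
             per-CPU server cap yields a proportionally larger cache. -->
        <ncpus>16</ncpus>
      </options>
    </cc_config>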

ID: 44928