Message boards : Number crunching : tasks being sent to wrong gpu card
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,900,712 RAC: 26,544
Is there any way to direct Milkyway tasks to my GTX460 and avoid the 9800GTX card, which is single precision only? I had 3 tasks fail within seconds because they were assigned to the wrong card. Here is one of them. The Computer ID shows a pair of GTX460s, but that is not true; there is only one, and the other is a 9800GTX. This combination works fine for PrimeGrid and Collatz, but not Milkyway. I recently added some single-slot GTX460s to my systems and now have double precision capability. [EDIT] Possibly the GTX460 is not fully double precision. I just read that it is either 1/6 or 1/12 as capable as other cards having DP.
Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0
Any nVidia except the current Fermi-based Teslas (and maybe Quadros) is as "not fully double precision" as your GTX460. The chip (and all other mainstream Fermis) can run dp at 1/12th the sp speed. Fermi could be as fast as 1/2, but consumer cards with GF100 and GF110 (GTX465, GTX470, GTX480, GTX570, GTX580, GTX590) are artificially limited to 1/8th the sp performance. ATI's Cayman (HD6950, HD6970, HD6990) is at 1/2 sp performance for some dp operations and at 1/4 for others, whereas previous ATIs achieved 2/5 and 1/5. Coupled with the higher sp performance of the ATIs, that's the reason why they're so much faster and more efficient at MW.

MrS
Scanning for our furry friends since Jan 2002
Joined: 29 Oct 10 Posts: 89 Credit: 39,246,947 RAC: 0
I had the same problem a couple of months back and posted it here, but never got it resolved beyond moving GPUs among my machines so that none had a mixed single precision-double precision combo.

Regards, Steve
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,900,712 RAC: 26,544
I want to correct the computer ID I listed above. It should have been http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=258308. I did get a response over at the BOINC forum, where Jord suggested running two copies of BOINC. What he suggested will work, but it is a PITA to set up.
Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0
Sounds like BOINC should learn the difference between sp- and dp-capable cards, in addition to the CUDA compute capability level (and something similar for ATIs).

MrS
Scanning for our furry friends since Jan 2002
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,900,712 RAC: 26,544
I still have this problem, but now it is a GTX570 and a GTS250. The 570 is a full-length, triple-width card and the GTS250 a 2/3-length, double-width card that just barely fits. I do not see any way to route MW tasks to avoid the 250, so I will stop MW for this system.
Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0
I still have this problem but now it is a gtx570 and a gts250. The 570 is a full length 3x width and the gts250 a 2/3 length 2x width that just barely fits. I do not see any way to route MW tasks to avoid the 250 and will stop MW for this system.

There is one way to do it now, but you would have to install the 7.xx alpha version. It has an option in cc_config.xml to tell the client to ignore a GPU for a project.
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0
http://boinc.berkeley.edu/wiki/Client_configuration :
And from the code of MW Separation v1.0x there is a command line param: --device [device number passed by BOINC to use]. No idea if and how that works. Anyone with 2 GPUs willing to test it?
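For anyone wanting to try that --device parameter, it would go in the <cmdline> element of an app_info.xml under the anonymous platform mechanism. A minimal sketch follows; the executable file name, app name, and version number here are placeholders and must match the actual files and app name in your project directory, or the client will discard all your tasks:

```xml
<app_info>
    <app>
        <!-- app name must match the project's app name exactly -->
        <name>milkyway</name>
    </app>
    <file_info>
        <!-- placeholder: use the real executable name from your project directory -->
        <name>milkyway_separation_1.02_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>102</version_num>
        <!-- pin this app version to GPU 0 -->
        <cmdline>--device 0</cmdline>
        <coproc>
            <type>CUDA</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_separation_1.02_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>
```

Note that with app_info.xml the client still schedules tasks onto whatever GPU it likes; the --device flag only controls which GPU the science app itself computes on, so this is a workaround rather than a real fix.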
Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0
The GTX460 definitely does. The fraction just means that is how fast it is compared to single precision. The multi-GPU support in BOINC can only be described as completely broken, which is why I think you need to create the cc_config.xml with the <exclude_gpu> option. You can either use that, or app_info with the --device argument (it looks like it should go in the <cmdline> element).
Joined: 18 Nov 08 Posts: 291 Credit: 2,462,900,712 RAC: 26,544
http://boinc.berkeley.edu/wiki/Client_configuration :

I put the following together and am running it under 7.0.18:

<cc_config>
    <log_flags>
    </log_flags>
    <options>
        <use_all_gpus>1</use_all_gpus>
        <exclude_gpu>
            <url>http://milkyway.cs.rpi.edu/milkyway/</url>
            <device_num>1</device_num>
        </exclude_gpu>
        <exclude_gpu>
            <url>http://milkyway.cs.rpi.edu/milkyway/</url>
            <device_num>2</device_num>
        </exclude_gpu>
    </options>
</cc_config>
13 2012-03-03 8:28:21 AM NVIDIA GPU 0: GeForce GTX 460 (driver version 285.62, CUDA version 4.10, compute capability 2.1, 1024MB, 933MB available, 907 GFLOPS peak)
14 2012-03-03 8:28:21 AM NVIDIA GPU 1: GeForce GTS 250 (driver version 285.62, CUDA version 4.10, compute capability 1.1, 1024MB, 970MB available, 705 GFLOPS peak)
15 2012-03-03 8:28:21 AM NVIDIA GPU 2: GeForce GTS 250 (driver version 285.62, CUDA version 4.10, compute capability 1.1, 1024MB, 937MB available, 705 GFLOPS peak)
<snip>
94 2012-03-03 8:39:29 AM Re-reading cc_config.xml
95 2012-03-03 8:39:29 AM Config: use all coprocessors
96 Milkyway@Home 2012-03-03 8:39:29 AM Config: excluded GPU. Type: all. App: all. Device: 1
97 Milkyway@Home 2012-03-03 8:39:29 AM Config: excluded GPU. Type: all. App: all. Device: 2
Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0
i'm a bit late to the party, but i can also confirm that the <exclude_gpu> function works. here are my cc_config and my event log:
as you can see, i'm using the <exclude_gpu> function to exclude the HD 5870 from Milkyway@Home so it can focus on SETI@Home and running the display. at the same time, i'm using the function to exclude the HD 6950 from SETI@Home and display duties so it can focus solely on Milkyway@Home. i run 2 MW@H tasks simultaneously on the 6950, and either 2 S@H Astropulse tasks or 2 S@H Multibeam tasks simultaneously on the 5870. sometimes a single Astropulse task will run alongside a single Multibeam task.

...i'm not really sure why it doesn't show ATI GPU 1 as OpenCL capable as well, even though it clearly states above that my HD 5870 also makes use of OpenCL. not a big deal though...i've already run some OpenCL-based v1.02 tasks on it successfully.

i should note that i was running BOINC v6.12.41 and was pleasantly informed by the manager under the messages tab that it didn't recognize a command in the cc_config file. going back to where i originally referenced it, i found out that only BOINC v6.13.xx and later recognize the <exclude_gpu> command. i actually skipped v6.13.xx and updated straight to v7.0.12 since it recognizes OpenCL-capable devices.

my major gripe with it so far is the altered functionality of the project caches. whereas with v6.12.41 i could maintain a decent-sized but not overwhelming cache, v7.0.12 drains the caches of all of my projects until they're nonexistent, and only then does my host bother to communicate with the project server to get more work. if i manually update a project while any amount of work is left in its cache, it'll just say "not reporting or requesting work" in the event log. is anyone else experiencing these kinds of shenanigans w/ BOINC v7.0.12? perhaps a different v7.0.xx might solve it? or might i have to go back to v6.13.xx to eliminate this behavior? and if so, which of the v6.13.xx versions are stable while crunching the new Separation v1.02 tasks?

TIA, Eric
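Eric's actual cc_config did not survive in the archive, but based on his description a configuration along these lines (device numbers are an assumption; check the GPU enumeration in your own event log) would produce the behavior he describes:

```xml
<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
        <!-- keep the HD 5870 (assumed to be device 0) off Milkyway@Home -->
        <exclude_gpu>
            <url>http://milkyway.cs.rpi.edu/milkyway/</url>
            <device_num>0</device_num>
        </exclude_gpu>
        <!-- keep the HD 6950 (assumed to be device 1) off SETI@Home -->
        <exclude_gpu>
            <url>http://setiathome.berkeley.edu/</url>
            <device_num>1</device_num>
        </exclude_gpu>
    </options>
</cc_config>
```

Each <exclude_gpu> entry is per-project, so a two-project, two-GPU split like this needs one entry per project, each pointing at the device that project should avoid.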
Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0
With development BOINC 7.0.xx versions, Connect about every x.xx days has now effectively become Minimum work buffer. In fact in the later 7.0.xx versions it has been renamed. If you leave it at 0 days which was previously recommended for an always on connection it will not download any new tasks until your cache is empty. The value needs to be set above 0 if you want to download work before the cache is empty. This new method of controlling work download may cause tasks to run in high priority mode on projects that have a short deadline. The higher the value you use for Connect about every x.xx days/Minimum work buffer, the more likely it is that tasks may go into high priority mode. It depends on the deadline, for example with projects with a short deadline of 2 days, tasks may run high priority with a value above 0.7-0.8 days. This can cause trouble if they run out of order while leaving other tasks "waiting to run". In later BOINC versions (after 7.0.14, I think) high priority tasks run in "earliest deadline first" order which overcomes this problem on most projects. WCG can still upset the applecart though if a computer becomes a trusted host and gets sent repair tasks with a shorter deadline than normal tasks. These repair tasks may start to run in high priority mode as soon as they are downloaded. |
Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0
With development BOINC 7.0.xx versions, Connect about every x.xx days has now effectively become Minimum work buffer. In fact in the later 7.0.xx versions it has been renamed. If you leave it at 0 days which was previously recommended for an always on connection it will not download any new tasks until your cache is empty. The value needs to be set above 0 if you want to download work before the cache is empty.

thanks for the details. i remember reading something about that in another thread not too long ago, but i couldn't remember exactly what was said or which thread it was in. so again, thanks for making it clear here. i will give BOINC v7.0.15 (or later) a try this evening and treat the "Connect about every x.xx days" parameter as the minimum work buffer instead.
Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0
well i made the switch to BOINC v7.0.24 last night...mind you, the only reason i did it was to get BOINC to recognize the <exclude_gpu> function in order to force GPU 0 to run Milkyway@Home only, and GPU 1 to run SETI@Home only...not b/c the update was necessary in order to run the new OpenCL-based Separation v1.02 tasks.

v7.0.24 has correctly renamed the parameter fields of interest from "connect about every x.xx days" & "additional work buffer x.xx days (max. 10)" to "minimum work buffer x.xx days" & "max. additional work buffer x.xx days." but even with the minimum work buffer and the max additional work buffer set to 5 days & 2 days respectively, i'm still experiencing the same behavior as before - both the Milkyway and SETI task queues are being drained completely before either project server allows more work to be downloaded. i should note that i believe this is being enforced server-side, b/c if i manually update either project before the queues have run down to zero, my host will report whatever tasks have completed, but won't "request any new tasks."

is anyone using BOINC v7.0.15 or later able to build up a queue and maintain it, or are we all experiencing this behavior with our hosts?
Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0
well i made the switch to BOINC v7.0.24 last night...mind you, the only reason i did it was to get BOINC to recognize the <exclude_gpu> function in order to force GPU 0 to run Milkyway@Home only, and GPU 1 to run SETI@Home only... but even with the minimum work buffer and the max additional work buffer set to 5 days & 2 days respectively, i'm still experiencing the same behavior as before - both the Milkyway and SETI task queues are being drained completely before either project server allows more work to be downloaded.

I have a "full" queue from SETI with 7.0.18. I have my preferences set globally at:

Maintain enough tasks to keep busy for at least (max 10 days): 5 days ... and up to an additional 0.5 days

While my other computer running 6.12.x is at:

Maintain enough tasks to keep busy for at least (max 10 days): 0 days ... and up to an additional 5 days
Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0
I have a "full" queue from SETI with 7.0.18, I have my preferences set globally at

well those are similar to my settings (i had the minimum work buffer set to 5 days, and the additional buffer set to 2)...though i had only made those changes host-side via the BOINC manager's settings. i've since implemented those settings server-side via my web preferences as well...although i was always under the impression that the local host settings override the web preference settings, and therefore make it unnecessary to set any of the web preferences that are also available for edit through the BOINC manager itself.

regardless, i've implemented the settings in both places just to be sure, and unfortunately i'm still seeing the same behavior as before - my SETI and Milkyway caches are draining completely before refilling (as opposed to downloading fewer tasks at a time, more often, and maintaining X number of tasks in the queue at all times). any other ideas why the work buffers aren't working like they should?

it's like i'm stuck in a bad dream...either i use BOINC v7.x.xx to gain access to the <exclude_gpu> function and sacrifice normal scheduling and queue characteristics, or i go back to BOINC v6.12.xx to get normal scheduling and queues and sacrifice the ability to use the <exclude_gpu> function (which i need in order to run both Milkyway@Home and SETI@Home GPU apps at the same time on the same machine). if only BOINC v6.13.xx were a happy medium between v6.12.xx and v7.x.xx, but it's not - v6.13.xx exhibits the same buffer/queue problems for me that v7.x.xx does.

*UPDATE* - just finished a run of Multibeam tasks, at which point a single AP task was downloaded. the same thing happened the last time i got an AP task. so not only can i not maintain a queue of S@H tasks, but i'm getting no more than 1 AP task at a time...which really poses a problem for my host should the server go down...i could be out of work (both AP and BM) for ridiculous amounts of time...
Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0
Check the "Remaining (estimated)" time for tasks is correct. I remember when POEM changed to fixed credit tasks, the DCF was tiny so the Remaining time went very high. I had to use a Minimum work buffer of 10 days or change the <flops> value in my app_info.xml file in order to increase the number of tasks in my cache. The DCF would adjust over a few days, but if a resend task with the much smaller estimated computation size was downloaded and processed then the DCF would go way down again.

Newer versions of BOINC report double the GFLOPS value for ATI cards. The older calculation was based on code by Crunch3r and related to double precision. This may affect GPU task work fetch.

You are doing 2 different types of work unit on SETI; perhaps this causes fluctuations in the DCF value. Fluctuating DCF could cause work fetch problems for both GPU projects. Although different cards are assigned to different projects, BOINC's work fetch algorithms for the 2 GPUs are possibly combined, so the amount of work in the cache on one project affects when and how many tasks are downloaded for the other project.

You could try suspending one GPU project and see if the other project then downloads work. While you're at it you could suspend any CPU projects too. It shouldn't make a difference, but development BOINC versions may use the ncpus value in strange ways. You could also try setting your CPU projects to No new tasks and then increasing Minimum work buffer to the maximum 10 days to see if that forces GPU work download before the cache is dry.

You've been checking the Event Log, I suppose, to ensure you haven't had error tasks causing your daily quota to be reduced. Maybe you could get greater detail in your Event Log by using a cc_config.xml with the extra logging flags related to work fetch.
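A minimal sketch of a cc_config.xml that turns on the work-fetch logging Kashi mentions might look like this (these flag names come from the standard BOINC client configuration options; verify them against the wiki page linked earlier for your client version):

```xml
<cc_config>
    <log_flags>
        <!-- log the details of each work-fetch decision -->
        <work_fetch_debug>1</work_fetch_debug>
        <!-- log scheduler RPC requests and replies -->
        <sched_op_debug>1</sched_op_debug>
    </log_flags>
</cc_config>
```

After saving the file in the BOINC data directory, use "Read config file" from the manager's Advanced menu (or restart the client) and the extra detail will appear in the Event Log on the next work request.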
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
I have a "full" queue from SETI with 7.0.18, I have my preferences set globally at

The new BOINC version 7 is NOT clear on the words it uses in the cache area, the early versions anyway. If you are coming directly from version 6 to version 7 you should reverse the numbers you had in version 6. The newer version 7 releases are clearer about what each set of numbers does. Basically the 1st set of numbers now says "minimum work buffer", while the 2nd set says "max additional work buffer"; this is in version 7.0.23. Essentially, compared to version 6 you reverse those two, with the bigger number now in the 1st section and the smaller in the 2nd. Otherwise version 7 will maintain only a VERY small minimum work buffer, which is not what most of us want. I just reversed the numbers and my cache has not changed from the one version to the next.
Joined: 25 Jan 11 Posts: 271 Credit: 346,072,284 RAC: 0
The new BOINC version 7 is NOT clear on the words it uses in the cache area, the early versions anyway. If you are coming directly from version 6 to version 7 you should reverse the numbers you had in version 6.

well before, i was running BOINC v6.12.41, where the "connect about every x.xx days" field was set to 0 and the "additional work buffer" field was set to 5 days. so now that i'm running BOINC v7.0.24, i have the "minimum work buffer" field set to 5 and the "additional work buffer" set to 0...in other words, the numbers are now reversed as you suggested.

i made this change a good 6 hours ago, and have noticed no positive changes. my host has contacted the server several times since then to report finished tasks, but has not requested any new work, so nothing has changed...then again, the small queue of SETI tasks i had since this morning is only now winding down to zero - the last SETI WU is being crunched right now. perhaps the cache behavior will change after the next request for work (which won't happen until this last task finishes and reports).

*EDIT* - the last SETI WU just completed, uploaded, and reported, but i did not receive a new cache of WUs b/c the server is down for maintenance...so i guess i'll have to wait a while before i see any new tasks...

i may have to further experiment with Kashi's suggestions and suspend one of my two GPU projects (and possibly even my CPU projects) to see how it affects the cache behavior of the one project left running.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
The new BOINC version 7 is NOT clear on the words it uses in the cache area, the early versions anyway. If you are coming directly from version 6 to version 7 you should reverse the numbers you had in version 6.

I hate it when a project goes down while you are testing!!!
©2025 Astroinformatics Group